Bug 217599

Summary: Adaptec 71605z hangs with aacraid: Host adapter abort request after update to linux 6.4.0
Product: SCSI Drivers
Component: AACRAID
Reporter: pheidologeton
Assignee: scsi_drivers-aacraid
Status: NEW
Severity: high
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
CC: aacraid, amigo.elite, bagasdotme, bobinium, carnil, encore2097, f.gruenbichler, ghosh, hare, info, john.g.garry, joop.boonen, kernel, kernel, leyyyyy, mail.spyden, maokaman, markus, mkp, randyg503, regressions, rider, ro_ux, sagar.biradar, samuelwolf85, thenzl, vincent.sadys, vrytired
Attachments: attachment-7198-0.html
signature.asc
0001-aacraid-submit-internal-commands-on-vector-0.patch
Patch for aacraid FC38 6.3.12-200 vs 6.4.4-200
The kernel.log file when the system is hanging.
0001-aacraid-fix-vector-calculation-when-submitting-command.patch
Test result of 0001-aacraid-fix-vector-calculation-when-submitting-command.patch

Description pheidologeton 2023-06-26 22:36:13 UTC
The controller works fine for a few minutes, then hangs for anywhere from a few tens of seconds to a few minutes, then works normally again for a while. The bug is present in the 6.4.0 kernel release; 6.3.9 works without hanging.
The messages in dmesg are as follows:

[  287.137901] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137909] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137912] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137914] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137916] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137919] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137921] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137924] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137926] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137928] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137930] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137933] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137934] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137937] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137939] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137941] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137943] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137945] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137947] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137949] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137951] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137952] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137954] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137956] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137958] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137960] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137962] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137964] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137966] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137967] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137969] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137971] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.157697] aacraid: Host bus reset request. SCSI hang ?
[  287.157706] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[  287.157708] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[  287.157709] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[  287.157711] aacraid 0000:02:00.0: outstanding cmd: firmware-32
[  287.157712] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[  287.167040] aacraid 0000:02:00.0: Controller reset type is 3
[  287.167042] aacraid 0000:02:00.0: Issuing IOP reset
[  321.029712] aacraid 0000:02:00.0: IOP reset succeeded
[  321.066201] numacb=512 ignored
[  321.066843] aacraid: Comm Interface type2 enabled
[  344.845370] aacraid 0000:02:00.0: Scheduling bus rescan
[  358.294342] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[  442.109147] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109155] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109158] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109160] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109162] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109164] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109166] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109168] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109170] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109172] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109174] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109176] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109178] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109179] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109181] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109183] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109185] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109187] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109189] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109191] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109193] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109194] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109196] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109198] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109200] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109201] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109203] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109205] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109207] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109208] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.109210] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.137144] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  442.154292] aacraid: Host bus reset request. SCSI hang ?
[  442.154302] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[  442.154305] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[  442.154307] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[  442.154308] aacraid 0000:02:00.0: outstanding cmd: firmware-32
[  442.154310] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[  442.171131] aacraid 0000:02:00.0: Controller reset type is 3
[  442.171133] aacraid 0000:02:00.0: Issuing IOP reset
[  476.040983] aacraid 0000:02:00.0: IOP reset succeeded
[  476.078055] numacb=512 ignored
[  476.078606] aacraid: Comm Interface type2 enabled
[  494.747632] aacraid 0000:02:00.0: Scheduling bus rescan
[  507.896453] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
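
For readers who want to condense a log like the one above, here is a minimal sketch that counts the abort requests in each incident and measures how long each IOP reset took. The function name `summarize` and the shortened `sample` text are illustrative assumptions, not part of the original report. Applied to the full log above, the same logic should give 32 abort requests per incident (which lines up with the "firmware-32" outstanding-command line) and a reset time of roughly 34 seconds each.

```python
import re

# Matches a timestamped dmesg line, e.g. "[  287.167042] aacraid ...: Issuing IOP reset"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize(log_text):
    """Return one dict per incident: abort count and IOP reset duration in seconds."""
    incidents = []
    aborts = 0
    reset_start = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            # Continuation lines ("Outstanding commands on ...") carry no timestamp.
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if "Host adapter abort request" in message:
            aborts += 1
        elif "Issuing IOP reset" in message:
            reset_start = timestamp
        elif "IOP reset succeeded" in message and reset_start is not None:
            incidents.append(
                {"aborts": aborts, "reset_seconds": round(timestamp - reset_start, 3)}
            )
            aborts, reset_start = 0, None
    return incidents


# Shortened excerpt of the log above (only two of the abort requests are kept).
sample = """\
[  287.137969] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.137971] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (10,0,0,0):
[  287.157697] aacraid: Host bus reset request. SCSI hang ?
[  287.167042] aacraid 0000:02:00.0: Issuing IOP reset
[  321.029712] aacraid 0000:02:00.0: IOP reset succeeded
"""

for incident in summarize(sample):
    print(incident)
```

Piping the complete output of `dmesg` into `summarize` instead of `sample` gives one entry per hang, which makes it easier to compare a 6.3.9 boot against a 6.4.0 boot.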
Comment 1 Bagas Sanjaya 2023-06-27 01:31:31 UTC
(In reply to pheidologeton from comment #0)
> The controller works fine for a few minutes. Then it hangs for a few tens of
> seconds to a few minutes, then also works normally for a while. This bug is
> present in the 6.4.0 kernel release (6.3.9 works without hanging)
> The messages in dmesg are as follows
> 

Can you do bisection between v6.3 and v6.4 please?
Comment 2 pheidologeton 2023-06-27 01:47:50 UTC
Created attachment 304479 [details]
attachment-7198-0.html

I don't quite understand, since I am using a translator. I can attach dmesg output from kernels 6.4 and 6.3.9 as files.
-------- Original message --------
On 27 Jun 2023, 04:31, wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>
> Bagas Sanjaya (bagasdotme@gmail.com) changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |bagasdotme@gmail.com
>
> --- Comment #1 from Bagas Sanjaya (bagasdotme@gmail.com) ---
> (In reply to pheidologeton from comment #0)
> [...]
>
> Can you do bisection between v6.3 and v6.4 please?
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You reported the bug.
Comment 3 Bagas Sanjaya 2023-06-27 02:11:23 UTC
On 6/27/23 08:47, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> 
> --- Comment #2 from pheidologeton@protonmail.com ---
> I don't understand a bit, since I use a translator. I can attach dmesg as
> files from kernel 6.4 and 6.3.9

Sorry, but you will have to do a bisection to help kernel developers fix your
regression. See Documentation/admin-guide/bug-bisect.rst in the kernel sources
for how to do it. Because you need to compile your own kernel during bisection,
also see Documentation/admin-guide/quickly-build-trimmed-linux.rst for build
instructions.

See you in your bisection report!
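
To give a feel for the effort involved: a bisection between the two releases is started with `git bisect start`, `git bisect bad v6.4` and `git bisect good v6.3`, after which each tested kernel is marked with `git bisect good` or `git bisect bad`. Because every step halves the remaining range, the number of kernels to build grows only with the base-2 logarithm of the commit count. The sketch below estimates that number; `bisect_steps` is a hypothetical helper and the commit count is an assumed round figure, not a measured one.

```python
import math


def bisect_steps(n_commits):
    """Approximate number of test builds needed to isolate one commit among n_commits."""
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    # Each good/bad verdict halves the candidate range.
    return math.ceil(math.log2(n_commits))


# Assumed figure for illustration only; the real count can be obtained with:
#   git rev-list --count v6.3..v6.4
ASSUMED_COMMITS = 15000

print(bisect_steps(ASSUMED_COMMITS))  # 14 builds under the assumed count
```

Under that assumption the whole bisection takes on the order of 14 build-and-boot cycles, and with this bug each cycle needs a few minutes of uptime to see whether the abort requests appear.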
Comment 4 Bagas Sanjaya 2023-06-27 02:39:30 UTC
Created attachment 304480 [details]
signature.asc

[also Cc: aacraid and SCSI subsystem maintainers]

On Mon, Jun 26, 2023 at 10:36:13PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> 
>             Bug ID: 217599
>            Summary: Adaptec 71605z hangs with aacraid: Host adapter abort
>                     request after update to linux 6.4.0
>            Product: SCSI Drivers
>            Version: 2.5
>           Hardware: All
>                 OS: Linux
>             Status: NEW
>           Severity: high
>           Priority: P3
>          Component: AACRAID
>           Assignee: scsi_drivers-aacraid@kernel-bugs.osdl.org
>           Reporter: pheidologeton@protonmail.com
>         Regression: No
> 
> The controller works fine for a few minutes. Then it hangs for a few tens of
> seconds to a few minutes, then also works normally for a while. This bug is
> present in the 6.4.0 kernel release (6.3.9 works without hanging)
> The messages in dmesg are as follows
> 
> [  287.137969] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  287.137971] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  287.157697] aacraid: Host bus reset request. SCSI hang ?
> [  287.157706] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
> [  287.157708] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
> [  287.157709] aacraid 0000:02:00.0: outstanding cmd: error handler-0
> [  287.157711] aacraid 0000:02:00.0: outstanding cmd: firmware-32
> [  287.157712] aacraid 0000:02:00.0: outstanding cmd: kernel-0
> [  287.167040] aacraid 0000:02:00.0: Controller reset type is 3
> [  287.167042] aacraid 0000:02:00.0: Issuing IOP reset
> [  321.029712] aacraid 0000:02:00.0: IOP reset succeeded
> [  321.066201] numacb=512 ignored
> [  321.066843] aacraid: Comm Interface type2 enabled
> [  344.845370] aacraid 0000:02:00.0: Scheduling bus rescan
> [  358.294342] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
> [  442.109147] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109155] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109158] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109160] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109162] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109164] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109166] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109168] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109170] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109172] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109174] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109176] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109178] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109179] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109181] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109183] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109185] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109187] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109189] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109191] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109193] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109194] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109196] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109198] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109200] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109201] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109203] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109205] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109207] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109208] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.109210] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.137144] aacraid: Host adapter abort request.
>                aacraid: Outstanding commands on (10,0,0,0):
> [  442.154292] aacraid: Host bus reset request. SCSI hang ?
> [  442.154302] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
> [  442.154305] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
> [  442.154307] aacraid 0000:02:00.0: outstanding cmd: error handler-0
> [  442.154308] aacraid 0000:02:00.0: outstanding cmd: firmware-32
> [  442.154310] aacraid 0000:02:00.0: outstanding cmd: kernel-0
> [  442.171131] aacraid 0000:02:00.0: Controller reset type is 3
> [  442.171133] aacraid 0000:02:00.0: Issuing IOP reset
> [  476.040983] aacraid 0000:02:00.0: IOP reset succeeded
> [  476.078055] numacb=512 ignored
> [  476.078606] aacraid: Comm Interface type2 enabled
> [  494.747632] aacraid 0000:02:00.0: Scheduling bus rescan
> [  507.896453] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
> 

Thanks for automatically forwarding the Bugzilla report. I'm adding it to
regzbot to ensure it doesn't fall through the cracks unnoticed:

#regzbot ^introduced: v6.3..v6.4
#regzbot title: Adaptec 71605z hangs with aacraid: Host adapter abort request after update
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=217599
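
For anyone comparing logs between kernels, the dmesg excerpt quoted above can be summarized mechanically. Below is a minimal Python sketch, not an existing tool: `summarize_resets` and `LINE_RE` are hypothetical names, and the code assumes only that the message strings ("Host adapter abort request", "Issuing IOP reset", "IOP reset succeeded") appear exactly as in the log above. It counts the abort requests that precede each IOP reset and reports how long the reset took.

```python
import re

# Matches the "[  287.137901] message" prefix used by dmesg,
# optionally preceded by a "> " mail quote marker.
LINE_RE = re.compile(r"^\s*(?:>\s*)?\[\s*(\d+\.\d+)\]\s*(.*)$")


def summarize_resets(dmesg_text):
    """Return one dict per completed IOP reset cycle found in the log."""
    cycles = []
    aborts = 0          # abort requests seen since the previous reset
    reset_start = None  # timestamp of the pending "Issuing IOP reset"
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines carry no timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if "Host adapter abort request" in msg:
            aborts += 1
        elif "Issuing IOP reset" in msg:
            reset_start = ts
        elif "IOP reset succeeded" in msg and reset_start is not None:
            cycles.append({
                "aborts": aborts,
                "reset_start": reset_start,
                "reset_seconds": round(ts - reset_start, 3),
            })
            aborts, reset_start = 0, None
    return cycles
```

Passing the text of a saved dmesg dump to `summarize_resets` yields one entry per hang, which makes it easier to compare how often the controller resets under different kernels or settings.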
Comment 5 pheidologeton 2023-06-28 09:38:10 UTC
I do not have another server with an Adaptec controller, and downtime on this server is highly undesirable. If 6.4.1 contains any fixes, I will kexec from 6.3.9 into 6.4.1 and report back.
Comment 6 pheidologeton 2023-07-03 17:21:05 UTC
An interesting observation: after changing the I/O scheduler to none, the controller hangs happen much less often. I am currently running kernel 6.4.1.
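
For reference, the active I/O scheduler of a block device is exposed through sysfs at `/sys/block/<device>/queue/scheduler`, with the current choice shown in square brackets. The following is a minimal Python sketch of reading and changing it. The helper names are hypothetical, and `sda` is only assumed to be the affected device because it is the one named in the log; the comment does not say how the scheduler was actually changed.

```python
from pathlib import Path


def parse_active_scheduler(contents):
    """Return the scheduler name shown in [brackets], or None if absent."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def scheduler_path(device):
    """Build the sysfs path holding the scheduler list for a device."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def get_scheduler(device):
    """Read the currently active scheduler for a device such as 'sda'."""
    return parse_active_scheduler(scheduler_path(device).read_text())


def set_scheduler(device, name):
    """Select a scheduler; needs root, and the kernel rejects unknown names."""
    scheduler_path(device).write_text(name)
```

A change made this way lasts only until the next reboot, so `set_scheduler("sda", "none")` would have to be reapplied (or configured persistently by other means) after restarting.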
Comment 7 pheidologeton 2023-07-22 18:48:25 UTC
One more observation: if the controller write cache is disabled (set to wt in the arcconf settings), the problem is not observed, but random write speed drops by a factor of 3-4.
Comment 8 pheidologeton 2023-07-23 18:03:58 UTC
Update: with the cache disabled the error still occurs, but less often and only during operations with very large amounts of I/O, e.g. btrfs balance.
Comment 9 pheidologeton 2023-07-23 18:14:20 UTC
Attached is the kernel log captured during a btrfs balance. Additional information: Arch Linux, / on btrfs. The kernel is built from kernel.org sources; I don't use the Arch kernels. I can send the config if needed.
[ 3316.617309] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3319.482222] BTRFS info (device dm-0): relocating block group 40045365952512 flags data
[ 3329.220422] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3330.759383] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3331.458286] BTRFS info (device dm-0): relocating block group 40044292210688 flags data
[ 3344.973440] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 3347.383541] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 3349.321716] BTRFS info (device dm-0): relocating block group 40043218468864 flags data
[ 3365.872341] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3368.168591] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3369.726373] BTRFS info (device dm-0): relocating block group 40042144727040 flags data
[ 3382.975757] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 3385.968211] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 3386.724714] BTRFS info (device dm-0): relocating block group 40041070985216 flags data
[ 3394.540433] BTRFS info (device dm-0): found 2048 extents, stage: move data extents
[ 3397.185759] BTRFS info (device dm-0): found 2048 extents, stage: update data pointers
[ 3399.119172] BTRFS info (device dm-0): relocating block group 40039997243392 flags data
[ 3407.703926] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3408.814660] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3409.582049] BTRFS info (device dm-0): relocating block group 40038923501568 flags data
[ 3419.867057] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 3422.106158] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 3422.938727] BTRFS info (device dm-0): relocating block group 40037849759744 flags data
[ 3428.406170] BTRFS info (device dm-0): found 870 extents, stage: move data extents
[ 3433.431310] BTRFS info (device dm-0): found 870 extents, stage: update data pointers
[ 3437.507903] BTRFS info (device dm-0): relocating block group 40036776017920 flags data
[ 3448.653028] BTRFS info (device dm-0): found 1960 extents, stage: move data extents
[ 3455.940281] BTRFS info (device dm-0): found 1960 extents, stage: update data pointers
[ 3459.627764] BTRFS info (device dm-0): relocating block group 40035702276096 flags data
[ 3468.108075] BTRFS info (device dm-0): found 2059 extents, stage: move data extents
[ 3469.458665] BTRFS info (device dm-0): found 2059 extents, stage: update data pointers
[ 3470.245608] BTRFS info (device dm-0): relocating block group 40034628534272 flags data
[ 3477.993083] BTRFS info (device dm-0): found 2056 extents, stage: move data extents
[ 3479.662803] BTRFS info (device dm-0): found 2056 extents, stage: update data pointers
[ 3481.908411] BTRFS info (device dm-0): relocating block group 40033554792448 flags data
[ 3491.709283] BTRFS info (device dm-0): found 2066 extents, stage: move data extents
[ 3493.035815] BTRFS info (device dm-0): found 2066 extents, stage: update data pointers
[ 3494.446714] BTRFS info (device dm-0): relocating block group 40032481050624 flags data
[ 3505.523295] BTRFS info (device dm-0): found 2062 extents, stage: move data extents
[ 3508.145601] BTRFS info (device dm-0): found 2062 extents, stage: update data pointers
[ 3509.167778] BTRFS info (device dm-0): relocating block group 40031407308800 flags data
[ 3518.814308] BTRFS info (device dm-0): found 2063 extents, stage: move data extents
[ 3520.993505] BTRFS info (device dm-0): found 2063 extents, stage: update data pointers
[ 3522.503043] BTRFS info (device dm-0): relocating block group 40030333566976 flags data
[ 3531.751190] BTRFS info (device dm-0): found 2064 extents, stage: move data extents
[ 3533.421101] BTRFS info (device dm-0): found 2064 extents, stage: update data pointers
[ 3534.289901] BTRFS info (device dm-0): relocating block group 40029259825152 flags data
[ 3542.304610] BTRFS info (device dm-0): found 1901 extents, stage: move data extents
[ 3543.452137] BTRFS info (device dm-0): found 1901 extents, stage: update data pointers
[ 3545.360085] BTRFS info (device dm-0): relocating block group 40028186083328 flags data
[ 3554.996710] BTRFS info (device dm-0): found 2056 extents, stage: move data extents
[ 3558.154072] BTRFS info (device dm-0): found 2056 extents, stage: update data pointers
[ 3559.682636] BTRFS info (device dm-0): relocating block group 40027112341504 flags data
[ 3568.657974] BTRFS info (device dm-0): found 2064 extents, stage: move data extents
[ 3571.539808] BTRFS info (device dm-0): found 2064 extents, stage: update data pointers
[ 3572.893027] BTRFS info (device dm-0): relocating block group 40026038599680 flags data
[ 3583.292919] BTRFS info (device dm-0): found 2061 extents, stage: move data extents
[ 3586.633532] BTRFS info (device dm-0): found 2061 extents, stage: update data pointers
[ 3587.914016] BTRFS info (device dm-0): relocating block group 40024964857856 flags data
[ 3600.062952] BTRFS info (device dm-0): found 2055 extents, stage: move data extents
[ 3602.371266] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers
[ 3603.144455] BTRFS info (device dm-0): relocating block group 40023891116032 flags data
[ 3613.895505] BTRFS info (device dm-0): found 2116 extents, stage: move data extents
[ 3620.446434] BTRFS info (device dm-0): found 2116 extents, stage: update data pointers
[ 3623.076019] BTRFS info (device dm-0): relocating block group 40022817374208 flags data
[ 3631.646738] BTRFS info (device dm-0): found 2077 extents, stage: move data extents
[ 3634.235518] BTRFS info (device dm-0): found 2077 extents, stage: update data pointers
[ 3635.573847] BTRFS info (device dm-0): relocating block group 40021743632384 flags data
[ 3646.460339] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 3649.351023] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 3650.478747] BTRFS info (device dm-0): relocating block group 40020669890560 flags data
[ 3776.018632] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.270455] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.270463] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.270466] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.270468] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.270471] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.286451] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.310451] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.342450] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.362449] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.362453] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.362455] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.378449] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.378452] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.378454] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.378456] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 3784.393980] aacraid: Host bus reset request. SCSI hang ?
[ 3784.393989] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[ 3784.393991] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[ 3784.393992] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[ 3784.393994] aacraid 0000:02:00.0: outstanding cmd: firmware-16
[ 3784.393995] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[ 3784.406273] aacraid 0000:02:00.0: Controller reset type is 3
[ 3784.406275] aacraid 0000:02:00.0: Issuing IOP reset
[ 3818.052903] aacraid 0000:02:00.0: IOP reset succeeded
[ 3818.089560] numacb=512 ignored
[ 3818.090077] aacraid: Comm Interface type2 enabled
[ 3831.327204] aacraid 0000:02:00.0: Scheduling bus rescan
[ 3844.356263] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 3853.956497] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3864.355224] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3865.992752] BTRFS info (device dm-0): relocating block group 40019596148736 flags data
[ 3874.735672] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3878.298428] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3880.455850] BTRFS info (device dm-0): relocating block group 40018522406912 flags data
[ 3891.125829] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3893.431071] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3894.271993] BTRFS info (device dm-0): relocating block group 40017448665088 flags data
[ 3903.344106] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 3905.644193] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 3906.914578] BTRFS info (device dm-0): relocating block group 40016374923264 flags data
[ 3916.797696] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3921.481208] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3924.388232] BTRFS info (device dm-0): relocating block group 40015301181440 flags data
[ 3934.062280] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 3936.989412] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 3938.757170] BTRFS info (device dm-0): relocating block group 40014227439616 flags data
[ 3947.023461] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 3948.628886] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 3949.501986] BTRFS info (device dm-0): relocating block group 40013153697792 flags data
[ 3959.582364] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 3962.158521] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 3962.849216] BTRFS info (device dm-0): relocating block group 40012079955968 flags data
[ 3971.717710] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 3972.936642] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 3975.062969] BTRFS info (device dm-0): relocating block group 40011006214144 flags data
[ 4101.963623] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963630] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963632] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963635] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963637] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963639] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963641] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963643] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4101.963646] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4102.055618] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4102.055621] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4102.115617] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4102.211615] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4134.726992] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4134.727000] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4144.966793] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4144.978357] aacraid: Host bus reset request. SCSI hang ?
[ 4144.978373] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[ 4144.978375] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[ 4144.978377] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[ 4144.978378] aacraid 0000:02:00.0: outstanding cmd: firmware-16
[ 4144.978380] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[ 4144.994613] aacraid 0000:02:00.0: Controller reset type is 3
[ 4144.994615] aacraid 0000:02:00.0: Issuing IOP reset
[ 4178.492092] aacraid 0000:02:00.0: IOP reset succeeded
[ 4178.517963] numacb=512 ignored
[ 4178.518491] aacraid: Comm Interface type2 enabled
[ 4191.766331] aacraid 0000:02:00.0: Scheduling bus rescan
[ 4204.785434] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 4206.168041] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 4209.562446] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 4210.236071] BTRFS info (device dm-0): relocating block group 40009932472320 flags data
[ 4220.376248] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4222.252717] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4223.045918] BTRFS info (device dm-0): relocating block group 40008858730496 flags data
[ 4231.408480] BTRFS info (device dm-0): found 2056 extents, stage: move data extents
[ 4233.376778] BTRFS info (device dm-0): found 2056 extents, stage: update data pointers
[ 4236.074206] BTRFS info (device dm-0): relocating block group 40007784988672 flags data
[ 4247.118270] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4249.734482] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4251.493058] BTRFS info (device dm-0): relocating block group 40006711246848 flags data
[ 4259.821443] BTRFS info (device dm-0): found 2050 extents, stage: move data extents
[ 4260.805451] BTRFS info (device dm-0): found 2050 extents, stage: update data pointers
[ 4261.456743] BTRFS info (device dm-0): relocating block group 40005637505024 flags data
[ 4271.616724] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4274.831768] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4275.695407] BTRFS info (device dm-0): relocating block group 40004563763200 flags data
[ 4285.371736] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 4287.650953] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 4288.171152] BTRFS info (device dm-0): relocating block group 40003490021376 flags data
[ 4296.368957] BTRFS info (device dm-0): found 2058 extents, stage: move data extents
[ 4303.096606] BTRFS info (device dm-0): found 2058 extents, stage: update data pointers
[ 4308.672345] BTRFS info (device dm-0): relocating block group 40002416279552 flags data
[ 4317.848496] BTRFS info (device dm-0): found 2055 extents, stage: move data extents
[ 4324.380461] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers
[ 4329.313954] BTRFS info (device dm-0): relocating block group 40001342537728 flags data
[ 4340.352759] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 4346.055272] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 4350.341563] BTRFS info (device dm-0): relocating block group 40000268795904 flags data
[ 4360.008253] BTRFS info (device dm-0): found 2073 extents, stage: move data extents
[ 4363.968894] BTRFS info (device dm-0): found 2073 extents, stage: update data pointers
[ 4366.667086] BTRFS info (device dm-0): relocating block group 39999195054080 flags data
[ 4376.714089] BTRFS info (device dm-0): found 2065 extents, stage: move data extents
[ 4381.321506] BTRFS info (device dm-0): found 2065 extents, stage: update data pointers
[ 4383.772033] BTRFS info (device dm-0): relocating block group 39998121312256 flags data
[ 4392.346068] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4394.681397] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4401.709513] BTRFS info (device dm-0): relocating block group 39997047570432 flags data
[ 4415.742970] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4418.352930] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4419.406295] BTRFS info (device dm-0): relocating block group 39995973828608 flags data
[ 4427.092182] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4428.429700] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4429.549291] BTRFS info (device dm-0): relocating block group 39994900086784 flags data
[ 4440.330092] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4443.546015] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4444.589392] BTRFS info (device dm-0): relocating block group 39993826344960 flags data
[ 4452.441851] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4454.065084] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4455.766007] BTRFS info (device dm-0): relocating block group 39992752603136 flags data
[ 4466.379938] BTRFS info (device dm-0): found 2055 extents, stage: move data extents
[ 4469.010725] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers
[ 4471.997477] BTRFS info (device dm-0): relocating block group 39991678861312 flags data
[ 4483.094733] BTRFS info (device dm-0): found 2055 extents, stage: move data extents
[ 4485.152335] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers
[ 4486.408387] BTRFS info (device dm-0): relocating block group 39990605119488 flags data
[ 4495.974786] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4498.534411] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4499.563931] BTRFS info (device dm-0): relocating block group 39989531377664 flags data
[ 4517.353372] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4524.608562] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4527.224028] BTRFS info (device dm-0): relocating block group 39988457635840 flags data
[ 4536.621647] BTRFS info (device dm-0): found 2055 extents, stage: move data extents
[ 4538.678644] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers
[ 4540.145613] BTRFS info (device dm-0): relocating block group 39987383894016 flags data
[ 4550.384635] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4552.970776] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4554.435255] BTRFS info (device dm-0): relocating block group 39986310152192 flags data
[ 4563.872637] BTRFS info (device dm-0): found 2051 extents, stage: move data extents
[ 4566.764789] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers
[ 4568.159495] BTRFS info (device dm-0): relocating block group 39985236410368 flags data
[ 4579.681116] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 4583.466767] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 4586.247517] BTRFS info (device dm-0): relocating block group 39984162668544 flags data
[ 4596.054908] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4597.544655] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4598.583901] BTRFS info (device dm-0): relocating block group 39983088926720 flags data
[ 4605.907058] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4607.398318] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4608.508892] BTRFS info (device dm-0): relocating block group 39982015184896 flags data
[ 4619.110922] BTRFS info (device dm-0): found 2051 extents, stage: move data extents
[ 4621.813152] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers
[ 4622.825766] BTRFS info (device dm-0): relocating block group 39980941443072 flags data
[ 4630.319726] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4631.692092] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4632.754884] BTRFS info (device dm-0): relocating block group 39979867701248 flags data
[ 4645.459829] BTRFS info (device dm-0): found 2052 extents, stage: move data extents
[ 4649.317496] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers
[ 4650.643743] BTRFS info (device dm-0): relocating block group 39978793959424 flags data
[ 4658.358713] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4659.663771] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4661.089138] BTRFS info (device dm-0): relocating block group 39977720217600 flags data
[ 4669.235843] BTRFS info (device dm-0): found 2051 extents, stage: move data extents
[ 4670.530009] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers
[ 4671.579253] BTRFS info (device dm-0): relocating block group 39976646475776 flags data
[ 4681.371892] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4684.417891] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4686.145066] BTRFS info (device dm-0): relocating block group 39975572733952 flags data
[ 4696.044819] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4699.884181] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4702.201894] BTRFS info (device dm-0): relocating block group 39974498992128 flags data
[ 4711.594449] BTRFS info (device dm-0): found 2054 extents, stage: move data extents
[ 4713.643472] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers
[ 4714.512799] BTRFS info (device dm-0): relocating block group 39973425250304 flags data
[ 4722.817900] BTRFS info (device dm-0): found 2057 extents, stage: move data extents
[ 4724.101966] BTRFS info (device dm-0): found 2057 extents, stage: update data pointers
[ 4725.202619] BTRFS info (device dm-0): relocating block group 39972351508480 flags data
[ 4734.518748] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 4736.529322] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 4737.507277] BTRFS info (device dm-0): relocating block group 39971277766656 flags data
[ 4747.890650] BTRFS info (device dm-0): found 2051 extents, stage: move data extents
[ 4750.395047] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers
[ 4751.356231] BTRFS info (device dm-0): relocating block group 39970204024832 flags data
[ 4881.470732] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470740] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470743] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470745] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470747] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470750] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470752] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470754] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470757] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470758] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470761] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470763] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470764] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470766] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470769] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.470771] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 4881.494268] aacraid: Host bus reset request. SCSI hang ?
[ 4881.494276] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[ 4881.494278] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[ 4881.494280] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[ 4881.494282] aacraid 0000:02:00.0: outstanding cmd: firmware-16
[ 4881.494283] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[ 4881.510554] aacraid 0000:02:00.0: Controller reset type is 3
[ 4881.510556] aacraid 0000:02:00.0: Issuing IOP reset
[ 4915.135236] aacraid 0000:02:00.0: IOP reset succeeded
[ 4915.173754] numacb=512 ignored
[ 4915.174291] aacraid: Comm Interface type2 enabled
[ 4928.760752] aacraid 0000:02:00.0: Scheduling bus rescan
[ 4939.400353] BTRFS info (device dm-0): found 2063 extents, stage: move data extents
[ 4941.769902] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 4943.044681] BTRFS info (device dm-0): found 2063 extents, stage: update data pointers
[ 4945.687414] BTRFS info (device dm-0): relocating block group 39969130283008 flags data
[ 4954.864202] BTRFS info (device dm-0): found 2072 extents, stage: move data extents
[ 4957.822957] BTRFS info (device dm-0): found 2072 extents, stage: update data pointers
[ 4959.299452] BTRFS info (device dm-0): relocating block group 39968056541184 flags data
[ 4968.339907] BTRFS info (device dm-0): found 2073 extents, stage: move data extents
[ 4973.463936] BTRFS info (device dm-0): found 2073 extents, stage: update data pointers
[ 4977.130345] BTRFS info (device dm-0): relocating block group 39966982799360 flags data
[ 4989.513486] BTRFS info (device dm-0): found 2050 extents, stage: move data extents
[ 5001.120554] BTRFS info (device dm-0): found 2050 extents, stage: update data pointers
[ 5010.087463] BTRFS info (device dm-0): relocating block group 39965909057536 flags data
[ 5022.123373] BTRFS info (device dm-0): found 2077 extents, stage: move data extents
[ 5066.139984] BTRFS info (device dm-0): found 2077 extents, stage: update data pointers
[ 5094.950339] BTRFS info (device dm-0): relocating block group 39964835315712 flags data
[ 5107.270236] BTRFS info (device dm-0): found 2065 extents, stage: move data extents
[ 5140.991508] BTRFS info (device dm-0): found 2065 extents, stage: update data pointers
[ 5164.250651] BTRFS info (device dm-0): relocating block group 39963761573888 flags data
[ 5177.484766] BTRFS info (device dm-0): found 2053 extents, stage: move data extents
[ 5201.668181] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers
[ 5227.474665] BTRFS info (device dm-0): relocating block group 39962687832064 flags data
[ 5243.028304] BTRFS info (device dm-0): found 2045 extents, stage: move data extents
[ 5260.284955] BTRFS info (device dm-0): found 2045 extents, stage: update data pointers
[ 5275.825851] BTRFS info (device dm-0): relocating block group 39961614090240 flags data
[ 5289.419014] BTRFS info (device dm-0): found 2069 extents, stage: move data extents
[ 5305.187664] BTRFS info (device dm-0): found 2069 extents, stage: update data pointers
[ 5318.520602] BTRFS info (device dm-0): relocating block group 39960540348416 flags data
[ 5330.435004] BTRFS info (device dm-0): found 2058 extents, stage: move data extents
[ 5338.603566] BTRFS info (device dm-0): found 2058 extents, stage: update data pointers
[ 5343.611832] BTRFS info (device dm-0): balance: canceled
[ 5439.752346] BTRFS info (device dm-0): balance: start -dusage=90
[ 5439.755515] BTRFS info (device dm-0): relocating block group 40088315625472 flags data
[ 5565.485701] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485708] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485710] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485712] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485714] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485716] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485719] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485721] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485723] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485725] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485727] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485729] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485731] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485732] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485734] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.485736] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 5565.504649] aacraid: Host bus reset request. SCSI hang ?
[ 5565.504658] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[ 5565.504661] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[ 5565.504662] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[ 5565.504664] aacraid 0000:02:00.0: outstanding cmd: firmware-16
[ 5565.504665] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[ 5565.521936] aacraid 0000:02:00.0: Controller reset type is 3
[ 5565.521937] aacraid 0000:02:00.0: Issuing IOP reset
[ 5584.653112] INFO: task btrfs:41548 blocked for more than 120 seconds.
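For anyone comparing logs, the length of each stall can be estimated from the dmesg timestamps of the "Issuing IOP reset" and "IOP reset succeeded" lines. Below is a minimal Python sketch, assuming each relevant line carries the bracketed dmesg timestamp; the function name iop_reset_durations is made up for illustration and is not part of any existing tool.

```python
import re

# Matches a dmesg timestamp such as "[ 4881.510556]" anywhere in the line.
TS = re.compile(r"\[\s*(\d+\.\d+)\]")


def iop_reset_durations(lines):
    """Return the seconds between each 'Issuing IOP reset' line and the
    next 'IOP reset succeeded' line (hypothetical helper)."""
    durations = []
    started = None
    for line in lines:
        m = TS.search(line)
        if not m:
            continue  # continuation lines without a timestamp are skipped
        t = float(m.group(1))
        if "Issuing IOP reset" in line:
            started = t
        elif "IOP reset succeeded" in line and started is not None:
            durations.append(round(t - started, 3))
            started = None
    return durations


# Two lines taken from the log above.
sample = [
    "[ 4881.510556] aacraid 0000:02:00.0: Issuing IOP reset",
    "[ 4915.135236] aacraid 0000:02:00.0: IOP reset succeeded",
]
print(iop_reset_durations(sample))
```

For the first reset in the log above (4881.510556 to 4915.135236) this works out to roughly 33.6 seconds during which the controller is unavailable. A reset that never logs "IOP reset succeeded", like the last one above, is simply not counted.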
Comment 10 Jason Hatley 2023-07-27 21:22:25 UTC
I have the exact same issue after upgrading to kernel 6.4.7 with an Adaptec ASR71605. With the previous kernel, 6.0.0, the problem goes away.

The controller hangs approximately every 5 minutes for a period of about 2 minutes.

2023-07-28T07:11:44.380218+10:00 linux kernel: [ 1906.291075] aacraid: Host bus reset request. SCSI hang ?
2023-07-28T07:11:44.380224+10:00 linux kernel: [ 1906.291084] aacraid 0000:04:00.0: outstanding cmd: midlevel-0
2023-07-28T07:11:44.380225+10:00 linux kernel: [ 1906.291086] aacraid 0000:04:00.0: outstanding cmd: lowlevel-0
2023-07-28T07:11:44.380226+10:00 linux kernel: [ 1906.291087] aacraid 0000:04:00.0: outstanding cmd: error handler-0
2023-07-28T07:11:44.380226+10:00 linux kernel: [ 1906.291088] aacraid 0000:04:00.0: outstanding cmd: firmware-32
2023-07-28T07:11:44.380227+10:00 linux kernel: [ 1906.291089] aacraid 0000:04:00.0: outstanding cmd: kernel-0
2023-07-28T07:11:44.400215+10:00 linux kernel: [ 1906.311066] aacraid 0000:04:00.0: Controller reset type is 3
2023-07-28T07:11:44.400221+10:00 linux kernel: [ 1906.311071] aacraid 0000:04:00.0: Issuing IOP reset
2023-07-28T07:12:29.044219+10:00 linux kernel: [ 1950.957989] aacraid 0000:04:00.0: IOP reset succeeded
2023-07-28T07:12:29.108222+10:00 linux kernel: [ 1951.018606] aacraid: Comm Interface type2 enabled
2023-07-28T07:12:38.144232+10:00 linux kernel: [ 1960.056334] aacraid 0000:04:00.0: Scheduling bus rescan
2023-07-28T07:12:48.821198+10:00 linux kernel: [ 1970.734618] sd 0:1:8:0: [sdi] tag#312 timing out command, waited 120s
2023-07-28T07:15:18.872254+10:00 linux kernel: [ 2120.779775] md: md126: reshape interrupted.
2023-07-28T07:15:52.172242+10:00 linux kernel: [ 2154.074084] aacraid: Host adapter abort request.
2023-07-28T07:15:52.172257+10:00 linux kernel: [ 2154.074084] aacraid: Outstanding commands on (0,1,12,0):
2023-07-28T07:15:52.172259+10:00 linux kernel: [ 2154.074109] aacraid: Host adapter abort request.
2023-07-28T07:15:52.172260+10:00 linux kernel: [ 2154.074109] aacraid: Outstanding commands on (0,1,3,0):
2023-07-28T07:15:52.172261+10:00 linux kernel: [ 2154.074119] aacraid: Host adapter abort request.
2023-07-28T07:15:52.172262+10:00 linux kernel: [ 2154.074119] aacraid: Outstanding commands on (0,1,2,0):
2023-07-28T07:15:52.292291+10:00 linux kernel: [ 2154.196250] sd 0:1:3:0: [sdd] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
2023-07-28T07:15:52.292302+10:00 linux kernel: [ 2154.196254] sd 0:1:3:0: [sdd] 4096-byte physical blocks
2023-07-28T07:15:56.008234+10:00 linux kernel: [ 2157.909736] aacraid: Host adapter abort request.
2023-07-28T07:15:56.008250+10:00 linux kernel: [ 2157.909736] aacraid: Outstanding commands on (0,1,10,0):
2023-07-28T07:15:56.016229+10:00 linux kernel: [ 2157.917723] aacraid: Host adapter abort request.
2023-07-28T07:15:56.016238+10:00 linux kernel: [ 2157.917723] aacraid: Outstanding commands on (0,1,8,0):
2023-07-28T07:15:56.276248+10:00 linux kernel: [ 2158.178018] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276264+10:00 linux kernel: [ 2158.178018] aacraid: Outstanding commands on (0,1,13,0):
2023-07-28T07:15:56.276266+10:00 linux kernel: [ 2158.178029] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276267+10:00 linux kernel: [ 2158.178029] aacraid: Outstanding commands on (0,1,2,0):
2023-07-28T07:15:56.276268+10:00 linux kernel: [ 2158.178033] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276268+10:00 linux kernel: [ 2158.178033] aacraid: Outstanding commands on (0,1,12,0):
2023-07-28T07:15:56.276269+10:00 linux kernel: [ 2158.178037] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276270+10:00 linux kernel: [ 2158.178037] aacraid: Outstanding commands on (0,1,4,0):
2023-07-28T07:15:56.276271+10:00 linux kernel: [ 2158.178041] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276272+10:00 linux kernel: [ 2158.178041] aacraid: Outstanding commands on (0,1,5,0):
2023-07-28T07:15:56.276273+10:00 linux kernel: [ 2158.178045] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276273+10:00 linux kernel: [ 2158.178045] aacraid: Outstanding commands on (0,1,5,0):
2023-07-28T07:15:56.276274+10:00 linux kernel: [ 2158.178071] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276275+10:00 linux kernel: [ 2158.178071] aacraid: Outstanding commands on (0,1,5,0):
2023-07-28T07:15:56.276275+10:00 linux kernel: [ 2158.178074] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276276+10:00 linux kernel: [ 2158.178074] aacraid: Outstanding commands on (0,1,0,0):
2023-07-28T07:15:56.276277+10:00 linux kernel: [ 2158.181733] aacraid: Host adapter abort request.
2023-07-28T07:15:56.276278+10:00 linux kernel: [ 2158.181733] aacraid: Outstanding commands on (0,1,14,0):
2023-07-28T07:15:58.588241+10:00 linux kernel: [ 2160.489557] aacraid: Host adapter abort request.
2023-07-28T07:15:58.588266+10:00 linux kernel: [ 2160.489557] aacraid: Outstanding commands on (0,1,4,0):
2023-07-28T07:16:00.876228+10:00 linux kernel: [ 2162.777426] aacraid: Host adapter abort request.
2023-07-28T07:16:00.876246+10:00 linux kernel: [ 2162.777426] aacraid: Outstanding commands on (0,1,0,0):
2023-07-28T07:16:06.512253+10:00 linux kernel: [ 2168.413075] aacraid: Host adapter abort request.
2023-07-28T07:16:06.512273+10:00 linux kernel: [ 2168.413075] aacraid: Outstanding commands on (0,1,15,0):
2023-07-28T07:16:07.032219+10:00 linux kernel: [ 2168.933027] aacraid: Host adapter abort request.
2023-07-28T07:16:07.032232+10:00 linux kernel: [ 2168.933027] aacraid: Outstanding commands on (0,1,14,0):
2023-07-28T07:16:10.872240+10:00 linux kernel: [ 2172.772793] aacraid: Host adapter abort request.
2023-07-28T07:16:10.872256+10:00 linux kernel: [ 2172.772793] aacraid: Outstanding commands on (0,1,10,0):
2023-07-28T07:16:14.700245+10:00 linux kernel: [ 2176.600526] aacraid: Host adapter abort request.
2023-07-28T07:16:14.700259+10:00 linux kernel: [ 2176.600526] aacraid: Outstanding commands on (0,1,8,0):
2023-07-28T07:16:14.700262+10:00 linux kernel: [ 2176.604511] aacraid: Host adapter abort request.
2023-07-28T07:16:14.700263+10:00 linux kernel: [ 2176.604511] aacraid: Outstanding commands on (0,1,9,0):
2023-07-28T07:16:18.804238+10:00 linux kernel: [ 2180.704264] aacraid: Host adapter abort request.
2023-07-28T07:16:18.804258+10:00 linux kernel: [ 2180.704264] aacraid: Outstanding commands on (0,1,13,0):
2023-07-28T07:16:22.892239+10:00 linux kernel: [ 2184.791994] aacraid: Host adapter abort request.
2023-07-28T07:16:22.892253+10:00 linux kernel: [ 2184.791994] aacraid: Outstanding commands on (0,1,1,0):
2023-07-28T07:16:22.892256+10:00 linux kernel: [ 2184.792047] aacraid: Host bus reset request. SCSI hang ?
2023-07-28T07:16:22.892257+10:00 linux kernel: [ 2184.792056] aacraid 0000:04:00.0: outstanding cmd: midlevel-0
2023-07-28T07:16:22.892258+10:00 linux kernel: [ 2184.792059] aacraid 0000:04:00.0: outstanding cmd: lowlevel-0
2023-07-28T07:16:22.892260+10:00 linux kernel: [ 2184.792060] aacraid 0000:04:00.0: outstanding cmd: error handler-10
2023-07-28T07:16:22.892261+10:00 linux kernel: [ 2184.792061] aacraid 0000:04:00.0: outstanding cmd: firmware-0
2023-07-28T07:16:22.892262+10:00 linux kernel: [ 2184.792062] aacraid 0000:04:00.0: outstanding cmd: kernel-0
2023-07-28T07:16:22.924446+10:00 linux kernel: [ 2184.824253] aacraid 0000:04:00.0: Controller reset type is 3
2023-07-28T07:16:22.924453+10:00 linux kernel: [ 2184.824257] aacraid 0000:04:00.0: Issuing IOP reset
2023-07-28T07:17:07.708235+10:00 linux kernel: [ 2229.607605] aacraid 0000:04:00.0: IOP reset succeeded
2023-07-28T07:17:07.768229+10:00 linux kernel: [ 2229.665769] aacraid: Comm Interface type2 enabled
2023-07-28T07:17:16.808238+10:00 linux kernel: [ 2238.704668] aacraid 0000:04:00.0: Scheduling bus rescan
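To see whether the aborts concentrate on particular drives, the "Outstanding commands on (...)" lines can be tallied per target. A rough Python sketch follows; the helper name aborts_per_target is hypothetical, and reading the four numbers as (host, channel, id, lun) is my understanding of the driver message rather than something stated in this log.

```python
import re
from collections import Counter

# Captures the four numbers printed in "Outstanding commands on (h,c,i,l):".
TARGET = re.compile(r"Outstanding commands on \((\d+),(\d+),(\d+),(\d+)\)")


def aborts_per_target(lines):
    """Count abort requests per target tuple (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = TARGET.search(line)
        if m:
            counts[tuple(int(g) for g in m.groups())] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "2023-07-28T07:15:52.172257+10:00 linux kernel: [ 2154.074084] aacraid: Outstanding commands on (0,1,12,0):",
    "2023-07-28T07:15:52.172260+10:00 linux kernel: [ 2154.074109] aacraid: Outstanding commands on (0,1,3,0):",
    "2023-07-28T07:15:52.172262+10:00 linux kernel: [ 2154.074119] aacraid: Outstanding commands on (0,1,2,0):",
    "2023-07-28T07:15:56.276267+10:00 linux kernel: [ 2158.178029] aacraid: Outstanding commands on (0,1,2,0):",
]
for target, n in aborts_per_target(sample).most_common():
    print(target, n)
```

The same function can be fed a whole log file, for example with open("kern.log") as the source of lines (the file name is only an example). In the excerpt above the aborts are spread over many targets, which would point at the controller or driver rather than a single disk.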
Comment 11 pheidologeton 2023-07-31 19:04:12 UTC
Most likely the problem is with btrfs. When using zfs with -o ashift=16 in the same luks2 container (sector size is 4k; the stripe size on the controller is 128k, as cryptsetup does not currently support sector sizes larger than 4k), no controller hangs are observed under any I/O, and there have been no resets after 4 hours of stress testing so far. Kernel 6.4.7.
Comment 12 pheidologeton 2023-08-02 13:01:10 UTC
After 6 hours of stress tests with lots of small files, the problem repeated itself. It does not occur with sequential i/o. Repeated the test with btrfs, no problem on the newly created filesystem, but it repeats with random i/o. I think it is caused by this patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=48dc810012a6b4f4ba94073d6b7edb4f76edeb72.
Comment 13 Maokaman 2023-08-03 15:53:53 UTC
I'm experiencing the same problem with the Adaptec ASR81605Z on 6.4.x kernels.
Reverting the following commit resolves the issue:
https://github.com/torvalds/linux/commit/9dc704dcc09eae7d21b5da0615eb2ed79278f63e
Comment 14 Tomas Henzl 2023-09-05 14:57:21 UTC
Sometimes this might be a bug in the controller's firmware. Can you all check that you are using the latest available firmware version for your aacraid controllers?
Comment 15 Maokaman 2023-09-05 16:04:12 UTC
I have the most recent firmware version:

# arcconf getconfig 1 AD | grep 'Model'
   Controller Model                           : Adaptec ASR81605Z

# arcconf getversion 1
Controllers found: 1
Controller #1
==============
Firmware                               : 7.18-0 (33556)
Staged Firmware                        : 7.18-0 (33556)
BIOS                                   : 7.18-0 (33556)
Driver                                 : 1.2-1 (50983)
Boot Flash                             : 7.18-0 (33556)
CPLD (Load version/ Flash version)     : 5/ 12
SEEPROM (Load version/ Flash version)  : 1/ 1


#regzbot ^introduced 9dc704dcc09eae7d21b5da0615eb2ed79278f63e
Comment 16 Sagar 2023-09-07 17:41:34 UTC
For the problems reported on Series-7 controllers :

At Microchip, we tried to reproduce this issue on the 6.4.9 kernel with 71605 and 7805 controllers running the latest FW from adaptec.com (Version 32118), and we do not see the issue.

Could you please mention what FW version is being used in your configuration?
The exact server model and the config details would also help us.
Also, could you please try with the latest FW from the website and confirm if you continue to see this issue?

The latest FW version for your controller model can be downloaded at
https://storage.microsemi.com/en-us/support/series7/index.php

We look forward to hearing your results.

Thanks 
Sagar
Comment 17 Sagar 2023-09-07 17:45:05 UTC
(In reply to Maokaman from comment #15)
> I have the most recent firmware version:
> 
> # arcconf getconfig 1 AD | grep 'Model'
>    Controller Model                           : Adaptec ASR81605Z
> 
> # arcconf getversion 1
> Controllers found: 1
> Controller #1
> ==============
> Firmware                               : 7.18-0 (33556)
> Staged Firmware                        : 7.18-0 (33556)
> BIOS                                   : 7.18-0 (33556)
> Driver                                 : 1.2-1 (50983)
> Boot Flash                             : 7.18-0 (33556)
> CPLD (Load version/ Flash version)     : 5/ 12
> SEEPROM (Load version/ Flash version)  : 1/ 1
> 
> 
> #regzbot ^introduced 9dc704dcc09eae7d21b5da0615eb2ed79278f63e

Hi Maokaman,
Could you please provide additional details on which specific kernel you are seeing this issue on and the details of the server would also help us?

We tried with 6.4.9 kernel on a 81605 controller and we do not see this issue on our setup.
We are trying to understand the environment 

Thanks
Comment 18 Maokaman 2023-09-07 21:38:40 UTC
(In reply to Sagar from comment #17)
> Hi Maokaman,
> Could you please provide additional details on which specific kernel you are
> seeing this issue on and the details of the server would also help us?
> 
> We tried with 6.4.9 kernel on a 81605 controller and we do not see this
> issue on our setup.
> We are trying to understand the environment 
> 
> Thanks

Hi Sagar,

I've tested multiple 6.4.x kernels and the last one was 6.4.7.

Distro: Arch Linux

Kernel build script:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/ed40dc54e86cec6758ee43684f0fd37d78c5ba53/PKGBUILD

Kernel config:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/ed40dc54e86cec6758ee43684f0fd37d78c5ba53/config

# cat /proc/cmdline
initrd=\intel-ucode.img initrd=\initramfs-linux.img root=PARTUUID="5029e4ab-f734-4729-97b2-99eb53de8b0a" rw intel_idle.max_cstate=1 idle=halt transparent_hugepage=never mitigations=off audit=0 selinux=0 nmi_watchdog=0 nosoftlockup=0
Comment 19 Maokaman 2023-09-07 21:48:11 UTC
Hardware:

# dmidecode | grep -A 3 'Base Board Information'
Base Board Information
        Manufacturer: Supermicro
        Product Name: X11DPi-N
        Version: 2.00

# lscpu | egrep '^Model name:|Socket'
Model name: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
Socket(s):  2
Comment 20 sanic 2023-09-26 07:36:17 UTC
Hello!

I can confirm this behavior with the 6.1.53-gentoo-r1 kernel. My previous kernel, 6.1.46-gentoo, was working OK. The patch mentioned in comment #13 was applied in the 6.1.53-gentoo release. The controller is an Adaptec ASR71605E with firmware 7.5-0 (32118), the latest. I will try reverting this patch and report the results.
Comment 21 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-10-20 11:08:21 UTC
Sagar Biradar, what's the status here? Is there a patch in sight? Or would it be best to revert 9dc704dcc09e for now?
Comment 22 Hannes Reinecke 2023-10-25 10:22:53 UTC
Created attachment 305290 [details]
0001-aacraid-submit-internal-commands-on-vector-0.patch

aacraid: submit internal commands on vector 0
Comment 23 Hannes Reinecke 2023-10-25 10:23:19 UTC
Can you try with the above patch?
Comment 24 Maokaman 2023-10-26 07:35:01 UTC
(In reply to Hannes Reinecke from comment #23)
> Can you try with the above patch?

The patch does not fix the issue.

===
Oct 25 21:41:24 server-name kernel: aacraid: Host adapter abort request.
                                  aacraid: Outstanding commands on (0,0,0,0):
Oct 25 21:41:24 server-name kernel: aacraid: Host adapter abort request.
                                  aacraid: Outstanding commands on (0,0,0,0):
Oct 25 21:41:24 server-name kernel: aacraid: Host bus reset request. SCSI hang ?
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: midlevel-0
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: lowlevel-0
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: error handler-0
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: firmware-32
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: kernel-0
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: Controller reset type is 3
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: Issuing IOP reset
Oct 25 21:41:24 server-name kernel: INFO: task worker:6701 blocked for more than 122 seconds.
Oct 25 21:41:24 server-name kernel:       Not tainted 6.4.7-arch1-61 #1
Oct 25 21:41:24 server-name kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 25 21:41:24 server-name kernel: task:worker          state:D stack:0     pid:6701  ppid:1      flags:0x00004002
Oct 25 21:41:24 server-name kernel: Call Trace:
Oct 25 21:41:24 server-name kernel:  <TASK>
Oct 25 21:41:24 server-name kernel:  __schedule+0x3e8/0x13f0
Oct 25 21:41:24 server-name kernel:  schedule+0x5e/0xd0
Oct 25 21:41:24 server-name kernel:  io_schedule+0x46/0x70
Oct 25 21:41:24 server-name kernel:  folio_wait_bit_common+0x13d/0x350
Oct 25 21:41:24 server-name kernel:  ? __pfx_wake_page_function+0x10/0x10
Oct 25 21:41:24 server-name kernel:  folio_wait_writeback+0x2c/0x90
Oct 25 21:41:24 server-name kernel:  __filemap_fdatawait_range+0x80/0xe0
Oct 25 21:41:24 server-name kernel:  file_write_and_wait_range+0x8b/0xb0
Oct 25 21:41:24 server-name kernel:  xfs_file_fsync+0x5e/0x2a0 [xfs 2be3d2e4a125ddff8482931cb8f078f6393b16a6]
Oct 25 21:41:24 server-name kernel:  __x64_sys_fdatasync+0x4c/0x90
Oct 25 21:41:24 server-name kernel:  do_syscall_64+0x5c/0x90
Oct 25 21:41:24 server-name kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Oct 25 21:41:24 server-name kernel:  ? do_syscall_64+0x6b/0x90
Oct 25 21:41:24 server-name kernel:  ? exit_to_user_mode_prepare+0x132/0x1e0
Oct 25 21:41:24 server-name kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Oct 25 21:41:24 server-name kernel:  ? do_syscall_64+0x6b/0x90
Oct 25 21:41:24 server-name kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Oct 25 21:41:24 server-name kernel:  ? do_syscall_64+0x6b/0x90
Oct 25 21:41:24 server-name kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
Oct 25 21:41:24 server-name kernel:  ? do_syscall_64+0x6b/0x90
Oct 25 21:41:24 server-name kernel:  ? do_syscall_64+0x6b/0x90
Oct 25 21:41:24 server-name kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Oct 25 21:41:24 server-name kernel: RIP: 0033:0x7f9ca1d087aa
Oct 25 21:41:24 server-name kernel: RSP: 002b:00007f93237fd6c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Oct 25 21:41:24 server-name kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9ca1d087aa
Oct 25 21:41:24 server-name kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000f
Oct 25 21:41:24 server-name kernel: RBP: 0000560d212c66a0 R08: 0000000000000000 R09: 0000000000000000
Oct 25 21:41:24 server-name kernel: R10: 00007f93237fd6e0 R11: 0000000000000293 R12: 0000560d1f112e80
Oct 25 21:41:24 server-name kernel: R13: 0000560d2107b6c8 R14: 0000560d212cd9f0 R15: 00007f9322ffe000
Oct 25 21:41:24 server-name kernel:  </TASK>
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: IOP reset succeeded
Oct 25 21:41:24 server-name kernel: aacraid: Comm Interface type2 enabled
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: Scheduling bus rescan
Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: DDR cache data recovered successfully
Oct 25 21:41:25 server-name kernel: sd 0:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
Oct 25 21:41:25 server-name kernel: sd 0:0:1:0: [sda] Very big device. Trying to use READ CAPACITY(16).
Comment 25 Robert Langlois 2023-11-11 06:43:51 UTC
Created attachment 305397 [details]
Patch for aacraid FC38 6.3.12-200 vs 6.4.4-200

I have the same issue on two servers running Fedora 38 with different Adaptec controllers.

=== Server 1:

# arcconf getconfig 1 AD | grep 'Model'
   Controller Model                         : Adaptec ASR72405

# arcconf getversion 1
Controllers found: 1
Controller #1
==============
Firmware                               : 7.5-0 (32118)
Staged Firmware                        : 7.5-0 (32118)
BIOS                                   : 7.5-0 (32118)
Driver                                 : 1.2-1 (50983)
Boot Flash                             : 7.5-0 (32118)
CPLD (Load version/ Flash version)     : 8/ 10
SEEPROM (Load version/ Flash version)  : 1/ 1

=== Server 2:

# arcconf getconfig 1 AD | grep 'Model'
   Controller Model                           : Adaptec ASR71685

# arcconf getversion 1
Controllers found: 1
Controller #1
==============
Firmware                               : 7.5-0 (32118)
Staged Firmware                        : 7.5-0 (32118)
BIOS                                   : 7.5-0 (32118)
Driver                                 : 1.2-1 (50983)
Boot Flash                             : 7.5-0 (32118)
CPLD (Load version/ Flash version)     : 7/ 10
SEEPROM (Load version/ Flash version)  : 0/ 1

Both controllers have the latest firmware.

Last known working kernel: 6.3.12-200
First known non-working kernel: 6.4.4-200

The patch from comment #22 did not work for me; I am still getting errors.

The submitted patch is a diff of 'drivers/scsi/aacraid' between the two kernels above. It has been applied and is working on the following Fedora kernels:

6.5.9-200.fc38
6.5.10-200.fc38
6.5.10-300.fc39
6.5.11-300.fc39
Comment 26 Joop Boonen 2023-11-16 08:45:41 UTC
We have noticed that our server, which uses an Adaptec ASR8805 RAID controller and runs Debian 12 (Bookworm) with kernel 6.1.55, gets 100% wait states that cause the system to hang.
top - 12:57:32 up 7 min,  2 users,  load average: 5.02, 1.71, 0.65
Tasks: 451 total,   2 running, 449 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 81.8 id, 18.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257590.5 total, 242751.4 free,  10355.7 used,   6092.0 buff/cache    
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 247234.8 avail Mem

When it's running a kernel older than 6.1.53, we never see 100% wait states, certainly not lasting for a long time.
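
In case it helps others check their own systems, the CPUs stuck in I/O wait can be picked out of a saved per-CPU top listing like the one above. This is only a sketch under the assumption that the lines have the "%CpuN : ... wa" layout shown here; the function name cpus_in_iowait and the 50% default threshold are arbitrary choices for illustration.

```python
import re

# "%Cpu25 :" at the start of a per-CPU line from top.
CPU = re.compile(r"^%Cpu(\d+)\s*:")
# The number immediately before the "wa" (I/O wait) column label.
WA = re.compile(r"([\d.]+)\s+wa\b")


def cpus_in_iowait(lines, threshold=50.0):
    """Return CPU numbers whose I/O wait is at or above threshold percent
    (hypothetical helper)."""
    stuck = []
    for line in lines:
        cpu = CPU.match(line)
        wa = WA.search(line)
        if cpu and wa and float(wa.group(1)) >= threshold:
            stuck.append(int(cpu.group(1)))
    return stuck


# Three lines copied from the top output above.
sample = [
    "%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st",
    "%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 81.8 id, 18.2 wa,  0.0 hi,  0.0 si,  0.0 st",
    "%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st",
]
print(cpus_in_iowait(sample))
```

In the full listing above, CPUs 25, 29 and 32 show 100.0 wa.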

We also saw repeatedly:
[ 1376.837737] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.841731] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.842412] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.843004] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.843587] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.844169] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.844747] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.845322] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.845906] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.846484] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.847055] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.847628] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.848199] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.848767] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.849336] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.849995] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.850560] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.789765] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.889767] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.890899] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.892002] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.893103] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.897790] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.898918] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.900009] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.901094] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.902199] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.903287] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.904384] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.905472] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.906585] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.907678] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.945954] aacraid: Host bus reset request. SCSI hang ?
[ 1378.946602] aacraid 0000:af:00.0: outstanding cmd: midlevel-0
[ 1378.946607] aacraid 0000:af:00.0: outstanding cmd: lowlevel-0
[ 1378.946610] aacraid 0000:af:00.0: outstanding cmd: error handler-0
[ 1378.946613] aacraid 0000:af:00.0: outstanding cmd: firmware-32
[ 1378.946616] aacraid 0000:af:00.0: outstanding cmd: kernel-0
[ 1378.961850] aacraid 0000:af:00.0: Controller reset type is 3
[ 1378.962435] aacraid 0000:af:00.0: Issuing IOP reset
[ 1412.498211] aacraid 0000:af:00.0: IOP reset succeeded
[ 1412.523256] aacraid: Comm Interface type2 enabled
[ 1424.734176] aacraid 0000:af:00.0: Scheduling bus rescan
[ 1434.755589] aacraid 0000:af:00.0: DDR cache data recovered successfully
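
To compare logs like the one above across machines, a small script can count the abort requests per target and measure how long each IOP reset took. This is a minimal sketch, not part of any aacraid tooling; the function name `summarize_aacraid` and the regular expressions are assumptions based only on the message formats quoted in this report.

```python
import re
from collections import Counter

# Timestamp prefix as printed by dmesg, e.g. "[ 1378.962435] message".
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
# Continuation line naming the (host,channel,id,lun) tuple of the stuck command.
TARGET = re.compile(r"Outstanding commands on \((\d+),(\d+),(\d+),(\d+)\)")


def summarize_aacraid(log_text):
    """Return (abort count per target, list of IOP reset durations in seconds)."""
    aborts = Counter()
    reset_durations = []
    reset_start = None
    for raw in log_text.splitlines():
        line = raw.strip()
        target = TARGET.search(line)
        if target:
            aborts[tuple(int(g) for g in target.groups())] += 1
            continue
        stamped = TS.match(line)
        if not stamped:
            continue  # lines without a dmesg timestamp carry no timing info
        stamp, message = float(stamped.group(1)), stamped.group(2)
        if "Issuing IOP reset" in message:
            reset_start = stamp
        elif "IOP reset succeeded" in message and reset_start is not None:
            reset_durations.append(round(stamp - reset_start, 3))
            reset_start = None
    return aborts, reset_durations
```

Calling `summarize_aacraid()` on the text of `dmesg` returns both summaries. For the excerpt above, the reset window runs from 1378.962435 (Issuing IOP reset) to 1412.498211 (IOP reset succeeded), i.e. roughly 33.5 seconds during which the controller does not serve I/O.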

On another server that has an Adaptec ASR8405 raid controller running exactly the same Distribution and kernel we don't see this issue at all.

The only major difference is that the system with the problem has two sockets, i.e. two CPUs.
This one also has SSD drives, but I don't think this could be an issue?

We have found that this issue has existed since kernel 6.1.53.
Kernel 6.1.53 incorporated this patch:
scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity

https://www.spinics.net/lists/stable-commits/msg313381.html

I think that this ticket is related to this issue.
https://bugzilla.kernel.org/show_bug.cgi?id=217599

and this email/link
https://lore.kernel.org/regressions/4a639fff-445e-455b-9a31-57368d6b7021@leemhuis.info/

We have tested Kernel 6.1.55 like the one in Debian Bookworm with the above-mentioned patch reverted. It worked flawlessly.

Might it be related to multiple CPU sockets, i.e. multiple CPUs? We don't see the issue on a single-socket system.

Both systems have an Intel Xeon CPU(s).
Comment 27 Maxim 2023-11-18 14:23:38 UTC
I have the same issue with new openSUSE and Fedora kernels.
When I installed Leap 15.5 with the 5.14 kernel, or the COPR kernel 5.15 for Fedora, everything works fine.

I first hit the issue with kernel 5.19 on Fedora, with an ASR-72405 controller and a NetApp DS connected.

openSUSE 6.6.1 dmesg log:
```
[ 7427.739081] aacraid: Host bus reset request. SCSI hang ?
[ 7427.739101] aacraid 0000:08:00.0: outstanding cmd: midlevel-0
[ 7427.739105] aacraid 0000:08:00.0: outstanding cmd: lowlevel-0
[ 7427.739107] aacraid 0000:08:00.0: outstanding cmd: error handler-0
[ 7427.739109] aacraid 0000:08:00.0: outstanding cmd: firmware-48
[ 7427.739111] aacraid 0000:08:00.0: outstanding cmd: kernel-0
[ 7427.765640] aacraid 0000:08:00.0: Controller reset type is 3
[ 7427.765652] aacraid 0000:08:00.0: Issuing IOP reset
[ 7469.875692] aacraid 0000:08:00.0: IOP reset succeeded
[ 7469.936116] aacraid: Comm Interface type2 enabled
[ 7483.472661] aacraid 0000:08:00.0: Scheduling bus rescan
[ 7496.491585] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 7496.491764] sd 0:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
[ 7553.768632] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.768644] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.768650] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.768655] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.768660] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.768664] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.768669] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.818630] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835297] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835306] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835312] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835317] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835322] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835326] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835331] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835335] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835340] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835344] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835348] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835353] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7553.835357] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355616] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355631] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355637] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355642] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355647] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355652] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355657] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355661] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.355666] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7598.382419] aacraid: Host bus reset request. SCSI hang ?
[ 7598.382439] aacraid 0000:08:00.0: outstanding cmd: midlevel-0
[ 7598.382443] aacraid 0000:08:00.0: outstanding cmd: lowlevel-0
[ 7598.382445] aacraid 0000:08:00.0: outstanding cmd: error handler-0
[ 7598.382447] aacraid 0000:08:00.0: outstanding cmd: firmware-30
[ 7598.382449] aacraid 0000:08:00.0: outstanding cmd: kernel-0
[ 7598.402360] aacraid 0000:08:00.0: Controller reset type is 3
[ 7598.402371] aacraid 0000:08:00.0: Issuing IOP reset
[ 7640.363459] aacraid 0000:08:00.0: IOP reset succeeded
[ 7640.422724] aacraid: Comm Interface type2 enabled
[ 7653.650105] aacraid 0000:08:00.0: Scheduling bus rescan
[ 7666.688374] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 7666.688467] sd 0:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
[ 7814.462297] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462308] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462313] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462318] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462322] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462326] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462330] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462334] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.462338] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,1,0):
[ 7814.478923] aacraid: Host bus reset request. SCSI hang ?
[ 7814.478936] aacraid 0000:08:00.0: outstanding cmd: midlevel-0
[ 7814.478938] aacraid 0000:08:00.0: outstanding cmd: lowlevel-0
[ 7814.478940] aacraid 0000:08:00.0: outstanding cmd: error handler-0
[ 7814.478942] aacraid 0000:08:00.0: outstanding cmd: firmware-9
[ 7814.478943] aacraid 0000:08:00.0: outstanding cmd: kernel-0
[ 7814.495578] aacraid 0000:08:00.0: Controller reset type is 3
[ 7814.495587] aacraid 0000:08:00.0: Issuing IOP reset
[ 7856.730598] aacraid 0000:08:00.0: IOP reset succeeded
[ 7856.779371] aacraid: Comm Interface type2 enabled
[ 7869.946607] aacraid 0000:08:00.0: Scheduling bus rescan
[ 7882.985564] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 7882.985661] sd 0:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16).
```
Comment 28 Maxim 2023-11-18 22:47:18 UTC
Just to be correct: name of controller was ASR-71685 (not ASR-72405).

openSUSE was tested on an ASR-71605; the error happens, for example, when copying a large amount of data to disk at high speed.

If you run fsck.ext4 on an ext4 file system with a buggy kernel, it will damage the file system and its data.

With a buggy kernel, BTRFS scrub also reports that checksums are wrong.
Comment 29 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-21 09:54:03 UTC
TWIMC, I raised the issue once more in a mail to the people that should handle this: https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/T/#u
Comment 30 James.Bottomley 2023-11-21 13:24:35 UTC
On Tue, 2023-11-21 at 09:54 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
> 
> --- Comment #29 from The Linux kernel's regression tracker (Thorsten
> Leemhuis) (regressions@leemhuis.info) ---
> TWIMC, I raised the issue once more in a mail to the people that
> should handle this:
>
> https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/T/#u

Switching to email since the bugzilla seems to have stalled.  The
kernel lists will discard text/html email, so if you have email
problems, you can reply by using bugzilla.

Firstly, can as many reporters as possible check to see if reverting
this commit:

https://github.com/torvalds/linux/commit/9dc704dcc09eae7d21b5da0615eb2ed79278f63e

Fixes your problem with an upstream kernel?

Secondly, John Garry asked if you could provide:

> Is there a full kernel log for this hanging system?
> 
> I can only see snippets in the ticket.
> 
> And what does /sys/class/scsi_host/host*/nr_hw_queues show?

Regards,

James
Comment 31 Joop Boonen 2023-11-21 13:30:14 UTC
Created attachment 305451 [details]
The kernel.log file wenn the system is hanging.

The kernel log, including the period when the system hung.

The output of: cat /sys/class/scsi_host/host*/nr_hw_queues

root@ganeti-node2:~# cat /sys/class/scsi_host/host*/nr_hw_queues
32
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
root@ganeti-node2:~#
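
The bare `cat` output above does not show which host each value belongs to. A minimal sketch that pairs every SCSI host with its driver name and queue count follows. It assumes the `proc_name` attribute sits next to `nr_hw_queues` in sysfs (if it is missing, the sketch reports `None`); the helper names `_read_attr` and `hw_queues_by_host` are hypothetical.

```python
import re
from pathlib import Path


def _read_attr(host_dir, attr):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return (host_dir / attr).read_text().strip()
    except OSError:
        return None


def hw_queues_by_host(sysfs_root="/sys/class/scsi_host"):
    """Map each hostN entry to (driver name, nr_hw_queues), ordered by N."""
    hosts = []
    for entry in Path(sysfs_root).iterdir():
        match = re.fullmatch(r"host(\d+)", entry.name)
        if match:
            hosts.append((int(match.group(1)), entry))
    return {
        entry.name: (_read_attr(entry, "proc_name"), _read_attr(entry, "nr_hw_queues"))
        for _, entry in sorted(hosts)
    }
```

Printing the items of `hw_queues_by_host()` gives one line per host. In the output above, one host reports 32 hardware queues and the other fifteen report 1; a listing like this would make explicit whether the 32-queue host is the aacraid one.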
Comment 33 John Garry 2023-11-21 15:27:15 UTC
Hannes' patch was to revert to using hw queue #0 always for internal commands, and it didn't help.

@Sagar, Could there be any issue in using hw queue #0 for regular SCSI commands? AFAICS, that's a significant change in "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity" patch. Previously we would use fib->vector_no to decide the queue, which was in range (0, dev->max_msix).
Comment 34 Randy 2023-11-22 22:18:58 UTC
Probably of little help, I will share that I can ping-pong the reported behavior by switching between two Flatcar Container Linux releases: alpha 3717.0.0 (kernel 6.1.50) where I DON'T experience the reported behavior and alpha 3732.0.0 (kernel 6.1.54) where I DO experience the reported behavior.
Comment 35 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-11-23 08:02:32 UTC
Would be really great if more people could do what James asked for in https://bugzilla.kernel.org/show_bug.cgi?id=217599#c30 (e.g. check something and if possible try a revert).

(In reply to Randy from comment #34)
> Probably of little help

FWIW, as it was not mentioned here yet: as stated in https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/ it's known that the culprit was picked up for 6.1.53, so earlier versions should be fine.
Comment 36 Sagar 2023-11-23 14:39:20 UTC
Hi all,
Sorry for the delay in response since I was OOO.

We have tried to duplicate this issue on multiple servers with no luck.
I will get up to speed on the latest activity on the ticket, and I plan to attempt to recreate this issue on a server with more cores to see if that will help us dupe the issue.

I am actively working on this and I will keep the ticket updated with my findings.
Thanks
Comment 37 Joop Boonen 2023-11-23 14:58:37 UTC
Hi Sagar,

Have you also tested this on a multi CPU/Socket server?

I've tested this on a single CPU/Socket server, no problem at all (1x Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz).

On a Dual Socket/CPU server I get this issue (2x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz).
Comment 38 Maxim 2023-11-23 17:26:09 UTC
(In reply to Joop Boonen from comment #37)
> Hi Sagar,
> 
> Have you also tested this on a multi CPU/Socket server?
> 
> I've tested this on a single CPU/Socket server, no problem at all (1x
> Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz).
> 
> On a Dual Socket/CPU server I get this issue (2x Intel(R) Xeon(R) Silver
> 4210 CPU @ 2.20GHz).

As I understand it, they tested 8-series controllers, but all of us reported the issue on 7-series ones (71605Z, 71605, 71685).

I do not know if it happens on 8-series, unfortunately I do not have enough HDDs for now to check with 8805.
Comment 39 Hannes Reinecke 2023-11-24 06:57:53 UTC
Created attachment 305466 [details]
0001-aacraid-fix-vector-calculation-when-submitting-command.patch

aacraid: fix vector calculation when submitting commands.
Comment 40 Hannes Reinecke 2023-11-24 06:58:19 UTC
Next idea; can you try with the above patch?
Comment 41 Joop Boonen 2023-11-24 12:19:44 UTC
Created attachment 305469 [details]
Test result of 0001-aacraid-fix-vector-calculation-when-submitting-command.patch

I've tested patch 0001-aacraid-fix-vector-calculation-when-submitting-command.patch on vanilla kernel 6.1.55 .
It doesn't boot properly; I don't know whether this patch is expected to work on this kernel version.
I've attached the output of dmesg.
Comment 42 Fabian Grünbichler 2023-11-28 14:15:53 UTC
while testing the proposed patch 0001-aacraid-fix-vector-calculation-when-submitting-command.patch I also ran into the system not booting - this is even the case for systems not using aacraid at all ;) I tested with 6.5 based on Ubuntu's kernel, which is also affected (the VM in question was actually running Proxmox VE 8.1, since that is also affected: https://bugzilla.proxmox.com/show_bug.cgi?id=5077 )
Comment 43 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-06 13:52:16 UTC
So it looks like Hannes' patch didn't help (thx for trying!) and things have been stalled again for about a week. Anyone still working on it? Or is a revert of the culprit slowly becoming the least worst option?
Comment 44 Martin K. Petersen 2023-12-08 17:20:50 UTC
Looks like we have lost momentum getting this fixed. I have queued a revert for now.
Comment 45 Sagar 2023-12-09 00:56:30 UTC
I am looking into this issue actively.
My efforts to dupe this locally on a machine with 2 CPUs are underway, and I will keep this ticket updated.

If we happen to dupe, and find a fix - then we can consider the patch queued for revert with some tweak.
Comment 46 encore2097 2023-12-09 21:13:01 UTC
Hello,

I'm also experiencing this issue on a single-socket (single-CPU) system using the controller as an HBA with a 10-disk ZFS pool:

Card: Adaptec ASR-71605
CPU: AMD Ryzen 5950X
OS: debian bookworm
kernel: Linux stratus 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux

Syslog:
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,11,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host bus reset request. SCSI hang ?
kernel: aacraid 0000:0c:00.0: outstanding cmd: midlevel-0
kernel: aacraid 0000:0c:00.0: outstanding cmd: lowlevel-0
kernel: aacraid 0000:0c:00.0: outstanding cmd: error handler-0
kernel: aacraid 0000:0c:00.0: outstanding cmd: firmware-28
kernel: aacraid 0000:0c:00.0: outstanding cmd: kernel-0
kernel: aacraid 0000:0c:00.0: Controller reset type is 3
kernel: aacraid 0000:0c:00.0: Issuing IOP reset
kernel: aacraid 0000:0c:00.0: IOP reset succeeded
kernel: aacraid: Comm Interface type2 enabled
kernel: aacraid 0000:0c:00.0: Scheduling bus rescan


Controller config:

./arcconf getconfig 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Controller Mode                          : HBA
   Channel description                      : SAS/SATA
   Controller Model                         : Adaptec ASR71605
   Controller Serial Number                 : ##########
   Temperature                              : 46 C/ 114 F (Normal)
   Installed memory                         : 1024 MB
   Copyback                                 : Disabled
   Background consistency check             : Disabled
   Automatic Failover                       : Enabled
   Global task priority                     : High
   Performance Mode                         : Default/Dynamic
   Host bus type                            : PCIe
   Host bus speed                           : 8000 MHz
   Host bus link width                      : 8 bit(s)/link(s)
   Stayawake period                         : Disabled
   Spinup limit internal drives             : 0
   Spinup limit external drives             : 0
   Defunct disk drive count                 : 0
   Logical devices/Failed/Degraded          : 0/0/0
   NCQ status                               : Enabled
   Statistics data collection mode          : Disabled
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 7.5-0 (32118)
   Firmware                                 : 7.5-0 (32118)
   Driver                                   : 1.2-1 (50983)
   Boot Flash                               : 7.5-0 (32118)

   --------------------------------------------------------
   Controller Cache Backup Unit Information
   --------------------------------------------------------

    Overall Backup Unit Status              : Not Ready

         Backup Unit Type                   : AFM-700/700LP
         Non-Volatile Storage Status        : Ready
         Supercap Status                    : Fatal

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
   No logical devices configured

----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
      Device #0
         Device is a Hard drive
         State                              : Raw (Pass Through)
         Block Size                         : 4K
         Supported                          : Yes
         Transfer Speed                     : SATA 6.0 Gb/s
         Reported Channel,Device(T:L)       : 0,0(0:0)
         Reported Location                  : Connector 0, Device 0
         Vendor                             : ATA
         Model                              : WDC WD140EDFZ-11
         Firmware                           : 81.00A81
         Serial number                      : #####
         World-wide name                    : #####
         Reserved Size                      : 0 KB
         Used Size                          : 0 MB
         Unused Size                        : 13351936 MB
         Total Size                         : 13351936 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
         S.M.A.R.T. warnings                : 0
         Power State                        : Full rpm
         Supported Power States             : Full rpm,Powered off,Reduced rpm
         SSD                                : No
         NCQ status                         : Enabled
      Device #1
         Device is a Hard drive
         State                              : Raw (Pass Through)
         Block Size                         : 4K
         Supported                          : Yes
         Transfer Speed                     : SATA 6.0 Gb/s
         Reported Channel,Device(T:L)       : 0,1(1:0)
         Reported Location                  : Connector 0, Device 1
         Vendor                             : ATA
         Model                              : WDC WD140EDFZ-11
         Firmware                           : 81.00A81
         Serial number                      : #####
         World-wide name                    : #####
         Reserved Size                      : 0 KB
         Used Size                          : 0 MB
         Unused Size                        : 13351936 MB
         Total Size                         : 13351936 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
         S.M.A.R.T. warnings                : 0
         Power State                        : Full rpm
         Supported Power States             : Full rpm,Powered off,Reduced rpm
         SSD                                : No
         NCQ status                         : Enabled
      Device #2
         Device is a Hard drive
         State                              : Raw (Pass Through)
         Block Size                         : 4K
         Supported                          : Yes
         Transfer Speed                     : SATA 6.0 Gb/s
         Reported Channel,Device(T:L)       : 0,4(4:0)
         Reported Location                  : Connector 1, Device 0
         Vendor                             : ATA
         Model                              : WDC WD140EDFZ-11
         Firmware                           : 81.00A81
         Serial number                      : #####
         World-wide name                    : #####
         Reserved Size                      : 0 KB
         Used Size                          : 0 MB
         Unused Size                        : 13351936 MB
         Total Size                         : 13351936 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
         S.M.A.R.T. warnings                : 0
         Power State                        : Full rpm
         Supported Power States             : Full rpm,Powered off,Reduced rpm
         SSD                                : No
         NCQ status                         : Enabled

... snip for brevity ...

      Device #9
         Device is a Hard drive
         State                              : Raw (Pass Through)
         Block Size                         : 4K
         Supported                          : Yes
         Transfer Speed                     : SATA 6.0 Gb/s
         Reported Channel,Device(T:L)       : 0,11(11:0)
         Reported Location                  : Connector 2, Device 3
         Vendor                             : ATA
         Model                              : WDC WD140EDFZ-11
         Firmware                           : 81.00A81
         Serial number                      : #####
         World-wide name                    : #####
         Reserved Size                      : 0 KB
         Used Size                          : 0 MB
         Unused Size                        : 13351936 MB
         Total Size                         : 13351936 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
         S.M.A.R.T. warnings                : 0
         Power State                        : Full rpm
         Supported Power States             : Full rpm,Powered off,Reduced rpm
         SSD                                : No
         NCQ status                         : Enabled



Command completed successfully.


zfs errors:
zed: eid=8 class=delay pool='tank' vdev=hdd-14-3 size=4096 offset=4762294419456 priority=3 err=0 flags=0x180880 delay=163243ms bookmark=643:0:0:75777
zed: eid=10 class=delay pool='tank' vdev=hdd-14-3 size=8192 offset=4766619205632 priority=1 err=0 flags=0x180880 delay=115074ms bookmark=515:456914:0:0
zed: eid=9 class=delay pool='tank' vdev=hdd-14-3 size=4096 offset=2135816388608 priority=3 err=0 flags=0x180880 delay=163242ms bookmark=643:0:1:74
zed: eid=11 class=delay pool='tank' vdev=hdd-14-3 size=16384 offset=13953519005696 priority=1 err=0 flags=0x180880 delay=115074ms bookmark=515:0:-2:1
zed: eid=12 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:8
zed: eid=13 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405183991808 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:12
zed: eid=14 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405183995904 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:13
zed: eid=16 class=delay pool='tank' vdev=hdd-14-8 size=4096 offset=12405183959040 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:1
zed: eid=15 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405184004096 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:16
zed: eid=17 class=delay pool='tank' vdev=hdd-14-8 size=4096 offset=12405183963136 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:0
zed: eid=18 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183979520 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:7
zed: eid=19 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:9
zed: eid=20 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183995904 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:14
zed: eid=21 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183991808 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:11
zed: eid=22 class=delay pool='tank' vdev=hdd-14-5 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:9
zed: eid=23 class=delay pool='tank' vdev=hdd-14-5 size=4096 offset=12405183979520 priority=0 err=0 flags=0x180880 delay=172383ms bookmark=515:780332:0:7
zed: eid=24 class=delay pool='tank' vdev=hdd-14-5 size=4096 offset=12405183991808 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:11
zed: eid=25 class=delay pool='tank' vdev=hdd-14-2 size=4096 offset=12405184008192 priority=0 err=0 flags=0x180880 delay=172384ms bookmark=515:780332:0:18
zed: eid=26 class=delay pool='tank' vdev=hdd-14-2 size=4096 offset=12405184020480 priority=0 err=0 flags=0x180880 delay=172384ms bookmark=515:780332:0:20
zed: eid=27 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183971328 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:4
zed: eid=28 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183975424 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:6
zed: eid=29 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183987712 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:12
zed: eid=30 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:9
zed: eid=31 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183995904 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:14
zed: eid=32 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405184000000 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:16
zed: eid=33 class=delay pool='tank' vdev=hdd-14-1 size=4096 offset=12405183975424 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:5
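
The zed lines above are easier to read when grouped per vdev. A minimal sketch, assuming only the `key=value` layout visible in these lines (in particular a `delay=` field ending in `ms`); the function name `delays_by_vdev` is hypothetical and this is not a ZFS-provided tool:

```python
import re
from collections import defaultdict

# key=value pairs; values are either single-quoted or run up to the next space.
FIELD = re.compile(r"(\w+)=('[^']*'|\S+)")


def delays_by_vdev(log_text):
    """Return {vdev: (event count, longest delay in seconds)} for class=delay events."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in log_text.splitlines():
        fields = {key: value.strip("'") for key, value in FIELD.findall(line)}
        if fields.get("class") != "delay" or "vdev" not in fields:
            continue
        delay = fields.get("delay", "")
        if not delay.endswith("ms"):
            continue
        try:
            seconds = int(delay[:-2]) / 1000
        except ValueError:
            continue  # skip values that are not plain integer milliseconds
        entry = stats[fields["vdev"]]
        entry[0] += 1
        entry[1] = max(entry[1], seconds)
    return {vdev: (count, longest) for vdev, (count, longest) in stats.items()}
```

Applied to the events above, every listed vdev shows delays between roughly 115 and 172 seconds, which suggests the stall affects the whole controller rather than a single disk.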
Comment 47 Sagar 2023-12-16 04:07:58 UTC
Hi Joop, Maxim et al.,
I have been trying to duplicate this issue on two different machines, each with 2 CPUs, but with little luck. I am curious to know what magic ingredient is missing from my setup.

I have Series 7 controllers on both machines, with 3 drives attached to each controller and four RAID-5 arrays created on those drives.
I am running fio on all the arrays (/dev/sdb to /dev/sde).

Could you share details of the tool (or any script) that is running on the system when you see the issue?

I am listing both configs here for your reference; please let me know if anything looks conspicuous to you. It would also really help me if you could share similar details beyond what has already been mentioned.

Thanks
Sagar



Config-1 Details
System Information
	Manufacturer: Supermicro
	Product Name: SYS-220U-TNR
	Version: 0123456789
	Serial Number: S411795X2826083

BIOS Information
	Vendor: American Megatrends International, LLC.
	Version: 1.4
	BIOS Revision: 5.22

System Slot Information
	Designation: RSC-W2-8888G4 SLOT3 PCI-E 4.0 X8
	Type: x8 PCI Express 4 x8
	Current Usage: In Use
	Length: Short
	ID: 3
	Characteristics:
		3.3 V is provided
		PME signal is supported
	Bus Address: 0000:4b:00.0

Processor Information
	Socket Designation: CPU1
	Type: Central Processor
	Family: Xeon
	Manufacturer: Intel(R) Corporation
	ID: A6 06 06 00 FF FB EB BF
	Signature: Type 0, Family 6, Model 106, Stepping 6
	Version: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
	Core Count: 12
	Core Enabled: 12
	Thread Count: 24

	Socket Designation: CPU2
	Type: Central Processor
	Family: Xeon
	Manufacturer: Intel(R) Corporation
	ID: A6 06 06 00 FF FB EB BF
	Signature: Type 0, Family 6, Model 106, Stepping 6
	Version: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
	Core Count: 12
	Core Enabled: 12
	Thread Count: 24
	

Controller Details
Controller			: ASR71605
BIOS				: 7.6-0 (32136)
Firmware			: 7.6-0 (32136)
Driver				: 1.2-1 (50983)
Boot Flash			: 2.57-0 (432)
CPLD (Load version/ Flash version)	: 8/ 8
SEEPROM (Load version/ Flash version)	: 1/ 1


uname -r
6.4.0


lscpu

Architecture:		x86_64
CPU op-mode(s):	32-bit, 64-bit
Address sizes:		46 bits physical, 57 bits virtual
Byte Order:		Little Endian
CPU(s):			48
On-line CPU(s) list:	0-47
Vendor ID:		GenuineIntel
BIOS Vendor ID:		Intel(R) Corporation
Model name:		Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
BIOS Model name:	Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
CPU family:		6
Model:			106
Thread(s) per core:	2
Core(s) per socket:	12
Socket(s):		2
Stepping:		6
NUMA node(s):		2
NUMA node0 CPU(s):	0-11,24-35
NUMA node1 CPU(s):	12-23,36-47


lspci -s 4b:00.0 -k
4b:00.0 RAID bus controller: Adaptec Series 7 6G SAS/PCIe 3 (rev 01)
        Subsystem: Adaptec Series 7 - ASR-71605 - 16 internal 6G SAS Port/PCIe 3.0
        Kernel driver in use: aacraid
        Kernel modules: aacraid



Config-2 Details
System Information
        Manufacturer: HPE
        Product Name: ProLiant DL380 Gen11
        Version: Not Specified
        Serial Number: CNX2070BND

BIOS Information
        Vendor: HPE
        Version: 1.40
        Release Date: 06/01/2023


CPU Information
Processor Information
	Socket Designation: Proc 1
	Type: Central Processor
	Family: Xeon
	Manufacturer: Intel(R) Corporation
	Signature: Type 0, Family 6, Model 143, Stepping 6
	Version: Intel(R) Xeon(R) Platinum 8454H
	Core Count: 32
	Core Enabled: 32
	Thread Count: 64
	

	Socket Designation: Proc 2
	Type: Central Processor
	Family: Xeon
	Manufacturer: Intel(R) Corporation
	ID: F6 06 08 00 FF FB EB BF
	Signature: Type 0, Family 6, Model 143, Stepping 6
	Version: Intel(R) Xeon(R) Platinum 8454H
	Core Count: 32
	Core Enabled: 32
	Thread Count: 64


Controller Information
Controller				: ASR7805
BIOS					: 7.5-0 (32118)
Firmware				: 7.5-0 (32118)
Driver					: 1.2-1 (50983)
Boot Flash				: 7.5-0 (32118)
CPLD (Load version/ Flash version)	: 7/ 10
SEEPROM (Load version/ Flash version)	: 0/ 1

uname -r
6.4.0


Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  128
  On-line CPU(s) list:   0-127
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Platinum 8454H
    BIOS Model name:     Intel(R) Xeon(R) Platinum 8454H
    CPU family:          6
    Model:               143
    Thread(s) per core:  2
    Core(s) per socket:  32
    Socket(s):           2
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-31,64-95
  NUMA node1 CPU(s):     32-63,96-127


lspci -s 23:00.0 -k
23:00.0 RAID bus controller: Adaptec Series 7 6G SAS/PCIe 3 (rev 01)
        Subsystem: Adaptec Series 7 - ASR-7805 - 8 internal 6G SAS Port/PCIe 3.0
        Kernel driver in use: aacraid
        Kernel modules: aacraid
Comment 48 encore2097 2023-12-16 05:35:26 UTC
Hi Sagar,

I'm using a setup with 10 SATA disks in HBA mode, running a ZFS raidz2 filesystem (akin to RAID-6). This is a single-CPU system, so I don't believe the CPU count is the main issue here, although it's likely related.

From examining the logs, doing some research, and drawing from my experience, it seems that timeouts and queues are the primary culprits. My suspicion is that during heavy loads, there's an overflow somewhere in the stack (could be in the kernel driver, firmware, or hardware), causing I/O requests to get lost and time out. After a series of these timeouts, the driver triggers an error and resets the adapter.

I stumbled upon threads dating back to around 2017 where users faced similar issues (check this one: https://forum.proxmox.com/threads/pve-5-1-aacraid-scsi-hang.38259/). One suggested fix was to extend the disk timeout window for waiting on I/O. However, the current kernel's value (60s) is already double the previous 30s, which makes me think the timeout is not the root cause, although it is related.
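
To check which timeout is actually in effect, the per-device SCSI command timeout can be read from /sys/block/<dev>/device/timeout. The following is a minimal sketch; the function names are my own, and the sysfs root is a parameter only so the code can be tried against a scratch directory.

```python
from pathlib import Path


def read_scsi_timeouts(sysfs_block="/sys/block"):
    """Return {device_name: timeout_seconds} for devices exposing device/timeout."""
    timeouts = {}
    for dev in sorted(Path(sysfs_block).iterdir()):
        timeout_file = dev / "device" / "timeout"
        if timeout_file.is_file():  # devices without this file are skipped
            timeouts[dev.name] = int(timeout_file.read_text().strip())
    return timeouts


def set_scsi_timeout(device, seconds, sysfs_block="/sys/block"):
    """Write a new command timeout in seconds (needs root on a real system)."""
    (Path(sysfs_block) / device / "device" / "timeout").write_text(f"{seconds}\n")
```

Calling `read_scsi_timeouts()` before and after a kernel change shows whether the value differs between the two. Raising it with `set_scsi_timeout("sdb", 90)` (device name and value are just examples) only lasts until reboot, and, as said above, would probably only hide the symptom.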

I'm not sure how other users have physically connected disks to their controllers, but I reliably see this issue with my 10-disk setup, so my recommendation would be to increase the number of disks attached to the controller and stress-test it with simultaneous sequential and random I/O, using tools like dd and fio at the same time.

My specific use case involves a file server and database with multiple users. I consistently observe the adapter aborting requests and resetting a few minutes after boot, when the file server and database applications start and warm up their caches (cache size is approximately 120GB in RAM).

Upon further investigation, I found that anyone experiencing this issue could gather more information by modifying aacraid with dump_stack() added around line 713 of linux/latest/source/drivers/scsi/aacraid/linit.c within aac_eh_abort (refer to this: https://stackoverflow.com/questions/32557040/how-to-get-stack-trace-at-various-points-in-kernel-device-driver-code).

Unfortunately, due to unacceptable downtime I had to revert my system to a different HBA and lack spare systems to test with.

Best regards.
Comment 49 Maxim 2023-12-16 22:00:15 UTC
I used 2 raid arrays:

- RAID0 with 8 SATA drives (8x4TB)
- RAID0 with 2 SATA drives (2x16TB)

Both used LUKS with ext4 or BTRFS (I tried both) at the time.

To reproduce it, I just started copying files from one array (the 2nd one) to the other using a file manager such as Midnight Commander.

The copy processes some amount of data before the problem occurs and the message appears in the dmesg output. When that happens, copying becomes slow, everything hangs, and finally the copy fails.

btrfs scrub gives a similar result: it reports that data is corrupt some time after scanning starts (maybe 200-300 GB is scanned OK before it happens).

It is a single-CPU system; I have never used a multi-socket motherboard in such scenarios.

I do not think you need fio; I think you need to move a lot of data to the array (for example big media files, VM images, backup files and so on).

The second time I hit the same issue was on a different system, again while copying data, from a RAID0 of 10x1TB SSDs to a RAID0 of 6x8TB HDDs (LUKS and BTRFS).

--

I do not think it is a hardware issue, because older kernels work fine with the same array settings. If the FS is not damaged, I can just boot into an older kernel and everything works fine on the same file system. Usually the FS is not damaged as long as you do not try to repair it with fsck.ext2 while running the problematic aacraid driver.
Comment 50 Denis V. Kuznetsov 2023-12-18 06:58:06 UTC
Hi.

I have the same problem after updating Proxmox 8.0 -> 8.1 (kernel version 6.2.19 to 6.5.11).

My config is:
Controller Model : Adaptec ASR81605Z
BIOS             : 7.16-0 (33456)
Firmware         : 7.16-0 (33456)
Driver           : 1.2-1 (50983)
Boot Flash       : 7.16-0 (33456)

I use ext4 on an LVM volume (no BTRFS, LUKS, or ZFS) and have the same problem with periodic adapter hangs:

2023-12-17T20:02:57.135482+03:00 ve5 kernel: [ 9568.092740] aacraid: Outstanding commands on (0,0,0,0):
2023-12-17T20:02:57.135483+03:00 ve5 kernel: [ 9568.093590] aacraid: Host adapter abort request.
2023-12-17T20:02:57.135484+03:00 ve5 kernel: [ 9568.093590] aacraid: Outstanding commands on (0,0,0,0):
2023-12-17T20:03:30.675479+03:00 ve5 kernel: [ 9601.630477] aacraid: Host bus reset request. SCSI hang ?
Comment 51 Vladimir 2023-12-18 07:14:11 UTC
Hi,

I was able to bisect this issue down to https://github.com/torvalds/linux/commit/9dc704dcc09eae7d21b5da0615eb2ed79278f63e

I'm using an Adaptec RAID 8405 and a 6.1.68 kernel; with this commit reverted, everything went back to normal.

I hope this patch can be reverted in 6.1.x and mainline.

Cheers
Comment 52 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-12-18 07:31:18 UTC
FWIW, the revert has been queued for 10+ days already; sadly, it just has not been sent to Linus yet:
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=fixes&id=c5becf57dd5659c687d41d623a69f42d63f59eb2
Comment 53 Rafał 2023-12-21 07:49:55 UTC
I don't know if this is exactly related, but I got here due to the dmesg matching.

I'm using an 8805 card on a Supermicro H12SSL-I motherboard with a single Epyc 7302 CPU.

On 6.1.0-15 I can't boot/mount a dependent drive (this error happens).
On 6.1.0-10 I can use my system fine.
Comment 54 Samuel Wolf 2023-12-30 00:22:00 UTC
We are also affected by this issue on Debian 12 machines.

Adaptec ASR8805

BIOS                                     : 7.18-0 (33556)
Firmware                                 : 7.18-0 (33556)
Driver                                   : 1.2-1 (50983)
Boot Flash                               : 7.18-0 (33556)

See also Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624
Comment 55 Samuel Wolf 2023-12-30 12:41:12 UTC
We first saw this abort request / hang on Debian 12 with the ASR8805 two or three months ago.
Since this was not our newest server, we replaced the system because we thought this
was a hardware issue.

[two months later]

But now, this week we upgraded the next server with ASR8805
from Debian 11.8 to 12.4 and saw exactly the same issue (Debian 6.1.67-1).

This could not be the same hardware error, so we found this bug
report and opened one on the Debian bug tracker [1].

Salvatore built a test kernel [2] and we fired up the old "faulty" server for testing.
With the new knowledge we were able to reproduce [3] this with an ASR8805 RAID6 and a 58TB LUKS drive.

luksOpen and mount reproduce this every time with kernel 6.1.67-1 and take ~1 minute (because of the hang and reset request, I guess). After booting into Salvatore's test kernel (6.1.67-1a~test) and trying the same again, there was no issue, and luksOpen and mount were really quick.

This bug was more serious than I had guessed, and it took some time to put this puzzle together.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624#30
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624#47
Comment 56 Salvatore Bonaccorso 2024-01-06 19:38:33 UTC
#regzbot fixed-by: c5becf57dd56
#regzbot fixed-by: 71758d4d87ef
#regzbot fixed-by: 72e472a91c0d
Comment 57 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-01-07 10:11:42 UTC
(In reply to Salvatore Bonaccorso from comment #56)
> #regzbot fixed-by: […]

Thx, but this confused regzbot a bit, as it tracks the issue as a mainline one, so only this is needed:

#regzbot fixed-by: c5becf57dd56
Comment 58 Salvatore Bonaccorso 2024-01-07 10:48:08 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #57)
> (In reply to Salvatore Bonaccorso from comment #56)
> > #regzbot fixed-by: […]
> 
> Thx, but this confused regzbot a bit, as it tracks the issue as a mainline
> commit only this is needed:
> [...]

Apologies for that, it was not my intention to cause more work! I thought we could track the regression fixes in the stable series with it as well.
Comment 59 Sagar 2024-01-25 03:07:51 UTC
I have duplicated the issue locally and we are able to see the issue consistently.
I am currently debugging the issue. I will keep posting the updates.

Thank you
Comment 60 Netix 2024-01-25 18:30:08 UTC
I had the same issue, and I've set my x16 PCIe slot to x8; it seems to have considerably reduced the occurrences I had while running unRAID 6.12.6.
Comment 61 Netix 2024-02-01 19:46:21 UTC
(In reply to Netix from comment #60)
> I had the same issue and I've put my x16 pcie slot in x8 and it seems to
> have considerably reduce the occurence I had while running on unRAID 6.12.6.

Small update. I made the switch (PCIe slot to x8) on January 19th and since then it hasn't happened at all. No occurrences. Still running unRAID 6.12.6.
Comment 62 rgpublic 2024-02-14 13:11:05 UTC
Could someone perhaps clarify a few things, please? Which versions and hardware are really affected by this? We're seeing this with an Adaptec ASR8405 after an Ubuntu upgrade 23.04 => 23.10. The mentioned Debian issue (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624) talks about an ASR8805 controller, so it seems this isn't only about 7-series devices. I also wonder whether we could downgrade to a version that still works, but I'm unsure which is the last version that didn't have this bug. An upgrade isn't possible, I guess, as this bug isn't fixed yet, right? But then why does the Debian issue say "We believe that the bug you reported is fixed in the latest version"?
Comment 63 Joop Boonen 2024-02-14 14:10:56 UTC
From Debian's Bookworm linux-image-amd64 changelog [1]:
<q>
  * Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity"
    (Closes: #1059624)
</q>


Debian has reverted the patch.
I've tested on a dual-socket Intel server with a
RAID bus controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
Subsystem: Adaptec Series 8 - ASR-8805 - 8 internal 0 external 12G SAS Port/PCIe 3.0

I didn't experience any problems anymore.

[1] https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_6.1.76+1_changelog
Comment 64 Maxim 2024-10-13 00:52:23 UTC
I am still encountering issues with the aacraid driver.

While the reverted version has shown significant improvement in terms of stability and consistency, and I no longer face issues after the system has successfully booted, there is still one persistent problem: the system hangs for 120 seconds during the boot process. It is related to NetApp disk shelf (0000:03:00.0) connected to ASR-78165.

For example Ubuntu Server 24.04 (live ISO):

$ cat dmesg | grep aacraid

[    0.915806] kernel: Adaptec aacraid driver 1.2.1[50983]-custom
[    0.916680] kernel: aacraid 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[    0.928906] kernel: aacraid: Comm Interface type2 enabled
[    0.958831] kernel: aacraid 0000:03:00.0: 64 Bit DAC enabled
[    0.975916] kernel: scsi host0: aacraid
[    1.275877] kernel: aacraid: Host bus reset request. SCSI hang ?
[    1.277602] kernel: aacraid 0000:03:00.0: outstanding cmd: midlevel-1
[    1.279315] kernel: aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
[    1.280986] kernel: aacraid 0000:03:00.0: outstanding cmd: error handler-0
[    1.282672] kernel: aacraid 0000:03:00.0: outstanding cmd: firmware-0
[    1.284357] kernel: aacraid 0000:03:00.0: outstanding cmd: kernel-0
[    1.292931] kernel: aacraid 0000:03:00.0: Controller reset type is 3
[    1.294620] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[   34.039710] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[   34.045955] kernel: aacraid: Comm Interface type2 enabled
[   49.015695] kernel: aacraid 0000:03:00.0: Scheduling bus rescan
[   59.522832] kernel: aacraid: Host bus reset request. SCSI hang ?
[   59.525823] kernel: aacraid 0000:03:00.0: outstanding cmd: midlevel-1
[   59.528323] kernel: aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
[   59.529023] kernel: aacraid 0000:03:00.0: outstanding cmd: error handler-0
[   59.529707] kernel: aacraid 0000:03:00.0: outstanding cmd: firmware-0
[   59.530372] kernel: aacraid 0000:03:00.0: outstanding cmd: kernel-0
[   59.537907] kernel: aacraid 0000:03:00.0: Controller reset type is 3
[   59.538609] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[   91.184486] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[   91.191966] kernel: aacraid: Comm Interface type2 enabled
[  106.042633] kernel: aacraid 0000:03:00.0: Scheduling bus rescan
[  116.351867] kernel: aacraid: Host bus reset request. SCSI hang ?
[  116.354649] kernel: aacraid 0000:03:00.0: outstanding cmd: midlevel-1
[  116.357268] kernel: aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
[  116.357904] kernel: aacraid 0000:03:00.0: outstanding cmd: error handler-0
[  116.358523] kernel: aacraid 0000:03:00.0: outstanding cmd: firmware-0
[  116.359113] kernel: aacraid 0000:03:00.0: outstanding cmd: kernel-0
[  116.366903] kernel: aacraid 0000:03:00.0: Controller reset type is 3
[  116.367580] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[  147.979313] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[  147.981970] kernel: aacraid: Comm Interface type2 enabled
[  162.916919] kernel: aacraid 0000:03:00.0: Scheduling bus rescan
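
To put numbers on the stall, the time from each "Host bus reset request" to the following "Scheduling bus rescan" can be pulled out of the log. This is a small sketch that assumes the dmesg lines are formatted as above ("[ seconds] message"); the function name is only for illustration.

```python
import re

# Matches "[   49.015695] rest of the message"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def reset_cycles(lines):
    """Return (request_time, rescan_time, duration_s) for each reset cycle.

    A cycle starts at 'Host bus reset request' and ends at the next
    'Scheduling bus rescan'.
    """
    cycles = []
    start = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, message = float(match.group(1)), match.group(2)
        if "Host bus reset request" in message and start is None:
            start = stamp
        elif "Scheduling bus rescan" in message and start is not None:
            cycles.append((start, stamp, round(stamp - start, 3)))
            start = None
    return cycles
```

On the log above this yields three cycles of roughly 47 seconds each, most of which is spent between "Issuing IOP reset" and "IOP reset succeeded".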


As I mentioned previously, the 5.15 kernel version is fully functional and performs consistently well. The same boot issue occurs even with RHEL8 forks, such as Rocky 8.10, likely because the driver has been backported.

On the other hand, Ubuntu 22.04, which relies on the 5.15 kernel, works exceptionally well. This is the reason I can't use OpenSUSE Leap 15.6 and must stick with Ubuntu.

In other words, some errors still exist in the driver’s source code, and a bisect starting from 5.15 is needed to resolve them.
Comment 65 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-14 07:29:46 UTC
(In reply to Maxim from comment #64)
>
> In other words, some errors still exist in the driver’s source code, and a
> bisect starting from 5.15 is needed to resolve them.

That's a pity, but best discussed in a new ticket, as things otherwise get highly confusing. Could you file one and then drop a link to it here? And I guess we really need a bisection to make progress here. When performing one, apply the revert with "git cherry-pick --no-commit c5becf57dd56" at each bisection step while in the range containing 9dc704dcc09ea.
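
The "apply the revert at each step" part can be scripted. Below is a rough sketch, assuming it is run from inside the kernel git tree during a bisection; the helper names are invented for illustration, and only the two git commands (merge-base --is-ancestor and cherry-pick --no-commit) are real.

```python
import subprocess

CULPRIT = "9dc704dcc09ea"  # commit the earlier bisection pointed at
REVERT = "c5becf57dd56"    # mainline revert of that commit


def is_ancestor(commit, repo="."):
    """True if `commit` is an ancestor of HEAD in `repo`.

    Any non-zero exit status, including git errors, counts as "no".
    """
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", commit, "HEAD"], cwd=repo
    )
    return result.returncode == 0


def needs_revert(has_culprit, has_revert):
    """The revert is only needed where the culprit is present and not yet reverted."""
    return has_culprit and not has_revert


def prepare_step(repo="."):
    """Apply the revert to the current bisection step's tree if it is needed."""
    if needs_revert(is_ancestor(CULPRIT, repo), is_ancestor(REVERT, repo)):
        subprocess.run(
            ["git", "cherry-pick", "--no-commit", REVERT], cwd=repo, check=True
        )
        return True
    return False
```

Call `prepare_step()` after each bisection checkout and before building. Because --no-commit leaves the tree modified, run "git reset --hard" after testing and before marking the step good or bad.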
Comment 66 Sagar 2024-10-21 22:56:23 UTC
Hi Maxim, Thorsten,

Currently I am investigating this issue.
I do not have a definite timeline on the fix, but I will post an update here once I am certain of the solution and the testing timeline.

Thanks
Sagar