The controller works fine for a few minutes. Then it hangs for a few tens of seconds to a few minutes, then works normally again for a while. This bug is present in the 6.4.0 kernel release (6.3.9 works without hanging). The messages in dmesg are as follows:

[ 287.137901] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137909] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137912] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137914] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137916] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137919] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137921] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137924] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137926] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137928] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137930] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137933] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137934] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137937] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137939] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137941] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137943] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137945] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137947] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137949] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137951] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137952] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137954] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137956] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137958] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137960] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137962] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137964] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137966] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137967] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137969] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137971] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.157697] aacraid: Host bus reset request. SCSI hang ?
[ 287.157706] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[ 287.157708] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[ 287.157709] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[ 287.157711] aacraid 0000:02:00.0: outstanding cmd: firmware-32
[ 287.157712] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[ 287.167040] aacraid 0000:02:00.0: Controller reset type is 3
[ 287.167042] aacraid 0000:02:00.0: Issuing IOP reset
[ 321.029712] aacraid 0000:02:00.0: IOP reset succeeded
[ 321.066201] numacb=512 ignored
[ 321.066843] aacraid: Comm Interface type2 enabled
[ 344.845370] aacraid 0000:02:00.0: Scheduling bus rescan
[ 358.294342] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[ 442.109147] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109155] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109158] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109160] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109162] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109164] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109166] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109168] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109170] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109172] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109174] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109176] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109178] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109179] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109181] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109183] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109185] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109187] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109189] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109191] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109193] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109194] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109196] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109198] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109200] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109201] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109203] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109205] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109207] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109208] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.109210] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.137144] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.154292] aacraid: Host bus reset request. SCSI hang ?
[ 442.154302] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
[ 442.154305] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
[ 442.154307] aacraid 0000:02:00.0: outstanding cmd: error handler-0
[ 442.154308] aacraid 0000:02:00.0: outstanding cmd: firmware-32
[ 442.154310] aacraid 0000:02:00.0: outstanding cmd: kernel-0
[ 442.171131] aacraid 0000:02:00.0: Controller reset type is 3
[ 442.171133] aacraid 0000:02:00.0: Issuing IOP reset
[ 476.040983] aacraid 0000:02:00.0: IOP reset succeeded
[ 476.078055] numacb=512 ignored
[ 476.078606] aacraid: Comm Interface type2 enabled
[ 494.747632] aacraid 0000:02:00.0: Scheduling bus rescan
[ 507.896453] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
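The timestamps in the log above give a rough measure of each stall. As a minimal sketch (not part of the original report), the following Python counts the abort requests and measures the time between "Issuing IOP reset" and "IOP reset succeeded". The function name `summarize` and the shortened `SAMPLE` excerpt are illustrative assumptions, and the parser assumes one timestamped entry per line, as dmesg prints it.

```python
import re
from typing import List, Tuple

# Matches a dmesg line of the form "[ 287.167042] message text".
LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")

# Shortened excerpt of the log from the report (most abort lines omitted).
SAMPLE = """\
[ 287.137901] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.137909] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 287.157697] aacraid: Host bus reset request. SCSI hang ?
[ 287.167042] aacraid 0000:02:00.0: Issuing IOP reset
[ 321.029712] aacraid 0000:02:00.0: IOP reset succeeded
[ 442.109147] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (10,0,0,0):
[ 442.171133] aacraid 0000:02:00.0: Issuing IOP reset
[ 476.040983] aacraid 0000:02:00.0: IOP reset succeeded
"""


def summarize(log: str) -> Tuple[int, List[float]]:
    """Return (number of abort requests, list of IOP reset durations in seconds)."""
    aborts = 0
    reset_start = None
    durations: List[float] = []
    for raw in log.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # continuation lines carry no timestamp
        stamp, message = float(match.group(1)), match.group(2)
        if "Host adapter abort request" in message:
            aborts += 1
        elif "Issuing IOP reset" in message:
            reset_start = stamp
        elif "IOP reset succeeded" in message and reset_start is not None:
            durations.append(round(stamp - reset_start, 3))
            reset_start = None
    return aborts, durations


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the excerpt, both IOP resets take roughly 34 seconds, which is consistent with the "few tens of seconds" hangs described in the report. The full stall seen by the user may be longer, since the bus rescan only follows later.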
(In reply to pheidologeton from comment #0)
> The controller works fine for a few minutes. Then it hangs for a few tens of
> seconds to a few minutes, then also works normally for a while. This bug is
> present in the 6.4.0 kernel release (6.3.9 works without hanging)
> The messages in dmesg are as follows
> [...]

Can you do bisection between v6.3 and v6.4 please?
Created attachment 304479 [details]
attachment-7198-0.html

I don't quite understand, since I use a translator. I can attach the dmesg output as files from kernels 6.4 and 6.3.9.

-------- Original message --------
27 Jun 2023, 04:31, wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>
> Bagas Sanjaya (bagasdotme@gmail.com) changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |bagasdotme@gmail.com
>
> --- Comment #1 from Bagas Sanjaya (bagasdotme@gmail.com) ---
> (In reply to pheidologeton from comment #0)
> [...]
>
> Can you do bisection between v6.3 and v6.4 please?
On 6/27/23 08:47, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>
> --- Comment #2 from pheidologeton@protonmail.com ---
> I don't understand a bit, since I use a translator. I can attach dmesg as
> files
> from kernel 6.4 and 6.3.9

Sorry, you have to do bisection to help kernel developers fix your regression. Please see Documentation/admin-guide/bug-bisect.rst in the kernel sources for how to do it. And because you need to compile your own kernel during bisection, see Documentation/admin-guide/quickly-build-trimmed-linux.rst for a compilation how-to.

See you in your bisection report!
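For readers unfamiliar with the term, bisection is a binary search over the commits between the last known-good kernel (here v6.3) and the first known-bad one (v6.4). The sketch below only illustrates the idea in Python; it is not the procedure from the kernel documentation, which relies on `git bisect` and a kernel build at each step. The function `find_first_bad`, the commit names, and the `is_bad` callback (standing in for "build, boot, and check whether the controller hangs") are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(commits: Sequence[str],
                   is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumptions: commits are ordered oldest to newest, start just after
    the known-good version, and end at the known-bad version; once a
    commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1  # the first bad commit lies in [lo, hi]
    tested = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid        # regression is at mid or earlier
        else:
            lo = mid + 1    # regression is after mid
    return commits[lo], tested


if __name__ == "__main__":
    # 1000 made-up commits; pretend the regression entered at "c0637".
    commits = [f"c{i:04d}" for i in range(1, 1001)]
    culprit = "c0637"
    first_bad, tested = find_first_bad(commits, lambda c: c >= culprit)
    print(first_bad, tested)
```

The point of the exercise is the cost: each test halves the remaining range, so even a range of a thousand commits needs only about ten build-and-boot cycles to isolate the offending change.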
Created attachment 304480 [details] signature.asc [also Cc: aacraid and SCSI subsystem maintainers] On Mon, Jun 26, 2023 at 10:36:13PM +0000, bugzilla-daemon@kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=217599 > > Bug ID: 217599 > Summary: Adaptec 71605z hangs with aacraid: Host adapter abort > request after update to linux 6.4.0 > Product: SCSI Drivers > Version: 2.5 > Hardware: All > OS: Linux > Status: NEW > Severity: high > Priority: P3 > Component: AACRAID > Assignee: scsi_drivers-aacraid@kernel-bugs.osdl.org > Reporter: pheidologeton@protonmail.com > Regression: No > > The controller works fine for a few minutes. Then it hangs for a few tens of > seconds to a few minutes, then also works normally for a while. This bug is > present in the 6.4.0 kernel release (6.3.9 works without hanging) > The messages in dmesg are as follows > > [ 287.137901] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137909] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137912] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137914] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137916] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137919] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137921] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137924] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137926] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137928] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137930] aacraid: Host adapter abort request. 
> aacraid: Outstanding commands on (10,0,0,0): > [ 287.137933] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137934] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137937] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137939] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137941] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137943] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137945] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137947] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137949] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137951] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137952] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137954] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137956] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137958] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137960] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137962] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137964] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137966] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137967] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.137969] aacraid: Host adapter abort request. 
> aacraid: Outstanding commands on (10,0,0,0): > [ 287.137971] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 287.157697] aacraid: Host bus reset request. SCSI hang ? > [ 287.157706] aacraid 0000:02:00.0: outstanding cmd: midlevel-0 > [ 287.157708] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0 > [ 287.157709] aacraid 0000:02:00.0: outstanding cmd: error handler-0 > [ 287.157711] aacraid 0000:02:00.0: outstanding cmd: firmware-32 > [ 287.157712] aacraid 0000:02:00.0: outstanding cmd: kernel-0 > [ 287.167040] aacraid 0000:02:00.0: Controller reset type is 3 > [ 287.167042] aacraid 0000:02:00.0: Issuing IOP reset > [ 321.029712] aacraid 0000:02:00.0: IOP reset succeeded > [ 321.066201] numacb=512 ignored > [ 321.066843] aacraid: Comm Interface type2 enabled > [ 344.845370] aacraid 0000:02:00.0: Scheduling bus rescan > [ 358.294342] sd 10:0:0:0: [sda] Very big device. Trying to use READ > CAPACITY(16). > [ 442.109147] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109155] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109158] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109160] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109162] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109164] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109166] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109168] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109170] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109172] aacraid: Host adapter abort request. 
> aacraid: Outstanding commands on (10,0,0,0): > [ 442.109174] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109176] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109178] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109179] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109181] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109183] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109185] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109187] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109189] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109191] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109193] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109194] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109196] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109198] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109200] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109201] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109203] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109205] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109207] aacraid: Host adapter abort request. > aacraid: Outstanding commands on (10,0,0,0): > [ 442.109208] aacraid: Host adapter abort request. 
> aacraid: Outstanding commands on (10,0,0,0):
> [ 442.109210] aacraid: Host adapter abort request.
> aacraid: Outstanding commands on (10,0,0,0):
> [ 442.137144] aacraid: Host adapter abort request.
> aacraid: Outstanding commands on (10,0,0,0):
> [ 442.154292] aacraid: Host bus reset request. SCSI hang ?
> [ 442.154302] aacraid 0000:02:00.0: outstanding cmd: midlevel-0
> [ 442.154305] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0
> [ 442.154307] aacraid 0000:02:00.0: outstanding cmd: error handler-0
> [ 442.154308] aacraid 0000:02:00.0: outstanding cmd: firmware-32
> [ 442.154310] aacraid 0000:02:00.0: outstanding cmd: kernel-0
> [ 442.171131] aacraid 0000:02:00.0: Controller reset type is 3
> [ 442.171133] aacraid 0000:02:00.0: Issuing IOP reset
> [ 476.040983] aacraid 0000:02:00.0: IOP reset succeeded
> [ 476.078055] numacb=512 ignored
> [ 476.078606] aacraid: Comm Interface type2 enabled
> [ 494.747632] aacraid 0000:02:00.0: Scheduling bus rescan
> [ 507.896453] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
>

Thanks for automatically forwarding the Bugzilla report. I'm adding it to regzbot to ensure it doesn't fall through the cracks unnoticed:

#regzbot ^introduced: v6.3..v6.4
#regzbot title: Adaptec 71605z hangs with aacraid: Host adapter abort request after update
#regzbot monitor: https://bugzilla.kernel.org/show_bug.cgi?id=217599
I do not have another server with an Adaptec controller, and downtime on this server is highly undesirable. If 6.4.1 contains any relevant fixes, I will kexec from 6.3.9 into 6.4.1 and report back.
An interesting observation: after changing the I/O scheduler to none, the controller hangs happen much less often. I am currently running kernel 6.4.1.
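For anyone who wants to check which scheduler is in effect before and after such a change: the kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler and marks the active one with square brackets. The following is only a minimal sketch, not something the reporter ran. The device name sda is taken from the dmesg output in this report, and the helper names are made up for illustration.

```python
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the scheduler marked with [brackets] in the sysfs file content."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {sysfs_text!r}")


def scheduler_path(device):
    """Build the sysfs path that holds the scheduler list for a block device."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def read_active_scheduler(device="sda"):
    """Read the active scheduler for a device (sda is an assumption here)."""
    return active_scheduler(scheduler_path(device).read_text())
```

Switching the scheduler means writing the desired name (for example none) to the same file as root. That change does not persist across reboots.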
One more observation: if the controller write cache is disabled (set to wt, i.e. write-through) in the arcconf settings, the problem is not observed, but random write speed drops by a factor of 3-4.
Update: with the cache disabled the error still occurs, but less often and only during very I/O-heavy operations, e.g. btrfs balance.
Attached is the kernel log during btrfs balance. Additional information: Arch Linux, / on btrfs. The kernel is built from kernel.org sources; I don't use the Arch kernels. I can send the config if needed.

[ 3316.617309] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3319.482222] BTRFS info (device dm-0): relocating block group 40045365952512 flags data [ 3329.220422] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3330.759383] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3331.458286] BTRFS info (device dm-0): relocating block group 40044292210688 flags data [ 3344.973440] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 3347.383541] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 3349.321716] BTRFS info (device dm-0): relocating block group 40043218468864 flags data [ 3365.872341] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3368.168591] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3369.726373] BTRFS info (device dm-0): relocating block group 40042144727040 flags data [ 3382.975757] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 3385.968211] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 3386.724714] BTRFS info (device dm-0): relocating block group 40041070985216 flags data [ 3394.540433] BTRFS info (device dm-0): found 2048 extents, stage: move data extents [ 3397.185759] BTRFS info (device dm-0): found 2048 extents, stage: update data pointers [ 3399.119172] BTRFS info (device dm-0): relocating block group 40039997243392 flags data [ 3407.703926] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3408.814660] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3409.582049] BTRFS info (device dm-0): relocating block group 40038923501568 flags data [ 3419.867057] BTRFS info (device
dm-0): found 2052 extents, stage: move data extents [ 3422.106158] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 3422.938727] BTRFS info (device dm-0): relocating block group 40037849759744 flags data [ 3428.406170] BTRFS info (device dm-0): found 870 extents, stage: move data extents [ 3433.431310] BTRFS info (device dm-0): found 870 extents, stage: update data pointers [ 3437.507903] BTRFS info (device dm-0): relocating block group 40036776017920 flags data [ 3448.653028] BTRFS info (device dm-0): found 1960 extents, stage: move data extents [ 3455.940281] BTRFS info (device dm-0): found 1960 extents, stage: update data pointers [ 3459.627764] BTRFS info (device dm-0): relocating block group 40035702276096 flags data [ 3468.108075] BTRFS info (device dm-0): found 2059 extents, stage: move data extents [ 3469.458665] BTRFS info (device dm-0): found 2059 extents, stage: update data pointers [ 3470.245608] BTRFS info (device dm-0): relocating block group 40034628534272 flags data [ 3477.993083] BTRFS info (device dm-0): found 2056 extents, stage: move data extents [ 3479.662803] BTRFS info (device dm-0): found 2056 extents, stage: update data pointers [ 3481.908411] BTRFS info (device dm-0): relocating block group 40033554792448 flags data [ 3491.709283] BTRFS info (device dm-0): found 2066 extents, stage: move data extents [ 3493.035815] BTRFS info (device dm-0): found 2066 extents, stage: update data pointers [ 3494.446714] BTRFS info (device dm-0): relocating block group 40032481050624 flags data [ 3505.523295] BTRFS info (device dm-0): found 2062 extents, stage: move data extents [ 3508.145601] BTRFS info (device dm-0): found 2062 extents, stage: update data pointers [ 3509.167778] BTRFS info (device dm-0): relocating block group 40031407308800 flags data [ 3518.814308] BTRFS info (device dm-0): found 2063 extents, stage: move data extents [ 3520.993505] BTRFS info (device dm-0): found 2063 extents, stage: update data pointers [ 
3522.503043] BTRFS info (device dm-0): relocating block group 40030333566976 flags data [ 3531.751190] BTRFS info (device dm-0): found 2064 extents, stage: move data extents [ 3533.421101] BTRFS info (device dm-0): found 2064 extents, stage: update data pointers [ 3534.289901] BTRFS info (device dm-0): relocating block group 40029259825152 flags data [ 3542.304610] BTRFS info (device dm-0): found 1901 extents, stage: move data extents [ 3543.452137] BTRFS info (device dm-0): found 1901 extents, stage: update data pointers [ 3545.360085] BTRFS info (device dm-0): relocating block group 40028186083328 flags data [ 3554.996710] BTRFS info (device dm-0): found 2056 extents, stage: move data extents [ 3558.154072] BTRFS info (device dm-0): found 2056 extents, stage: update data pointers [ 3559.682636] BTRFS info (device dm-0): relocating block group 40027112341504 flags data [ 3568.657974] BTRFS info (device dm-0): found 2064 extents, stage: move data extents [ 3571.539808] BTRFS info (device dm-0): found 2064 extents, stage: update data pointers [ 3572.893027] BTRFS info (device dm-0): relocating block group 40026038599680 flags data [ 3583.292919] BTRFS info (device dm-0): found 2061 extents, stage: move data extents [ 3586.633532] BTRFS info (device dm-0): found 2061 extents, stage: update data pointers [ 3587.914016] BTRFS info (device dm-0): relocating block group 40024964857856 flags data [ 3600.062952] BTRFS info (device dm-0): found 2055 extents, stage: move data extents [ 3602.371266] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers [ 3603.144455] BTRFS info (device dm-0): relocating block group 40023891116032 flags data [ 3613.895505] BTRFS info (device dm-0): found 2116 extents, stage: move data extents [ 3620.446434] BTRFS info (device dm-0): found 2116 extents, stage: update data pointers [ 3623.076019] BTRFS info (device dm-0): relocating block group 40022817374208 flags data [ 3631.646738] BTRFS info (device dm-0): found 2077 
extents, stage: move data extents [ 3634.235518] BTRFS info (device dm-0): found 2077 extents, stage: update data pointers [ 3635.573847] BTRFS info (device dm-0): relocating block group 40021743632384 flags data [ 3646.460339] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 3649.351023] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 3650.478747] BTRFS info (device dm-0): relocating block group 40020669890560 flags data [ 3776.018632] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.270455] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.270463] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.270466] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.270468] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.270471] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.286451] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.310451] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.342450] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.362449] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.362453] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.362455] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.378449] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.378452] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.378454] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 3784.378456] aacraid: Host adapter abort request. 
aacraid: Outstanding commands on (10,0,0,0): [ 3784.393980] aacraid: Host bus reset request. SCSI hang ? [ 3784.393989] aacraid 0000:02:00.0: outstanding cmd: midlevel-0 [ 3784.393991] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0 [ 3784.393992] aacraid 0000:02:00.0: outstanding cmd: error handler-0 [ 3784.393994] aacraid 0000:02:00.0: outstanding cmd: firmware-16 [ 3784.393995] aacraid 0000:02:00.0: outstanding cmd: kernel-0 [ 3784.406273] aacraid 0000:02:00.0: Controller reset type is 3 [ 3784.406275] aacraid 0000:02:00.0: Issuing IOP reset [ 3818.052903] aacraid 0000:02:00.0: IOP reset succeeded [ 3818.089560] numacb=512 ignored [ 3818.090077] aacraid: Comm Interface type2 enabled [ 3831.327204] aacraid 0000:02:00.0: Scheduling bus rescan [ 3844.356263] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). [ 3853.956497] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3864.355224] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3865.992752] BTRFS info (device dm-0): relocating block group 40019596148736 flags data [ 3874.735672] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3878.298428] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3880.455850] BTRFS info (device dm-0): relocating block group 40018522406912 flags data [ 3891.125829] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3893.431071] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3894.271993] BTRFS info (device dm-0): relocating block group 40017448665088 flags data [ 3903.344106] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 3905.644193] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 3906.914578] BTRFS info (device dm-0): relocating block group 40016374923264 flags data [ 3916.797696] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 
3921.481208] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3924.388232] BTRFS info (device dm-0): relocating block group 40015301181440 flags data [ 3934.062280] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 3936.989412] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 3938.757170] BTRFS info (device dm-0): relocating block group 40014227439616 flags data [ 3947.023461] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 3948.628886] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 3949.501986] BTRFS info (device dm-0): relocating block group 40013153697792 flags data [ 3959.582364] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 3962.158521] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 3962.849216] BTRFS info (device dm-0): relocating block group 40012079955968 flags data [ 3971.717710] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 3972.936642] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 3975.062969] BTRFS info (device dm-0): relocating block group 40011006214144 flags data [ 4101.963623] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963630] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963632] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963635] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963637] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963639] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963641] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4101.963643] aacraid: Host adapter abort request. 
aacraid: Outstanding commands on (10,0,0,0): [ 4101.963646] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4102.055618] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4102.055621] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4102.115617] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4102.211615] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4134.726992] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4134.727000] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4144.966793] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4144.978357] aacraid: Host bus reset request. SCSI hang ? [ 4144.978373] aacraid 0000:02:00.0: outstanding cmd: midlevel-0 [ 4144.978375] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0 [ 4144.978377] aacraid 0000:02:00.0: outstanding cmd: error handler-0 [ 4144.978378] aacraid 0000:02:00.0: outstanding cmd: firmware-16 [ 4144.978380] aacraid 0000:02:00.0: outstanding cmd: kernel-0 [ 4144.994613] aacraid 0000:02:00.0: Controller reset type is 3 [ 4144.994615] aacraid 0000:02:00.0: Issuing IOP reset [ 4178.492092] aacraid 0000:02:00.0: IOP reset succeeded [ 4178.517963] numacb=512 ignored [ 4178.518491] aacraid: Comm Interface type2 enabled [ 4191.766331] aacraid 0000:02:00.0: Scheduling bus rescan [ 4204.785434] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). 
[ 4206.168041] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 4209.562446] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 4210.236071] BTRFS info (device dm-0): relocating block group 40009932472320 flags data [ 4220.376248] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4222.252717] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4223.045918] BTRFS info (device dm-0): relocating block group 40008858730496 flags data [ 4231.408480] BTRFS info (device dm-0): found 2056 extents, stage: move data extents [ 4233.376778] BTRFS info (device dm-0): found 2056 extents, stage: update data pointers [ 4236.074206] BTRFS info (device dm-0): relocating block group 40007784988672 flags data [ 4247.118270] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4249.734482] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4251.493058] BTRFS info (device dm-0): relocating block group 40006711246848 flags data [ 4259.821443] BTRFS info (device dm-0): found 2050 extents, stage: move data extents [ 4260.805451] BTRFS info (device dm-0): found 2050 extents, stage: update data pointers [ 4261.456743] BTRFS info (device dm-0): relocating block group 40005637505024 flags data [ 4271.616724] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4274.831768] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4275.695407] BTRFS info (device dm-0): relocating block group 40004563763200 flags data [ 4285.371736] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 4287.650953] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 4288.171152] BTRFS info (device dm-0): relocating block group 40003490021376 flags data [ 4296.368957] BTRFS info (device dm-0): found 2058 extents, stage: move data extents [ 4303.096606] BTRFS info (device dm-0): found 2058 
extents, stage: update data pointers [ 4308.672345] BTRFS info (device dm-0): relocating block group 40002416279552 flags data [ 4317.848496] BTRFS info (device dm-0): found 2055 extents, stage: move data extents [ 4324.380461] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers [ 4329.313954] BTRFS info (device dm-0): relocating block group 40001342537728 flags data [ 4340.352759] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 4346.055272] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 4350.341563] BTRFS info (device dm-0): relocating block group 40000268795904 flags data [ 4360.008253] BTRFS info (device dm-0): found 2073 extents, stage: move data extents [ 4363.968894] BTRFS info (device dm-0): found 2073 extents, stage: update data pointers [ 4366.667086] BTRFS info (device dm-0): relocating block group 39999195054080 flags data [ 4376.714089] BTRFS info (device dm-0): found 2065 extents, stage: move data extents [ 4381.321506] BTRFS info (device dm-0): found 2065 extents, stage: update data pointers [ 4383.772033] BTRFS info (device dm-0): relocating block group 39998121312256 flags data [ 4392.346068] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4394.681397] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4401.709513] BTRFS info (device dm-0): relocating block group 39997047570432 flags data [ 4415.742970] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4418.352930] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4419.406295] BTRFS info (device dm-0): relocating block group 39995973828608 flags data [ 4427.092182] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4428.429700] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4429.549291] BTRFS info (device dm-0): relocating block group 39994900086784 flags data [ 4440.330092] 
BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4443.546015] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4444.589392] BTRFS info (device dm-0): relocating block group 39993826344960 flags data [ 4452.441851] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4454.065084] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4455.766007] BTRFS info (device dm-0): relocating block group 39992752603136 flags data [ 4466.379938] BTRFS info (device dm-0): found 2055 extents, stage: move data extents [ 4469.010725] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers [ 4471.997477] BTRFS info (device dm-0): relocating block group 39991678861312 flags data [ 4483.094733] BTRFS info (device dm-0): found 2055 extents, stage: move data extents [ 4485.152335] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers [ 4486.408387] BTRFS info (device dm-0): relocating block group 39990605119488 flags data [ 4495.974786] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4498.534411] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4499.563931] BTRFS info (device dm-0): relocating block group 39989531377664 flags data [ 4517.353372] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4524.608562] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4527.224028] BTRFS info (device dm-0): relocating block group 39988457635840 flags data [ 4536.621647] BTRFS info (device dm-0): found 2055 extents, stage: move data extents [ 4538.678644] BTRFS info (device dm-0): found 2055 extents, stage: update data pointers [ 4540.145613] BTRFS info (device dm-0): relocating block group 39987383894016 flags data [ 4550.384635] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4552.970776] BTRFS info (device dm-0): found 2054 extents, stage: 
update data pointers [ 4554.435255] BTRFS info (device dm-0): relocating block group 39986310152192 flags data [ 4563.872637] BTRFS info (device dm-0): found 2051 extents, stage: move data extents [ 4566.764789] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers [ 4568.159495] BTRFS info (device dm-0): relocating block group 39985236410368 flags data [ 4579.681116] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 4583.466767] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 4586.247517] BTRFS info (device dm-0): relocating block group 39984162668544 flags data [ 4596.054908] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4597.544655] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4598.583901] BTRFS info (device dm-0): relocating block group 39983088926720 flags data [ 4605.907058] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4607.398318] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4608.508892] BTRFS info (device dm-0): relocating block group 39982015184896 flags data [ 4619.110922] BTRFS info (device dm-0): found 2051 extents, stage: move data extents [ 4621.813152] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers [ 4622.825766] BTRFS info (device dm-0): relocating block group 39980941443072 flags data [ 4630.319726] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4631.692092] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4632.754884] BTRFS info (device dm-0): relocating block group 39979867701248 flags data [ 4645.459829] BTRFS info (device dm-0): found 2052 extents, stage: move data extents [ 4649.317496] BTRFS info (device dm-0): found 2052 extents, stage: update data pointers [ 4650.643743] BTRFS info (device dm-0): relocating block group 39978793959424 flags data [ 4658.358713] BTRFS info (device 
dm-0): found 2054 extents, stage: move data extents [ 4659.663771] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4661.089138] BTRFS info (device dm-0): relocating block group 39977720217600 flags data [ 4669.235843] BTRFS info (device dm-0): found 2051 extents, stage: move data extents [ 4670.530009] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers [ 4671.579253] BTRFS info (device dm-0): relocating block group 39976646475776 flags data [ 4681.371892] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4684.417891] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4686.145066] BTRFS info (device dm-0): relocating block group 39975572733952 flags data [ 4696.044819] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4699.884181] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4702.201894] BTRFS info (device dm-0): relocating block group 39974498992128 flags data [ 4711.594449] BTRFS info (device dm-0): found 2054 extents, stage: move data extents [ 4713.643472] BTRFS info (device dm-0): found 2054 extents, stage: update data pointers [ 4714.512799] BTRFS info (device dm-0): relocating block group 39973425250304 flags data [ 4722.817900] BTRFS info (device dm-0): found 2057 extents, stage: move data extents [ 4724.101966] BTRFS info (device dm-0): found 2057 extents, stage: update data pointers [ 4725.202619] BTRFS info (device dm-0): relocating block group 39972351508480 flags data [ 4734.518748] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 4736.529322] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 4737.507277] BTRFS info (device dm-0): relocating block group 39971277766656 flags data [ 4747.890650] BTRFS info (device dm-0): found 2051 extents, stage: move data extents [ 4750.395047] BTRFS info (device dm-0): found 2051 extents, stage: update data pointers [ 
4751.356231] BTRFS info (device dm-0): relocating block group 39970204024832 flags data [ 4881.470732] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470740] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470743] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470745] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470747] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470750] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470752] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470754] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470757] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470758] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470761] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470763] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470764] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470766] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470769] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.470771] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 4881.494268] aacraid: Host bus reset request. SCSI hang ? 
[ 4881.494276] aacraid 0000:02:00.0: outstanding cmd: midlevel-0 [ 4881.494278] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0 [ 4881.494280] aacraid 0000:02:00.0: outstanding cmd: error handler-0 [ 4881.494282] aacraid 0000:02:00.0: outstanding cmd: firmware-16 [ 4881.494283] aacraid 0000:02:00.0: outstanding cmd: kernel-0 [ 4881.510554] aacraid 0000:02:00.0: Controller reset type is 3 [ 4881.510556] aacraid 0000:02:00.0: Issuing IOP reset [ 4915.135236] aacraid 0000:02:00.0: IOP reset succeeded [ 4915.173754] numacb=512 ignored [ 4915.174291] aacraid: Comm Interface type2 enabled [ 4928.760752] aacraid 0000:02:00.0: Scheduling bus rescan [ 4939.400353] BTRFS info (device dm-0): found 2063 extents, stage: move data extents [ 4941.769902] sd 10:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). [ 4943.044681] BTRFS info (device dm-0): found 2063 extents, stage: update data pointers [ 4945.687414] BTRFS info (device dm-0): relocating block group 39969130283008 flags data [ 4954.864202] BTRFS info (device dm-0): found 2072 extents, stage: move data extents [ 4957.822957] BTRFS info (device dm-0): found 2072 extents, stage: update data pointers [ 4959.299452] BTRFS info (device dm-0): relocating block group 39968056541184 flags data [ 4968.339907] BTRFS info (device dm-0): found 2073 extents, stage: move data extents [ 4973.463936] BTRFS info (device dm-0): found 2073 extents, stage: update data pointers [ 4977.130345] BTRFS info (device dm-0): relocating block group 39966982799360 flags data [ 4989.513486] BTRFS info (device dm-0): found 2050 extents, stage: move data extents [ 5001.120554] BTRFS info (device dm-0): found 2050 extents, stage: update data pointers [ 5010.087463] BTRFS info (device dm-0): relocating block group 39965909057536 flags data [ 5022.123373] BTRFS info (device dm-0): found 2077 extents, stage: move data extents [ 5066.139984] BTRFS info (device dm-0): found 2077 extents, stage: update data pointers [ 5094.950339] BTRFS info 
(device dm-0): relocating block group 39964835315712 flags data [ 5107.270236] BTRFS info (device dm-0): found 2065 extents, stage: move data extents [ 5140.991508] BTRFS info (device dm-0): found 2065 extents, stage: update data pointers [ 5164.250651] BTRFS info (device dm-0): relocating block group 39963761573888 flags data [ 5177.484766] BTRFS info (device dm-0): found 2053 extents, stage: move data extents [ 5201.668181] BTRFS info (device dm-0): found 2053 extents, stage: update data pointers [ 5227.474665] BTRFS info (device dm-0): relocating block group 39962687832064 flags data [ 5243.028304] BTRFS info (device dm-0): found 2045 extents, stage: move data extents [ 5260.284955] BTRFS info (device dm-0): found 2045 extents, stage: update data pointers [ 5275.825851] BTRFS info (device dm-0): relocating block group 39961614090240 flags data [ 5289.419014] BTRFS info (device dm-0): found 2069 extents, stage: move data extents [ 5305.187664] BTRFS info (device dm-0): found 2069 extents, stage: update data pointers [ 5318.520602] BTRFS info (device dm-0): relocating block group 39960540348416 flags data [ 5330.435004] BTRFS info (device dm-0): found 2058 extents, stage: move data extents [ 5338.603566] BTRFS info (device dm-0): found 2058 extents, stage: update data pointers [ 5343.611832] BTRFS info (device dm-0): balance: canceled [ 5439.752346] BTRFS info (device dm-0): balance: start -dusage=90 [ 5439.755515] BTRFS info (device dm-0): relocating block group 40088315625472 flags data [ 5565.485701] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485708] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485710] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485712] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485714] aacraid: Host adapter abort request. 
aacraid: Outstanding commands on (10,0,0,0): [ 5565.485716] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485719] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485721] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485723] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485725] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485727] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485729] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485731] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485732] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485734] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.485736] aacraid: Host adapter abort request. aacraid: Outstanding commands on (10,0,0,0): [ 5565.504649] aacraid: Host bus reset request. SCSI hang ? [ 5565.504658] aacraid 0000:02:00.0: outstanding cmd: midlevel-0 [ 5565.504661] aacraid 0000:02:00.0: outstanding cmd: lowlevel-0 [ 5565.504662] aacraid 0000:02:00.0: outstanding cmd: error handler-0 [ 5565.504664] aacraid 0000:02:00.0: outstanding cmd: firmware-16 [ 5565.504665] aacraid 0000:02:00.0: outstanding cmd: kernel-0 [ 5565.521936] aacraid 0000:02:00.0: Controller reset type is 3 [ 5565.521937] aacraid 0000:02:00.0: Issuing IOP reset [ 5584.653112] INFO: task btrfs:41548 blocked for more than 120 seconds.
I have the exact same issue after upgrading to kernel 6.4.7 with an Adaptec ASR71605. With the previous kernel, 6.0.0, the problem goes away. The controller hangs approximately every 5 minutes for about 2 minutes. 2023-07-28T07:11:44.380218+10:00 linux kernel: [ 1906.291075] aacraid: Host bus reset request. SCSI hang ? 2023-07-28T07:11:44.380224+10:00 linux kernel: [ 1906.291084] aacraid 0000:04:00.0: outstanding cmd: midlevel-0 2023-07-28T07:11:44.380225+10:00 linux kernel: [ 1906.291086] aacraid 0000:04:00.0: outstanding cmd: lowlevel-0 2023-07-28T07:11:44.380226+10:00 linux kernel: [ 1906.291087] aacraid 0000:04:00.0: outstanding cmd: error handler-0 2023-07-28T07:11:44.380226+10:00 linux kernel: [ 1906.291088] aacraid 0000:04:00.0: outstanding cmd: firmware-32 2023-07-28T07:11:44.380227+10:00 linux kernel: [ 1906.291089] aacraid 0000:04:00.0: outstanding cmd: kernel-0 2023-07-28T07:11:44.400215+10:00 linux kernel: [ 1906.311066] aacraid 0000:04:00.0: Controller reset type is 3 2023-07-28T07:11:44.400221+10:00 linux kernel: [ 1906.311071] aacraid 0000:04:00.0: Issuing IOP reset 2023-07-28T07:12:29.044219+10:00 linux kernel: [ 1950.957989] aacraid 0000:04:00.0: IOP reset succeeded 2023-07-28T07:12:29.108222+10:00 linux kernel: [ 1951.018606] aacraid: Comm Interface type2 enabled 2023-07-28T07:12:38.144232+10:00 linux kernel: [ 1960.056334] aacraid 0000:04:00.0: Scheduling bus rescan 2023-07-28T07:12:48.821198+10:00 linux kernel: [ 1970.734618] sd 0:1:8:0: [sdi] tag#312 timing out command, waited 120s 2023-07-28T07:15:18.872254+10:00 linux kernel: [ 2120.779775] md: md126: reshape interrupted. 2023-07-28T07:15:52.172242+10:00 linux kernel: [ 2154.074084] aacraid: Host adapter abort request. 2023-07-28T07:15:52.172257+10:00 linux kernel: [ 2154.074084] aacraid: Outstanding commands on (0,1,12,0): 2023-07-28T07:15:52.172259+10:00 linux kernel: [ 2154.074109] aacraid: Host adapter abort request.
2023-07-28T07:15:52.172260+10:00 linux kernel: [ 2154.074109] aacraid: Outstanding commands on (0,1,3,0): 2023-07-28T07:15:52.172261+10:00 linux kernel: [ 2154.074119] aacraid: Host adapter abort request. 2023-07-28T07:15:52.172262+10:00 linux kernel: [ 2154.074119] aacraid: Outstanding commands on (0,1,2,0): 2023-07-28T07:15:52.292291+10:00 linux kernel: [ 2154.196250] sd 0:1:3:0: [sdd] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB) 2023-07-28T07:15:52.292302+10:00 linux kernel: [ 2154.196254] sd 0:1:3:0: [sdd] 4096-byte physical blocks 2023-07-28T07:15:56.008234+10:00 linux kernel: [ 2157.909736] aacraid: Host adapter abort request. 2023-07-28T07:15:56.008250+10:00 linux kernel: [ 2157.909736] aacraid: Outstanding commands on (0,1,10,0): 2023-07-28T07:15:56.016229+10:00 linux kernel: [ 2157.917723] aacraid: Host adapter abort request. 2023-07-28T07:15:56.016238+10:00 linux kernel: [ 2157.917723] aacraid: Outstanding commands on (0,1,8,0): 2023-07-28T07:15:56.276248+10:00 linux kernel: [ 2158.178018] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276264+10:00 linux kernel: [ 2158.178018] aacraid: Outstanding commands on (0,1,13,0): 2023-07-28T07:15:56.276266+10:00 linux kernel: [ 2158.178029] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276267+10:00 linux kernel: [ 2158.178029] aacraid: Outstanding commands on (0,1,2,0): 2023-07-28T07:15:56.276268+10:00 linux kernel: [ 2158.178033] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276268+10:00 linux kernel: [ 2158.178033] aacraid: Outstanding commands on (0,1,12,0): 2023-07-28T07:15:56.276269+10:00 linux kernel: [ 2158.178037] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276270+10:00 linux kernel: [ 2158.178037] aacraid: Outstanding commands on (0,1,4,0): 2023-07-28T07:15:56.276271+10:00 linux kernel: [ 2158.178041] aacraid: Host adapter abort request. 
2023-07-28T07:15:56.276272+10:00 linux kernel: [ 2158.178041] aacraid: Outstanding commands on (0,1,5,0): 2023-07-28T07:15:56.276273+10:00 linux kernel: [ 2158.178045] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276273+10:00 linux kernel: [ 2158.178045] aacraid: Outstanding commands on (0,1,5,0): 2023-07-28T07:15:56.276274+10:00 linux kernel: [ 2158.178071] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276275+10:00 linux kernel: [ 2158.178071] aacraid: Outstanding commands on (0,1,5,0): 2023-07-28T07:15:56.276275+10:00 linux kernel: [ 2158.178074] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276276+10:00 linux kernel: [ 2158.178074] aacraid: Outstanding commands on (0,1,0,0): 2023-07-28T07:15:56.276277+10:00 linux kernel: [ 2158.181733] aacraid: Host adapter abort request. 2023-07-28T07:15:56.276278+10:00 linux kernel: [ 2158.181733] aacraid: Outstanding commands on (0,1,14,0): 2023-07-28T07:15:58.588241+10:00 linux kernel: [ 2160.489557] aacraid: Host adapter abort request. 2023-07-28T07:15:58.588266+10:00 linux kernel: [ 2160.489557] aacraid: Outstanding commands on (0,1,4,0): 2023-07-28T07:16:00.876228+10:00 linux kernel: [ 2162.777426] aacraid: Host adapter abort request. 2023-07-28T07:16:00.876246+10:00 linux kernel: [ 2162.777426] aacraid: Outstanding commands on (0,1,0,0): 2023-07-28T07:16:06.512253+10:00 linux kernel: [ 2168.413075] aacraid: Host adapter abort request. 2023-07-28T07:16:06.512273+10:00 linux kernel: [ 2168.413075] aacraid: Outstanding commands on (0,1,15,0): 2023-07-28T07:16:07.032219+10:00 linux kernel: [ 2168.933027] aacraid: Host adapter abort request. 2023-07-28T07:16:07.032232+10:00 linux kernel: [ 2168.933027] aacraid: Outstanding commands on (0,1,14,0): 2023-07-28T07:16:10.872240+10:00 linux kernel: [ 2172.772793] aacraid: Host adapter abort request. 
2023-07-28T07:16:10.872256+10:00 linux kernel: [ 2172.772793] aacraid: Outstanding commands on (0,1,10,0): 2023-07-28T07:16:14.700245+10:00 linux kernel: [ 2176.600526] aacraid: Host adapter abort request. 2023-07-28T07:16:14.700259+10:00 linux kernel: [ 2176.600526] aacraid: Outstanding commands on (0,1,8,0): 2023-07-28T07:16:14.700262+10:00 linux kernel: [ 2176.604511] aacraid: Host adapter abort request. 2023-07-28T07:16:14.700263+10:00 linux kernel: [ 2176.604511] aacraid: Outstanding commands on (0,1,9,0): 2023-07-28T07:16:18.804238+10:00 linux kernel: [ 2180.704264] aacraid: Host adapter abort request. 2023-07-28T07:16:18.804258+10:00 linux kernel: [ 2180.704264] aacraid: Outstanding commands on (0,1,13,0): 2023-07-28T07:16:22.892239+10:00 linux kernel: [ 2184.791994] aacraid: Host adapter abort request. 2023-07-28T07:16:22.892253+10:00 linux kernel: [ 2184.791994] aacraid: Outstanding commands on (0,1,1,0): 2023-07-28T07:16:22.892256+10:00 linux kernel: [ 2184.792047] aacraid: Host bus reset request. SCSI hang ? 
2023-07-28T07:16:22.892257+10:00 linux kernel: [ 2184.792056] aacraid 0000:04:00.0: outstanding cmd: midlevel-0 2023-07-28T07:16:22.892258+10:00 linux kernel: [ 2184.792059] aacraid 0000:04:00.0: outstanding cmd: lowlevel-0 2023-07-28T07:16:22.892260+10:00 linux kernel: [ 2184.792060] aacraid 0000:04:00.0: outstanding cmd: error handler-10 2023-07-28T07:16:22.892261+10:00 linux kernel: [ 2184.792061] aacraid 0000:04:00.0: outstanding cmd: firmware-0 2023-07-28T07:16:22.892262+10:00 linux kernel: [ 2184.792062] aacraid 0000:04:00.0: outstanding cmd: kernel-0 2023-07-28T07:16:22.924446+10:00 linux kernel: [ 2184.824253] aacraid 0000:04:00.0: Controller reset type is 3 2023-07-28T07:16:22.924453+10:00 linux kernel: [ 2184.824257] aacraid 0000:04:00.0: Issuing IOP reset 2023-07-28T07:17:07.708235+10:00 linux kernel: [ 2229.607605] aacraid 0000:04:00.0: IOP reset succeeded 2023-07-28T07:17:07.768229+10:00 linux kernel: [ 2229.665769] aacraid: Comm Interface type2 enabled 2023-07-28T07:17:16.808238+10:00 linux kernel: [ 2238.704668] aacraid 0000:04:00.0: Scheduling bus rescan
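To put a number on how long each hang lasts, the time between `Issuing IOP reset` and `IOP reset succeeded` can be pulled out of the kernel log. The following is only a minimal sketch: the function name `iop_reset_durations` is made up for illustration, and it assumes one log entry per line with the usual `[ seconds.micros]` kernel timestamp. The sample lines are copied from the log above.

```python
import re

# Kernel monotonic timestamp, e.g. "[ 1906.311071]". A syslog prefix in
# front of it (ISO date, hostname) is ignored because it is not bracketed.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def iop_reset_durations(lines):
    """Return the seconds between each 'Issuing IOP reset' message and the
    next 'IOP reset succeeded' message (hypothetical helper)."""
    durations = []
    started = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        stamp = float(match.group(1))
        if "Issuing IOP reset" in line:
            started = stamp
        elif "IOP reset succeeded" in line and started is not None:
            durations.append(round(stamp - started, 3))
            started = None
    return durations


sample = [
    "2023-07-28T07:11:44.400221+10:00 linux kernel: [ 1906.311071] aacraid 0000:04:00.0: Issuing IOP reset",
    "2023-07-28T07:12:29.044219+10:00 linux kernel: [ 1950.957989] aacraid 0000:04:00.0: IOP reset succeeded",
    "2023-07-28T07:16:22.924453+10:00 linux kernel: [ 2184.824257] aacraid 0000:04:00.0: Issuing IOP reset",
    "2023-07-28T07:17:07.708235+10:00 linux kernel: [ 2229.607605] aacraid 0000:04:00.0: IOP reset succeeded",
]
print(iop_reset_durations(sample))
```

On a live system the same function could be fed the lines of `dmesg` output or a saved kernel log file.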
Most likely the problem is with btrfs. With zfs (created with -o ashift=16) in the same LUKS2 container, no controller hangs are observed under any I/O, and there have been no resets during 4 hours of stress testing so far. The LUKS2 sector size is 4k and the stripe size on the controller is 128k, because cryptsetup currently does not support sector sizes larger than 4k. Kernel 6.4.7.
After 6 hours of stress testing with lots of small files, the problem reappeared. It does not occur with sequential I/O. I repeated the test with btrfs: no problem on the newly created filesystem, but it recurs with random I/O. I think it is caused by this patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=48dc810012a6b4f4ba94073d6b7edb4f76edeb72
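For anyone who wants to try a similar load, here is a rough sketch of a small-file, random-order write pattern. It is not the tool used for the tests above (that tool is not named in this report); the function name `small_file_churn` and all sizes and counts are assumptions chosen for illustration. It only uses the Python standard library.

```python
import os
import random
import tempfile


def small_file_churn(directory, files=200, size=4096, rounds=3, seed=0):
    """Rewrite many small files in random order, syncing each one.

    Hypothetical helper: returns the number of writes performed.
    """
    rng = random.Random(seed)
    paths = [os.path.join(directory, f"f{i:05d}.bin") for i in range(files)]
    writes = 0
    for _ in range(rounds):
        rng.shuffle(paths)  # visit the files in a different order each round
        for path in paths:
            with open(path, "wb") as handle:
                handle.write(os.urandom(size))
                handle.flush()
                os.fsync(handle.fileno())  # push the data to the device
            writes += 1
    return writes


if __name__ == "__main__":
    # A temporary directory is used here only so the sketch is safe to run;
    # to exercise the controller, point it at a directory on the array.
    with tempfile.TemporaryDirectory() as tmp:
        print(small_file_churn(tmp, files=50, rounds=2))
```

The per-file `fsync` is a deliberate choice: without it the page cache can absorb most of the writes, and the controller may see far less random I/O than intended.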
I'm experiencing the same problem with the Adaptec ASR81605Z on 6.4.x kernels. Reverting the following commit resolves the issue: https://github.com/torvalds/linux/commit/9dc704dcc09eae7d21b5da0615eb2ed79278f63e
Sometimes this can be a bug in the controller's firmware. Can you all check that you are running the latest available firmware for your aacraid controllers?
I have the most recent firmware version: # arcconf getconfig 1 AD | grep 'Model' Controller Model : Adaptec ASR81605Z # arcconf getversion 1 Controllers found: 1 Controller #1 ============== Firmware : 7.18-0 (33556) Staged Firmware : 7.18-0 (33556) BIOS : 7.18-0 (33556) Driver : 1.2-1 (50983) Boot Flash : 7.18-0 (33556) CPLD (Load version/ Flash version) : 5/ 12 SEEPROM (Load version/ Flash version) : 1/ 1 #regzbot ^introduced 9dc704dcc09eae7d21b5da0615eb2ed79278f63e
For the problems reported on Series 7 controllers: At Microchip, we tried to reproduce this issue on the 6.4.9 kernel with 71605 and 7805 controllers running the latest FW from adaptec.com (version 32118), and we do not see the issue. Could you please tell us which FW version is used in your configuration? The exact server model and the configuration details would also help us. Also, could you please try the latest FW from the website and confirm whether you still see this issue? The latest FW version for each controller model can be downloaded at https://storage.microsemi.com/en-us/support/series7/index.php We look forward to hearing your results. Thanks, Sagar
(In reply to Maokaman from comment #15) > I have the most recent firmware version: > > # arcconf getconfig 1 AD | grep 'Model' > Controller Model : Adaptec ASR81605Z > > # arcconf getversion 1 > Controllers found: 1 > Controller #1 > ============== > Firmware : 7.18-0 (33556) > Staged Firmware : 7.18-0 (33556) > BIOS : 7.18-0 (33556) > Driver : 1.2-1 (50983) > Boot Flash : 7.18-0 (33556) > CPLD (Load version/ Flash version) : 5/ 12 > SEEPROM (Load version/ Flash version) : 1/ 1 > > > #regzbot ^introduced 9dc704dcc09eae7d21b5da0615eb2ed79278f63e Hi Maokaman, Could you please provide additional details on which specific kernel you are seeing this issue on and the details of the server would also help us? We tried with 6.4.9 kernel on a 81605 controller and we do not see this issue on our setup. We are trying to understand the environment Thanks
(In reply to Sagar from comment #17) > Hi Maokaman, > Could you please provide additional details on which specific kernel you are > seeing this issue on and the details of the server would also help us? > > We tried with 6.4.9 kernel on a 81605 controller and we do not see this > issue on our setup. > We are trying to understand the environment > > Thanks Hi Sagar, I've tested multiple 6.4.x kernels and the last one was 6.4.7. Distro: Arch Linux Kernel build script: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/ed40dc54e86cec6758ee43684f0fd37d78c5ba53/PKGBUILD Kernel config: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/ed40dc54e86cec6758ee43684f0fd37d78c5ba53/config # cat /proc/cmdline initrd=\intel-ucode.img initrd=\initramfs-linux.img root=PARTUUID="5029e4ab-f734-4729-97b2-99eb53de8b0a" rw intel_idle.max_cstate=1 idle=halt transparent_hugepage=never mitigations=off audit=0 selinux=0 nmi_watchdog=0 nosoftlockup=0
Hardware: # dmidecode | grep -A 3 'Base Board Information' Base Board Information Manufacturer: Supermicro Product Name: X11DPi-N Version: 2.00 # lscpu | egrep '^Model name:|Socket' Model name: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz Socket(s): 2
Hello! I can confirm this behavior with the 6.1.53-gentoo-r1 kernel. My previous kernel, 6.1.46-gentoo, works fine. The patch mentioned in comment #13 was applied in the 6.1.53-gentoo release. The controller is an Adaptec ASR71605E with firmware 7.5-0 (32118), which is the latest. I will try reverting this patch and report the results.
Sagar Biradar, what's the status here? Is there a patch in sight? Or would it be best to revert 9dc704dcc09e for now?
Created attachment 305290 [details] 0001-aacraid-submit-internal-commands-on-vector-0.patch aacraid: submit internal commands on vector 0
Can you try with the above patch?
(In reply to Hannes Reinecke from comment #23) > Can you try with the above patch? The patch does not fix the issue. === Oct 25 21:41:24 server-name kernel: aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): Oct 25 21:41:24 server-name kernel: aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): Oct 25 21:41:24 server-name kernel: aacraid: Host bus reset request. SCSI hang ? Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: midlevel-0 Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: lowlevel-0 Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: error handler-0 Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: firmware-32 Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: outstanding cmd: kernel-0 Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: Controller reset type is 3 Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: Issuing IOP reset Oct 25 21:41:24 server-name kernel: INFO: task worker:6701 blocked for more than 122 seconds. Oct 25 21:41:24 server-name kernel: Not tainted 6.4.7-arch1-61 #1 Oct 25 21:41:24 server-name kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Oct 25 21:41:24 server-name kernel: task:worker state:D stack:0 pid:6701 ppid:1 flags:0x00004002 Oct 25 21:41:24 server-name kernel: Call Trace: Oct 25 21:41:24 server-name kernel: <TASK> Oct 25 21:41:24 server-name kernel: __schedule+0x3e8/0x13f0 Oct 25 21:41:24 server-name kernel: schedule+0x5e/0xd0 Oct 25 21:41:24 server-name kernel: io_schedule+0x46/0x70 Oct 25 21:41:24 server-name kernel: folio_wait_bit_common+0x13d/0x350 Oct 25 21:41:24 server-name kernel: ? 
__pfx_wake_page_function+0x10/0x10 Oct 25 21:41:24 server-name kernel: folio_wait_writeback+0x2c/0x90 Oct 25 21:41:24 server-name kernel: __filemap_fdatawait_range+0x80/0xe0 Oct 25 21:41:24 server-name kernel: file_write_and_wait_range+0x8b/0xb0 Oct 25 21:41:24 server-name kernel: xfs_file_fsync+0x5e/0x2a0 [xfs 2be3d2e4a125ddff8482931cb8f078f6393b16a6] Oct 25 21:41:24 server-name kernel: __x64_sys_fdatasync+0x4c/0x90 Oct 25 21:41:24 server-name kernel: do_syscall_64+0x5c/0x90 Oct 25 21:41:24 server-name kernel: ? syscall_exit_to_user_mode+0x1b/0x40 Oct 25 21:41:24 server-name kernel: ? do_syscall_64+0x6b/0x90 Oct 25 21:41:24 server-name kernel: ? exit_to_user_mode_prepare+0x132/0x1e0 Oct 25 21:41:24 server-name kernel: ? syscall_exit_to_user_mode+0x1b/0x40 Oct 25 21:41:24 server-name kernel: ? do_syscall_64+0x6b/0x90 Oct 25 21:41:24 server-name kernel: ? syscall_exit_to_user_mode+0x1b/0x40 Oct 25 21:41:24 server-name kernel: ? do_syscall_64+0x6b/0x90 Oct 25 21:41:24 server-name kernel: ? syscall_exit_to_user_mode+0x1b/0x40 Oct 25 21:41:24 server-name kernel: ? do_syscall_64+0x6b/0x90 Oct 25 21:41:24 server-name kernel: ? 
do_syscall_64+0x6b/0x90 Oct 25 21:41:24 server-name kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc Oct 25 21:41:24 server-name kernel: RIP: 0033:0x7f9ca1d087aa Oct 25 21:41:24 server-name kernel: RSP: 002b:00007f93237fd6c0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b Oct 25 21:41:24 server-name kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9ca1d087aa Oct 25 21:41:24 server-name kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000f Oct 25 21:41:24 server-name kernel: RBP: 0000560d212c66a0 R08: 0000000000000000 R09: 0000000000000000 Oct 25 21:41:24 server-name kernel: R10: 00007f93237fd6e0 R11: 0000000000000293 R12: 0000560d1f112e80 Oct 25 21:41:24 server-name kernel: R13: 0000560d2107b6c8 R14: 0000560d212cd9f0 R15: 00007f9322ffe000 Oct 25 21:41:24 server-name kernel: </TASK> Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: IOP reset succeeded Oct 25 21:41:24 server-name kernel: aacraid: Comm Interface type2 enabled Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: Scheduling bus rescan Oct 25 21:41:24 server-name kernel: aacraid 0000:af:00.0: DDR cache data recovered successfully Oct 25 21:41:25 server-name kernel: sd 0:0:0:0: [sdb] Very big device. Trying to use READ CAPACITY(16). Oct 25 21:41:25 server-name kernel: sd 0:0:1:0: [sda] Very big device. Trying to use READ CAPACITY(16).
Created attachment 305397 [details] Patch for aacraid FC38 6.3.12-200 vs 6.4.4-200 I have the same issue on two servers running Fedora 38 with different Adaptec controllers. === Server 1: # arcconf getconfig 1 AD | grep 'Model' Controller Model : Adaptec ASR72405 # arcconf getversion 1 Controllers found: 1 Controller #1 ============== Firmware : 7.5-0 (32118) Staged Firmware : 7.5-0 (32118) BIOS : 7.5-0 (32118) Driver : 1.2-1 (50983) Boot Flash : 7.5-0 (32118) CPLD (Load version/ Flash version) : 8/ 10 SEEPROM (Load version/ Flash version) : 1/ 1 === Server 2: # arcconf getconfig 1 AD | grep 'Model' Controller Model : Adaptec ASR71685 # arcconf getversion 1 Controllers found: 1 Controller #1 ============== Firmware : 7.5-0 (32118) Staged Firmware : 7.5-0 (32118) BIOS : 7.5-0 (32118) Driver : 1.2-1 (50983) Boot Flash : 7.5-0 (32118) CPLD (Load version/ Flash version) : 7/ 10 SEEPROM (Load version/ Flash version) : 0/ 1 Both controllers have the latest firmware. Last known working kernel: 6.3.12-200 First known non-working kernel: 6.4.4-200 The patch from comment #22 did not work for me; I still get errors. The submitted patch was generated from the differences in 'drivers/scsi/aacraid' between the two kernels above. Applied and working on the following Fedora kernels: 6.5.9-200.fc38 6.5.10-200.fc38 6.5.10-300.fc39 6.5.11-300.fc39
We have noticed on our Server using an Adaptec ASR8805 RAID controller running Debian 12 i.e. Bookworm Kernel 6.1.55 That we get 100% wait states that causes the system to hang. top - 12:57:32 up 7 min, 2 users, load average: 5.02, 1.71, 0.65 Tasks: 451 total, 2 running, 449 sleeping, 0 stopped, 0 zombie %Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 81.8 id, 18.2 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu9 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu10 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu16 : 0.0 us,100.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu22 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu23 : 0.0 us, 0.0 
sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu24 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu25 : 0.0 us, 0.0 sy, 0.0 ni, 0.0 id,100.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu26 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu27 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu28 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu29 : 0.0 us, 0.0 sy, 0.0 ni, 0.0 id,100.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu30 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu31 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu32 : 0.0 us, 0.0 sy, 0.0 ni, 0.0 id,100.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu33 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu34 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu35 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu36 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu37 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu38 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st %Cpu39 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st MiB Mem : 257590.5 total, 242751.4 free, 10355.7 used, 6092.0 buff/cache MiB Swap: 0.0 total, 0.0 free, 0.0 used. 247234.8 avail Mem When it's running with a < 6.1.53 Kernel we never see 100% wait states, certainly not staining for a long time. We also saw repeatedly: [ 1376.837737] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.841731] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.842412] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.843004] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.843587] aacraid: Host adapter abort request. 
aacraid: Outstanding commands on (0,0,0,0): [ 1376.844169] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.844747] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.845322] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.845906] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.846484] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.847055] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.847628] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.848199] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.848767] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.849336] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.849995] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1376.850560] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.789765] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.889767] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.890899] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.892002] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.893103] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.897790] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.898918] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.900009] aacraid: Host adapter abort request. 
aacraid: Outstanding commands on (0,0,0,0): [ 1378.901094] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.902199] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.903287] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.904384] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.905472] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.906585] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.907678] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,0,0): [ 1378.945954] aacraid: Host bus reset request. SCSI hang ? [ 1378.946602] aacraid 0000:af:00.0: outstanding cmd: midlevel-0 [ 1378.946607] aacraid 0000:af:00.0: outstanding cmd: lowlevel-0 [ 1378.946610] aacraid 0000:af:00.0: outstanding cmd: error handler-0 [ 1378.946613] aacraid 0000:af:00.0: outstanding cmd: firmware-32 [ 1378.946616] aacraid 0000:af:00.0: outstanding cmd: kernel-0 [ 1378.961850] aacraid 0000:af:00.0: Controller reset type is 3 [ 1378.962435] aacraid 0000:af:00.0: Issuing IOP reset [ 1412.498211] aacraid 0000:af:00.0: IOP reset succeeded [ 1412.523256] aacraid: Comm Interface type2 enabled [ 1424.734176] aacraid 0000:af:00.0: Scheduling bus rescan [ 1434.755589] aacraid 0000:af:00.0: DDR cache data recovered successfully On another server that has an Adaptec ASR8405 raid controller running exactly the same Distribution and kernel we don't see this issue at all. The only major difference is that the system that has the problem has two sockets i.e. CPUs. This one also has SSD drives, but I don't think this could be an issue? We have found out that this issue exists since Kernel 6.1.53. 
We found that kernel 6.1.53 incorporated this patch: scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity https://www.spinics.net/lists/stable-commits/msg313381.html I think this ticket is related to the issue: https://bugzilla.kernel.org/show_bug.cgi?id=217599 as is this email: https://lore.kernel.org/regressions/4a639fff-445e-455b-9a31-57368d6b7021@leemhuis.info/ We have tested kernel 6.1.55, as shipped in Debian Bookworm, with the above-mentioned patch reverted. It worked flawlessly. Might the issue be related to having multiple CPU sockets? We don't see it on a single-socket system. Both systems have Intel Xeon CPU(s).
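Since the suspected patch maps reply queues to CPUs by IRQ affinity, it may help to compare how the controller's interrupts are spread across CPUs on the affected dual-socket system and on the unaffected single-socket one. Below is a minimal sketch that parses the text of `/proc/interrupts`. The function name `aacraid_irq_counts` is made up, the sample text is invented for illustration (it is not taken from either server), and matching on an interrupt name that contains `aacraid` is an assumption about how the driver labels its vectors.

```python
def aacraid_irq_counts(interrupts_text, driver="aacraid"):
    """Map IRQ number -> list of per-CPU interrupt counts for every
    /proc/interrupts row whose last field contains `driver`
    (hypothetical helper)."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or driver not in fields[-1]:
            continue
        irq = fields[0].rstrip(":")
        counts[irq] = [int(value) for value in fields[1:1 + cpu_count]]
    return counts


# Invented sample in the /proc/interrupts layout, two CPUs.
sample = """\
           CPU0       CPU1
  9:          3          0  IO-APIC    9-fasteoi   acpi
 45:        120          0  IR-PCI-MSI 1048576-edge      aacraid
 46:          0         98  IR-PCI-MSI 1048577-edge      aacraid
"""
print(aacraid_irq_counts(sample))
```

On a real machine the input would be `open("/proc/interrupts").read()`. Vectors whose counts stay at zero on every CPU while commands are outstanding would be worth mentioning in this report.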
I have the same issue with new kernels on openSUSE and Fedora. With Leap 15.5 and its 5.14 kernel, or with Fedora and kernel 5.15 from COPR, everything works fine. I first hit the issue with kernel 5.19 on Fedora with an ASR-72405 controller and a connected NetApp DS. openSUSE 6.6.1 dmesg log: ``` [ 7427.739081] aacraid: Host bus reset request. SCSI hang ? [ 7427.739101] aacraid 0000:08:00.0: outstanding cmd: midlevel-0 [ 7427.739105] aacraid 0000:08:00.0: outstanding cmd: lowlevel-0 [ 7427.739107] aacraid 0000:08:00.0: outstanding cmd: error handler-0 [ 7427.739109] aacraid 0000:08:00.0: outstanding cmd: firmware-48 [ 7427.739111] aacraid 0000:08:00.0: outstanding cmd: kernel-0 [ 7427.765640] aacraid 0000:08:00.0: Controller reset type is 3 [ 7427.765652] aacraid 0000:08:00.0: Issuing IOP reset [ 7469.875692] aacraid 0000:08:00.0: IOP reset succeeded [ 7469.936116] aacraid: Comm Interface type2 enabled [ 7483.472661] aacraid 0000:08:00.0: Scheduling bus rescan [ 7496.491585] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). [ 7496.491764] sd 0:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16). [ 7553.768632] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.768644] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.768650] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.768655] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.768660] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.768664] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.768669] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.818630] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835297] aacraid: Host adapter abort request.
aacraid: Outstanding commands on (0,0,1,0): [ 7553.835306] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835312] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835317] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835322] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835326] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835331] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835335] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835340] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835344] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835348] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835353] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7553.835357] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355616] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355631] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355637] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355642] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355647] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355652] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355657] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.355661] aacraid: Host adapter abort request. 
aacraid: Outstanding commands on (0,0,1,0): [ 7598.355666] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7598.382419] aacraid: Host bus reset request. SCSI hang ? [ 7598.382439] aacraid 0000:08:00.0: outstanding cmd: midlevel-0 [ 7598.382443] aacraid 0000:08:00.0: outstanding cmd: lowlevel-0 [ 7598.382445] aacraid 0000:08:00.0: outstanding cmd: error handler-0 [ 7598.382447] aacraid 0000:08:00.0: outstanding cmd: firmware-30 [ 7598.382449] aacraid 0000:08:00.0: outstanding cmd: kernel-0 [ 7598.402360] aacraid 0000:08:00.0: Controller reset type is 3 [ 7598.402371] aacraid 0000:08:00.0: Issuing IOP reset [ 7640.363459] aacraid 0000:08:00.0: IOP reset succeeded [ 7640.422724] aacraid: Comm Interface type2 enabled [ 7653.650105] aacraid 0000:08:00.0: Scheduling bus rescan [ 7666.688374] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). [ 7666.688467] sd 0:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16). [ 7814.462297] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462308] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462313] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462318] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462322] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462326] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462330] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462334] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.462338] aacraid: Host adapter abort request. aacraid: Outstanding commands on (0,0,1,0): [ 7814.478923] aacraid: Host bus reset request. SCSI hang ? 
[ 7814.478936] aacraid 0000:08:00.0: outstanding cmd: midlevel-0 [ 7814.478938] aacraid 0000:08:00.0: outstanding cmd: lowlevel-0 [ 7814.478940] aacraid 0000:08:00.0: outstanding cmd: error handler-0 [ 7814.478942] aacraid 0000:08:00.0: outstanding cmd: firmware-9 [ 7814.478943] aacraid 0000:08:00.0: outstanding cmd: kernel-0 [ 7814.495578] aacraid 0000:08:00.0: Controller reset type is 3 [ 7814.495587] aacraid 0000:08:00.0: Issuing IOP reset [ 7856.730598] aacraid 0000:08:00.0: IOP reset succeeded [ 7856.779371] aacraid: Comm Interface type2 enabled [ 7869.946607] aacraid 0000:08:00.0: Scheduling bus rescan [ 7882.985564] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16). [ 7882.985661] sd 0:0:1:0: [sdb] Very big device. Trying to use READ CAPACITY(16). ```
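For anyone comparing kernels, a rough way to quantify a log like the one above is to count the abort and reset messages. This is only a minimal sketch: the function name `summarize_aacraid_log` is made up for illustration, and the matched strings are simply copied from the excerpts in this thread, so they may not cover every driver version.

```shell
#!/bin/sh
# Summarize aacraid error-handling events in a kernel log read from stdin.
# summarize_aacraid_log is a hypothetical helper name; the message strings
# are taken from the log excerpts quoted in this thread.
summarize_aacraid_log() {
    log=$(cat)
    # grep -o prints one line per match, so several messages on a single
    # (flattened) log line are still counted individually.
    aborts=$(printf '%s\n' "$log" | grep -o 'aacraid: Host adapter abort request' | wc -l)
    resets=$(printf '%s\n' "$log" | grep -o 'aacraid: Host bus reset request' | wc -l)
    iop_ok=$(printf '%s\n' "$log" | grep -o 'IOP reset succeeded' | wc -l)
    # Some wc implementations pad with spaces; $(( )) normalizes the numbers.
    printf 'abort_requests=%d bus_resets=%d iop_resets_ok=%d\n' \
        "$(($aborts))" "$(($resets))" "$(($iop_ok))"
}

# Example usage (reading the kernel ring buffer may require root):
#   dmesg | summarize_aacraid_log
```

Running it once on a known-good kernel and once on a suspect kernel under the same workload gives a quick before/after comparison.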
Just to be correct: the controller was an ASR-71685 (not ASR-72405). openSUSE was tested on an ASR-71605. The error happens, for example, when copying a large amount of data to disk at high speed. If you run fsck.ext4 on an ext4 file system with the buggy kernel, it will damage the file system and its data. With the buggy kernel, BTRFS scrub also reports that checksums are wrong.
TWIMC, I raised the issue once more in a mail to the people that should handle this: https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/T/#u
On Tue, 2023-11-21 at 09:54 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217599
>
> --- Comment #29 from The Linux kernel's regression tracker (Thorsten
> Leemhuis) (regressions@leemhuis.info) ---
> TWIMC, I raised the issue once more in a mail to the people that
> should handle this:
>
> https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/T/#u
Switching to email since the bugzilla seems to have stalled. The kernel lists will discard text/html email, so if you have email problems, you can reply by using bugzilla.
Firstly, can as many reporters as possible check to see if reverting this commit:
https://github.com/torvalds/linux/commit/9dc704dcc09eae7d21b5da0615eb2ed79278f63e
fixes your problem with an upstream kernel?
Secondly, John Garry asked if you could provide:
> Is there a full kernel log for this hanging system?
>
> I can only see snippets in the ticket.
>
> And what does /sys/class/scsi_host/host*/nr_hw_queues show?
Regards,
James
Created attachment 305451 [details] The kernel.log file when the system is hanging.
The kernel log, including the period when the system hung. The output of cat /sys/class/scsi_host/host*/nr_hw_queues:
root@ganeti-node2:~# cat /sys/class/scsi_host/host*/nr_hw_queues
32
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
root@ganeti-node2:~#
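The bare `cat` output above does not say which value belongs to which host. A small sketch that labels each value with the host name and the low-level driver might look like the following. The function name `show_hw_queues` and its optional sysfs-root argument are assumptions made for illustration and testing; `nr_hw_queues` and `proc_name` are the standard `scsi_host` sysfs attributes.

```shell
#!/bin/sh
# Print nr_hw_queues for every SCSI host, labelled with host name and driver.
# show_hw_queues is a hypothetical helper; the optional first argument
# overrides the sysfs root and exists only so the function can be tested.
show_hw_queues() {
    root=${1:-/sys}
    for f in "$root"/class/scsi_host/host*/nr_hw_queues; do
        [ -r "$f" ] || continue          # glob did not match or file unreadable
        host_dir=${f%/nr_hw_queues}
        host=${host_dir##*/}
        # proc_name holds the low-level driver name (e.g. aacraid) when exported.
        drv=unknown
        if [ -r "$host_dir/proc_name" ]; then
            drv=$(cat "$host_dir/proc_name")
        fi
        printf '%s driver=%s nr_hw_queues=%s\n' "$host" "$drv" "$(cat "$f")"
    done
}

# Example usage on a live system:
#   show_hw_queues
```

This makes it easier to see which entry in a report belongs to the aacraid host rather than to, say, on-board AHCI ports.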
Comment 31 was created as requested in https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/T/#m10c915e76ffc589585727f7c2288b213202102b9
Hannes' patch was to revert to using hw queue #0 always for internal commands, and it didn't help. @Sagar, Could there be any issue in using hw queue #0 for regular SCSI commands? AFAICS, that's a significant change in "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity" patch. Previously we would use fib->vector_no to decide the queue, which was in range (0, dev->max_msix).
Probably of little help, I will share that I can ping-pong the reported behavior by switching between two Flatcar Container Linux releases: alpha 3717.0.0 (kernel 6.1.50) where I DON'T experience the reported behavior and alpha 3732.0.0 (kernel 6.1.54) where I DO experience the reported behavior.
It would be really great if more people could do what James asked for in https://bugzilla.kernel.org/show_bug.cgi?id=217599#c30 (e.g. check something and if possible try a revert).
(In reply to Randy from comment #34)
> Probably of little help
FWIW, as it was not mentioned here yet: as stated in https://lore.kernel.org/regressions/c6ff53dc-a001-48ee-8559-b69be8e4db81@leemhuis.info/ it's known that the culprit was picked up for 6.1.53, so earlier versions should be fine.
Hi all, Sorry for the delay in responding; I was OOO. We have tried to duplicate this issue on multiple servers with no luck. I will get up to speed on the latest activity on the ticket, and I plan to attempt to recreate this issue on a server with more cores to see if that helps us dupe it. I am actively working on this and will keep the ticket updated with my findings. Thanks
Hi Sagar, Have you also tested this on a multi CPU/Socket server? I've tested this on a single CPU/Socket server, no problem at all (1x Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz). On a Dual Socket/CPU server I get this issue (2x Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz).
(In reply to Joop Boonen from comment #37)
> Hi Sagar,
>
> Have you also tested this on a multi CPU/Socket server?
>
> I've tested this on a single CPU/Socket server, no problem at all (1x
> Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz).
>
> On a Dual Socket/CPU server I get this issue (2x Intel(R) Xeon(R) Silver
> 4210 CPU @ 2.20GHz).
As I understand it, they tested 8-series controllers, but we all reported the problem on 7-series ones (71605Z, 71605, 71685). I do not know if it happens on the 8-series; unfortunately I do not have enough HDDs right now to check with an 8805.
Created attachment 305466 [details] 0001-aacraid-fix-vector-calculation-when-submitting-command.patch aacraid: fix vector calculation when submitting commands.
Next idea; can you try with the above patch?
Created attachment 305469 [details] Test result of 0001-aacraid-fix-vector-calculation-when-submitting-command.patch
I've tested patch 0001-aacraid-fix-vector-calculation-when-submitting-command.patch on vanilla kernel 6.1.55. It doesn't boot properly; I don't know if this patch should work on this kernel version. I've attached the output of dmesg.
while testing the proposed patch 0001-aacraid-fix-vector-calculation-when-submitting-command.patch I also ran into the system not booting - this is even the case for systems not using aacraid at all ;) I tested with 6.5 based on Ubuntu's kernel, which is also affected (the VM in question was actually running Proxmox VE 8.1, since that is also affected: https://bugzilla.proxmox.com/show_bug.cgi?id=5077 )
So it looks like Hannes' patch didn't help (thx for trying!) and things have stalled again for about a week. Anyone still working on it? Or is a revert of the culprit slowly becoming the least worst option?
Looks like we have lost momentum getting this fixed. I have queued a revert for now.
I am looking into this issue actively. My efforts to dupe this locally on a machine with 2 CPUs are underway, and I will keep this ticket updated. If we happen to dupe it and find a fix, we can then consider the patch queued for revert with some tweak.
Hello, I'm also experiencing this issue on a single CPU system using the controller as an HBA with a 10 disk ZFS pool:
Card: Adaptec ASR-71605
CPU: AMD Ryzen 5950X
OS: debian bookworm
kernel: Linux stratus 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64 GNU/Linux
Syslog:
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,11,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host adapter abort request.
kernel: aacraid: Outstanding commands on (0,1,4,0):
kernel: aacraid: Host bus reset request. SCSI hang ?
kernel: aacraid 0000:0c:00.0: outstanding cmd: midlevel-0
kernel: aacraid 0000:0c:00.0: outstanding cmd: lowlevel-0
kernel: aacraid 0000:0c:00.0: outstanding cmd: error handler-0
kernel: aacraid 0000:0c:00.0: outstanding cmd: firmware-28
kernel: aacraid 0000:0c:00.0: outstanding cmd: kernel-0
kernel: aacraid 0000:0c:00.0: Controller reset type is 3
kernel: aacraid 0000:0c:00.0: Issuing IOP reset
kernel: aacraid 0000:0c:00.0: IOP reset succeeded
kernel: aacraid: Comm Interface type2 enabled
kernel: aacraid 0000:0c:00.0: Scheduling bus rescan
Controller config:
./arcconf getconfig 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Controller Mode : HBA
Channel description : SAS/SATA
Controller Model : Adaptec ASR71605
Controller Serial Number : ##########
Temperature : 46 C/ 114 F (Normal)
Installed memory : 1024 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Performance Mode : Default/Dynamic
Host bus type : PCIe
Host bus speed : 8000 MHz
Host bus link width : 8 bit(s)/link(s)
Stayawake period : Disabled
Spinup limit internal drives : 0
Spinup limit external drives : 0
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 0/0/0
NCQ status : Enabled
Statistics data collection mode : Disabled
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 7.5-0 (32118)
Firmware : 7.5-0 (32118)
Driver : 1.2-1 (50983)
Boot Flash : 7.5-0 (32118)
--------------------------------------------------------
Controller Cache Backup Unit Information
--------------------------------------------------------
Overall Backup Unit Status : Not Ready
Backup Unit Type : AFM-700/700LP
Non-Volatile Storage Status : Ready
Supercap Status : Fatal
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
No logical devices configured
----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Raw (Pass Through)
Block Size : 4K Supported : Yes
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Connector 0, Device 0
Vendor : ATA
Model : WDC WD140EDFZ-11
Firmware : 81.00A81
Serial number : #####
World-wide name : #####
Reserved Size : 0 KB
Used Size : 0 MB
Unused Size : 13351936 MB
Total Size : 13351936 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
NCQ status : Enabled
Device #1
Device is a Hard drive
State : Raw (Pass Through)
Block Size : 4K Supported : Yes
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Connector 0, Device 1
Vendor : ATA
Model : WDC WD140EDFZ-11
Firmware : 81.00A81
Serial number : #####
World-wide name : #####
Reserved Size : 0 KB
Used Size : 0 MB
Unused Size : 13351936 MB
Total Size : 13351936 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
NCQ status : Enabled
Device #2
Device is a Hard drive
State : Raw (Pass Through)
Block Size : 4K Supported : Yes
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,4(4:0)
Reported Location : Connector 1, Device 0
Vendor : ATA
Model : WDC WD140EDFZ-11
Firmware : 81.00A81
Serial number : #####
World-wide name : #####
Reserved Size : 0 KB
Used Size : 0 MB
Unused Size : 13351936 MB
Total Size : 13351936 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
NCQ status : Enabled
... snip for brevity ...
Device #9
Device is a Hard drive
State : Raw (Pass Through)
Block Size : 4K Supported : Yes
Transfer Speed : SATA 6.0 Gb/s
Reported Channel,Device(T:L) : 0,11(11:0)
Reported Location : Connector 2, Device 3
Vendor : ATA
Model : WDC WD140EDFZ-11
Firmware : 81.00A81
Serial number : #####
World-wide name : #####
Reserved Size : 0 KB
Used Size : 0 MB
Unused Size : 13351936 MB
Total Size : 13351936 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Power State : Full rpm
Supported Power States : Full rpm,Powered off,Reduced rpm
SSD : No
NCQ status : Enabled
Command completed successfully.
zfs errors:
zed: eid=8 class=delay pool='tank' vdev=hdd-14-3 size=4096 offset=4762294419456 priority=3 err=0 flags=0x180880 delay=163243ms bookmark=643:0:0:75777
zed: eid=10 class=delay pool='tank' vdev=hdd-14-3 size=8192 offset=4766619205632 priority=1 err=0 flags=0x180880 delay=115074ms bookmark=515:456914:0:0
zed: eid=9 class=delay pool='tank' vdev=hdd-14-3 size=4096 offset=2135816388608 priority=3 err=0 flags=0x180880 delay=163242ms bookmark=643:0:1:74
zed: eid=11 class=delay pool='tank' vdev=hdd-14-3 size=16384 offset=13953519005696 priority=1 err=0 flags=0x180880 delay=115074ms bookmark=515:0:-2:1
zed: eid=12 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:8
zed: eid=13 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405183991808 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:12
zed: eid=14 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405183995904 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:13
zed: eid=16 class=delay pool='tank' vdev=hdd-14-8 size=4096 offset=12405183959040 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:1
zed: eid=15 class=delay pool='tank' vdev=hdd-14-0 size=4096 offset=12405184004096 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:16
zed: eid=17 class=delay pool='tank' vdev=hdd-14-8 size=4096 offset=12405183963136 priority=0 err=0 flags=0x180880 delay=172380ms bookmark=515:780332:0:0
zed: eid=18 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183979520 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:7
zed: eid=19 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:9
zed: eid=20 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183995904 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:14
zed: eid=21 class=delay pool='tank' vdev=hdd-14-6 size=4096 offset=12405183991808 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:11
zed: eid=22 class=delay pool='tank' vdev=hdd-14-5 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:9
zed: eid=23 class=delay pool='tank' vdev=hdd-14-5 size=4096 offset=12405183979520 priority=0 err=0 flags=0x180880 delay=172383ms bookmark=515:780332:0:7
zed: eid=24 class=delay pool='tank' vdev=hdd-14-5 size=4096 offset=12405183991808 priority=0 err=0 flags=0x180880 delay=172382ms bookmark=515:780332:0:11
zed: eid=25 class=delay pool='tank' vdev=hdd-14-2 size=4096 offset=12405184008192 priority=0 err=0 flags=0x180880 delay=172384ms bookmark=515:780332:0:18
zed: eid=26 class=delay pool='tank' vdev=hdd-14-2 size=4096 offset=12405184020480 priority=0 err=0 flags=0x180880 delay=172384ms bookmark=515:780332:0:20
zed: eid=27 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183971328 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:4
zed: eid=28 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183975424 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:6
zed: eid=29 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183987712 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:12
zed: eid=30 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183983616 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:9
zed: eid=31 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405183995904 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:14
zed: eid=32 class=delay pool='tank' vdev=hdd-14-7 size=4096 offset=12405184000000 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:16
zed: eid=33 class=delay pool='tank' vdev=hdd-14-1 size=4096 offset=12405183975424 priority=0 err=0 flags=0x180880 delay=172386ms bookmark=515:780332:0:5
Hi Joop, Maxim et al.,
I have been trying to duplicate this issue on two different machines with 2 CPUs, with little luck. I was curious to know the magic ingredient that is missing in the setup. I have series-7 controllers on both machines, 3 drives attached to both the controllers, with four Raid-5 arrays created on these drives. I am running fio on all the arrays (/dev/sdb to /dev/sde). Could you share the details of the tool (or any script?) that is being run on the system when you see the issue? I am mentioning both the configs here, for your reference, and please let me know if something seems conspicuous to you. Also it would really help me if you give similar details other than what has already been mentioned.
Thanks
Sagar
Config-1 Details
System Information
Manufacturer: Supermicro
Product Name: SYS-220U-TNR
Version: 0123456789
Serial Number: S411795X2826083
BIOS Information
Vendor: American Megatrends International, LLC.
Version: 1.4
BIOS Revision: 5.22
System Slot Information
Designation: RSC-W2-8888G4 SLOT3 PCI-E 4.0 X8
Type: x8 PCI Express 4 x8
Current Usage: In Use
Length: Short
ID: 3
Characteristics: 3.3 V is provided PME signal is supported
Bus Address: 0000:4b:00.0
Processor Information
Socket Designation: CPU1
Type: Central Processor
Family: Xeon
Manufacturer: Intel(R) Corporation
ID: A6 06 06 00 FF FB EB BF
Signature: Type 0, Family 6, Model 106, Stepping 6
Version: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
Core Count: 12
Core Enabled: 12
Thread Count: 24
Socket Designation: CPU2
Type: Central Processor
Family: Xeon
Manufacturer: Intel(R) Corporation
ID: A6 06 06 00 FF FB EB BF
Signature: Type 0, Family 6, Model 106, Stepping 6
Version: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
Core Count: 12
Core Enabled: 12
Thread Count: 24
Controller Details
Controller : ASR71605
BIOS : 7.6-0 (32136)
Firmware : 7.6-0 (32136)
Driver : 1.2-1 (50983)
Boot Flash : 2.57-0 (432)
CPLD (Load version/ Flash version) : 8/ 8
SEEPROM (Load version/ Flash version) : 1/ 1
uname -r
6.4.0
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
BIOS Model name: Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 6
NUMA node(s): 2
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
lspci -s 4b:00.0 -k
4b:00.0 RAID bus controller: Adaptec Series 7 6G SAS/PCIe 3 (rev 01)
Subsystem: Adaptec Series 7 - ASR-71605 - 16 internal 6G SAS Port/PCIe 3.0
Kernel driver in use: aacraid
Kernel modules: aacraid
Config-2 Details
System Information
Manufacturer: HPE
Product Name: ProLiant DL380 Gen11
Version: Not Specified
Serial Number: CNX2070BND
BIOS Information
Vendor: HPE
Version: 1.40
Release Date: 06/01/2023
CPU Information
Processor Information
Socket Designation: Proc 1
Type: Central Processor
Family: Xeon
Manufacturer: Intel(R) Corporation
Signature: Type 0, Family 6, Model 143, Stepping 6
Version: Intel(R) Xeon(R) Platinum 8454H
Core Count: 32
Core Enabled: 32
Thread Count: 64
Socket Designation: Proc 2
Type: Central Processor
Family: Xeon
Manufacturer: Intel(R) Corporation
ID: F6 06 08 00 FF FB EB BF
Signature: Type 0, Family 6, Model 143, Stepping 6
Version: Intel(R) Xeon(R) Platinum 8454H
Core Count: 32
Core Enabled: 32
Thread Count: 64
Controller Information
Controller : ASR7805
BIOS : 7.5-0 (32118)
Firmware : 7.5-0 (32118)
Driver : 1.2-1 (50983)
Boot Flash : 7.5-0 (32118)
CPLD (Load version/ Flash version) : 7/ 10
SEEPROM (Load version/ Flash version) : 0/ 1
uname -r
6.4.0
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) Platinum 8454H
BIOS Model name: Intel(R) Xeon(R) Platinum 8454H
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127
lspci -s 23:00.0 -k
23:00.0 RAID bus controller: Adaptec Series 7 6G SAS/PCIe 3 (rev 01)
Subsystem: Adaptec Series 7 - ASR-7805 - 8 internal 6G SAS Port/PCIe 3.0
Kernel driver in use: aacraid
Kernel modules: aacraid
Hi Sagar, I'm using a setup with 10 SATA disks in HBA mode, running a ZFS raidz2 filesystem (akin to RAID-6). This is a single CPU system, so I don't believe the CPU count is the main issue here, although it's likely related. From examining the logs, doing some research, and drawing from my experience, it seems that timeouts and queues are the primary culprits. My suspicion is that during heavy loads, there's an overflow somewhere in the stack (could be in the kernel driver, firmware, or hardware), causing I/O requests to get lost and time out. After a series of these timeouts, the driver triggers an error and resets the adapter. I stumbled upon threads dating back to around 2017 where users faced similar issues (check this one: https://forum.proxmox.com/threads/pve-5-1-aacraid-scsi-hang.38259/). One suggestion for a fix was to extend the disk timeout window for waiting on I/O. However, the current kernel (set at 60s) has already doubled the previous value of 30s, which makes me think the timeout might not be the root cause, though it is probably related. I'm not sure of the physical disk setup other users have connected to their controllers, but I reliably see this issue with my 10 disk setup, so my recommendation would be to increase the number of disks attached to the controller and stress test it with simultaneous sequential and random I/O, using tools like dd and fio at the same time. My specific use case involves a file server and database with multiple users. I consistently observe the adapter aborting requests and resetting a few minutes after boot, when the file server and database applications start and warm up their caches (cache size is approximately 120GB in RAM).
Upon further investigation, I found that anyone experiencing this issue could gather more information by modifying aacraid with dump_stack() added around line 713 of linux/latest/source/drivers/scsi/aacraid/linit.c within aac_eh_abort (refer to this: https://stackoverflow.com/questions/32557040/how-to-get-stack-trace-at-various-points-in-kernel-device-driver-code). Unfortunately, due to unacceptable downtime I had to revert my system to a different HBA and lack spare systems to test with. Best regards.
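Regarding the disk timeout mentioned above: the per-device SCSI command timeout can be read from sysfs, which makes it easy to check what a given kernel actually uses before drawing conclusions. The following is a minimal sketch; the function name `show_disk_timeouts` and its optional sysfs-root argument are made up for illustration, while `/sys/block/<dev>/device/timeout` is the standard attribute (in seconds). As noted above, raising the timeout is probably not a fix for this regression, so treat this as a diagnostic aid only.

```shell
#!/bin/sh
# List the SCSI command timeout (in seconds) of each sd* block device.
# show_disk_timeouts is a hypothetical helper; the optional first argument
# overrides the sysfs root and exists only so the function can be tested.
show_disk_timeouts() {
    root=${1:-/sys}
    for f in "$root"/block/sd*/device/timeout; do
        [ -r "$f" ] || continue          # glob did not match or file unreadable
        dev=${f#"$root"/block/}          # strip the leading directories
        dev=${dev%%/*}                   # keep only the device name, e.g. sda
        printf '%s timeout=%ss\n' "$dev" "$(cat "$f")"
    done
}

# Example usage on a live system:
#   show_disk_timeouts
# To change the value for one disk until the next reboot (as root), e.g.:
#   echo 120 > /sys/block/sdb/device/timeout
```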
I used 2 RAID arrays:
- RAID0 with 8 SATA drives (8x4TB)
- RAID0 with 2 SATA drives (2x16TB)
Both were using LUKS with Ext4 or BTRFS (I tried both) at that time. To reproduce it I just started copying files from one array (the 2nd one) to the other using a file manager such as Midnight Commander. The copy can process some amount of data before the problem happens and the message appears in the dmesg output. When it happens, copying becomes slow, everything hangs, and finally the copy is aborted. btrfs scrub gives a similar result: it says the data is corrupt after it starts scanning (maybe 200-300 GB are scanned OK before it happens). It is a single CPU system; I have never used a multi-socket motherboard in such scenarios. I do not think you need fio; I think you need to move a lot of data to the array (for example big media files, VM images, backup files and so on). The second time I got the same issue was on a different system, again while copying data, from a RAID0 of 10x1TB SSDs to a RAID0 of 6x8TB HDDs (LUKS and BTRFS).
--
I do not think it is a hardware issue, because with older kernels it works fine with the same array settings. If the FS is not damaged, I can just boot into an older kernel and everything works fine on the same file system. Usually the FS will not be damaged as long as you do not try to repair it with fsck.ext2 while running the problematic aacraid driver.
Hi. I have the same problem after updating Proxmox 8.0 -> 8.1 (kernel version 6.2.19 to 6.5.11). My config is:
Controller Model : Adaptec ASR81605Z
BIOS : 7.16-0 (33456)
Firmware : 7.16-0 (33456)
Driver : 1.2-1 (50983)
Boot Flash : 7.16-0 (33456)
I use ext4 on an LVM volume (no BTRFS, LUKS or ZFS) and have the same problem with periodic hangs of the adapter:
2023-12-17T20:02:57.135482+03:00 ve5 kernel: [ 9568.092740] aacraid: Outstanding commands on (0,0,0,0):
2023-12-17T20:02:57.135483+03:00 ve5 kernel: [ 9568.093590] aacraid: Host adapter abort request.
2023-12-17T20:02:57.135484+03:00 ve5 kernel: [ 9568.093590] aacraid: Outstanding commands on (0,0,0,0):
2023-12-17T20:03:30.675479+03:00 ve5 kernel: [ 9601.630477] aacraid: Host bus reset request. SCSI hang ?
Hi, I was able to bisect this issue down to https://github.com/torvalds/linux/commit/9dc704dcc09eae7d21b5da0615eb2ed79278f63e I'm using Adaptec RAID 8405 and 6.1.68 kernel with this applied in reverse and everything went back to normal. I hope this patch could be reverted in 6.1.x and mainline. Cheers
FWIW, the revert is queued for 10+ days already, just sadly was not sent to Linus yet: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=fixes&id=c5becf57dd5659c687d41d623a69f42d63f59eb2
I don't know if this is exactly related, but I got here due to the dmesg matching. I'm using an 8805 card on a Supermicro H12SSL-I motherboard with a single Epyc 7302 CPU. On 6.1.0-15 I can't boot/mount a dependent drive (this error happens). On 6.1.0-10 I can use my system fine.
We are also affected by this issue on Debian 12 machines.
Adaptec ASR8805
BIOS : 7.18-0 (33556)
Firmware : 7.18-0 (33556)
Driver : 1.2-1 (50983)
Boot Flash : 7.18-0 (33556)
See also Debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624
We saw this abort request / hang on Debian 12 with the ASR8805 two or three months ago. Since this was not the newest server, we replaced the system because we thought this was a hardware issue. [two months later] But now, this week, we upgraded the next server with an ASR8805 from Debian 11.8 to 12.4 and saw exactly the same issue (Debian 6.1.67-1). This could not be the same hardware error, so we found this bug report and opened one at the Debian bug tracker [1]. Salvatore built a test kernel [2] and we fired up the old "faulty" server for testing. With the new knowledge we were able to reproduce [3] this with an ASR8805 RAID6 and a 58TB LUKS drive. luksOpen and mount reproduce this every time with kernel 6.1.67-1 and need ~1 minute (because of the hang and reset request, I guess). After booting into Salvatore's test kernel (6.1.67-1a~test) and trying the same again, there was no issue, and luksOpen and mount were really quick. This bug was more serious than I guessed, and it took some time to put this puzzle together.
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624#30
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624#47
#regzbot fixed-by: c5becf57dd56 #regzbot fixed-by: 71758d4d87ef #regzbot fixed-by: 72e472a91c0d
(In reply to Salvatore Bonaccorso from comment #56)
> #regzbot fixed-by: […]
Thx, but this confused regzbot a bit, as it tracks the issue as a mainline commit; only this is needed:
#regzbot fixed-by: c5becf57dd56
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #57)
> (In reply to Salvatore Bonaccorso from comment #56)
> > #regzbot fixed-by: […]
>
> Thx, but this confused regzbot a bit, as it tracks the issue as a mainline
> commit only this is needed:
> [...]
Apologies for that; this was not my intention (nor to cause more work)! I thought we could track the regression fixes in the stable series with it as well.
I have duplicated the issue locally and we are able to see the issue consistently. I am currently debugging the issue. I will keep posting the updates. Thank you
I had the same issue. I switched my x16 PCIe slot to x8, and it seems to have considerably reduced the occurrences I had while running unRAID 6.12.6.
(In reply to Netix from comment #60)
> I had the same issue and I've put my x16 pcie slot in x8 and it seems to
> have considerably reduce the occurence I had while running on unRAID 6.12.6.
Small update. I made the switch (PCIe slot to x8) on January 19th, and since then it hasn't happened at all. No occurrence. Still running unRAID 6.12.6.
Could someone perhaps clarify a few things, please? Which versions and hardware are really affected by this? We're seeing this with an Adaptec ASR8405 after an Ubuntu upgrade 23.04 => 23.10. The mentioned Debian issue (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059624) talks about an ASR8805 controller. So this isn't only about 7-series devices it seems. And I wonder whether we could downgrade to a version that still works but I'm unsure which is the last version that didn't have this bug...? An upgrade isn't possible I guess as this bug isn't fixed yet, right? But why does the Debian issue say: "We believe that the bug you reported is fixed in the latest version".?
From Debian's Bookworm linux-image-amd64 changelog [1]:
<q>
* Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity" (Closes: #1059624)
</q>
Debian has reverted the patch. I've tested on a dual-socket Intel server with a RAID bus controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01) Subsystem: Adaptec Series 8 - ASR-8805 - 8 internal 0 external 12G SAS Port/PCIe 3.0
I didn't experience any problems any more.
[1] https://metadata.ftp-master.debian.org/changelogs//main/l/linux-signed-amd64/linux-signed-amd64_6.1.76+1_changelog
I am still encountering issues with the aacraid driver. The reverted version is significantly more stable and consistent, and I no longer face issues once the system has booted, but one problem persists: the system hangs for 120 seconds during the boot process. It is related to the NetApp disk shelf (0000:03:00.0) connected to the ASR-78165.

For example, on Ubuntu Server 24.04 (live ISO):

$ cat dmesg | grep aacraid
[ 0.915806] kernel: Adaptec aacraid driver 1.2.1[50983]-custom
[ 0.916680] kernel: aacraid 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 0.928906] kernel: aacraid: Comm Interface type2 enabled
[ 0.958831] kernel: aacraid 0000:03:00.0: 64 Bit DAC enabled
[ 0.975916] kernel: scsi host0: aacraid
[ 1.275877] kernel: aacraid: Host bus reset request. SCSI hang ?
[ 1.277602] kernel: aacraid 0000:03:00.0: outstanding cmd: midlevel-1
[ 1.279315] kernel: aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
[ 1.280986] kernel: aacraid 0000:03:00.0: outstanding cmd: error handler-0
[ 1.282672] kernel: aacraid 0000:03:00.0: outstanding cmd: firmware-0
[ 1.284357] kernel: aacraid 0000:03:00.0: outstanding cmd: kernel-0
[ 1.292931] kernel: aacraid 0000:03:00.0: Controller reset type is 3
[ 1.294620] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[ 34.039710] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[ 34.045955] kernel: aacraid: Comm Interface type2 enabled
[ 49.015695] kernel: aacraid 0000:03:00.0: Scheduling bus rescan
[ 59.522832] kernel: aacraid: Host bus reset request. SCSI hang ?
[ 59.525823] kernel: aacraid 0000:03:00.0: outstanding cmd: midlevel-1
[ 59.528323] kernel: aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
[ 59.529023] kernel: aacraid 0000:03:00.0: outstanding cmd: error handler-0
[ 59.529707] kernel: aacraid 0000:03:00.0: outstanding cmd: firmware-0
[ 59.530372] kernel: aacraid 0000:03:00.0: outstanding cmd: kernel-0
[ 59.537907] kernel: aacraid 0000:03:00.0: Controller reset type is 3
[ 59.538609] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[ 91.184486] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[ 91.191966] kernel: aacraid: Comm Interface type2 enabled
[ 106.042633] kernel: aacraid 0000:03:00.0: Scheduling bus rescan
[ 116.351867] kernel: aacraid: Host bus reset request. SCSI hang ?
[ 116.354649] kernel: aacraid 0000:03:00.0: outstanding cmd: midlevel-1
[ 116.357268] kernel: aacraid 0000:03:00.0: outstanding cmd: lowlevel-0
[ 116.357904] kernel: aacraid 0000:03:00.0: outstanding cmd: error handler-0
[ 116.358523] kernel: aacraid 0000:03:00.0: outstanding cmd: firmware-0
[ 116.359113] kernel: aacraid 0000:03:00.0: outstanding cmd: kernel-0
[ 116.366903] kernel: aacraid 0000:03:00.0: Controller reset type is 3
[ 116.367580] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[ 147.979313] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[ 147.981970] kernel: aacraid: Comm Interface type2 enabled
[ 162.916919] kernel: aacraid 0000:03:00.0: Scheduling bus rescan

As I mentioned previously, the 5.15 kernel is fully functional and performs consistently well. The same boot issue occurs even with RHEL8 forks such as Rocky 8.10, likely because the driver has been backported. Ubuntu 22.04, which relies on the 5.15 kernel, works exceptionally well. This is the reason I can't use OpenSUSE Leap 15.6 and must stick with Ubuntu.

In other words, some errors still exist in the driver’s source code, and a bisect starting from 5.15 is needed to resolve them.
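To put numbers on the boot stall, the timestamps in the log above can be paired up per reset cycle. The following is only a rough sketch: the function name iop_reset_durations and the embedded LOG excerpt are illustrative choices made here, not part of any tool, and the excerpt keeps only the "Issuing IOP reset" / "IOP reset succeeded" lines copied from the dmesg output quoted in this comment.

```python
import re

# Excerpt of the dmesg output quoted above; only the IOP reset
# start/finish entries are kept (timestamps copied verbatim).
LOG = """\
[    1.294620] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[   34.039710] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[   59.538609] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[   91.184486] kernel: aacraid 0000:03:00.0: IOP reset succeeded
[  116.367580] kernel: aacraid 0000:03:00.0: Issuing IOP reset
[  147.979313] kernel: aacraid 0000:03:00.0: IOP reset succeeded
"""

# Matches "[ <seconds>] <message>" as printed by dmesg.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def iop_reset_durations(text):
    """Return the duration in seconds of each completed IOP reset."""
    durations = []
    started = None
    for line in text.splitlines():
        match = STAMP.match(line.strip())
        if not match:
            continue  # skip lines without a dmesg timestamp
        stamp, message = float(match.group(1)), match.group(2)
        if "Issuing IOP reset" in message:
            started = stamp
        elif "IOP reset succeeded" in message and started is not None:
            durations.append(stamp - started)
            started = None
    return durations


if __name__ == "__main__":
    resets = iop_reset_durations(LOG)
    for index, seconds in enumerate(resets, start=1):
        print(f"reset {index}: {seconds:.1f} s")
    print(f"time spent in IOP resets: {sum(resets):.1f} s")
```

With the quoted log, this finds three reset cycles of between 31 and 33 seconds each. The same function can be fed the full output of "dmesg" from another kernel version to compare how many resets happen during boot.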
(In reply to Maxim from comment #64)
> In other words, some errors still exist in the driver’s source code, and a
> bisect starting from 5.15 is needed to resolve them.

That's a pity, but it is best discussed in a new ticket, as things otherwise get highly confusing. Could you file one and then drop a link to it here?

And I guess we really need a bisection to make progress here. When performing one, apply the revert with "git cherry-pick --no-commit c5becf57dd56" at each bisection step while in the range containing 9dc704dcc09ea.
Hi Maxim, Thorsten,

I am currently investigating this issue. I do not have a definite timeline for the fix, but I will post an update here once I am certain of the solution and the testing timeline.

Thanks,
Sagar