Created attachment 299493: Kernel log. The mvsas module fails to IDENTIFY some disks on a RocketRaid 2744. This started happening with 5.15.0 and continues with 5.15.1. Excerpt from the kernel log attached.
I stumbled across this and gave it a try, and it fixed my immediate problem: https://sourceforge.net/p/scst/mailman/scst-devel/thread/4FDDA78C.400@acm.org/ However, it doesn't look like the mvsas driver has changed in some time, so I'm thinking the problem was caused by another change somewhere else in the kernel, and adding that one line of code to mv_sas.c simply "band-aided" the immediate issue I was experiencing.
It would help a lot if this issue were bisected. See also https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html.
Oh, that's an excellent point. I will try to do so as soon as I can and provide further information.
Bisection has identified commit 2360fa1812cd77e1de13d3cca789fbd23462b651 as the origin of the issue.
On 11/15/21 2:34 PM, bugzilla-daemon@bugzilla.kernel.org wrote (comment #4 from Matthew Perkowski): "Bisection has identified commit 2360fa1812cd77e1de13d3cca789fbd23462b651 as the origin of the issue." This commit: 2360fa1812cd ("libata: cleanup NCQ priority handling")? Damien, can you take a look?
(In reply to Bart Van Assche from comment #5) Hmm... It seems very strange that this patch creates the problem. Even with a bug, the worst that could happen is failing to detect NCQ priority support. The problem is likely related to the errors "ata14.00: Read log page 0x08 failed, Emask 0x1", which come from the kernel trying to access a nonexistent log page (IDENTIFY DEVICE data log). That access is attempted when probing for NCQ priority support. libata ignores this error and simply does not enable the feature that was being probed. The mvsas driver may not. I posted a patch yesterday to prevent such access to log pages not supported by the device. See: https://lore.kernel.org/linux-ide/20211115060559.232835-1-damien.lemoal@opensource.wdc.com/ Can you try these?
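For readers following along, here is a rough illustration of the idea of "not accessing log pages the device does not support". It is a minimal Python model, not the kernel code and not the patch linked above. It assumes the usual ATA layout of the General Purpose Log Directory (log address 0x00): one 512-byte page in which the little-endian 16-bit word at index N holds the number of pages of the log at address N. The names `log_supported`, `guarded_read`, `make_directory`, and the `read_log` callback are hypothetical helpers made up for this sketch.

```python
import struct

IDENTIFY_DEVICE_LOG = 0x30  # log address of the IDENTIFY DEVICE data log
DIRECTORY_SIZE = 512        # the log directory is a single 512-byte page


def log_supported(directory, log_addr):
    """Return True if the log directory reports at least one page for log_addr."""
    if len(directory) != DIRECTORY_SIZE:
        raise ValueError("log directory must be exactly 512 bytes")
    if not 0 < log_addr <= 0xFF:
        raise ValueError("log address must be in 0x01..0xFF")
    # Word N of the directory (bytes 2N and 2N+1, little-endian) is the
    # page count of the log at address N. Zero means "not supported".
    (page_count,) = struct.unpack_from("<H", directory, 2 * log_addr)
    return page_count != 0


def guarded_read(directory, read_log, log_addr, page):
    """Issue the read only if the directory advertises the log.

    read_log is a hypothetical callback standing in for the real command.
    Returns None without calling it when the log is not advertised.
    """
    if not log_supported(directory, log_addr):
        return None
    return read_log(log_addr, page)


def make_directory(page_counts):
    """Build a fake directory page for experiments (illustration only)."""
    buf = bytearray(DIRECTORY_SIZE)
    struct.pack_into("<H", buf, 0, 0x0001)  # word 0: directory version
    for addr, count in page_counts.items():
        struct.pack_into("<H", buf, 2 * addr, count)
    return bytes(buf)
```

With a directory that does not list log 0x30, `guarded_read(directory, read_log, IDENTIFY_DEVICE_LOG, 0x08)` returns `None` and never sends the command, so a device or HBA that reacts badly to the unsupported read is never asked.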
(In reply to Matthew Perkowski from comment #1) I am not familiar with the mv_sas driver and associated HBAs. Do these implement SAT using libata on the host? Or does the HBA firmware handle SCSI command translation? A quick grep of the driver code does not reveal much.
(In reply to Damien Le Moal from comment #6) I will try rebuilding with the patches at my first opportunity and report back.
(In reply to Damien Le Moal from comment #7) I'm afraid I don't know that for certain. I know the card HAS firmware since I've updated it once before. I'll see if those patches solve the problem. If not, maybe I can figure out more about the card to help us along. I can tell you that the card uses this controller chip: https://www.marvell.com/content/dam/marvell/en/public-collateral/storage/marvell-storage-88se94xx-product-brief-2011-04.pdf I'm not accustomed to interpreting hardware information at such a low level myself, but it mentions offering "native 6Gb/s SATA interface support."
(In reply to Matthew Perkowski from comment #9) Looks like the mvsas driver uses libsas, so it likely relies on libata for SCSI-to-ATA translation. Will have a closer look. I also have other HBAs using the pm80xx driver, which is similar. Will do some more tests with that.
(In reply to Matthew Perkowski from comment #8) That would be great. Thanks.
(In reply to Damien Le Moal from comment #11) Looks like your hunch may have been right on. I applied your patches to a fresh copy of the 5.15.2 source (which was not working properly with my RR 2744 card via the mvsas driver in vanilla form), and the issue did not arise when I booted into it.
(In reply to Matthew Perkowski from comment #12) Great. Can I add your Tested-by tag? E.g.: Tested-by: Matthew Perkowski <mgperkow@gmail.com>
(In reply to Damien Le Moal from comment #13) Certainly. That is fine with me.
(In reply to Matthew Perkowski from comment #14) Could you test with the latest 5.16-rc1 kernel too, please?
(In reply to Damien Le Moal from comment #15) Built, patched, and tested with 5.16-rc1. mvsas detected all drives as expected. Scanned the kernel log after boot to check more in-depth as well. No unusual messages from the mvsas driver. Seems to be working exactly as I would expect.
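For anyone who wants to repeat the "scan the kernel log" check, a small script along these lines may help. It is only a sketch: it looks for the one error message quoted earlier in this thread ("Read log page 0x08 failed, Emask 0x1") and assumes other kernel versions print it in the same form, which may not hold. The function name `find_read_log_failures` and the regular expression are hypothetical choices for this example.

```python
import re

# Matches lines such as: "ata14.00: Read log page 0x08 failed, Emask 0x1"
READ_LOG_FAILURE = re.compile(
    r"(?P<dev>ata\d+\.\d+): Read log page 0x(?P<page>[0-9a-fA-F]+) failed, "
    r"Emask 0x(?P<emask>[0-9a-fA-F]+)"
)


def find_read_log_failures(log_text):
    """Return a list of (device, page, emask) tuples found in the log text."""
    hits = []
    for line in log_text.splitlines():
        match = READ_LOG_FAILURE.search(line)
        if match:
            hits.append(
                (match["dev"], int(match["page"], 16), int(match["emask"], 16))
            )
    return hits


if __name__ == "__main__":
    import sys

    # Usage (assumed): pipe the kernel log in, e.g. from dmesg or journalctl -k.
    failures = find_read_log_failures(sys.stdin.read())
    for dev, page, emask in failures:
        print(f"{dev}: read of log page 0x{page:02x} failed (Emask 0x{emask:x})")
    sys.exit(1 if failures else 0)
```

The exit status is nonzero when any such failure is found, so the script can also serve as a quick pass/fail check after booting a test kernel.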
(In reply to Matthew Perkowski from comment #16) Great! Thanks for testing.
Created attachment 307503: Resurface the bug -- 1/17/2025. I encountered an issue similar to the initial report from 2021. I tried to revive my old workstation, which uses BIOS/MBR and has 5 HDD slots. However, Arch Linux detects only 2 of the disks, while the BIOS sees all of them. The Linux kernel is 6.6.72-1-lts, and the system was updated with pacman -Syu. My journalctl output is attached. This is my first post, so I am not sure about the posting policy or requirements. Thanks in advance for any help.