Bug 215519 - failed command: READ FPDMA QUEUED
Summary: failed command: READ FPDMA QUEUED
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All
OS: Linux
Importance: P1 high
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-01-22 10:53 UTC by Daniel Menelkir
Modified: 2022-08-31 11:04 UTC
CC List: 6 users

See Also:
Kernel Version: 5.16.2
Subsystem:
Regression: No
Bisected commit-id:


Attachments
smart (11.77 KB, text/plain), 2022-01-22 10:53 UTC, Daniel Menelkir
lspci (2.00 KB, text/plain), 2022-01-22 10:53 UTC, Daniel Menelkir
dmesg (65.62 KB, text/plain), 2022-01-22 10:53 UTC, Daniel Menelkir
dmesg-loglevel7-patched (195.45 KB, image/jpeg), 2022-01-27 03:04 UTC, Daniel Menelkir
dmesg-loglevel7-patched-bootdelay (225.65 KB, image/jpeg), 2022-01-27 04:04 UTC, Daniel Menelkir

Description Daniel Menelkir 2022-01-22 10:53:07 UTC
Created attachment 300301 [details]
smart

Issue: 
I wasn't able to extract a log since the error occurs very early, but I will describe it:
ata3.00: exception Emask 0x0 SAct 0x40000000 SErr 0x0 action 0x6 frozen
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:f0:00:00:00/00:00:00:00:00/40 tag 30 ncq dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }

What I've tried so far:
* Booting my distribution-supplied kernels (Artix Linux; both linux and linux-zen are 5.16.2)
* A kernel compiled by hand from kernel.org
* SMART tests such as -t long, no issues (smartctl -a output attached).
Even booting from a pendrive with a 5.16 kernel hits this error very early, and the error eventually shows up in the drive's SMART log. Booting 5.15 and older works fine.
Comment 1 Daniel Menelkir 2022-01-22 10:53:20 UTC
Created attachment 300302 [details]
lspci
Comment 2 Daniel Menelkir 2022-01-22 10:53:33 UTC
Created attachment 300303 [details]
dmesg
Comment 3 Daniel Menelkir 2022-01-25 10:44:52 UTC
I did some other tests with a minimal set of SATA drivers just to be sure and got the same results. I guess I'll wait for 5.16.3.
Comment 4 loqs 2022-01-25 23:28:35 UTC
It looks similar to [1] which the bisection indicated was caused by the introduction of support for concurrent positioning ranges.

[1]: https://bbs.archlinux.org/viewtopic.php?pid=2018259#p2018259
Comment 5 Daniel Menelkir 2022-01-26 02:07:53 UTC
(In reply to loqs from comment #4)
> It looks similar to [1] which the bisection indicated was caused by the
> introduction of support for concurrent positioning ranges.
> 
> [1]: https://bbs.archlinux.org/viewtopic.php?pid=2018259#p2018259

I've answered there as well so maybe it gets more visibility.
Comment 6 Daniel Menelkir 2022-01-26 19:02:05 UTC
I've tested a kernel with cpr disabled (noted in this post: https://bbs.archlinux.org/viewtopic.php?pid=2018268#p2018268) and the bug is gone. No errors, everything working as expected. The bug was introduced by commit fe22e1c2f705676a705d821301fc52eecc2fe055.
Comment 7 Damien Le Moal 2022-01-27 00:06:39 UTC
I do not see this problem on any of my test machines, even with very old drives plugged in... Very strange.
Let me have a look today. Will report back asap.
Comment 8 Daniel Menelkir 2022-01-27 00:09:29 UTC
(In reply to Damien Le Moal from comment #7)
Thanks for your input. It's a very odd issue. At first I thought it was my SSD, but I even swapped SSDs between the machines (one is a BIOS-only 1st-gen i5, the other a UEFI Ivy Bridge i5; both have the same issue with different SSDs).
Comment 9 Damien Le Moal 2022-01-27 00:30:15 UTC
It likely has something to do with the log pages and bad information being gathered about the drive when the cpr probe is run, as opposed to when it is not. That results in the read commands issued by the partition probe failing. The BIOS and ATA adapter are most likely irrelevant here. At first glance, I do not see why this can happen. Will dig.
Comment 10 Damien Le Moal 2022-01-27 01:02:34 UTC
With libata-core.c intact (do not remove the call to ata_dev_config_cpr()), can you try this patch:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 65875a598d62..d66fc0817bdd 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3321,6 +3321,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
                        sd_read_block_limits(sdkp);
                        sd_read_block_characteristics(sdkp);
                        sd_zbc_read_zones(sdkp, buffer);
+                       sd_read_cpr(sdkp);
                }
 
                sd_print_capacity(sdkp, old_capacity);
@@ -3330,7 +3331,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
                sd_read_app_tag_own(sdkp, buffer);
                sd_read_write_same(sdkp, buffer);
                sd_read_security(sdkp, buffer);
-               sd_read_cpr(sdkp);
        }
 
        /*

This moves the call to sd_read_cpr() under the "if (scsi_device_supports_vpd(sdp)) {" block.
Comment 11 Damien Le Moal 2022-01-27 01:41:42 UTC
Your ata3 device is ATA-9 (ACS-2), so it is not supposed to support the IDENTIFY DEVICE log page, but it does (many drives do the same). The concurrent positioning ranges info is a sub-page of the IDENTIFY DEVICE log, and your drive definitely should not be supporting that.

When booted in 5.15, can you post this information:

sg_sat_read_gplog --log=48 --page=0 /dev/sdX

And then try this:

sg_sat_read_gplog --log=48 --page=71 /dev/sdX

This will try to access the concurrent positioning sub-page. With this, we can see if your drive reacts badly to this.
Comment 12 Daniel Menelkir 2022-01-27 01:52:59 UTC
sh-5.1# sg_sat_read_gplog --log=48 --page=0 /dev/sda
 00     0001 0000 0000 0000 0008 0201 0403 0605     .. .. .. .. .. .. .. .. 
 08     0008 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 10     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 18     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 20     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 28     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 30     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 38     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 40     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 48     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 50     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 58     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 60     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 68     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 70     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 78     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 80     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 88     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 90     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 98     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 a0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 a8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 b0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 b8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 c0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 c8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 d0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 d8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 e0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 e8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 f0     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
 f8     0000 0000 0000 0000 0000 0000 0000 0000     .. .. .. .. .. .. .. .. 
sh-5.1# sg_sat_read_gplog --log=48 --page=71 /dev/sda
ATA PASS-THROUGH (16), bad field in cdb
sg_sat_read_gplog failed: Illegal request
Comment 13 Damien Le Moal 2022-01-27 02:04:21 UTC
Hmmm... So your drive really does not support sub-page 0x47 (71) at all, so ata_dev_config_cpr() should be doing absolutely nothing. This is very weird.
Can you try the sd.c change I posted?
Comment 14 Daniel Menelkir 2022-01-27 02:06:01 UTC
(In reply to Damien Le Moal from comment #13)
> Can you try the sd.c change I posted ?
Yes, I'm compiling the kernel right now.
Just to mention, I've tested the command with another SSD and 3 other HDDs and had the same results.
Comment 15 Daniel Menelkir 2022-01-27 02:53:44 UTC
(In reply to Damien Le Moal from comment #13)
OK, tested with the sd.c change applied; same results.
Comment 16 Damien Le Moal 2022-01-27 02:56:08 UTC
OK. Thanks. Something fishy is really going on. A dmesg output from when the problem happens (with loglevel=7) would be helpful. Could you try to capture one over a serial console, if you can?
Comment 17 Daniel Menelkir 2022-01-27 03:03:19 UTC
I can't capture it via serial console, but I've taken a picture of the boot when it stops.
Comment 18 Daniel Menelkir 2022-01-27 03:04:40 UTC
Created attachment 300331 [details]
dmesg-loglevel7-patched
Comment 19 Damien Le Moal 2022-01-27 03:29:21 UTC
(In reply to Daniel Menelkir from comment #17)
> I can't capture via serial console, but I've took a picture of the boot when
> it stops.

OK. So this is really the partition scan failing with a timeout. But this does not tell us why. The "invalid CHS" error message comes from libata-scsi trying to get the sense code for the failed command. But that is failing too.

Does your machine reboot automatically after this? Or does it stop at a prompt (emergency shell)? Can you see the ata device probe messages before the error (Shift+PageUp to scroll back on the console)?
If we could see what the device probe looks like, we may get a hint about what is going on.

You could try adding the option "emergency" to your kernel boot params, or use the param "boot_delay=<msecs>" to slow down the printing of kernel messages so that you can take pictures of the ata probe messages. If that does not let you see the messages, then we'll need to use kdump (kernel crash dump).
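For illustration, a hypothetical way to combine those suggestions on the kernel command line; the delay value here is an arbitrary placeholder, and boot_delay only has an effect if the kernel was built with CONFIG_BOOT_PRINTK_DELAY:

# appended to the existing kernel command line in the bootloader entry
loglevel=7 boot_delay=100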
Comment 20 Damien Le Moal 2022-01-27 03:30:13 UTC
By the way, could you also try 5.17-rc1 to see if the problem is still there?
Comment 21 Daniel Menelkir 2022-01-27 03:33:23 UTC
(In reply to Damien Le Moal from comment #19)

> Does your machine reboot automatically after this ? Or does it stop with a
> prompt (emergency shell) ? Can you see the ata device probe messages before
> the error (shift page-up to go back on the console) ?
It keeps trying for some time; last time it was a matter of 20 minutes, so I gave up and rebooted.
(In reply to Damien Le Moal from comment #20)
> By the way, could you also try 5.17-rc1 to see if the problem is still there
> ?
I tried with HEAD some time ago, with the same results.
Comment 22 Damien Le Moal 2022-01-27 03:57:33 UTC
(In reply to Daniel Menelkir from comment #21)
> (In reply to Damien Le Moal from comment #19)
> 
> > Does your machine reboot automatically after this ? Or does it stop with a
> > prompt (emergency shell) ? Can you see the ata device probe messages before
> > the error (shift page-up to go back on the console) ?
> It tries for some time, last time it was a matter of 20 minutes so I gave up
> and reboot.

Can you try the boot_delay kernel parameter to see if you can get a picture of the ata device probe messages? I would like to see whether any error or weird info is printed at that time.
Comment 23 Daniel Menelkir 2022-01-27 04:04:00 UTC
It still goes by too fast; this is as far as I could get.
Comment 24 Daniel Menelkir 2022-01-27 04:04:35 UTC
Created attachment 300332 [details]
dmesg-loglevel7-patched-bootdelay
Comment 25 Damien Le Moal 2022-01-27 04:17:00 UTC
(In reply to Daniel Menelkir from comment #23)
> It stills get to fast, this is the far I could get.

OK. So I think the best option is to use kdump to be able to capture what 5.16.2 is doing. Can you try to set that up?
Comment 26 Daniel Menelkir 2022-01-27 04:23:15 UTC
(In reply to Damien Le Moal from comment #25)
> OK. So I think the best option is to use kdump to be able to capture what
> 5.16.2 is doing. Can you try to set that up ?
I'm a bit unfamiliar with kdump; can I dump to somewhere else? This happens before the root filesystem can even be mounted (actually, before any filesystem can be mounted).
Comment 27 Damien Le Moal 2022-01-27 04:26:58 UTC
(In reply to Daniel Menelkir from comment #26)
> (In reply to Damien Le Moal from comment #25)
> > OK. So I think the best option is to use kdump to be able to capture what
> > 5.16.2 is doing. Can you try to set that up ?
> I'm a bit unfamiliar with kdump, can I kdump to somewhere else? Since this
> happens before the root filesystem even being able to be mounted (actually,
> before any filesystem to be able of being mounted).

Yes, you can. But it will require careful setup.

Another solution, which may be easier, is to create a bootable USB stick with a 5.16.2 kernel and use it to boot the machine. Your SSD will still fail to be probed, but userspace should be functional and we will be able to see the boot messages. It may be as simple as installing your distro on the USB stick, changing the kernel, and booting that.
Comment 28 Daniel Menelkir 2022-01-27 12:11:16 UTC
(In reply to Damien Le Moal from comment #27)
> Yes, you can. But it will require careful setup.
> 
> Another solution, may be easier, is to create a bootable USB stick that has
> 5.16.2 kernel and use it to boot the machine. Your SSD will still fail to be
> probed, but userspace should be functional and we will be able to see the
> boot messages. It may be as simple as installing your distro on the USB
> stick, changing the kernel, and booting that.

My system also freezes when booting via USB with 5.16.2. I'm preparing a kdump bootable USB, but I don't think I'll be able to dump anything since it freezes very early.
Comment 29 Daniel Menelkir 2022-01-27 12:23:40 UTC
Yes, as I suspected, it breaks too early to do anything.
Comment 30 Damien Le Moal 2022-01-27 19:34:28 UTC
OK. So I think the last option is to build a custom initramfs that simply drops into a shell without trying to mount anything. This page seems to explain everything needed to do that:

https://wiki.gentoo.org/wiki/Custom_Initramfs

To simplify, make sure that all the needed drivers are compiled into the kernel.

With that, the drives will be probed, so you will still see the I/O errors, but nothing should crash and you should get a shell from which to inspect dmesg.

Can you try that?
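As a rough illustration of the kind of initramfs described above (a sketch only: it assumes a statically linked busybox at /bin/busybox and a kernel with devtmpfs support, and is not the exact recipe from the wiki page):

mkdir -p initramfs/bin initramfs/dev initramfs/proc initramfs/sys
cp /bin/busybox initramfs/bin/
cat > initramfs/init << 'EOF'
#!/bin/busybox sh
# Install the busybox applets, mount the pseudo-filesystems, then just drop
# into a shell so dmesg can be inspected before anything else is mounted.
/bin/busybox --install -s /bin
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
exec sh
EOF
chmod +x initramfs/init
( cd initramfs && find . | cpio -o -H newc | gzip ) > custom-initramfs.img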
Comment 31 Daniel Menelkir 2022-01-27 20:21:02 UTC
(In reply to Damien Le Moal from comment #30)
I was already doing that: a really minimal kernel that is only able to boot, with the initramfs used only for extra tools such as filesystem tools, etc.
Comment 32 Damien Le Moal 2022-01-27 22:01:16 UTC
(In reply to Daniel Menelkir from comment #31)
> (In reply to Damien Le Moal from comment #30)
> I was doing that, a really minimal kernel only able to boot and the
> initramfs was used only for the extra tools, such as filesystem-tools, etc.

Great! Make sure to compile in the ahci driver so that we can see the probe-related messages :)
Comment 33 Daniel Menelkir 2022-01-27 22:06:54 UTC
(In reply to Damien Le Moal from comment #32)
That's the problem: I was already doing that when I gave you the picture of the boot.
Comment 34 Damien Le Moal 2022-01-27 22:09:56 UTC
Problem? What is the problem? Instead of building them as modules, compile in all the ATA and ahci drivers. Alternatively, you can add the modules to the initramfs and manually "modprobe ahci" once you are in the shell.
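For reference, a minimal sketch of the .config options involved in building SATA support into the kernel rather than as modules (these are standard Kconfig symbols; additional controller drivers may be needed depending on the machine):

# build SCSI disk and AHCI SATA support into the kernel image
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_ATA=y
CONFIG_SATA_AHCI=y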
Comment 35 Damien Le Moal 2022-02-01 03:51:31 UTC
Hi,

Any progress on this? Another extremely simple thing to try would be to take a video of the boot messages instead of a picture of the hung screen. With grub's quiet and graphical boot disabled, all boot messages should be visible in the video...
Comment 36 Daniel Menelkir 2022-02-01 11:42:19 UTC
The machine hangs right after the point shown in the picture I sent you and starts throwing the READ FPDMA QUEUED errors until the kernel panics. Why would a video be better if everything is already here? As I said, the machine hangs after the point shown in the dmesg picture I sent you. I'm using an old kernel because 5.16 doesn't boot at all on two machines, and now I'm apprehensive because every time I boot with cpr, it generates logs in the devices' NVRAM.
Comment 37 Damien Le Moal 2022-02-01 12:23:27 UTC
(In reply to Daniel Menelkir from comment #36)
> The machine hangs up after the picture I send you and start the errors with
> READ FPDMA QUEUED until the kernel panic. Why a video would be better if
> everything is here? I told you, the machine hangs after the picture I send
> you of the dmesg, I'm using an old kernel because 5.16 doesn't boot at all
> in two machines and now I'm apprehensive because every time I boot with cpr,
> it's generating logs on the nvram of the devices.

I suspect that the failed read commands are only a symptom of the problem. They are certainly causing the hang (failure to mount the rootfs), but they are most likely due to something going wrong well before the commands are issued, when the drive is probed during ahci driver initialization.

As you confirmed, removing the call to ata_dev_config_cpr() results in a good boot. So my hunch is that somehow, trying to access the cpr log page is causing problems with your drive. It should not, but it seems like that is the most likely cause right now.

So I would like to see the kernel messages during the disk probing, well before the read errors happen. Hence the idea of the minimal initramfs with a shell, or simply a video of the boot messages, to capture those probe messages and see if there are any errors or complaints from libata or the SCSI disk driver at that time.
Comment 38 Konstantin 2022-02-06 14:20:58 UTC
Same problem. Logs:

failed command: READ FPDMA QUEUED
device reported invalid CHS sector 0

After a few seconds, 'failed command: READ FPDMA QUEUED' changes to 'failed command: READ DMA'.
Comment 39 xrootware 2022-02-06 18:09:26 UTC
(In reply to Konstantin from comment #38)
> Same problem. Logs:
> 
> failed command: READ FPDMA QUEUED
> device reported invalid CHS sector 0
> 
> after few seconds 'failed command: READ FPDMA QUEUED' changes to 'failed
> command: READ DMA.

I have the same on Fedora 35 after updating to kernel 5.16.x.
Comment 40 Damien Le Moal 2022-02-07 02:17:25 UTC
I found a bug in ata_dev_config_cpr() in drivers/ata/libata-core.c that seems to be the root cause of this problem, which is now being reported by many people. Posting a fix asap. I will post a link to the patch here too for people to test.
Comment 41 Damien Le Moal 2022-02-07 03:42:42 UTC
I posted a patch.
See: https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/?h=for-5.17-fixes

This should fix the issue. Can someone having the problem try building and testing a kernel using the libata tree, branch for-5.17-fixes?

The patch will also apply cleanly to a 5.16 kernel.
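For anyone wanting to test, one possible way to build from that branch (a sketch only; the config step assumes a distro-style config under /boot, adjust as needed):

git clone --depth 1 --branch for-5.17-fixes \
    https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git
cd libata
cp /boot/config-$(uname -r) .config   # start from the running kernel's config
make olddefconfig
make -j"$(nproc)"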
Comment 42 xrootware 2022-02-07 10:01:14 UTC
(In reply to Damien Le Moal from comment #41)
> I posted a patch.
> See:
> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/
> ?h=for-5.17-fixes
> 
> This should fix the issue. Can someone having the problem can try building
> and testing a kernel using libata tree, branch for-5.17-fixes ?
> 
> The patch will also apply cleanly to 5.16 kernel.

Hi, thanks for your work! Is there a good chance that the patch will be included upstream?
Comment 43 Damien Le Moal 2022-02-07 10:16:49 UTC
(In reply to xrootware from comment #42)
> (In reply to Damien Le Moal from comment #41)
> > I posted a patch.
> > See:
> > https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/
> > ?h=for-5.17-fixes
> > 
> > This should fix the issue. Can someone having the problem can try building
> > and testing a kernel using libata tree, branch for-5.17-fixes ?
> > 
> > The patch will also apply cleanly to 5.16 kernel.
> 
> Hi, thank for your work! Is that a big chance that patch will be included to
> upstream?

Yes. That is the goal. I just need to get test confirmation that the fix works and I will then send the patch upstream for 5.17-rc4 and backport to 5.16 stable. None of the drives I have behave badly despite the bug, so I cannot test myself.
Comment 44 Sushanth 2022-02-08 03:00:45 UTC
(In reply to Damien Le Moal from comment #41)
> I posted a patch.
> See:
> https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/
> ?h=for-5.17-fixes
> 
> This should fix the issue. Can someone having the problem can try building
> and testing a kernel using libata tree, branch for-5.17-fixes ?
> 
> The patch will also apply cleanly to 5.16 kernel.

Thanks for the patch, it works on the 5.16.7 stable release.
Comment 45 Sushanth 2022-02-08 03:03:16 UTC
(In reply to Sushanth from comment #44)
> (In reply to Damien Le Moal from comment #41)
> > I posted a patch.
> > See:
> > https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/
> > ?h=for-5.17-fixes
> > 
> > This should fix the issue. Can someone having the problem can try building
> > and testing a kernel using libata tree, branch for-5.17-fixes ?
> > 
> > The patch will also apply cleanly to 5.16 kernel.
> 
> Thanks for the patch it works on 5.16.7 stable release

Should I test it with 5.17-rc4?
Comment 46 Damien Le Moal 2022-02-08 04:15:22 UTC
(In reply to Sushanth from comment #45)
> (In reply to Sushanth from comment #44)
> > (In reply to Damien Le Moal from comment #41)
> > > I posted a patch.
> > > See: https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/?h=for-5.17-fixes
> > > 
> > > This should fix the issue. Can someone having the problem can try building
> > > and testing a kernel using libata tree, branch for-5.17-fixes ?
> > > 
> > > The patch will also apply cleanly to 5.16 kernel.
> > 
> > Thanks for the patch it works on 5.16.7 stable release
> 
> Should I test it with 5.17-rc4?

Sure, yes, you can. It is not rc4 yet, though; the patch is now in Linus' tree.
I found a couple of 10-year-old HDDs in my lab this morning and both triggered the issue with 5.17-rc3. The patch solved the problem for me on 5.17-rc3.
Comment 47 Abderraouf Adjal 2022-02-08 07:25:11 UTC
(In reply to Daniel Menelkir from comment #0)
> Created attachment 300301 [details]
> smart
> 
> Issue: 
> I wasn't able to extract a log since the error occurs very early, but I will
> describe it:
> ata3.00: exception Emask 0x0 SAct 0x40000000 SErr 0x0 action 0x6 frozen
> ata3.00: failed command: READ FPDMA QUEUED
> ata3.00: cmd 60/08:f0:00:00:00/00:00:00:00:00/40 tag 30 ncq dma 4096 in
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata3.00: status: { DRDY }
> 
> What I've tried so far:
> * Booting my distribution supplied kernels (Artix Linux, both linux and
> linux-zen are 5.16.2)
> * Compiled by hand from kernel.org
> * smart tests such -t long, no issues (smarct -a attached).
> Even a pendrive with a kernel 5.16 gets this error very early. And the error
> is eventually logged in sdd smart. Booting 5.15 and older works fine.

Is it possible to know what controller your XrayDisk 128GB is using, by opening the case? I suspect it is a YeeStor-brand controller.
Comment 48 Abderraouf Adjal 2022-02-08 07:39:24 UTC
This patch fixed the issue for my hardware.

Affected SSD info for the record:
  - Brand: Goldenfir 512GB
  - Model: T650-512GB
  - PIN: 189512TA1124
  - VER: 189.01.03
  - Flash IC: 4 no-brand flash ICs, text on the IC chip "YZ101690"
  - Controller: YeeStor YS9082HC, text on the IC chip: "2121 YS9082HC UGA955 CC1GBAS"
  - Internal photo: https://i.imgur.com/3i7jC4g.jpeg
Comment 49 Sushanth 2022-02-08 07:56:56 UTC
(In reply to Damien Le Moal from comment #46)
> (In reply to Sushanth from comment #45)
> > (In reply to Sushanth from comment #44)
> > > (In reply to Damien Le Moal from comment #41)
> > > > I posted a patch.
> > > > See: https://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata.git/commit/?h=for-5.17-fixes
> > > > 
> > > > This should fix the issue. Can someone having the problem can try building
> > > > and testing a kernel using libata tree, branch for-5.17-fixes ?
> > > > 
> > > > The patch will also apply cleanly to 5.16 kernel.
> > > 
> > > Thanks for the patch it works on 5.16.7 stable release
> > 
> > Should I test it with 5.17-rc4?
> 
> Sure, yes, you can. It is not yet rc4 though. The patch is now in Linus tree.
> I found a couple of 10-years old HDDs in my lab this morning and both
> triggered the issue with 5.17-rc3. The patch solved the problem for me with
> 5.17-rc3.

Tested 5.17-rc3, no errors now. Thanks a lot for your help.
Comment 50 Konstantin 2022-02-08 11:05:52 UTC
Confirmed, the patch works. Now I can boot with 5.16.5-100.

System info:
fedora 34, kernel: 5.16.5-100.fc34.x86_64

description: ATA Disk
          product: SanDisk SDSSDHII
          physical id: 0
          bus info: scsi@0:0.0.0
          logical name: /dev/sda
          version: 00RL
          size: 111GiB (120GB)
          capabilities: partitioned partitioned:dos
          configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=7cc7f5d3
Comment 51 Damien Le Moal 2022-02-10 00:49:04 UTC
A fix patch is upstream:

commit fda17afc6166e975bec1197bd94cd2a3317bce3f
Author: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Date:   Mon Feb 7 11:27:53 2022 +0900

    ata: libata-core: Fix ata_dev_config_cpr()

This patch was also backported to the 5.16 stable tree.
I think we can close this one.
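For anyone still on a 5.16 kernel who wants to apply the fix by hand before the stable update reaches their distribution, a rough sketch (it assumes an existing 5.16 source checkout; the commit hash is the upstream one quoted above):

# fetch Linus' tree so the fix commit is available locally, then cherry-pick it
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
git cherry-pick fda17afc6166e975bec1197bd94cd2a3317bce3f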
Comment 52 xrootware 2022-02-11 22:46:43 UTC
No problems on Fedora 35 with kernel 5.16.9. Seems it's fixed.

Thanks for the good job, guys.
