Bug 209397

Summary: CQHCI support is broken on Irbis NB111, prevents laptop boot
Product: Drivers
Reporter: RussianNeuroMancer (russianneuromancer)
Component: MMC/SD
Assignee: drivers_mmc-sd
Status: RESOLVED CODE_FIX
Severity: high
CC: jwrdegoede
Priority: P1
Hardware: Intel
OS: Linux
Kernel Version: 4.16.0
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
Linux 5.4 boot
Patched Linux 5.8 boot
Patched sdhci-pci-core.c
Irbis NB111 dmidecode
[PATCH] mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models
Patched Linux 5.8 boot (attempt 2)

Description RussianNeuroMancer 2020-09-25 22:44:29 UTC
Created attachment 292651 [details]
Linux 5.4 boot

Hello!

I found that the Irbis NB111 refuses to boot after a kernel upgrade from Linux 4.15 to any newer version, with the following error messages:

[ 5.014415] mmc0: running CQE recovery
[ 5.014662] mmc0: running CQE recovery
[ 5.014886] mmc0: running CQE recovery
[ 5.015085] blk_update_request: I/O error, dev mmcblk0, sector 53454848 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
[ 5.015137] mmc0: running CQE recovery
[ 5.015365] mmc0: running CQE recovery
[ 5.015590] mmc0: running CQE recovery
[ 5.015793] blk_update_request: I/O error, dev mmcblk0, sector 53454848 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[ 5.015796] Buffer I/O error on dev mmcblk0p3, logical block 0, async page read

The full dmesg log of the Linux 5.4 boot (just the regular Ubuntu 20.04 installer started from a flash drive) is attached. A git bisect between Linux 4.15 and 4.16 led me to the following commit:

commit 8ee82bda230fc972c7ee3bb15ce1260eefb4721c
Author: Adrian Hunter <adrian.hunter@intel.com>
Date:   Wed Nov 29 15:41:06 2017 +0200

    mmc: sdhci-pci: Add CQHCI support for Intel GLK

    Add CQHCI initialization and implement CQHCI operations for Intel GLK.

I built Linux 4.16 without this commit and, sure enough, the laptop booted normally.
Comment 1 RussianNeuroMancer 2020-09-25 22:47:17 UTC
Created attachment 292653 [details]
Patched Linux 5.8 boot

I also noticed the bug report https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1848883 and the following workaround https://github.com/torvalds/linux/commit/bedf9fc01ff1f40cfd1a79ccacedd9f3cd8e652a so I tried replacing "dmi_match(DMI_BIOS_VENDOR, "LENOVO");" with "dmi_match(DMI_BIOS_VENDOR, "IRBIS");" in the Linux 5.8.11 sources, but this did not work. I installed the patched kernel on the system running from eMMC, but attempting to boot it results in the following errors being printed continuously:

[   34.838058] mmc0: running CQE recovery
[   34.838355] mmc0: running CQE recovery
[   34.838608] mmc0: running CQE recovery
[   34.838815] blk_update_request: I/O error, dev mmcblk0, sector 53454720 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   34.841166] mmc0: running CQE recovery
[   34.841398] mmc0: running CQE recovery
[   34.841647] mmc0: running CQE recovery
[   34.841867] blk_update_request: I/O error, dev mmcblk0, sector 53454720 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   34.844072] Buffer I/O error on dev mmcblk0p2, logical block 6553584, async page read
[   34.850888] mmc0: running CQE recovery
[   34.851178] mmc0: running CQE recovery
[   34.851436] mmc0: running CQE recovery
[   34.851660] blk_update_request: I/O error, dev mmcblk0, sector 53454720 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[   34.854497] mmc0: running CQE recovery
[   34.854730] mmc0: running CQE recovery
[   34.854965] mmc0: running CQE recovery
[   34.855181] blk_update_request: I/O error, dev mmcblk0, sector 53454720 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[   34.857898] Buffer I/O error on dev mmcblk0p2, logical block 6553584, async page read

Then it gives up and drops to the initramfs shell. The full dmesg of this boot is attached.
Comment 2 RussianNeuroMancer 2020-09-25 22:56:17 UTC
Also, just to be sure, you can verify that I didn't make some stupid mistake. Below I attached the dmidecode output from the affected device, where you can verify that "Manufacturer: IRBIS" appears in both "System Information" and "Base Board Information". I also attached the patched sdhci-pci-core.c where I replaced LENOVO with IRBIS, so you can double-check this part too.
Comment 3 RussianNeuroMancer 2020-09-25 22:56:47 UTC
Created attachment 292655 [details]
Patched sdhci-pci-core.c
Comment 4 RussianNeuroMancer 2020-09-25 22:57:06 UTC
Created attachment 292657 [details]
Irbis NB111 dmidecode
Comment 5 Hans de Goede 2020-09-26 19:15:22 UTC
I just checked your patched version and the problem is that you changed the dmi_match to:

+              dmi_match(DMI_BIOS_VENDOR, "IRBIS");

Which matches on the *BIOS* vendor and the dmidecode says:

BIOS Information
	Vendor: American Megatrends Inc.

So yeah, the change you are doing will not make a difference.
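
To illustrate the mismatch, here is a minimal user-space sketch. It is not kernel code and not the attached patch. The enum, the table, and model_dmi_match() are hypothetical names made up for this example; only the field values come from the dmidecode output in this report. It assumes an exact string comparison per field, which is how dmi_match() behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of three DMI identification fields.
 * The names loosely mirror the kernel's DMI_BIOS_VENDOR, DMI_SYS_VENDOR
 * and DMI_BOARD_VENDOR, but nothing here is kernel API. */
enum dmi_field_model {
	MODEL_BIOS_VENDOR,
	MODEL_SYS_VENDOR,
	MODEL_BOARD_VENDOR,
	MODEL_FIELD_COUNT
};

/* Values as reported by dmidecode on the Irbis NB111 in this report. */
static const char *const nb111_dmi[MODEL_FIELD_COUNT] = {
	[MODEL_BIOS_VENDOR]  = "American Megatrends Inc.",
	[MODEL_SYS_VENDOR]   = "IRBIS",
	[MODEL_BOARD_VENDOR] = "IRBIS",
};

/* Exact string comparison against a single field (assumed to model
 * what dmi_match() does for one DMI field). */
static bool model_dmi_match(const char *const *table,
			    enum dmi_field_model field, const char *str)
{
	assert(field < MODEL_FIELD_COUNT);
	return table[field] != NULL && strcmp(table[field], str) == 0;
}
```

With this table, model_dmi_match(nb111_dmi, MODEL_BIOS_VENDOR, "IRBIS") is false, while the same call with MODEL_SYS_VENDOR or MODEL_BOARD_VENDOR is true. A quirk keyed on the BIOS vendor therefore never triggers on this machine, whereas one keyed on the system or board manufacturer would.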

I will prepare a fixed patch for you to try and attach it here.
Comment 6 Hans de Goede 2020-09-26 19:27:16 UTC
Created attachment 292671 [details]
[PATCH] mmc: sdhci: Workaround broken command queuing on Intel GLK based IRBIS models

Please give this patch a try. Once I have received positive testing feedback on this patch I will submit it upstream.
Comment 7 RussianNeuroMancer 2020-09-26 23:57:15 UTC
Created attachment 292673 [details]
Patched Linux 5.8 boot (attempt 2)

> Please give this patch a try.

Yes, it works, thank you!

dmesg is attached, just in case.

> Once I have received positive testing feedback on this patch I will submit it
> upstream.

Is there a chance to get it into the stable branches as well? It's a regression that breaks the OS upgrade (Ubuntu 18.04->20.04) for some people right now.
Comment 8 Hans de Goede 2020-09-27 10:49:59 UTC
(In reply to RussianNeuroMancer from comment #7)
> Yes, it works, thank you!

Thank you for testing, I have submitted the patch upstream.

> Is there a chance to get it into the stable branches as well? It's a
> regression that breaks the OS upgrade (Ubuntu 18.04->20.04) for some people
> right now.

It has a Fixes tag, so it should get cherry-picked into the various stable kernel series.
Comment 9 RussianNeuroMancer 2020-09-27 10:53:56 UTC
Good to know, thank you!