Bug 212065 - Write eMMC device failed with message running CQE recovery on Intel N6000 machines
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: MMC/SD
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_mmc-sd
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-05 03:57 UTC by Jian-Hong Pan
Modified: 2022-01-02 23:38 UTC

See Also:
Kernel Version: 5.10+
Tree: Mainline
Regression: No


Attachments
The kernel's journal log (1.63 MB, text/plain)
2021-03-05 03:57 UTC, Jian-Hong Pan

Description Jian-Hong Pan 2021-03-05 03:57:24 UTC
Created attachment 295665
The kernel's journal log

We have a laptop equipped with an Intel N6000 and a G1J39E eMMC device.
Tested with the latest mainline kernel, the system always shows "mmc0: running CQE recovery" when writing to the eMMC, and the writes then fail:

Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery
Mar 04 17:22:44 endless kernel: ------------[ cut here ]------------
Mar 04 17:22:44 endless kernel: mmc0: cqhci: spurious TCN for tag 12
Mar 04 17:22:44 endless kernel: WARNING: CPU: 2 PID: 22 at drivers/mmc/host/cqhci-core.c:778 cqhci_irq+0x2af/0x660 [cqhci]
Mar 04 17:22:44 endless kernel: Modules linked in: x86_pkg_temp_thermal ucsi_acpi typec_ucsi efivarfs mmc_block nvme sdhci_pci cqhci nvme_core sdhci mmc_core
Mar 04 17:22:44 endless kernel: CPU: 2 PID: 22 Comm: kworker/2:0H Tainted: G     U            5.12.0-rc1+ #207
Mar 04 17:22:44 endless kernel: Hardware name: ASUSTeK COMPUTER INC. ASUS BR1100FKA BR1100FKA/BR1100FKA, BIOS BR1100FKA.213 01/22/2021
Mar 04 17:22:44 endless kernel: Workqueue: kblockd blk_mq_run_work_fn
Mar 04 17:22:44 endless kernel: RIP: 0010:cqhci_irq+0x2af/0x660 [cqhci]
Mar 04 17:22:44 endless kernel: Code: 3d 15 27 00 00 00 75 a2 48 8b 75 58 c6 05 08 27 00 00 01 48 85 f6 75 04 48 8b 75 08 89 da 48 c7 c7 00 f6 06 c0 e8 db f3 0c e0 <0f> 0b 49 63 75 24 e9 75 ff ff ff 41 80 7d 3c 00 74 25 41 8b 45 28
Mar 04 17:22:44 endless kernel: RSP: 0018:ffffa43240174e40 EFLAGS: 00010086
Mar 04 17:22:44 endless kernel: RAX: 0000000000000000 RBX: 000000000000000c RCX: ffffa28ceff17478
Mar 04 17:22:44 endless kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa28ceff17470
Mar 04 17:22:44 endless kernel: RBP: ffffa28985823000 R08: ffffffffa0d31f68 R09: 0000000000009ffb
Mar 04 17:22:44 endless kernel: R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000002
Mar 04 17:22:44 endless kernel: R13: ffffa28980da2728 R14: 0000000000000000 R15: 0000000000000000
Mar 04 17:22:44 endless kernel: FS:  0000000000000000(0000) GS:ffffa28ceff00000(0000) knlGS:0000000000000000
Mar 04 17:22:44 endless kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 04 17:22:44 endless kernel: CR2: 000056300d77b048 CR3: 000000010eadc000 CR4: 0000000000350ee0
Mar 04 17:22:44 endless kernel: Call Trace:
Mar 04 17:22:44 endless kernel:  <IRQ>
Mar 04 17:22:44 endless kernel:  ? try_to_wake_up+0x183/0x3f0
Mar 04 17:22:44 endless kernel:  sdhci_cqhci_irq+0x53/0x80 [sdhci_pci]
Mar 04 17:22:44 endless kernel:  sdhci_irq+0x1ba/0xca2 [sdhci]
Mar 04 17:22:44 endless kernel:  ? pci_bus_read_config_word+0x3e/0x60
Mar 04 17:22:44 endless kernel:  __handle_irq_event_percpu+0x32/0x140
Mar 04 17:22:44 endless kernel:  handle_irq_event+0x44/0xa0
Mar 04 17:22:44 endless kernel:  handle_fasteoi_irq+0x6e/0x190
Mar 04 17:22:44 endless kernel:  __common_interrupt+0x33/0x90
Mar 04 17:22:44 endless kernel:  common_interrupt+0x76/0xa0
Mar 04 17:22:44 endless kernel:  </IRQ>
Mar 04 17:22:44 endless kernel:  asm_common_interrupt+0x1e/0x40
Mar 04 17:22:44 endless kernel: RIP: 0010:dma_direct_map_sg+0x74/0x230
Mar 04 17:22:44 endless kernel: Code: 01 0f 84 d6 00 00 00 4c 8d 5c 0a ff 48 83 f9 ff 0f 84 c2 00 00 00 49 8b 86 38 02 00 00 4c 8b 08 49 8b 86 48 02 00 00 4d 85 c9 <74> 10 48 85 c0 0f 84 c9 00 00 00 4c 39 c8 49 0f 47 c1 49 39 c3 0f
Mar 04 17:22:44 endless kernel: RSP: 0018:ffffa4324016fc20 EFLAGS: 00000286
Mar 04 17:22:44 endless kernel: RAX: 0000000000000000 RBX: 0000000000000080 RCX: 00000001fa733000
Mar 04 17:22:44 endless kernel: RDX: 0000000000001000 RSI: 00000001fa733000 RDI: 0000000000000000
Mar 04 17:22:44 endless kernel: RBP: ffffa2898654c9a0 R08: 0000000000001000 R09: ffffffffffffffff
Mar 04 17:22:44 endless kernel: R10: ffffa289858b5800 R11: 00000001fa733fff R12: 0000000000000001
Mar 04 17:22:44 endless kernel: R13: 0000000000000000 R14: ffffa28980bf90c0 R15: 000000000000004d
Mar 04 17:22:44 endless kernel:  dma_map_sg_attrs+0x2b/0x40
Mar 04 17:22:44 endless kernel:  cqhci_request+0x1ae/0x4c0 [cqhci]
Mar 04 17:22:44 endless kernel:  ? mmc_queue_map_sg+0x39/0x53 [mmc_block]
Mar 04 17:22:44 endless kernel:  mmc_cqe_start_req+0x46/0x90 [mmc_core]
Mar 04 17:22:44 endless kernel:  mmc_blk_mq_issue_rq+0x589/0x950 [mmc_block]
Mar 04 17:22:44 endless kernel:  ? sbitmap_get+0x6c/0xe0
Mar 04 17:22:44 endless kernel:  mmc_mq_queue_rq+0x128/0x250 [mmc_block]
Mar 04 17:22:44 endless kernel:  blk_mq_dispatch_rq_list+0x10b/0x760
Mar 04 17:22:44 endless kernel:  __blk_mq_sched_dispatch_requests+0xc0/0x190
Mar 04 17:22:44 endless kernel:  blk_mq_sched_dispatch_requests+0x2b/0x50
Mar 04 17:22:44 endless kernel:  __blk_mq_run_hw_queue+0x28/0x60
Mar 04 17:22:44 endless kernel:  process_one_work+0x1c9/0x360
Mar 04 17:22:44 endless kernel:  worker_thread+0x48/0x3c0
Mar 04 17:22:44 endless kernel:  ? rescuer_thread+0x3b0/0x3b0
Mar 04 17:22:44 endless kernel:  kthread+0x113/0x130
Mar 04 17:22:44 endless kernel:  ? kthread_create_worker_on_cpu+0x60/0x60
Mar 04 17:22:44 endless kernel:  ret_from_fork+0x1f/0x30
Mar 04 17:22:44 endless kernel: ---[ end trace f5bcc237f3a1205b ]---
...
Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery
Mar 04 17:22:44 endless kernel: blk_update_request: I/O error, dev mmcblk0, sector 121908224 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0
Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery
Mar 04 17:22:44 endless kernel: blk_update_request: I/O error, dev mmcblk0, sector 121901056 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0
Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery
...

The full kernel journal log is uploaded as the attachment.
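
For a log of this size, it may help to count the recovery attempts and list the failing sectors rather than read the journal by eye. The following is a minimal sketch, assuming the journal has been saved as plain text in the format shown above; `summarize` and `IO_ERROR` are hypothetical names chosen for illustration, and the sample lines are copied from the excerpt above.

```python
import re

# Matches the block-layer error lines shown above, for example:
# "... blk_update_request: I/O error, dev mmcblk0, sector 121908224 op 0x1:(WRITE) ..."
IO_ERROR = re.compile(
    r"I/O error, dev ([^,\s]+), sector (\d+) op 0x[0-9a-f]+:\((\w+)\)"
)


def summarize(lines):
    """Count CQE recovery messages and collect (device, sector, op) per I/O error."""
    recoveries = 0
    errors = []
    for line in lines:
        if "running CQE recovery" in line:
            recoveries += 1
        match = IO_ERROR.search(line)
        if match:
            errors.append((match.group(1), int(match.group(2)), match.group(3)))
    return recoveries, errors


# Sample lines taken from the log excerpt in this report.
sample = [
    "Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery",
    "Mar 04 17:22:44 endless kernel: blk_update_request: I/O error, dev mmcblk0, "
    "sector 121908224 op 0x1:(WRITE) flags 0x0 phys_seg 128 prio class 0",
    "Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery",
    "Mar 04 17:22:44 endless kernel: blk_update_request: I/O error, dev mmcblk0, "
    "sector 121901056 op 0x1:(WRITE) flags 0x4000 phys_seg 128 prio class 0",
    "Mar 04 17:22:44 endless kernel: mmc0: running CQE recovery",
]

recoveries, errors = summarize(sample)
print(recoveries, errors)
```

To run it over the full attachment, pass an open file object to `summarize` instead of `sample`; a file object iterates line by line, so no other change is needed.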
Comment 1 harryharryharry 2021-12-06 16:52:55 UTC
I'm running into the same issue on an ASUS BR1100FKA. I found it can be circumvented by booting with a module quirk enabled for the sdhci module:
sdhci.debug_quirks2=0x90c

I'm not sure whether this quirk comes with a performance penalty and should be used only as a temporary workaround, or whether it is an actual fix that can be incorporated into the mainline kernel for N6000 (Jasper Lake) devices that need it.
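
To judge what the workaround changes, the value 0x90c can be split into the individual bit positions it sets. This is only a sketch: `set_bits` is a hypothetical helper, and it reports bit numbers only. The meaning of each bit has to be looked up among the SDHCI_QUIRK2_* definitions in the kernel source (drivers/mmc/host/sdhci.h), which are not reproduced here.

```python
def set_bits(mask: int) -> list[int]:
    """Return the positions of the bits set in mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


quirks2 = 0x90C  # value passed as sdhci.debug_quirks2 in the comment above
print(set_bits(quirks2))  # [2, 3, 8, 11]
```

Each listed position corresponds to one `(1 << n)` quirk flag, so testing the bits one at a time could narrow down which of them is actually needed to avoid the CQE recovery loop.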
Comment 2 harryharryharry 2022-01-02 23:38:18 UTC
This parameter also works: sdhci.debug_quirks=0x65168080

Source: https://community.nxp.com/t5/i-MX-Processors/eMMC-message-quot-mmc2-running-CQE-recovery-quot/m-p/1092607

I'm not sure, though, whether there is a difference between the two or what that difference might be.
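
Note that debug_quirks and debug_quirks2 are two separate parameters, so the two workarounds are not the same setting written differently. When comparing them, it can be useful to confirm which override is actually in effect on the running system. The sketch below assumes the sdhci parameters are readable under /sys/module/sdhci/parameters, as module parameters usually are; that path and the helper name `read_quirk_params` are assumptions, not something confirmed in this report.

```python
from pathlib import Path


def read_quirk_params(param_dir="/sys/module/sdhci/parameters"):
    """Return {name: value} for the quirk override parameters that can be read."""
    values = {}
    for name in ("debug_quirks", "debug_quirks2"):
        path = Path(param_dir) / name
        try:
            # Base 0 accepts both decimal and 0x-prefixed text.
            values[name] = int(path.read_text().strip(), 0)
        except (OSError, ValueError):
            # Missing file, no permission, or unexpected content: skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_quirk_params().items():
        print(f"{name} = {value:#x}")
```

A value of 0 would suggest that no override was given for that parameter.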
