Bug 106331
Summary: | MMC times out during installation (Surface 3) | ||
---|---|---|---|
Product: | Drivers | Reporter: | Bastien Nocera (bugzilla) |
Component: | MMC/SD | Assignee: | drivers_mmc-sd |
Status: | NEW --- | ||
Severity: | normal | CC: | andrea, dan, stanislav.v.v, stephenjust |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.3 rc5 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
4.3 rc5 kernel full dmesg with timeouts
dmesg Lenovo Ideapad 120s |
Description
Bastien Nocera
2015-10-20 09:42:35 UTC
Hi We are seeing similar symptoms on another Cherry Trail platform, the UP board [1], which has an Intel Atom X5-Z8350 SoC. Some details from the kernel log [ 2.291332] mmc0: SDHCI controller on ACPI [80860F14:00] using ADMA ... [ 2.423293] mmc0: new HS200 MMC card at address 0001 [ 2.428541] mmcblk0: mmc0:0001 M52564 58.3 GiB [ 2.428744] mmcblk0boot0: mmc0:0001 M52564 partition 1 4.00 MiB [ 2.428915] mmcblk0boot1: mmc0:0001 M52564 partition 2 4.00 MiB [ 2.429072] mmcblk0rpmb: mmc0:0001 M52564 partition 3 4.00 MiB [ 2.431859] mmcblk0: p1 p2 p3 ... The eMMC flash part is a 64GB Kingston M525 chip. We're currently seeing this issue on a v4.4 kernel, and have also reproduced on the v4.8 kernel. In normal use, it occurs at random, and not frequently. But if we run a test with intensive disk I/O and CPU/GPU activity, it can be reproduced quite reliably in less than 30 minutes. The pattern seems to be that we see timeouts during tuning, which occur from time to time and mostly appear harmless. But, on some occasions, this timeout is followed 10 seconds later by a "Timeout waiting for hardware interrupt". This is the nasty one, because it typically causes some write operation to fail and can be fatal for the application. [103887.417739] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock [103897.561325] mmc0: Timeout waiting for hardware interrupt. [103897.561344] sdhci: =========== REGISTER DUMP (mmc0)=========== [103897.561352] sdhci: Sys addr: 0x00000400 | Version: 0x00001002 [103897.561358] sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000400 [103897.561363] sdhci: Argument: 0x01180800 | Trn mode: 0x0000002b [103897.561368] sdhci: Present: 0x1fff0001 | Host ctl: 0x00000035 [103897.561374] sdhci: Power: 0x0000000b | Blk gap: 0x00000080 [103897.561379] sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 [103897.561390] sdhci: Timeout: 0x00000008 | Int stat: 0x00000000 [103897.561395] sdhci: Int enab: 0x02ff000b | Sig enab: 0x02ff000b [103897.561401] sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000 [103897.561406] sdhci: Caps: 0x446cc8b2 | Caps_1: 0x00000807 [103897.561412] sdhci: Cmd: 0x0000193a | Max curr: 0x00000000 [103897.561416] sdhci: Host ctl2: 0x0000000b [103897.561421] sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x36400200 [103897.561424] sdhci: =========================================== [103897.565662] mmcblk0: error -110 sending stop command, original cmd response 0x0, card status 0x400900 [103897.565704] mmcblk0: error -110 transferring data, sector 18352128, nr 1024, cmd response 0x0, card status 0x0 It doesn't occur in quite the same way as described in the original ticket, so it may be a different issue, but perhaps its related. It's interesting that it is occurring on a very similar platform. We've tried various experiments to try to find a workaround/solution on the UP board. In our case, removing the commits mentioned in the original message didn't help. Increasing the command timeout (the one that triggers the hardware interrupt timeout error) by 50ms didn't help either. The only thing that seems to make a real difference is increasing the timeout for the tuning, from 50ms to 100ms. It perhaps hints that not leaving enough time before cleaning up after the failed tuning can have a knock-on effect on subsequent commands? Has this issue been observed on any other Cherry Trail platforms? [1] http://www.up-board.org/up/ Got this on Lenovo Ideapad 120S: Tried kernels from Debian distribution: 4.9.110, 4.19.0-rc2 with same result. dmesg: ``` [ 7487.305755] JBD2: Detected IO errors while flushing file data on mmcblk0p3-8 [ 7497.466460] mmc0: Timeout waiting for hardware interrupt. [ 7497.466475] mmc0: sdhci: ============ SDHCI REGISTER DUMP =========== [ 7497.466485] mmc0: sdhci: Sys addr: 0x00000030 | Version: 0x00001002 [ 7497.466492] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000030 [ 7497.466500] mmc0: sdhci: Argument: 0x018ce3f0 | Trn mode: 0x0000002b [ 7497.466507] mmc0: sdhci: Present: 0x1fff0001 | Host ctl: 0x0000003d [ 7497.466514] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080 [ 7497.466521] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007 [ 7497.466529] mmc0: sdhci: Timeout: 0x00000006 | Int stat: 0x00000000 [ 7497.466536] mmc0: sdhci: Int enab: 0x02ff000b | Sig enab: 0x02ff000b [ 7497.466543] mmc0: sdhci: AC12 err: 0x00000000 | Slot int: 0x00000000 [ 7497.466551] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x00000807 [ 7497.466558] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000 [ 7497.466565] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0x00000000 [ 7497.466572] mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00001900 [ 7497.466578] mmc0: sdhci: Host ctl2: 0x0000008b [ 7497.466587] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x000000017442b200 [ 7497.466591] mmc0: sdhci: ============================================ [ 7497.472074] mmc0: cache flush error -110 [ 7497.472086] print_req_error: I/O error, dev mmcblk0, sector 26010656 [ 7497.472104] Aborting journal on device mmcblk0p3-8. [ 7497.526410] mmc0: Tuning timeout, falling back to fixed sampling clock [ 7497.529355] EXT4-fs error (device mmcblk0p3): ext4_journal_check_start:61: Detected aborted journal [ 7497.529361] EXT4-fs (mmcblk0p3): Remounting filesystem read-only ``` hardware: ``` [ 0.804066] pci 0000:00:1c.0: [8086:5acc] type 00 class 0x080501 [ 0.804102] pci 0000:00:1c.0: reg 0x10: [mem 0x91327000-0x91327fff 64bit] [ 0.804120] pci 0000:00:1c.0: reg 0x18: [mem 0x91328000-0x91328fff 64bit] [ 2.424201] sdhci: Secure Digital Host Controller Interface driver [ 2.424204] sdhci: Copyright(c) Pierre Ossman [ 2.426056] sdhci-pci 0000:00:1c.0: SDHCI controller found [8086:5acc] (rev b) [ 2.437453] mmc0: SDHCI controller on PCI [0000:00:1c.0] using ADMA 64-bit [ 2.560657] mmc0: new HS200 MMC card at address 0001 [ 2.566365] mmcblk0: mmc0:0001 DF4032 29.1 GiB [ 2.566956] mmcblk0boot0: mmc0:0001 DF4032 partition 1 4.00 MiB [ 2.567539] mmcblk0boot1: mmc0:0001 DF4032 partition 2 4.00 MiB [ 2.568126] mmcblk0rpmb: mmc0:0001 DF4032 partition 3 4.00 MiB, chardev (246:0) [ 2.572339] mmcblk0: p1 p2 p3 p4 ``` full dmesg: https://pastebin.com/7jfaEjkk Created attachment 278403 [details]
dmesg Lenovo Ideapad 120s
attach full dmesg for 4.19.0-rc2
|