Bug 150881
Summary: | genirq: Flags mismatch irq 8, 00000088 (mmc0) vs. 00000080 (rtc0). mmc0: Failed to request irq 8: -16 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Dmitry (nrndda) |
Component: | MMC/SD | Assignee: | drivers_mmc-sd |
Status: | RESOLVED CODE_FIX | ||
Severity: | blocking | CC: | andy.shevchenko, bastienphilbert, hugh, jbMacBrodie, jwrdegoede, nrndda, regressions, sergei.a.trusov |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.8-rc0 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
bisect.log, Test Patch, kernel's .config, sysrq cpus backtraces, git diff in linux tree, /proc/interrupts, lsmod, dmesg, Merrfield Verfity Patch, dmesg with new printk for mrfld gpio driver, Gpio Test Patch, dracut log, Verify Code Reach, with patch, /proc/interrupts, dmesg for 4.7.1 with genirq error, bisect log, dmesg with rd.break=mount, dracut logs, sdhci_acpi_emmc_probe_slot error |
Description
Dmitry
2016-08-01 15:22:47 UTC
Did you bisect to that commit, or are you just assuming that it's a bad commit? If this was a guess, please actually bisect to the commit causing your regression.
Created attachment 227931 [details]
bisect.log
Just finished bisecting:
# first bad commit: [1cd04d293c818687795b83cd8f2626bd4662feeb] Merge tag 'gpio-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
I'll try now master again and then revert this merge commit.
Yes, I can boot after reverting the gpio branch merge. So it's definitely a regression, but in a different kernel subsystem. I don't know why IRQ numbers are involved. I can't even get to the dracut shell after this error.
What board and driver are you using for gpio?
cat /sys/class/gpio/*/device/modalias
acpi:INT33FD:INT33FD:
acpi:INT33FC:INT33FC:
acpi:INT33FC:INT33FC:
acpi:INT33FC:INT33FC:
The first one is Intel's Atom Crystal Cove.
See if this patch helps fix your issue.
Created attachment 227961 [details]
Test Patch
Thanks for trying, but no luck. The same error. I'll try to get a useful backtrace or whatever.
OK, but can you also send me the lsmod output for the drivers you're running?
Created attachment 227971 [details]
kernel's .config
It's useless, as I have a self-compiled vanilla kernel with a couple of patches for wifi and SoC buttons.
I can no longer debug this, as you have proprietary modules. Sorry, please close this bug as invalid unless you are able to run without those modules.
Created attachment 227981 [details]
sysrq cpus backtraces
This could be useful. That's all I got from sysrq...
I have only virtualbox modules. Nothing else. Also some experimental i915 parameters.
Created attachment 227991 [details]
git diff in linux tree
I can test without patches or external modules if that will help.
Can you? That would be very helpful.
Sorry for the long pause. I rebuilt the kernel with all modules in the initramfs and blacklisted the vbox modules. It seems everything works now. I then modprobed the vbox modules and everything is still OK.
I found the problem: if I build rtc_cmos into the kernel, the kernel won't boot and shows this error. Also, rtc_cmos and mmc0 (rootfs) share one IRQ:
8: 2326 0 0 0 IO-APIC 47-fasteoi mmc0, rtc0
So there is an issue, but now I build the kernel with an initramfs and with the rtc_cmos and mmc drivers as modules, and it works fine.
Created attachment 228341 [details]
/proc/interrupts
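The shared line quoted above (IRQ 8 with both mmc0 and rtc0 attached) can be picked apart mechanically. The following is only a sketch in Python, assuming four CPU count columns as in that line and a single-token chip name; real /proc/interrupts layouts vary between kernels and machines, and `parse_interrupt_line` is a hypothetical helper, not part of any tool used in this report.

```python
def parse_interrupt_line(line, ncpus):
    """Split one /proc/interrupts row into its parts.

    Assumes: "<irq>: <count per cpu>... <chip> <hwirq-type> <action>[, <action>...]"
    with a chip name that contains no spaces (an assumption of this sketch).
    """
    label, rest = line.split(":", 1)
    fields = rest.split()
    counts = [int(x) for x in fields[:ncpus]]
    chip = fields[ncpus]
    hwirq = fields[ncpus + 1]
    # Everything after the hwirq column is the comma-separated list of handlers.
    actions = [a.strip() for a in " ".join(fields[ncpus + 2:]).split(",") if a.strip()]
    return label.strip(), counts, chip, hwirq, actions


# The line reported in this bug.
line = "8: 2326 0 0 0 IO-APIC 47-fasteoi mmc0, rtc0"
irq, counts, chip, hwirq, actions = parse_interrupt_line(line, ncpus=4)

if len(actions) > 1:
    print(f"IRQ {irq} is shared by: {', '.join(actions)}")
```

More than one action on a line means the handlers have to agree on how the interrupt is requested, which is what the genirq message in the summary complains about.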
Created attachment 228351 [details]
lsmod
Created attachment 228361 [details]
dmesg
Created attachment 228481 [details]
Merrfield Verfity Patch
I want to see if you're indeed using the new Merrifield driver to probe your device under the configuration that is currently causing this regression, as the driver was introduced this merge window for devices similar to your Crystal Cove board.
I don't have X86_INTEL_MID enabled, so there is no X86_INTEL_MID or Merrifield in .config. There is no mrfld in dmesg.
Created attachment 228511 [details]
dmesg with new printk for mrfld gpio driver
See if the below helps or triggers any new warnings.
Created attachment 228581 [details]
Gpio Test Patch
Created attachment 229001 [details]
dracut log
Nothing showed, but I've managed to get a backtrace from this error.
So the kernel is fully functional after this error, there is just no root (on mmc0). Systemd hangs after not being able to mount root.
OK, run the patch below and send me your dracut log. This patch is not meant to fix the issue, but to verify where the IRQs are being shared incorrectly by the GPIO core subsystem changes.
Created attachment 229141 [details]
Verify Code Reach
Created attachment 229371 [details]
with patch
Nothing showed up.
Created attachment 229381 [details]
/proc/interrupts
Also, I've migrated my root from ext4 to f2fs and I see the same error in 4.7.1. I don't know what has changed, but it's very strange. The interrupts are different for 4.7.1 compared to 4.8.
Created attachment 229391 [details]
dmesg for 4.7.1 with genirq error
Dmitry, can you please confirm whether this is a regression in 4.8? Did you find the exact commit (i.e. not a merge commit) with the change that introduced the regression? Is it still present in 4.8?
Yes, it's a regression, introduced in the gpio-v4.8-1 merge. I'm now bisecting within this merge. As soon as I get a result I'll post it. And yes, it's still present. The workaround is to build rtc_cmos as a module.
Created attachment 230991 [details]
bisect log
I haven't succeeded in bisecting.
Linux 4.8-rc4 gives me the same error.
I also found another error and a clue. I have two mmc buses: 1 - on the motherboard, with a soldered memory chip and wifi; 2 - with an external slot. The first one works, but only after a huge pause, and the second one does not work at all. The error shows up in the function sdhci_acpi_emmc_probe_slot+0x70/0x70.
Created attachment 231001 [details]
dmesg with rd.break=mount
Created attachment 231011 [details]
dracut logs
Created attachment 231021 [details]
sdhci_acpi_emmc_probe_slot error
Comment on attachment 231021 [details]
sdhci_acpi_emmc_probe_slot error
Oh, it's already in dmesg.
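For readers trying to interpret the genirq message in the summary: the two hex values are the flags each driver passed when requesting the interrupt, and -16 is -EBUSY. The sketch below decodes them in Python. The flag constants are copied from the kernel's include/linux/interrupt.h as I understand them, and `compatible` is a simplified model of the sharing check in the IRQ core, not the kernel's actual code.

```python
# Flag values as defined in include/linux/interrupt.h (copied by hand here).
IRQF_TRIGGER_RISING = 0x01
IRQF_TRIGGER_FALLING = 0x02
IRQF_TRIGGER_HIGH = 0x04
IRQF_TRIGGER_LOW = 0x08
IRQF_TRIGGER_MASK = 0x0F
IRQF_SHARED = 0x80
IRQF_ONESHOT = 0x2000

FLAG_NAMES = {
    IRQF_TRIGGER_RISING: "IRQF_TRIGGER_RISING",
    IRQF_TRIGGER_FALLING: "IRQF_TRIGGER_FALLING",
    IRQF_TRIGGER_HIGH: "IRQF_TRIGGER_HIGH",
    IRQF_TRIGGER_LOW: "IRQF_TRIGGER_LOW",
    IRQF_SHARED: "IRQF_SHARED",
    IRQF_ONESHOT: "IRQF_ONESHOT",
}


def decode(flags):
    """Return the names of the known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if flags & bit]


def compatible(old, new):
    """Simplified model: both must be shared, with identical trigger and oneshot bits."""
    both_shared = bool((old & new) & IRQF_SHARED)
    same_trigger = not ((old ^ new) & IRQF_TRIGGER_MASK)
    same_oneshot = not ((old ^ new) & IRQF_ONESHOT)
    return both_shared and same_trigger and same_oneshot


mmc0, rtc0 = 0x00000088, 0x00000080
print("mmc0:", decode(mmc0))
print("rtc0:", decode(rtc0))
print("can share:", compatible(rtc0, mmc0))
```

Under this model both drivers ask for a shared line, but mmc0 also specifies a low-level trigger while rtc0 specifies none, so the second request is refused.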
(In reply to Dmitry from comment #2)
> Created attachment 227931 [details]
> bisect.log
>
> Just finished bisecting:
> # first bad commit: [1cd04d293c818687795b83cd8f2626bd4662feeb] Merge tag
> 'gpio-v4.8-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
>
> I'll try now master again and then revert this merge commit.
I can confirm the bisect using only kernel.org source. My problem is accessing the internal SDHC card. My system uses a Z3775 processor (ASUS T100CHI). My /home is on the SD card and it hangs booting 4.8-rc1 through rc4. If I move my SD card to an external USB SD card reader, boot is successful.
[I'll try making rtc-cmos and mmc modules; I just need a workaround besides the external SD reader.]
This dmesg fragment (from a T100TA) is present when the bug is active:
genirq: Flags mismatch irq8. 00000088 (mmc0) vs. 00000080 (rtc0)
mmc0: Failed to request IRQ 8: -16
sdhci-acpi: probe of 80860F14:01 failed with error -16
The problem also affects the Toshiba Click Mini (Z3735F); an external card reader also works there.
It looks like this affects multiple Baytrail processors, and the exact symptoms depend on the hardware implementation.
I can also confirm the module workaround. I had to hunt a bit to find the settings:
DeviceDrivers->RealTimeClock->PC-style 'CMOS' <M>
DeviceDrivers->MMC/SD/SDIO card support->MMC block device driver <M>
My dmesg fragment from the T100CHI (Z3775):
[ 3.246833] sdhci: Copyright(c) Pierre Ossman
[ 3.260548] hidraw: raw HID events driver (C) Jiri Kosina
[ 3.266972] genirq: Flags mismatch irq 8. 00000088 (mmc0) vs. 00000080 (rtc0)
[ 3.267073] mmc0: Failed to request IRQ 8: -16
[ 3.267153] sdhci-acpi: probe of 80860F14:01 failed with error -16
[ 3.271907] mmc0: SDHCI controller on ACPI [INT33BB:00] using ADMA
I can test patches and run diagnostics, if that would help. I don't have Dmitry's grasp of the hardware or his debugging skills, and I'm a rookie at git and bisecting.
Dmitry, I owe you an apology. My comments were meant to support your efforts.
Instead, they could be seen as offensive. That was not my intent. Please accept my apology. I am grateful for your work to isolate this problem and especially grateful for the workaround. Thank you.
jbMacAZ, never mind, it's all right.
Tried with the latest rc5 and the bug is still there. There is another note: right after the genirq error message I can reboot with ctrl-alt-del and switch between splash and console, so the keyboard is working. But after some time (a couple of seconds) I can't do this any more. Only sysrq's reboot works; the other sysrq functions are not usable. So the kernel is frozen at this point, or is not answering other IRQ requests.
P.S. I forgot that this could be connected with bug 109051, because I set max_cstate in a systemd service after the initramfs switches root. I'll check this.
Adding intel_idle.max_cstate=1 to the cmdline prevents the freezing but not this bug.
3 of 4 cpus are in the idle state and the last one is handling an irq. The last functions before the irq from the keyboard are:
handle_irq_event -> handle_irq_event_percpu -> add_interrupt_randomness -> __handle_irq_event__precpu -> credit_entropy_bits
Then comes xhci_irq from the sysrq keys. The credit_entropy_bits call sometimes disappears from the backtrace. So the cpu is stuck in __handle_irq_event__precpu.
FWIW, there were three commits on Friday that might be related (but I'm not sure; I have no idea if those chips are in your system):
https://git.kernel.org/torvalds/c/c6c864993d9a20f8d7cacb4feaac5c46a2f2e4db
https://git.kernel.org/torvalds/c/56beac95cb88c188d2a885825a5da131edb41fe3
https://git.kernel.org/torvalds/c/60f749f8e4cfdfffa5f29c966050ed680eeedac2
The problem still affects 4.8.4 and 4.9-rc2. A newly discovered symptom is broken backlight control, which the module workaround fixes.
This patch helped to solve a similar flag mismatch:
https://patchwork.kernel.org/patch/6118791/
(In reply to Sergei Trusov from comment #47)
> This patch helped to solve a similar flag mismatch
> https://patchwork.kernel.org/patch/6118791/
Thanks.
That old patch does seem to fix the SD card / IRQ 8 allocation for the Baytrail Asus T100CHI. Tested in kernel 4.8.12. Without the patch, if the PC-style CMOS RTC module is built-in, booting hangs (/home is on the SD card).
(In reply to jbMacAZ from comment #48)
> (In reply to Sergei Trusov from comment #47)
> > This patch helped to solve a similar flag mismatch
> > https://patchwork.kernel.org/patch/6118791/
>
> Thanks. That old patch does seem to fix the SD card/ IRQ8 allocation for
> baytrail Asus T100CHI. Tested in kernel 4.8.12. Without the patch, if the
> PC style cmos RTC module is built-in, booting hangs (/home is on SD card).
The patch no longer works as of 4.8.13. The .config workaround is still effective (make the CMOS RTC a module rather than built-in).
Apparently this (IRQ 8 hardcoded for RTC) is an old problem from about 2 years ago that has reappeared in 4.8.
I retested the patch (comment #47) with 4.10.4 and it is effective. I don't know why it failed in 4.8.13. Does anyone understand why this patch was reverted for 4.8.0-rc1?
This should be fixed by this commit:
https://github.com/jwrdegoede/linux-sunxi/commit/556673a6b122b16e57b4fcbb607e120de3f71f90
Which should land upstream in 4.12-rc1.
(In reply to Hans de Goede from comment #51)
> This should be fixed by this commit:
>
> https://github.com/jwrdegoede/linux-sunxi/commit/
> 556673a6b122b16e57b4fcbb607e120de3f71f90
>
> Which should land upstream in 4.12-rc1.
I've applied the patch to 4.11-rc6, replacing the patch from comment #47. I also restored the two .config params to built-in (comment #41). I am able to boot and run (Mint 18.1, T100CHI Baytrail Z3775). Booting means that the kernel is able to read /home from the SD card (my main symptom: no SD card reader). SDIO wifi seems to be throwing fewer errors with this change, but...
Thanks! |
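For anyone on a kernel without the fix who still relies on the .config workaround from this thread, a quick check of a kernel .config might look like the sketch below. It assumes that the two menu entries named earlier correspond to the Kconfig symbols CONFIG_RTC_DRV_CMOS and CONFIG_MMC_BLOCK (my mapping, not stated in the report), and `sample_config` is made-up text rather than anyone's attached .config.

```python
# Assumed symbol names for "PC-style 'CMOS'" and "MMC block device driver".
WATCHED = ("CONFIG_RTC_DRV_CMOS", "CONFIG_MMC_BLOCK")


def config_state(config_text, symbol):
    """Return 'y', 'm', or 'n' for a Kconfig symbol in .config text."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(f"{symbol}="):
            return line.split("=", 1)[1]
    return "n"  # a symbol that is absent is treated as disabled


def built_in(config_text):
    """List the watched symbols that are compiled in rather than modular."""
    return [s for s in WATCHED if config_state(config_text, s) == "y"]


# Hypothetical .config excerpt for illustration only.
sample_config = """\
CONFIG_RTC_DRV_CMOS=y
CONFIG_MMC_BLOCK=m
# CONFIG_RTC_DRV_DS1307 is not set
"""

for symbol in built_in(sample_config):
    print(f"{symbol} is built-in; the workaround in this thread sets it to 'm'")
```

To run it against a real tree, read the file with `open(".config").read()` and pass the text to `built_in`.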