Bug 150881

Summary: genirq: Flags mismatch irq 8, 00000088 (mmc0) vs. 00000080 (rtc0). mmc0: Failed to request irq 8: -16
Product: Drivers Reporter: Dmitry (nrndda)
Component: MMC/SD Assignee: drivers_mmc-sd
Status: RESOLVED CODE_FIX    
Severity: blocking CC: andy.shevchenko, bastienphilbert, hugh, jbMacBrodie, jwrdegoede, nrndda, regressions, sergei.a.trusov
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.8-rc0 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: bisect.log
Test Patch
kernel's .config
sysrq cpus backtraces
git diff in linux tree
/proc/interrupts
lsmod
dmesg
Merrfield Verfity Patch
dmesg with new printk for mrfld gpio driver
Gpio Test Patch
dracut log
Verify Code Reach
with patch
/proc/interrupts
dmesg for 4.7.1 with genirq error
bisect log
dmesg with rd.break=mount
dracut logs
sdhci_acpi_emmc_probe_slot error

Description Dmitry 2016-08-01 15:22:47 UTC
Intel Baytrail Z3770 with sdhci-acpi fails to boot (or boots but hangs afterwards) with this error. Found similar error for tty/hvc in this commit: bbc3dfe8805de86874b1a1b1429a002e8670043e
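
As an illustration (not part of the original report), the two flag words and the error code in the summary can be decoded with a few lines of Python. This is a minimal sketch that assumes the `IRQF_*` bit values from the kernel's include/linux/interrupt.h and that -16 is a negated errno value; the helper name `decode_irq_flags` is made up for this example.

```python
import errno

# Bit values as defined in include/linux/interrupt.h (only the ones relevant here).
IRQF_BITS = {
    0x01: "IRQF_TRIGGER_RISING",
    0x02: "IRQF_TRIGGER_FALLING",
    0x04: "IRQF_TRIGGER_HIGH",
    0x08: "IRQF_TRIGGER_LOW",
    0x80: "IRQF_SHARED",
}


def decode_irq_flags(flags):
    """Return the names of the known bits set in flags; unknown bits are shown as hex."""
    names = [name for bit, name in IRQF_BITS.items() if flags & bit]
    unknown = flags & ~sum(IRQF_BITS)
    if unknown:
        names.append(hex(unknown))
    return names


print("mmc0:", decode_irq_flags(0x88))
print("rtc0:", decode_irq_flags(0x80))
print("error 16:", errno.errorcode[16])
```

Read this way, mmc0 asks for a shared, level-low interrupt, rtc0 holds a shared interrupt with no trigger type set, and -16 corresponds to EBUSY.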
Comment 1 [account disabled by administrator] 2016-08-06 04:58:52 UTC
Did you bisect to that commit, or are you just assuming that it's a bad commit? If this was a guess, please actually bisect to the commit causing your regression.
Comment 2 Dmitry 2016-08-08 09:20:33 UTC
Created attachment 227931 [details]
bisect.log

Just finished bisecting:
# first bad commit: [1cd04d293c818687795b83cd8f2626bd4662feeb] Merge tag 'gpio-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

I'll now try master again and then revert this merge commit.
Comment 3 Dmitry 2016-08-08 12:22:50 UTC
Yes, I can boot after reverting the gpio branch merge. So it's definitely a regression, but in a different kernel subsystem.
I don't know why IRQ numbers are involved. I can't even get to the dracut shell after this error.
Comment 4 [account disabled by administrator] 2016-08-08 14:56:24 UTC
What board and driver are you using for gpio?
Comment 5 Dmitry 2016-08-08 15:05:37 UTC
cat /sys/class/gpio/*/device/modalias 
acpi:INT33FD:INT33FD:
acpi:INT33FC:INT33FC:
acpi:INT33FC:INT33FC:
acpi:INT33FC:INT33FC:

The first one is Intel's Atom Crystal Cove.
Comment 6 [account disabled by administrator] 2016-08-08 16:29:02 UTC
See if this patch helps fix your issue.
Comment 7 [account disabled by administrator] 2016-08-08 16:29:13 UTC
Created attachment 227961 [details]
Test Patch
Comment 8 Dmitry 2016-08-08 16:59:48 UTC
Thanks for trying, but no luck: the same error. I'll try to get a useful backtrace or whatever else I can.
Comment 9 [account disabled by administrator] 2016-08-08 17:25:10 UTC
OK, but can you also send me the lsmod output for the drivers you're running?
Comment 10 Dmitry 2016-08-08 18:28:04 UTC
Created attachment 227971 [details]
kernel's .config

It's useless, as I have a self-compiled vanilla kernel with a couple of patches for wifi and SoC buttons.
Comment 11 [account disabled by administrator] 2016-08-08 18:31:12 UTC
I can no longer debug this, as you have proprietary modules. Sorry, please close this bug as invalid unless you are able to run without those modules.
Comment 12 Dmitry 2016-08-08 18:31:50 UTC
Created attachment 227981 [details]
sysrq cpus backtraces

This could be useful. That's all I got from sysrq...
Comment 13 Dmitry 2016-08-08 18:33:02 UTC
I have only virtualbox modules. Nothing else.. Also some experimental i915 parameters.
Comment 14 Dmitry 2016-08-08 18:35:01 UTC
Created attachment 227991 [details]
git diff in linux tree
Comment 15 Dmitry 2016-08-08 18:37:11 UTC
I can test without patches or external modules if it will help..
Comment 16 [account disabled by administrator] 2016-08-08 19:16:08 UTC
Can you? That would be very helpful.
Comment 17 Dmitry 2016-08-11 14:44:01 UTC
Sorry for the long pause. I rebuilt the kernel with all modules in the initramfs and blacklisted the vbox modules. It seems everything works now. After modprobing the vbox modules, everything is still OK.
Found the problem: if I build rtc_cmos into the kernel, the kernel won't boot and shows this error. Also, rtc_cmos and mmc0 (rootfs) share one IRQ:
   8: 2326 0 0 0 IO-APIC 47-fasteoi mmc0, rtc0
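
As an aside for anyone checking their own machine, a line like the one above can be picked apart with a short Python sketch. It assumes the layout seen in this sample (IRQ number, per-CPU counters, chip name, hwirq-flow token, then a comma-separated list of handlers); other kernels or architectures may format /proc/interrupts differently, and `parse_interrupt_line` is a hypothetical helper name.

```python
def parse_interrupt_line(line):
    """Split one numeric-IRQ line of /proc/interrupts into (irq, counts, actions)."""
    label, _, rest = line.partition(":")
    tokens = rest.split()
    counts = []
    # Leading all-digit tokens are the per-CPU interrupt counters.
    while tokens and tokens[0].isdigit():
        counts.append(int(tokens.pop(0)))
    # Assumed layout after the counters: <chip> <hwirq-flow> <action>[, <action>...]
    actions = [a.strip() for a in " ".join(tokens[2:]).split(",") if a.strip()]
    return int(label), counts, actions


irq, counts, actions = parse_interrupt_line(
    "   8: 2326 0 0 0 IO-APIC 47-fasteoi mmc0, rtc0"
)
if len(actions) > 1:
    print(f"IRQ {irq} is shared by: {', '.join(actions)}")
```

More than one handler name on a line means the IRQ is shared, which is the case for mmc0 and rtc0 here.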

So there is an issue, but now I build the kernel with an initramfs and with the rtc_cmos and mmc drivers as modules, and it works fine.
Comment 18 Dmitry 2016-08-11 14:44:31 UTC
Created attachment 228341 [details]
/proc/interrupts
Comment 19 Dmitry 2016-08-11 14:44:56 UTC
Created attachment 228351 [details]
lsmod
Comment 20 Dmitry 2016-08-11 14:45:54 UTC
Created attachment 228361 [details]
dmesg
Comment 21 [account disabled by administrator] 2016-08-12 03:39:37 UTC
Created attachment 228481 [details]
Merrfield Verfity Patch
Comment 22 [account disabled by administrator] 2016-08-12 03:41:21 UTC
I want to see whether you're indeed using the new Merrifield driver to probe your device under the configuration that is currently causing this regression, as that driver was introduced this merge window for devices similar to your Crystal Cove board.
Comment 23 Dmitry 2016-08-13 08:24:19 UTC
I don't have X86_INTEL_MID enabled, so neither X86_INTEL_MID nor merrifield is in .config.
There is no mrfld in dmesg.
Comment 24 Dmitry 2016-08-13 08:27:05 UTC
Created attachment 228511 [details]
dmesg with new printk for mrfld gpio driver
Comment 25 [account disabled by administrator] 2016-08-13 17:44:05 UTC
See if the below helps or triggers any new warnings.
Comment 26 [account disabled by administrator] 2016-08-13 17:44:26 UTC
Created attachment 228581 [details]
Gpio Test Patch
Comment 27 Dmitry 2016-08-16 07:32:04 UTC
Created attachment 229001 [details]
dracut log

Nothing showed, but I've managed to get a backtrace from this error.
So the kernel is fully functional after this error, there is just no root (on mmc0). Systemd hangs after not being able to mount root.
Comment 28 [account disabled by administrator] 2016-08-16 17:44:02 UTC
OK, run the patch below and send me your dracut log. This patch is not meant to fix the issue, but to verify where the IRQs are being shared incorrectly by the GPIO core subsystem changes.
Comment 29 [account disabled by administrator] 2016-08-16 17:44:29 UTC
Created attachment 229141 [details]
Verify Code Reach
Comment 30 Dmitry 2016-08-19 10:44:35 UTC
Created attachment 229371 [details]
with patch

Nothing showed up.
Comment 31 Dmitry 2016-08-19 10:46:52 UTC
Created attachment 229381 [details]
/proc/interrupts

Also, I've migrated my root from ext4 to f2fs, and I see the same error in 4.7.1. I don't know what has changed, but it's very strange. The interrupts are different for 4.7.1 compared to 4.8.
Comment 32 Dmitry 2016-08-19 10:47:44 UTC
Created attachment 229391 [details]
dmesg for 4.7.1 with genirq error
Comment 33 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-08-28 06:48:23 UTC
Dmitry, can you please confirm whether this is a regression in 4.8? Did you find the exact commit (i.e. not a merge commit) with the change that introduced the regression? Is it still present in 4.8?
Comment 34 Dmitry 2016-08-28 11:55:00 UTC
Yes, it's a regression. It was introduced in the gpio-v4.8-1 merge. I'm now bisecting within this merge. As soon as I get a result I'll post it. And yes, it's still present. The workaround is to build rtc_cmos as a module.
Comment 35 Dmitry 2016-08-29 07:36:21 UTC
Created attachment 230991 [details]
bisect log

I haven't succeeded in bisecting.
Linux 4.8-rc4 gives me the same error.
Also, I found another error and a clue. I have two mmc buses: 1 - on the motherboard, with the soldered memory chip and wifi; 2 - with the external slot. The first one works, but only after a huge pause, and the second one does not work at all. The error shows up in function: sdhci_acpi_emmc_probe_slot+0x70/0x70.
Comment 36 Dmitry 2016-08-29 07:37:08 UTC
Created attachment 231001 [details]
dmesg with rd.break=mount
Comment 37 Dmitry 2016-08-29 07:37:34 UTC
Created attachment 231011 [details]
dracut logs
Comment 38 Dmitry 2016-08-29 07:38:21 UTC
Created attachment 231021 [details]
sdhci_acpi_emmc_probe_slot error
Comment 39 Dmitry 2016-08-29 07:40:05 UTC
Comment on attachment 231021 [details]
sdhci_acpi_emmc_probe_slot error

Oh, it's already in dmesg.
Comment 40 jbMacAZ 2016-09-04 20:49:48 UTC
(In reply to Dmitry from comment #2)
> Created attachment 227931 [details]
> bisect.log
> 
> Just finished bisecting:
> # first bad commit: [1cd04d293c818687795b83cd8f2626bd4662feeb] Merge tag
> 'gpio-v4.8-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
> 
> I'll try now master again and then revert this merge commit.

I can confirm the bisect using only kernel.org source.  My problem is accessing the internal SDHC card.  My system uses a Z3775 processor (ASUS T100CHI).  My /home is on the SD card and it hangs booting 4.8-rc1 through rc4.  If I move my SD card to an external USB SD card reader, boot is successful.  [I'll try building rtc-cmos and mmc as modules - I just need a workaround besides the external SD reader]

This dmesg fragment (from a T100TA) is present when the bug is active:

genirq: Flags mismatch irq8. 00000088 (mmc0) vs. 00000080 (rtc0)
mmc0: Failed to request IRQ 8: -16
sdhci-acpi: probe of 80860F14:01 failed with error -16

The problem also affects the Toshiba Click Mini (Z3735F); an external card reader works there as well.

It looks like this affects multiple Baytrail processors, and the exact symptoms depend on the hardware implementation.
Comment 41 jbMacAZ 2016-09-04 21:50:10 UTC
I can also confirm the module workaround.

I had to hunt a bit to find the settings:

DeviceDrivers->RealTimeClock->PC-style 'CMOS' <M>
DeviceDrivers->MMC/SD/SDIO card support-> MMC block device driver <M>
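
For readers who prefer checking a .config directly, here is a small Python sketch that reports which of the workaround options are still built-in. It assumes the two menu entries above correspond to `CONFIG_RTC_DRV_CMOS` and `CONFIG_MMC_BLOCK`; the sample .config text and the helper name `built_in_options` are hypothetical, not taken from any attachment on this bug.

```python
def built_in_options(config_text, options):
    """Return the subset of options that are set to 'y' (built-in) in a kernel .config."""
    values = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return [opt for opt in options if values.get(opt) == "y"]


# Assumed Kconfig symbols for the two menu entries used in the workaround.
WORKAROUND_OPTIONS = ["CONFIG_RTC_DRV_CMOS", "CONFIG_MMC_BLOCK"]

# Hypothetical .config excerpt, for illustration only.
sample = """\
CONFIG_RTC_CLASS=y
CONFIG_RTC_DRV_CMOS=y
CONFIG_MMC=y
CONFIG_MMC_BLOCK=m
# CONFIG_MMC_TEST is not set
"""

print(built_in_options(sample, WORKAROUND_OPTIONS))
```

Any option the function returns would still need to be switched to `m` for the workaround described in this thread.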

My dmesg fragment from T100CHI (Z3775)

[    3.246833] sdhci: Copyright(c) Pierre Ossman
[    3.260548] hidraw: raw HID events driver (C) Jiri Kosina
[    3.266972] genirq: Flags mismatch irq 8. 00000088 (mmc0) vs. 00000080 (rtc0)
[    3.267073] mmc0: Failed to request IRQ 8: -16
[    3.267153] sdhci-acpi: probe of 80860F14:01 failed with error -16
[    3.271907] mmc0: SDHCI controller on ACPI [INT33BB:00] using ADMA
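
To make the "Flags mismatch" line above more concrete, the following Python sketch parses it and applies a simplified model of the kernel's sharing rule. The rule is based on my reading of the check in kernel/irq/manage.c for kernels of this era (both requesters must set `IRQF_SHARED`, and the trigger-type and `IRQF_ONESHOT` bits must agree); details differ between releases. It also assumes the first flag word belongs to the new requester and the second to the handler already installed. `can_share` and `GENIRQ_RE` are names invented for this example.

```python
import re

# Values from include/linux/interrupt.h.
IRQF_TRIGGER_MASK = 0x0F
IRQF_SHARED = 0x80
IRQF_ONESHOT = 0x2000

# Matches the message format seen in the dmesg fragments in this thread.
GENIRQ_RE = re.compile(
    r"genirq: Flags mismatch irq\s*(\d+)\. "
    r"([0-9a-f]{8}) \((\S+)\) vs\. ([0-9a-f]{8}) \((\S+)\)"
)


def can_share(new_flags, old_flags):
    """Simplified model: both sides shared, same trigger type, same ONESHOT setting."""
    both_shared = (new_flags & old_flags) & IRQF_SHARED
    differing = new_flags ^ old_flags
    return bool(both_shared) and not (differing & (IRQF_TRIGGER_MASK | IRQF_ONESHOT))


line = "[    3.266972] genirq: Flags mismatch irq 8. 00000088 (mmc0) vs. 00000080 (rtc0)"
match = GENIRQ_RE.search(line)
irq = int(match.group(1))
new_flags, new_name = int(match.group(2), 16), match.group(3)
old_flags, old_name = int(match.group(4), 16), match.group(5)

print(f"irq {irq}: {new_name} vs {old_name}, can share: {can_share(new_flags, old_flags)}")
```

Under this model the request fails because the two flag words differ in a trigger bit (0x08), even though both have the shared bit set.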

I can test patches and run diagnostics, if that would help.  I don't have Dmitry's grasp of the hardware or his debugging skills, and I'm a rookie at git and bisecting.
Comment 42 jbMacAZ 2016-09-09 01:16:01 UTC
Dmitry, I owe you an apology.  My comments were meant to support your efforts. Instead, they could be seen as offensive.  That was not my intent.  Please accept my apology.  I am grateful for your work to isolate this problem and especially grateful for the workaround.  Thank you.
Comment 43 Dmitry 2016-09-09 07:23:01 UTC
jbMacAZ, never mind, it's all right.

Tried with the latest rc5 and the bug is still there.
Another note: right after the genirq error message I can reboot with ctrl-alt-del and switch between splash and console, so the keyboard is working. But after some time (a couple of seconds) I can no longer do this. Only sysrq's reboot works; the other sysrq functions are not usable. So the kernel is frozen at this point, or is not answering other IRQ requests.

P.S. I forgot that this could be connected with bug 109051, because I set max_cstate in a systemd service after the initramfs switches root. I'll check this.
Comment 44 Dmitry 2016-09-09 08:21:54 UTC
Adding intel_idle.max_cstate=1 to the cmdline prevents the freezing, but not this bug.
3 of 4 CPUs are idle and the last one is handling an IRQ. The last functions before the IRQ from the keyboard are:
handle_irq_event -> handle_irq_event_percpu -> add_interrupt_randomness -> __handle_irq_event__precpu -> credit_entropy_bits
Then comes xhci_irq from the sysrq keys.
The credit_entropy_bits call sometimes disappears from the backtrace. So the CPU is stuck in __handle_irq_event__precpu.
Comment 45 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-09-11 11:07:43 UTC
FWIW, there were three commits on Friday that might be related (but I'm not sure, I have no idea if those chips are in your system):
https://git.kernel.org/torvalds/c/c6c864993d9a20f8d7cacb4feaac5c46a2f2e4db
https://git.kernel.org/torvalds/c/56beac95cb88c188d2a885825a5da131edb41fe3
https://git.kernel.org/torvalds/c/60f749f8e4cfdfffa5f29c966050ed680eeedac2
Comment 46 jbMacAZ 2016-10-25 05:36:36 UTC
The problem still affects 4.8.4 and 4.9-rc2.  A newly discovered symptom is broken backlight control, which the module workaround fixes.
Comment 47 Sergei Trusov 2016-12-02 13:52:24 UTC
This patch helped to solve a similar flag mismatch
https://patchwork.kernel.org/patch/6118791/
Comment 48 jbMacAZ 2016-12-04 06:27:29 UTC
(In reply to Sergei Trusov from comment #47)
> This patch helped to solve a similar flag mismatch
> https://patchwork.kernel.org/patch/6118791/

Thanks.  That old patch does seem to fix the SD card/ IRQ8 allocation for baytrail Asus T100CHI. Tested in kernel 4.8.12.  Without the patch, if the PC style cmos RTC module is built-in, booting hangs (/home is on SD card).
Comment 49 jbMacAZ 2016-12-09 07:13:55 UTC
(In reply to jbMacAZ from comment #48)
> (In reply to Sergei Trusov from comment #47)
> > This patch helped to solve a similar flag mismatch
> > https://patchwork.kernel.org/patch/6118791/
> 
> Thanks.  That old patch does seem to fix the SD card/ IRQ8 allocation for
> baytrail Asus T100CHI. Tested in kernel 4.8.12.  Without the patch, if the
> PC style cmos RTC module is built-in, booting hangs (/home is on SD card).

The patch no longer works as of 4.8.13.  The .config workaround is still effective (make cmos RTC a module rather than built-in).
Comment 50 jbMacAZ 2017-03-21 07:57:28 UTC
Apparently this (IRQ 8 hardcoded for RTC) is an old problem from about 2 years ago, only to re-appear in 4.8.  I retested the patch (comment #47) with 4.10.4 and it is effective.  I don't know why it failed in 4.8.13.

Does anyone understand why this patch was reverted for 4.8.0-rc1?
Comment 51 Hans de Goede 2017-04-10 16:23:47 UTC
This should be fixed by this commit:

https://github.com/jwrdegoede/linux-sunxi/commit/556673a6b122b16e57b4fcbb607e120de3f71f90

Which should land upstream in 4.12-rc1.
Comment 52 jbMacAZ 2017-04-10 19:36:57 UTC
(In reply to Hans de Goede from comment #51)
> This should be fixed by this commit:
> 
> https://github.com/jwrdegoede/linux-sunxi/commit/
> 556673a6b122b16e57b4fcbb607e120de3f71f90
> 
> Which should land upstream in 4.12-rc1.

I've applied the patch to 4.11-rc6, replacing the patch from comment #47.  I also restored the two .config params to built-in (comment #41).

I am able to boot and run (Mint 18.1, T100CHI baytrail Z3775).  Booting means that the kernel is able to read /home from the SD card (my main symptom - no SD card reader). SDIO wifi seems to be throwing fewer errors with this change, but...

Thanks!