Bug 79301 - irq 17: nobody cared on MacbookPro 8,2 (2011), 9,2 (2012)
Summary: irq 17: nobody cared on MacbookPro 8,2 (2011), 9,2 (2012)
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: MMC/SD
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: other_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-30 22:49 UTC by Chris Murphy
Modified: 2016-11-14 10:04 UTC
CC List: 12 users

See Also:
Kernel Version: 4.0.0-0.rc4.git2.1.fc22.x86_64
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (137.41 KB, text/plain), 2014-06-30 22:52 UTC, Chris Murphy
acpidump (185.57 KB, text/plain), 2014-06-30 22:52 UTC, Chris Murphy
lspci (16.65 KB, text/plain), 2014-06-30 22:52 UTC, Chris Murphy
dmesg with pci=biosirq (136.01 KB, text/plain), 2014-06-30 23:02 UTC, Chris Murphy
photo trace with acpi=noirq (411.12 KB, image/jpeg), 2014-07-02 01:14 UTC, Chris Murphy
dmesg mbp92 (168.06 KB, text/plain), 2014-07-02 01:16 UTC, Chris Murphy
journal mbp92 (283.04 KB, text/plain), 2014-07-02 01:17 UTC, Chris Murphy
acpidump mbp92 (197.71 KB, text/plain), 2014-07-02 01:17 UTC, Chris Murphy
dmidecode mbp92 (19.74 KB, text/plain), 2014-07-02 01:17 UTC, Chris Murphy
[PATCH] PCI: Add Broadcom 4331 reset quirk to prevent IRQ storm (4.09 KB, patch), 2016-03-29 11:02 UTC, Lukas Wunner
[RFC] x86/efi: Reset network interfaces on Apple Macs (6.30 KB, patch), 2016-05-13 18:40 UTC, Lukas Wunner
[RFC] x86: Add early quirk to reset Apple AirPort card (8.61 KB, patch), 2016-05-13 18:42 UTC, Lukas Wunner
[PATCH] bcma: Print addr and wrap on core scan (1.05 KB, patch), 2016-05-18 21:18 UTC, Lukas Wunner
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot] (12.20 KB, patch), 2016-05-23 15:59 UTC, Lukas Wunner
dmesg after applying patch uploaded @ 2016-05-23 15:59 UTC (66.47 KB, text/x-log), 2016-05-26 17:40 UTC, Bryan Paradis
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure according to Michael Büsch] (13.17 KB, patch), 2016-05-26 22:51 UTC, Lukas Wunner
dmesg + wl after applying patch uploaded @ 2016-05-23 15:59 UTC (65.68 KB, application/octet-stream), 2016-05-26 22:54 UTC, Bryan Paradis

Description Chris Murphy 2014-06-30 22:49:57 UTC
The message appears on almost every boot. With irqpoll set, the trackpad is unresponsive and sometimes the keyboard is too, although those could be related to other rawness in Rawhide.

This is not a new problem; it has been present since at least kernel 3.6.11 on this same hardware model.
https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
Comment 1 Chris Murphy 2014-06-30 22:52:11 UTC
Created attachment 141531
dmesg
Comment 2 Chris Murphy 2014-06-30 22:52:30 UTC
Created attachment 141541
acpidump
Comment 3 Chris Murphy 2014-06-30 22:52:50 UTC
Created attachment 141551
lspci
Comment 4 Chris Murphy 2014-06-30 23:02:25 UTC
Created attachment 141561
dmesg with pci=biosirq

No apparent change with this parameter.
Comment 5 Chris Murphy 2014-07-02 01:14:07 UTC
Created attachment 141831
photo trace with acpi=noirq

With acpi=noirq I get something akin to a panic. Still model MacbookPro 8,2.
Comment 6 Chris Murphy 2014-07-02 01:16:20 UTC
On the MacbookPro 8,2, the message doesn't appear (or appears less often) if I use b43-fwcutter to copy firmware to /lib/firmware. The message still occurs on a 9,2 model even with firmware in /lib/firmware.
Comment 7 Chris Murphy 2014-07-02 01:16:58 UTC
Created attachment 141841
dmesg mbp92
Comment 8 Chris Murphy 2014-07-02 01:17:20 UTC
Created attachment 141851
journal mbp92
Comment 9 Chris Murphy 2014-07-02 01:17:36 UTC
Created attachment 141861
acpidump mbp92
Comment 10 Chris Murphy 2014-07-02 01:17:58 UTC
Created attachment 141871
dmidecode mbp92
Comment 11 Daniel Svensson 2014-09-25 09:18:32 UTC
Same problem with MacBookPro 10,1 running 3.17-rc5.
Comment 12 me 2014-10-10 16:02:18 UTC
Same problem with MacBookPro 10,1 running 3.17.0-300.fc21.x86_64.


[    9.704961] irq 17: nobody cared (try booting with the "irqpoll" option)
[    9.704967] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           OE  3.17.0-300.fc21.x86_64 #1
[    9.704969] Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS MBP101.88Z.00EE.B03.1212211437 12/21/2012
[    9.704970]  0000000000000000 c7c304c7564d0cfc ffff88046f203e48 ffffffff8173c311
[    9.704973]  ffff8804575ab800 ffff88046f203e70 ffffffff810edd82 ffff8804575ab800
[    9.704975]  0000000000000000 0000000000000011 ffff88046f203ea8 ffffffff810ee127
[    9.704978] Call Trace:
[    9.704980]  <IRQ>  [<ffffffff8173c311>] dump_stack+0x45/0x56
[    9.704989]  [<ffffffff810edd82>] __report_bad_irq+0x32/0xd0
[    9.704991]  [<ffffffff810ee127>] note_interrupt+0x247/0x290
[    9.704996]  [<ffffffff810eb653>] handle_irq_event_percpu+0x133/0x1a0
[    9.704999]  [<ffffffff810eb6f7>] handle_irq_event+0x37/0x60
[    9.705001]  [<ffffffff810ee6b8>] handle_fasteoi_irq+0x78/0x150
[    9.705005]  [<ffffffff810163a4>] handle_irq+0x84/0x150
[    9.705009]  [<ffffffff8109a6c2>] ? _local_bh_enable+0x22/0x50
[    9.705012]  [<ffffffff8174616d>] do_IRQ+0x4d/0xe0
[    9.705016]  [<ffffffff81743f6d>] common_interrupt+0x6d/0x6d
[    9.705017]  <EOI>  [<ffffffff815d21f6>] ? cpuidle_enter_state+0x66/0x160
[    9.705022]  [<ffffffff815d21e1>] ? cpuidle_enter_state+0x51/0x160
[    9.705024]  [<ffffffff815d23d7>] cpuidle_enter+0x17/0x20
[    9.705027]  [<ffffffff810d7377>] cpu_startup_entry+0x397/0x3d0
[    9.705030]  [<ffffffff81731cb7>] rest_init+0x77/0x80
[    9.705034]  [<ffffffff81d4bffc>] start_kernel+0x486/0x4a7
[    9.705037]  [<ffffffff81d4b120>] ? early_idt_handlers+0x120/0x120
[    9.705039]  [<ffffffff81d4b4d7>] x86_64_start_reservations+0x2a/0x2c
[    9.705042]  [<ffffffff81d4b62b>] x86_64_start_kernel+0x152/0x175
[    9.705043] handlers:
[    9.705053] [<ffffffffa0266830>] sdhci_irq [sdhci] threaded [<ffffffffa0262fa0>] sdhci_thread_irq [sdhci]
[    9.705058] [<ffffffffa02d3890>] azx_interrupt [snd_hda_controller]
[    9.705060] Disabling IRQ #17


$ cat /proc/interrupts  |grep '^\s*17'
 17:     189873          0          0          0          0          0          0          0   IO-APIC  17-fasteoi   mmc0, snd_hda_intel, wlp4s0
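
The /proc/interrupts line above lists three handlers on IRQ 17. As a minimal sketch, the names registered on one IRQ line can be extracted from /proc/interrupts-style text as below. The function name `irq_sharers` is hypothetical (not part of this report or any tool), and the parsing assumes the column layouts seen in this thread, where the last column before the names ends in "-edge" or "-fasteoi".

```shell
# irq_sharers IRQ: read /proc/interrupts-style text on stdin and print the
# handler names registered on that IRQ line. (Hypothetical helper.)
irq_sharers() {
    awk -v irq="$1" '
        $1 == (irq ":") {
            i = 2
            # Skip the per-CPU counters.
            while (i <= NF && $i ~ /^[0-9]+$/) i++
            # Skip controller columns up to the token ending in -edge or -fasteoi.
            while (i <= NF && $i !~ /-(edge|fasteoi)$/) i++
            i++
            out = ""
            for (; i <= NF; i++) out = out (out == "" ? "" : " ") $i
            print out
        }
    '
}
```

On a live system this would be run as `irq_sharers 17 < /proc/interrupts`. More than one name in the output means the line is shared, which is the situation discussed in this bug.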
Comment 13 Sumit Rai 2014-12-02 23:01:08 UTC
I am experiencing the same problem on an Early 2011 MacBook Pro.
Please take a look at bug https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Thanks.
Comment 14 Sumit Rai 2014-12-02 23:12:23 UTC
It looks like IRQ 17 is attached to the MMC card controller.

# lspci -vnn
02:00.1 SD Host controller [0805]: Broadcom Corporation BCM57765/57785 SDXC/MMC Card Reader [14e4:16bc] (rev 10) (prog-if 01)
        Subsystem: Broadcom Corporation Device [14e4:0000]
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at a0420000 (64-bit, prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci_pci

[srai@localhost ~]$ cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:         47          0          0          0   IO-APIC-edge      timer
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:     124126          0          0          0   IO-APIC-fasteoi   acpi
 17:     100001          0          0          0   IO-APIC  17-fasteoi   mmc0
 19:          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb4
 21:          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb3
 22:    1551032          0          0          0   IO-APIC  22-fasteoi   ehci_hcd:usb2
 23:     138892          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb1
 24:          0          0          0          0   PCI-MSI-edge      PCIe PME
 25:          0          0          0          0   PCI-MSI-edge      PCIe PME
 26:          0          0          0          0   PCI-MSI-edge      PCIe PME
 27:          0          0          0          0   PCI-MSI-edge      PCIe PME
 28:          0          0          0          0   PCI-MSI-edge      PCIe PME
 29:     115730          0          0          0   PCI-MSI-edge      ahci
 30:          4          0          0          0   PCI-MSI-edge      firewire_ohci
 31:     257697          0          0          0   PCI-MSI-edge      i915
 32:          8          0          0          0   PCI-MSI-edge      mei_me
 33:        393          0          0          0   PCI-MSI-edge      snd_hda_intel
 34:      35259          0          0          0   PCI-MSI-edge      enp2s0f0-tx-0
 35:      40231          0          0          0   PCI-MSI-edge      enp2s0f0-rx-1
 36:      37855          0          0          0   PCI-MSI-edge      enp2s0f0-rx-2
 37:      21092          0          0          0   PCI-MSI-edge      enp2s0f0-rx-3
 38:      22128          0          0          0   PCI-MSI-edge      enp2s0f0-rx-4
NMI:          0          0          0          0   Non-maskable interrupts
LOC:    6055379    4193295    4936419    3467806   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          2          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:      47853      37958      31064      29007   Rescheduling interrupts
CAL:       4140       6477       3234       5624   Function call interrupts
TLB:       9585       8667      12741      13045   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:         85         84         84         84   Machine check polls
THR:          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0



Truncated backtrace:
irq 17: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.1-302.fc21.x86_64 #1
Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS    MBP81.88Z.0047.B27.1201241646 01/24/12
 0000000000000000 d0bdbd5a6a6ab5ee ffff8802efa03e48 ffffffff8173dbb1
 ffff8802dcbed000 ffff8802efa03e70 ffffffff810edd82 ffff8802dcbed000
 0000000000000000 0000000000000011 ffff8802efa03ea8 ffffffff810ee127
Call Trace:
 <IRQ>  [<ffffffff8173dbb1>] dump_stack+0x45/0x56
 [<ffffffff810edd82>] __report_bad_irq+0x32/0xd0
 [<ffffffff810ee127>] note_interrupt+0x247/0x290
 [<ffffffff810eb653>] handle_irq_event_percpu+0x133/0x1a0
 [<ffffffff810eb6f7>] handle_irq_event+0x37/0x60
 [<ffffffff810ee6b8>] handle_fasteoi_irq+0x78/0x150
 [<ffffffff810163a4>] handle_irq+0x84/0x150
 [<ffffffff810b610a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<ffffffff81747a2d>] do_IRQ+0x4d/0xe0
 [<ffffffff8174582d>] common_interrupt+0x6d/0x6d
 <EOI>  [<ffffffff815d24d3>] ? cpuidle_enter_state+0x63/0x160
 [<ffffffff815d24c1>] ? cpuidle_enter_state+0x51/0x160
 [<ffffffff815d26b7>] cpuidle_enter+0x17/0x20
 [<ffffffff810d7377>] cpu_startup_entry+0x397/0x3d0
 [<ffffffff81733557>] rest_init+0x77/0x80
 [<ffffffff81d4bffc>] start_kernel+0x486/0x4a7
 [<ffffffff81d4b120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff81d4b4d7>] x86_64_start_reservations+0x2a/0x2c
 [<ffffffff81d4b62b>] x86_64_start_kernel+0x152/0x175
Comment 15 Sumit Rai 2014-12-02 23:44:03 UTC
Please also take a look at duplicates:
https://bugzilla.redhat.com/show_bug.cgi?id=1149632
https://bugzilla.redhat.com/show_bug.cgi?id=1009819
Comment 16 Dan Ziemba 2014-12-26 23:28:03 UTC
Same problem here on my MacBook Pro 8,1. Currently on Fedora 21 with kernel 3.17.7-300, but it has existed for at least a couple of years with previous Fedora versions. I have the b43 module blacklisted because I use an external USB Wi-Fi adapter instead (Atheros AR7010+AR9280), so maybe this rules out the Wi-Fi being related? Here's part of the log from my last boot:

Dec 26 14:10:48 fedoramac kernel: sdhci-pci 0000:02:00.1: SDHCI controller found [14e4:16bc] (rev 10)
Dec 26 14:10:48 fedoramac kernel: sdhci-pci 0000:02:00.1: No vmmc regulator found
Dec 26 14:10:48 fedoramac kernel: sdhci-pci 0000:02:00.1: No vqmmc regulator found
Dec 26 14:10:48 fedoramac kernel: mmc0: SDHCI controller on PCI [0000:02:00.1] using ADMA
Dec 26 14:10:48 fedoramac kernel: Bluetooth: Core ver 2.19
Dec 26 14:10:48 fedoramac kernel: NET: Registered protocol family 31
Dec 26 14:10:48 fedoramac kernel: Bluetooth: HCI device and connection manager initialized
Dec 26 14:10:48 fedoramac kernel: Bluetooth: HCI socket layer initialized
Dec 26 14:10:48 fedoramac kernel: Bluetooth: L2CAP socket layer initialized
Dec 26 14:10:48 fedoramac kernel: Bluetooth: SCO socket layer initialized
Dec 26 14:10:48 fedoramac kernel: usbcore: registered new interface driver btusb
Dec 26 14:10:48 fedoramac kernel: usb 1-1.1.1: USB disconnect, device number 7
Dec 26 14:10:48 fedoramac systemd-udevd[9055]: error opening USB device 'descriptors' file
Dec 26 14:10:48 fedoramac kernel: usb 1-1.1.2: USB disconnect, device number 8
Dec 26 14:10:48 fedoramac kernel: irq 17: nobody cared (try booting with the "irqpoll" option)
Dec 26 14:10:48 fedoramac kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.7-300.fc21.x86_64 #1
Dec 26 14:10:48 fedoramac kernel: Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS    MBP81.88Z.0047.B27.1201241646 01/24/12
Dec 26 14:10:48 fedoramac kernel:  0000000000000000 0f99e105c1f28136 ffff88046fa03e48 ffffffff817401ea
Dec 26 14:10:48 fedoramac kernel:  ffff8804560f8700 ffff88046fa03e70 ffffffff810ee0d2 ffff8804560f8700
Dec 26 14:10:48 fedoramac kernel:  0000000000000000 0000000000000011 ffff88046fa03ea8 ffffffff810ee477
Dec 26 14:10:48 fedoramac kernel: Call Trace:
Dec 26 14:10:48 fedoramac kernel:  <IRQ>  [<ffffffff817401ea>] dump_stack+0x45/0x56
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810ee0d2>] __report_bad_irq+0x32/0xd0
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810ee477>] note_interrupt+0x247/0x290
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810eb9a3>] handle_irq_event_percpu+0x133/0x1a0
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810eba47>] handle_irq_event+0x37/0x60
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810eea08>] handle_fasteoi_irq+0x78/0x150
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810164e4>] handle_irq+0x84/0x150
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810b633a>] ? atomic_notifier_call_chain+0x1a/0x20
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff8174a06d>] do_IRQ+0x4d/0xe0
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff81747ead>] common_interrupt+0x6d/0x6d
Dec 26 14:10:48 fedoramac kernel:  <EOI>  [<ffffffff815d4553>] ? cpuidle_enter_state+0x63/0x160
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff815d4541>] ? cpuidle_enter_state+0x51/0x160
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff815d4737>] cpuidle_enter+0x17/0x20
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff810d75c7>] cpu_startup_entry+0x397/0x3d0
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff81735aa7>] rest_init+0x77/0x80
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff81d49004>] start_kernel+0x48e/0x4af
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff81d48120>] ? early_idt_handlers+0x120/0x120
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff81d484d7>] x86_64_start_reservations+0x2a/0x2c
Dec 26 14:10:48 fedoramac kernel:  [<ffffffff81d4862b>] x86_64_start_kernel+0x152/0x175
Dec 26 14:10:48 fedoramac kernel: handlers:
Dec 26 14:10:48 fedoramac kernel: [<ffffffffa0589830>] sdhci_irq [sdhci] threaded [<ffffffffa0585fa0>] sdhci_thread_irq [sdhci]
Dec 26 14:10:48 fedoramac kernel: Disabling IRQ #17
Comment 17 Christian Stadelmann 2015-03-21 13:34:33 UTC
This bug does not happen only in the MMC/SD card stack. I am running into it on an Asus P7H55-M/USB3 board with just 2 USB devices (mouse and keyboard) and no card reader installed. Still, libreport tells me it is the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1149632

In my case it is IRQ 16 (not IRQ 17) that's affected. On this IRQ 16 I have 2 devices according to `$ lspci -v`:

00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06) (prog-if 20 [EHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8383
        Flags: bus master, medium devsel, latency 0, IRQ 16
        Memory at f7dfa000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: <access denied>
        Kernel driver in use: ehci-pci

00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: f0400000-f05fffff
        Prefetchable memory behind bridge: 00000000f0600000-00000000f07fffff
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp

Both devices are part of the Intel H55 chipset.
Comment 18 Chris Murphy 2015-03-23 14:15:21 UTC
Always happens with these kernels as well, regardless of whether the b43 firmware is in /lib/firmware.

3.19.2-200.fc21.x86_64
4.0.0-0.rc4.git2.1.fc22.x86_64
Comment 19 Chris Bainbridge 2015-03-29 11:53:44 UTC
It looks like this bug has been around for years. There is a report from 2011 from a user of linux kernel 3.1.1: "Unless the bcma and b43 modules are compiled and loaded, IRQ17 is shut early in the process." https://dentifrice.poivron.org/laptops/macbookpro8,2/

I have confirmed the bug is still present in Linux 3.16.0-4 and the latest Linux 4.0.0-rc5+ (7fc377e) on a 2012 Ivy Bridge MacBook Pro 13.

mmc0 and b43 share interrupt 17. The bug happens when the sdhci module is loaded before b43. If b43 is built in or loaded from the initramfs (list it in /etc/initramfs-tools/modules, then update the initramfs), this bug does not occur.

02:00.1 SD Host controller: Broadcom Corporation BCM57765/57785 SDXC/MMC Card Reader (rev 21)
03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)
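
The initramfs workaround described above can be sketched as follows for a Debian-style initramfs-tools setup. The helper name `add_initramfs_modules` is hypothetical. Listing bcma alongside b43 is an assumption based on the 2011 report quoted above, which names both modules.

```shell
# add_initramfs_modules FILE MODULE...: append each MODULE to FILE unless a
# line with exactly that name is already present. (Hypothetical helper.)
add_initramfs_modules() {
    file=$1
    shift
    for mod in "$@"; do
        # -x: whole-line match, -F: fixed string, -q: no output.
        grep -qxF -- "$mod" "$file" 2>/dev/null || printf '%s\n' "$mod" >> "$file"
    done
}
```

As root, this would be used as `add_initramfs_modules /etc/initramfs-tools/modules bcma b43`, followed by `update-initramfs -u` to rebuild the initramfs. The check makes repeated runs harmless, since an already listed module is not appended twice.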
Comment 20 Chris Bainbridge 2015-03-30 14:52:45 UTC
@Christian Stadelmann - Your bug looks like a different one. This bug report is for the boot error "irq 17: nobody cared" from Broadcom BCM57765 card reader (as used in Macbook Pro) as described at https://lkml.org/lkml/2013/7/12/99
Comment 21 Christian Stadelmann 2015-04-07 08:34:25 UTC
@Chris Bainbridge - I don't even have a Broadcom card reader in this computer, so it might be unrelated.
Comment 22 Chris Bainbridge 2015-04-07 09:41:09 UTC
There is also bug #73241 - SDHCI PCI driver incompatible with 14e4:16bc / Broadcom BCM57765/57785 SDXC/MMC Card Reader
Comment 23 Scott Corcoran 2015-10-30 06:21:19 UTC
This "irq 17: nobody cared" boot crash was repeatable together with
a variety of other minor failures booting MacBookPro8,2/Mac
3.8Gb, Intel Core i7-2860QM @2.50GHz x 8

Interestingly, the co-installed RESCUE system
4.0.4-301.fc22.x86_64 #1 SMP came up and ran just fine.
Not sure what the differences between the systems are.
Comment 24 Lukas Wunner 2016-03-29 11:02:50 UTC
Created attachment 210981
[PATCH] PCI: Add Broadcom 4331 reset quirk to prevent IRQ storm

Here's a patch to fix this issue, please test if it works for you. It does for me.

This bugzilla entry is misclassified as an MMC/SD bug because the kernel error message makes it appear that sdhci_irq is the culprit. That's a red herring, the issue is caused by an IRQ storm originating from the wireless card.

What seems to be happening is that Apple's EFI bootloader enables the wireless card to use it for Internet Recovery and it leaves the card enabled when passing control to the OS. The card cries for attention by sending interrupts, interfering with other drivers using that same IRQ line. This does not stop until the wireless driver is loaded. The patch resets the card early on in the boot process to stop the interrupts.
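
One rough way to observe such a storm is to compare the IRQ 17 counter in /proc/interrupts at two points in time. The sketch below only sums the per-CPU counters for one IRQ line. The function name `irq_count` is hypothetical, and the approach assumes the counter is still increasing when sampled, which is not expected once the kernel has printed "Disabling IRQ #17".

```shell
# irq_count IRQ: read /proc/interrupts-style text on stdin and print the sum
# of the per-CPU counters for that IRQ line. (Hypothetical helper.)
irq_count() {
    awk -v irq="$1" '
        $1 == (irq ":") {
            total = 0
            # Counters are the leading all-digit fields after the IRQ label.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
            print total
        }
    '
}
```

Taking `irq_count 17 < /proc/interrupts` twice, one second apart, and subtracting the two values gives an approximate interrupts-per-second figure for the line.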
Comment 25 hr 2016-05-11 21:29:52 UTC
(In reply to Lukas Wunner from comment #24)

I tried your patch on Debian with the kernel 4.4.10 sources. It did not help for me.

The "irq 17: nobody cared" messages still appear in dmesg and the IRQ storm continues.

Besides the messages in dmesg, Wi-Fi is unstable as before: random disconnects occur about every hour. I use the latest proprietary Broadcom driver, 6.30.223.271 (using the b43 driver does not change anything). When the connection is lost, for a minute or a little less, dmesg reports "ERROR @wl_notify_scan_status: eth1 Scan_results error (-22)" (my WLAN interface is named eth1 :))

grep . -r /sys/firmware/acpi/interrupts/
/sys/firmware/acpi/interrupts/sci:    3730
/sys/firmware/acpi/interrupts/error:       0
/sys/firmware/acpi/interrupts/gpe00:       0   invalid
/sys/firmware/acpi/interrupts/gpe01:       0   invalid
/sys/firmware/acpi/interrupts/gpe02:       0   invalid
/sys/firmware/acpi/interrupts/gpe03:       0   invalid
/sys/firmware/acpi/interrupts/gpe04:       0   invalid
/sys/firmware/acpi/interrupts/gpe05:       0   invalid
/sys/firmware/acpi/interrupts/gpe06:       0   invalid
/sys/firmware/acpi/interrupts/gpe07:       0   enabled
/sys/firmware/acpi/interrupts/gpe08:       0   invalid
/sys/firmware/acpi/interrupts/gpe09:       0   disabled
/sys/firmware/acpi/interrupts/gpe10:       0   invalid
/sys/firmware/acpi/interrupts/gpe11:       0   enabled
/sys/firmware/acpi/interrupts/gpe12:       0   invalid
/sys/firmware/acpi/interrupts/gpe13:       0   enabled
/sys/firmware/acpi/interrupts/gpe14:       0   invalid
/sys/firmware/acpi/interrupts/gpe15:       0   enabled
/sys/firmware/acpi/interrupts/gpe16:       0   disabled
/sys/firmware/acpi/interrupts/gpe0A:       0   invalid
/sys/firmware/acpi/interrupts/gpe17:    3730   enabled
/sys/firmware/acpi/interrupts/gpe0B:       0   invalid
/sys/firmware/acpi/interrupts/gpe18:       0   invalid
/sys/firmware/acpi/interrupts/gpe0C:       0   invalid
/sys/firmware/acpi/interrupts/gpe19:       0   disabled
/sys/firmware/acpi/interrupts/gpe0D:       0   disabled
/sys/firmware/acpi/interrupts/gpe0E:       0   invalid
/sys/firmware/acpi/interrupts/gpe20:       0   invalid
/sys/firmware/acpi/interrupts/gpe0F:       0   invalid
/sys/firmware/acpi/interrupts/gpe21:       0   invalid
/sys/firmware/acpi/interrupts/gpe22:       0   invalid
/sys/firmware/acpi/interrupts/gpe23:       0   enabled
/sys/firmware/acpi/interrupts/gpe24:       0   invalid
/sys/firmware/acpi/interrupts/gpe25:       0   invalid
/sys/firmware/acpi/interrupts/gpe26:       0   invalid
/sys/firmware/acpi/interrupts/gpe1A:       0   invalid
/sys/firmware/acpi/interrupts/gpe27:       0   invalid
/sys/firmware/acpi/interrupts/gpe1B:       0   invalid
/sys/firmware/acpi/interrupts/gpe28:       0   invalid
/sys/firmware/acpi/interrupts/gpe1C:       0   invalid
/sys/firmware/acpi/interrupts/gpe29:       0   invalid
/sys/firmware/acpi/interrupts/gpe1D:       0   invalid
/sys/firmware/acpi/interrupts/gpe1E:       0   invalid
/sys/firmware/acpi/interrupts/gpe30:       0   invalid
/sys/firmware/acpi/interrupts/gpe1F:       0   invalid
/sys/firmware/acpi/interrupts/gpe31:       0   invalid
/sys/firmware/acpi/interrupts/gpe32:       0   invalid
/sys/firmware/acpi/interrupts/gpe33:       0   invalid
/sys/firmware/acpi/interrupts/gpe34:       0   invalid
/sys/firmware/acpi/interrupts/gpe35:       0   invalid
/sys/firmware/acpi/interrupts/gpe36:       0   invalid
/sys/firmware/acpi/interrupts/gpe2A:       0   invalid
/sys/firmware/acpi/interrupts/gpe37:       0   invalid
/sys/firmware/acpi/interrupts/gpe2B:       0   invalid
/sys/firmware/acpi/interrupts/gpe38:       0   invalid
/sys/firmware/acpi/interrupts/gpe2C:       0   invalid
/sys/firmware/acpi/interrupts/gpe39:       0   invalid
/sys/firmware/acpi/interrupts/gpe2D:       0   invalid
/sys/firmware/acpi/interrupts/gpe2E:       0   invalid
/sys/firmware/acpi/interrupts/gpe2F:       0   invalid
/sys/firmware/acpi/interrupts/gpe3A:       0   invalid
/sys/firmware/acpi/interrupts/gpe3B:       0   invalid
/sys/firmware/acpi/interrupts/gpe3C:       0   invalid
/sys/firmware/acpi/interrupts/gpe3D:       0   invalid
/sys/firmware/acpi/interrupts/gpe3E:       0   invalid
/sys/firmware/acpi/interrupts/gpe3F:       0   invalid
/sys/firmware/acpi/interrupts/sci_not:       5
/sys/firmware/acpi/interrupts/ff_pmtimer:       0   invalid
/sys/firmware/acpi/interrupts/ff_rt_clk:       0   disabled
/sys/firmware/acpi/interrupts/gpe_all:    3730
/sys/firmware/acpi/interrupts/ff_gbl_lock:       0   enabled
/sys/firmware/acpi/interrupts/ff_pwr_btn:       0   enabled
/sys/firmware/acpi/interrupts/ff_slp_btn:       0   invalid


cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       
  0:         17          0          0          0   IO-APIC   2-edge      timer
  8:          1          0          0          0   IO-APIC   8-edge      rtc0
  9:       4399          0          0          0   IO-APIC   9-fasteoi   acpi
 17:     105479          0          0          0   IO-APIC  17-fasteoi   mmc0, eth1
 19:          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb4
 21:          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb3
 22:         63          0          0          0   IO-APIC  22-fasteoi   ehci_hcd:usb2
 23:      15260          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb1
 28:          0          0          0          0   PCI-MSI 3194880-edge      pciehp
 29:          0          0          0          0   PCI-MSI 3211264-edge      pciehp
 30:          0          0          0          0   PCI-MSI 3227648-edge      pciehp
 31:          0          0          0          0   PCI-MSI 3244032-edge      pciehp
 32:          3          0          0          0   PCI-MSI 2097152-edge      firewire_ohci
 33:       9309          0          0          0   PCI-MSI 512000-edge      0000:00:1f.2
 34:        450          0          0          0   PCI-MSI 442368-edge      snd_hda_intel
 35:       6600          0          0          0   PCI-MSI 32768-edge      i915
 36:         71          0          0          0   PCI-MSI 1048576-edge      eth0-tx-0
 37:         38          0          0          0   PCI-MSI 1048577-edge      eth0-rx-1
 38:         34          0          0          0   PCI-MSI 1048578-edge      eth0-rx-2
 39:          1          0          0          0   PCI-MSI 1048579-edge      eth0-rx-3
 40:         15          0          0          0   PCI-MSI 1048580-edge      eth0-rx-4
NMI:          1          0          0          0   Non-maskable interrupts
LOC:      34113      14394      11025      10579   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          1          0          0          0   Performance monitoring interrupts
IWI:          4          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:        892        940        961        374   Rescheduling interrupts
CAL:       1045       1052       1099       1086   Function call interrupts
TLB:        297        380        184        200   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          3          3          3          3   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0   Posted-interrupt wakeup event


[    2.570457] irq 17: nobody cared (try booting with the "irqpoll" option)
[    2.570480] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.10 #2
[    2.570481] Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS    MBP81.88Z.0047.B2A.1506082203 06/08/15
[    2.570482]  0000000000000000 ffffffff812cac6f ffff8800892e5200 ffff8800892e52d4
[    2.570484]  ffffffff810c1a70 ffff8800892e5200 0000000000000000 0000000000000000
[    2.570486]  ffffffff810c1ded 0000000000000000 0000000000000011 0000000000000000
[    2.570487] Call Trace:
[    2.570489]  <IRQ>  [<ffffffff812cac6f>] ? dump_stack+0x5c/0x7d
[    2.570497]  [<ffffffff810c1a70>] ? __report_bad_irq+0x30/0xc0
[    2.570499]  [<ffffffff810c1ded>] ? note_interrupt+0x22d/0x270
[    2.570501]  [<ffffffff810bf43d>] ? handle_irq_event_percpu+0x15d/0x1c0
[    2.570503]  [<ffffffff810bf4ca>] ? handle_irq_event+0x2a/0x50
[    2.570504]  [<ffffffff810c22fb>] ? handle_fasteoi_irq+0x8b/0x150
[    2.570506]  [<ffffffff81017e2c>] ? handle_irq+0x1c/0x30
[    2.570508]  [<ffffffff815514a6>] ? do_IRQ+0x46/0xd0
[    2.570510]  [<ffffffff8154f5c2>] ? common_interrupt+0x82/0x82
[    2.570511]  <EOI>  [<ffffffff814265b7>] ? poll_idle+0x57/0xa0
[    2.570516]  [<ffffffff814260c8>] ? cpuidle_enter_state+0xc8/0x260
[    2.570518]  [<ffffffff810ac4ee>] ? cpu_startup_entry+0x2ae/0x370
[    2.570519]  [<ffffffff81931f42>] ? start_kernel+0x472/0x47a
[    2.570521]  [<ffffffff81931120>] ? early_idt_handler_array+0x120/0x120
[    2.570522]  [<ffffffff81931600>] ? x86_64_start_kernel+0x145/0x154
[    2.570523] handlers:
[    2.570533] [<ffffffffa0070640>] sdhci_irq [sdhci] threaded [<ffffffffa006ec40>] sdhci_thread_irq [sdhci]
[    2.570565] Disabling IRQ #17
Comment 26 Lukas Wunner 2016-05-13 18:40:43 UTC
Created attachment 216231
[RFC] x86/efi: Reset network interfaces on Apple Macs
Comment 27 Lukas Wunner 2016-05-13 18:42:28 UTC
Created attachment 216241
[RFC] x86: Add early quirk to reset Apple AirPort card
Comment 28 Lukas Wunner 2016-05-13 19:04:10 UTC
(In reply to hr from comment #25)
> I tried your patch on Debian with the kernel 4.4.10 sources. It did not
> help for me.
> 
> The "irq 17: nobody cared" messages still appear in dmesg and the IRQ storm
> continues.

Yes, I've heard that the patch doesn't work for everybody. I don't know why; perhaps DisINTx is not set on the wireless card on some machines upon boot. To verify that's indeed the cause, boot with "modprobe.blacklist=b43 modprobe.blacklist=bcma modprobe.blacklist=wl", then check if "lspci -vv" reports "DisINTx-" for the wireless card.
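
The DisINTx check can be sketched as a small filter over "lspci -vv" output. The function name `disintx_state` is hypothetical. The PCI address 03:00.0 in the usage note is taken from the lspci listing in comment 19 and may differ on other machines.

```shell
# disintx_state: read "lspci -vv" output for one device on stdin and print
# the first DisINTx flag found ("DisINTx+" or "DisINTx-"), which lspci shows
# on the device's "Control:" line. (Hypothetical helper.)
disintx_state() {
    sed -n 's/.*\(DisINTx[+-]\).*/\1/p' | head -n 1
}
```

After booting with the blacklist parameters, this would be run as `lspci -vv -s 03:00.0 | disintx_state`. Limiting lspci to one device with `-s` matters because the filter simply reports the first flag it sees.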

I've attached two alternative patches. They cannot be submitted as is, they need more polish, but I can't decide which one is better. Please test both and let me know if either or both work for you. They're based on 4.4 but can be applied to 4.6 as well with some fuzz.

b43_eboot_4.4.patch resets all network interfaces before control is handed over from EFI to the kernel. This only works if you boot with the EFI stub. If you use gummiboot, the EFI stub is used by default. If you use grub I think you have to load the kernel with the "chainloader" directive, not the "linux" directive. You should briefly see a message "Welcome to Macintosh" on boot, plus the line "Resetting network interface" for every network card built into the machine.

b43_earlyquirk_4.4.patch resets the BCM 4331 card during kernel initialization. In dmesg there should be a message "Resetting Apple AirPort card". If MMIO was not already enabled, you'll see an additional message "Enabling mmio on Apple AirPort card". Let me know if you see that additional message as I'm not sure if I should drop this, maybe MMIO is always enabled and this is not needed.

I'm not sure if this is a bug in the EFI driver for the BCM 4331 or if this is actually a feature. Perhaps OS X supports some kind of connection handover from EFI. If this is true, it's not sufficient to just quirk the BCM 4331, we'd need this for all other cards used by Apple. E.g. models introduced 2013+ use BCM 4360. The eboot patch works for all cards whereas the earlyquirk patch re


> 
> Besides messages in dmesg, as before i get to become unstable wi-fi, every
> hour randomly disconnects occur.  I use last a proprietary broadcom driver
> 6.30.223.271 (Use b43 driver does not change anything) Also lost connection
> for a minute or a little less dmesg reports this "ERROR
> @wl_notify_scan_status: eth1 Scan_results error (-22)" (my wlan name eth1 :))
> 
> grep . -r /sys/firmware/acpi/interrupts/
> /sys/firmware/acpi/interrupts/sci:    3730
> /sys/firmware/acpi/interrupts/error:       0
> /sys/firmware/acpi/interrupts/gpe00:       0   invalid
> /sys/firmware/acpi/interrupts/gpe01:       0   invalid
> /sys/firmware/acpi/interrupts/gpe02:       0   invalid
> /sys/firmware/acpi/interrupts/gpe03:       0   invalid
> /sys/firmware/acpi/interrupts/gpe04:       0   invalid
> /sys/firmware/acpi/interrupts/gpe05:       0   invalid
> /sys/firmware/acpi/interrupts/gpe06:       0   invalid
> /sys/firmware/acpi/interrupts/gpe07:       0   enabled
> /sys/firmware/acpi/interrupts/gpe08:       0   invalid
> /sys/firmware/acpi/interrupts/gpe09:       0   disabled
> /sys/firmware/acpi/interrupts/gpe10:       0   invalid
> /sys/firmware/acpi/interrupts/gpe11:       0   enabled
> /sys/firmware/acpi/interrupts/gpe12:       0   invalid
> /sys/firmware/acpi/interrupts/gpe13:       0   enabled
> /sys/firmware/acpi/interrupts/gpe14:       0   invalid
> /sys/firmware/acpi/interrupts/gpe15:       0   enabled
> /sys/firmware/acpi/interrupts/gpe16:       0   disabled
> /sys/firmware/acpi/interrupts/gpe0A:       0   invalid
> /sys/firmware/acpi/interrupts/gpe17:    3730   enabled
> /sys/firmware/acpi/interrupts/gpe0B:       0   invalid
> /sys/firmware/acpi/interrupts/gpe18:       0   invalid
> /sys/firmware/acpi/interrupts/gpe0C:       0   invalid
> /sys/firmware/acpi/interrupts/gpe19:       0   disabled
> /sys/firmware/acpi/interrupts/gpe0D:       0   disabled
> /sys/firmware/acpi/interrupts/gpe0E:       0   invalid
> /sys/firmware/acpi/interrupts/gpe20:       0   invalid
> /sys/firmware/acpi/interrupts/gpe0F:       0   invalid
> /sys/firmware/acpi/interrupts/gpe21:       0   invalid
> /sys/firmware/acpi/interrupts/gpe22:       0   invalid
> /sys/firmware/acpi/interrupts/gpe23:       0   enabled
> /sys/firmware/acpi/interrupts/gpe24:       0   invalid
> /sys/firmware/acpi/interrupts/gpe25:       0   invalid
> /sys/firmware/acpi/interrupts/gpe26:       0   invalid
> /sys/firmware/acpi/interrupts/gpe1A:       0   invalid
> /sys/firmware/acpi/interrupts/gpe27:       0   invalid
> /sys/firmware/acpi/interrupts/gpe1B:       0   invalid
> /sys/firmware/acpi/interrupts/gpe28:       0   invalid
> /sys/firmware/acpi/interrupts/gpe1C:       0   invalid
> /sys/firmware/acpi/interrupts/gpe29:       0   invalid
> /sys/firmware/acpi/interrupts/gpe1D:       0   invalid
> /sys/firmware/acpi/interrupts/gpe1E:       0   invalid
> /sys/firmware/acpi/interrupts/gpe30:       0   invalid
> /sys/firmware/acpi/interrupts/gpe1F:       0   invalid
> /sys/firmware/acpi/interrupts/gpe31:       0   invalid
> /sys/firmware/acpi/interrupts/gpe32:       0   invalid
> /sys/firmware/acpi/interrupts/gpe33:       0   invalid
> /sys/firmware/acpi/interrupts/gpe34:       0   invalid
> /sys/firmware/acpi/interrupts/gpe35:       0   invalid
> /sys/firmware/acpi/interrupts/gpe36:       0   invalid
> /sys/firmware/acpi/interrupts/gpe2A:       0   invalid
> /sys/firmware/acpi/interrupts/gpe37:       0   invalid
> /sys/firmware/acpi/interrupts/gpe2B:       0   invalid
> /sys/firmware/acpi/interrupts/gpe38:       0   invalid
> /sys/firmware/acpi/interrupts/gpe2C:       0   invalid
> /sys/firmware/acpi/interrupts/gpe39:       0   invalid
> /sys/firmware/acpi/interrupts/gpe2D:       0   invalid
> /sys/firmware/acpi/interrupts/gpe2E:       0   invalid
> /sys/firmware/acpi/interrupts/gpe2F:       0   invalid
> /sys/firmware/acpi/interrupts/gpe3A:       0   invalid
> /sys/firmware/acpi/interrupts/gpe3B:       0   invalid
> /sys/firmware/acpi/interrupts/gpe3C:       0   invalid
> /sys/firmware/acpi/interrupts/gpe3D:       0   invalid
> /sys/firmware/acpi/interrupts/gpe3E:       0   invalid
> /sys/firmware/acpi/interrupts/gpe3F:       0   invalid
> /sys/firmware/acpi/interrupts/sci_not:       5
> /sys/firmware/acpi/interrupts/ff_pmtimer:       0   invalid
> /sys/firmware/acpi/interrupts/ff_rt_clk:       0   disabled
> /sys/firmware/acpi/interrupts/gpe_all:    3730
> /sys/firmware/acpi/interrupts/ff_gbl_lock:       0   enabled
> /sys/firmware/acpi/interrupts/ff_pwr_btn:       0   enabled
> /sys/firmware/acpi/interrupts/ff_slp_btn:       0   invalid
> 
> 
> cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3       
>   0:         17          0          0          0   IO-APIC   2-edge      timer
>   8:          1          0          0          0   IO-APIC   8-edge      rtc0
>   9:       4399          0          0          0   IO-APIC   9-fasteoi   acpi
>  17:     105479          0          0          0   IO-APIC  17-fasteoi   mmc0, eth1
>  19:          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb4
>  21:          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb3
>  22:         63          0          0          0   IO-APIC  22-fasteoi   ehci_hcd:usb2
>  23:      15260          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb1
>  28:          0          0          0          0   PCI-MSI 3194880-edge      pciehp
>  29:          0          0          0          0   PCI-MSI 3211264-edge      pciehp
>  30:          0          0          0          0   PCI-MSI 3227648-edge      pciehp
>  31:          0          0          0          0   PCI-MSI 3244032-edge      pciehp
>  32:          3          0          0          0   PCI-MSI 2097152-edge      firewire_ohci
>  33:       9309          0          0          0   PCI-MSI 512000-edge      0000:00:1f.2
>  34:        450          0          0          0   PCI-MSI 442368-edge      snd_hda_intel
>  35:       6600          0          0          0   PCI-MSI 32768-edge      i915
>  36:         71          0          0          0   PCI-MSI 1048576-edge      eth0-tx-0
>  37:         38          0          0          0   PCI-MSI 1048577-edge      eth0-rx-1
>  38:         34          0          0          0   PCI-MSI 1048578-edge      eth0-rx-2
>  39:          1          0          0          0   PCI-MSI 1048579-edge      eth0-rx-3
>  40:         15          0          0          0   PCI-MSI 1048580-edge      eth0-rx-4
> NMI:          1          0          0          0   Non-maskable interrupts
> LOC:      34113      14394      11025      10579   Local timer interrupts
> SPU:          0          0          0          0   Spurious interrupts
> PMI:          1          0          0          0   Performance monitoring interrupts
> IWI:          4          0          0          0   IRQ work interrupts
> RTR:          0          0          0          0   APIC ICR read retries
> RES:        892        940        961        374   Rescheduling interrupts
> CAL:       1045       1052       1099       1086   Function call interrupts
> TLB:        297        380        184        200   TLB shootdowns
> TRM:          0          0          0          0   Thermal event interrupts
> THR:          0          0          0          0   Threshold APIC interrupts
> DFR:          0          0          0          0   Deferred Error APIC interrupts
> MCE:          0          0          0          0   Machine check exceptions
> MCP:          3          3          3          3   Machine check polls
> ERR:          0
> MIS:          0
> PIN:          0          0          0          0   Posted-interrupt notification event
> PIW:          0          0          0          0   Posted-interrupt wakeup event
> 
> 
> [    2.570457] irq 17: nobody cared (try booting with the "irqpoll" option)
> [    2.570480] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.10 #2
> [    2.570481] Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS    MBP81.88Z.0047.B2A.1506082203 06/08/15
> [    2.570482]  0000000000000000 ffffffff812cac6f ffff8800892e5200 ffff8800892e52d4
> [    2.570484]  ffffffff810c1a70 ffff8800892e5200 0000000000000000 0000000000000000
> [    2.570486]  ffffffff810c1ded 0000000000000000 0000000000000011 0000000000000000
> [    2.570487] Call Trace:
> [    2.570489]  <IRQ>  [<ffffffff812cac6f>] ? dump_stack+0x5c/0x7d
> [    2.570497]  [<ffffffff810c1a70>] ? __report_bad_irq+0x30/0xc0
> [    2.570499]  [<ffffffff810c1ded>] ? note_interrupt+0x22d/0x270
> [    2.570501]  [<ffffffff810bf43d>] ? handle_irq_event_percpu+0x15d/0x1c0
> [    2.570503]  [<ffffffff810bf4ca>] ? handle_irq_event+0x2a/0x50
> [    2.570504]  [<ffffffff810c22fb>] ? handle_fasteoi_irq+0x8b/0x150
> [    2.570506]  [<ffffffff81017e2c>] ? handle_irq+0x1c/0x30
> [    2.570508]  [<ffffffff815514a6>] ? do_IRQ+0x46/0xd0
> [    2.570510]  [<ffffffff8154f5c2>] ? common_interrupt+0x82/0x82
> [    2.570511]  <EOI>  [<ffffffff814265b7>] ? poll_idle+0x57/0xa0
> [    2.570516]  [<ffffffff814260c8>] ? cpuidle_enter_state+0xc8/0x260
> [    2.570518]  [<ffffffff810ac4ee>] ? cpu_startup_entry+0x2ae/0x370
> [    2.570519]  [<ffffffff81931f42>] ? start_kernel+0x472/0x47a
> [    2.570521]  [<ffffffff81931120>] ? early_idt_handler_array+0x120/0x120
> [    2.570522]  [<ffffffff81931600>] ? x86_64_start_kernel+0x145/0x154
> [    2.570523] handlers:
> [    2.570533] [<ffffffffa0070640>] sdhci_irq [sdhci] threaded [<ffffffffa006ec40>] sdhci_thread_irq [sdhci]
> [    2.570565] Disabling IRQ #17
Comment 29 Lukas Wunner 2016-05-13 19:10:46 UTC
(In reply to hr from comment #25)
> Trying your patch on the Debian and kernel 4.4.10 sources It did not help
> for me.
> 
> Messages continue in dmesg "irq 17: nobody cared" and irq storm is
> continuing.

Yes I've heard that the patch doesn't work for everybody. I don't know why, perhaps DisINTx is not set on the wireless card on some machines upon boot. To verify that's indeed the cause, boot with "modprobe.blacklist=b43 modprobe.blacklist=bcma modprobe.blacklist=wl", then check if "lspci -vv" reports "DisINTx-" for the wireless card.
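
For anyone who prefers to script the check, here is a minimal sketch that pulls the two relevant flags out of saved "lspci -vv" text for a single device. The function name intx_state and the sample text are illustrative assumptions, not part of any patch in this bug; the sample lines only mimic the Control/Status line layout that lspci prints.

```python
import re

def intx_state(lspci_vv: str) -> dict:
    """Return the DisINTx (Control) and INTx (Status) flags found in
    `lspci -vv` output for one device. True means the flag is shown with '+'."""
    state = {}
    for raw in lspci_vv.splitlines():
        line = raw.strip()
        if line.startswith("Control:"):
            m = re.search(r"DisINTx([+-])", line)
            if m:
                state["DisINTx"] = m.group(1) == "+"
        elif line.startswith("Status:") and "INTx" not in state:
            # Other capability blocks also print "Status:" lines; only the
            # PCI status register line carries an INTx flag.
            m = re.search(r"\bINTx([+-])", line)
            if m:
                state["INTx"] = m.group(1) == "+"
    return state

# Illustrative sample (shortened), not captured from a real machine.
SAMPLE = """\
03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)
	Control: I/O- Mem+ BusMaster+ SpecCycle- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast <PERR- INTx+
"""

if __name__ == "__main__":
    print(intx_state(SAMPLE))
```

"DisINTx" being False together with "INTx" being True would mean legacy interrupts are not disabled and the card is asserting its interrupt line.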

I've attached two alternative patches. They cannot be submitted as is, they need more polish, but I can't decide which one is better. Please test both and let me know if either or both work for you. They're based on 4.4 but can be applied to 4.6 as well with some fuzz.

b43_eboot_4.4.patch resets all network interfaces before control is handed over from EFI to the kernel. This only works if you boot with the EFI stub. If you use gummiboot, the EFI stub is used by default. If you use grub I think you have to load the kernel with the "chainloader" directive, not the "linux" directive. You should briefly see a message "Welcome to Macintosh" on boot, plus the line "Resetting network interface" for every network card built into the machine.

b43_earlyquirk_4.4.patch resets the BCM 4331 card during kernel initialization. In dmesg there should be a message "Resetting Apple AirPort card". If MMIO was not already enabled, you'll see an additional message "Enabling mmio on Apple AirPort card". Let me know if you see that additional message as I'm not sure if I should drop this, maybe MMIO is always enabled and this is not needed.

I'm not sure if this is a bug in the EFI driver for the BCM 4331 or if it's actually a feature. Perhaps OS X supports some kind of connection handover from EFI. If that is true, it's not sufficient to just quirk the BCM 4331, we'd need this for all other cards used by Apple. E.g. models introduced 2013+ use BCM 4360. The eboot patch works for all cards whereas the earlyquirk patch requires a list of all cards used by Apple. So that's an advantage of the eboot patch. *If* this is indeed a feature, which I don't know.

A disadvantage of the eboot quirk is that it only works if the EFI stub is used. So I have no idea which patch to continue working on. If only one of them works for you, that would certainly make the choice easier.
Comment 30 hr 2016-05-13 22:54:49 UTC
testing b43_earlyquirk_4.4.patch

dmesg


[    0.000000] early_pci_scan 0000:00:00.0 [8086:0104]
[    0.000000] early_pci_scan 0000:00:01.0 [8086:0101]
[    0.000000] early_pci_scan 0000:00:01.1 [8086:0105]
[    0.000000] early_pci_scan 0000:05:00.0 [8086:1513]
[    0.000000] early_pci_scan 0000:00:02.0 [8086:0126]
[    0.000000] Reserving Intel graphics stolen memory at 0x8ba00000-0x8f9fffff
[    0.000000] early_pci_scan 0000:00:16.0 [8086:1c3a]
[    0.000000] early_pci_scan 0000:00:1a.0 [8086:1c2c]
[    0.000000] early_pci_scan 0000:00:1a.1 [8086:1c2e]
[    0.000000] early_pci_scan 0000:00:1b.0 [8086:1c20]
[    0.000000] early_pci_scan 0000:00:1c.0 [8086:1c10]
[    0.000000] early_pci_scan 0000:02:00.0 [14e4:16b4]
[    0.000000] early_pci_scan 0000:02:00.1 [14e4:16bc]
[    0.000000] early_pci_scan 0000:00:1c.1 [8086:1c12]
[    0.000000] early_pci_scan 0000:03:00.0 [14e4:4331]
[    0.000000] Mapping address 0xa0600000 for Apple AirPort card
[    0.000000] Resetting Apple AirPort card
[    0.000000] early_pci_scan 0000:00:1c.2 [8086:1c14]
[    0.000000] early_pci_scan 0000:04:00.0 [11c1:5901]
[    0.000000] early_pci_scan 0000:00:1d.0 [8086:1c27]
[    0.000000] early_pci_scan 0000:00:1d.1 [8086:1c28]
[    0.000000] early_pci_scan 0000:00:1f.0 [8086:1c49]

and again

[    2.493855] irq 17: nobody cared (try booting with the "irqpoll" option)

...

[    2.493957] handlers:
[    2.493972] [<ffffffffa0129640>] sdhci_irq [sdhci] threaded [<ffffffffa0127c40>] sdhci_thread_irq [sdhci]
[    2.494023] Disabling IRQ #17

The spurious interrupts on IRQ 17 continue.

There is no "Enabling mmio on Apple AirPort card" message.


lspci -vv with blacklisted wi-fi drivers:

03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)
	Subsystem: Broadcom Corporation BCM4331 802.11a/b/g/n
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Latency: 0, Cache Line Size: 256 bytes
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at a0600000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=2 PME-
	Capabilities: [58] Vendor Specific Information: Len=78 <?>
	Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [d0] Express (v1) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <32us
			ClockPM+ Surprise- LLActRep+ BwNot-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [13c v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 00-00-00-ff-ff-00-00-00
	Capabilities: [16c v1] Power Budgeting <?>
Comment 31 hr 2016-05-14 11:44:05 UTC
 
Testing b43_eboot_4.4.patch will take longer because I am currently having problems getting grub2 to load the kernel via the EFI stub. Reading the manuals will take some time.

I decided to run a small experiment: I physically disconnected the combined Broadcom BCM94331 module from the laptop motherboard. The result was strange: the "irq 17: nobody cared" message is gone, but I get this:

hr@debian:~$ grep . -r /sys/firmware/acpi/interrupts/
/sys/firmware/acpi/interrupts/sci:   23577
/sys/firmware/acpi/interrupts/error:       0
/sys/firmware/acpi/interrupts/gpe00:       0   invalid
/sys/firmware/acpi/interrupts/gpe01:       0   invalid
/sys/firmware/acpi/interrupts/gpe02:       0   invalid
/sys/firmware/acpi/interrupts/gpe03:       0   invalid
/sys/firmware/acpi/interrupts/gpe04:       0   invalid
/sys/firmware/acpi/interrupts/gpe05:       0   invalid
/sys/firmware/acpi/interrupts/gpe06:       0   invalid
/sys/firmware/acpi/interrupts/gpe07:       0   enabled
/sys/firmware/acpi/interrupts/gpe08:       0   invalid
/sys/firmware/acpi/interrupts/gpe09:       0   disabled
/sys/firmware/acpi/interrupts/gpe10:       0   invalid
/sys/firmware/acpi/interrupts/gpe11:       0   enabled
/sys/firmware/acpi/interrupts/gpe12:       0   invalid
/sys/firmware/acpi/interrupts/gpe13:       0   enabled
/sys/firmware/acpi/interrupts/gpe14:       0   invalid
/sys/firmware/acpi/interrupts/gpe15:       0   enabled
/sys/firmware/acpi/interrupts/gpe16:       0   disabled
/sys/firmware/acpi/interrupts/gpe0A:       0   invalid
/sys/firmware/acpi/interrupts/gpe17:   23577   enabled
/sys/firmware/acpi/interrupts/gpe0B:       0   invalid
/sys/firmware/acpi/interrupts/gpe18:       0   invalid
/sys/firmware/acpi/interrupts/gpe0C:       0   invalid
/sys/firmware/acpi/interrupts/gpe19:       0   disabled
/sys/firmware/acpi/interrupts/gpe0D:       0   disabled
/sys/firmware/acpi/interrupts/gpe0E:       0   invalid
/sys/firmware/acpi/interrupts/gpe20:       0   invalid
/sys/firmware/acpi/interrupts/gpe0F:       0   invalid
/sys/firmware/acpi/interrupts/gpe21:       0   invalid
/sys/firmware/acpi/interrupts/gpe22:       0   invalid
/sys/firmware/acpi/interrupts/gpe23:       0   enabled
/sys/firmware/acpi/interrupts/gpe24:       0   invalid
/sys/firmware/acpi/interrupts/gpe25:       0   invalid
/sys/firmware/acpi/interrupts/gpe26:       0   invalid
/sys/firmware/acpi/interrupts/gpe1A:       0   invalid
/sys/firmware/acpi/interrupts/gpe27:       0   invalid
/sys/firmware/acpi/interrupts/gpe1B:       0   invalid
/sys/firmware/acpi/interrupts/gpe28:       0   invalid
/sys/firmware/acpi/interrupts/gpe1C:       0   invalid
/sys/firmware/acpi/interrupts/gpe29:       0   invalid
/sys/firmware/acpi/interrupts/gpe1D:       0   invalid
/sys/firmware/acpi/interrupts/gpe1E:       0   invalid
/sys/firmware/acpi/interrupts/gpe30:       0   invalid
/sys/firmware/acpi/interrupts/gpe1F:       0   invalid
/sys/firmware/acpi/interrupts/gpe31:       0   invalid
/sys/firmware/acpi/interrupts/gpe32:       0   invalid
/sys/firmware/acpi/interrupts/gpe33:       0   invalid
/sys/firmware/acpi/interrupts/gpe34:       0   invalid
/sys/firmware/acpi/interrupts/gpe35:       0   invalid
/sys/firmware/acpi/interrupts/gpe36:       0   invalid
/sys/firmware/acpi/interrupts/gpe2A:       0   invalid
/sys/firmware/acpi/interrupts/gpe37:       0   invalid
/sys/firmware/acpi/interrupts/gpe2B:       0   invalid
/sys/firmware/acpi/interrupts/gpe38:       0   invalid
/sys/firmware/acpi/interrupts/gpe2C:       0   invalid
/sys/firmware/acpi/interrupts/gpe39:       0   invalid
/sys/firmware/acpi/interrupts/gpe2D:       0   invalid
/sys/firmware/acpi/interrupts/gpe2E:       0   invalid
/sys/firmware/acpi/interrupts/gpe2F:       0   invalid
/sys/firmware/acpi/interrupts/gpe3A:       0   invalid
/sys/firmware/acpi/interrupts/gpe3B:       0   invalid
/sys/firmware/acpi/interrupts/gpe3C:       0   invalid
/sys/firmware/acpi/interrupts/gpe3D:       0   invalid
/sys/firmware/acpi/interrupts/gpe3E:       0   invalid
/sys/firmware/acpi/interrupts/gpe3F:       0   invalid
/sys/firmware/acpi/interrupts/sci_not:      21
/sys/firmware/acpi/interrupts/ff_pmtimer:       0   invalid
/sys/firmware/acpi/interrupts/ff_rt_clk:       0   disabled
/sys/firmware/acpi/interrupts/gpe_all:   23577
/sys/firmware/acpi/interrupts/ff_gbl_lock:       0   enabled
/sys/firmware/acpi/interrupts/ff_pwr_btn:       0   enabled
/sys/firmware/acpi/interrupts/ff_slp_btn:       0   invalid

What is sending gpe17 interrupts if the Wi-Fi card is physically not present in the system? Is that normal or not? Maybe there is something I do not understand?
Comment 32 Lukas Wunner 2016-05-14 14:42:54 UTC
(In reply to hr from comment #31)
> Testing b43_eboot_4.4.patch patch will take longer. because it is i
> currently having problems with grub2 loading the kernel in EFI stub. To
> reading manuals will take some time.

I've just realized, since you're using Debian it might be sufficient to just replace "linux" with "linuxefi" in your grub configuration. The Debian grub2 package carries a patch to enable this command:

http://lists.gnu.org/archive/html/grub-devel/2014-01/msg00137.html
https://anonscm.debian.org/cgit/pkg-grub/grub.git/commit/?id=e4951626e0ab33c446cfd7e9a22044f602ca0106


Obviously resetting the card didn't work with b43_eboot_4.4.patch, the lspci output shows "INTx+" in the "Status:" line, so the card keeps asserting its interrupt line. Question is why. Michael Büsch mentioned on linux-wireless@ that my patch only works if the wireless core is the second one on the card. On my machine that's the case, but perhaps your machine uses a different revision of the chip with a different layout. If you try "modprobe b43", which cores does it list in dmesg?

On my machine it looks like this:
[   59.706481] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0)
[   59.708383] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0)
[   59.710324] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0)

If the core labeled "IEEE 802.11" has a different id than "1" (i.e., it's not the second core), that would explain why the patch isn't working on your machine.
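
A minimal sketch of how that comparison could be scripted against saved dmesg output. The regular expression is an assumption based only on the bcma log lines shown in this bug, and core_indices is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, ...)
CORE_RE = re.compile(r"bcma: bus\d+: Core (\d+) found: (.+?) \(manuf")

def core_indices(dmesg: str) -> dict:
    """Map each bcma core name to the index it was found at."""
    return {m.group(2): int(m.group(1)) for m in CORE_RE.finditer(dmesg)}

# The three lines from my machine, as quoted above.
SAMPLE = """\
[   59.706481] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0)
[   59.708383] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0)
[   59.710324] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0)
"""

if __name__ == "__main__":
    cores = core_indices(SAMPLE)
    # The patch assumes the wireless core is the second one (index 1).
    print(cores, cores.get("IEEE 802.11") == 1)
```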


> I decided to spend a little experiment. I am physically disconnected
> conbined Broadcom BCM94331 module from the laptop motherboard. and I got a
> strange result, message "irq 17: nobody cared" gone, but this: 
[...]
> /sys/firmware/acpi/interrupts/gpe17:   23577   enabled
[...]
> Who sending interrupts gpe17 if wi-fi physically not present in system?
> Thats normal situation or not? May be it may be something i do not
> understand?

GPE 17 is used by the ACPI Embedded Controller, apparently it's unhappy if the wireless module is disconnected. There's an I2C bus between the EC and pins 12 and 14 of the wireless module connector which I believe is used for a temperature sensor, perhaps the EC fires its interrupt because it thinks that something on or near the wireless module is overheating. Also, pin 10 is used to signal wireless events (Wake on Wireless LAN perhaps), the pin goes to low (!) when an event occurs, so by disconnecting the card I guess it signals an event all the time.
Comment 33 Lukas Wunner 2016-05-14 14:47:32 UTC
Ugh, s/b43_eboot_4.4.patch/b43_earlyquirk_4.4.patch/.
Comment 34 hr 2016-05-14 17:41:23 UTC
(In reply to Lukas Wunner from comment #32)

> Michael Büsch mentioned on linux-wireless@
> that my patch only works if the wireless core is the second one on the card.
> On my machine that's the case, but perhaps your machine uses a different
> revision of the chip with a different layout. If you try "modprobe b43",
> which cores does it list in dmesg?

[23129.929976] bcma: bus0: Found chip with id 0x4331, rev 0x02 and package 0x09
[23129.930009] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0)
[23129.930034] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0)
[23129.930081] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0)
[23129.941219] bcma: bus0: Bus registered
[23130.020310] b43-phy0: Broadcom 4331 WLAN found (core revision 29)
[23130.020731] b43-phy0: Found PHY: Analog 9, Type 7 (HT), Revision 1
[23130.020751] b43-phy0: Found Radio: Manuf 0x17F, ID 0x2059, Revision 0, Version 1
[23130.020758] b43-phy0 warning: 5 GHz band is unsupported on this PHY 

  
> GPE 17 is used by the ACPI Embedded Controller, apparently it's unhappy if
> the wireless module is disconnected. There's an I2C bus between the EC and
> pins 12 and 14 of the wireless module connector which I believe is used for
> a temperature sensor, perhaps the EC fires its interrupt because it thinks
> that something on or near the wireless module is overheating. Also, pin 10
> is used to signal wireless events (Wake on Wireless LAN perhaps), the pin
> goes to low (!) when an event occurs, so by disconnecting the card I guess
> it signals an event all the time.

I should add that the gpe17 situation does not change whether the card is connected or not. It seems rather strange; could it be a faulty card reader controller?
Comment 35 hr 2016-05-15 12:35:19 UTC
I was not able to boot the kernel via the EFI stub with grub2. The linuxefi setting works, but I always get a kernel panic because the kernel cannot find the file systems. I even tried building a command line with the partition UUID into the kernel, but I am doing something wrong and it does not work.

In the end I installed rEFInd, which loads the kernel without any problems. Your b43_eboot_4.4.patch works: early in the boot there is the line "Welcome to Macintosh" and two "Resetting network interface" lines.

The error "irq 17: nobody cared (try booting with the "irqpoll" option)" no longer appears.

But now the system hangs within minutes, or slightly less, whenever an Internet application is in use: the cursor does not respond and key presses are ignored. Of course I can't see dmesg; all I can do is hold down the power button to turn the laptop off and on again. If the wl driver is not loaded, there are no hangs. This never happened with other kernel versions or with the default Debian kernel. For now I boot kernel 3.16, and with the same version of the wl driver everything works as before. I compile the driver from source for every new kernel; I do not just copy the wl.ko file.
Comment 36 hr 2016-05-18 17:27:16 UTC
Testing b43_eboot_4.4.patch with the b43 driver: no hangs or freezes, but when downloading large files at high speed, the irq/17-b43 process uses 10-20% CPU.

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       
  0:         17          0          0          0   IO-APIC   2-edge      timer
  8:          1          0          0          0   IO-APIC   8-edge      rtc0
  9:      13581          0          0          0   IO-APIC   9-fasteoi   acpi
 17:    2276967          0          0          0   IO-APIC  17-fasteoi   mmc0, b43
 19:          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb4
 21:          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb3
 22:         68          0          0          0   IO-APIC  22-fasteoi   ehci_hcd:usb2
 23:      86807          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb1
 28:          0          0          0          0   PCI-MSI 3194880-edge      pciehp
 29:          0          0          0          0   PCI-MSI 3211264-edge      pciehp
 30:          0          0          0          0   PCI-MSI 3227648-edge      pciehp
 31:          0          0          0          0   PCI-MSI 3244032-edge      pciehp
 32:          3          0          0          0   PCI-MSI 2097152-edge      firewire_ohci
 33:      20142          0          0          0   PCI-MSI 512000-edge      0000:00:1f.2
 34:      43489          0          0          0   PCI-MSI 442368-edge      snd_hda_intel
 35:      71355          0          0          0   PCI-MSI 32768-edge      i915
 36:          1          0          0          0   PCI-MSI 1048576-edge      eth0-tx-0
 37:          1          0          0          0   PCI-MSI 1048577-edge      eth0-rx-1
 38:          1          0          0          0   PCI-MSI 1048578-edge      eth0-rx-2
 39:          1          0          0          0   PCI-MSI 1048579-edge      eth0-rx-3
 40:          1          0          0          0   PCI-MSI 1048580-edge      eth0-rx-4
NMI:          0          0          0          0   Non-maskable interrupts
LOC:     231063     148480     188319     113361   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:      39733      15928     231716      11938   Rescheduling interrupts
CAL:       1084       1283       1210       1307   Function call interrupts
TLB:      20252      14652      23227      14233   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          7          7          7          7   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0   Posted-interrupt wakeup event
Comment 37 Lukas Wunner 2016-05-18 21:17:01 UTC
Thank you for the extensive testing.

I've looked at the b43 driver to see which steps I might be missing in b43_earlyquirk_4.4.patch and realized that before resetting the wireless core, it has to be mapped first. On my machine that step isn't needed but perhaps it is on yours.

Could you try the following: Apply the newly attached b43_print_core_addr.patch to a stock kernel, boot and load b43. The driver should now output the address and wrap for each core. Looks like this on my machine:
[ 3596.239935] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0, addr 0X18000000, wrap 0X18100000)
[ 3596.241657] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0, addr 0X18001000, wrap 0X18101000)
[ 3596.243438] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0, addr 0X18002000, wrap 0X18102000)

Reboot with all wireless modules blacklisted (b43, bcma, wl). Invoke "lspci -vvvv -xxxx -s 03:00.0". You will get a hexdump of the wireless card's PCI config space. Look at the four bytes starting at positions 0x80 and 0xac. This is the address and wrap of the core that is currently mapped (in little endian order). For the patch to work, this needs to be the address and wrap of the IEEE 802.11 core.

E.g. on my machine, the four bytes at position 0x80 are 00 10 00 18 (= addr 0x18001000), at position 0xac it's 00 10 10 18 (= wrap 0x18101000).

Please let me know to which of the three cores the 2x four bytes in the config space correspond.
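
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of any patch: the helper names parse_hexdump and mapped_core are made up here, and the core table is copied from the bcma log lines quoted earlier in this comment. It assumes the hexdump rows look like "80: 00 10 00 18 ..." as printed by lspci -xxxx.

```python
import struct

# Cores as printed by bcma in the log above: name -> (addr, wrap).
CORES = {
    "ChipCommon": (0x18000000, 0x18100000),
    "IEEE 802.11": (0x18001000, 0x18101000),
    "PCIe": (0x18002000, 0x18102000),
}


def parse_hexdump(text):
    """Collect 'offset: byte byte ...' rows into a 4 KiB config space image.

    Hypothetical helper: lines that are not hexdump rows are skipped.
    """
    space = bytearray(4096)
    for line in text.splitlines():
        offset, sep, rest = line.partition(":")
        if not sep:
            continue
        try:
            base = int(offset.strip(), 16)
            row = bytes.fromhex(rest.strip().replace(" ", ""))
        except ValueError:
            continue  # descriptive lspci line, not a hexdump row
        space[base:base + len(row)] = row
    return bytes(space)


def mapped_core(space):
    """Return the name of the core whose addr/wrap sit at 0x80 and 0xac."""
    # Both values are stored as 32-bit little-endian words.
    addr = struct.unpack_from("<I", space, 0x80)[0]
    wrap = struct.unpack_from("<I", space, 0xAC)[0]
    for name, pair in CORES.items():
        if pair == (addr, wrap):
            return name
    return None
```

Feeding it the saved lspci output and calling mapped_core() on the result names the mapped core, or returns None if the two words match none of the three cores.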

As for b43_eboot_4.4.patch causing the system to freeze when the broadcom-sta driver is used, I don't really have an idea yet what might be the cause. The high CPU load with b43 might be normal, I'll check this on my machine.

As for GPE 17, I can see that about 500 interrupts have accumulated for this GPE on my machine shortly after booting, something in the 20000+ range seems high indeed. After briefly looking at drivers/acpi/ec.c, I think it should be possible to determine what is causing the GPE to fire by adding some debug code to acpi_ec_query() so that the _Qxx number is printed. That number can then be compared to the _Qxx methods in DSDT. I'll see to it that I cook up a patch for that.
Comment 38 Lukas Wunner 2016-05-18 21:18:10 UTC
Created attachment 216631 [details]
[PATCH] bcma: Print addr and wrap on core scan
Comment 39 hr 2016-05-21 11:56:56 UTC
(In reply to Lukas Wunner from comment #37)

> Could you try the following: Apply the newly attached
> b43_print_core_addr.patch to a stock kernel, boot and load b43. The driver
> should now output the address and wrap for each core. Looks like this on my
> machine:

with b43_print_core_addr.patch


[    8.923331] bcma: bus0: Found chip with id 0x4331, rev 0x02 and package 0x09
[    8.923365] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0, addr 0X18000000, wrap 0X18100000)
[    8.923392] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0, addr 0X18001000, wrap 0X18101000)
[    8.923446] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0, addr 0X18002000, wrap 0X18102000)
[    8.973257] bcma: bus0: Bus registered


 
> Reboot with all wireless modules blacklisted (b43, bcma, wl). Invoke "lspci
> -vvvv -xxxx -s 03:00.0". You will get a hexdump of the wireless card's PCI
> config space. Look at the four bytes starting at positions 0x80 and 0xac.
> This is the address and wrap of the core that is currently mapped (in little
> endian order). For the patch to work, this needs to be the address and wrap
> of the IEEE 802.11 core.
> 
> E.g. on my machine, the four bytes at position 0x80 are 00 10 00 18 (= addr
> 0x18001000), at position 0xac it's 00 10 10 18 (= wrap 0x18101000).
> 
> Please let me know to which of the three cores the 2x four bytes in the
> config space correspond.

00: e4 14 31 43 06 00 18 00 02 00 80 02 40 00 00 00
10: 04 00 60 a0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 e4 14 31 43
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00
40: 01 58 03 06 08 40 00 00 05 d0 80 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 09 48 78 00 13 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 10 00 18 00 00 00 00 00 00 00 00 03 00 00 00
90: 00 02 00 00 00 03 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 0b 00 00 10 10 18
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 90 05 00 00 10 00 11 dc 16 00
e0: 43 01 11 30 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 01 00 c1 13 00 00 00 00 00 00 00 00 11 20 06 00
110: 00 00 00 00 00 20 00 00 b4 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 02 00 01 16
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: ff 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00
160: 03 00 c1 16 00 00 00 ff ff 00 00 00 04 00 01 00
170: 00 00 00 00 c4 62 05 00 01 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

...

ff0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

range 180-ff0 is null.
 
 
> As for GPE 17, I can see that about 500 interrupts have accumulated for this
> GPE on my machine shortly after booting, something in the 20000+ range seems
> high indeed. After briefly looking at drivers/acpi/ec.c, I think it should
> be possible to determine what is causing the GPE to fire by adding some
> debug code to acpi_ec_query() so that the _Qxx number is printed. That
> number can then be compared to the _Qxx methods in DSDT. I'll see to it that
> I cook up a patch for that.

I may have led you astray. GPE 17 actually has about 500 interrupts after boot (with b43_eboot_4.4.patch); the 20000+ count was after the laptop had been running for a long time. If you see the same values after boot, I can stop worrying about this.
Comment 40 Lukas Wunner 2016-05-23 15:58:28 UTC
So the reason why b43_earlyquirk_4.4.patch doesn't work on your machine seems to be that the wireless card is in power state D3hot. I'm attaching a new version which transitions the card to D0 before resetting it. The commit message contains an explanation why the card is in D3hot. (It's caused by grub.)

Please let me know if this new patch works for you. I'd be glad to include a Tested-by: in the commit message so that you get credit for your testing efforts. If you would like to be credited with your real name, please send it to me by e-mail. If you prefer to remain anonymous, that is also fine of course.

As for GPE 17, I've hacked the EC interrupt handler to output the _Qxx number and the about 500 interrupts on boot seem to be caused by initializing the battery (_Q10). Afterwards I've only seen _Q40, which is related to the Ambient Light Sensor. This increases the interrupt count by 6 each time. Wave your hand in front of the camera (where the ALS is located) and the interrupt count on GPE 17 will go up fairly quickly.
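
To watch the count without patching the kernel, the per-GPE counters under /sys/firmware/acpi/interrupts can be polled. The following is a minimal sketch, assuming the counter file starts with the event count followed by status words (the exact trailing words differ between kernel versions); the function names are made up for illustration.

```python
from pathlib import Path

# sysfs directory holding one counter file per ACPI event source.
GPE_DIR = Path("/sys/firmware/acpi/interrupts")


def parse_gpe_count(text):
    """Return the event count, assumed to be the first field of the file."""
    return int(text.split()[0])


def read_gpe_count(gpe="gpe17"):
    """Read the current count for one GPE (needs a machine that exposes it)."""
    return parse_gpe_count((GPE_DIR / gpe).read_text())


def count_delta(before_text, after_text):
    """Number of events between two snapshots of the same counter file."""
    return parse_gpe_count(after_text) - parse_gpe_count(before_text)
```

Taking one snapshot before and one after waving a hand over the sensor should, on a machine behaving like mine, give a delta that is a multiple of 6.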
Comment 41 Lukas Wunner 2016-05-23 15:59:57 UTC
Created attachment 217121 [details]
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot]
Comment 42 hr 2016-05-25 08:09:31 UTC
(In reply to Lukas Wunner from comment #40)
>[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot]

Works fine. After boot:

cat /proc/interrupts

17:    553          0          0          0   IO-APIC  17-fasteoi   mmc0, b43
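
For comparing boots, a row like the one above can be split into its parts with a short Python helper. This is a sketch only; parse_irq_line is a made-up name, and it assumes the per-CPU counts are the leading all-digit fields after the colon.

```python
def parse_irq_line(line):
    """Split one /proc/interrupts row into (irq, per-CPU counts, description)."""
    label, _, rest = line.partition(":")
    fields = rest.split()
    counts = []
    for field in fields:
        # Counts come first; the first non-numeric field starts the description.
        if not field.isdigit():
            break
        counts.append(int(field))
    description = " ".join(fields[len(counts):])
    return label.strip(), counts, description


def total_interrupts(line):
    """Sum the counts over all CPUs for one row."""
    return sum(parse_irq_line(line)[1])
```

A total in the hundreds shortly after boot, as here, is what a healthy shared IRQ 17 looks like; the storm this bug is about pushes the count to 100000 within seconds, at which point the kernel prints "nobody cared" and disables the line.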

Also, the proprietary wl driver still makes the system hang. If your patch is accepted into the Linux kernel, this will come as a surprise to those who use the proprietary driver once they upgrade to a patched kernel.

 
> As for GPE 17, I've hacked the EC interrupt handler to output the _Qxx
> number and the about 500 interrupts on boot seem to be caused by
> initializing the battery (_Q10). Afterwards I've only seen _Q40, which is
> related to the Ambient Light Sensor. This increases the interrupt count by 6
> each time. Wave your hand in front of the camera (where the ALS is located)
> and the interrupt count on GPE 17 will go up fairly quickly.

Is it possible to stop the Ambient Light Sensor from sending interrupts? I don't use it, for example; I always adjust the brightness manually.

Thank you very much for the detailed explanation and your patches.
Comment 43 Bryan Paradis 2016-05-26 17:39:02 UTC
I recompiled my kernel (4.5.4-1-ARCH) with your latest patch: >[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot].

It seems to have fixed the IRQ problem on startup. I normally run b43-firmware from AUR, but I have installed and configured wl, as hr mentioned issues with that combination.

I installed broadcom-wl from AUR, disabled b43 and ssb and modprobed wl. My connection doesn't seem as stable with this wl driver as with b43 but I haven't had the box hang on me or anything yet. 

hr, do you have a good procedure to reproduce the freezing/hanging you are seeing with the wl driver? Could you provide more information about how and where it is hanging for you?

kernel information:
uname -a
4.5.4-1 SMP PREEMPT Thu May 26 12:46:14 EDT 2016 x86_64 GNU/Linux

patches:
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot]

dmesg:
Going to try to attach
Comment 44 Bryan Paradis 2016-05-26 17:40:42 UTC
Created attachment 217731 [details]
dmesg after appying patch uploaded @ 2016-05-23 15:59 UTC
Comment 45 Lukas Wunner 2016-05-26 19:32:25 UTC
Thank you Bryan. Is the dmesg output from when you tested it with wl? Because on that boot, b43 was used. Otherwise it looks fine, except for the Thunderbolt controller not being supported (which is fixed in 4.7) and the BIOS being from 2016 (which is odd, I thought the newest version was Apple's EFI Update 2015-002).

hr tells me that lockups only occur with large amounts of traffic. When transmitting just pings, things seem to work fine even over longer periods of time. Unfortunately I don't have a wifi AP at my disposal right now, I mostly use Gigabit Ethernet, so I have to lean on you guys to test it.
Comment 46 Lukas Wunner 2016-05-26 22:51:09 UTC
Created attachment 217741 [details]
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure according to Michael Büsch]
Comment 47 Bryan Paradis 2016-05-26 22:54:25 UTC
Created attachment 217751 [details]
dmesg + wl after appying patch uploaded @ 2016-05-23 15:59 UTC
Comment 48 Lukas Wunner 2016-05-26 23:02:57 UTC
I've sent an e-mail to Broadcom support asking for help:
http://lists.infradead.org/pipermail/b43-dev/2016-May/003975.html

Michael Büsch (not at Broadcom but a regular b43 contributor) responded that my simplified reset procedure (which just sets the reset bit and does nothing else) might cause issues with wl and that I need to follow the procedure as per the bcma source code:
http://lists.infradead.org/pipermail/b43-dev/2016-May/003976.html

So I've just attached a new version of the patch which follows Michael's specification to the bit. Not sure if it helps but worth a try?

In the bcma source code we're actually doing a bit more, we're writing to the BCMA_IOCTL a couple of times. If the latest patch doesn't resolve the issue I could add that as well:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/bcma/core.c#n42
Comment 49 Bryan Paradis 2016-05-26 23:34:30 UTC
Hello Lukas, Dmesg was with b43 and now I have uploaded one with wl. Hadn't even thought to look into the Thunderbolt stuff yet. Just recently got this machine. Good to hear it's in 4.7!

As I am not familiar with the platform I have no idea about the EFI update version. My friend had updated the device to El Capitan just before I received it. Maybe it was packaged with that? I have dumped the EEPROM if you are interested.

I have been doing internet-based testing but my connection isn't very fast. I will set up some wifi/LAN tests with high throughput. It will be interesting to see whether I can reproduce the issue hr is seeing. I was seeing lots of problems before, but it seems really stable now using:

aur/broadcom-wl-dkms-248 6.30.223.248-1 [installed] (2) (0.03)

Interesting about the reset procedure. Nice of Michael to respond. Could make sense. I wonder if it will need the BCMA_IOCTL writes or not. Is there any way to access the registers or any other information that might help when it is in an unstable state? If I can get it into one, anyway.

Takes a good while to compile on here (: I will get back to you a bit later with the results. Do you know which version of wl hr was using?
Comment 50 hr 2016-05-27 19:26:48 UTC
(In reply to Bryan Paradis from comment #43)

 
> I installed broadcom-wl from AUR

Thanks for the info, Bryan. I downloaded the latest snapshot of the driver: https://aur.archlinux.org/packages/broadcom-wl-dkms/

And I found something interesting there, two patches:

001-null-pointer-fix.patch
002-rdtscl.patch

I applied them to the driver I have had installed so far:

https://www.broadcom.com/docs/linux_sta/hybrid-v35-nodebug-pcoem-6_30_223_271.tar.gz

And now there have been no hangs; everything works fine with Lukas's patch.
Comment 51 Lukas Wunner 2016-05-29 00:02:21 UTC
Great, thanks for the research, it's good to know about that null pointer issue in wl and the corresponding patch. I've just sent the patch to the lists, let's see if there are any objections, bikeshedding requests, etc.
Comment 52 Bryan Paradis 2016-05-31 14:11:25 UTC
Good find, hr. I have reconfigured my system with a new drive so I have been pretty busy. Just got the kernel built with the latest patch: 217741. I have been having some wonky wifi without the patch, at least. I will see how this goes and report back.
Comment 53 hr 2016-06-02 16:39:17 UTC
(In reply to Lukas Wunner from comment #46)
> Created attachment 217741 [details]
> [PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure
> according to Michael Büsch]

Yesterday I tried this patch; some errors came back in dmesg:

[23289.098445] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)
[23404.895181] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)

I will soon revert to the previous version of the patch and test.
Comment 54 Bryan Paradis 2016-06-03 00:01:27 UTC
I was having some instability problems which I traced to some weird disassociation problems on the AP side. Updated my OpenWRT to 15.05.1 and dropped the channel width from 40 MHz back to 20 MHz. No problems since.

I haven't seen the errors you did in my dmesg, hr. Could you provide any more context?
Comment 55 Bryan Paradis 2016-06-04 08:20:43 UTC
New information. I have been messing around with the kernel source to see if I can get closer to a fix for this issue. Definitely something earlier in the UHS init is breaking the tuning. I am unsure.

Can someone else confirm my thinking that the BCM57785 has an onboard voltage switching regulator, so there is no reason it wouldn't be capable of 1.8 V?

Some Hardware Tests:

--TEST1--
Physically disconnected BCM4331 Wifi Card
+ UHS-I card 
+ Cat5 not plugged into BCM57785 Ethernet 
+ tg3 (ethernet) module loaded
= BAD: Timeouts occur waiting for hardware interrupt.

[  42.784679] mmc0: Timeout waiting for hardware interrupt.
[   52.812088] mmc0: Timeout waiting for hardware interrupt.
[   60.889130] mmc0: Card removed during transfer!
[   60.889138] mmc0: Resetting controller.
[   60.899787] mmc0: error -123 whilst initialising SD card
[   72.120058] mmc0: Timeout waiting for hardware interrupt.
[   82.147469] mmc0: Timeout waiting for hardware interrupt.
[  101.135438] mmc0: Timeout waiting for hardware interrupt.
[  111.162834] mmc0: Timeout waiting for hardware interrupt.
[  121.190203] mmc0: Timeout waiting for hardware interrupt.
[  140.204860] mmc0: Timeout waiting for hardware interrupt.
[  140.213640] mmc0: error -123 whilst initialising SD card

--TEST2--
Physically disconnected BCM4331 Wifi Card
+ UHS-I card 
+ Cat5 plugged into BCM57785 Ethernet <--- Change here
+ tg3 (ethernet) module loaded
= Good: voltage switch gets skipped and card loads as high speed

[ 1755.779744] tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
[ 1755.779766] tg3 0000:01:00.0 enp1s0f0: Flow control is on for TX and on for RX
[ 1755.779772] tg3 0000:01:00.0 enp1s0f0: EEE is disabled
[ 1775.595860] mmc0: Skipping voltage switch
[ 1776.610955] mmc0: new high speed SDXC card at address e624
[ 1776.614773] mmcblk0: mmc0:e624 SU64G 59.5 GiB (ro)
[ 1776.616032]  mmcblk0: p1 p2

--TEST3--
Physically disconnected BCM4331 Wifi Card
+ UHS-I card 
+ Cat5 plugged into BCM57785 Ethernet 
+ tg3 (ethernet) module unloaded <--- Change here
= Bad: Same result as TEST1

--TEST4--
Physically disconnected BCM4331 Wifi Card
+ UHS-I card    
+ Cat5 plugged into BCM57785 Ethernet    
+ tg3 (ethernet) module unloaded
+ sdhci debug_quirks2=4 <--- Change here
= Bad: Same result as TEST1

--TEST5--
Connected BCM4331 Wifi Card
+ UHS-I card
+ Cat5 not plugged into BCM57785 Ethernet
+ tg3 module loaded
+ no sdhci debug quirks
= Bad: Main bug result

[  177.120002] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock
[  177.120023] mmc0: new ultra high speed DDR50 SDXC card at address e624
[  177.129048] mmcblk0: mmc0:e624 SU64G 59.5 GiB (ro)
[  177.139349] mmc0: Controller never released inhibit bit(s).
[  187.163886] mmc0: Timeout waiting for hardware interrupt.
[  187.163944] mmcblk0: error -110 sending status command, retrying
[  187.213949] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock

--TEST6--
Connected BCM4331 Wifi Card
+ UHS-I card
+ Cat5 not plugged into BCM57785 Ethernet
+ tg3 module loaded
+ sdhci debug_quirks2=4
= Good: Card gets detected but with no voltage skips - Similar to TEST2

[  310.925853] sdhci-pci 0000:01:00.1: SDHCI controller found [14e4:16bc] (rev 10)
[  310.927211] mmc0: SDHCI controller on PCI [0000:01:00.1] using ADMA 64-bit
[  313.079711] mmc0: new high speed SDXC card at address e624
[  313.081903] mmcblk0: mmc0:e624 SU64G 59.5 GiB (ro)
[  313.082996]  mmcblk0: p1 p2

I will also be attaching lspci -v and dmesg for two wifi plugged in/unplugged states that includes logs of me doing the tests above:

wifi_unplugged.lspci
wifi_unplugged.dmesg
wifi_plugged_in.lspci
wifi_plugged_in.dmesg
Comment 56 Bryan Paradis 2016-06-04 08:24:32 UTC
Unfortunately I just posted in the wrong bug. I don't think you will be able to glean any information about the current problem from that one. It was meant for https://bugzilla.kernel.org/show_bug.cgi?id=73241 but I got my windows mixed up.

As for an update here I have been running a kernel with the latest patch quite extensively for a few days with no occurrences of any wifi problems.
Comment 57 Bryan Paradis 2016-06-10 10:35:51 UTC
(In reply to hr from comment #53)
> (In reply to Lukas Wunner from comment #46)
> > Created attachment 217741 [details]
> > [PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure
> > according to Michael Büsch]
> 
> Yesterday try this patch, some errors come back in dmesg:
> 
> [23289.098445] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)
> [23404.895181] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)
> 
> Soon i try to revert to previous version patch and test.

hr: I noticed that iwconfig and other wireless utilities need to be run as root to work, otherwise I get errors like:

ERROR @wl_dev_intvar_get : error (-1)
ERROR @wl_cfg80211_get_tx_power : error (-1)

Could this have happened when using some sort of utility? Any more information about it occurring?
Comment 58 Lukas Wunner 2016-06-10 10:57:08 UTC
Error -1 is -EPERM:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/asm-generic/errno-base.h

I can reproduce the error hr is getting by loading b43, unloading, then loading wl. Somehow wl is picky about the state the device is in and returns -EINVAL when scanning. But I believe this is independent from this patch.
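
For anyone who wants to look up such codes without opening the header, Python's standard errno module carries the same table. A minimal sketch (the describe helper is a made-up name; it treats the number printed in the log as a negated errno, which is the kernel convention):

```python
import errno
import os


def describe(code):
    """Map a negative kernel-style return value to (errno name, message)."""
    number = abs(code)
    return errno.errorcode.get(number, "unknown"), os.strerror(number)


for code in (-1, -22):
    name, message = describe(code)
    print(f"{code}: {name} ({message})")
```

The first element of the result is EPERM for -1 and EINVAL for -22, matching the two wl errors discussed here.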

By the way, the patch was queued by Ingo Molnar this week:
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=625a99d9bfd0

There was an objection raised post-merge which I've addressed, we'll have to see if there are others:
https://lkml.org/lkml/2016/6/8/972
Comment 59 Bryan Paradis 2016-06-10 11:02:04 UTC
(In reply to Lukas Wunner from comment #58)
> Error -1 is -EPERM:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/
> uapi/asm-generic/errno-base.h
> 
> I can reproduce the error hr is getting by loading b43, unloading, then
> loading wl. Somehow wl is picky about the state the device is in and returns
> -EINVAL when scanning. But I believe this is independent from this patch.
> 

Thanks for posting that. Totally makes sense that this exists.

> By the way, the patch was queued by Ingo Molnar this week:
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/
> ?id=625a99d9bfd0
> 
> There was an objection raised post-merge which I've addressed, we'll have to
> see if there are others:
> https://lkml.org/lkml/2016/6/8/972

Very cool. Thanks for doing this.
Comment 61 Lukas Wunner 2016-08-10 14:02:44 UTC
Fixed in stable kernels 4.6.6, 4.4.17, 4.1.30, 3.18.39
Comment 62 Lukas Wunner 2016-11-14 10:04:06 UTC
Fixed in upcoming stable kernels 3.16.39, 3.2.84
