Bug 79301
Description
Chris Murphy
2014-06-30 22:49:57 UTC
Created attachment 141531 [details]
dmesg
Created attachment 141541 [details]
acpidump
Created attachment 141551 [details]
lspci
Created attachment 141561 [details]
dmesg with pci=biosirq
No apparent change with this parameter.
Created attachment 141831 [details]
photo trace with acpi=noirq
With acpi=noirq I get something akin to a panic. This is still the MacBookPro 8,2 model.
On the MacBookPro 8,2, the message doesn't occur (or doesn't occur as often) if I use b43-fwcutter to copy firmware to /lib/firmware. The message still occurs on a 9,2 model even with firmware in /lib/firmware.
Created attachment 141841 [details]
dmesg mbp92
Created attachment 141851 [details]
journal mbp92
Created attachment 141861 [details]
acpidump mbp92
Created attachment 141871 [details]
dmidecode mbp92
Same problem with MacBookPro 10,1 running 3.17-rc5.

Same problem with MacBookPro 10,1 running 3.17.0-300.fc21.x86_64.
[ 9.704961] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 9.704967] CPU: 0 PID: 0 Comm: swapper/0 Tainted: P OE 3.17.0-300.fc21.x86_64 #1
[ 9.704969] Hardware name: Apple Inc. MacBookPro10,1/Mac-C3EC7CD22292981F, BIOS MBP101.88Z.00EE.B03.1212211437 12/21/2012
[ 9.704970] 0000000000000000 c7c304c7564d0cfc ffff88046f203e48 ffffffff8173c311
[ 9.704973] ffff8804575ab800 ffff88046f203e70 ffffffff810edd82 ffff8804575ab800
[ 9.704975] 0000000000000000 0000000000000011 ffff88046f203ea8 ffffffff810ee127
[ 9.704978] Call Trace:
[ 9.704980] <IRQ> [<ffffffff8173c311>] dump_stack+0x45/0x56
[ 9.704989] [<ffffffff810edd82>] __report_bad_irq+0x32/0xd0
[ 9.704991] [<ffffffff810ee127>] note_interrupt+0x247/0x290
[ 9.704996] [<ffffffff810eb653>] handle_irq_event_percpu+0x133/0x1a0
[ 9.704999] [<ffffffff810eb6f7>] handle_irq_event+0x37/0x60
[ 9.705001] [<ffffffff810ee6b8>] handle_fasteoi_irq+0x78/0x150
[ 9.705005] [<ffffffff810163a4>] handle_irq+0x84/0x150
[ 9.705009] [<ffffffff8109a6c2>] ? _local_bh_enable+0x22/0x50
[ 9.705012] [<ffffffff8174616d>] do_IRQ+0x4d/0xe0
[ 9.705016] [<ffffffff81743f6d>] common_interrupt+0x6d/0x6d
[ 9.705017] <EOI> [<ffffffff815d21f6>] ? cpuidle_enter_state+0x66/0x160
[ 9.705022] [<ffffffff815d21e1>] ? cpuidle_enter_state+0x51/0x160
[ 9.705024] [<ffffffff815d23d7>] cpuidle_enter+0x17/0x20
[ 9.705027] [<ffffffff810d7377>] cpu_startup_entry+0x397/0x3d0
[ 9.705030] [<ffffffff81731cb7>] rest_init+0x77/0x80
[ 9.705034] [<ffffffff81d4bffc>] start_kernel+0x486/0x4a7
[ 9.705037] [<ffffffff81d4b120>] ? early_idt_handlers+0x120/0x120
[ 9.705039] [<ffffffff81d4b4d7>] x86_64_start_reservations+0x2a/0x2c
[ 9.705042] [<ffffffff81d4b62b>] x86_64_start_kernel+0x152/0x175
[ 9.705043] handlers:
[ 9.705053] [<ffffffffa0266830>] sdhci_irq [sdhci] threaded [<ffffffffa0262fa0>] sdhci_thread_irq [sdhci]
[ 9.705058] [<ffffffffa02d3890>] azx_interrupt [snd_hda_controller]
[ 9.705060] Disabling IRQ #17
$ cat /proc/interrupts |grep '^\s*17'
 17:     189873          0          0          0          0          0          0          0   IO-APIC  17-fasteoi   mmc0, snd_hda_intel, wlp4s0

I am also experiencing the same problem with a MacBook Pro Early 2011 model. Please take a look at bug https://bugzilla.redhat.com/show_bug.cgi?id=1149632
Thanks.

Looks like irq 17 is attached to the mmc card controller.
# lspci -vnn
02:00.1 SD Host controller [0805]: Broadcom Corporation BCM57765/57785 SDXC/MMC Card Reader [14e4:16bc] (rev 10) (prog-if 01)
        Subsystem: Broadcom Corporation Device [14e4:0000]
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at a0420000 (64-bit, prefetchable) [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci_pci
[srai@localhost ~]$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         47          0          0          0   IO-APIC-edge      timer
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:     124126          0          0          0   IO-APIC-fasteoi   acpi
 17:     100001          0          0          0   IO-APIC  17-fasteoi   mmc0
 19:          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb4
 21:          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb3
 22:    1551032          0          0          0   IO-APIC  22-fasteoi   ehci_hcd:usb2
 23:     138892          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb1
 24:          0          0          0          0   PCI-MSI-edge      PCIe PME
 25:          0          0          0          0   PCI-MSI-edge      PCIe PME
 26:          0          0          0          0   PCI-MSI-edge      PCIe PME
 27:          0          0          0          0   PCI-MSI-edge      PCIe PME
 28:          0          0          0          0   PCI-MSI-edge      PCIe PME
 29:     115730          0          0          0   PCI-MSI-edge      ahci
 30:          4          0          0          0   PCI-MSI-edge      firewire_ohci
 31:     257697          0          0          0   PCI-MSI-edge      i915
 32:          8          0          0          0   PCI-MSI-edge      mei_me
 33:        393          0          0          0   PCI-MSI-edge      snd_hda_intel
 34:      35259          0          0          0   PCI-MSI-edge      enp2s0f0-tx-0
 35:      40231          0          0          0   PCI-MSI-edge      enp2s0f0-rx-1
 36:      37855          0          0          0   PCI-MSI-edge      enp2s0f0-rx-2
 37:      21092          0          0          0   PCI-MSI-edge      enp2s0f0-rx-3
 38:      22128          0          0          0   PCI-MSI-edge      enp2s0f0-rx-4
NMI:          0          0          0          0   Non-maskable interrupts
LOC:    6055379    4193295    4936419    3467806   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          2          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:      47853      37958      31064      29007   Rescheduling interrupts
CAL:       4140       6477       3234       5624   Function call interrupts
TLB:       9585       8667      12741      13045   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:         85         84         84         84   Machine check polls
THR:          0          0          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0
Truncated backtrace:
irq 17: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.1-302.fc21.x86_64 #1
Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS MBP81.88Z.0047.B27.1201241646 01/24/12
0000000000000000 d0bdbd5a6a6ab5ee ffff8802efa03e48 ffffffff8173dbb1
ffff8802dcbed000 ffff8802efa03e70 ffffffff810edd82 ffff8802dcbed000
0000000000000000 0000000000000011 ffff8802efa03ea8 ffffffff810ee127
Call Trace:
<IRQ> [<ffffffff8173dbb1>] dump_stack+0x45/0x56
[<ffffffff810edd82>] __report_bad_irq+0x32/0xd0
[<ffffffff810ee127>] note_interrupt+0x247/0x290
[<ffffffff810eb653>] handle_irq_event_percpu+0x133/0x1a0
[<ffffffff810eb6f7>] handle_irq_event+0x37/0x60
[<ffffffff810ee6b8>] handle_fasteoi_irq+0x78/0x150
[<ffffffff810163a4>] handle_irq+0x84/0x150
[<ffffffff810b610a>] ? atomic_notifier_call_chain+0x1a/0x20
[<ffffffff81747a2d>] do_IRQ+0x4d/0xe0
[<ffffffff8174582d>] common_interrupt+0x6d/0x6d
<EOI> [<ffffffff815d24d3>] ? cpuidle_enter_state+0x63/0x160
[<ffffffff815d24c1>] ? cpuidle_enter_state+0x51/0x160
[<ffffffff815d26b7>] cpuidle_enter+0x17/0x20
[<ffffffff810d7377>] cpu_startup_entry+0x397/0x3d0
[<ffffffff81733557>] rest_init+0x77/0x80
[<ffffffff81d4bffc>] start_kernel+0x486/0x4a7
[<ffffffff81d4b120>] ? early_idt_handlers+0x120/0x120
[<ffffffff81d4b4d7>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff81d4b62b>] x86_64_start_kernel+0x152/0x175
Please also take a look at duplicates:
https://bugzilla.redhat.com/show_bug.cgi?id=1149632
https://bugzilla.redhat.com/show_bug.cgi?id=1009819

Same problem here on my MacBook Pro 8,1. Currently with Fedora 21, kernel 3.17.7-300, but it has existed for at least a couple of years with previous Fedora versions. I have the b43 module blacklisted because I use an external USB wifi adapter instead (Atheros AR7010+AR9280), so maybe this can rule out the wifi being related? Here's part of the log from my last boot:
Dec 26 14:10:48 fedoramac kernel: sdhci-pci 0000:02:00.1: SDHCI controller found [14e4:16bc] (rev 10)
Dec 26 14:10:48 fedoramac kernel: sdhci-pci 0000:02:00.1: No vmmc regulator found
Dec 26 14:10:48 fedoramac kernel: sdhci-pci 0000:02:00.1: No vqmmc regulator found
Dec 26 14:10:48 fedoramac kernel: mmc0: SDHCI controller on PCI [0000:02:00.1] using ADMA
Dec 26 14:10:48 fedoramac kernel: Bluetooth: Core ver 2.19
Dec 26 14:10:48 fedoramac kernel: NET: Registered protocol family 31
Dec 26 14:10:48 fedoramac kernel: Bluetooth: HCI device and connection manager initialized
Dec 26 14:10:48 fedoramac kernel: Bluetooth: HCI socket layer initialized
Dec 26 14:10:48 fedoramac kernel: Bluetooth: L2CAP socket layer initialized
Dec 26 14:10:48 fedoramac kernel: Bluetooth: SCO socket layer initialized
Dec 26 14:10:48 fedoramac kernel: usbcore: registered new interface driver btusb
Dec 26 14:10:48 fedoramac kernel: usb 1-1.1.1: USB disconnect, device number 7
Dec 26 14:10:48 fedoramac systemd-udevd[9055]: error opening USB device 'descriptors' file
Dec 26 14:10:48 fedoramac kernel: usb 1-1.1.2: USB disconnect, device number 8
Dec 26 14:10:48 fedoramac kernel: irq 17: nobody cared (try booting with the "irqpoll" option)
Dec 26 14:10:48 fedoramac kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.17.7-300.fc21.x86_64 #1
Dec 26 14:10:48 fedoramac kernel: Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS MBP81.88Z.0047.B27.1201241646 01/24/12
Dec 26 14:10:48 fedoramac kernel: 0000000000000000 0f99e105c1f28136 ffff88046fa03e48 ffffffff817401ea
Dec 26 14:10:48 fedoramac kernel: ffff8804560f8700 ffff88046fa03e70 ffffffff810ee0d2 ffff8804560f8700
Dec 26 14:10:48 fedoramac kernel: 0000000000000000 0000000000000011 ffff88046fa03ea8 ffffffff810ee477
Dec 26 14:10:48 fedoramac kernel: Call Trace:
Dec 26 14:10:48 fedoramac kernel: <IRQ> [<ffffffff817401ea>] dump_stack+0x45/0x56
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810ee0d2>] __report_bad_irq+0x32/0xd0
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810ee477>] note_interrupt+0x247/0x290
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810eb9a3>] handle_irq_event_percpu+0x133/0x1a0
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810eba47>] handle_irq_event+0x37/0x60
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810eea08>] handle_fasteoi_irq+0x78/0x150
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810164e4>] handle_irq+0x84/0x150
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810b633a>] ? atomic_notifier_call_chain+0x1a/0x20
Dec 26 14:10:48 fedoramac kernel: [<ffffffff8174a06d>] do_IRQ+0x4d/0xe0
Dec 26 14:10:48 fedoramac kernel: [<ffffffff81747ead>] common_interrupt+0x6d/0x6d
Dec 26 14:10:48 fedoramac kernel: <EOI> [<ffffffff815d4553>] ? cpuidle_enter_state+0x63/0x160
Dec 26 14:10:48 fedoramac kernel: [<ffffffff815d4541>] ? cpuidle_enter_state+0x51/0x160
Dec 26 14:10:48 fedoramac kernel: [<ffffffff815d4737>] cpuidle_enter+0x17/0x20
Dec 26 14:10:48 fedoramac kernel: [<ffffffff810d75c7>] cpu_startup_entry+0x397/0x3d0
Dec 26 14:10:48 fedoramac kernel: [<ffffffff81735aa7>] rest_init+0x77/0x80
Dec 26 14:10:48 fedoramac kernel: [<ffffffff81d49004>] start_kernel+0x48e/0x4af
Dec 26 14:10:48 fedoramac kernel: [<ffffffff81d48120>] ? early_idt_handlers+0x120/0x120
Dec 26 14:10:48 fedoramac kernel: [<ffffffff81d484d7>] x86_64_start_reservations+0x2a/0x2c
Dec 26 14:10:48 fedoramac kernel: [<ffffffff81d4862b>] x86_64_start_kernel+0x152/0x175
Dec 26 14:10:48 fedoramac kernel: handlers:
Dec 26 14:10:48 fedoramac kernel: [<ffffffffa0589830>] sdhci_irq [sdhci] threaded [<ffffffffa0585fa0>] sdhci_thread_irq [sdhci]
Dec 26 14:10:48 fedoramac kernel: Disabling IRQ #17

This bug does not only happen in the MMC/SD card stack. I am running into this bug on an Asus P7H55-M/USB3 board with just 2 USB devices (mouse and keyboard) and no card reader installed. Still, libreport tells me it is the same bug as https://bugzilla.redhat.com/show_bug.cgi?id=1149632
In my case it is IRQ 16 (not IRQ 17) that's affected. On IRQ 16 I have 2 devices according to `$ lspci -v`:
00:1a.0 USB controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 06) (prog-if 20 [EHCI])
        Subsystem: ASUSTeK Computer Inc. Device 8383
        Flags: bus master, medium devsel, latency 0, IRQ 16
        Memory at f7dfa000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: <access denied>
        Kernel driver in use: ehci-pci
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 06) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
        I/O behind bridge: 00002000-00002fff
        Memory behind bridge: f0400000-f05fffff
        Prefetchable memory behind bridge: 00000000f0600000-00000000f07fffff
        Capabilities: <access denied>
        Kernel driver in use: pcieport
        Kernel modules: shpchp
Both devices are part of the Intel H55 chipset.

Always happens with these kernels as well, regardless of whether the b43 firmware is in /lib/firmware.
3.19.2-200.fc21.x86_64
4.0.0-0.rc4.git2.1.fc22.x86_64

It looks like this bug has been around for years.
There is a report from 2011 from a user of linux kernel 3.1.1: "Unless the bcma and b43 modules are compiled and loaded, IRQ17 is shut early in the process." https://dentifrice.poivron.org/laptops/macbookpro8,2/
I have confirmed the bug is still present in linux 3.16.0-4 and latest linux 4.0.0-rc5+ 7fc377e on a 2012 Ivy Bridge Macbook Pro 13. mmc0 and b43 share interrupt 17. The bug happens when the sdhci module is loaded before b43. If b43 is builtin or loaded in initramfs (/etc/initramfs-tools/modules, update initramfs) then this bug will not occur.
02:00.1 SD Host controller: Broadcom Corporation BCM57765/57785 SDXC/MMC Card Reader (rev 21)
03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02)

@Christian Stadelmann - Your bug looks like a different one. This bug report is for the boot error "irq 17: nobody cared" from Broadcom BCM57765 card reader (as used in Macbook Pro) as described at https://lkml.org/lkml/2013/7/12/99

@Chris Bainbridge - I don't even have a broadcom card reader in this computer. It might be unrelated though.

There is also bug #73241 - SDHCI PCI driver incompatible with 14e4:16bc / Broadcom BCM57765/57785 SDXC/MMC Card Reader

This "irq 17: nobody cared" boot crash was repeatable together with a variety of other minor failures booting MacBookPro8,2/Mac 3.8Gb, Intel Core i7-2860QM @2.50GHz x 8. Interestingly, the co-installed RESCUE system 4.0.4-301.fc22.x86_64 #1 SMP came up and ran just fine. Not sure what the differences between the systems are.

Created attachment 210981 [details]
[PATCH] PCI: Add Broadcom 4331 reset quirk to prevent IRQ storm
Here's a patch to fix this issue, please test if it works for you. It does for me.
This bugzilla entry is misclassified as an MMC/SD bug because the kernel error message makes it appear that sdhci_irq is the culprit. That's a red herring: the issue is caused by an IRQ storm originating from the wireless card.
What seems to be happening is that Apple's EFI bootloader enables the wireless card to use it for Internet Recovery and it leaves the card enabled when passing control to the OS. The card cries for attention by sending interrupts, interfering with other drivers using that same IRQ line. This does not stop until the wireless driver is loaded. The patch resets the card early on in the boot process to stop the interrupts.
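For anyone comparing /proc/interrupts output across the machines in this report, the sharing of a line between drivers can be listed with a small script. This is only a minimal Python sketch and is not part of any patch here: the function name `shared_irqs` and the `SAMPLE` text are made up for illustration (the sample is modeled on the output pasted in this report, not captured from a specific machine). As far as I understand the kernel's spurious-interrupt check, it prints "nobody cared" and disables the line once nearly all of the last 100,000 interrupts went unhandled, so a count on the shared line that stops growing shortly after 100000 would be consistent with that.

```python
# Illustrative sample only; shaped like the /proc/interrupts output in this report.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:         47          0          0          0   IO-APIC-edge      timer
 17:     189873          0          0          0   IO-APIC  17-fasteoi   mmc0, snd_hda_intel, wlp4s0
 29:     115730          0          0          0   PCI-MSI-edge      ahci
ERR:          0
"""


def shared_irqs(text):
    """Return {irq: (total_count, [handlers])} for IRQ lines with 2+ handlers.

    Heuristic: handlers are the comma-separated names at the end of a row,
    so a single handler whose name contains a space is not parsed exactly.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        if not irq.strip().isdigit():
            continue  # skip summary rows such as NMI:, LOC:, ERR:
        fields = rest.split()
        total = sum(int(c) for c in fields[:ncpu])
        tail = " ".join(fields[ncpu:])
        if not tail:
            continue  # numbered row without chip/handler columns
        parts = [p.strip() for p in tail.split(",")]
        # The first comma-separated part still carries the chip and trigger
        # columns; the handler name is its last whitespace-separated token.
        handlers = [parts[0].split()[-1]] + parts[1:]
        if len(handlers) > 1:
            result[int(irq)] = (total, handlers)
    return result


if __name__ == "__main__":
    for irq, (total, handlers) in shared_irqs(SAMPLE).items():
        print(f"IRQ {irq}: {total} interrupts, shared by {', '.join(handlers)}")
```

On a live system the same function could be fed the contents of /proc/interrupts instead of `SAMPLE`. Note that a device with no driver loaded (such as the wireless card before b43 or wl is loaded) does not appear in the handler list at all, which is why the report names only sdhci_irq.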
(In reply to Lukas Wunner from comment #24)
> Created attachment 210981 [details]
> [PATCH] PCI: Add Broadcom 4331 reset quirk to prevent IRQ storm
>
> Here's a patch to fix this issue, please test if it works for you. It does
> for me.

I tried your patch on Debian with the kernel 4.4.10 sources. It did not help for me.

The "irq 17: nobody cared" messages continue in dmesg and the IRQ storm is continuing.

Besides the messages in dmesg, my Wi-Fi is unstable as before: random disconnects occur every hour. I use the latest proprietary Broadcom driver, 6.30.223.271 (using the b43 driver does not change anything). Also, when the connection is lost for a minute or a little less, dmesg reports this: "ERROR @wl_notify_scan_status: eth1 Scan_results error (-22)" (my wlan interface is named eth1 :))

grep . -r /sys/firmware/acpi/interrupts/
/sys/firmware/acpi/interrupts/sci: 3730
/sys/firmware/acpi/interrupts/error: 0
/sys/firmware/acpi/interrupts/gpe00: 0 invalid
/sys/firmware/acpi/interrupts/gpe01: 0 invalid
/sys/firmware/acpi/interrupts/gpe02: 0 invalid
/sys/firmware/acpi/interrupts/gpe03: 0 invalid
/sys/firmware/acpi/interrupts/gpe04: 0 invalid
/sys/firmware/acpi/interrupts/gpe05: 0 invalid
/sys/firmware/acpi/interrupts/gpe06: 0 invalid
/sys/firmware/acpi/interrupts/gpe07: 0 enabled
/sys/firmware/acpi/interrupts/gpe08: 0 invalid
/sys/firmware/acpi/interrupts/gpe09: 0 disabled
/sys/firmware/acpi/interrupts/gpe10: 0 invalid
/sys/firmware/acpi/interrupts/gpe11: 0 enabled
/sys/firmware/acpi/interrupts/gpe12: 0 invalid
/sys/firmware/acpi/interrupts/gpe13: 0 enabled
/sys/firmware/acpi/interrupts/gpe14: 0 invalid
/sys/firmware/acpi/interrupts/gpe15: 0 enabled
/sys/firmware/acpi/interrupts/gpe16: 0 disabled
/sys/firmware/acpi/interrupts/gpe0A: 0 invalid
/sys/firmware/acpi/interrupts/gpe17: 3730 enabled
/sys/firmware/acpi/interrupts/gpe0B: 0 invalid
/sys/firmware/acpi/interrupts/gpe18: 0 invalid
/sys/firmware/acpi/interrupts/gpe0C: 0 invalid
/sys/firmware/acpi/interrupts/gpe19: 0 disabled
/sys/firmware/acpi/interrupts/gpe0D: 0 disabled
/sys/firmware/acpi/interrupts/gpe0E: 0 invalid
/sys/firmware/acpi/interrupts/gpe20: 0 invalid
/sys/firmware/acpi/interrupts/gpe0F: 0 invalid
/sys/firmware/acpi/interrupts/gpe21: 0 invalid
/sys/firmware/acpi/interrupts/gpe22: 0 invalid
/sys/firmware/acpi/interrupts/gpe23: 0 enabled
/sys/firmware/acpi/interrupts/gpe24: 0 invalid
/sys/firmware/acpi/interrupts/gpe25: 0 invalid
/sys/firmware/acpi/interrupts/gpe26: 0 invalid
/sys/firmware/acpi/interrupts/gpe1A: 0 invalid
/sys/firmware/acpi/interrupts/gpe27: 0 invalid
/sys/firmware/acpi/interrupts/gpe1B: 0 invalid
/sys/firmware/acpi/interrupts/gpe28: 0 invalid
/sys/firmware/acpi/interrupts/gpe1C: 0 invalid
/sys/firmware/acpi/interrupts/gpe29: 0 invalid
/sys/firmware/acpi/interrupts/gpe1D: 0 invalid
/sys/firmware/acpi/interrupts/gpe1E: 0 invalid
/sys/firmware/acpi/interrupts/gpe30: 0 invalid
/sys/firmware/acpi/interrupts/gpe1F: 0 invalid
/sys/firmware/acpi/interrupts/gpe31: 0 invalid
/sys/firmware/acpi/interrupts/gpe32: 0 invalid
/sys/firmware/acpi/interrupts/gpe33: 0 invalid
/sys/firmware/acpi/interrupts/gpe34: 0 invalid
/sys/firmware/acpi/interrupts/gpe35: 0 invalid
/sys/firmware/acpi/interrupts/gpe36: 0 invalid
/sys/firmware/acpi/interrupts/gpe2A: 0 invalid
/sys/firmware/acpi/interrupts/gpe37: 0 invalid
/sys/firmware/acpi/interrupts/gpe2B: 0 invalid
/sys/firmware/acpi/interrupts/gpe38: 0 invalid
/sys/firmware/acpi/interrupts/gpe2C: 0 invalid
/sys/firmware/acpi/interrupts/gpe39: 0 invalid
/sys/firmware/acpi/interrupts/gpe2D: 0 invalid
/sys/firmware/acpi/interrupts/gpe2E: 0 invalid
/sys/firmware/acpi/interrupts/gpe2F: 0 invalid
/sys/firmware/acpi/interrupts/gpe3A: 0 invalid
/sys/firmware/acpi/interrupts/gpe3B: 0 invalid
/sys/firmware/acpi/interrupts/gpe3C: 0 invalid
/sys/firmware/acpi/interrupts/gpe3D: 0 invalid
/sys/firmware/acpi/interrupts/gpe3E: 0 invalid
/sys/firmware/acpi/interrupts/gpe3F: 0 invalid
/sys/firmware/acpi/interrupts/sci_not: 5
/sys/firmware/acpi/interrupts/ff_pmtimer: 0 invalid
/sys/firmware/acpi/interrupts/ff_rt_clk: 0 disabled
/sys/firmware/acpi/interrupts/gpe_all: 3730
/sys/firmware/acpi/interrupts/ff_gbl_lock: 0 enabled
/sys/firmware/acpi/interrupts/ff_pwr_btn: 0 enabled
/sys/firmware/acpi/interrupts/ff_slp_btn: 0 invalid

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         17          0          0          0   IO-APIC   2-edge      timer
  8:          1          0          0          0   IO-APIC   8-edge      rtc0
  9:       4399          0          0          0   IO-APIC   9-fasteoi   acpi
 17:     105479          0          0          0   IO-APIC  17-fasteoi   mmc0, eth1
 19:          0          0          0          0   IO-APIC  19-fasteoi   uhci_hcd:usb4
 21:          0          0          0          0   IO-APIC  21-fasteoi   uhci_hcd:usb3
 22:         63          0          0          0   IO-APIC  22-fasteoi   ehci_hcd:usb2
 23:      15260          0          0          0   IO-APIC  23-fasteoi   ehci_hcd:usb1
 28:          0          0          0          0   PCI-MSI 3194880-edge      pciehp
 29:          0          0          0          0   PCI-MSI 3211264-edge      pciehp
 30:          0          0          0          0   PCI-MSI 3227648-edge      pciehp
 31:          0          0          0          0   PCI-MSI 3244032-edge      pciehp
 32:          3          0          0          0   PCI-MSI 2097152-edge      firewire_ohci
 33:       9309          0          0          0   PCI-MSI 512000-edge      0000:00:1f.2
 34:        450          0          0          0   PCI-MSI 442368-edge      snd_hda_intel
 35:       6600          0          0          0   PCI-MSI 32768-edge      i915
 36:         71          0          0          0   PCI-MSI 1048576-edge      eth0-tx-0
 37:         38          0          0          0   PCI-MSI 1048577-edge      eth0-rx-1
 38:         34          0          0          0   PCI-MSI 1048578-edge      eth0-rx-2
 39:          1          0          0          0   PCI-MSI 1048579-edge      eth0-rx-3
 40:         15          0          0          0   PCI-MSI 1048580-edge      eth0-rx-4
NMI:          1          0          0          0   Non-maskable interrupts
LOC:      34113      14394      11025      10579   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          1          0          0          0   Performance monitoring interrupts
IWI:          4          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:        892        940        961        374   Rescheduling interrupts
CAL:       1045       1052       1099       1086   Function call interrupts
TLB:        297        380        184        200   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          3          3          3          3   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0   Posted-interrupt wakeup event

[ 2.570457] irq 17: nobody cared (try booting with the "irqpoll" option)
[ 2.570480] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.10 #2
[ 2.570481] Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, BIOS MBP81.88Z.0047.B2A.1506082203 06/08/15
[ 2.570482] 0000000000000000 ffffffff812cac6f ffff8800892e5200 ffff8800892e52d4
[ 2.570484] ffffffff810c1a70 ffff8800892e5200 0000000000000000 0000000000000000
[ 2.570486] ffffffff810c1ded 0000000000000000 0000000000000011 0000000000000000
[ 2.570487] Call Trace:
[ 2.570489] <IRQ> [<ffffffff812cac6f>] ? dump_stack+0x5c/0x7d
[ 2.570497] [<ffffffff810c1a70>] ? __report_bad_irq+0x30/0xc0
[ 2.570499] [<ffffffff810c1ded>] ? note_interrupt+0x22d/0x270
[ 2.570501] [<ffffffff810bf43d>] ? handle_irq_event_percpu+0x15d/0x1c0
[ 2.570503] [<ffffffff810bf4ca>] ? handle_irq_event+0x2a/0x50
[ 2.570504] [<ffffffff810c22fb>] ? handle_fasteoi_irq+0x8b/0x150
[ 2.570506] [<ffffffff81017e2c>] ? handle_irq+0x1c/0x30
[ 2.570508] [<ffffffff815514a6>] ? do_IRQ+0x46/0xd0
[ 2.570510] [<ffffffff8154f5c2>] ? common_interrupt+0x82/0x82
[ 2.570511] <EOI> [<ffffffff814265b7>] ? poll_idle+0x57/0xa0
[ 2.570516] [<ffffffff814260c8>] ? cpuidle_enter_state+0xc8/0x260
[ 2.570518] [<ffffffff810ac4ee>] ? cpu_startup_entry+0x2ae/0x370
[ 2.570519] [<ffffffff81931f42>] ? start_kernel+0x472/0x47a
[ 2.570521] [<ffffffff81931120>] ? early_idt_handler_array+0x120/0x120
[ 2.570522] [<ffffffff81931600>] ? x86_64_start_kernel+0x145/0x154
[ 2.570523] handlers:
[ 2.570533] [<ffffffffa0070640>] sdhci_irq [sdhci] threaded [<ffffffffa006ec40>] sdhci_thread_irq [sdhci]
[ 2.570565] Disabling IRQ #17

Created attachment 216231 [details]
[RFC] x86/efi: Reset network interfaces on Apple Macs
Created attachment 216241 [details]
[RFC] x86: Add early quirk to reset Apple AirPort card
(In reply to hr from comment #25) > Trying your patch on the Debian and kernel 4.4.10 sources It did not help > for me. > > Messages continue in dmesg "irq 17: nobody cared" and irq storm is > continuing. Yes I've heard that the patch doesn't work for everybody. I don't know why, perhaps DisINTx is not set on the wireless card on some machines upon boot. To verify that's indeed the cause, boot with "modprobe.blacklist=b43 modprobe.blacklist=bcma modprobe.blacklist=wl", then check if "lspci -vv" reports "DisINTx-" for the wireless card. I've attached two alternative patches. They cannot be submitted as is, they need more polish, but I can't decide which one is better. Please test both and let me know if either or both work for you. They're based on 4.4 but can be applied to 4.6 as well with some fuzz. b43_eboot_4.4.patch resets all network interfaces before control is handed over from EFI to the kernel. This only works if you boot with the EFI stub. If you use gummiboot, the EFI stub is used by default. If you use grub I think you have to load the kernel with the "chainloader" directive, not the "linux" directive. You should briefly see a message "Welcome to Macintosh" on boot, plus the line "Resetting network interface" for every network card built into the machine. b43_earlyquirk_4.4.patch resets the BCM 4331 card during kernel initialization. In dmesg there should be a message "Resetting Apple AirPort card". If MMIO was not already enabled, you'll see an additional message "Enabling mmio on Apple AirPort card". Let me know if you see that additional message as I'm not sure if I should drop this, maybe MMIO is always enabled and this is not needed. I'm not sure if this is a bug in the EFI driver for the BCM 4331 or if this is actually a feature. Perhaps OS X supports some kind of connection handover from EFI. If this is true, it's not sufficient to just quirk the BCM 4331, we'd need this for all other cards used by Apple. E.g. models introduced 2013+ use BCM 4360. 
The eboot patch works for all cards whereas the earlyquirk patch re > > Besides messages in dmesg, as before i get to become unstable wi-fi, every > hour randomly disconnects occur. I use last a proprietary broadcom driver > 6.30.223.271 (Use b43 driver does not change anything) Also lost connection > for a minute or a little less dmesg reports this "ERROR > @wl_notify_scan_status: eth1 Scan_results error (-22)" (my wlan name eth1 :)) > > grep . -r /sys/firmware/acpi/interrupts/ > /sys/firmware/acpi/interrupts/sci: 3730 > /sys/firmware/acpi/interrupts/error: 0 > /sys/firmware/acpi/interrupts/gpe00: 0 invalid > /sys/firmware/acpi/interrupts/gpe01: 0 invalid > /sys/firmware/acpi/interrupts/gpe02: 0 invalid > /sys/firmware/acpi/interrupts/gpe03: 0 invalid > /sys/firmware/acpi/interrupts/gpe04: 0 invalid > /sys/firmware/acpi/interrupts/gpe05: 0 invalid > /sys/firmware/acpi/interrupts/gpe06: 0 invalid > /sys/firmware/acpi/interrupts/gpe07: 0 enabled > /sys/firmware/acpi/interrupts/gpe08: 0 invalid > /sys/firmware/acpi/interrupts/gpe09: 0 disabled > /sys/firmware/acpi/interrupts/gpe10: 0 invalid > /sys/firmware/acpi/interrupts/gpe11: 0 enabled > /sys/firmware/acpi/interrupts/gpe12: 0 invalid > /sys/firmware/acpi/interrupts/gpe13: 0 enabled > /sys/firmware/acpi/interrupts/gpe14: 0 invalid > /sys/firmware/acpi/interrupts/gpe15: 0 enabled > /sys/firmware/acpi/interrupts/gpe16: 0 disabled > /sys/firmware/acpi/interrupts/gpe0A: 0 invalid > /sys/firmware/acpi/interrupts/gpe17: 3730 enabled > /sys/firmware/acpi/interrupts/gpe0B: 0 invalid > /sys/firmware/acpi/interrupts/gpe18: 0 invalid > /sys/firmware/acpi/interrupts/gpe0C: 0 invalid > /sys/firmware/acpi/interrupts/gpe19: 0 disabled > /sys/firmware/acpi/interrupts/gpe0D: 0 disabled > /sys/firmware/acpi/interrupts/gpe0E: 0 invalid > /sys/firmware/acpi/interrupts/gpe20: 0 invalid > /sys/firmware/acpi/interrupts/gpe0F: 0 invalid > /sys/firmware/acpi/interrupts/gpe21: 0 invalid > /sys/firmware/acpi/interrupts/gpe22: 0 invalid > 
/sys/firmware/acpi/interrupts/gpe23: 0 enabled > /sys/firmware/acpi/interrupts/gpe24: 0 invalid > /sys/firmware/acpi/interrupts/gpe25: 0 invalid > /sys/firmware/acpi/interrupts/gpe26: 0 invalid > /sys/firmware/acpi/interrupts/gpe1A: 0 invalid > /sys/firmware/acpi/interrupts/gpe27: 0 invalid > /sys/firmware/acpi/interrupts/gpe1B: 0 invalid > /sys/firmware/acpi/interrupts/gpe28: 0 invalid > /sys/firmware/acpi/interrupts/gpe1C: 0 invalid > /sys/firmware/acpi/interrupts/gpe29: 0 invalid > /sys/firmware/acpi/interrupts/gpe1D: 0 invalid > /sys/firmware/acpi/interrupts/gpe1E: 0 invalid > /sys/firmware/acpi/interrupts/gpe30: 0 invalid > /sys/firmware/acpi/interrupts/gpe1F: 0 invalid > /sys/firmware/acpi/interrupts/gpe31: 0 invalid > /sys/firmware/acpi/interrupts/gpe32: 0 invalid > /sys/firmware/acpi/interrupts/gpe33: 0 invalid > /sys/firmware/acpi/interrupts/gpe34: 0 invalid > /sys/firmware/acpi/interrupts/gpe35: 0 invalid > /sys/firmware/acpi/interrupts/gpe36: 0 invalid > /sys/firmware/acpi/interrupts/gpe2A: 0 invalid > /sys/firmware/acpi/interrupts/gpe37: 0 invalid > /sys/firmware/acpi/interrupts/gpe2B: 0 invalid > /sys/firmware/acpi/interrupts/gpe38: 0 invalid > /sys/firmware/acpi/interrupts/gpe2C: 0 invalid > /sys/firmware/acpi/interrupts/gpe39: 0 invalid > /sys/firmware/acpi/interrupts/gpe2D: 0 invalid > /sys/firmware/acpi/interrupts/gpe2E: 0 invalid > /sys/firmware/acpi/interrupts/gpe2F: 0 invalid > /sys/firmware/acpi/interrupts/gpe3A: 0 invalid > /sys/firmware/acpi/interrupts/gpe3B: 0 invalid > /sys/firmware/acpi/interrupts/gpe3C: 0 invalid > /sys/firmware/acpi/interrupts/gpe3D: 0 invalid > /sys/firmware/acpi/interrupts/gpe3E: 0 invalid > /sys/firmware/acpi/interrupts/gpe3F: 0 invalid > /sys/firmware/acpi/interrupts/sci_not: 5 > /sys/firmware/acpi/interrupts/ff_pmtimer: 0 invalid > /sys/firmware/acpi/interrupts/ff_rt_clk: 0 disabled > /sys/firmware/acpi/interrupts/gpe_all: 3730 > /sys/firmware/acpi/interrupts/ff_gbl_lock: 0 enabled > 
/sys/firmware/acpi/interrupts/ff_pwr_btn: 0 enabled > /sys/firmware/acpi/interrupts/ff_slp_btn: 0 invalid > > > cat /proc/interrupts > CPU0 CPU1 CPU2 CPU3 > 0: 17 0 0 0 IO-APIC 2-edge > timer > 8: 1 0 0 0 IO-APIC 8-edge rtc0 > 9: 4399 0 0 0 IO-APIC 9-fasteoi acpi > 17: 105479 0 0 0 IO-APIC 17-fasteoi > mmc0, eth1 > 19: 0 0 0 0 IO-APIC 19-fasteoi > uhci_hcd:usb4 > 21: 0 0 0 0 IO-APIC 21-fasteoi > uhci_hcd:usb3 > 22: 63 0 0 0 IO-APIC 22-fasteoi > ehci_hcd:usb2 > 23: 15260 0 0 0 IO-APIC 23-fasteoi > ehci_hcd:usb1 > 28: 0 0 0 0 PCI-MSI 3194880-edge > pciehp > 29: 0 0 0 0 PCI-MSI 3211264-edge > pciehp > 30: 0 0 0 0 PCI-MSI 3227648-edge > pciehp > 31: 0 0 0 0 PCI-MSI 3244032-edge > pciehp > 32: 3 0 0 0 PCI-MSI 2097152-edge > firewire_ohci > 33: 9309 0 0 0 PCI-MSI 512000-edge > 0000:00:1f.2 > 34: 450 0 0 0 PCI-MSI 442368-edge > snd_hda_intel > 35: 6600 0 0 0 PCI-MSI 32768-edge > i915 > 36: 71 0 0 0 PCI-MSI 1048576-edge > eth0-tx-0 > 37: 38 0 0 0 PCI-MSI 1048577-edge > eth0-rx-1 > 38: 34 0 0 0 PCI-MSI 1048578-edge > eth0-rx-2 > 39: 1 0 0 0 PCI-MSI 1048579-edge > eth0-rx-3 > 40: 15 0 0 0 PCI-MSI 1048580-edge > eth0-rx-4 > NMI: 1 0 0 0 Non-maskable interrupts > LOC: 34113 14394 11025 10579 Local timer interrupts > SPU: 0 0 0 0 Spurious interrupts > PMI: 1 0 0 0 Performance monitoring > interrupts > IWI: 4 0 0 0 IRQ work interrupts > RTR: 0 0 0 0 APIC ICR read retries > RES: 892 940 961 374 Rescheduling interrupts > CAL: 1045 1052 1099 1086 Function call interrupts > TLB: 297 380 184 200 TLB shootdowns > TRM: 0 0 0 0 Thermal event interrupts > THR: 0 0 0 0 Threshold APIC interrupts > DFR: 0 0 0 0 Deferred Error APIC > interrupts > MCE: 0 0 0 0 Machine check exceptions > MCP: 3 3 3 3 Machine check polls > ERR: 0 > MIS: 0 > PIN: 0 0 0 0 Posted-interrupt > notification event > PIW: 0 0 0 0 Posted-interrupt wakeup > event > > > 2.570457] irq 17: nobody cared (try booting with the "irqpoll" option) > [ 2.570480] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.10 #2 > [ 2.570481] 
Hardware name: Apple Inc. MacBookPro8,1/Mac-94245B3640C91C81, > BIOS MBP81.88Z.0047.B2A.1506082203 06/08/15 > [ 2.570482] 0000000000000000 ffffffff812cac6f ffff8800892e5200 > ffff8800892e52d4 > [ 2.570484] ffffffff810c1a70 ffff8800892e5200 0000000000000000 > 0000000000000000 > [ 2.570486] ffffffff810c1ded 0000000000000000 0000000000000011 > 0000000000000000 > [ 2.570487] Call Trace: > [ 2.570489] <IRQ> [<ffffffff812cac6f>] ? dump_stack+0x5c/0x7d > [ 2.570497] [<ffffffff810c1a70>] ? __report_bad_irq+0x30/0xc0 > [ 2.570499] [<ffffffff810c1ded>] ? note_interrupt+0x22d/0x270 > [ 2.570501] [<ffffffff810bf43d>] ? handle_irq_event_percpu+0x15d/0x1c0 > [ 2.570503] [<ffffffff810bf4ca>] ? handle_irq_event+0x2a/0x50 > [ 2.570504] [<ffffffff810c22fb>] ? handle_fasteoi_irq+0x8b/0x150 > [ 2.570506] [<ffffffff81017e2c>] ? handle_irq+0x1c/0x30 > [ 2.570508] [<ffffffff815514a6>] ? do_IRQ+0x46/0xd0 > [ 2.570510] [<ffffffff8154f5c2>] ? common_interrupt+0x82/0x82 > [ 2.570511] <EOI> [<ffffffff814265b7>] ? poll_idle+0x57/0xa0 > [ 2.570516] [<ffffffff814260c8>] ? cpuidle_enter_state+0xc8/0x260 > [ 2.570518] [<ffffffff810ac4ee>] ? cpu_startup_entry+0x2ae/0x370 > [ 2.570519] [<ffffffff81931f42>] ? start_kernel+0x472/0x47a > [ 2.570521] [<ffffffff81931120>] ? early_idt_handler_array+0x120/0x120 > [ 2.570522] [<ffffffff81931600>] ? x86_64_start_kernel+0x145/0x154 > [ 2.570523] handlers: > [ 2.570533] [<ffffffffa0070640>] sdhci_irq [sdhci] threaded > [<ffffffffa006ec40>] sdhci_thread_irq [sdhci] > [ 2.570565] Disabling IRQ #17 (In reply to hr from comment #25) > Trying your patch on the Debian and kernel 4.4.10 sources It did not help > for me. > > Messages continue in dmesg "irq 17: nobody cared" and irq storm is > continuing. Yes I've heard that the patch doesn't work for everybody. I don't know why, perhaps DisINTx is not set on the wireless card on some machines upon boot. 
To verify that's indeed the cause, boot with "modprobe.blacklist=b43 modprobe.blacklist=bcma modprobe.blacklist=wl", then check if "lspci -vv" reports "DisINTx-" for the wireless card.

I've attached two alternative patches. They cannot be submitted as is, they need more polish, but I can't decide which one is better. Please test both and let me know if either or both work for you. They're based on 4.4 but can be applied to 4.6 as well with some fuzz.

b43_eboot_4.4.patch resets all network interfaces before control is handed over from EFI to the kernel. This only works if you boot with the EFI stub. If you use gummiboot, the EFI stub is used by default. If you use grub I think you have to load the kernel with the "chainloader" directive, not the "linux" directive. You should briefly see a message "Welcome to Macintosh" on boot, plus the line "Resetting network interface" for every network card built into the machine.

b43_earlyquirk_4.4.patch resets the BCM 4331 card during kernel initialization. In dmesg there should be a message "Resetting Apple AirPort card". If MMIO was not already enabled, you'll see an additional message "Enabling mmio on Apple AirPort card". Let me know if you see that additional message as I'm not sure if I should drop this, maybe MMIO is always enabled and this is not needed.

I'm not sure if this is a bug in the EFI driver for the BCM 4331 or if it's actually a feature. Perhaps OS X supports some kind of connection handover from EFI. If that is true, it's not sufficient to just quirk the BCM 4331, we'd need this for all other cards used by Apple. E.g. models introduced 2013+ use BCM 4360. The eboot patch works for all cards whereas the earlyquirk patch requires a list of all cards used by Apple. So that's an advantage of the eboot patch. *If* this is indeed a feature, which I don't know. A disadvantage of the eboot quirk is that it only works if the EFI stub is used.

So I have no idea which patch to continue working on.
If only one of them works for you, that would certainly make the choice easier.

Testing b43_earlyquirk_4.4.patch, dmesg:

[ 0.000000] early_pci_scan 0000:00:00.0 [8086:0104]
[ 0.000000] early_pci_scan 0000:00:01.0 [8086:0101]
[ 0.000000] early_pci_scan 0000:00:01.1 [8086:0105]
[ 0.000000] early_pci_scan 0000:05:00.0 [8086:1513]
[ 0.000000] early_pci_scan 0000:00:02.0 [8086:0126]
[ 0.000000] Reserving Intel graphics stolen memory at 0x8ba00000-0x8f9fffff
[ 0.000000] early_pci_scan 0000:00:16.0 [8086:1c3a]
[ 0.000000] early_pci_scan 0000:00:1a.0 [8086:1c2c]
[ 0.000000] early_pci_scan 0000:00:1a.1 [8086:1c2e]
[ 0.000000] early_pci_scan 0000:00:1b.0 [8086:1c20]
[ 0.000000] early_pci_scan 0000:00:1c.0 [8086:1c10]
[ 0.000000] early_pci_scan 0000:02:00.0 [14e4:16b4]
[ 0.000000] early_pci_scan 0000:02:00.1 [14e4:16bc]
[ 0.000000] early_pci_scan 0000:00:1c.1 [8086:1c12]
[ 0.000000] early_pci_scan 0000:03:00.0 [14e4:4331]
[ 0.000000] Mapping address 0xa0600000 for Apple AirPort card
[ 0.000000] Resetting Apple AirPort card
[ 0.000000] early_pci_scan 0000:00:1c.2 [8086:1c14]
[ 0.000000] early_pci_scan 0000:04:00.0 [11c1:5901]
[ 0.000000] early_pci_scan 0000:00:1d.0 [8086:1c27]
[ 0.000000] early_pci_scan 0000:00:1d.1 [8086:1c28]
[ 0.000000] early_pci_scan 0000:00:1f.0 [8086:1c49]

and again:

[ 2.493855] irq 17: nobody cared (try booting with the "irqpoll" option)
...
[ 2.493957] handlers:
[ 2.493972] [<ffffffffa0129640>] sdhci_irq [sdhci] threaded [<ffffffffa0127c40>] sdhci_thread_irq [sdhci]
[ 2.494023] Disabling IRQ #17

The spurious interrupts on IRQ 17 continue. There is no "Enabling mmio on Apple AirPort card" message.
lspci -vv with blacklisted wi-fi drivers: 03:00.0 Network controller: Broadcom Corporation BCM4331 802.11a/b/g/n (rev 02) Subsystem: Broadcom Corporation BCM4331 802.11a/b/g/n Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+ Latency: 0, Cache Line Size: 256 bytes Interrupt: pin A routed to IRQ 0 Region 0: Memory at a0600000 (64-bit, non-prefetchable) [size=16K] Capabilities: [40] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=2 PME- Capabilities: [58] Vendor Specific Information: Len=78 <?> Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [d0] Express (v1) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr+ TransPend- LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <32us ClockPM+ Surprise- LLActRep+ BwNot- LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt- Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- 
Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 14, GenCap+ CGenEn- ChkCap+ ChkEn- Capabilities: [13c v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending- InProgress- Capabilities: [160 v1] Device Serial Number 00-00-00-ff-ff-00-00-00 Capabilities: [16c v1] Power Budgeting <?> Testing b43_eboot_4.4.patch patch will take longer. because it is i currently having problems with grub2 loading the kernel in EFI stub. To reading manuals will take some time. I decided to spend a little experiment. I am physically disconnected conbined Broadcom BCM94331 module from the laptop motherboard. and I got a strange result, message "irq 17: nobody cared" gone, but this: hr@debian:~$ grep . -r /sys/firmware/acpi/interrupts/ /sys/firmware/acpi/interrupts/sci: 23577 /sys/firmware/acpi/interrupts/error: 0 /sys/firmware/acpi/interrupts/gpe00: 0 invalid /sys/firmware/acpi/interrupts/gpe01: 0 invalid /sys/firmware/acpi/interrupts/gpe02: 0 invalid /sys/firmware/acpi/interrupts/gpe03: 0 invalid /sys/firmware/acpi/interrupts/gpe04: 0 invalid /sys/firmware/acpi/interrupts/gpe05: 0 invalid /sys/firmware/acpi/interrupts/gpe06: 0 invalid /sys/firmware/acpi/interrupts/gpe07: 0 enabled /sys/firmware/acpi/interrupts/gpe08: 0 invalid /sys/firmware/acpi/interrupts/gpe09: 0 disabled /sys/firmware/acpi/interrupts/gpe10: 0 invalid /sys/firmware/acpi/interrupts/gpe11: 0 enabled /sys/firmware/acpi/interrupts/gpe12: 0 invalid /sys/firmware/acpi/interrupts/gpe13: 0 enabled /sys/firmware/acpi/interrupts/gpe14: 0 invalid /sys/firmware/acpi/interrupts/gpe15: 0 enabled /sys/firmware/acpi/interrupts/gpe16: 0 disabled /sys/firmware/acpi/interrupts/gpe0A: 0 invalid /sys/firmware/acpi/interrupts/gpe17: 23577 enabled /sys/firmware/acpi/interrupts/gpe0B: 
0 invalid /sys/firmware/acpi/interrupts/gpe18: 0 invalid /sys/firmware/acpi/interrupts/gpe0C: 0 invalid /sys/firmware/acpi/interrupts/gpe19: 0 disabled /sys/firmware/acpi/interrupts/gpe0D: 0 disabled /sys/firmware/acpi/interrupts/gpe0E: 0 invalid /sys/firmware/acpi/interrupts/gpe20: 0 invalid /sys/firmware/acpi/interrupts/gpe0F: 0 invalid /sys/firmware/acpi/interrupts/gpe21: 0 invalid /sys/firmware/acpi/interrupts/gpe22: 0 invalid /sys/firmware/acpi/interrupts/gpe23: 0 enabled /sys/firmware/acpi/interrupts/gpe24: 0 invalid /sys/firmware/acpi/interrupts/gpe25: 0 invalid /sys/firmware/acpi/interrupts/gpe26: 0 invalid /sys/firmware/acpi/interrupts/gpe1A: 0 invalid /sys/firmware/acpi/interrupts/gpe27: 0 invalid /sys/firmware/acpi/interrupts/gpe1B: 0 invalid /sys/firmware/acpi/interrupts/gpe28: 0 invalid /sys/firmware/acpi/interrupts/gpe1C: 0 invalid /sys/firmware/acpi/interrupts/gpe29: 0 invalid /sys/firmware/acpi/interrupts/gpe1D: 0 invalid /sys/firmware/acpi/interrupts/gpe1E: 0 invalid /sys/firmware/acpi/interrupts/gpe30: 0 invalid /sys/firmware/acpi/interrupts/gpe1F: 0 invalid /sys/firmware/acpi/interrupts/gpe31: 0 invalid /sys/firmware/acpi/interrupts/gpe32: 0 invalid /sys/firmware/acpi/interrupts/gpe33: 0 invalid /sys/firmware/acpi/interrupts/gpe34: 0 invalid /sys/firmware/acpi/interrupts/gpe35: 0 invalid /sys/firmware/acpi/interrupts/gpe36: 0 invalid /sys/firmware/acpi/interrupts/gpe2A: 0 invalid /sys/firmware/acpi/interrupts/gpe37: 0 invalid /sys/firmware/acpi/interrupts/gpe2B: 0 invalid /sys/firmware/acpi/interrupts/gpe38: 0 invalid /sys/firmware/acpi/interrupts/gpe2C: 0 invalid /sys/firmware/acpi/interrupts/gpe39: 0 invalid /sys/firmware/acpi/interrupts/gpe2D: 0 invalid /sys/firmware/acpi/interrupts/gpe2E: 0 invalid /sys/firmware/acpi/interrupts/gpe2F: 0 invalid /sys/firmware/acpi/interrupts/gpe3A: 0 invalid /sys/firmware/acpi/interrupts/gpe3B: 0 invalid /sys/firmware/acpi/interrupts/gpe3C: 0 invalid /sys/firmware/acpi/interrupts/gpe3D: 0 invalid 
/sys/firmware/acpi/interrupts/gpe3E: 0 invalid /sys/firmware/acpi/interrupts/gpe3F: 0 invalid /sys/firmware/acpi/interrupts/sci_not: 21 /sys/firmware/acpi/interrupts/ff_pmtimer: 0 invalid /sys/firmware/acpi/interrupts/ff_rt_clk: 0 disabled /sys/firmware/acpi/interrupts/gpe_all: 23577 /sys/firmware/acpi/interrupts/ff_gbl_lock: 0 enabled /sys/firmware/acpi/interrupts/ff_pwr_btn: 0 enabled /sys/firmware/acpi/interrupts/ff_slp_btn: 0 invalid Who sending interrupts gpe17 if wi-fi physically not present in system? Thats normal situation or not? May be it may be something i do not understand? (In reply to hr from comment #31) > Testing b43_eboot_4.4.patch patch will take longer. because it is i > currently having problems with grub2 loading the kernel in EFI stub. To > reading manuals will take some time. I've just realized, since you're using Debian it might be sufficient to just replace "linux" with "linuxefi" in your grub configuration. The Debian grub2 package carries a patch to enable this command: http://lists.gnu.org/archive/html/grub-devel/2014-01/msg00137.html https://anonscm.debian.org/cgit/pkg-grub/grub.git/commit/?id=e4951626e0ab33c446cfd7e9a22044f602ca0106 Obviously resetting the card didn't work with b43_eboot_4.4.patch, the lspci output shows "INTx+" in the "Status:" line, so the card keeps asserting its interrupt line. Question is why. Michael Büsch mentioned on linux-wireless@ that my patch only works if the wireless core is the second one on the card. On my machine that's the case, but perhaps your machine uses a different revision of the chip with a different layout. If you try "modprobe b43", which cores does it list in dmesg? 
On my machine it looks like this: [ 59.706481] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0) [ 59.708383] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0) [ 59.710324] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0) If the core labeled "IEEE 802.11" has a different id than "1" (i.e., it's not the second core), that would explain why the patch isn't working on your machine. > I decided to spend a little experiment. I am physically disconnected > conbined Broadcom BCM94331 module from the laptop motherboard. and I got a > strange result, message "irq 17: nobody cared" gone, but this: [...] > /sys/firmware/acpi/interrupts/gpe17: 23577 enabled [...] > Who sending interrupts gpe17 if wi-fi physically not present in system? > Thats normal situation or not? May be it may be something i do not > understand? GPE 17 is used by the ACPI Embedded Controller, apparently it's unhappy if the wireless module is disconnected. There's an I2C bus between the EC and pins 12 and 14 of the wireless module connector which I believe is used for a temperature sensor, perhaps the EC fires its interrupt because it thinks that something on or near the wireless module is overheating. Also, pin 10 is used to signal wireless events (Wake on Wireless LAN perhaps), the pin goes to low (!) when an event occurs, so by disconnecting the card I guess it signals an event all the time. Ugh, s/b43_eboot_4.4.patch/b43_earlyquirk_4.4.patch/. (In reply to Lukas Wunner from comment #32) > Michael Büsch mentioned on linux-wireless@ > that my patch only works if the wireless core is the second one on the card. > On my machine that's the case, but perhaps your machine uses a different > revision of the chip with a different layout. If you try "modprobe b43", > which cores does it list in dmesg? 
[23129.929976] bcma: bus0: Found chip with id 0x4331, rev 0x02 and package 0x09
[23129.930009] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0)
[23129.930034] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0)
[23129.930081] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0)
[23129.941219] bcma: bus0: Bus registered
[23130.020310] b43-phy0: Broadcom 4331 WLAN found (core revision 29)
[23130.020731] b43-phy0: Found PHY: Analog 9, Type 7 (HT), Revision 1
[23130.020751] b43-phy0: Found Radio: Manuf 0x17F, ID 0x2059, Revision 0, Version 1
[23130.020758] b43-phy0 warning: 5 GHz band is unsupported on this PHY

> GPE 17 is used by the ACPI Embedded Controller, apparently it's unhappy if
> the wireless module is disconnected. There's an I2C bus between the EC and
> pins 12 and 14 of the wireless module connector which I believe is used for
> a temperature sensor, perhaps the EC fires its interrupt because it thinks
> that something on or near the wireless module is overheating. Also, pin 10
> is used to signal wireless events (Wake on Wireless LAN perhaps), the pin
> goes to low (!) when an event occurs, so by disconnecting the card I guess
> it signals an event all the time.

I would like to point out that the gpe17 situation does not change whether the card is connected or not. That seems rather strange; could the card reader's hardware controller be faulty?

I was not able to get the kernel to boot through the EFI stub with grub2. The linuxefi setting works, but I always get a kernel panic because the kernel cannot find the file systems. I even tried building a command line option with the partition UUID into the kernel, but I am doing something wrong and it does not work. Finally I installed rEFInd, which loads the kernel without any problems.
Your patch b43_eboot_4.4.patch works. Early in the boot there are the lines "Welcome to Macintosh" and two lines "Resetting network interface". The error "irq 17: nobody cared (try booting with the "irqpoll" option)" no longer appears.

But now the system hangs within minutes of operation, or even sooner, whenever an Internet application is in use (the cursor does not respond and there is no reaction to key presses). Of course I can't see dmesg; all I can do is hold down the power button on the laptop to turn it off and on again. If the wl driver is not loaded, there are no hangs. This never happened with other kernel versions or with the default Debian kernel. Now I boot kernel version 3.16 with the same version of the wl driver, and everything works as before. I compile the driver from source for every new kernel; I do not just copy the wl.ko file.

Testing b43_eboot_4.4.patch with the b43 driver: no hangs or freezes, but when downloading large files at high speed, the irq/17-b43 process uses 10-20% CPU.

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         17          0          0          0   IO-APIC 2-edge timer
  8:          1          0          0          0   IO-APIC 8-edge rtc0
  9:      13581          0          0          0   IO-APIC 9-fasteoi acpi
 17:    2276967          0          0          0   IO-APIC 17-fasteoi mmc0, b43
 19:          0          0          0          0   IO-APIC 19-fasteoi uhci_hcd:usb4
 21:          0          0          0          0   IO-APIC 21-fasteoi uhci_hcd:usb3
 22:         68          0          0          0   IO-APIC 22-fasteoi ehci_hcd:usb2
 23:      86807          0          0          0   IO-APIC 23-fasteoi ehci_hcd:usb1
 28:          0          0          0          0   PCI-MSI 3194880-edge pciehp
 29:          0          0          0          0   PCI-MSI 3211264-edge pciehp
 30:          0          0          0          0   PCI-MSI 3227648-edge pciehp
 31:          0          0          0          0   PCI-MSI 3244032-edge pciehp
 32:          3          0          0          0   PCI-MSI 2097152-edge firewire_ohci
 33:      20142          0          0          0   PCI-MSI 512000-edge 0000:00:1f.2
 34:      43489          0          0          0   PCI-MSI 442368-edge snd_hda_intel
 35:      71355          0          0          0   PCI-MSI 32768-edge i915
 36:          1          0          0          0   PCI-MSI 1048576-edge eth0-tx-0
 37:          1          0          0          0   PCI-MSI 1048577-edge eth0-rx-1
 38:          1          0          0          0   PCI-MSI 1048578-edge eth0-rx-2
 39:          1          0          0          0   PCI-MSI 1048579-edge eth0-rx-3
 40:          1          0          0          0   PCI-MSI 1048580-edge eth0-rx-4
NMI:          0          0          0          0   Non-maskable interrupts
LOC:     231063     148480     188319     113361   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:      39733      15928     231716      11938   Rescheduling interrupts
CAL:       1084       1283       1210       1307   Function call interrupts
TLB:      20252      14652      23227      14233   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          7          7          7          7   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0   Posted-interrupt wakeup event

Thank you for the extensive testing. I've looked at the b43 driver to see which steps I might be missing in b43_earlyquirk_4.4.patch and realized that before resetting the wireless core, it has to be mapped first. On my machine that step isn't needed but perhaps it is on yours.

Could you try the following: Apply the newly attached b43_print_core_addr.patch to a stock kernel, boot and load b43. The driver should now output the address and wrap for each core. Looks like this on my machine:

[ 3596.239935] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0, addr 0X18000000, wrap 0X18100000)
[ 3596.241657] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0, addr 0X18001000, wrap 0X18101000)
[ 3596.243438] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0, addr 0X18002000, wrap 0X18102000)

Reboot with all wireless modules blacklisted (b43, bcma, wl). Invoke "lspci -vvvv -xxxx -s 03:00.0". You will get a hexdump of the wireless card's PCI config space. Look at the four bytes starting at positions 0x80 and 0xac. This is the address and wrap of the core that is currently mapped (in little endian order). For the patch to work, this needs to be the address and wrap of the IEEE 802.11 core.

E.g.
on my machine, the four bytes at position 0x80 are 00 10 00 18 (= addr 0x18001000), at position 0xac it's 00 10 10 18 (= wrap 0x18101000).

Please let me know to which of the three cores the 2x four bytes in the config space correspond.

As for b43_eboot_4.4.patch causing the system to freeze when the broadcom-sta driver is used, I don't really have an idea yet what might be the cause. The high CPU load with b43 might be normal, I'll check this on my machine.

As for GPE 17, I can see that about 500 interrupts have accumulated for this GPE on my machine shortly after booting, something in the 20000+ range seems high indeed. After briefly looking at drivers/acpi/ec.c, I think it should be possible to determine what is causing the GPE to fire by adding some debug code to acpi_ec_query() so that the _Qxx number is printed. That number can then be compared to the _Qxx methods in DSDT. I'll see to it that I cook up a patch for that.

Created attachment 216631 [details]
[PATCH] bcma: Print addr and wrap on core scan
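The config-space check requested in comment #37 can be sketched in a few lines of Python. This is only an illustration and not part of any patch attached here: the helper names `le32` and `mapped_core` are made up, the core table is copied from the bcma output quoted in that comment, and the offsets 0x80 and 0xac are the ones named there.

```python
# Hypothetical helpers for illustration; not taken from any patch in this bug.
# Core table copied from the "bcma: bus0: Core N found" lines in comment #37.
CORES = {
    "ChipCommon": (0x18000000, 0x18100000),
    "IEEE 802.11": (0x18001000, 0x18101000),
    "PCIe": (0x18002000, 0x18102000),
}


def le32(hex_bytes: str) -> int:
    """Decode four hex bytes, in the order lspci -xxxx prints them, as little endian."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 4:
        raise ValueError("expected exactly four bytes")
    return int.from_bytes(raw, "little")


def mapped_core(bytes_at_0x80: str, bytes_at_0xac: str):
    """Return the name of the core whose (addr, wrap) pair matches, or None."""
    addr, wrap = le32(bytes_at_0x80), le32(bytes_at_0xac)
    for name, (core_addr, core_wrap) in CORES.items():
        if (addr, wrap) == (core_addr, core_wrap):
            return name
    return None


# Example bytes from comment #37: 0x80 -> "00 10 00 18", 0xac -> "00 10 10 18"
print(mapped_core("00 10 00 18", "00 10 10 18"))  # prints "IEEE 802.11"
```

If the function returns anything other than "IEEE 802.11" for the bytes found on a given machine, that would point at the mapping issue the comment describes.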
(In reply to Lukas Wunner from comment #37) > Could you try the following: Apply the newly attached > b43_print_core_addr.patch to a stock kernel, boot and load b43. The driver > should now output the address and wrap for each core. Looks like this on my > machine: with b43_print_core_addr.patch [ 8.923331] bcma: bus0: Found chip with id 0x4331, rev 0x02 and package 0x09 [ 8.923365] bcma: bus0: Core 0 found: ChipCommon (manuf 0x4BF, id 0x800, rev 0x25, class 0x0, addr 0X18000000, wrap 0X18100000) [ 8.923392] bcma: bus0: Core 1 found: IEEE 802.11 (manuf 0x4BF, id 0x812, rev 0x1D, class 0x0, addr 0X18001000, wrap 0X18101000) [ 8.923446] bcma: bus0: Core 2 found: PCIe (manuf 0x4BF, id 0x820, rev 0x13, class 0x0, addr 0X18002000, wrap 0X18102000) [ 8.973257] bcma: bus0: Bus registered > Reboot with all wireless modules blacklisted (b43, bcma, wl). Invoke "lspci > -vvvv -xxxx -s 03:00.0". You will get a hexdump of the wireless card's PCI > config space. Look at the four bytes starting at positions 0x80 and 0xac. > This is the address and wrap of the core that is currently mapped (in little > endian order). For the patch to work, this needs to be the address and wrap > of the IEEE 802.11 core. > > E.g. on my machine, the four bytes at position 0x80 are 00 10 00 18 (= addr > 0x18001000), at position 0xac it's 00 10 10 18 (= wrap 0x18101000). > > Please let me know to which of the three cores the 2x four bytes in the > config space correspond. 
00: e4 14 31 43 06 00 18 00 02 00 80 02 40 00 00 00
10: 04 00 60 a0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 e4 14 31 43
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 00 00
40: 01 58 03 06 08 40 00 00 05 d0 80 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 09 48 78 00 13 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 10 00 18 00 00 00 00 00 00 00 00 03 00 00 00
90: 00 02 00 00 00 03 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 0b 00 00 10 10 18
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 10 00 01 00 a0 8f 90 05 00 00 10 00 11 dc 16 00
e0: 43 01 11 30 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
100: 01 00 c1 13 00 00 00 00 00 00 00 00 11 20 06 00
110: 00 00 00 00 00 20 00 00 b4 00 00 00 00 00 00 00
120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
130: 00 00 00 00 00 00 00 00 00 00 00 00 02 00 01 16
140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
150: ff 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00
160: 03 00 c1 16 00 00 00 ff ff 00 00 00 04 00 01 00
170: 00 00 00 00 c4 62 05 00 01 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The range 180-ff0 is all zeros.

> As for GPE 17, I can see that about 500 interrupts have accumulated for this
> GPE on my machine shortly after booting, something in the 20000+ range seems
> high indeed. After briefly looking at drivers/acpi/ec.c, I think it should
> be possible to determine what is causing the GPE to fire by adding some
> debug code to acpi_ec_query() so that the _Qxx number is printed. That
> number can then be compared to the _Qxx methods in DSDT. I'll see to it that
> I cook up a patch for that.

Maybe I led you astray.
GPE17 actually has about 500 interrupts after boot (with b43_eboot_4.4.patch); the 20000+ count is from after the laptop had been running for a long time. If you see the same values after boot, I should stop worrying about this.

So the reason why b43_earlyquirk_4.4.patch doesn't work on your machine seems to be that the wireless card is in power state D3hot. I'm attaching a new version which transitions the card to D0 before resetting it. The commit message contains an explanation why the card is in D3hot. (It's caused by grub.)

Please let me know if this new patch works for you. I'd be glad to include a Tested-by: in the commit message so that you get credit for your testing efforts. If you would like to be credited with your real name, please send it to me by e-mail. If you prefer to remain anonymous, that is also fine of course.

As for GPE 17, I've hacked the EC interrupt handler to output the _Qxx number and the about 500 interrupts on boot seem to be caused by initializing the battery (_Q10). Afterwards I've only seen _Q40, which is related to the Ambient Light Sensor. This increases the interrupt count by 6 each time. Wave your hand in front of the camera (where the ALS is located) and the interrupt count on GPE 17 will go up fairly quickly.

Created attachment 217121 [details]
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot]
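For readers following along with the lspci dumps in this thread, the flags under discussion (DisINTx in the Command register, INTx in the Status register, and the D0/D3hot power state) can be read straight from config-space values. The sketch below is an illustration under stated assumptions, not code from the patch: the function names are invented, the bit positions are the standard PCI ones (Command bit 10, Status bit 3, PMCSR bits 1:0), and the PMCSR value in the usage example is made up.

```python
# Illustrative only; function names are hypothetical and not taken from the patch.
PCI_COMMAND_INTX_DISABLE = 1 << 10   # lspci shows this as DisINTx+/-
PCI_STATUS_INTERRUPT = 1 << 3        # lspci shows this as INTx+/-
PM_CTRL_STATE_MASK = 0x3             # power state field, PMCSR bits 1:0
POWER_STATES = {0: "D0", 1: "D1", 2: "D2", 3: "D3hot"}


def intx_flags(command: int, status: int) -> str:
    """Render the two INTx-related bits the way lspci -vv abbreviates them."""
    disabled = "+" if command & PCI_COMMAND_INTX_DISABLE else "-"
    asserted = "+" if status & PCI_STATUS_INTERRUPT else "-"
    return f"DisINTx{disabled} INTx{asserted}"


def power_state(pmcsr: int) -> str:
    """Decode the power state from a Power Management Control/Status value."""
    return POWER_STATES[pmcsr & PM_CTRL_STATE_MASK]


# Command 0x0006 and Status 0x0018 correspond to bytes 04-07 ("06 00 18 00")
# of the config-space hexdump posted earlier in this thread.
print(intx_flags(0x0006, 0x0018))  # prints "DisINTx- INTx+"
print(power_state(0x4003))         # made-up PMCSR value; prints "D3hot"
```

"DisINTx- INTx+" is the combination described above for a card that keeps asserting its legacy interrupt line while no driver is bound to it.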
(In reply to Lukas Wunner from comment #40)
> [PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot]

Works fine. After boot:

cat /proc/interrupts
 17: 553 0 0 0 IO-APIC 17-fasteoi mmc0, b43

However, the proprietary wl driver still makes the system hang. If your patch is accepted into the Linux kernel, it will come as a surprise to those who use the proprietary driver once they upgrade to a patched kernel version.

> As for GPE 17, I've hacked the EC interrupt handler to output the _Qxx
> number and the about 500 interrupts on boot seem to be caused by
> initializing the battery (_Q10). Afterwards I've only seen _Q40, which is
> related to the Ambient Light Sensor. This increases the interrupt count by 6
> each time. Wave your hand in front of the camera (where the ALS is located)
> and the interrupt count on GPE 17 will go up fairly quickly.

Would it be possible to stop the Ambient Light Sensor from sending interrupts? I do not use it, for example; I always adjust the brightness manually.

Thank you very much for the detailed explanation and your patches. I recompiled my kernel (4.5.4-1-ARCH) with your latest patch:

> [PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot].

It seems to have fixed the IRQ problem on start up. I normally run b43-firmware from AUR, but since hr mentioned issues with the wl combination I have installed and configured that as well. I installed broadcom-wl from AUR, disabled b43 and ssb and modprobed wl. My connection doesn't seem as stable with this wl driver as with b43 but I haven't had the box hang on me or anything yet.

hr, do you have a good procedure to reproduce the freezing/hanging you are seeing with the wl driver? Could you provide more information about how and where it is hanging for you?
kernel information:
uname -a
4.5.4-1 SMP PREEMPT Thu May 26 12:46:14 EDT 2016 x86_64 GNU/Linux

patches:
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ resume from D3hot]

dmesg:
Going to try to attach

Created attachment 217731 [details]
dmesg after applying patch uploaded @ 2016-05-23 15:59 UTC
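Several comments in this bug compare the IRQ 17 count between boots. A small sketch for pulling one IRQ's per-CPU counts out of /proc/interrupts-style text follows. The function name `irq_counts` is invented for illustration, and the sample rows are copied from output posted in this thread; nothing here is part of the attached patches.

```python
def irq_counts(text: str, irq: str):
    """Return (per-CPU counts, description) for one row of /proc/interrupts text.

    Returns None if the requested row label is not present.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() != irq:
            continue
        fields = rest.split()
        counts = [int(field) for field in fields[:ncpus]]
        return counts, " ".join(fields[ncpus:])
    return None


# Sample rows copied from /proc/interrupts output posted in this thread.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 17:        553          0          0          0   IO-APIC   17-fasteoi   mmc0, b43
ERR:          0
"""

counts, description = irq_counts(SAMPLE, "17")
print(sum(counts), description)  # prints "553 IO-APIC 17-fasteoi mmc0, b43"
```

On a live system the text could come from `open("/proc/interrupts").read()`; sampling it twice a few seconds apart and comparing the sums gives a rough interrupt rate for the line.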
Thank you Bryan. Is the dmesg output from when you tested it with wl? Because on that boot, b43 was used. Otherwise it looks fine, except for the Thunderbolt controller not being supported (which is fixed in 4.7) and the BIOS being from 2016 (which is odd, I thought the newest version was Apple's EFI Update 2015-002). hr tells me that lockups only occur with large amounts of traffic. When transmitting just pings, things seem to work fine even over longer periods of time. Unfortunately I don't have a wifi AP at my disposal right now, I mostly use Gigabit Ethernet, so I have to lean on you guys to test it. Created attachment 217741 [details]
[PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure according to Michael Büsch]
Created attachment 217751 [details]
dmesg + wl after applying patch uploaded @ 2016-05-23 15:59 UTC
I've sent an e-mail to Broadcom support asking for help:
http://lists.infradead.org/pipermail/b43-dev/2016-May/003975.html

Michael Büsch (not at Broadcom but a regular b43 contributor) responded that my simplified reset procedure (which just sets the reset bit and does nothing else) might cause issues with wl and that I need to follow the procedure as per the bcma source code:
http://lists.infradead.org/pipermail/b43-dev/2016-May/003976.html

So I've just attached a new version of the patch which follows Michael's specification to the bit. Not sure if it helps but worth a try?

In the bcma source code we're actually doing a bit more, we're writing to the BCMA_IOCTL a couple of times. If the latest patch doesn't resolve the issue I could add that as well:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/bcma/core.c#n42

Hello Lukas,

The dmesg was with b43, and now I have uploaded one with wl. I hadn't even thought to look into the Thunderbolt stuff yet; I just recently got this machine. Good to hear it's in 4.7! As I am not familiar with the platform I have no idea about the EFI update version. My friend had updated the device to El Capitan just before I received it. Packaged with it maybe? I have dumped the eeprom if you are interested.

I have been doing internet-based testing but my connection isn't very fast. I will set up some wifi/lan tests with high throughput. It will be interesting to see whether I can reproduce the issue hr is seeing. I was seeing lots of problems before, but it now seems really stable using:

aur/broadcom-wl-dkms-248 6.30.223.248-1 [installed] (2) (0.03)

Interesting about the reset procedure. Nice of Michael to respond. Could make sense. I wonder if it will need the BCMA_IOCTL writes or not. Is there any way to access the registers or any other information that might help when it is in an unstable state? If I can get it into one anyway. It takes a good while to compile on here (: I will get back to you a bit later with the results.
Do you know which version of wl hr was using?

(In reply to Bryan Paradis from comment #43)
> I installed broadcom-wl from AUR

Thanks for the info, Bryan. I downloaded the latest snapshot of the driver package:
https://aur.archlinux.org/packages/broadcom-wl-dkms/

I found something interesting there, two patches:
001-null-pointer-fix.patch
002-rdtscl.patch

I applied them to the driver I have had installed so far:
https://www.broadcom.com/docs/linux_sta/hybrid-v35-nodebug-pcoem-6_30_223_271.tar.gz

Now there are no more hangs; everything works fine with Lukas's patch.

Great, thanks for the research, it's good to know about that null pointer issue in wl and the corresponding patch. I've just sent the patch to the lists, let's see if there are any objections, bikeshedding requests, etc.

Good find hr. I have reconfigured my system with a new drive so I have been pretty busy. Just got the kernel built with the latest patch: 217741. I have been having some wonky wifi without the patch at least. I will see how this goes and report back.

(In reply to Lukas Wunner from comment #46)
> Created attachment 217741 [details]
> [PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure
> according to Michael Büsch]

Yesterday I tried this patch; some errors came back in dmesg:

[23289.098445] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)
[23404.895181] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)

I will soon revert to the previous version of the patch and test.

I was having some instability problems which I traced to some weird disassociation problems on the AP side. I updated my OpenWRT to 15.05.1 and dropped the 40 MHz channel width back to 20 MHz. No problems since. I haven't seen the errors in my dmesg that you have, hr. Could you provide any more context?

New information. I have been messing around with the kernel source to see if I can get further towards a fix for this issue. Definitely something earlier in the UHS init is breaking the tuning. I am unsure.
Can someone else confirm my thinking that the BCM57785 has an on-board voltage switching regulator, so there is no reason it wouldn't be capable of 1.8 V?

Some Hardware Tests:

--TEST1--
Physically disconnected BCM4331 Wifi Card
+ UHS-I card
+ Cat5 not plugged into BCM57785 Ethernet
+ tg3 (ethernet) module loaded
= BAD: Timeouts occur waiting for hardware interrupt.

[ 42.784679] mmc0: Timeout waiting for hardware interrupt.
[ 52.812088] mmc0: Timeout waiting for hardware interrupt.
[ 60.889130] mmc0: Card removed during transfer!
[ 60.889138] mmc0: Resetting controller.
[ 60.899787] mmc0: error -123 whilst initialising SD card
[ 72.120058] mmc0: Timeout waiting for hardware interrupt.
[ 82.147469] mmc0: Timeout waiting for hardware interrupt.
[ 101.135438] mmc0: Timeout waiting for hardware interrupt.
[ 111.162834] mmc0: Timeout waiting for hardware interrupt.
[ 121.190203] mmc0: Timeout waiting for hardware interrupt.
[ 140.204860] mmc0: Timeout waiting for hardware interrupt.
[ 140.213640] mmc0: error -123 whilst initialising SD card --TEST2-- Physically disconnected BCM4331 Wifi Card + UHS-I card + Cat5 plugged into BCM57785 Ethernet <--- Change here + tg3 (ethernet) module loaded = Good: voltage switch gets skipped and card loads as high speed [ 1755.779744] tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex [ 1755.779766] tg3 0000:01:00.0 enp1s0f0: Flow control is on for TX and on for RX [ 1755.779772] tg3 0000:01:00.0 enp1s0f0: EEE is disabled [ 1775.595860] mmc0: Skipping voltage switch [ 1776.610955] mmc0: new high speed SDXC card at address e624 [ 1776.614773] mmcblk0: mmc0:e624 SU64G 59.5 GiB (ro) [ 1776.616032] mmcblk0: p1 p2 --TEST3-- Physically disconnected BCM4331 Wifi Card + UHS-I card + Cat5 plugged into BCM57785 Ethernet + tg3 (ethernet) module unloaded <--- Change here = Bad: Same result as TEST1 --TEST4-- Physically disconnected BCM4331 Wifi Card + UHS-I card + Cat5 plugged into BCM57785 Ethernet + tg3 (ethernet) module unloaded + sdhci debug_quirks2=4 <--- Change here = Bad: Same result as TEST1 --TEST5-- Connected BCM4331 Wifi Card + UHS-I card + Cat5 not plugged into BCM57785 Ethernet + tg3 module loaded + no sdhci debug quriks = Bad: Main bug result [ 177.120002] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock [ 177.120023] mmc0: new ultra high speed DDR50 SDXC card at address e624 [ 177.129048] mmcblk0: mmc0:e624 SU64G 59.5 GiB (ro) [ 177.139349] mmc0: Controller never released inhibit bit(s). [ 187.163886] mmc0: Timeout waiting for hardware interrupt. 
[ 187.163944] mmcblk0: error -110 sending status command, retrying [ 187.213949] sdhci: Timeout waiting for Buffer Read Ready interrupt during tuning procedure, falling back to fixed sampling clock --TEST6-- Connected BCM4331 Wifi Card + UHS-I card + Cat5 not plugged into BCM57785 Ethernet + tg3 module loaded + sdhci debug_quirks2=4 = Good: Card gets deteched but with no voltage skips - Similar to TEST2 [ 310.925853] sdhci-pci 0000:01:00.1: SDHCI controller found [14e4:16bc] (rev 10) [ 310.927211] mmc0: SDHCI controller on PCI [0000:01:00.1] using ADMA 64-bit [ 313.079711] mmc0: new high speed SDXC card at address e624 [ 313.081903] mmcblk0: mmc0:e624 SU64G 59.5 GiB (ro) [ 313.082996] mmcblk0: p1 p2 I will also be attaching lspci -v and dmesg for two wifi plugged in/unplugged states that includes logs of me doing the tests above: wifi_unplugged.lspci wifi_unplugged.dmesg wifi_plugged_in.lspci wifi_plugged_in.dmesg Unfortunately I just posted in the wrong bug. I don't think you will be able to glean any information about the current problem from that one. It was meant for https://bugzilla.kernel.org/show_bug.cgi?id=73241 but I got my windows mixed up. As for an update here I have been running a kernel with the latest patch quite extensively for a few days with no occurrences of any wifi problems. (In reply to hr from comment #53) > (In reply to Lukas Wunner from comment #46) > > Created attachment 217741 [details] > > [PATCH] x86: Add early quirk to reset Apple AirPort card [+ reset procedure > > according to Michael Büsch] > > Yesterday try this patch, some errors come back in dmesg: > > [23289.098445] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22) > [23404.895181] ERROR @wl_notify_scan_status : eth1 Scan_results error (-22) > > Soon i try to revert to previous version patch and test. 
hr: I noticed that iwconfig and other wireless utilities need to be run as root to work, otherwise I get errors like:

ERROR @wl_dev_intvar_get : error (-1)
ERROR @wl_cfg80211_get_tx_power : error (-1)

Could this have happened when using some sort of utility? Do you have any more information about when it occurs?

Error -1 is -EPERM:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/asm-generic/errno-base.h

I can reproduce the error hr is getting by loading b43, unloading, then loading wl. Somehow wl is picky about the state the device is in and returns -EINVAL when scanning. But I believe this is independent from this patch.

By the way, the patch was queued by Ingo Molnar this week:
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=625a99d9bfd0

There was an objection raised post-merge which I've addressed, we'll have to see if there are others:
https://lkml.org/lkml/2016/6/8/972

(In reply to Lukas Wunner from comment #58)
> Error -1 is -EPERM:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/
> uapi/asm-generic/errno-base.h
>
> I can reproduce the error hr is getting by loading b43, unloading, then
> loading wl. Somehow wl is picky about the state the device is in and returns
> -EINVAL when scanning. But I believe this is independent from this patch.
>

Thanks for posting that. Totally makes sense that this exists.

> By the way, the patch was queued by Ingo Molnar this week:
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/
> ?id=625a99d9bfd0
>
> There was an objection raised post-merge which I've addressed, we'll have to
> see if there are others:
> https://lkml.org/lkml/2016/6/8/972

Very cool. Thanks for doing this.
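For anyone reading the logs in this thread, the negative numbers printed by wl and mmc are Linux errno values. The following is only a rough sketch of how they could be decoded: the table is copied by hand from the kernel's asm-generic errno headers and covers just the codes seen here, and the helper name `decode_errors` and its regular expression are made up for illustration, not part of any driver or tool.

```python
import re

# Linux error numbers from include/uapi/asm-generic/errno-base.h and errno.h.
# Only the codes that appear in this thread are listed (hand-copied table).
LINUX_ERRNO = {
    1: ("EPERM", "Operation not permitted"),
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
    123: ("ENOMEDIUM", "No medium found"),
}

# Matches "error (-22)" as printed by wl and "error -110" as printed by mmc.
# The pattern is an assumption based on the log lines quoted in this bug.
ERROR_RE = re.compile(r"error \(?-(\d+)\)?")


def decode_errors(log_line):
    """Return "-ENAME (description)" for every error code found in log_line."""
    decoded = []
    for match in ERROR_RE.finditer(log_line):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        decoded.append(f"-{name} ({text})")
    return decoded


if __name__ == "__main__":
    print(decode_errors("ERROR @wl_notify_scan_status : eth1 Scan_results error (-22)"))
    print(decode_errors("ERROR @wl_dev_intvar_get : error (-1)"))
```

Read this way, the scan failures hr reported are -EINVAL and the iwconfig failures are -EPERM, which matches what Lukas describes above.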
Fixed in Linux 4.7 with:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=447d29d1d3ae
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=850c321027c2
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=abb2bafd295f

Fixed in stable kernels 4.6.6, 4.4.17, 4.1.30, 3.18.39
Fixed in upcoming stable kernels 3.16.39, 3.2.84
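To check whether a given kernel release string should already carry the fix, something like the sketch below might help. It simply encodes the version list from the closing comment; the function name `has_airport_fix` is hypothetical, and treating every series not listed as unfixed is an assumption. Distribution kernels may backport the patch independently, so a False result is only a hint.

```python
import re

# Minimum patch level carrying the fix in each stable series,
# taken from the version list in the closing comment of this bug.
FIXED_STABLE = {
    (4, 6): 6,
    (4, 4): 17,
    (4, 1): 30,
    (3, 18): 39,
    (3, 16): 39,
    (3, 2): 84,
}
FIXED_MAINLINE = (4, 7)  # first mainline release with the fix


def has_airport_fix(release):
    """Guess from a release string (e.g. from `uname -r`) whether the fix is in."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if (major, minor) >= FIXED_MAINLINE:
        return True
    # Assumption: a stable series missing from the table never got the fix.
    minimum = FIXED_STABLE.get((major, minor))
    return minimum is not None and patch >= minimum


if __name__ == "__main__":
    import platform
    print(platform.release(), has_airport_fix(platform.release()))
```

For example, the 3.17.0-300.fc21.x86_64 kernel from the original report would come back as not fixed, while 4.6.6 and anything from 4.7 onwards would come back as fixed.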