Bug 114101 - iwlwifi: 8260: can't load the firmware if pci_enable_msi failed
Status: CLOSED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 blocking
Assignee: drivers_network-wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-03-09 10:30 UTC by Jonas Thiem
Modified: 2016-07-21 20:28 UTC

See Also:
Kernel Version: 4.4.3
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with iwlwifi debug=0xffffffff (183.21 KB, text/plain)
2016-03-14 13:33 UTC, Jonas Thiem

Description Jonas Thiem 2016-03-09 10:30:03 UTC
Wi-Fi doesn't work after booting. When I manually unload and reload iwlmvm and iwlwifi, the following appears in dmesg, the system lags heavily, and WLAN still doesn't work:

[  111.425472] virbr0: port 1(virbr0-nic) entered listening state
[  111.425480] virbr0: port 1(virbr0-nic) entered listening state
[  111.527316] virbr0: port 1(virbr0-nic) entered disabled state
[  130.504136] fuse init (API version 7.23)
[  131.625166] Bluetooth: RFCOMM TTY layer initialized
[  131.625171] Bluetooth: RFCOMM socket layer initialized
[  131.625211] Bluetooth: RFCOMM ver 1.11
[  153.359108] Intel(R) Wireless WiFi driver for Linux
[  153.359110] Copyright(c) 2003- 2015 Intel Corporation
[  153.361012] iwlwifi 0000:04:00.0: pci_enable_msi failed(0Xffffffda)
[  153.362942] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-19.ucode failed with error -2
[  153.362951] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-18.ucode failed with error -2
[  153.362957] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-17.ucode failed with error -2
[  153.364436] iwlwifi 0000:04:00.0: Unsupported splx structure
[  153.364570] iwlwifi 0000:04:00.0: loaded firmware version 16.242414.0 op_mode iwlmvm
[  153.404274] iwlmvm: unknown parameter 'iwlwifi' ignored
[  153.406053] iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
[  153.406699] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[  153.408178] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[  153.409996] iwlwifi 0000:04:00.0: can't access the RSA semaphore it is write protected
[  162.933436] irq 11: nobody cared (try booting with the "irqpoll" option)
[  162.933439] CPU: 0 PID: 2807 Comm: modprobe Not tainted 4.4.3-300.rhbz1313034.fc23.x86_64 #1
[  162.933440] Hardware name: LENOVO 20FDCTO1WW/20FDCTO1WW, BIOS N1GET35W (1.12 ) 11/16/2015
[  162.933442]  0000000000000086 00000000c06e0963 ffff880142403ce0 ffffffff813b4b6e
[  162.933444]  ffff88013d883e00 ffff88013d883e9c ffff880142403d08 ffffffff810ff153
[  162.933446]  ffff88013d883e00 0000000000000000 000000000000000b ffff880142403d40
[  162.933447] Call Trace:
[  162.933448]  <IRQ>  [<ffffffff813b4b6e>] dump_stack+0x63/0x85
[  162.933456]  [<ffffffff810ff153>] __report_bad_irq+0x33/0xc0
[  162.933458]  [<ffffffff810ff4de>] note_interrupt+0x23e/0x280
[  162.933460]  [<ffffffff810fc78a>] handle_irq_event_percpu+0x15a/0x1c0
[  162.933461]  [<ffffffff810fc81c>] handle_irq_event+0x2c/0x50
[  162.933463]  [<ffffffff810ffbfd>] handle_level_irq+0x7d/0x100
[  162.933465]  [<ffffffff81019dc3>] handle_irq+0x73/0x120
[  162.933467]  [<ffffffff817a1bab>] do_IRQ+0x4b/0xd0
[  162.933469]  [<ffffffff8179fc07>] common_interrupt+0x87/0x87
[  162.933472]  [<ffffffff810a8a76>] ? __do_softirq+0x86/0x2d0
[  162.933474]  [<ffffffff810a8ec5>] irq_exit+0x105/0x110
[  162.933475]  [<ffffffff817a1bb4>] do_IRQ+0x54/0xd0
[  162.933477]  [<ffffffff8179fc07>] common_interrupt+0x87/0x87
[  162.933477]  <EOI>  [<ffffffff813e1e1d>] ? mpihelp_addmul_1+0x9d/0xc0
[  162.933481]  [<ffffffff813e38f4>] mpih_sqr_n_basecase+0x64/0x100
[  162.933483]  [<ffffffff813e3ac3>] mpih_sqr_n+0x133/0x320
[  162.933484]  [<ffffffff813e3c49>] mpih_sqr_n+0x2b9/0x320
[  162.933485]  [<ffffffff813e3a80>] mpih_sqr_n+0xf0/0x320
[  162.933486]  [<ffffffff813e4715>] mpi_powm+0x4b5/0xa10
[  162.933489]  [<ffffffff8120cf46>] ? kmem_cache_alloc_trace+0x196/0x210
[  162.933491]  [<ffffffff81379643>] RSA_verify_signature+0xf3/0x2b0
[  162.933493]  [<ffffffff813794be>] public_key_verify_signature+0x7e/0xb0
[  162.933495]  [<ffffffff81379505>] public_key_verify_signature_2+0x15/0x20
[  162.933496]  [<ffffffff813793ec>] verify_signature+0x3c/0x50
[  162.933498]  [<ffffffff8137b915>] pkcs7_validate_trust+0x225/0x290
[  162.933501]  [<ffffffff811abcf4>] system_verify_data+0x94/0x110
[  162.933503]  [<ffffffff8112b605>] mod_verify_sig+0x75/0xc0
[  162.933504]  [<ffffffff811287fc>] load_module+0x16c/0x2650
[  162.933506]  [<ffffffff811ee43c>] ? alloc_vmap_area+0x2fc/0x360
[  162.933508]  [<ffffffff811ef1a6>] ? vmap_page_range_noflush+0x246/0x350
[  162.933509]  [<ffffffff811ef2e6>] ? map_vm_area+0x36/0x50
[  162.933511]  [<ffffffff811f0266>] ? __vmalloc_node_range+0x196/0x2c0
[  162.933512]  [<ffffffff8112adaf>] ? SyS_init_module+0xcf/0x190
[  162.933514]  [<ffffffff8112ae2e>] SyS_init_module+0x14e/0x190
[  162.933515]  [<ffffffff8179f16e>] entry_SYSCALL_64_fastpath+0x12/0x71
[  162.933516] handlers:
[  162.933519] [<ffffffff8155eba0>] ahci_single_level_irq_intr
[  162.933522] [<ffffffff815816c0>] usb_hcd_irq
[  162.933527] [<ffffffffa0029720>] rtsx_pci_isr [rtsx_pci]
[  162.933555] [<ffffffffa0150dc0>] gen8_irq_handler [i915]
[  162.933558] [<ffffffffa041afc0>] mei_me_irq_quick_handler [mei_me] threaded [<ffffffffa041b100>] mei_me_irq_thread_handler [mei_me]
[  162.933561] [<ffffffffa03ea9e0>] i801_isr [i2c_i801]
[  162.933569] [<ffffffffa0611e80>] azx_interrupt [snd_hda_codec]
[  162.933577] [<ffffffffa00db890>] e1000_intr [e1000e]
[  162.933578] Disabling IRQ #11
[  162.948627] iwlwifi 0000:04:00.0: Failed to load firmware chunk!
[  162.948631] iwlwifi 0000:04:00.0: Could not load the [0] uCode section
[  162.948638] iwlwifi 0000:04:00.0: Failed to start INIT ucode: -110
[  162.949152] iwlwifi 0000:04:00.0: Failed to run INIT ucode: -110
[  162.949173] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Comment 1 Jonas Thiem 2016-03-09 10:31:44 UTC
jonas@cyberman:~$ lspci | grep Intel | grep Wireless
04:00.0 Network controller: Intel Corporation Wireless 8260 (rev 3a)
jonas@cyberman:~$

The kernel I tested came from here: http://koji.fedoraproject.org/koji/taskinfo?taskID=13183287. However, 4.4.2 has the same issue.
Comment 2 Emmanuel Grumbach 2016-03-10 09:14:33 UTC
I don't like this:

iwlwifi 0000:04:00.0: pci_enable_msi failed(0Xffffffda)

and then (maybe unrelated) we don't get the interrupt which in turn prevents us from loading the firmware.
Comment 3 Emmanuel Grumbach 2016-03-10 21:02:25 UTC
Can you please try the following:
Load iwlwifi with debug=0xffffffff (requires CONFIG_IWLWIFI_DEBUG) and provide dmesg again.

After the failed attempt to load the firmware (the print you pasted), please run cat /proc/interrupts and send me the output.

thanks!
Comment 4 Jonas Thiem 2016-03-14 13:33:47 UTC
Created attachment 209061 [details]
dmesg with iwlwifi debug=0xffffffff

The relevant output starts at line 2242, which is where I loaded iwlwifi with debug=0xffffffff. I hope you don't mind the VirtualBox/vboxdrv module, which Fedora doesn't package in the main repo and which results in the kernel being marked tainted. I can rerun without that module loaded if required.

Also the PS/2 driver and intel gfx driver are having some fun... this notebook is a bit new, so some stuff is a bit broken :-)
Comment 5 Jonas Thiem 2016-03-14 13:37:01 UTC
jonas@cyberman:~$ cat /proc/interrupts 
           CPU0       
  0:  132612452    XT-PIC  timer
  1:     236832    XT-PIC  i8042
  2:          0    XT-PIC  cascade
  7:          8    XT-PIC
  8:          1    XT-PIC  rtc0
  9:      74392    XT-PIC  acpi
 10:          0    XT-PIC  iwlwifi
 11:   48093512    XT-PIC  0000:00:17.0, xhci-hcd:usb1, i915, rtsx_pci, i801_smbus, snd_hda_intel, mei_me, enp0s31f6
 12:   15991578    XT-PIC  i8042
NMI:          0   Non-maskable interrupts
LOC:          0   Local timer interrupts
SPU:          0   Spurious interrupts
PMI:          0   Performance monitoring interrupts
IWI:          0   IRQ work interrupts
RTR:          0   APIC ICR read retries
RES:          0   Rescheduling interrupts
CAL:          0   Function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
THR:          0   Threshold APIC interrupts
DFR:          0   Deferred Error APIC interrupts
MCE:          0   Machine check exceptions
MCP:       1189   Machine check polls
ERR:          8
MIS:          0
PIN:          0   Posted-interrupt notification event
PIW:          0   Posted-interrupt wakeup event
jonas@cyberman:~$
Comment 6 Emmanuel Grumbach 2016-03-14 13:39:32 UTC
Something is going really wrong here:
iwlwifi didn't get any interrupt?

That doesn't make sense.
Moreover, there was an interrupt on IRQ 11 that wasn't handled, while I am still waiting for my interrupt.

and pci_enable_msi failed?
Adding PCI people.
Comment 7 Jonas Thiem 2016-03-14 13:47:47 UTC
Sorry, I forgot to mention something which might be relevant: I booted with "nolapic" because otherwise I just get a black screen (another issue with this hardware)
Comment 8 Emmanuel Grumbach 2016-03-14 15:41:53 UTC
Bjorn, what component should we move this bug to?
Comment 9 Bjorn Helgaas 2016-03-14 16:21:19 UTC
The black screen when booting without "nolapic" sounds like the first thing to fix.  That is a pretty serious issue that could bite lots of people.  I don't know how to debug that other than to bisect it.

It would be helpful if iwlwifi printed the pci_enable_msi() failure as signed decimal rather than hex.  0Xffffffda is -38, or -ENOSYS.  That means your kernel is built without CONFIG_PCI_MSI enabled.  I would think you would want CONFIG_PCI_MSI enabled, but Linux should still work even without it.
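
The conversion described above can be checked with a short sketch. It reinterprets the 32-bit value that the driver printed in hex as a signed integer and looks up a symbolic name for it. The helper names `to_signed32` and `describe` are made up for illustration, and the name lookup depends on the errno numbering of the platform the script runs on (on Linux, 38 is ENOSYS, as noted above).

```python
import errno


def to_signed32(raw: int) -> int:
    """Reinterpret an unsigned 32-bit value as a two's-complement signed int."""
    raw &= 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def describe(raw: int) -> str:
    """Format a return code as signed decimal, plus the errno name if negative."""
    value = to_signed32(raw)
    if value < 0:
        # errno.errorcode maps positive errno numbers to names on this platform.
        name = errno.errorcode.get(-value, "unknown")
        return f"{value} (-{name})"
    return str(value)


# The value from the dmesg line "pci_enable_msi failed(0Xffffffda)".
print(describe(0xFFFFFFDA))
```

Printing the return code with a signed format in the first place would make this manual step unnecessary.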

It looks like we're expecting iwlwifi to interrupt on IRQ 10, but it's actually interrupting on IRQ 11.  If that's the case, booting with "irqpoll" would probably be a workaround.
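
As a rough illustration of how that mismatch shows up in the data, the sketch below parses an excerpt of the /proc/interrupts output from comment 5 and lists IRQs that have a registered handler but a zero count. The function names are hypothetical, and the parser assumes the simple "count, controller, devices" layout seen in that output; other machines print additional columns that it does not handle.

```python
SAMPLE = """\
           CPU0
 10:          0    XT-PIC  iwlwifi
 11:   48093512    XT-PIC  0000:00:17.0, xhci-hcd:usb1, i915, rtsx_pci, i801_smbus, snd_hda_intel, mei_me, enp0s31f6
NMI:          0   Non-maskable interrupts
"""


def parse_interrupts(text: str) -> dict[int, tuple[int, list[str]]]:
    """Map each numbered IRQ to (total count across CPUs, list of device names)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row holds one column per CPU
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if not label.strip().isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        total = sum(int(count) for count in fields[:ncpu])
        # fields[ncpu] is the interrupt controller name (XT-PIC here).
        names = " ".join(fields[ncpu + 1:]).split(",")
        table[int(label)] = (total, [n.strip() for n in names if n.strip()])
    return table


def silent_irqs(table: dict[int, tuple[int, list[str]]]) -> list[int]:
    """Return IRQs that have at least one handler but never fired."""
    return [irq for irq, (total, devs) in table.items() if total == 0 and devs]


table = parse_interrupts(SAMPLE)
print(silent_irqs(table))
```

With the excerpt above, IRQ 10 (iwlwifi) is the only line with a handler and no interrupts, while IRQ 11 carries all the traffic.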
Comment 10 Jonas Thiem 2016-03-14 17:03:27 UTC
Yup, booting with nolapic and irqpoll makes wlan work flawlessly. However, I still can't boot without nolapic. I guess I should file a separate bug report for that?
Comment 11 Jonas Thiem 2016-03-14 17:05:04 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=110941 this bug might be related to my problems
Comment 12 Bjorn Helgaas 2016-03-14 17:11:19 UTC
Yes, please.  I suppose most users of this notebook would be using distro kernels with CONFIG_PCI_MSI enabled, and maybe that would work better?  In any case, we really should figure out the lapic issue because it's really hard for users to deal with black screen failures.
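
Whether a given kernel was built with CONFIG_PCI_MSI can be read from its config file. The sketch below is only an illustration: `config_value` is a hypothetical helper, the sample text is invented, and the file location (often /boot/config-<version> on distro kernels) is an assumption that varies between systems.

```python
SAMPLE_CONFIG = """\
CONFIG_PCI=y
CONFIG_PCI_MSI=y
# CONFIG_PCI_DEBUG is not set
"""


def config_value(config_text: str, option: str):
    """Return the option's value, "n" if explicitly not set, or None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


print(config_value(SAMPLE_CONFIG, "CONFIG_PCI_MSI"))
```

To check a real system, the text of the installed kernel's config file would be passed in instead of the sample string.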
Comment 13 Emmanuel Grumbach 2016-03-14 17:16:43 UTC
Please change the title of this bug. I'll remove wifi folks afterwards.
Comment 14 Emmanuel Grumbach 2016-03-14 17:56:59 UTC
(In reply to Bjorn Helgaas from comment #9)
> It would be helpful if iwlwifi printed the pci_enable_msi() failure as
> signed decimal rather than hex.

Patch submitted internally. Thanks!
Comment 15 Jonas Thiem 2016-05-09 13:50:05 UTC
(In reply to Bjorn Helgaas from comment #12)
> We really should figure out the lapic issue because it's really
> hard for users to deal with black screen failures.

Just as a side note: the intel_pstate=no_hwp workaround suggested in https://bugzilla.kernel.org/show_bug.cgi?id=110941 fixes things, including the wlan issue, so I'm pretty sure that bug is the underlying reason I initially needed the nolapic option. I'll know for sure once I get my hands on 4.6 and can test whether it's fixed.

Of course, you're free to keep this open for other pci_enable_msi() revamps or whatever structural changes you still want to make. I just wanted to say that, as far as I'm concerned, this issue is "worksforme". With a patch for https://bugzilla.kernel.org/show_bug.cgi?id=110941 submitted, the black screen issue should soon be fixed as well, finally making all kernel boot option workarounds unnecessary for this laptop.
Comment 16 Luca Coelho 2016-06-16 06:23:13 UTC
Thanks for the update, Jonas!

So, should we still keep this in "network-wireless" (in which case, I will close it) or should it be moved to another component?
