Bug 208183

Summary: iwlwifi: 9260?: Regression on 5.7.x kernel
Product: Drivers
Reporter: Takashi Iwai (tiwai)
Component: network-wireless
Assignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: dan.cermak, linuxwifi, mmrmartin, nate, nidaamber55
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.7.1
Subsystem:
Regression: Yes
Bisected commit-id:

Description Takashi Iwai 2020-06-15 11:19:28 UTC
We've received several regression reports of the 5.7.1 kernel hanging at boot, and the cause turned out to be a bug in the iwlwifi driver.  It first hits the WARNING below, which is then followed by a lot of other weird errors and hangs.

Jun 15 08:11:34 Boreas kernel: ------------[ cut here ]------------
Jun 15 08:11:34 Boreas kernel: [Firmware Bug]: Page fault caused by firmware at PA: 0xffffb68141ae0068
Jun 15 08:11:34 Boreas kernel: WARNING: CPU: 5 PID: 195 at arch/x86/platform/efi/quirks.c:743 efi_recover_from_page_fault+0x2a/0xc8
Jun 15 08:11:34 Boreas kernel: Modules linked in: btusb(+) dell_wmi snd_hda_core btrtl uvcvideo btbcm pcc_cpufreq(-) snd_hwdep kvm iwlwifi btintel fjes(-) videobuf2_vmalloc dell_smbios irqbypass videobuf2_memops joydev pcspkr >
Jun 15 08:11:34 Boreas kernel:  cryptd usbcore nvme_core serio_raw rtsx_pci i2c_hid battery pinctrl_cannonlake video wmi pinctrl_intel button sg nbd dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua br_netfilter bridge>
Jun 15 08:11:34 Boreas kernel: CPU: 5 PID: 195 Comm: kworker/5:1 Not tainted 5.7.1-1-default #1 openSUSE Tumbleweed (unreleased)
Jun 15 08:11:34 Boreas kernel: Hardware name: Dell Inc. Precision 5530/0NFGCT, BIOS 1.15.0 12/25/2019
Jun 15 08:11:34 Boreas kernel: Workqueue: events request_firmware_work_func
Jun 15 08:11:34 Boreas kernel: RIP: 0010:efi_recover_from_page_fault+0x2a/0xc8
Jun 15 08:11:34 Boreas kernel: usbcore: registered new interface driver btusb
Jun 15 08:11:34 Boreas kernel: Code: 0f 1f 44 00 00 8b 15 f5 a1 6b 02 85 d2 74 09 48 81 ff ff 0f 00 00 77 01 c3 53 48 89 fe 48 c7 c7 50 4a 31 94 50 e8 1d 07 01 00 <0f> 0b 83 3d cd a1 6b 02 0a 0f 84 8f 00 00 00 48 8b 05 a0 f7 6>
Jun 15 08:11:34 Boreas kernel: RSP: 0018:ffffb68140637be8 EFLAGS: 00010082
Jun 15 08:11:34 Boreas kernel: Bluetooth: hci0: Bootloader revision 0.1 build 42 week 52 2015
Jun 15 08:11:34 Boreas kernel: Bluetooth: hci0: Device revision is 2
Jun 15 08:11:34 Boreas kernel: RAX: 0000000000000000 RBX: ffff975f990e8000 RCX: 0000000000000007
Jun 15 08:11:34 Boreas kernel: RDX: 00000000fffffff8 RSI: 0000000000000086 RDI: ffff975f9c55bdd0
Jun 15 08:11:34 Boreas kernel: RBP: ffffb68140637c38 R08: 0000000000000000 R09: 000000000000048d
Jun 15 08:11:34 Boreas kernel: R10: 0000000000aaaaaa R11: ffffffff940aab40 R12: ffffb68141ae0068
Jun 15 08:11:34 Boreas kernel: R13: 0000000000000003 R14: 0000000000000001 R15: 000000000000000b
Jun 15 08:11:34 Boreas kernel: FS:  0000000000000000(0000) GS:ffff975f9c540000(0000) knlGS:0000000000000000
Jun 15 08:11:34 Boreas kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 15 08:11:34 Boreas kernel: CR2: ffffb68141ae0068 CR3: 0000000856a18002 CR4: 00000000003606e0
Jun 15 08:11:34 Boreas kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun 15 08:11:34 Boreas kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jun 15 08:11:34 Boreas kernel: Call Trace:
Jun 15 08:11:34 Boreas kernel:  no_context+0xf4/0x1f0
Jun 15 08:11:34 Boreas kernel:  page_fault+0x3e/0x50
Jun 15 08:11:34 Boreas kernel: RIP: 0010:iwl_dbg_tlv_alloc_trigger+0x25/0x60 [iwlwifi]
Jun 15 08:11:34 Boreas kernel: Code: eb f2 0f 1f 00 0f 1f 44 00 00 83 7e 04 33 48 89 f8 44 8b 46 10 48 89 f7 76 40 41 8d 50 ff 83 fa 19 77 23 8b 56 20 85 d2 75 07 <c7> 46 20 ff ff ff ff 4b 8d 14 40 48 c1 e2 04 48 8d b4 10 00 0>
Jun 15 08:11:34 Boreas kernel: RSP: 0018:ffffb68140637ce8 EFLAGS: 00010246
Jun 15 08:11:34 Boreas kernel: RAX: ffff975f840b4018 RBX: ffff975f840b4018 RCX: ffffffffc11b56c0
Jun 15 08:11:34 Boreas kernel: RDX: 0000000000000000 RSI: ffffb68141ae0048 RDI: ffffb68141ae0048
Jun 15 08:11:34 Boreas kernel: RBP: 0000000000000000 R08: 0000000000000004 R09: 0000000000000001
Jun 15 08:11:34 Boreas kernel: R10: 0000000000000034 R11: ffffb68141ae0050 R12: ffff975f840b4230
Jun 15 08:11:34 Boreas kernel: R13: 0000000001000009 R14: ffff9759eced1000 R15: ffff975f83ad5000
Jun 15 08:11:34 Boreas kernel:  ? iwl_dbg_tlv_add+0x60/0x60 [iwlwifi]
Jun 15 08:11:34 Boreas kernel:  iwl_dbg_tlv_alloc+0x79/0x120 [iwlwifi]
Jun 15 08:11:34 Boreas kernel:  iwl_parse_tlv_firmware.isra.0+0x57d/0x1550 [iwlwifi]
Jun 15 08:11:34 Boreas kernel:  ? fw_add_devm_name.part.0+0x40/0x80
Jun 15 08:11:34 Boreas kernel:  iwl_req_fw_callback+0x3f8/0x6a0 [iwlwifi]
Jun 15 08:11:34 Boreas kernel:  ? devres_add+0x1e/0x60
Jun 15 08:11:34 Boreas kernel:  ? fw_add_devm_name.part.0+0x5c/0x80
Jun 15 08:11:34 Boreas kernel:  ? assign_fw+0x6c/0x140
Jun 15 08:11:34 Boreas kernel:  ? _request_firmware+0x131/0x190
Jun 15 08:11:34 Boreas kernel:  ? __switch_to_asm+0x34/0x70
Jun 15 08:11:34 Boreas kernel:  request_firmware_work_func+0x47/0x90
Jun 15 08:11:34 Boreas kernel:  process_one_work+0x1e3/0x3b0
Jun 15 08:11:34 Boreas kernel:  worker_thread+0x46/0x340
Jun 15 08:11:34 Boreas kernel:  ? process_one_work+0x3b0/0x3b0
Jun 15 08:11:34 Boreas kernel:  kthread+0x115/0x140
Jun 15 08:11:34 Boreas kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 15 08:11:34 Boreas kernel:  ret_from_fork+0x35/0x40
Jun 15 08:11:34 Boreas kernel: ---[ end trace 046de324b323b499 ]---
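
For reference, the faulting access can be cross-checked from the trace itself. The sketch below is only an illustration (the helper name is made up for this note, and it decodes just this one instruction encoding): it takes the bytes at the "<c7>" marker in the second Code: line, reads them as a 32-bit immediate store relative to RSI, and compares the resulting address with CR2.

```python
# Values copied from the oops above (register dump at the faulting RIP,
# iwl_dbg_tlv_alloc_trigger+0x25).
RSI = 0xffffb68141ae0048   # base pointer held in RSI
CR2 = 0xffffb68141ae0068   # faulting address recorded by the CPU

# Bytes starting at the "<c7>" marker in the second Code: line.
FAULTING_INSN = bytes.fromhex("c7 46 20 ff ff ff ff")


def decode_store_imm32_rsi_disp8(insn):
    """Decode `movl $imm32, disp8(%rsi)` (opcode C7 /0, ModRM 0x46).

    Hypothetical helper for this note; it handles only this encoding.
    """
    if len(insn) != 7 or insn[0] != 0xC7 or insn[1] != 0x46:
        raise ValueError("not a C7 46 disp8 imm32 encoding")
    disp = int.from_bytes(insn[2:3], "little", signed=True)
    imm = int.from_bytes(insn[3:7], "little")
    return disp, imm


disp, imm = decode_store_imm32_rsi_disp8(FAULTING_INSN)
target = RSI + disp
print(f"store of {imm:#x} to {target:#x}")
print("matches CR2:", target == CR2)
```

So the fault is a write of 0xffffffff at offset 0x20 from the pointer in RSI, at the same address that the "[Firmware Bug]" line reports.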

Detailed hardware information is available in the openSUSE Bugzilla entry:
  https://bugzilla.opensuse.org/show_bug.cgi?id=1172905

The hardware showing the problem seems to be the 9260 chip.

23: PCI 3b00.0: 0280 Network controller
  [Created at pci.386]
  Unique ID: c5zV._nJChamEFnA
  Parent ID: z8Q3.q4o8LX05eKA
  SysFS ID: /devices/pci0000:00/0000:00:1c.0/0000:3b:00.0
  SysFS BusID: 0000:3b:00.0
  Hardware Class: network
  Model: "Intel Wireless-AC 9260"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x2526 "Wireless-AC 9260"
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x4010 
  Revision: 0x29
  Memory Range: 0xed400000-0xed403fff (rw,non-prefetchable,disabled)
  IRQ: 255 (no events)
  Module Alias: "pci:v00008086d00002526sv00008086sd00004010bc02sc80i00"
  Driver Info #0:
    Driver Status: iwlwifi is not active
    Driver Activation Cmd: "modprobe iwlwifi"
  Config Status: cfg=new, avail=yes, need=no, active=unknown
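
As a side note, the Module Alias string above encodes the same IDs that the listing shows field by field. A minimal sketch for decoding it, assuming the usual PCI modalias layout (v/d/sv/sd as 8 hex digits, bc/sc/i as 2 hex digits); the function name is made up for this note:

```python
import re

# Module Alias string copied from the hwinfo output above.
MODALIAS = "pci:v00008086d00002526sv00008086sd00004010bc02sc80i00"

# Assumed layout: pci:v<vendor>d<device>sv<subvendor>sd<subdevice>bc<class>sc<subclass>i<prog-if>
PCI_MODALIAS = re.compile(
    r"pci:v(?P<vendor>[0-9A-Fa-f]{8})d(?P<device>[0-9A-Fa-f]{8})"
    r"sv(?P<subvendor>[0-9A-Fa-f]{8})sd(?P<subdevice>[0-9A-Fa-f]{8})"
    r"bc(?P<base_class>[0-9A-Fa-f]{2})sc(?P<sub_class>[0-9A-Fa-f]{2})"
    r"i(?P<prog_if>[0-9A-Fa-f]{2})"
)


def parse_pci_modalias(alias):
    """Split a PCI modalias string into integer ID fields."""
    m = PCI_MODALIAS.fullmatch(alias)
    if m is None:
        raise ValueError(f"not a PCI modalias: {alias!r}")
    return {name: int(value, 16) for name, value in m.groupdict().items()}


ids = parse_pci_modalias(MODALIAS)
for name, value in ids.items():
    print(f"{name}: {value:#06x}")
```

The decoded vendor/device pair (0x8086/0x2526), the subsystem IDs (0x8086/0x4010) and the class bytes (02, 80) agree with the Vendor, Device, SubVendor, SubDevice and "0280 Network controller" lines of the same entry.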

Another one:
56: PCI 2100.0: 0282 WLAN controller
  [Created at pci.386]
  Unique ID: S6TQ.42LULsJAqu6
  Parent ID: PMq2.8EPKPINKVS0
  SysFS ID: /devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:00.0/0000:21:00.0
  SysFS BusID: 0000:21:00.0
  Hardware Class: network
  Device Name: "Broadcom 5762"
  Model: "Intel Wireless-AC 9260"
  Vendor: pci 0x8086 "Intel Corporation"
  Device: pci 0x2526 "Wireless-AC 9260"
  SubVendor: pci 0x8086 "Intel Corporation"
  SubDevice: pci 0x0014 
  Revision: 0x29
  Driver: "iwlwifi"
  Driver Modules: "iwlwifi"

FWIW, the 5.6.y kernel worked, so this seems to be a regression in 5.7.x.

Comment 2 Takashi Iwai 2020-06-15 12:59:52 UTC
The likely fix:
  https://lore.kernel.org/netdev/20200612073800.27742-1-jslaby@suse.cz/

Comment 3 Takashi Iwai 2020-06-16 12:53:58 UTC
The patch in comment 2 was confirmed to fix the problem.

Interestingly, the problem first surfaced on 5.7 even though the problematic commit (a9248de42464 "iwlwifi: dbg_ini: add TLV allocation new API support") was already present in 5.5.