Bug 202397 - iwlwifi: 8265: stuck queue followed by kernel NULL pointer dereference
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: Intel Linux
Importance: P1 normal
Assignee: Intel Linux Wireless
Reported: 2019-01-24 11:28 UTC by Joost de Greef
Modified: 2019-02-25 21:05 UTC
Kernel Version: 4.20.3
Tree: Mainline
Regression: No


Description Joost de Greef 2019-01-24 11:28:56 UTC
Hello iwlwifi team,

I'm using an Intel Corporation Wireless 8265 in an HP MicroServer Gen10 to provide Wi-Fi in my home. Unfortunately, it is not very stable: every couple of days the iwlwifi driver crashes and takes all network connectivity with it.
First there is a message about a stuck queue, then a NULL pointer dereference.
I'm using kernel 4.20.3 on Ubuntu 18.04, but I have been seeing this for some time on older kernels too.
Is there any other information you would like me to provide, or some other way I can help?

I'm using the latest firmware from linux-firmware:
commit 89d37c65a413852a0688b9aee6f74dbc53cf8cb5
Author: Amit K Bag <amit.k.bag@intel.com>
Date:   Fri Dec 21 16:29:00 2018 +0530
    linux-firmware: Update firmware file for Intel Bluetooth,8265
    This patch updates the firmware file for Intel Bluetooth 8265
    Also it is known as Intel WindStormPeak (WsP).
    FW Build: REL0295
    Release Version: 20.100.0.3
    Signed-off-by: Amit K Bag <amit.k.bag@intel.com>
    Signed-off-by: Josh Boyer <jwboyer@kernel.org>


lspci -v
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
        Subsystem: Intel Corporation Dual Band Wireless-AC 8265
        Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, IRQ 46
        Memory at fe900000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [40] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number f8-59-71-ff-ff-ec-2b-3d
        Capabilities: [14c] Latency Tolerance Reporting
        Capabilities: [154] L1 PM Substates
        Kernel driver in use: iwlwifi
        Kernel modules: iwlwifi

syslog stuck queue message:
Jan 22 22:27:42 box kernel: [459738.479356] iwlwifi 0000:02:00.0: Queue 21 is active on fifo 0 and stuck for 10000 ms. SW [237, 240] HW [237, 240] FH TRB=0x0c00150ef
Jan 22 22:27:42 box kernel: [459738.480417] iwlwifi 0000:02:00.0: Queue 18 is active on fifo 1 and stuck for 10000 ms. SW [238, 33] HW [238, 33] FH TRB=0x0c01120f7
Jan 22 22:27:42 box kernel: [459738.481662] iwlwifi 0000:02:00.0: Microcode SW error detected.  Restarting 0x2000000.
Jan 22 22:27:42 box kernel: [459738.482971] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
Jan 22 22:27:42 box kernel: [459738.483628] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 6
Jan 22 22:27:42 box kernel: [459738.484293] iwlwifi 0000:02:00.0: Loaded firmware version: 36.e91976c0.0
Jan 22 22:27:42 box kernel: [459738.484962] iwlwifi 0000:02:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN      
Jan 22 22:27:42 box kernel: [459738.485643] iwlwifi 0000:02:00.0: 0x008002F3 | trm_hw_status0
Jan 22 22:27:42 box kernel: [459738.486322] iwlwifi 0000:02:00.0: 0x00000000 | trm_hw_status1
Jan 22 22:27:42 box kernel: [459738.486993] iwlwifi 0000:02:00.0: 0x000248DC | branchlink2
Jan 22 22:27:42 box kernel: [459738.487670] iwlwifi 0000:02:00.0: 0x0003A7DA | interruptlink1
Jan 22 22:27:42 box kernel: [459738.488355] iwlwifi 0000:02:00.0: 0x00004DE2 | interruptlink2
Jan 22 22:27:42 box kernel: [459738.489013] iwlwifi 0000:02:00.0: 0x00000000 | data1
Jan 22 22:27:42 box kernel: [459738.489660] iwlwifi 0000:02:00.0: 0x00000080 | data2
Jan 22 22:27:42 box kernel: [459738.490295] iwlwifi 0000:02:00.0: 0x07830000 | data3
Jan 22 22:27:42 box kernel: [459738.490932] iwlwifi 0000:02:00.0: 0x00000000 | beacon time
Jan 22 22:27:42 box kernel: [459738.491581] iwlwifi 0000:02:00.0: 0x97E046FC | tsf low
Jan 22 22:27:42 box kernel: [459738.492244] iwlwifi 0000:02:00.0: 0x00000046 | tsf hi
Jan 22 22:27:42 box kernel: [459738.492885] iwlwifi 0000:02:00.0: 0x00005AA8 | time gp1
Jan 22 22:27:42 box kernel: [459738.493562] iwlwifi 0000:02:00.0: 0x97E046FD | time gp2
Jan 22 22:27:42 box kernel: [459738.494203] iwlwifi 0000:02:00.0: 0x00000001 | uCode revision type
Jan 22 22:27:42 box kernel: [459738.494859] iwlwifi 0000:02:00.0: 0x00000024 | uCode version major
Jan 22 22:27:42 box kernel: [459738.495524] iwlwifi 0000:02:00.0: 0xE91976C0 | uCode version minor
Jan 22 22:27:42 box kernel: [459738.496191] iwlwifi 0000:02:00.0: 0x00000230 | hw version
Jan 22 22:27:42 box kernel: [459738.496850] iwlwifi 0000:02:00.0: 0x00C89000 | board version
Jan 22 22:27:42 box kernel: [459738.497509] iwlwifi 0000:02:00.0: 0x12EE001C | hcmd
Jan 22 22:27:42 box kernel: [459738.498164] iwlwifi 0000:02:00.0: 0x00023003 | isr0
Jan 22 22:27:42 box kernel: [459738.498800] iwlwifi 0000:02:00.0: 0x0085E000 | isr1
Jan 22 22:27:42 box kernel: [459738.499408] iwlwifi 0000:02:00.0: 0x08001812 | isr2
Jan 22 22:27:42 box kernel: [459738.499990] iwlwifi 0000:02:00.0: 0x0041F9C0 | isr3
Jan 22 22:27:42 box kernel: [459738.500517] iwlwifi 0000:02:00.0: 0x00000000 | isr4
Jan 22 22:27:42 box kernel: [459738.501010] iwlwifi 0000:02:00.0: 0x07A0001C | last cmd Id
Jan 22 22:27:42 box kernel: [459738.501474] iwlwifi 0000:02:00.0: 0x00000000 | wait_event
Jan 22 22:27:42 box kernel: [459738.501913] iwlwifi 0000:02:00.0: 0x0000DCD0 | l2p_control
Jan 22 22:27:42 box kernel: [459738.502347] iwlwifi 0000:02:00.0: 0x00005C02 | l2p_duration
Jan 22 22:27:42 box kernel: [459738.502766] iwlwifi 0000:02:00.0: 0x0000013F | l2p_mhvalid
Jan 22 22:27:42 box kernel: [459738.503180] iwlwifi 0000:02:00.0: 0x00000000 | l2p_addr_match
Jan 22 22:27:42 box kernel: [459738.503601] iwlwifi 0000:02:00.0: 0x0000001F | lmpm_pmg_sel
Jan 22 22:27:42 box kernel: [459738.504007] iwlwifi 0000:02:00.0: 0x28031619 | timestamp
Jan 22 22:27:42 box kernel: [459738.504409] iwlwifi 0000:02:00.0: 0x00343848 | flow_handler
Jan 22 22:27:42 box kernel: [459738.504879] iwlwifi 0000:02:00.0: Start IWL Error Log Dump:
Jan 22 22:27:42 box kernel: [459738.505292] iwlwifi 0000:02:00.0: Status: 0x00000100, count: 7
Jan 22 22:27:42 box kernel: [459738.505710] iwlwifi 0000:02:00.0: 0x00000070 | ADVANCED_SYSASSERT
Jan 22 22:27:42 box kernel: [459738.506143] iwlwifi 0000:02:00.0: 0x00000000 | umac branchlink1
Jan 22 22:27:42 box kernel: [459738.506583] iwlwifi 0000:02:00.0: 0xC008689C | umac branchlink2
Jan 22 22:27:42 box kernel: [459738.507019] iwlwifi 0000:02:00.0: 0xC0083A94 | umac interruptlink1
Jan 22 22:27:42 box kernel: [459738.507467] iwlwifi 0000:02:00.0: 0xC0083A94 | umac interruptlink2
Jan 22 22:27:42 box kernel: [459738.507919] iwlwifi 0000:02:00.0: 0x00000800 | umac data1
Jan 22 22:27:42 box kernel: [459738.508350] iwlwifi 0000:02:00.0: 0xC0083A94 | umac data2
Jan 22 22:27:42 box kernel: [459738.508780] iwlwifi 0000:02:00.0: 0xDEADBEEF | umac data3
Jan 22 22:27:42 box kernel: [459738.509198] iwlwifi 0000:02:00.0: 0x00000024 | umac major
Jan 22 22:27:42 box kernel: [459738.509618] iwlwifi 0000:02:00.0: 0xE91976C0 | umac minor
Jan 22 22:27:42 box kernel: [459738.510036] iwlwifi 0000:02:00.0: 0xC088628C | frame pointer
Jan 22 22:27:42 box kernel: [459738.510453] iwlwifi 0000:02:00.0: 0xC088628C | stack pointer
Jan 22 22:27:42 box kernel: [459738.510860] iwlwifi 0000:02:00.0: 0x00BB014E | last host cmd
Jan 22 22:27:42 box kernel: [459738.511274] iwlwifi 0000:02:00.0: 0x00000000 | isr status reg
Jan 22 22:27:42 box kernel: [459738.511708] ieee80211 phy0: Hardware restart was requested
Jan 22 22:27:43 box kernel: [459738.978959] iwlwifi 0000:02:00.0: Failing on timeout while stopping DMA channel 8 [0x07fd0003]
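
The stuck-queue lines above follow a fixed pattern, so they can be pulled out of a long syslog mechanically. Below is a minimal Python sketch, assuming the message format stays exactly as it appears in this log. The function name `parse_stuck_queues` and the dictionary keys are made up for illustration, and the two bracketed pairs are kept as raw tuples because the log line itself does not say what they mean.

```python
import re

# Pattern copied from the two "stuck" lines in the log above.
STUCK_RE = re.compile(
    r"Queue (?P<queue>\d+) is active on fifo (?P<fifo>\d+) "
    r"and stuck for (?P<ms>\d+) ms\. "
    r"SW \[(?P<sw0>\d+), (?P<sw1>\d+)\] "
    r"HW \[(?P<hw0>\d+), (?P<hw1>\d+)\]"
)


def parse_stuck_queues(lines):
    """Return one dict per stuck-queue line; all other lines are ignored."""
    found = []
    for line in lines:
        m = STUCK_RE.search(line)
        if m is None:
            continue
        g = {k: int(v) for k, v in m.groupdict().items()}
        found.append({
            "queue": g["queue"],
            "fifo": g["fifo"],
            "stuck_ms": g["ms"],
            "sw": (g["sw0"], g["sw1"]),  # raw SW [..] pair, uninterpreted
            "hw": (g["hw0"], g["hw1"]),  # raw HW [..] pair, uninterpreted
        })
    return found


# Sample input: three lines taken from the log above (syslog prefix dropped).
sample = [
    "iwlwifi 0000:02:00.0: Queue 21 is active on fifo 0 and stuck for 10000 ms. "
    "SW [237, 240] HW [237, 240] FH TRB=0x0c00150ef",
    "iwlwifi 0000:02:00.0: Queue 18 is active on fifo 1 and stuck for 10000 ms. "
    "SW [238, 33] HW [238, 33] FH TRB=0x0c01120f7",
    "iwlwifi 0000:02:00.0: Microcode SW error detected.  Restarting 0x2000000.",
]

for entry in parse_stuck_queues(sample):
    print(entry)
```

Run against a full syslog, this would list each stuck queue with its fifo and timeout, which makes it easier to see whether the same queues are involved every time.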

syslog kernel oops (NULL pointer dereference):
Jan 22 23:22:45 box kernel: [463040.160243] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
Jan 22 23:22:45 box kernel: [463040.161299] PGD 0 P4D 0
Jan 22 23:22:45 box kernel: [463040.161836] Oops: 0002 [#1] SMP NOPTI
Jan 22 23:22:45 box kernel: [463040.162386] CPU: 0 PID: 28637 Comm: kworker/0:2 Tainted: G           O      4.20.3-box #25
Jan 22 23:22:45 box kernel: [463040.163534] Hardware name: HPE ProLiant MicroServer Gen10/ProLiant MicroServer Gen10, BIOS 5.12 01/24/2018
Jan 22 23:22:45 box kernel: [463040.164839] Workqueue: events iwl_mvm_add_new_dqa_stream_wk [iwlmvm]
Jan 22 23:22:45 box kernel: [463040.165551] RIP: 0010:iwl_trans_pcie_txq_enable+0x59/0x3f0 [iwlwifi]
Jan 22 23:22:45 box kernel: [463040.166306] Code: 8b a4 c7 60 91 00 00 f0 48 0f ab 87 60 a1 00 00 73 0d 80 3d 0c 50 02 00 00 0f 84 4f 03 00 00 44 89 c7 e8 1a b1 15 f5 4d 85 ed <49> 89 44 24 68 0f 84 94 02 00 00 41 0f b6 87 42 a2 00 00 39 d8 0f
Jan 22 23:22:45 box kernel: [463040.168591] RSP: 0018:ffffa0140c417cc0 EFLAGS: 00010246
Jan 22 23:22:45 box kernel: [463040.169386] RAX: 00000000000009c4 RBX: 000000000000001f RCX: 0000000000000000
Jan 22 23:22:45 box kernel: [463040.171030] RDX: 0000000000000000 RSI: 000000000000001f RDI: 0000000000002710
Jan 22 23:22:45 box kernel: [463040.172816] RBP: 0000000000000000 R08: 0000000000002710 R09: 0000000000000001
Jan 22 23:22:45 box kernel: [463040.174741] R10: 0000000000000040 R11: 0000000000000007 R12: 0000000000000000
Jan 22 23:22:45 box kernel: [463040.176812] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9a9970e50018
Jan 22 23:22:45 box kernel: [463040.179016] FS:  0000000000000000(0000) GS:ffff9a9977200000(0000) knlGS:0000000000000000
Jan 22 23:22:45 box kernel: [463040.181361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 23:22:45 box kernel: [463040.182584] CR2: 0000000000000068 CR3: 00000001f5440000 CR4: 00000000001406f0
Jan 22 23:22:45 box kernel: [463040.185008] Call Trace:
Jan 22 23:22:45 box kernel: [463040.186200]  ? iwl_mvm_inactivity_check+0x538/0xa70 [iwlmvm]
Jan 22 23:22:45 box kernel: [463040.187389]  iwl_mvm_enable_txq+0x1ac/0x380 [iwlmvm]
Jan 22 23:22:45 box kernel: [463040.188585]  ? load_balance+0x14d/0x8a0
Jan 22 23:22:45 box kernel: [463040.189785]  iwl_mvm_add_new_dqa_stream_wk+0x6f6/0xd40 [iwlmvm]
Jan 22 23:22:45 box kernel: [463040.190994]  ? pick_next_task_fair+0x2ee/0x5b0
Jan 22 23:22:45 box kernel: [463040.192188]  ? __switch_to+0x182/0x350
Jan 22 23:22:45 box kernel: [463040.193355]  process_one_work+0x1b0/0x340
Jan 22 23:22:45 box kernel: [463040.194495]  worker_thread+0x28/0x3f0
Jan 22 23:22:45 box kernel: [463040.195610]  ? process_one_work+0x340/0x340
Jan 22 23:22:45 box kernel: [463040.196716]  kthread+0x107/0x120
Jan 22 23:22:45 box kernel: [463040.197795]  ? kthread_park+0x80/0x80
Jan 22 23:22:45 box kernel: [463040.198853]  ret_from_fork+0x1f/0x30
Jan 22 23:22:45 box kernel: [463040.199886] Modules linked in: softdog bridge stp llc amd64_edac_mod edac_mce_amd crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nls_iso8859_1 vfat fat aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 88XXau(O) k10temp fam15h_power sch_fq_codel iwlmvm iwlwifi ip_tables x_tables sha1_ssse3 sha1_generic ipv6 autofs4 i2c_piix4 e1000e amdgpu tg3 ahci libahci chash gpu_sched ttm
Jan 22 23:22:45 box kernel: [463040.206227] CR2: 0000000000000068
Jan 22 23:22:45 box kernel: [463040.207245] ---[ end trace 23241437dae69698 ]---
Jan 22 23:22:45 box kernel: [463041.395548] RIP: 0010:iwl_trans_pcie_txq_enable+0x59/0x3f0 [iwlwifi]
Jan 22 23:22:45 box kernel: [463041.396598] Code: 8b a4 c7 60 91 00 00 f0 48 0f ab 87 60 a1 00 00 73 0d 80 3d 0c 50 02 00 00 0f 84 4f 03 00 00 44 89 c7 e8 1a b1 15 f5 4d 85 ed <49> 89 44 24 68 0f 84 94 02 00 00 41 0f b6 87 42 a2 00 00 39 d8 0f
Jan 22 23:22:45 box kernel: [463041.399568] RSP: 0018:ffffa0140c417cc0 EFLAGS: 00010246
Jan 22 23:22:45 box kernel: [463041.400539] RAX: 00000000000009c4 RBX: 000000000000001f RCX: 0000000000000000
Jan 22 23:22:45 box kernel: [463041.402465] RDX: 0000000000000000 RSI: 000000000000001f RDI: 0000000000002710
Jan 22 23:22:45 box kernel: [463041.404431] RBP: 0000000000000000 R08: 0000000000002710 R09: 0000000000000001
Jan 22 23:22:45 box kernel: [463041.406473] R10: 0000000000000040 R11: 0000000000000007 R12: 0000000000000000
Jan 22 23:22:45 box kernel: [463041.408588] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9a9970e50018
Jan 22 23:22:45 box kernel: [463041.410766] FS:  0000000000000000(0000) GS:ffff9a9977200000(0000) knlGS:0000000000000000
Jan 22 23:22:45 box kernel: [463041.413051] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 22 23:22:45 box kernel: [463041.414234] CR2: 0000000000000068 CR3: 00000001f5440000 CR4: 00000000001406f0
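
The Code: line and the register dump above are enough to see what the faulting instruction did, even without debug data. Below is a minimal Python sketch, assuming only the standard x86-64 encoding of `MOV r/m64, r64` (opcode 0x89). The helper names are invented for illustration, and the decoder handles just this one instruction form, not general x86. It says nothing about which source line is involved.

```python
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def faulting_bytes(code_line):
    """Return the bytes from the <xx> marker onward in an oops Code: line."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


def decode_mov_store_disp8(b):
    """Decode REX.W + 89 /r with an 8-bit displacement (mod=01) only.

    Returns (source_register, base_register, displacement).
    """
    rex, opcode, modrm = b[0], b[1], b[2]
    if (rex & 0xF8) != 0x48 or opcode != 0x89:
        raise ValueError("not a REX.W MOV r/m64, r64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1:
        raise ValueError("only the disp8 form is handled")
    rex_r, rex_x, rex_b = (rex >> 2) & 1, (rex >> 1) & 1, rex & 1
    src = REG64[reg | (rex_r << 3)]
    if rm == 4:  # a SIB byte follows the ModRM byte
        sib = b[3]
        index = ((sib >> 3) & 7) | (rex_x << 3)
        if index != 4:  # 4 means "no index register"
            raise ValueError("indexed addressing not handled")
        base = REG64[(sib & 7) | (rex_b << 3)]
        disp = b[4]
    else:
        base = REG64[rm | (rex_b << 3)]
        disp = b[3]
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return src, base, disp


# Tail of the Code: line from the oops above.
CODE = "e8 1a b1 15 f5 4d 85 ed <49> 89 44 24 68 0f 84 94 02 00 00"
# Register values from the oops that matter here.
REGS = {"rax": 0x9C4, "r12": 0x0}

src, base, disp = decode_mov_store_disp8(faulting_bytes(CODE))
fault_addr = REGS[base] + disp
print(f"mov %{src}, {disp:#x}(%{base})  -> store to {fault_addr:#x}")
```

With R12 = 0 from the dump, the store targets address 0x68, which matches CR2. The error code 0002 in the Oops line likewise indicates a write to a non-present page.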
Comment 1 Emmanuel Grumbach 2019-01-24 12:27:29 UTC
Can you please compress and attach iwlwifi.ko?

I hope it was compiled with debug data so that we can check where you got the NULL pointer dereference.

Thanks.
Comment 2 Joost de Greef 2019-01-24 13:12:38 UTC
Sorry, I do not have debug enabled.
I'll build a debug version and report back if the issue occurs again.

Is this enough to produce something debuggable?
CONFIG_DEBUG_INFO=y
CONFIG_GDB_SCRIPTS=y
CONFIG_FRAME_WARN=1024
CONFIG_READABLE_ASM=y
CONFIG_DEBUG_FS=y
CONFIG_FRAME_POINTER=y
CONFIG_STACK_VALIDATION=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x01b6
CONFIG_DEBUG_KERNEL=y

Thank you for your time.
Comment 3 Emmanuel Grumbach 2019-01-24 13:35:49 UTC
yes, that should be good.
Comment 4 Emmanuel Grumbach 2019-02-04 04:30:23 UTC
Ping ?
Comment 5 Joost de Greef 2019-02-04 09:04:17 UTC
(In reply to Emmanuel Grumbach from comment #4)
> Ping ?

No crash with the debug version yet. Sorry :-)
Comment 6 Emmanuel Grumbach 2019-02-25 21:05:15 UTC
Let's close this and re-open if needed.
