Bug 200855 - iwlwifi: Oops: null pointer dereference in iwl_trans_pcie_txq_enable
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Intel Linux Wireless
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-19 01:25 UTC by phil
Modified: 2018-10-24 17:14 UTC

See Also:
Kernel Version: 4.14.52
Tree: Mainline
Regression: No


Attachments
dmesg (26.16 KB, text/plain)
2018-08-19 01:25 UTC, phil
iwlwifi.ko (427.92 KB, application/octet-stream)
2018-08-19 01:26 UTC, phil

Description phil 2018-08-19 01:25:33 UTC
Created attachment 277929
dmesg

I see this oops hit every couple of days on my Intel 8265 (in master mode) running a vanilla 4.14.52 kernel (Alpine Linux 3.8); dmesg attached.

Some searching turned up a very similar oops that someone had posted on pastebin, as well as on github (https://gist.github.com/aplund/7ba82370be0388abfa1974d13102ae9a), but I was unable to find a matching issue in the issue tracker.
Comment 1 phil 2018-08-19 01:26:29 UTC
Created attachment 277931
iwlwifi.ko
Comment 2 Emmanuel Grumbach 2018-08-19 18:03:47 UTC
So we fail here (last line):
0000000000008e7d <iwl_trans_pcie_txq_enable>:
    8e7d:       e8 00 00 00 00          callq  8e82 <iwl_trans_pcie_txq_enable+0x5>
    8e82:       41 57                   push   %r15
    8e84:       41 56                   push   %r14
    8e86:       49 89 fe                mov    %rdi,%r14
    8e89:       41 55                   push   %r13
    8e8b:       41 54                   push   %r12
    8e8d:       49 89 cd                mov    %rcx,%r13
    8e90:       55                      push   %rbp
    8e91:       53                      push   %rbx
    8e92:       41 89 d4                mov    %edx,%r12d
    8e95:       89 f3                   mov    %esi,%ebx
    8e97:       48 83 ec 20             sub    $0x20,%rsp
    8e9b:       65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
    8ea2:       00 00
    8ea4:       48 89 44 24 18          mov    %rax,0x18(%rsp)
    8ea9:       31 c0                   xor    %eax,%eax
    8eab:       48 63 c6                movslq %esi,%rax
    8eae:       66 89 54 24 02          mov    %dx,0x2(%rsp)
    8eb3:       4c 8b bc c7 08 7e 00    mov    0x7e08(%rdi,%rax,8),%r15
    8eba:       00
    8ebb:       f0 48 0f ab 87 08 8e    lock bts %rax,0x8e08(%rdi)
    8ec2:       00 00
    8ec4:       73 28                   jae    8eee <iwl_trans_pcie_txq_enable+0x71>
    8ec6:       80 3d 00 00 00 00 00    cmpb   $0x0,0x0(%rip)        # 8ecd <iwl_trans_pcie_txq_enable+0x50>
    8ecd:       75 1f                   jne    8eee <iwl_trans_pcie_txq_enable+0x71>
    8ecf:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
    8ed6:       44 89 44 24 04          mov    %r8d,0x4(%rsp)
    8edb:       c6 05 00 00 00 00 01    movb   $0x1,0x0(%rip)        # 8ee2 <iwl_trans_pcie_txq_enable+0x65>
    8ee2:       e8 00 00 00 00          callq  8ee7 <iwl_trans_pcie_txq_enable+0x6a>
    8ee7:       0f 0b                   ud2
    8ee9:       44 8b 44 24 04          mov    0x4(%rsp),%r8d
    8eee:       44 89 c7                mov    %r8d,%edi
    8ef1:       e8 00 00 00 00          callq  8ef6 <iwl_trans_pcie_txq_enable+0x79>
    8ef6:       4d 85 ed                test   %r13,%r13
    8ef9:       49 89 47 70             mov    %rax,0x70(%r15)


Clearly, r15 is 0. r15 is loaded by mov    0x7e08(%rdi,%rax,8),%r15, which tells me that r15 must be the pointer to the txq. rdi is the first parameter to the function (trans), and rax holds txq_id: the second parameter arrives in esi per the calling convention, and movslq %esi,%rax sign-extends it into rax just before the load.
The txq assignment is: struct iwl_txq *txq = trans_pcie->txq[txq_id];
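
To make the address arithmetic concrete, here is a minimal sketch of how an indexed load like 0x7e08(%rdi,%rax,8) maps onto an array-of-pointers access in C. The structure and names below (model_trans, model_txq, MODEL_TXQ_COUNT) are hypothetical stand-ins, not the real iwlwifi definitions; only the displacement 0x7e08 comes from the disassembly, and the element count of 512 is merely inferred from the 0x1000-byte gap to the bitmap at 0x8e08.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real iwlwifi structures. Only the
 * displacement 0x7e08 is taken from the disassembly. */
#define MODEL_TXQ_ARRAY_OFFSET 0x7e08
#define MODEL_TXQ_COUNT 512 /* assumption: 0x1000-byte gap / 8-byte pointers */

struct model_txq; /* opaque: the layout does not matter for this sketch */

struct model_trans {
    unsigned char pad[MODEL_TXQ_ARRAY_OFFSET]; /* everything before the array */
    struct model_txq *txq[MODEL_TXQ_COUNT];    /* one pointer per queue */
};

/* Effective address of an x86-64 operand written as disp(base,index,scale). */
static uintptr_t effective_address(uintptr_t base, uint64_t index,
                                   unsigned scale, uint32_t disp)
{
    return base + (uintptr_t)(index * scale) + disp;
}

/* Address of the slot that the C expression trans->txq[txq_id] reads from. */
static uintptr_t model_txq_slot(const struct model_trans *trans, int txq_id)
{
    assert(txq_id >= 0 && txq_id < MODEL_TXQ_COUNT);
    return (uintptr_t)&trans->txq[txq_id];
}
```

For any in-range index, model_txq_slot(&t, i) equals effective_address((uintptr_t)&t, i, sizeof(void *), 0x7e08), which is the computation the mov performs with rdi as base and rax as index. The value loaded from that slot is what ends up in r15.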


Bottom line, txq is NULL...
Note that we tried (and failed) to open AMPDU a bit before the crash and this is clearly not a classic scenario.
I really don't see how trans_pcie->txq[txq_id] could be NULL... If only we knew what was the value of txq_id...
Can you load iwlwifi with debug=0x80000000?
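
For illustration only, the kind of check that would expose the missing txq_id can be sketched in plain userspace C. This is not a proposed patch and not driver code: model_txq, its write_ptr field, model_txq_enable and MODEL_TXQ_COUNT are all hypothetical names. The point is simply that testing the slot before the first store turns a NULL dereference into a message that names the queue index.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_TXQ_COUNT 512 /* assumption, not taken from the driver source */

/* Hypothetical queue object; write_ptr is an illustrative field only. */
struct model_txq {
    int write_ptr;
};

/* Returns 0 on success, -1 if the index is out of range or the slot is NULL.
 * On failure the offending txq_id is reported instead of crashing. */
static int model_txq_enable(struct model_txq *table[], int txq_id, int ssn)
{
    struct model_txq *txq;

    if (txq_id < 0 || txq_id >= MODEL_TXQ_COUNT) {
        fprintf(stderr, "txq_id %d is out of range\n", txq_id);
        return -1;
    }

    txq = table[txq_id];
    if (txq == NULL) {
        fprintf(stderr, "txq[%d] is NULL, refusing to enable\n", txq_id);
        return -1;
    }

    txq->write_ptr = ssn; /* first store through the pointer */
    return 0;
}
```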
Comment 3 Luca Coelho 2018-10-12 09:47:23 UTC
Ping? Is this still reproducible?
Comment 4 Emmanuel Grumbach 2018-10-24 17:14:58 UTC
Please re-open if you have the data we asked for.
