Bug 119761 - system hangs possibly due to brcmfmac regression
Summary: system hangs possibly due to brcmfmac regression
Status: RESOLVED DUPLICATE of bug 119451
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: Intel Linux
Importance: P1 normal
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-07 18:36 UTC by favonia
Modified: 2016-06-26 11:40 UTC
CC List: 3 users

See Also:
Kernel Version: 4.7-rc2
Subsystem:
Regression: No
Bisected commit-id:

Description favonia 2016-06-07 18:36:14 UTC
4.7-rc2 is very unstable on my Dell laptop, while 4.6 runs fine (with occasional minor errors). During my last test I saw two error messages before the system hung, and brcmfmac seems to be the cause. I am happy to provide more details if needed.

MESSAGE 1:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
IP: [<ffffffff814c3ed6>] enqueue_to_backlog+0x56/0x260
PGD 0 
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: xt_connmark iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntr
 i2c_i801 soundcore rtsx_pci_ms brcmutil memstick idma64 cfg80211 mei_me mei intel_lpss_pci processor_th
CPU: 1 PID: 433 Comm: irq/137-brcmf_p Tainted: G           O    4.7.0-rc2-mainline #1
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 01.02.00 04/07/2016
task: ffff8804a13edb80 ti: ffff88049af4c000 task.ti: ffff88049af4c000
RIP: 0010:[<ffffffff814c3ed6>]  [<ffffffff814c3ed6>] enqueue_to_backlog+0x56/0x260
RSP: 0018:ffff88049af4fca0  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8804bdc57c40 RCX: 000000000000007d
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8804bdc57d4c
RBP: ffff88049af4fce8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000729 R11: 000000000000094b R12: 0000000000017c40
R13: ffff88049af4fd08 R14: ffff8804a8ad0f00 R15: ffff8804bdc57d4c
FS:  0000000000000000(0000) GS:ffff8804bdc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000048 CR3: 0000000001806000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff88049af4fcb0 ffffffff814b34f1 ffff88049af4fcd8 0000000000000282
 0000000000000000 ffff8804a8ad0f00 ffff8804a8e86800 ffff8804a8ad0f00
 ffff880036c34000 ffff88049af4fd28 ffffffff814c412b ffffffff813120d3
Call Trace:
 [<ffffffff814b34f1>] ? skb_free_head+0x21/0x30
 [<ffffffff814c412b>] netif_rx_internal+0x4b/0x170
 [<ffffffff813120d3>] ? swiotlb_tbl_unmap_single+0xf3/0x120
 [<ffffffff814c5657>] netif_rx_ni+0x27/0xc0
 [<ffffffffa071d9e9>] brcmf_netif_rx+0x49/0x70 [brcmfmac]
 [<ffffffffa07224d4>] brcmf_msgbuf_process_rx+0x2b4/0x570 [brcmfmac]
 [<ffffffff81020071>] ? __xen_set_pgd_hyper+0xb1/0xd0
 [<ffffffff810d60b0>] ? irq_forced_thread_fn+0x70/0x70
 [<ffffffffa0723381>] brcmf_proto_msgbuf_rx_trigger+0x31/0xe0 [brcmfmac]
 [<ffffffffa072de8f>] brcmf_pcie_isr_thread+0x7f/0x110 [brcmfmac]
 [<ffffffff810d60d0>] irq_thread_fn+0x20/0x50
 [<ffffffff810d63ad>] irq_thread+0x12d/0x1c0
 [<ffffffff815d0a65>] ? __schedule+0x2f5/0x7a0
 [<ffffffff810d61d0>] ? wake_threads_waitq+0x30/0x30
 [<ffffffff810d6280>] ? irq_thread_dtor+0xb0/0xb0
 [<ffffffff81098ea8>] kthread+0xd8/0xf0
 [<ffffffff815d4e3f>] ret_from_fork+0x1f/0x40
 [<ffffffff81098dd0>] ? kthread_worker_fn+0x170/0x170
Code: 1c f5 60 9a 8e 81 9c 58 0f 1f 44 00 00 48 89 45 d0 fa 66 0f 1f 44 00 00 4c 8d bb 0c 01 00 00 4c 89
RIP  [<ffffffff814c3ed6>] enqueue_to_backlog+0x56/0x260
 RSP <ffff88049af4fca0>
CR2: 0000000000000048
---[ end trace bf3ebb7302f9750e ]---
note: irq/137-brcmf_p[433] exited with preempt_count 3
BUG: unable to handle kernel paging request at ffffffffffffffd8
IP: [<ffffffff810993f0>] kthread_data+0x10/0x20
PGD 1809067 PUD 180b067 PMD 0 
Oops: 0000 [#2] PREEMPT SMP

MESSAGE 2:

Modules linked in: xt_connmark iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntr
 i2c_i801 soundcore rtsx_pci_ms brcmutil memstick idma64 cfg80211 mei_me mei intel_lpss_pci processor_th
CPU: 1 PID: 433 Comm: irq/137-brcmf_p Tainted: G      D    O    4.7.0-rc2-mainline #1
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 01.02.00 04/07/2016
task: ffff8804a13edb80 ti: ffff88049af4c000 task.ti: ffff88049af4c000
RIP: 0010:[<ffffffff810993f0>]  [<ffffffff810993f0>] kthread_data+0x10/0x20
RSP: 0018:ffff88049af4f9a0  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8804a13edb80 RCX: 0000000000000000
RDX: ffff88049af4fe80 RSI: 0000000000000000 RDI: ffff8804a13edb80
RBP: ffff88049af4f9a0 R08: 0000000000000000 R09: 0000000000000005
R10: 00000000ffffffff R11: ffffffff81a7ab4d R12: 0000000000000000
R13: ffff8804a13ee268 R14: 0000000000000000 R15: ffff8804a13edb80
FS:  0000000000000000(0000) GS:ffff8804bdc40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd8 CR3: 0000000001806000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffff88049af4f9c0 ffffffff810d61f3 ffffffff81a76650 0000000000000000
 ffff88049af4f9f8 ffffffff81097114 ffff8804a13edb80 ffff88049af4fa28
 0000000000000000 0000000000000048 0000000000000046 ffff88049af4fa68
Call Trace:
 [<ffffffff810d61f3>] irq_thread_dtor+0x23/0xb0
 [<ffffffff81097114>] task_work_run+0x84/0xa0
 [<ffffffff8107dbf8>] do_exit+0x3b8/0xb60
 [<ffffffff8103175c>] oops_end+0x9c/0xd0
 [<ffffffff81065ca6>] no_context+0x176/0x3a0
 [<ffffffff81065fe3>] __bad_area_nosemaphore+0x113/0x210
 [<ffffffff810b0cbd>] ? enqueue_entity+0x1fd/0xdb0
 [<ffffffff810aa3cc>] ? __enqueue_entity+0x6c/0x70
 [<ffffffff810660f4>] bad_area_nosemaphore+0x14/0x20
 [<ffffffff810664c4>] __do_page_fault+0xd4/0x510
 [<ffffffff810b1913>] ? enqueue_task_fair+0xa3/0x940
 [<ffffffff81066922>] do_page_fault+0x22/0x30
 [<ffffffff815d6c38>] page_fault+0x28/0x30
 [<ffffffff814c3ed6>] ? enqueue_to_backlog+0x56/0x260
 [<ffffffff814b34f1>] ? skb_free_head+0x21/0x30
 [<ffffffff814c412b>] netif_rx_internal+0x4b/0x170
 [<ffffffff813120d3>] ? swiotlb_tbl_unmap_single+0xf3/0x120
 [<ffffffff814c5657>] netif_rx_ni+0x27/0xc0
 [<ffffffffa071d9e9>] brcmf_netif_rx+0x49/0x70 [brcmfmac]
 [<ffffffffa07224d4>] brcmf_msgbuf_process_rx+0x2b4/0x570 [brcmfmac]
 [<ffffffff81020071>] ? __xen_set_pgd_hyper+0xb1/0xd0
 [<ffffffff810d60b0>] ? irq_forced_thread_fn+0x70/0x70
 [<ffffffffa0723381>] brcmf_proto_msgbuf_rx_trigger+0x31/0xe0 [brcmfmac]
 [<ffffffffa072de8f>] brcmf_pcie_isr_thread+0x7f/0x110 [brcmfmac]
 [<ffffffff810d60d0>] irq_thread_fn+0x20/0x50
 [<ffffffff810d63ad>] irq_thread+0x12d/0x1c0
 [<ffffffff815d0a65>] ? __schedule+0x2f5/0x7a0
 [<ffffffff810d61d0>] ? wake_threads_waitq+0x30/0x30
 [<ffffffff810d6280>] ? irq_thread_dtor+0xb0/0xb0
 [<ffffffff81098ea8>] kthread+0xd8/0xf0
 [<ffffffff815d4e3f>] ret_from_fork+0x1f/0x40
 [<ffffffff81098dd0>] ? kthread_worker_fn+0x170/0x170
Code: 5c 5d c3 e8 4b 9c f6 ff e9 e2 fe ff ff 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48
RIP  [<ffffffff810993f0>] kthread_data+0x10/0x20
 RSP <ffff88049af4f9a0>
CR2: ffffffffffffffd8
---[ end trace bf3ebb7302f9750f ]---
Fixing recursive fault but reboot is needed!
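
One way to check which frames in the call traces above belong to a loadable module is to scan for the trailing `[module]` tag. The following is only a minimal sketch, assuming the frame layout seen in this report (`[<addr>] symbol+0xoff/0xsize [module]`, with `?` marking an unreliable frame); the names `FRAME_RE`, `parse_frames`, and `module_frames` are hypothetical helpers, not part of any kernel tooling.

```python
import re

# Matches frames such as:
#  [<ffffffffa071d9e9>] brcmf_netif_rx+0x49/0x70 [brcmfmac]
#  [<ffffffff814b34f1>] ? skb_free_head+0x21/0x30
# It also matches IP:/RIP: lines, so pass only the "Call Trace:" section.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(trace_text):
    """Return one dict per stack frame found in trace_text."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "module": m.group("module"),  # None for built-in kernel code
            "reliable": m.group("unreliable") is None,
        })
    return frames


def module_frames(frames, module):
    """Return the symbols of all frames attributed to the given module."""
    return [f["symbol"] for f in frames if f["module"] == module]


# A few frames copied from MESSAGE 1.
SAMPLE = """\
 [<ffffffff814b34f1>] ? skb_free_head+0x21/0x30
 [<ffffffff814c412b>] netif_rx_internal+0x4b/0x170
 [<ffffffff814c5657>] netif_rx_ni+0x27/0xc0
 [<ffffffffa071d9e9>] brcmf_netif_rx+0x49/0x70 [brcmfmac]
 [<ffffffffa07224d4>] brcmf_msgbuf_process_rx+0x2b4/0x570 [brcmfmac]
"""

if __name__ == "__main__":
    print(module_frames(parse_frames(SAMPLE), "brcmfmac"))
```

A module appearing in the trace only shows that its code was on the stack when the fault occurred, not that it caused the fault; here the faulting instruction itself is in `enqueue_to_backlog`, which is core networking code.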

Here's the output of lspci -v:

02:00.0 Network controller: Broadcom Corporation BCM43602 802.11ac Wireless LAN SoC (rev 01)
        Subsystem: Dell Device 0024
        Flags: bus master, fast devsel, latency 0, IRQ 137
        Memory at dd800000 (64-bit, non-prefetchable) [size=32K]
        Memory at dd400000 (64-bit, non-prefetchable) [size=4M]
        Capabilities: [48] Power Management version 3
        Capabilities: [58] MSI: Enable+ Count=1/16 Maskable- 64bit+
        Capabilities: [68] Vendor Specific Information: Len=44 <?>
        Capabilities: [ac] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [13c] Device Serial Number dc-23-a8-ff-ff-e1-44-1c
        Capabilities: [150] Power Budgeting <?>
        Capabilities: [160] Virtual Channel
        Capabilities: [1b0] Latency Tolerance Reporting
        Capabilities: [220] #15
        Capabilities: [240] L1 PM Substates
        Kernel driver in use: brcmfmac
        Kernel modules: brcmfmac
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-06-12 13:11:02 UTC
This report might be a dupe of Bug 119451, which should be fixed as of https://git.kernel.org/torvalds/c/31143e2933d1675c4c1ba6ce125cdd95870edd85 (merged two days ago). Please check whether the bug still shows up in rc3 (due later today); if it does, please consider reproducing the problem without loading external drivers (the tainted flag indicates an out-of-tree module was loaded).
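
For reference, the taint letters on the `CPU: ... Tainted:` line of an oops can be decoded mechanically. The following is a rough sketch that covers only a handful of well-known flags; `TAINT_FLAGS` and `decode_taint` are hypothetical names, and the table is deliberately incomplete.

```python
import re

# Subset of kernel taint letters; not an exhaustive list.
TAINT_FLAGS = {
    "P": "proprietary module was loaded",
    "G": "no proprietary module loaded (GPL-compatible modules only)",
    "F": "module was force-loaded",
    "D": "kernel has died recently (an oops or BUG occurred)",
    "W": "a kernel warning was issued",
    "C": "staging driver was loaded",
    "O": "out-of-tree (externally built) module was loaded",
    "E": "unsigned module was loaded",
}


def decode_taint(line):
    """Return (flag, meaning) pairs for the Tainted: field of an oops line."""
    _, sep, rest = line.partition("Tainted:")
    if not sep:
        return []  # e.g. "Not tainted"
    flags = []
    for token in rest.split():
        # Flag letters are uppercase; stop at the kernel version string.
        if not re.fullmatch(r"[A-Z]+", token):
            break
        flags.extend(token)
    return [(f, TAINT_FLAGS.get(f, "not covered by this sketch")) for f in flags]


if __name__ == "__main__":
    line = ("CPU: 1 PID: 433 Comm: irq/137-brcmf_p "
            "Tainted: G           O    4.7.0-rc2-mainline #1")
    for flag, meaning in decode_taint(line):
        print(flag, "-", meaning)
```

In the first oops of this report the field reads `G O`; the second adds `D` because the kernel had already oopsed once.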
Comment 2 favonia 2016-06-25 01:37:09 UTC
Sorry, I did not have time to test it until now. I am running 4.7-rc4 and it looks stable on my machine except for some screen flickering. (There are other tickets for the flickering problem.) I think this bug was fixed as you described, and please feel free to mark it as resolved.
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-06-26 10:34:06 UTC
(In reply to favonia from comment #2)
> I think this bug was fixed as you described, 
thx for the feedback. 

> and please feel free to mark it as resolved.
I currently lack the permissions to do so; could you do that yourself, please? tia!
Comment 4 favonia 2016-06-26 11:40:28 UTC
Possibly a duplicate of Bug 119451.

*** This bug has been marked as a duplicate of bug 119451 ***
