Bug 201933 - r8169 hangs with Oops/null pointer deref on shutdown
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Stephen Hemminger
 
Reported: 2018-12-08 18:20 UTC by Andy Furniss
Modified: 2018-12-27 19:34 UTC

Kernel Version: 4.20
Regression: Yes


Attachments
dmesg from 4.19 which works OK (58.82 KB, text/plain)
2018-12-08 18:47 UTC, Andy Furniss
dmesg on linus master showing issue triggered by ip li set down (63.79 KB, text/plain)
2018-12-09 09:58 UTC, Andy Furniss

Description Andy Furniss 2018-12-08 18:20:01 UTC
On 4.20-rc kernels (4.20.0-rc5 in the log below), r8169 hangs on halt/reboot. Bringing the interface down with ip li set down seems to trigger the same error.

Dec  8 16:14:56 ph4 kernel: r8169 0000:06:00.0 enp6s0: Link is Down
Dec  8 16:14:57 ph4 kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000003a0
Dec  8 16:14:57 ph4 kernel: PGD 22acfc067 P4D 22acfc067 PUD 232a33067 PMD 0 
Dec  8 16:14:57 ph4 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Dec  8 16:14:57 ph4 kernel: CPU: 2 PID: 22126 Comm: ip Not tainted 4.20.0-rc5 #1
Dec  8 16:14:57 ph4 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./970 Extreme3 R2.0, BIOS P1.60 06/05/2014
Dec  8 16:14:57 ph4 kernel: RIP: 0010:queue_work_on+0x4/0x20
Dec  8 16:14:57 ph4 kernel: Code: ff 48 c7 c7 20 55 de 81 e8 02 58 04 00 c6 05 aa 1c 3e 01 01 e9 75 ff ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 53 9c 5b fa <f0> 48 0f ba 2a 00 73 06 31 c0 53 9d 5b c3 e8 99 fb ff ff b8 01 00
Dec  8 16:14:57 ph4 kernel: RSP: 0018:ffffc90000d473a8 EFLAGS: 00010002
Dec  8 16:14:57 ph4 kernel: RAX: ffff8882362c4000 RBX: 0000000000000002 RCX: 0000000000000000
Dec  8 16:14:57 ph4 kernel: RDX: 00000000000003a0 RSI: ffff88823680d200 RDI: 0000000000000008
Dec  8 16:14:57 ph4 kernel: RBP: ffff8882362c4840 R08: ffff888236400248 R09: ffff888236400340
Dec  8 16:14:57 ph4 kernel: R10: 0000000000000000 R11: ffffffff82043b28 R12: ffff8882341ca960
Dec  8 16:14:57 ph4 kernel: R13: ffff8882341ca828 R14: 000000000000001a R15: ffff8882362c4840
Dec  8 16:14:57 ph4 kernel: FS:  00007f5fc097f700(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
Dec  8 16:14:57 ph4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  8 16:14:57 ph4 kernel: CR2: 00000000000003a0 CR3: 0000000233440000 CR4: 00000000000006e0
Dec  8 16:14:57 ph4 kernel: Call Trace:
Dec  8 16:14:57 ph4 kernel:  rtl8169_interrupt+0xba/0x260 [r8169]
Dec  8 16:14:57 ph4 kernel:  ? _raw_spin_unlock_irqrestore+0xf/0x30
Dec  8 16:14:57 ph4 kernel:  __free_irq+0x177/0x2d0
Dec  8 16:14:57 ph4 kernel:  free_irq+0x29/0x60
Dec  8 16:14:57 ph4 kernel:  pci_free_irq+0x13/0x20
Dec  8 16:14:57 ph4 kernel:  rtl8169_close+0xe8/0x1f0 [r8169]
Dec  8 16:14:57 ph4 kernel:  __dev_close_many+0x8f/0x100
Dec  8 16:14:57 ph4 kernel:  __dev_change_flags+0xe9/0x210
Dec  8 16:14:57 ph4 kernel:  dev_change_flags+0x29/0x70
Dec  8 16:14:57 ph4 kernel:  do_setlink+0x310/0xf30
Dec  8 16:14:57 ph4 kernel:  ? __nla_parse+0x48/0x190
Dec  8 16:14:57 ph4 kernel:  rtnl_newlink+0x571/0x880
Dec  8 16:14:57 ph4 kernel:  ? __nla_parse+0x48/0x190
Dec  8 16:14:57 ph4 kernel:  ? nla_parse+0x13/0x20
Dec  8 16:14:57 ph4 kernel:  ? rtnl_newlink+0x167/0x880
Dec  8 16:14:57 ph4 kernel:  ? preempt_count_add+0x55/0xb0
Dec  8 16:14:57 ph4 kernel:  ? get_page_from_freelist+0xdf0/0x1130
Dec  8 16:14:57 ph4 kernel:  ? kmem_cache_alloc+0x37/0x180
Dec  8 16:14:57 ph4 kernel:  rtnetlink_rcv_msg+0x141/0x380
Dec  8 16:14:57 ph4 kernel:  ? rtnl_calcit.isra.25+0x120/0x120
Dec  8 16:14:57 ph4 kernel:  netlink_rcv_skb+0x48/0x120
Dec  8 16:14:57 ph4 kernel:  netlink_unicast+0x1be/0x270
Dec  8 16:14:57 ph4 kernel:  netlink_sendmsg+0x2b7/0x3f0
Dec  8 16:14:57 ph4 kernel:  ? netlink_unicast+0x270/0x270
Dec  8 16:14:57 ph4 kernel:  ___sys_sendmsg+0x10b/0x310
Dec  8 16:14:57 ph4 kernel:  ? dev_ioctl+0x19d/0x410
Dec  8 16:14:57 ph4 kernel:  ? preempt_count_add+0x74/0xb0
Dec  8 16:14:57 ph4 kernel:  ? __inode_wait_for_writeback+0x7a/0xe0
Dec  8 16:14:57 ph4 kernel:  ? init_wait_var_entry+0x40/0x40
Dec  8 16:14:57 ph4 kernel:  ? preempt_count_add+0x74/0xb0
Dec  8 16:14:57 ph4 kernel:  ? _raw_spin_lock+0xe/0x30
Dec  8 16:14:57 ph4 kernel:  ? _raw_spin_unlock+0xd/0x30
Dec  8 16:14:57 ph4 kernel:  ? __dentry_kill+0x11a/0x160
Dec  8 16:14:57 ph4 kernel:  __sys_sendmsg+0x64/0xb0
Dec  8 16:14:57 ph4 kernel:  do_syscall_64+0x6b/0x1c0
Dec  8 16:14:57 ph4 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec  8 16:14:57 ph4 kernel: RIP: 0033:0x7f5fc00c1110
Dec  8 16:14:57 ph4 kernel: Code: 8b 15 84 3d 2b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb cd 66 0f 1f 44 00 00 83 3d d9 96 2b 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae b4 00 00 48 89 04 24
Dec  8 16:14:57 ph4 kernel: RSP: 002b:00007ffd05cb11e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Dec  8 16:14:57 ph4 kernel: RAX: ffffffffffffffda RBX: 000000005c0bee01 RCX: 00007f5fc00c1110
Dec  8 16:14:57 ph4 kernel: RDX: 0000000000000000 RSI: 00007ffd05cb1250 RDI: 0000000000000003
Dec  8 16:14:57 ph4 kernel: RBP: 0000000000000000 R08: 0000000000000001 R09: fefefeff77686d74
Dec  8 16:14:57 ph4 kernel: R10: 00000000000005e1 R11: 0000000000000246 R12: 0000000000000001
Dec  8 16:14:57 ph4 kernel: R13: 0000000000676200 R14: 00007ffd05cb1f18 R15: 00007ffd05cb19c8
Dec  8 16:14:57 ph4 kernel: Modules linked in: amdgpu snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mfd_core chash i2c_algo_bit gpu_sched snd_hda_codec_realtek drm_kms_helper snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec syscopyarea sysfillrect snd_hda_core sysimgblt realtek snd_hwdep fb_sys_fops snd_pcm snd_timer snd ttm xhci_pci drm xhci_hcd r8169 e100 serio_raw k10temp libphy mii drm_panel_orientation_quirks joydev soundcore i2c_piix4 acpi_cpufreq
Dec  8 16:14:57 ph4 kernel: CR2: 00000000000003a0
Dec  8 16:14:57 ph4 kernel: ---[ end trace 2457a68e913c7a5f ]---
Dec  8 16:14:57 ph4 kernel: RIP: 0010:queue_work_on+0x4/0x20
Dec  8 16:14:57 ph4 kernel: Code: ff 48 c7 c7 20 55 de 81 e8 02 58 04 00 c6 05 aa 1c 3e 01 01 e9 75 ff ff ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 53 9c 5b fa <f0> 48 0f ba 2a 00 73 06 31 c0 53 9d 5b c3 e8 99 fb ff ff b8 01 00
Dec  8 16:14:57 ph4 kernel: RSP: 0018:ffffc90000d473a8 EFLAGS: 00010002
Dec  8 16:14:57 ph4 kernel: RAX: ffff8882362c4000 RBX: 0000000000000002 RCX: 0000000000000000
Dec  8 16:14:57 ph4 kernel: RDX: 00000000000003a0 RSI: ffff88823680d200 RDI: 0000000000000008
Dec  8 16:14:57 ph4 kernel: RBP: ffff8882362c4840 R08: ffff888236400248 R09: ffff888236400340
Dec  8 16:14:57 ph4 kernel: R10: 0000000000000000 R11: ffffffff82043b28 R12: ffff8882341ca960
Dec  8 16:14:57 ph4 kernel: R13: ffff8882341ca828 R14: 000000000000001a R15: ffff8882362c4840
Dec  8 16:14:57 ph4 kernel: FS:  00007f5fc097f700(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000
Dec  8 16:14:57 ph4 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  8 16:14:57 ph4 kernel: CR2: 00000000000003a0 CR3: 0000000233440000 CR4: 00000000000006e0
Comment 1 Andy Furniss 2018-12-08 18:47:48 UTC
Created attachment 279905 [details]
dmesg from 4.19 which works OK

dmesg from 4.19 which works OK - just to give more info about system
Comment 2 Andy Furniss 2018-12-09 09:58:17 UTC
Created attachment 279907 [details]
dmesg on linus master showing issue triggered by ip li set down

dmesg on linus master showing issue triggered by ip li set down
Comment 3 Heiner Kallweit 2018-12-09 19:26:25 UTC
To make sure I interpret the stack trace correctly:
Is CONFIG_DEBUG_SHIRQ set on your system? If it is, __free_irq() deliberately runs the IRQ handler once more after the IRQ has been freed.
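
For readers unfamiliar with the option, here is a minimal userspace sketch of the behaviour described above. It is a toy model, not kernel code: toy_request_irq(), toy_free_irq() and the debug_shirq flag are hypothetical names invented for illustration. The point is only that the handler can run once more after it has been unregistered, so it must cope with whatever state the driver has already torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
typedef void (*toy_handler_t)(void *dev_id);

struct toy_irq {
	toy_handler_t handler;
	void *dev_id;
};

/* Register a handler on the toy IRQ line. */
static void toy_request_irq(struct toy_irq *irq, toy_handler_t handler,
			    void *dev_id)
{
	assert(handler != NULL);
	irq->handler = handler;
	irq->dev_id = dev_id;
}

/*
 * Unregister the handler. With debug_shirq set, call it one more time
 * after it has been removed, imitating a late interrupt raised by
 * another device that shares the line.
 */
static void toy_free_irq(struct toy_irq *irq, bool debug_shirq)
{
	toy_handler_t handler = irq->handler;
	void *dev_id = irq->dev_id;

	irq->handler = NULL;
	irq->dev_id = NULL;

	if (debug_shirq && handler)
		handler(dev_id);
}

/* Example handler that only counts how often it ran. */
static void toy_count_handler(void *dev_id)
{
	int *calls = dev_id;

	(*calls)++;
}
```

Freeing with debug_shirq set to false never calls toy_count_handler(); with it set to true the handler runs exactly once during the free, which is the situation the stack trace shows (rtl8169_interrupt called from __free_irq).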
Comment 4 Andy Furniss 2018-12-09 20:00:29 UTC
Yes, CONFIG_DEBUG_SHIRQ=y

I have no idea why it originally ended up set. This is an old LFS setup, and options like that get carried forward over many kernel versions.
Comment 5 Heiner Kallweit 2018-12-09 20:19:47 UTC
I would expect the following patch to fix the issue. Can you test it?

---
 drivers/net/ethernet/realtek/r8169.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index bb1847fd6..8462553e3 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6414,7 +6414,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		goto out;
 	}
 
-	if (status & LinkChg)
+	if (status & LinkChg && tp->dev->phydev)
 		phy_mac_interrupt(tp->dev->phydev);
 
 	if (unlikely(status & RxFIFOOver &&
-- 
2.19.2
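
The guard in the patch can be tried outside the kernel with a small sketch. Everything here is a stand-in: struct toy_phy, struct toy_netdev, toy_phy_mac_interrupt() and the TOY_LINK_CHG value are hypothetical and only mimic the shape of the driver code. The sketch assumes, as the patch implies, that the phydev pointer is already NULL when the extra handler call arrives. That reading is consistent with the oops, where the work pointer passed in RDX and the faulting address in CR2 are both 0x3a0, which looks like a member address computed from a NULL base.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real kernel structs. */
#define TOY_LINK_CHG 0x0020u	/* arbitrary bit chosen for this sketch */

struct toy_phy {
	int link_events;
};

struct toy_netdev {
	struct toy_phy *phydev;	/* assumed NULL once the PHY is detached */
};

/* Stand-in for phy_mac_interrupt(): it touches the PHY object,
 * so it must never be called with a NULL pointer. */
static void toy_phy_mac_interrupt(struct toy_phy *phy)
{
	phy->link_events++;
}

/* Returns 1 if the link change was forwarded to the PHY, 0 otherwise. */
static int toy_interrupt(struct toy_netdev *dev, unsigned int status)
{
	assert(dev != NULL);

	/*
	 * '&' binds tighter than '&&', so the unparenthesised condition
	 * in the patch parses exactly like this one.
	 */
	if ((status & TOY_LINK_CHG) && dev->phydev) {
		toy_phy_mac_interrupt(dev->phydev);
		return 1;
	}
	return 0;
}
```

With a valid phydev, a TOY_LINK_CHG status is forwarded; once phydev is NULL the same status is ignored instead of dereferencing the pointer.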
Comment 6 Heiner Kallweit 2018-12-09 20:22:56 UTC
FWIW, disabling CONFIG_DEBUG_SHIRQ should also fix the issue for you.
Comment 7 Andy Furniss 2018-12-09 20:50:44 UTC
The patch fixes it, thanks.
Comment 8 Andy Furniss 2018-12-09 21:00:24 UTC
(In reply to Heiner Kallweit from comment #6)
> FWIW, disabling CONFIG_DEBUG_SHIRQ should also fix the issue for you.

Will try this tomorrow.
Comment 9 Andy Furniss 2018-12-10 16:42:53 UTC
(In reply to Andy Furniss from comment #8)
> (In reply to Heiner Kallweit from comment #6)
> > FWIW, disabling CONFIG_DEBUG_SHIRQ should also fix the issue for you.
> 
> Will try this tomorrow.

Can confirm this also works for me.
Comment 10 Heiner Kallweit 2018-12-13 21:11:00 UTC
Fixed with ee28b30cbbe0 ("r8169: fix crash if CONFIG_DEBUG_SHIRQ is enabled").
Comment 11 Andy Furniss 2018-12-27 19:34:35 UTC
Thanks. I guess the fix has propagated by now, so closing.
