Bug 205621 - Linux next-20191121 NULL dereference on start-up, leading to unusable system
Summary: Linux next-20191121 NULL dereference on start-up, leading to unusable system
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 blocking
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-21 14:48 UTC by Nicholas Johnson
Modified: 2020-01-05 15:26 UTC

See Also:
Kernel Version: next-20191121
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The .config (232.28 KB, text/plain)
2019-11-21 14:48 UTC, Nicholas Johnson
Photo of the stack trace (4.41 MB, image/jpeg)
2019-11-21 14:49 UTC, Nicholas Johnson

Description Nicholas Johnson 2019-11-21 14:48:31 UTC
Created attachment 286005 [details]
The .config

Linux next-20191121 - .config file attached.

NULL dereference on start-up, leading to unusable system.

Used "git clone --depth=1 --single-branch --branch next-20191115 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git"

Where I am, I cannot use serial console to save the stack trace.

I will attach a photo of the screen.

The top few functions are:

kernfs_find_and_get_ns
sysfs_remove_group
netdev_queue_update_kobjects
netdev_unregister_kobject

This leads me to believe that it is part of networking.

I will do a bisect if I have to, but given how time-consuming bisects are on a quad-core machine, I hope somebody with a 32-core Threadripper can step up to the task.
Comment 1 Nicholas Johnson 2019-11-21 14:49:47 UTC
Created attachment 286007 [details]
Photo of the stack trace
Comment 2 Nicholas Johnson 2019-11-21 15:27:12 UTC
Oops, command was: "git clone --depth=1 --single-branch --branch next-20191121 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git"
Comment 3 Dexuan Cui 2019-11-21 19:30:01 UTC
FYI: we see the same bug on a Linux VM (5.4.0-rc8-next-20191121) running on Hyper-V. We're trying to do a bisect.
Comment 5 Dexuan Cui 2019-11-23 00:24:46 UTC
From our test team:

It looks like the culprit is:
net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject ( https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b8eb718348b8fb30b5a7d0a8fce26fb3f4ac741b )

and the bug has been fixed by:
net-sysfs: fix netdev_queue_add_kobject() breakage (https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=48a322b6f9965b2f1e4ce81af972f0e287b07ed0)
Comment 6 Sami Farin 2020-01-05 15:26:11 UTC
Is this patch also needed for 4.19.93? I get the following splat when I do "rmmod igb" on Ryzen x86_64 / Fedora 30:

general protection fault: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 6888 Comm: rmmod Tainted: G           O    T 4.19.93+ #41
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P5.10 12/17/2018
RIP: 0010:remove_files.isra.0+0x1f/0x70
Code: fe ff ff ff eb 9b 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 24 48 8b 06 48 89 f3 48 85 c0 74 19 <48> 8b 30 31 d2 48 89 ef 48 83 c3 08 e8 00 d6 ff ff 48 8b 03 48 85
RSP: 0018:ffffa2d943a53c90 EFLAGS: 00010286
RAX: a0caaab371d2d028 RBX: ffffa15d294063c0 RCX: 0000000000000000
RDX: ffffa15d3aced578 RSI: ffffa15d294063c0 RDI: ffffa15d235dfbb0
RBP: ffffa15d235dfbb0 R08: 0000000000000000 R09: ffffa15d23766458
R10: 0000000000000000 R11: ffffa15d2a17a218 R12: ffffa15d3aced578
R13: 0000000000000000 R14: ffffa15d3acec1d0 R15: 0000000000000000
FS:  00007f1ef0501740(0000) GS:ffffa15d3e940000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561a3c85fea8 CR3: 00000007e1826000 CR4: 00000000003406e0
Call Trace:
 sysfs_remove_group+0x3d/0x80
 sysfs_remove_groups+0x29/0x40
 device_remove_attrs+0x42/0x80
 device_del+0x162/0x380
 cdev_device_del+0x15/0x30
 posix_clock_unregister+0x21/0x50
 ptp_clock_unregister+0x6e/0x80
 igb_ptp_stop+0x1f/0x50 [igb]
 igb_remove+0x47/0x160 [igb]
 pci_device_remove+0x3b/0xa0
 device_release_driver_internal+0x183/0x250
 driver_detach+0x53/0x84
 bus_remove_driver+0x55/0xc6
 pci_unregister_driver+0x29/0xb0
 __x64_sys_delete_module+0x176/0x2d0
 ? exit_to_usermode_loop+0x74/0xd0
 do_syscall_64+0x6f/0x329
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1ef0629acb
Code: 73 01 c3 48 8b 0d bd 33 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 33 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd0a515eb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000561a3c855e20 RCX: 00007f1ef0629acb
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000561a3c855e88
RBP: 00007ffd0a515f18 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f1ef069dac0 R11: 0000000000000206 R12: 00007ffd0a5160e0
R13: 00007ffd0a517f8a R14: 0000561a3c8552a0 R15: 0000561a3c855e20
Modules linked in: nfnetlink_acct ip6table_mangle nf_log_ipv6 xt_hl ip6t_REJECT nf_reject_ipv6 xt_state ip6t_rt ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_raw iptable_mangle nf_log_ipv4 nf_log_common xt_LOG xt_hashlimit ipt_REJECT nf_reject_ipv4 xt_owner xt_length xt_limit xt_multiport xt_set xt_conntrack iptable_filter arptable_filter arp_tables dm_integrity nf_conntrack_netlink ip_set_bitmap_port ip_set_hash_mac ip_set_hash_net ip_set nfnetlink algif_hash algif_skcipher af_alg bnep hwmon_vid snd_usb_audio btusb btrtl snd_usbmidi_lib btbcm btintel snd_hwdep snd_rawmidi bluetooth ecdh_generic iwlmvm pktcdvd mac80211 iwlwifi kvm_amd kvm irqbypass cfg80211 wmi_bmof snd_hda_codec_realtek sp5100_tco k10temp snd_hda_codec_generic snd_hda_codec_hdmi i2c_piix4 snd_hda_intel
 snd_hda_codec snd_hda_core rtc_cmos acpi_cpufreq snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm wireguard(O) binfmt_misc ip6_udp_tunnel udp_tunnel sch_cake tcp_cubic tcp_westwood br_netfilter bridge stp llc ip_tables uas usb_storage usbhid rfkill mxm_wmi igb(-) ccp xhci_pci xhci_hcd usbcore usb_common wmi button 8021q mrp sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_timer snd soundcore tun xt_tcpudp x_tables tcp_bbr nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_fq_codel sch_htb sch_pie fuse analog gameport joydev i2c_dev ecryptfs autofs4 amdkfd amd_iommu_v2 [last unloaded: pcspkr]
---[ end trace 4b6d51a0a27b5c23 ]---
RIP: 0010:remove_files.isra.0+0x1f/0x70
Code: fe ff ff ff eb 9b 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 24 48 8b 06 48 89 f3 48 85 c0 74 19 <48> 8b 30 31 d2 48 89 ef 48 83 c3 08 e8 00 d6 ff ff 48 8b 03 48 85
RSP: 0018:ffffa2d943a53c90 EFLAGS: 00010286
RAX: a0caaab371d2d028 RBX: ffffa15d294063c0 RCX: 0000000000000000
RDX: ffffa15d3aced578 RSI: ffffa15d294063c0 RDI: ffffa15d235dfbb0
RBP: ffffa15d235dfbb0 R08: 0000000000000000 R09: ffffa15d23766458
R10: 0000000000000000 R11: ffffa15d2a17a218 R12: ffffa15d3aced578
R13: 0000000000000000 R14: ffffa15d3acec1d0 R15: 0000000000000000
FS:  00007f1ef0501740(0000) GS:ffffa15d3e940000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561a3c85fea8 CR3: 00000007e1826000 CR4: 00000000003406e0
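
A note on reading the splat above, offered as a sketch rather than a diagnosis. The faulting instruction marked in the Code: line, <48> 8b 30, decodes to mov (%rax),%rsi, and RAX holds a0caaab371d2d028. On x86-64 with 48-bit virtual addresses (an assumption here; the CR4 value above does not have the LA57 bit set), a pointer is canonical only if bits 63..47 are all equal, and dereferencing a non-canonical pointer raises a general protection fault rather than a page fault. The helper name is_canonical below is hypothetical and exists only for this illustration.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Assumes va_bits implemented address bits (48 for 4-level paging).
    Canonical means bits 63..(va_bits - 1) are all zeros or all ones.
    """
    if not 0 <= addr < (1 << 64):
        raise ValueError("addr must fit in 64 bits")
    top = addr >> (va_bits - 1)                  # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1     # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones


# Register values copied from the splat above.
rax = 0xA0CAAAB371D2D028   # pointer being dereferenced at the fault
rsi = 0xFFFFA15D294063C0   # kernel pointer the value in RAX was loaded from
cr2 = 0x0000561A3C85FEA8   # user-space address left in CR2

for name, value in (("RAX", rax), ("RSI", rsi), ("CR2", cr2)):
    print(f"{name} {value:#018x} canonical={is_canonical(value)}")
```

If this reading is right, RAX holds a corrupted pointer rather than NULL, and the call trace runs through ptp_clock_unregister rather than netdev_unregister_kobject, so this may be a different problem from the one originally reported here.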
