Bug 206133 - removing igb causes GPF at sysfs_remove_group
Summary: removing igb causes GPF at sysfs_remove_group
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-08 19:51 UTC by Sami Farin
Modified: 2020-01-23 13:24 UTC
CC List: 1 user

See Also:
Kernel Version: 4.19.93
Subsystem:
Regression: No
Bisected commit-id:



Description Sami Farin 2020-01-08 19:51:47 UTC
Is this patch (just guessing :-D) needed for 4.19.93? I get the following splat when I run "rmmod igb" on a Ryzen x86_64 / Fedora 30 machine:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=48a322b6f9965b2f1e4ce81af972f0e287b07ed0
I had the wireguard module loaded, but not in use, when the GPF happened.

I wanted to remove igb because network device renaming had stopped working with NetworkManager+systemd (I tried both udev 70-persistent-net.rules and systemd network/10-persistent-net.link).
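For reference, "rmmod igb" is a thin wrapper around the delete_module(2) system call, and the userspace registers at the end of the splat below fit that: ORIG_RAX 0xb0 (176) is delete_module on x86-64, and RSI 0x800 is O_NONBLOCK. The following is a minimal C sketch of that single step under those assumptions; `unload_module` is a hypothetical helper name, and it needs CAP_SYS_MODULE (root) to actually unload anything.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* O_NONBLOCK */
#include <sys/syscall.h> /* SYS_delete_module */
#include <unistd.h>      /* syscall() */

#ifdef __x86_64__
/* Values seen in the splat's userspace registers: ORIG_RAX 0xb0, RSI 0x800. */
_Static_assert(SYS_delete_module == 0xb0, "delete_module is syscall 176 on x86-64");
_Static_assert(O_NONBLOCK == 0x800, "O_NONBLOCK is 04000 (octal) on x86-64");
#endif

/* Unload a module the way rmmod does: delete_module(name, O_NONBLOCK).
 * Requires CAP_SYS_MODULE. Returns 0 on success or -errno on failure. */
static int unload_module(const char *name)
{
    if (syscall(SYS_delete_module, name, O_NONBLOCK) != 0)
        return -errno;
    return 0;
}
```

Run as root on an affected kernel, `unload_module("igb")` would be expected to go down the same pci_unregister_driver → igb_remove → igb_ptp_stop → ptp_clock_unregister path shown in the call trace.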

general protection fault: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 6888 Comm: rmmod Tainted: G           O    T 4.19.93+ #41
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X370 Taichi, BIOS P5.10 12/17/2018
RIP: 0010:remove_files.isra.0+0x1f/0x70
Code: fe ff ff ff eb 9b 0f 1f 44 00 00 0f 1f 44 00 00 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 24 48 8b 06 48 89 f3 48 85 c0 74 19 <48> 8b 30 31 d2 48 89 ef 48 83 c3 08 e8 00 d6 ff ff 48 8b 03 48 85
RSP: 0018:ffffa2d943a53c90 EFLAGS: 00010286
RAX: a0caaab371d2d028 RBX: ffffa15d294063c0 RCX: 0000000000000000
RDX: ffffa15d3aced578 RSI: ffffa15d294063c0 RDI: ffffa15d235dfbb0
RBP: ffffa15d235dfbb0 R08: 0000000000000000 R09: ffffa15d23766458
R10: 0000000000000000 R11: ffffa15d2a17a218 R12: ffffa15d3aced578
R13: 0000000000000000 R14: ffffa15d3acec1d0 R15: 0000000000000000
FS:  00007f1ef0501740(0000) GS:ffffa15d3e940000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561a3c85fea8 CR3: 00000007e1826000 CR4: 00000000003406e0
Call Trace:
 sysfs_remove_group+0x3d/0x80
 sysfs_remove_groups+0x29/0x40
 device_remove_attrs+0x42/0x80
 device_del+0x162/0x380
 cdev_device_del+0x15/0x30
 posix_clock_unregister+0x21/0x50
 ptp_clock_unregister+0x6e/0x80
 igb_ptp_stop+0x1f/0x50 [igb]
 igb_remove+0x47/0x160 [igb]
 pci_device_remove+0x3b/0xa0
 device_release_driver_internal+0x183/0x250
 driver_detach+0x53/0x84
 bus_remove_driver+0x55/0xc6
 pci_unregister_driver+0x29/0xb0
 __x64_sys_delete_module+0x176/0x2d0
 ? exit_to_usermode_loop+0x74/0xd0
 do_syscall_64+0x6f/0x329
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1ef0629acb
Code: 73 01 c3 48 8b 0d bd 33 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8d 33 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd0a515eb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000561a3c855e20 RCX: 00007f1ef0629acb
RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000561a3c855e88
RBP: 00007ffd0a515f18 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f1ef069dac0 R11: 0000000000000206 R12: 00007ffd0a5160e0
R13: 00007ffd0a517f8a R14: 0000561a3c8552a0 R15: 0000561a3c855e20
Modules linked in: nfnetlink_acct ip6table_mangle nf_log_ipv6 xt_hl ip6t_REJECT nf_reject_ipv6 xt_state ip6t_rt ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_raw iptable_mangle nf_log_ipv4 nf_log_common xt_LOG xt_hashlimit ipt_REJECT nf_reject_ipv4 xt_owner xt_length xt_limit xt_multiport xt_set xt_conntrack iptable_filter arptable_filter arp_tables dm_integrity nf_conntrack_netlink ip_set_bitmap_port ip_set_hash_mac ip_set_hash_net ip_set nfnetlink algif_hash algif_skcipher af_alg bnep hwmon_vid snd_usb_audio btusb btrtl snd_usbmidi_lib btbcm btintel snd_hwdep snd_rawmidi bluetooth ecdh_generic iwlmvm pktcdvd mac80211 iwlwifi kvm_amd kvm irqbypass cfg80211 wmi_bmof snd_hda_codec_realtek sp5100_tco k10temp snd_hda_codec_generic snd_hda_codec_hdmi i2c_piix4 snd_hda_intel
 snd_hda_codec snd_hda_core rtc_cmos acpi_cpufreq snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm wireguard(O) binfmt_misc ip6_udp_tunnel udp_tunnel sch_cake tcp_cubic tcp_westwood br_netfilter bridge stp llc ip_tables uas usb_storage usbhid rfkill mxm_wmi igb(-) ccp xhci_pci xhci_hcd usbcore usb_common wmi button 8021q mrp sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_timer snd soundcore tun xt_tcpudp x_tables tcp_bbr nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sch_fq_codel sch_htb sch_pie fuse analog gameport joydev i2c_dev ecryptfs autofs4 amdkfd amd_iommu_v2 [last unloaded: pcspkr]
---[ end trace 4b6d51a0a27b5c23 ]---
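Two values in the splat narrow down what went wrong. The faulting bytes `<48> 8b 30` decode to `mov (%rax),%rsi`, a load through RAX, and RAX = a0caaab371d2d028 is not a canonical x86-64 address. That is why the CPU raised a general protection fault rather than a page fault (a #GP does not update CR2, so the CR2 value shown is a stale user address, not the fault address). remove_files() walks the attribute group's NULL-terminated pointer array and reads each attribute's name, so the likely reading is that the array held a garbage pointer when device_del() ran. The helpers below are a small sketch for checking such values, assuming 4-level (48-bit) paging; CR4 = 0x3406e0 has LA57 (bit 12) clear, which is consistent with that. `is_canonical_48` and `cr4_la57` are illustrative names, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 4-level paging, x86-64 virtual addresses are 48 bits wide, and an
 * address is canonical only if bits 63..47 are all equal (all 0 or all 1).
 * Loading through a non-canonical pointer raises #GP, not a page fault. */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* bits 63..47: 17 bits */
    return top == 0 || top == 0x1ffff;
}

/* CR4.LA57 (bit 12) enables 5-level paging; when clear, the 48-bit rule applies. */
static bool cr4_la57(uint64_t cr4)
{
    return (cr4 >> 12) & 1;
}
```

Applied to the splat: `cr4_la57(0x3406e0)` is false, `is_canonical_48(0xffffa15d294063c0)` (RBX, the array pointer) is true, and `is_canonical_48(0xa0caaab371d2d028)` (RAX, the element loaded from it) is false.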
Comment 1 Sami Farin 2020-01-09 16:23:35 UTC
4.19.93 includes a33121e5487b424339636b25c35d3a180eaa5f5e, but I didn't get this splat with 4.19.90...
Comment 2 Chaser Huang (Chaserhkj) 2020-01-11 12:43:52 UTC
I encountered this bug on kernel branch 5.4 with versions >= 5.4.8 as well.

This commit (https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a33121e5487b424339636b25c35d3a180eaa5f5e) seems to be the culprit: it is present in versions >= 5.4.8 but not in versions < 5.4.8. Reverting the whole commit on 5.4.8 does fix the issue. However, I'm not sure what the specific problem in the commit is, and I haven't been able to come up with a fix yet.
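One explanation consistent with both the trace and the revert result, stated here as an assumption rather than something established in this report: after a33121e5487b, ptp_clock_unregister() frees the PTP pin attribute arrays before posix_clock_unregister() → cdev_device_del() → device_del() removes the sysfs groups that still point at them. The userspace model below illustrates only that ordering hazard. `struct clock_dev`, the `model_*` functions and the two teardown functions are hypothetical stand-ins, not kernel code, and a flag replaces the real use-after-free so that the buggy order is detectable instead of crashing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a sysfs attribute: only the name matters here. */
struct attr {
    const char *name;
};

/* Hypothetical stand-in for a PTP clock device with pin attributes. */
struct clock_dev {
    struct attr **pin_attrs; /* NULL-terminated array, like a group's attrs */
    bool pins_freed;         /* flag standing in for "memory already freed" */
};

/* Free every pin attribute and then the array itself. */
static void model_free_pins(struct clock_dev *d)
{
    if (!d->pin_attrs)
        return;
    for (struct attr **a = d->pin_attrs; *a; a++)
        free(*a);
    free(d->pin_attrs);
    d->pin_attrs = NULL;
    d->pins_freed = true;
}

/* Build a device with npins named attributes plus the NULL terminator. */
static struct clock_dev *model_create(size_t npins)
{
    struct clock_dev *d = calloc(1, sizeof(*d));
    if (!d)
        return NULL;
    d->pin_attrs = calloc(npins + 1, sizeof(*d->pin_attrs));
    if (!d->pin_attrs) {
        free(d);
        return NULL;
    }
    for (size_t i = 0; i < npins; i++) {
        d->pin_attrs[i] = malloc(sizeof(struct attr));
        if (!d->pin_attrs[i]) {
            model_free_pins(d); /* stops at the first NULL slot */
            free(d);
            return NULL;
        }
        d->pin_attrs[i]->name = "pin";
    }
    return d;
}

/* Models remove_files(): walk the array and read each ->name.
 * Returns the number of attributes removed, or -1 where the real
 * kernel would be reading freed memory. */
static int model_remove_group(const struct clock_dev *d)
{
    if (d->pins_freed)
        return -1;
    int removed = 0;
    for (struct attr **a = d->pin_attrs; *a; a++)
        if ((*a)->name)
            removed++;
    return removed;
}

/* Order suspected in the crashing kernels: free the pins, then remove sysfs. */
static int teardown_buggy(struct clock_dev *d)
{
    model_free_pins(d);
    return model_remove_group(d);
}

/* Safe order: remove the sysfs group first, free the pins last. */
static int teardown_fixed(struct clock_dev *d)
{
    int removed = model_remove_group(d);
    model_free_pins(d);
    return removed;
}
```

The safe order simply tears down everything that can still reach the attributes before freeing them.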

Note that this bug's impact is bigger than it seems. Although removing a network driver module is rare in normal use, it can happen routinely with VMs. A typical case is passing a PCI device through from the host to a VM: if the passed-through device is a network adapter driven by igb, the hypervisor first has to release it from the igb driver before setting up the IOMMU for the guest. I can confirm that in this scenario the bug makes the hypervisor (libvirt/KVM in my case) freeze and the VM fail to start. This is actually how I encountered the bug.
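In the passthrough case the host typically releases the device through sysfs rather than with rmmod: writing the device's PCI address to the driver's `unbind` file makes the driver core call the driver's remove callback, which is the same pci_device_remove → igb_remove → igb_ptp_stop path seen in the call trace. A minimal C sketch of that write, assuming the standard `/sys/bus/pci/drivers/<driver>/unbind` interface; the address 0000:03:00.0 is a made-up example and `pci_unbind` is a hypothetical helper.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Detach one PCI device from its driver by writing its address to
 * <driver_dir>/unbind. Returns 0 on success or -errno on failure. */
static int pci_unbind(const char *driver_dir, const char *bdf)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/unbind", driver_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    FILE *f = fopen(path, "w");
    if (!f)
        return -errno;

    int rc = 0;
    if (fputs(bdf, f) == EOF)
        rc = -errno;
    /* stdio buffers the write, so sysfs errors often surface at fclose() */
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;
    return rc;
}
```

Run as root on an affected kernel, `pci_unbind("/sys/bus/pci/drivers/igb", "0000:03:00.0")` would be expected to hit the same GPF as "rmmod igb".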
Comment 3 Sami Farin 2020-01-23 13:24:57 UTC
Probably fixed by commit 75718584cb3c64e6269109d4d54f888ac5a5fd15.
