If I run the following on my system:

# while rmmod i7core_edac ; do modprobe i7core_edac ; done

I quickly get a flood of warning and error messages in the kernel logs. First I get 3 correct cycles:

EDAC PCI: Removed device 0 for i7core_edac EDAC PCI controller: DEV 0000:ff:03.0
EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:ff:03.0
EDAC MC0: Giving out device to 'i7core_edac.c' 'i7 core #0': DEV 0000:ff:03.0
EDAC PCI1: Giving out device to module 'i7core_edac' controller 'EDAC PCI controller': DEV '0000:ff:03.0' (POLLED)
EDAC i7core: Driver loaded, 1 memory controller(s) found.

but then a first failure occurs:

EDAC PCI: Removed device 3 for i7core_edac EDAC PCI controller: DEV 0000:ff:03.0
EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:ff:03.0
EDAC i7core: Device not found: dev 00.0 PCI ID 8086:2c41

This last message is repeated a hundred times or so, and finally a WARNING followed by a BUG.

------------[ cut here ]------------
WARNING: at include/linux/kref.h:42 klist_next+0xba/0x110()
Hardware name: System Product Name
Modules linked in: i7core_edac(+) fuse xt_tcpudp xt_pkttype xt_physdev xt_LOG xt_limit nfsd lockd nfs_acl auth_rpcgss sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_utf8 loop snd_hda_codec_hdmi mt2060 snd_hda_codec_realtek snd_hda_intel snd_hda_codec e1000e container snd_hwdep snd_pcm button edac_core sr_mod dvb_usb_dib0700 dib0090 dib7000p dib7000m dib0070 dvb_usb dib8000 dvb_core dib3000mc rc_core sg iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm i2c_mux_gpio i2c_mux gpio_ich lpc_ich i2c_i801 crc32c_intel dibx000_common cdrom pcspkr microcode snd_timer snd edd soundcore snd_page_alloc autofs4 radeon ttm rtc_cmos drm_kms_helper drm i2c_algo_bit processor thermal_sys ata_generic [last unloaded: i7core_edac]
Pid: 10962, comm: modprobe Not tainted 3.7.0-rc5 #12
Call Trace:
 [<ffffffff8104501c>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff81045065>] warn_slowpath_null+0x15/0x20
 [<ffffffff815c34ea>] klist_next+0xba/0x110
 [<ffffffff81317ac0>] ? pci_do_find_bus+0x60/0x60
 [<ffffffff813be319>] next_device+0x9/0x30
 [<ffffffff813beaf9>] bus_find_device+0x69/0x90
 [<ffffffff81317b9b>] pci_get_dev_by_id+0x6b/0xb0
 [<ffffffff81317ce0>] pci_get_subsys+0x30/0x40
 [<ffffffff81317d03>] pci_get_device+0x13/0x20
 [<ffffffffa0331448>] i7core_get_onedevice+0x3c/0x28e [i7core_edac]
 [<ffffffffa033177e>] i7core_probe+0x8e/0x16e [i7core_edac]
 [<ffffffff81316524>] local_pci_probe+0x74/0x100
 [<ffffffff81316679>] __pci_device_probe+0xc9/0xf0
 [<ffffffff81316c05>] pci_device_probe+0x35/0x60
 [<ffffffff813c022f>] really_probe+0x10f/0x2f0
 [<ffffffff813c05bb>] driver_probe_device+0x7b/0xa0
 [<ffffffff813c063f>] __driver_attach+0x5f/0x90
 [<ffffffff813c05e0>] ? driver_probe_device+0xa0/0xa0
 [<ffffffff813be609>] bus_for_each_dev+0x49/0x80
 [<ffffffff813bfe69>] driver_attach+0x19/0x20
 [<ffffffff813bf83b>] bus_add_driver+0xdb/0x260
 [<ffffffffa038d000>] ? 0xffffffffa038cfff
 [<ffffffff813c0c68>] driver_register+0xa8/0x150
 [<ffffffffa038d000>] ? 0xffffffffa038cfff
 [<ffffffff81316d1c>] __pci_register_driver+0x5c/0x70
 [<ffffffffa038d093>] i7core_init+0x93/0x1000 [i7core_edac]
 [<ffffffff8100024a>] do_one_initcall+0x8a/0x160
 [<ffffffff810a99b5>] sys_init_module+0xc5/0x220
 [<ffffffff815ff829>] system_call_fastpath+0x16/0x1b
---[ end trace e1e91b09c80c8edb ]---
BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffff815c30b8>] klist_put+0x28/0xa0
PGD 13699d067 PUD 137f4d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: i7core_edac(+) fuse xt_tcpudp xt_pkttype xt_physdev xt_LOG xt_limit nfsd lockd nfs_acl auth_rpcgss sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf nls_utf8 loop snd_hda_codec_hdmi mt2060 snd_hda_codec_realtek snd_hda_intel snd_hda_codec e1000e container snd_hwdep snd_pcm button edac_core sr_mod dvb_usb_dib0700 dib0090 dib7000p dib7000m dib0070 dvb_usb dib8000 dvb_core dib3000mc rc_core sg iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm i2c_mux_gpio i2c_mux gpio_ich lpc_ich i2c_i801 crc32c_intel dibx000_common cdrom pcspkr microcode snd_timer snd edd soundcore snd_page_alloc autofs4 radeon ttm rtc_cmos drm_kms_helper drm i2c_algo_bit processor thermal_sys ata_generic [last unloaded: i7core_edac]
CPU 6
Pid: 10962, comm: modprobe Tainted: G W 3.7.0-rc5 #12 System manufacturer System Product Name/Z8NA-D6(C)
[26130.232897] RIP: 0010:[<ffffffff815c30b8>] [<ffffffff815c30b8>] klist_put+0x28/0xa0
RSP: 0018:ffff880125389af8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000cb2dcb2c
RDX: 00000000000035f0 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880125389b18 R08: ffffffff81e41cdc R09: ffffffff81e61e20
R10: 0000000000000044 R11: 000000000001fef0 R12: ffff8801391b24b8
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS: 00007fca8695c700(0000) GS:ffff88013f2c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 00000001357d5000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process modprobe (pid: 10962, threadinfo ffff880125388000, task ffff880037a68700)
Stack:
 ffff880125389b48 ffffffff81317ac0 0000000000000000 ffffffffa0332640
 ffff880125389b38 ffffffff815c314c ffff880125389b38 ffff880125389bb8
 ffff880125389b78 ffffffff813beb0a ffff8801397a68d8 ffff8801391b24b8
Call Trace:
 [<ffffffff81317ac0>] ? pci_do_find_bus+0x60/0x60
 [<ffffffff815c314c>] klist_iter_exit+0x1c/0x30
 [<ffffffff813beb0a>] bus_find_device+0x7a/0x90
 [<ffffffff81317b9b>] pci_get_dev_by_id+0x6b/0xb0
 [<ffffffff81317ce0>] pci_get_subsys+0x30/0x40
 [<ffffffff81317d03>] pci_get_device+0x13/0x20
 [<ffffffffa0331448>] i7core_get_onedevice+0x3c/0x28e [i7core_edac]
 [<ffffffffa033177e>] i7core_probe+0x8e/0x16e [i7core_edac]
 [<ffffffff81316524>] local_pci_probe+0x74/0x100
 [<ffffffff81316679>] __pci_device_probe+0xc9/0xf0
 [<ffffffff81316c05>] pci_device_probe+0x35/0x60
 [<ffffffff813c022f>] really_probe+0x10f/0x2f0
 [<ffffffff813c05bb>] driver_probe_device+0x7b/0xa0
 [<ffffffff813c063f>] __driver_attach+0x5f/0x90
 [<ffffffff813c05e0>] ? driver_probe_device+0xa0/0xa0
 [<ffffffff813be609>] bus_for_each_dev+0x49/0x80
 [<ffffffff813bfe69>] driver_attach+0x19/0x20
 [<ffffffff813bf83b>] bus_add_driver+0xdb/0x260
 [<ffffffffa038d000>] ? 0xffffffffa038cfff
 [<ffffffff813c0c68>] driver_register+0xa8/0x150
 [<ffffffffa038d000>] ? 0xffffffffa038cfff
 [<ffffffff81316d1c>] __pci_register_driver+0x5c/0x70
 [<ffffffffa038d093>] i7core_init+0x93/0x1000 [i7core_edac]
 [<ffffffff8100024a>] do_one_initcall+0x8a/0x160
 [<ffffffff810a99b5>] sys_init_module+0xc5/0x220
 [<ffffffff815ff829>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 55 48 89 e5 48 83 ec 20 4c 89 65 e8 49 89 fc 4c 89 75 f8 41 89 f6 48 89 5d e0 4c 89 6d f0 48 8b 1f 48 83 e3 fe 48 89 df <4c> 8b 6b 30 e8 4f 51 03 00 45 84 f6 74 2a 49 8b 04 24 a8 01 74
RIP [<ffffffff815c30b8>] klist_put+0x28/0xa0
 RSP <ffff880125389af8>
CR2: 0000000000000030
---[ end trace e1e91b09c80c8edc ]---

A further call to lspci shows the following error:

lspci: Cannot open /sys/bus/pci/devices/0000:ff:00.0/resource: No such file or directory

PCI device 0000:ff:00.0 has simply disappeared (symbolic link /sys/bus/pci/devices/0000:ff:00.0 points to a nonexistent directory).
Looks like it corrupts the PCI device list and ends up over-'put'ting a PCI device.
Created attachment 86751
i7core_edac: Fix PCI device reference count

Thanks for the hint, Alan. You were right, the bug is caused by PCI device over-putting. This should fix it. I think more work is needed to get the error paths right though.
After investigation, the error paths are actually correct.