Bug 209041

Summary: sysfs group 'power' not found for kobject requester
Product: Drivers
Reporter: michallinuxstuff
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: NEW
Severity: normal
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.7.9-100 (fc31 build)
Subsystem:
Regression: No
Bisected commit-id:

Description michallinuxstuff 2020-08-26 08:51:18 UTC
OS: Fedora31

While binding the ioatdma PCI devices to the uio_pci_generic driver, the kernel emits the following warning, followed by a NULL pointer dereference:

[308509.804434] pcieport 0000:00:1c.0: Enabling MPC IRBNCE
[308509.804440] pcieport 0000:00:1c.0: Intel PCH root port ACS workaround enabled
[308509.804718] pci 0000:00:1e.0: PCI bridge to [bus 11]
[308520.052190] ioatdma 0000:80:04.7: Removing dma and dca services
[308520.052311] igb 0000:04:00.0: DCA disabled
[308520.052346] ioatdma 0000:80:04.6: Removing dma and dca services
[308520.052350] igb 0000:04:00.1: DCA disabled
[308520.052353] igb 0000:04:00.1: DCA disabled
[308520.052355] igb 0000:04:00.2: DCA disabled
[308520.052360] ioatdma 0000:80:04.5: Removing dma and dca services
[308520.052365] igb 0000:04:00.3: DCA disabled
[308520.052393] ------------[ cut here ]------------
[308520.052394] sysfs group 'power' not found for kobject 'requester1022'
[308520.052399] igb 0000:04:00.3: DCA disabled
[308520.052411] igb 0000:04:00.2: DCA disabled
[308520.052427] WARNING: CPU: 44 PID: 1124753 at fs/sysfs/group.c:279 sysfs_remove_group+0x74/0x80
[308520.052428] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio nbd ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter usdm_drv(OE) ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad ib_ipoib scsi_transport_iscsi ib_cm mlx5_ib intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ib_uverbs kvm ib_core iTCO_wdt irqbypass iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mlx5_core rapl act_ct intel_cstate qat_c62x(OE) nf_flow_table ipmi_si nf_nat ipmi_devintf intel_uncore intel_qat(OE) mlxfw joydev ipmi_msghandler pcspkr mei_me lpc_ich i2c_i801 uio mei ioatdma ip_tables xfs
[308520.052491]  mgag200 drm_kms_helper nf_conntrack drm_vram_helper drm_ttm_helper ttm isci nf_defrag_ipv6 nf_defrag_ipv4 drm nvme pci_hyperv_intf ixgbe libcrc32c crc32c_intel igb libsas nvme_core scsi_transport_sas mdio dca i2c_algo_bit wmi uas usb_storage [last unloaded: uio_pci_generic]
[308520.052508] CPU: 44 PID: 1124753 Comm: bash Tainted: G           OE     5.7.9-100.fc31.x86_64 #1
[308520.052509] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[308520.052513] RIP: 0010:sysfs_remove_group+0x74/0x80
[308520.052515] Code: ff 5b 48 89 ef 5d 41 5c e9 19 be ff ff 48 89 ef e8 a1 b9 ff ff eb cc 49 8b 14 24 48 8b 33 48 c7 c7 30 ab 3b bd e8 ce 8b d1 ff <0f> 0b 5b 5d 41 5c c3 0f 1f 44 00 00 0f 1f 44 00 00 48 85 f6 74 31
[308520.052517] RSP: 0018:ffffb6010f847c90 EFLAGS: 00010286
[308520.052518] RAX: 0000000000000039 RBX: ffffffffbd10f580 RCX: 0000000000000007
[308520.052519] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff8b449ed19cc0
[308520.052520] RBP: 0000000000000000 R08: 000000000000aa8a R09: 0000000000000003
[308520.052521] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b3ada833c00
[308520.052521] R13: 0000000000000282 R14: 0000000000000000 R15: 0000000000000000
[308520.052523] FS:  00007f41a5e6a740(0000) GS:ffff8b449ed00000(0000) knlGS:0000000000000000
[308520.052524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308520.052525] CR2: 00005610d36f8690 CR3: 0000000e5ab54003 CR4: 00000000001606e0
[308520.052527] Call Trace:
[308520.052553]  device_del+0x97/0x3f0
[308520.052556]  device_unregister+0x16/0x60
[308520.052559]  device_destroy+0x39/0x40
[308520.052565]  dca_remove_requester+0x69/0xa0 [dca]
[308520.052591]  ? ixgbe_notify_dca+0x40/0x40 [ixgbe]
[308520.052598]  __ixgbe_notify_dca+0x56/0xa0 [ixgbe]
[308520.052603]  driver_for_each_device+0x5c/0x90
[308520.052610]  ixgbe_notify_dca+0x25/0x40 [ixgbe]
[308520.052615]  notifier_call_chain+0x4c/0x70
[308520.052619]  blocking_notifier_call_chain+0x48/0x60
[308520.052622]  unregister_dca_provider+0x23/0x130 [dca]
[308520.052626]  ioat_remove+0x4c/0x90 [ioatdma]
[308520.052633]  pci_device_remove+0x3b/0xa0
[308520.052636]  device_release_driver_internal+0xe4/0x1c0
[308520.052640]  unbind_store+0xef/0x120
[308520.052644]  kernfs_fop_write+0xce/0x1b0
[308520.052649]  vfs_write+0xb6/0x1a0
[308520.052652]  ksys_write+0x4f/0xc0
[308520.052659]  do_syscall_64+0x5b/0x1c0
[308520.052678]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[308520.052681] RIP: 0033:0x7f41a5f5f4a7
[308520.052683] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[308520.052685] RSP: 002b:00007ffce7dfe838 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[308520.052686] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f41a5f5f4a7
[308520.052687] RDX: 000000000000000d RSI: 00005610d37ad8e0 RDI: 0000000000000001
[308520.052688] RBP: 00005610d37ad8e0 R08: 000000000000000a R09: 000000000000000c
[308520.052689] R10: 00005610d37acad0 R11: 0000000000000246 R12: 000000000000000d
[308520.052690] R13: 00007f41a6030500 R14: 000000000000000d R15: 00007f41a6030700
[308520.052693] ioatdma 0000:80:04.4: Removing dma and dca services
[308520.052694] ---[ end trace 4b051fba29d259b7 ]---

[308520.052702] BUG: kernel NULL pointer dereference, address: 0000000000000020
[308520.052755] ioatdma 0000:80:04.3: Removing dma and dca services
[308520.052758] #PF: supervisor read access in kernel mode
[308520.052818] #PF: error_code(0x0000) - not-present page
[308520.052846] PGD 0 P4D 0 
[308520.052867] Oops: 0000 [#1] SMP PTI
[308520.052891] CPU: 44 PID: 1124753 Comm: bash Tainted: G        W  OE     5.7.9-100.fc31.x86_64 #1
[308520.052936] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.06.0007.082420181029 08/24/2018
[308520.053001] RIP: 0010:klist_put+0x16/0x80
[308520.053029] Code: 89 ef c6 07 00 0f 1f 40 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 55 41 89 f5 41 54 49 89 fc 55 53 48 8b 2f 48 83 e5 fe 48 89 ef <48> 8b 5d 20 e8 b1 1d 4f 00 45 84 ed 74 10 49 8b 04 24 a8 01 75 42
[308520.053097] ioatdma 0000:80:04.2: Removing dma and dca services
[308520.053114] RSP: 0018:ffffb6010f847c88 EFLAGS: 00010246
[308520.053155] ioatdma 0000:80:04.1: Removing dma and dca services
[308520.053174] RAX: ffff8b3bb3d2b900 RBX: ffff8b3ada833c00 RCX: 0000000000000007
[308520.053176] RDX: 0000000000000007 RSI: 0000000000000001 RDI: 0000000000000000
[308520.053288] RBP: 0000000000000000 R08: 000000000000aa8a R09: 0000000000000003
[308520.053328] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b3bb3d2b928
[308520.053371] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[308520.053409] FS:  00007f41a5e6a740(0000) GS:ffff8b449ed00000(0000) knlGS:0000000000000000
[308520.053450] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308520.053481] CR2: 0000000000000020 CR3: 0000000e5ab54003 CR4: 00000000001606e0
[308520.053518] Call Trace:
[308520.053541]  device_del+0xa9/0x3f0
[308520.053568]  device_unregister+0x16/0x60
[308520.053585] ioatdma 0000:00:04.7: Removing dma and dca services
[308520.053598]  device_destroy+0x39/0x40
[308520.053601]  dca_remove_requester+0x69/0xa0 [dca]
[308520.053743]  ? ixgbe_notify_dca+0x40/0x40 [ixgbe]
[308520.053746] ioatdma 0000:80:04.0: Removing dma and dca services
[308520.053845]  __ixgbe_notify_dca+0x56/0xa0 [ixgbe]
[308520.053896]  driver_for_each_device+0x5c/0x90
[308520.053955]  ixgbe_notify_dca+0x25/0x40 [ixgbe]
[308520.053988]  notifier_call_chain+0x4c/0x70
[308520.054019]  blocking_notifier_call_chain+0x48/0x60
[308520.054050]  unregister_dca_provider+0x23/0x130 [dca]
[308520.054081]  ioat_remove+0x4c/0x90 [ioatdma]
[308520.054110]  pci_device_remove+0x3b/0xa0
[308520.054138]  device_release_driver_internal+0xe4/0x1c0
[308520.054168]  unbind_store+0xef/0x120
[308520.054214]  kernfs_fop_write+0xce/0x1b0
[308520.054253]  vfs_write+0xb6/0x1a0
[308520.054688] ioatdma 0000:00:04.4: Removing dma and dca services
[308520.054766] ioatdma 0000:00:04.6: Removing dma and dca services
[308520.054778] ioatdma 0000:00:04.5: Removing dma and dca services
[308520.055305] ioatdma 0000:00:04.3: Removing dma and dca services
[308520.055447]  ksys_write+0x4f/0xc0
[308520.055451]  do_syscall_64+0x5b/0x1c0
[308520.055455]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[308520.056103] ioatdma 0000:00:04.0: Removing dma and dca services
[308520.056924] ioatdma 0000:00:04.1: Removing dma and dca services
[308520.057134] RIP: 0033:0x7f41a5f5f4a7
[308520.057236] ioatdma 0000:00:04.2: Removing dma and dca services
[308520.066189] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[308520.067419] RSP: 002b:00007ffce7dfe838 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[308520.068041] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f41a5f5f4a7
[308520.068646] RDX: 000000000000000d RSI: 00005610d37ad8e0 RDI: 0000000000000001
[308520.069280] RBP: 00005610d37ad8e0 R08: 000000000000000a R09: 000000000000000c
[308520.069822] R10: 00005610d37acad0 R11: 0000000000000246 R12: 000000000000000d
[308520.070360] R13: 00007f41a6030500 R14: 000000000000000d R15: 00007f41a6030700
[308520.070895] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio nbd ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat iptable_mangle iptable_raw iptable_security rfkill ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter usdm_drv(OE) ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad ib_ipoib scsi_transport_iscsi ib_cm mlx5_ib intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ib_uverbs kvm ib_core iTCO_wdt irqbypass iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mlx5_core rapl act_ct intel_cstate qat_c62x(OE) nf_flow_table ipmi_si nf_nat ipmi_devintf intel_uncore intel_qat(OE) mlxfw joydev ipmi_msghandler pcspkr mei_me lpc_ich i2c_i801 uio mei ioatdma ip_tables xfs
[308520.070924]  mgag200 drm_kms_helper nf_conntrack drm_vram_helper drm_ttm_helper ttm isci nf_defrag_ipv6 nf_defrag_ipv4 drm nvme pci_hyperv_intf ixgbe libcrc32c crc32c_intel igb libsas nvme_core scsi_transport_sas mdio dca i2c_algo_bit wmi uas usb_storage [last unloaded: uio_pci_generic]
[308520.078656] CR2: 0000000000000020
[308520.079461] ---[ end trace 4b051fba29d259b8 ]---
[308520.185276] RIP: 0010:klist_put+0x16/0x80
[308520.186007] Code: 89 ef c6 07 00 0f 1f 40 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 41 55 41 89 f5 41 54 49 89 fc 55 53 48 8b 2f 48 83 e5 fe 48 89 ef <48> 8b 5d 20 e8 b1 1d 4f 00 45 84 ed 74 10 49 8b 04 24 a8 01 75 42
[308520.187514] RSP: 0018:ffffb6010f847c88 EFLAGS: 00010246
[308520.188279] RAX: ffff8b3bb3d2b900 RBX: ffff8b3ada833c00 RCX: 0000000000000007
[308520.189053] RDX: 0000000000000007 RSI: 0000000000000001 RDI: 0000000000000000
[308520.189819] RBP: 0000000000000000 R08: 000000000000aa8a R09: 0000000000000003
[308520.190589] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8b3bb3d2b928
[308520.191359] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[308520.192126] FS:  00007f41a5e6a740(0000) GS:ffff8b449ed00000(0000) knlGS:0000000000000000
[308520.192902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[308520.193669] CR2: 0000000000000020 CR3: 0000000e5ab54003 CR4: 00000000001606e0
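
Because the second trace is interleaved with ioatdma messages from other CPUs, the call chain is easier to read once the stack frames are filtered out. The following is a minimal sketch, assuming the log is available as plain text; `extract_frames` and `FRAME_RE` are hypothetical names, not part of any kernel tooling. It keeps only reliable frames (those not prefixed with `?`) and returns all frames found, so passing both traces at once returns both stacks back to back.

```python
import re

# Matches a stack-frame line such as
#   "[308520.053541]  device_del+0xa9/0x3f0"
#   "[308520.053743]  ? ixgbe_notify_dca+0x40/0x40 [ixgbe]"
# Lines like "RIP: ..." or "ioatdma 0000:..." do not match.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)


def extract_frames(dmesg_text):
    """Return (symbol, module) pairs for reliable stack frames, in log order.

    `module` is None for symbols built into the kernel image.
    """
    frames = []
    for line in dmesg_text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue  # not a stack frame (e.g. an interleaved driver message)
        unreliable, symbol, module = match.groups()
        if unreliable:
            continue  # "?" frames are guesses from the stack scan, skip them
        frames.append((symbol, module))
    return frames
```

Applied to the traces above, this gives the chain from `unbind_store` through `ioat_remove`, `unregister_dca_provider` and `__ixgbe_notify_dca` down to `dca_remove_requester` and `device_del`.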

The kernel doesn't panic, but the process that wrote to the sysfs interface (new_id in this case) hangs and becomes completely unresponsive. This is relatively easy to reproduce by binding and unbinding multiple devices at once; eventually the BUG appears. The aftermath looks similar to:

# Bunch of dead Bash background processes stuck while writing to new_id
$
l-wx------ 1 root    root    64 Aug 26 10:27 /proc/127340/fd/1 -> /sys/bus/pci/drivers/uio_pci_generic/new_id
l-wx------ 1 root    root    64 Aug 26 10:28 /proc/127407/fd/1 -> /sys/bus/pci/drivers/uio_pci_generic/new_id
l-wx------ 1 root    root    64 Aug 26 10:28 /proc/127398/fd/1 -> /sys/bus/pci/drivers/uio_pci_generic/new_id
l-wx------ 1 root    root    64 Aug 26 10:28 /proc/127391/fd/1 -> /sys/bus/pci/drivers/uio_pci_generic/new_id
l-wx------ 1 root    root    64 Aug 26 10:28 /proc/127379/fd/1 -> /sys/bus/pci/drivers/uio_pci_generic/new_id
l-wx------ 1 root    root    64 Aug 26 10:28 /proc/127374/fd/1 -> /sys/bus/pci/drivers/uio_pci_generic/new_id
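
A listing like the one above can be gathered by walking the per-process fd symlinks under /proc. This is only a sketch of one way to do it, not the command used for the listing; `find_writers` and its `proc_root` parameter are hypothetical names introduced for illustration.

```python
import os


def find_writers(target, proc_root="/proc"):
    """Return sorted (pid, fd) pairs whose fd symlink points at `target`."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed between listdir and readlink
            if link == target:
                hits.append((int(entry), int(fd)))
    return sorted(hits)
```

Calling `find_writers("/sys/bus/pci/drivers/uio_pci_generic/new_id")` as root would list the processes still holding new_id open.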

'requester1022' appears to be the DCA sysfs object (/sys/class/dca/requester+([0-9])) that is removed when, in this case it seems, igb disables DCA. I'm not sure how these calls tie together, though, so any hints on what's going on here and how to avoid it would be appreciated. :) If any additional info is needed, please let me know.
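
For reference, the sysfs writes involved in moving a set of devices to uio_pci_generic can be sketched as below. This is an assumption about the general sequence, not the exact script used here: `plan_rebind` is a hypothetical helper, the vendor/device ID string must be supplied by the caller, and the function only returns the planned writes without touching sysfs. It does not model the concurrency (several writers at once) that seems to be needed to hit the BUG.

```python
def plan_rebind(bdfs, vendor_device, driver="uio_pci_generic"):
    """Return the (sysfs_path, value) writes that would rebind `bdfs`.

    bdfs          -- PCI addresses such as "0000:80:04.7"
    vendor_device -- "vvvv dddd" hex pair for new_id (caller-supplied)
    Nothing is written; the caller decides whether to apply the plan.
    """
    writes = []
    for bdf in bdfs:
        # Detach the device from whatever driver currently owns it.
        writes.append((f"/sys/bus/pci/devices/{bdf}/driver/unbind", bdf))
    # Register the ID with the new driver, which then probes matching devices.
    writes.append((f"/sys/bus/pci/drivers/{driver}/new_id", vendor_device))
    return writes
```

Each unbind write corresponds to the `unbind_store` entry point seen in both call traces.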