Mika Westerberg <mika.westerberg@linux.intel.com> reported the following issue (and included a patch to fix it; see the URL). I'm just opening this bugzilla as a place for my notes about the details. Commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive) changed the behavior so that directory removals will be done recursively. This means that the sysfs group might already be removed if its parent directory has been removed. The current code outputs warnings similar to following log snippet when it detects that there is no group for the given kobject: WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0() sysfs group ffffffff81c6f1e0 not found for kobject 'host7' Modules linked in: CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13 Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013 Workqueue: kacpi_hotplug acpi_hotplug_work_fn 0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8 ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0 ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48 Call Trace: [<ffffffff817daab1>] dump_stack+0x45/0x56 [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0 [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50 [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70 [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0 [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50 [<ffffffff8142a0d0>] device_del+0x40/0x1b0 [<ffffffff8142a24d>] device_unregister+0xd/0x20 [<ffffffff8144131a>] scsi_remove_host+0xba/0x110 [<ffffffff8145f526>] ata_host_detach+0xc6/0x100 [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20 [<ffffffff812e8f48>] pci_device_remove+0x28/0x60 [<ffffffff8142d854>] __device_release_driver+0x64/0xd0 [<ffffffff8142d8de>] device_release_driver+0x1e/0x30 [<ffffffff8142d257>] bus_remove_device+0xf7/0x140 [<ffffffff8142a1b1>] device_del+0x121/0x1b0 [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20 [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0 [<ffffffff812fd90d>] hotplug_event+0xcd/0x160 [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60 [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22 [<ffffffff8105cf3a>] process_one_work+0x17a/0x430 [<ffffffff8105db29>] worker_thread+0x119/0x390 [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0 [<ffffffff81063a5d>] kthread+0xcd/0xf0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 On this particular machine I see ~16 of these message during Thunderbolt hot-unplug.
Created attachment 115341 [details] qemu setup I reproduced this problem on qemu using the attached setup by removing the AHCI controller with echo 1 > /sys/bus/pci/devices/0000:00:1f.2/remove
Created attachment 115351 [details] dmesg from qemu Attaching the dmesg log (including the "sysfs group not found" warnings) from qemu. The callgraph where the warnings come from is below. scsi_remove_host(shost) scsi_forget_host(shost) __scsi_remove_device(sdev) bsg_unregister_queue(sdev->request_queue) device_unregister(bcd->class_dev) device_del dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'" device_unregister(&sdev->sdev_dev) device_del dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'" class_intf->remove_dev # .remove_dev = sg_remove sg_remove device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index)) device_unregister device_del dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject 'sg0'" device_del(&sdev->sdev_gendev) dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'" bus_remove_device device_release_driver __device_release_driver sd_remove device_del(&sdkp->dev) dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'" del_gendisk(&sdkp->disk) delete_partition device_del dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject 'sda5'" device_remove_attrs device_remove_groups sysfs_remove_groups "sysfs group ffffffff81e3ba60 ('trace') not found for kobject 'sda5'" blk_unregister_queue(disk) blk_trace_remove_sysfs sysfs_remove_group "sysfs group ffffffff81e3ba60 ('trace') not found for kobject 'sda'" device_del(disk_to_dev(disk)) dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject 'sda'" device_unregister(&shost->shost_dev) device_del dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject 'host0'" device_del(&shost->shost_gendev) dpm_sysfs_remove sysfs_remove_group "sysfs group ffffffff81e70720 ('power') not found for kobject 'host0'"
I wonder what's the output if you printk() pos->s_name above the sysfs_remove_one(&acxt, pos) in __sysfs_remove_dir()?
So what happens here is we get device_del() for 0000:00:1f.2 first and that removes all stuff up to when it calls bus_remove_device() which stops the driver and triggers device_del() for ata_device on dev1.0 and then for dev1.0 itself. Next, it does device_del() for ata_link on link1 (which is under 0000:00:1f.2/ata1/) and for link1 itself. Then, it does device_del() for ata_port on ata1 and for ata1 itself (that is, 0000:00:1f.2/ata1/). That descends into the host0 subdirectory and removes it recursively before trying to unregister host0. That removes host0/target0:0:0/0:0:0:0/power among other things. So when it finally goes to delete bsg on 0:0:0:0, it finds that there's no "power" group below the bsg's child also called 0:0:0:0 (whoever designed that subsystem had a sick sense of humor) - because that group has been removed already. So in this particular case there is an ordering problem, because 0000:00:1f.2/ata1/host0/ should have been deleted before 0000:00:1f.2/ata1/. I'm not sure about the Thunderbolt case, though, will look into it tomorrow.
In any case I don't see a clean way to fix the above and the Mika's patch seems to be the simplest viable workaround.
Created attachment 115751 [details] Debug dmesg output from Acer Aspire S5 with a Thunderbolt unplug/replug cycle I've reproduced the issue on an Acer Aspire S5 w/ Thunderbolt and attached is a dmesg output containing a Thunderbolt unplug/replug cycle. The kernel 3.13-rc1 with a few additional patches on top including this one: Index: linux-pm/drivers/base/core.c =================================================================== --- linux-pm.orig/drivers/base/core.c +++ linux-pm/drivers/base/core.c @@ -1188,6 +1188,8 @@ void device_del(struct device *dev) struct device *parent = dev->parent; struct class_interface *class_intf; + dev_err(dev, "%s\n", __func__); + /* Notify clients of device removal. This call must come * before dpm_sysfs_remove(). */ Index: linux-pm/fs/sysfs/dir.c =================================================================== --- linux-pm.orig/fs/sysfs/dir.c +++ linux-pm/fs/sysfs/dir.c @@ -875,8 +875,10 @@ static void __sysfs_remove(struct sysfs_ do { pos = next; next = sysfs_next_descendant_post(pos, sd); - if (pos) + if (pos) { sysfs_remove_one(acxt, pos); + pr_err("%s: %s\n", __func__, pos->s_name); + } } while (next); }
To my eyes for PCI the problem is that pci_stop_dev() does a device_del() which removes the device's sysfs directories recursively. That includes the "power" group of the bus device which is then removed by pci_remove_bus().
Created attachment 115761 [details] PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev() So this patch fixes the issue for me without the Mika's patch.
However, I don't have SATA devices down my Thunderbolt link, so the patch from comment #8 is not sufficient to fix the trace from #description.
The following patches are sufficient to make all of the warnings go away without the Mika's patch: https://patchwork.kernel.org/patch/3226081/ https://patchwork.kernel.org/patch/3229651/ (the first one is analogous to the one in comment #8).
(In reply to Rafael J. Wysocki from comment #10) > The following patches are sufficient to make all of the warnings go away > without the Mika's patch: > > https://patchwork.kernel.org/patch/3226081/ > https://patchwork.kernel.org/patch/3229651/ > > (the first one is analogous to the one in comment #8). I still can reproduce this bug with vanilla 3.13.6 kernel: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 5345 at fs/sysfs/group.c:214 device_del+0x3b/0x1b0() sysfs group ffffffff81a62480 not found for kobject 'target7:0:0' Modules linked in: kvm_intel kvm wmi CPU: 2 PID: 5345 Comm: eject Not tainted 3.13.6 #1 Hardware name: ASUS All Series/H87M-PRO, BIOS 0502 04/08/2013 0000000000000009 ffffffff817b8404 ffff880336b6dd58 ffffffff81062f7d ffff8802fc6c5c00 ffff880336b6dda8 ffff8800b8044900 ffff8802e758b188 ffff8800b8044800 ffffffff81062fe7 ffffffff819538e0 0000000000000028 Call Trace: [<ffffffff817b8404>] ? dump_stack+0x49/0x6a [<ffffffff81062f7d>] ? warn_slowpath_common+0x6d/0x90 [<ffffffff81062fe7>] ? warn_slowpath_fmt+0x47/0x50 [<ffffffff815008bb>] ? device_del+0x3b/0x1b0 [<ffffffff8151c8ad>] ? scsi_target_reap_usercontext+0x1d/0x30 [<ffffffff81077d57>] ? execute_in_process_context+0x57/0x60 [<ffffffff8151f86c>] ? scsi_device_dev_release_usercontext+0x16c/0x1b0 [<ffffffff81077d57>] ? execute_in_process_context+0x57/0x60 [<ffffffff81500098>] ? device_release+0x28/0x90 [<ffffffff813c49b3>] ? kobject_cleanup+0x33/0x70 [<ffffffff81522af6>] ? scsi_disk_put+0x26/0x40 [<ffffffff8115b24d>] ? __blkdev_put+0x14d/0x190 [<ffffffff8115bc5c>] ? blkdev_close+0x1c/0x20 [<ffffffff8112a600>] ? __fput+0xb0/0x1f0 [<ffffffff8107bbdf>] ? task_work_run+0x8f/0xd0 [<ffffffff81002901>] ? do_notify_resume+0x61/0x90 [<ffffffff8107bac5>] ? task_work_add+0x45/0x60 [<ffffffff817bf6ea>] ? int_signal+0x12/0x17 ---[ end trace e0d8e994af6f4ede ]---
On Wed, Mar 12, 2014 at 06:27:44AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote: That makes sense as the patch is not yet merged into the tree :(
(In reply to Greg Kroah-Hartman from comment #12) > On Wed, Mar 12, 2014 at 06:27:44AM +0000, > bugzilla-daemon@bugzilla.kernel.org wrote: > > That makes sense as the patch is not yet merged into the tree :( If you're talking only about these two: https://patchwork.kernel.org/patch/3226081/ https://patchwork.kernel.org/patch/3229651/ they both have been merged. (I've checked sources myself).
I think I ran into this on 5.4.72: Jan 1 22:02:10 gentoo kernel: mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501) cb_idx mptbase_reply Jan 1 22:02:10 gentoo kernel: ------------[ cut here ]------------ Jan 1 22:02:10 gentoo kernel: sysfs group 'power' not found for kobject 'target1:0:2' Jan 1 22:02:10 gentoo kernel: WARNING: CPU: 1 PID: 27540 at fs/sysfs/group.c:278 sysfs_remove_group+0x74/0x80 Jan 1 22:02:10 gentoo kernel: Modules linked in: mptsas mptscsih mptbase scsi_transport_sas fuse xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables bpfilter br_netfilter bridge overlay raid1 md_mod tun tcp_diag udp_diag inet_diag ipv6 crc_ccitt cfg80211 rfkill 8021q garp mrp stp llc radeon snd_hda_codec_idt snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio gpio_ich iTCO_wdt snd_hda_intel iTCO_vendor_support kvm_intel i2c_algo_bit snd_intel_nhlt ttm ppdev snd_hda_codec kvm drm_kms_helper firewire_ohci snd_hda_core lpc_ich snd_hwdep pcspkr serio_raw i2c_i801 firewire_core irqbypass mfd_core crc_itu_t ftdi_sio snd_pcm drm ata_generic joydev backlight snd_timer coretemp usbserial pata_acpi parport_pc i2c_core snd parport mei_me soundcore mac_hid mei acpi_cpufreq ext4 mbcache jbd2 serpent_sse2_x86_64 serpent_generic dm_crypt dm_mod sd_mod xhci_pci xhci_hcd ahci libahci e1000e sata_sil24 Jan 1 22:02:10 gentoo kernel: pata_marvell uhci_hcd Jan 1 22:02:10 gentoo kernel: CPU: 1 PID: 27540 Comm: kworker/1:0 Not tainted 5.4.72-gentoo-x86_64 #1 Jan 1 22:02:10 gentoo kernel: Hardware name: /DP965LT, BIOS MQ96510J.86A.1761.2009.0326.0001 03/26/2009 Jan 1 22:02:10 gentoo kernel: Workqueue: mpt/0 mptsas_firmware_event_work [mptsas] Jan 1 22:02:10 gentoo kernel: RIP: 0010:sysfs_remove_group+0x74/0x80 Jan 1 22:02:10 gentoo kernel: Code: ff 5b 48 89 ef 5d 41 5c e9 79 be ff ff 48 89 ef e8 01 ba ff ff eb cc 49 8b 14 24 48 8b 33 48 c7 c7 f0 2d 0f b1 e8 5a 38 d2 ff <0f> 0b 5b 5d 41 5c c3 0f 1f 44 00 00 66 66 66 66 90 48 85 f6 74 31 Jan 1 22:02:10 gentoo kernel: RSP: 0018:ffffab98c4133bd8 EFLAGS: 00010286 Jan 1 22:02:10 gentoo kernel: RAX: 0000000000000000 RBX: ffffffffb0edafa0 RCX: 0000000000001e8a Jan 1 22:02:10 gentoo kernel: RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000247 Jan 1 22:02:10 gentoo kernel: RBP: 0000000000000000 R08: 0000000000001e8a R09: 0000000000000001 Jan 1 22:02:10 gentoo kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff93331e859c28 Jan 1 22:02:10 gentoo kernel: R13: ffff933421471010 R14: ffff933421470ff0 R15: ffff933421471000 Jan 1 22:02:10 gentoo kernel: FS: 0000000000000000(0000) GS:ffff933423a80000(0000) knlGS:0000000000000000 Jan 1 22:02:10 gentoo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 1 22:02:10 gentoo kernel: CR2: 00007fe8019a4008 CR3: 000000021d77a000 CR4: 00000000000006e0 Jan 1 22:02:10 gentoo kernel: Call Trace: Jan 1 22:02:10 gentoo kernel: device_del+0x81/0x370 Jan 1 22:02:10 gentoo kernel: ? attribute_container_device_trigger+0xb4/0xf0 Jan 1 22:02:10 gentoo kernel: scsi_target_reap+0x56/0x70 Jan 1 22:02:10 gentoo kernel: scsi_remove_target+0x1a0/0x1c0 Jan 1 22:02:10 gentoo kernel: sas_rphy_remove+0x52/0x60 [scsi_transport_sas] Jan 1 22:02:10 gentoo kernel: sas_port_delete+0x26/0x150 [scsi_transport_sas] Jan 1 22:02:10 gentoo kernel: mptsas_del_end_device.cold+0x139/0x1b4 [mptsas] Jan 1 22:02:10 gentoo kernel: mptsas_hotplug_work+0x467/0x750 [mptsas]