Bug 65281
Summary: | "sysfs group not found for kobject" when removing SCSI device | ||
---|---|---|---|
Product: | File System | Reporter: | Bjorn Helgaas (bjorn) |
Component: | SysFS | Assignee: | Greg Kroah-Hartman (greg) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | normal | CC: | jernej.jakob, rjw, vovan |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://lkml.kernel.org/r/1384866598-19716-1-git-send-email-mika.westerberg@linux.intel.com | ||
Kernel Version: | 3.12 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
qemu setup
dmesg from qemu
Debug dmesg output from Acer Aspire S5 with a Thunderbolt unplug/replug cycle
PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev() |
Description
Bjorn Helgaas
2013-11-20 22:42:17 UTC
Created attachment 115341 [details]
qemu setup
I reproduced this problem on qemu using the attached setup by removing the AHCI controller with
echo 1 > /sys/bus/pci/devices/0000:00:1f.2/remove
Created attachment 115351 [details]
dmesg from qemu
Attaching the dmesg log (including the "sysfs group not found" warnings) from qemu. The callgraph where the warnings come from is below.
scsi_remove_host(shost)
  scsi_forget_host(shost)
    __scsi_remove_device(sdev)
      bsg_unregister_queue(sdev->request_queue)
        device_unregister(bcd->class_dev)
          device_del
            dpm_sysfs_remove
              sysfs_remove_group
                "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'"
      device_unregister(&sdev->sdev_dev)
        device_del
          dpm_sysfs_remove
            sysfs_remove_group
              "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'"
          class_intf->remove_dev # .remove_dev = sg_remove
            sg_remove
              device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index))
                device_unregister
                  device_del
                    dpm_sysfs_remove
                      sysfs_remove_group
                        "sysfs group ffffffff81e70720 ('power') not found for kobject 'sg0'"
      device_del(&sdev->sdev_gendev)
        dpm_sysfs_remove
          sysfs_remove_group
            "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'"
        bus_remove_device
          device_release_driver
            __device_release_driver
              sd_remove
                device_del(&sdkp->dev)
                  dpm_sysfs_remove
                    sysfs_remove_group
                      "sysfs group ffffffff81e70720 ('power') not found for kobject '0:0:0:0'"
                del_gendisk(&sdkp->disk)
                  delete_partition
                    device_del
                      dpm_sysfs_remove
                        sysfs_remove_group
                          "sysfs group ffffffff81e70720 ('power') not found for kobject 'sda5'"
                      device_remove_attrs
                        device_remove_groups
                          sysfs_remove_groups
                            "sysfs group ffffffff81e3ba60 ('trace') not found for kobject 'sda5'"
                  blk_unregister_queue(disk)
                    blk_trace_remove_sysfs
                      sysfs_remove_group
                        "sysfs group ffffffff81e3ba60 ('trace') not found for kobject 'sda'"
                  device_del(disk_to_dev(disk))
                    dpm_sysfs_remove
                      sysfs_remove_group
                        "sysfs group ffffffff81e70720 ('power') not found for kobject 'sda'"
  device_unregister(&shost->shost_dev)
    device_del
      dpm_sysfs_remove
        sysfs_remove_group
          "sysfs group ffffffff81e70720 ('power') not found for kobject 'host0'"
  device_del(&shost->shost_gendev)
    dpm_sysfs_remove
      sysfs_remove_group
        "sysfs group ffffffff81e70720 ('power') not found for kobject 'host0'"
I wonder what the output is if you printk() pos->s_name above the sysfs_remove_one(&acxt, pos) in __sysfs_remove_dir()?

So what happens here is that we get device_del() for 0000:00:1f.2 first, and that removes everything up to the point where it calls bus_remove_device(), which stops the driver and triggers device_del() for ata_device on dev1.0 and then for dev1.0 itself. Next, it does device_del() for ata_link on link1 (which is under 0000:00:1f.2/ata1/) and for link1 itself. Then it does device_del() for ata_port on ata1 and for ata1 itself (that is, 0000:00:1f.2/ata1/). That descends into the host0 subdirectory and removes it recursively before trying to unregister host0. That removes host0/target0:0:0/0:0:0:0/power among other things.
So when it finally goes to delete bsg on 0:0:0:0, it finds that there's no "power" group below the bsg's child also called 0:0:0:0 (whoever designed that subsystem had a sick sense of humor), because that group has been removed already.
So in this particular case there is an ordering problem, because 0000:00:1f.2/ata1/host0/ should have been deleted before 0000:00:1f.2/ata1/. I'm not sure about the Thunderbolt case, though; I will look into it tomorrow.
In any case I don't see a clean way to fix the above, and Mika's patch seems to be the simplest viable workaround.

Created attachment 115751 [details]
Debug dmesg output from Acer Aspire S5 with a Thunderbolt unplug/replug cycle
I've reproduced the issue on an Acer Aspire S5 with Thunderbolt; attached is dmesg output covering a Thunderbolt unplug/replug cycle.
The kernel is 3.13-rc1 with a few additional patches on top, including this one:
Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -1188,6 +1188,8 @@ void device_del(struct device *dev)
 	struct device *parent = dev->parent;
 	struct class_interface *class_intf;
 
+	dev_err(dev, "%s\n", __func__);
+
 	/* Notify clients of device removal. This call must come
 	 * before dpm_sysfs_remove().
 	 */
Index: linux-pm/fs/sysfs/dir.c
===================================================================
--- linux-pm.orig/fs/sysfs/dir.c
+++ linux-pm/fs/sysfs/dir.c
@@ -875,8 +875,10 @@ static void __sysfs_remove(struct sysfs_
 	do {
 		pos = next;
 		next = sysfs_next_descendant_post(pos, sd);
-		if (pos)
+		if (pos) {
 			sysfs_remove_one(acxt, pos);
+			pr_err("%s: %s\n", __func__, pos->s_name);
+		}
 	} while (next);
 }
To my eyes, the problem for PCI is that pci_stop_dev() does a device_del(), which removes the device's sysfs directories recursively. That includes the "power" group of the bus device, which is then removed by pci_remove_bus().

Created attachment 115761 [details]
PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()
So this patch fixes the issue for me without Mika's patch.
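The ordering problem analysed above (a parent's sysfs directory is removed recursively before a child's own device_del() runs, so the child no longer finds its "power" group) can be illustrated with a small user-space toy model. This is only a sketch and not kernel code: `struct node`, `toy_device_del()` and `demo_removal_order()` are hypothetical names invented for the illustration, and the tree is simplified to ata1/host0/0:0:0:0 (the target0:0:0 level is left out).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 16

/* Toy stand-in for a sysfs directory entry; NOT the kernel's data structure. */
struct node {
    const char *name;
    int parent;   /* index of the parent node, -1 for the root */
    int present;  /* 1 while the entry still exists */
};

static struct node tree[MAX_NODES];
static int nr_nodes;

static int add_node(const char *name, int parent)
{
    assert(nr_nodes < MAX_NODES);
    tree[nr_nodes].name = name;
    tree[nr_nodes].parent = parent;
    tree[nr_nodes].present = 1;
    return nr_nodes++;
}

/* Remove an entry and everything below it, like a recursive directory removal. */
static void remove_recursive(int idx)
{
    for (int i = 0; i < nr_nodes; i++)
        if (tree[i].present && tree[i].parent == idx)
            remove_recursive(i);
    tree[idx].present = 0;
}

/*
 * Hypothetical model of device_del(): first drop the device's own "power"
 * group, warning if it is already gone, then remove the directory itself.
 * Returns 1 if a warning was printed, 0 otherwise.
 */
static int toy_device_del(int dev)
{
    int warned = 1;

    for (int i = 0; i < nr_nodes; i++) {
        if (tree[i].present && tree[i].parent == dev &&
            strcmp(tree[i].name, "power") == 0) {
            tree[i].present = 0;
            warned = 0;
        }
    }
    if (warned)
        printf("sysfs group 'power' not found for kobject '%s'\n",
               tree[dev].name);
    remove_recursive(dev);
    return warned;
}

/*
 * Build ata1/host0/0:0:0:0, each with a "power" group, and delete the
 * devices parent-first or child-first. Returns the number of warnings.
 */
int demo_removal_order(int parent_first)
{
    int warnings = 0;

    nr_nodes = 0;
    int ata1 = add_node("ata1", -1);
    add_node("power", ata1);
    int host0 = add_node("host0", ata1);
    add_node("power", host0);
    int sdev = add_node("0:0:0:0", host0);
    add_node("power", sdev);

    if (parent_first) {
        warnings += toy_device_del(ata1);
        warnings += toy_device_del(host0);
        warnings += toy_device_del(sdev);
    } else {
        warnings += toy_device_del(sdev);
        warnings += toy_device_del(host0);
        warnings += toy_device_del(ata1);
    }
    return warnings;
}
```

In this toy tree, calling demo_removal_order(1) (parent first) reports two warnings, one each for host0 and 0:0:0:0, while demo_removal_order(0) (children first) reports none. That matches the observation that the warnings go away once the children are deleted before their parent's directory.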
However, I don't have SATA devices down my Thunderbolt link, so the patch from comment #8 is not sufficient to fix the trace from the description.

The following patches are sufficient to make all of the warnings go away without Mika's patch:
https://patchwork.kernel.org/patch/3226081/
https://patchwork.kernel.org/patch/3229651/
(the first one is analogous to the one in comment #8).

(In reply to Rafael J. Wysocki from comment #10)
> The following patches are sufficient to make all of the warnings go away
> without Mika's patch:
>
> https://patchwork.kernel.org/patch/3226081/
> https://patchwork.kernel.org/patch/3229651/
>
> (the first one is analogous to the one in comment #8).
I can still reproduce this bug with a vanilla 3.13.6 kernel:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 5345 at fs/sysfs/group.c:214 device_del+0x3b/0x1b0()
sysfs group ffffffff81a62480 not found for kobject 'target7:0:0'
Modules linked in: kvm_intel kvm wmi
CPU: 2 PID: 5345 Comm: eject Not tainted 3.13.6 #1
Hardware name: ASUS All Series/H87M-PRO, BIOS 0502 04/08/2013
0000000000000009 ffffffff817b8404 ffff880336b6dd58 ffffffff81062f7d
ffff8802fc6c5c00 ffff880336b6dda8 ffff8800b8044900 ffff8802e758b188
ffff8800b8044800 ffffffff81062fe7 ffffffff819538e0 0000000000000028
Call Trace:
[<ffffffff817b8404>] ? dump_stack+0x49/0x6a
[<ffffffff81062f7d>] ? warn_slowpath_common+0x6d/0x90
[<ffffffff81062fe7>] ? warn_slowpath_fmt+0x47/0x50
[<ffffffff815008bb>] ? device_del+0x3b/0x1b0
[<ffffffff8151c8ad>] ? scsi_target_reap_usercontext+0x1d/0x30
[<ffffffff81077d57>] ? execute_in_process_context+0x57/0x60
[<ffffffff8151f86c>] ? scsi_device_dev_release_usercontext+0x16c/0x1b0
[<ffffffff81077d57>] ? execute_in_process_context+0x57/0x60
[<ffffffff81500098>] ? device_release+0x28/0x90
[<ffffffff813c49b3>] ? kobject_cleanup+0x33/0x70
[<ffffffff81522af6>] ? scsi_disk_put+0x26/0x40
[<ffffffff8115b24d>] ? __blkdev_put+0x14d/0x190
[<ffffffff8115bc5c>] ? blkdev_close+0x1c/0x20
[<ffffffff8112a600>] ? __fput+0xb0/0x1f0
[<ffffffff8107bbdf>] ? task_work_run+0x8f/0xd0
[<ffffffff81002901>] ? do_notify_resume+0x61/0x90
[<ffffffff8107bac5>] ? task_work_add+0x45/0x60
[<ffffffff817bf6ea>] ? int_signal+0x12/0x17
---[ end trace e0d8e994af6f4ede ]---

On Wed, Mar 12, 2014 at 06:27:44AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
That makes sense as the patch is not yet merged into the tree :(

(In reply to Greg Kroah-Hartman from comment #12)
> On Wed, Mar 12, 2014 at 06:27:44AM +0000,
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> That makes sense as the patch is not yet merged into the tree :(
If you're talking only about these two:
https://patchwork.kernel.org/patch/3226081/
https://patchwork.kernel.org/patch/3229651/
they have both been merged (I've checked the sources myself).

I think I ran into this on 5.4.72:
Jan 1 22:02:10 gentoo kernel: mptbase: ioc0: LogInfo(0x30030501): Originator={IOP}, Code={Invalid Page}, SubCode(0x0501) cb_idx mptbase_reply
Jan 1 22:02:10 gentoo kernel: ------------[ cut here ]------------
Jan 1 22:02:10 gentoo kernel: sysfs group 'power' not found for kobject 'target1:0:2'
Jan 1 22:02:10 gentoo kernel: WARNING: CPU: 1 PID: 27540 at fs/sysfs/group.c:278 sysfs_remove_group+0x74/0x80
Jan 1 22:02:10 gentoo kernel: Modules linked in: mptsas mptscsih mptbase scsi_transport_sas fuse xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_tables bpfilter br_netfilter bridge overlay raid1 md_mod tun tcp_diag udp_diag inet_diag ipv6 crc_ccitt cfg80211 rfkill 8021q garp mrp stp llc radeon snd_hda_codec_idt snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio gpio_ich iTCO_wdt snd_hda_intel iTCO_vendor_support kvm_intel i2c_algo_bit snd_intel_nhlt ttm ppdev snd_hda_codec kvm drm_kms_helper firewire_ohci snd_hda_core lpc_ich snd_hwdep pcspkr serio_raw i2c_i801 firewire_core irqbypass mfd_core crc_itu_t ftdi_sio snd_pcm drm ata_generic joydev backlight snd_timer coretemp usbserial pata_acpi parport_pc i2c_core snd parport mei_me soundcore mac_hid mei acpi_cpufreq ext4 mbcache jbd2 serpent_sse2_x86_64 serpent_generic dm_crypt dm_mod sd_mod xhci_pci xhci_hcd ahci libahci e1000e sata_sil24
Jan 1 22:02:10 gentoo kernel: pata_marvell uhci_hcd
Jan 1 22:02:10 gentoo kernel: CPU: 1 PID: 27540 Comm: kworker/1:0 Not tainted 5.4.72-gentoo-x86_64 #1
Jan 1 22:02:10 gentoo kernel: Hardware name: /DP965LT, BIOS MQ96510J.86A.1761.2009.0326.0001 03/26/2009
Jan 1 22:02:10 gentoo kernel: Workqueue: mpt/0 mptsas_firmware_event_work [mptsas]
Jan 1 22:02:10 gentoo kernel: RIP: 0010:sysfs_remove_group+0x74/0x80
Jan 1 22:02:10 gentoo kernel: Code: ff 5b 48 89 ef 5d 41 5c e9 79 be ff ff 48 89 ef e8 01 ba ff ff eb cc 49 8b 14 24 48 8b 33 48 c7 c7 f0 2d 0f b1 e8 5a 38 d2 ff <0f> 0b 5b 5d 41 5c c3 0f 1f 44 00 00 66 66 66 66 90 48 85 f6 74 31
Jan 1 22:02:10 gentoo kernel: RSP: 0018:ffffab98c4133bd8 EFLAGS: 00010286
Jan 1 22:02:10 gentoo kernel: RAX: 0000000000000000 RBX: ffffffffb0edafa0 RCX: 0000000000001e8a
Jan 1 22:02:10 gentoo kernel: RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000247
Jan 1 22:02:10 gentoo kernel: RBP: 0000000000000000 R08: 0000000000001e8a R09: 0000000000000001
Jan 1 22:02:10 gentoo kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff93331e859c28
Jan 1 22:02:10 gentoo kernel: R13: ffff933421471010 R14: ffff933421470ff0 R15: ffff933421471000
Jan 1 22:02:10 gentoo kernel: FS: 0000000000000000(0000) GS:ffff933423a80000(0000) knlGS:0000000000000000
Jan 1 22:02:10 gentoo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 1 22:02:10 gentoo kernel: CR2: 00007fe8019a4008 CR3: 000000021d77a000 CR4: 00000000000006e0
Jan 1 22:02:10 gentoo kernel: Call Trace:
Jan 1 22:02:10 gentoo kernel: device_del+0x81/0x370
Jan 1 22:02:10 gentoo kernel: ? attribute_container_device_trigger+0xb4/0xf0
Jan 1 22:02:10 gentoo kernel: scsi_target_reap+0x56/0x70
Jan 1 22:02:10 gentoo kernel: scsi_remove_target+0x1a0/0x1c0
Jan 1 22:02:10 gentoo kernel: sas_rphy_remove+0x52/0x60 [scsi_transport_sas]
Jan 1 22:02:10 gentoo kernel: sas_port_delete+0x26/0x150 [scsi_transport_sas]
Jan 1 22:02:10 gentoo kernel: mptsas_del_end_device.cold+0x139/0x1b4 [mptsas]
Jan 1 22:02:10 gentoo kernel: mptsas_hotplug_work+0x467/0x750 [mptsas]