Bug 16215
| Summary: | sysfs: cannot create duplicate filename '/class/net/bnep0' | | |
|---|---|---|---|
| Product: | File System | Reporter: | Janusz Krzysztofik (jkrzyszt) |
| Component: | SysFS | Assignee: | Greg Kroah-Hartman (greg) |
| Status: | CLOSED CODE_FIX | | |
| Severity: | high | CC: | akpm, ebiederm, error27, johannes, maciej.rutecki, marcel, rjw |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.35-rc1 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 16055 | | |
| Attachments: | example .config; dmesg | | |
Description
Janusz Krzysztofik
2010-06-15 14:55:49 UTC
You marked it as a regression. Which is the most recent kernel version which didn't have this problem?

Thanks.

Tuesday 15 June 2010 21:01:24 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=16215
>
> Andrew Morton <akpm@linux-foundation.org> changed:
>
>            What    |Removed                     |Added
> ---------------------------------------------------------------------------
>                  CC|                            |akpm@linux-foundation.org
>
> --- Comment #1 from Andrew Morton <akpm@linux-foundation.org> 2010-06-15 19:01:20 ---
> You marked it as a regression. Which is the most recent
> kernel version which didn't have this problem?
>
> Thanks.

Hi,

2.6.34 still works for me.

Thanks,
Janusz

Hm... "git log -p --no-merges v2.6.34.. net/bluetooth/bnep/" shows only three patches and none of them look suspicious. Could you post a complete dmesg? And maybe a .config (why not?).

Created attachment 26834
example .config

Created attachment 26835
dmesg
How I reproduce the problem: I make a PAN connection with "pand --search --service NAP --persist". Then, I break the connection with "pand --kill <bdaddress>". After a few seconds, pand tries to reconnect and exhibits the problem.

Thanks,
Janusz

I have looked into this. I found one bug, but I don't think it's related.

This seems like a race condition. Is /sys/class/net/bnep0 there after you see the problem? Does it eventually go away?

The locking here is pretty straight forward, we hold rtnl lock when we remove devices and also when we add them... I don't see where it can go wrong.

[ 307.300000] Alignment trap: asterisk (1571) PC=0x400b9b54 Instr=0x15840000 Address=0xbe92beab FSR 0x811
[ 367.760000] ------------[ cut here ]------------

I notice that those two are exactly 1 minute apart. Probably they are related. My understanding is that this message indicates a userspace bug. Do you know what's causing that? Is this a memory corruption thing?

Did you ever see the asterisk error with 2.6.34?

> --- Comment #7 from Dan Carpenter <error27@gmail.com> 2010-06-18 23:52:24
>
> This seems like a race condition.

Then, a very stable one and 100% reproducible ;).

> Is /sys/class/net/bnep0 there after you see the problem?

Exactly.

> Does it eventually go away?

Never before reboot.

> The locking here is pretty straight forward, we hold rtnl lock when we
> remove devices and also when we add them... I don't see where it can go
> wrong.

The device removal procedure seems incomplete. As seen from userspace, 'bnep0' disappears from the net device list reported by ip, the '[kbnepd bnep0]' thread disappears from the process list reported by ps, and the connection-related sysfs node, /sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/, also disappears from the sysfs tree, but a dangling symlink /sys/class/net/bnep0, pointing to the just-removed /sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/bnep0, is left behind and prevents the bnep0 name from being reused.
Other, more hardware-coupled net device types seem not to be affected. I was able to test the device removal/hotplug procedure successfully for USB-based eth (dm9601) and wlan (rt73usb) with wext compatibility.

> [ 307.300000] Alignment trap: asterisk (1571) PC=0x400b9b54
> Instr=0x15840000 Address=0xbe92beab FSR 0x811
> [ 367.760000] ------------[ cut here ]------------
>
> I notice that those two are exactly 1 minute apart. Probably they are
> related. My understanding is that this message indicates a userspace bug.
> Do you know what's causing that? Is this a memory corruption thing?
>
> Did you ever see the asterisk error with 2.6.34?

I've been observing this asterisk error message since I installed the package several months ago. It is displayed on every daemon restart, but TBH I never had any motivation to solve it since it works for me regardless :/. However, by preventing asterisk from starting on boot, I've verified that this bug is in no way responsible for our dangling /sys/class/net/bnep0 problem.

Thanks,
Janusz

Eric recently implemented "tagged directory" support for /sys/class/net/. Maybe this is something related? I added him to the CC list.

bugzilla-daemon@bugzilla.kernel.org writes:
> https://bugzilla.kernel.org/show_bug.cgi?id=16215
>
> --- Comment #9 from Dan Carpenter <error27@gmail.com> 2010-06-19 14:01:17 ---
> Eric recently implemented "tagged directory" support for /sys/class/net/.
> Maybe this is something related? I added him to the CC list.

It is. There is a bug in the driver core that my patch exposes.

The path to the network device that was:
/sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/bnep0

Should have been:
/sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/net/bnep0

This is the second network device this has hit, and in the other discussion no viable alternative to fixing the driver core has been found, although it has been argued that putting class devices under class devices is broken.
I will see if I can cook up a patch that will make it through code review of the driver core. Short of that, you can confirm I am on the right track by playing with the following patch:

Eric
---
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 9630fbd..3725f81 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -673,7 +673,7 @@ static struct kobject *get_device_parent(struct device *dev,
 		 */
 		if (parent == NULL)
 			parent_kobj = virtual_device_parent(dev);
-		else if (parent->class)
+		else if (parent->class == dev->class)
 			return &parent->kobj;
 		else
 			parent_kobj = &parent->kobj;

> --- Comment #10 from Eric W. Biederman <ebiederm@xmission.com> 2010-06-20 02:05:41 ---
>
> That path to the network device that was:
> /sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/bnep0
>
> Should have been:
> /sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/net/bnep0

Eric,

With your patch applied, the path to the bnep0 network device is correct now. This solves my problem of not being able to restore a broken PAN connection.

Thanks,
Janusz

Handled-By : Eric W. Biederman <ebiederm@xmission.com>
Patch : https://bugzilla.kernel.org/show_bug.cgi?id=16215#c10

*** Bug 16257 has been marked as a duplicate of this bug. ***

Ignore-Patch : https://bugzilla.kernel.org/show_bug.cgi?id=16215#c10

Fixed by commit 24b1442d01ae155ea716dfb94ed21605541c317d.
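The effect of the one-line change in the patch from comment #10 can be illustrated outside the kernel. What follows is a minimal userspace sketch in C, not the driver core code: struct model_class, struct model_device, in_class_dir_before_patch, in_class_dir_after_patch and format_sysfs_path are hypothetical names, the model only covers devices that have a class, and the class name "bluetooth" for hci0:42 in the usage note is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct class and struct device. */
struct model_class {
    const char *name;
};

struct model_device {
    const char *name;
    const struct model_class *cls;      /* class of this device */
    const struct model_device *parent;  /* NULL if there is no parent */
};

/*
 * Both functions return 1 if the device ends up in a class-named
 * directory (for example ".../net/") and 0 if it is placed directly
 * in its parent's directory.
 */

/* Old condition: any parent that has a class takes the child directly. */
static int in_class_dir_before_patch(const struct model_device *dev)
{
    assert(dev->cls != NULL);
    if (dev->parent == NULL)
        return 1;
    return dev->parent->cls != NULL ? 0 : 1;
}

/* Patched condition: only a parent of the same class takes it directly. */
static int in_class_dir_after_patch(const struct model_device *dev)
{
    assert(dev->cls != NULL);
    if (dev->parent == NULL)
        return 1;
    return dev->parent->cls == dev->cls ? 0 : 1;
}

/* Build the resulting path below parent_path into buf, snprintf-style. */
static int format_sysfs_path(char *buf, size_t size, const char *parent_path,
                             const struct model_device *dev, int in_class_dir)
{
    if (in_class_dir)
        return snprintf(buf, size, "%s/%s/%s",
                        parent_path, dev->cls->name, dev->name);
    return snprintf(buf, size, "%s/%s", parent_path, dev->name);
}
```

With a "net" device bnep0 whose parent hci0:42 belongs to a different class, in_class_dir_before_patch() returns 0 and the formatted path ends in hci0:42/bnep0, while in_class_dir_after_patch() returns 1 and the path ends in hci0:42/net/bnep0, matching the two paths quoted in comment #10.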