Bug 16215 - sysfs: cannot create duplicate filename '/class/net/bnep0'
Status: CLOSED CODE_FIX
Product: File System
Classification: Unclassified
Component: SysFS
Hardware: All Linux
Importance: P1 high
Assigned To: Greg Kroah-Hartman
Duplicates: 16257
Depends on:
Blocks: 16055
Reported: 2010-06-15 14:55 UTC by Janusz Krzysztofik
Modified: 2010-08-01 21:37 UTC (History)
Kernel Version: 2.6.35-rc1
Tree: Mainline
Regression: Yes


Attachments
example .config (42.88 KB, text/plain)
2010-06-18 00:49 UTC, Janusz Krzysztofik
dmesg (15.75 KB, text/plain)
2010-06-18 00:50 UTC, Janusz Krzysztofik

Description Janusz Krzysztofik 2010-06-15 14:55:49 UTC
After a PAN connection breaks, the sysfs entry does not seem to be cleaned up, so it is no longer possible to create a new connection.

[  361.930000] ------------[ cut here ]------------
[  361.950000] WARNING: at fs/sysfs/dir.c:451 sysfs_add_one+0x68/0x94()
[  361.950000] sysfs: cannot create duplicate filename '/class/net/bnep0'
[  361.960000] Modules linked in:
[  361.980000] [<c00286d8>] (unwind_backtrace+0x0/0x164) from [<c003a554>] (warn_slowpath_common+0x48/0x60)
[  362.000000] [<c003a554>] (warn_slowpath_common+0x48/0x60) from [<c003a600>] (warn_slowpath_fmt+0x30/0x40)
[  362.010000] [<c003a600>] (warn_slowpath_fmt+0x30/0x40) from [<c00cf958>] (sysfs_add_one+0x68/0x94)
[  362.030000] [<c00cf958>] (sysfs_add_one+0x68/0x94) from [<c00d0b94>] (sysfs_do_create_link+0x11c/0x1c8)
[  362.050000] [<c00d0b94>] (sysfs_do_create_link+0x11c/0x1c8) from [<c0175268>] (device_add+0x1e8/0x4ec)
[  362.060000] [<c0175268>] (device_add+0x1e8/0x4ec) from [<c022f6e4>] (register_netdevice+0x18c/0x28c)
[  362.080000] [<c022f6e4>] (register_netdevice+0x18c/0x28c) from [<c022f824>] (register_netdev+0x40/0x54)
[  362.100000] [<c022f824>] (register_netdev+0x40/0x54) from [<c029e018>] (bnep_add_connection+0x1e8/0x2b0)
[  362.110000] [<c029e018>] (bnep_add_connection+0x1e8/0x2b0) from [<c029ee58>] (bnep_sock_ioctl+0x11c/0x350)
[  362.130000] [<c029ee58>] (bnep_sock_ioctl+0x11c/0x350) from [<c021eff4>] (sock_ioctl+0x1fc/0x258)
[  362.130000] [<c021eff4>] (sock_ioctl+0x1fc/0x258) from [<c0097df4>] (vfs_ioctl+0x30/0xb0)
[  362.160000] [<c0097df4>] (vfs_ioctl+0x30/0xb0) from [<c00984c4>] (do_vfs_ioctl+0x540/0x590)
[  362.160000] [<c00984c4>] (do_vfs_ioctl+0x540/0x590) from [<c009854c>] (sys_ioctl+0x38/0x58)
[  362.180000] [<c009854c>] (sys_ioctl+0x38/0x58) from [<c0022ea0>] (ret_fast_syscall+0x0/0x2c)
[  362.200000] ---[ end trace 3568a713ba0db0af ]---
Comment 1 Andrew Morton 2010-06-15 19:01:20 UTC
You marked it as a regression.  Which is the most recent kernel version which didn't have this problem?

Thanks.
Comment 2 Janusz Krzysztofik 2010-06-15 22:48:11 UTC
Tuesday 15 June 2010 21:01:24 bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=16215
>
>
> Andrew Morton <akpm@linux-foundation.org> changed:
>
>            What    |Removed                     |Added
> ---------------------------------------------------------------------------
>                 CC|                            |akpm@linux-foundation.org
>
>
>
>
> --- Comment #1 from Andrew Morton <akpm@linux-foundation.org>  2010-06-15
> 19:01:20 --- You marked it as a regression.  Which is the most recent
> kernel version which didn't have this problem?
>
> Thanks.

Hi,
2.6.34 still works for me.

Thanks,
Janusz
Comment 3 Dan Carpenter 2010-06-17 11:36:30 UTC
Hm... "git log -p --no-merges v2.6.34.. net/bluetooth/bnep/" shows only three patches and none of them look suspicious.

Could you post a complete dmesg?  And maybe a .config (why not?).
Comment 4 Janusz Krzysztofik 2010-06-18 00:49:15 UTC
Created attachment 26834 [details]
example .config
Comment 5 Janusz Krzysztofik 2010-06-18 00:50:23 UTC
Created attachment 26835 [details]
dmesg
Comment 6 Janusz Krzysztofik 2010-06-18 01:02:33 UTC
How I reproduce the problem:
I make a PAN connection with "pand --search --service NAP --persist". Then I break the connection with "pand --kill <bdaddress>". After a few seconds, pand tries to reconnect and exhibits the problem.

Thanks,
Janusz
Comment 7 Dan Carpenter 2010-06-18 23:52:24 UTC
I have looked into this.  I found one bug, but I don't think it's related.

This seems like a race condition.  Is /sys/class/net/bnep0 there after you see the problem?  Does it eventually go away?

The locking here is pretty straightforward: we hold the rtnl lock when we remove devices and also when we add them...  I don't see where it can go wrong.

[  307.300000] Alignment trap: asterisk (1571) PC=0x400b9b54 Instr=0x15840000 Address=0xbe92beab FSR 0x811
[  367.760000] ------------[ cut here ]------------

I notice that those two are exactly 1 minute apart.  Probably they are related.  My understanding is that this message indicates a userspace bug.  Do you know what's causing that?  Is this a memory corruption thing?

Did you ever see the asterisk error with 2.6.34?
Comment 8 Janusz Krzysztofik 2010-06-19 12:08:45 UTC
> --- Comment #7 from Dan Carpenter <error27@gmail.com>  2010-06-18 23:52:24
>
> This seems like a race condition.  

Then, a very stable one and 100% reproducible ;).

> Is /sys/class/net/bnep0 there after you see the problem?  

Exactly.

> Does it eventually go away? 

Never before reboot.

> The locking here is pretty straight forward, we hold rtnl lock when we
> remove devices and also when we add them...  I don't see where it can go 
> wrong.

The device removal procedure seems incomplete. As seen from userspace, 'bnep0' 
disappears from the net device list reported by ip, the '[kbnepd bnep0]' thread 
disappears from the process list reported by ps, and the connection-related sysfs node, 
/sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/,
also disappears from the sysfs tree, but a dangling symlink, /sys/class/net/bnep0, 
pointing to the just-removed
/sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/bnep0, 
is left behind and prevents bnep0 from being reused.

Other, more hardware-coupled net device types do not seem to be affected. I was able to 
test the device removal/hotplug procedure successfully for USB-based eth (dm9601) 
and wlan (rt73usb) with wext compatibility.

> [  307.300000] Alignment trap: asterisk (1571) PC=0x400b9b54
> Instr=0x15840000 Address=0xbe92beab FSR 0x811
> [  367.760000] ------------[ cut here ]------------
>
> I notice that those two are exactly 1 minute apart.  Probably they are
> related. My understanding is that this message indicates a userspace bug. 
> Do you know what's causing that?  Is this a memory corruption thing?
>
> Did you ever see the asterisk error with 2.6.34?

I've been observing this asterisk error message since I installed the package 
several months ago. It is displayed on every daemon restart, but TBH I never 
had any motivation to solve it since it works for me regardless :/.

However, by preventing asterisk from starting on boot, I've verified that this 
bug is in no way responsible for our dangling /sys/class/net/bnep0 
problem.

Thanks,
Janusz
Comment 9 Dan Carpenter 2010-06-19 14:01:17 UTC
Eric recently implemented "tagged directory" support for /sys/class/net/.  Maybe this is something related?  I added him to the CC list.
Comment 10 Eric W. Biederman 2010-06-20 02:05:41 UTC
bugzilla-daemon@bugzilla.kernel.org writes:

> https://bugzilla.kernel.org/show_bug.cgi?id=16215
>
>
>
>
>
> --- Comment #9 from Dan Carpenter <error27@gmail.com>  2010-06-19 14:01:17 ---
> Eric recently implemented "tagged directory" support for /sys/class/net/. 
> Maybe this is something related?  I added him to the CC list.

It is.  There is a bug in the driver core that my patch exposes.

The path to the network device, which was:
/sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/bnep0

Should have been:
/sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/net/bnep0

This is the second network device to hit this, and in the other discussion
no viable alternative to fixing the driver core has been found, although
it has been argued that putting class devices under class devices is
broken.

I will see if I can cook up a patch that will make it through code review
of the driver core.

Short of that you can confirm I am on the right track by playing with
the following patch:

Eric


---

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 9630fbd..3725f81 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -673,7 +673,7 @@ static struct kobject *get_device_parent(struct device *dev,
 		 */
 		if (parent == NULL)
 			parent_kobj = virtual_device_parent(dev);
-		else if (parent->class)
+		else if (parent->class == dev->class)
 			return &parent->kobj;
 		else
 			parent_kobj = &parent->kobj;
Comment 11 Janusz Krzysztofik 2010-06-20 11:20:17 UTC
> --- Comment #10 from Eric W. Biederman <ebiederm@xmission.com>  2010-06-20
> 02:05:41 ---
>
> That path to the network device that was:
> /sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/bnep0
>
> Should have been:
> /sys/devices/platform/ohci/usb1/1-1/1-1:1.0/bluetooth/hci0/hci0:42/net/bnep0

Eric,
With your patch applied, the path to the bnep0 network device is correct now. 
This solves my problem of not being able to restore a broken PAN connection.

Thanks,
Janusz
Comment 12 Rafael J. Wysocki 2010-06-20 12:27:17 UTC
Handled-By : Eric W. Biederman <ebiederm@xmission.com>
Patch : https://bugzilla.kernel.org/show_bug.cgi?id=16215#c10
Comment 13 Rafael J. Wysocki 2010-07-24 22:45:48 UTC
*** Bug 16257 has been marked as a duplicate of this bug. ***
Comment 14 Rafael J. Wysocki 2010-08-01 13:40:58 UTC
Ignore-Patch : https://bugzilla.kernel.org/show_bug.cgi?id=16215#c10
Comment 15 Rafael J. Wysocki 2010-08-01 21:37:28 UTC
Fixed by commit 24b1442d01ae155ea716dfb94ed21605541c317d .
