Bug 7952

Summary: (Patch queued) slattach only works every other time
Product: Networking
Component: Other
Status: RESOLVED CODE_FIX
Severity: normal
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.29
Subsystem:
Regression: No
Bisected commit-id:
Reporter: Martin Fuzzey (mfuzzey)
Assignee: Alexey Dobriyan (adobriyan)
CC: adobriyan, alan, ben, jgarzik, protasnb

Description Martin Fuzzey 2007-02-06 12:52:36 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.16
Distribution: Debian Etch
Hardware Environment: x86 with serial port (including qemu)
Software Environment: slattach
Problem Description:

The first time slattach is run to set up a SLIP line, all is OK.
If the slattach process is then killed and restarted, it fails with the message:
SLIP_set_disc(1): File exists
The problem still occurs in the 2.6.20-rc6 kernel.

dmesg shows:
object_add failed for sl0 with -EEXIST, don't try to register things 
with the same name in the same directory.
 [<c01b7b54>] kobject_add+0x147/0x16d
 [<c0211209>] class_device_add+0x9d/0x3b3
 [<c022829d>] register_netdevice+0x21a/0x2d0
 [<c8903213>] slip_open+0x3a1/0x4e2 [slip]
 [<c01fc709>] tty_ioctl+0x922/0xbac
 [<c0117778>] default_wake_function+0x0/0xc
 [<c01fd4bd>] n_tty_open+0x0/0x88
 [<c01fd498>] n_tty_close+0x0/0x25
 [<c01fd3ae>] n_tty_flush_buffer+0x0/0x3b
 [<c01fd2ab>] n_tty_chars_in_buffer+0x0/0x5b
 [<c01fe5eb>] read_chan+0x0/0x551
 [<c01fd545>] write_chan+0x0/0x294
 [<c01fef41>] n_tty_ioctl+0x0/0x40d
 [<c01fd082>] n_tty_set_termios+0x0/0x1cc
 [<c01fe4d2>] normal_poll+0x0/0x119
 [<c01fd7d9>] n_tty_receive_buf+0x0/0xcf9
 [<c01fd24e>] n_tty_write_wakeup+0x0/0x27
 [<c88b2474>] parport_pc_interrupt+0x1a/0x42 [parport_pc]
 [<c013fb83>] handle_IRQ_event+0x23/0x49
 [<c013fc5c>] __do_IRQ+0xb3/0xe8
 [<c013fc80>] __do_IRQ+0xd7/0xe8
 [<c0130170>] hrtimer_run_queues+0xcf/0x157
 [<c016937b>] do_ioctl+0x47/0x5d
 [<c01695db>] vfs_ioctl+0x24a/0x25c
 [<c0121c58>] tasklet_action+0x55/0xaf
 [<c0169635>] sys_ioctl+0x48/0x5f
 [<c0102c11>] sysenter_past_esp+0x56/0x79


Steps to reproduce:
(requires a serial port but nothing needs to be attached to it):
# slattach -L -vd -p slip -s 115200 /dev/ttyS0
slattach: tty_open: looking for lock
slattach: tty_open: trying to open /dev/ttyS0
slattach: tty_open: /dev/ttyS0 (fd=3) slattach: tty_set_speed: 115200
slattach: tty_set_databits: 8
slattach: tty_set_stopbits: 1
slattach: tty_set_parity: N
slip started on /dev/ttyS0 interface sl0

The above is OK. Now kill the process with Ctrl-C:

slattach: tty_set_speed: 0
# slattach -L -vd -p slip -s 115200 /dev/ttyS0
slattach: tty_open: looking for lock
slattach: tty_open: trying to open /dev/ttyS0
slattach: tty_open: /dev/ttyS0 (fd=3) slattach: tty_set_speed: 115200
slattach: tty_set_databits: 8
slattach: tty_set_stopbits: 1
slattach: tty_set_parity: N
SLIP_set_disc(1): File exists

# dmesg
Gives trace shown above
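
For anyone who wants to reproduce this without slattach, the sketch below issues the same kind of request directly. The trace (tty_ioctl -> slip_open) suggests that the failing call is the ioctl that switches the tty to the SLIP line discipline, so the sketch opens the port, sets the discipline, closes the port and repeats. The constants TIOCSETD = 0x5423 and N_SLIP = 1 are assumed from the x86 Linux headers, and the helper names attach_slip and attach_twice are made up for illustration; none of them come from this report. It needs root, the slip module and a real serial port to show the failure.

```python
import errno
import fcntl
import os
import struct
import time

# Assumed values from the x86 Linux headers; not taken from this report.
TIOCSETD = 0x5423   # ioctl that sets a tty's line discipline
N_SLIP = 1          # SLIP discipline number, presumably the "1" in SLIP_set_disc(1)


def attach_slip(path):
    """Open a tty and switch it to the SLIP line discipline.

    Returns (fd, 0) on success or (None, errno_value) on failure.
    """
    try:
        # O_NONBLOCK so the open does not wait for a carrier signal.
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    except OSError as exc:
        return None, exc.errno
    try:
        fcntl.ioctl(fd, TIOCSETD, struct.pack("i", N_SLIP))
    except OSError as exc:
        os.close(fd)
        return None, exc.errno
    return fd, 0


def attach_twice(path, delay=0.0):
    """Attach, detach by closing the tty, wait `delay` seconds, attach again.

    Returns the errno value (0 for success) of each of the two attempts.
    """
    results = []
    for attempt in range(2):
        fd, err = attach_slip(path)
        results.append(err)
        if fd is not None:
            os.close(fd)   # closing the tty drops the line discipline
        if attempt == 0:
            time.sleep(delay)
    return results
```

On an affected kernel, attach_twice("/dev/ttyS0") run as root would be expected to return 0 for the first attempt and errno.EEXIST ("File exists") for the second. Passing a non-zero delay is a cheap way to check whether the failure depends on how long one waits between the two attempts.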

I believe this is caused by this changeset:
http://www2.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commit;h=b17a7c179dd3ce7d04373fddf660eda21efc9db9

Regards,

Martin
Comment 1 Anonymous Emailer 2007-02-06 13:48:42 UTC
Reply-To: akpm@linux-foundation.org



Begin forwarded message:

Date: Tue, 6 Feb 2007 13:01:55 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 7952] New: slattach only works every other time


http://bugzilla.kernel.org/show_bug.cgi?id=7952

           Summary: slattach only works every other time
    Kernel Version: 2.6.18
            Status: NEW
          Severity: normal
             Owner: acme@conectiva.com.br
         Submitter: mfuzzey@mailclub.net



Comment 2 Jarek Poplawski 2007-02-12 00:23:18 UTC
On 06-02-2007 22:57, Andrew Morton wrote:
...
> First time slattach is run to set up a SLIP line all is ok.
> If slattach process is then killed and restarted it fails with message:
> SLIP_set_disc(1): File exists
> Problem still occurs in 2.6.20rc6 kernel
> 
> dmesg shows:
> object_add failed for sl0 with -EEXIST, don't try to register things 
> with the same name in the same directory.
>  [<c01b7b54>] kobject_add+0x147/0x16d
>  [<c0211209>] class_device_add+0x9d/0x3b3
>  [<c022829d>] register_netdevice+0x21a/0x2d0
>  [<c8903213>] slip_open+0x3a1/0x4e2 [slip]
>  [<c01fc709>] tty_ioctl+0x922/0xbac
... 
> Steps to reproduce:
> (requires a serial port but nothing needs to be attached to it):
> # slattach -L -vd -p slip -s 115200 /dev/ttyS0
...
> slip started on /dev/ttyS0 interface sl0
> 
> Above is OK, now kill process with CTRL-C
> 
> slattach: tty_set_speed: 0
> # slattach -L -vd -p slip -s 115200 /dev/ttyS0
...
> SLIP_set_disc(1): File exists
...
> I believe this is called by this changeset :
> http://www2.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commit;h=b17a7c179dd3ce7d04373fddf660eda21efc9db9

I think Martin is probably right here.

It would be useful to check if time has anything to do
with this and wait longer (e.g. >= 1 min.) before the
second slattach.

Anyway, even if there is some other reason, the above
trace shows (IMHO) some inconsistency in register/
unregister_netdevice: if class_device_add is reached
it means the name is valid (so was unregistered) and
EEXIST from netdev_register_sysfs is wrong about the
state of this device. So maybe there should be some
warning plus some delayed action instead of register
cancelled?

Regards,
Jarek P. 

Comment 3 Anonymous Emailer 2007-02-12 09:36:05 UTC
Reply-To: shemminger@linux-foundation.org

On Mon, 12 Feb 2007 09:36:09 +0100
Jarek Poplawski <jarkao2@o2.pl> wrote:

> On 06-02-2007 22:57, Andrew Morton wrote:
> ...
> > First time slattach is run to set up a SLIP line all is ok.
> > If slattach process is then killed and restarted it fails with message:
> > SLIP_set_disc(1): File exists
> > Problem still occurs in 2.6.20rc6 kernel
> > 
> > dmesg shows:
> > object_add failed for sl0 with -EEXIST, don't try to register things 
> > with the same name in the same directory.
> >  [<c01b7b54>] kobject_add+0x147/0x16d
> >  [<c0211209>] class_device_add+0x9d/0x3b3
> >  [<c022829d>] register_netdevice+0x21a/0x2d0
> >  [<c8903213>] slip_open+0x3a1/0x4e2 [slip]
> >  [<c01fc709>] tty_ioctl+0x922/0xbac
> ... 
> > Steps to reproduce:
> > (requires a serial port but nothing needs to be attached to it):
> > # slattach -L -vd -p slip -s 115200 /dev/ttyS0
> ...
> > slip started on /dev/ttyS0 interface sl0
> > 
> > Above is OK, now kill process with CTRL-C
> > 
> > slattach: tty_set_speed: 0
> > # slattach -L -vd -p slip -s 115200 /dev/ttyS0
> ...
> > SLIP_set_disc(1): File exists
> ...
> > I believe this is called by this changeset :
> > http://www2.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.17.y.git;a=commit;h=b17a7c179dd3ce7d04373fddf660eda21efc9db9
> 
> I think Martin is probably right here.
> 
> It would be useful to check if time has anything to do
> with this and wait longer (e.g. >= 1 min.) before the
> second slattach.
> 
> Anyway, even if there is some other reason, the above
> trace shows (IMHO) some inconsistency in register/
> unregister_netdevice: if class_device_add is reached
> it means the name is valid (so was unregistered) and
> EEXIST from netdev_register_sysfs is wrong about the
> state of this device. So maybe there should be some
> warning plus some delayed action instead of register
> cancelled?
> 
> Regards,
> Jarek P. 

The problem is that the code in sl_alloc() tries to clear out
a net device by calling unregister_netdevice(), but the device won't
actually disappear until after rtnl_unlock(). This whole idea of
searching for unused devices is racy crap and needs to go.
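
To make the ordering concrete, here is a toy user-space model of that sequence. It is not kernel code. The class and method names only mirror the kernel functions mentioned in this thread, and the split into a name table that is cleared immediately and a sysfs entry that is cleared only at unlock is an assumption drawn from the description above and from the kobject_add failure in the trace.

```python
import errno


class ToyNetStack:
    """Toy model: interface names and sysfs entries are tracked separately."""

    def __init__(self):
        self.names = set()   # names seen by the "is this name taken?" check
        self.sysfs = set()   # sysfs directories, e.g. the one for sl0
        self.todo = []       # unregistered devices awaiting final cleanup
        self.locked = False

    def rtnl_lock(self):
        self.locked = True

    def rtnl_unlock(self):
        # Deferred cleanup runs here: only now does the sysfs entry go away.
        for name in self.todo:
            self.sysfs.discard(name)
        self.todo.clear()
        self.locked = False

    def unregister_netdevice(self, name):
        assert self.locked
        self.names.discard(name)   # the name is free immediately
        self.todo.append(name)     # the sysfs removal is deferred

    def register_netdevice(self, name):
        assert self.locked
        if name in self.names:
            return -errno.EEXIST
        if name in self.sysfs:     # stale entry from a pending unregister
            return -errno.EEXIST
        self.names.add(name)
        self.sysfs.add(name)
        return 0


class ToySlip:
    """Toy model of a driver that recycles a leftover device on open."""

    def __init__(self, stack):
        self.stack = stack
        self.slot = None   # device left behind by an earlier attach

    def slip_open(self):
        self.stack.rtnl_lock()
        try:
            if self.slot is not None:
                # sl_alloc()-style recycling: drop the old device, reuse its name.
                self.stack.unregister_netdevice(self.slot)
                self.slot = None
            err = self.stack.register_netdevice("sl0")
            if err == 0:
                self.slot = "sl0"
            return err
        finally:
            self.stack.rtnl_unlock()
```

In this model, calling slip_open() four times in a row on one ToySlip gives success, -EEXIST, success, -EEXIST. The failed attempt still reaches rtnl_unlock(), which finishes the pending cleanup, so the next attempt finds nothing stale. That matches the "every other time" pattern in the summary, under the assumptions stated above.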

Comment 4 Natalie Protasevich 2007-07-07 19:20:26 UTC
Any updates on this problem?
Thanks.
Comment 5 Alan 2009-05-21 16:10:48 UTC
Rewrote the relevant code to use the destructor hooks in the modern kernel - this seems to cure it and cleans the code up a huge amount.
Comment 6 Ben Hutchings 2009-05-31 23:17:30 UTC
(In reply to comment #5)
> Rewritten the relevant code to use the destructor hooks in the modern kernel -
> seem to cure it and clean the code up a huge amount

What is the state of these changes?  I don't see them in net-next-2.6 or posted to any mailing list.
Comment 7 Alan 2009-06-01 07:08:12 UTC
They are in the ttydev tree
Comment 8 Ben Hutchings 2009-06-01 10:06:14 UTC
(In reply to comment #7)
> They are in the ttydev tree

That's surprisingly hard to find (Google seems to have forgotten you), but I've got it now, thanks:

http://www.linux.org.uk/~alan/ttydev/net-slip-ttyfix
Comment 9 Alexey Dobriyan 2009-11-15 17:33:42 UTC
commit 5342b77c4123ba39f911d92a813295fb3bb21f69
"slip: Clean up create and destroy"