Bug 4615 - Modem connection stalls out.
Summary: Modem connection stalls out.
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Serial
Hardware: i386 Linux
Importance: P2 high
Assignee: Russell King
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-05-11 17:13 UTC by Alan Grimes
Modified: 2006-05-03 23:29 UTC

See Also:
Kernel Version: 2.6.0 through 2.6.12-rc5 (all inclusive)
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
The boring DMESG output. (12.94 KB, text/plain)
2005-06-06 08:26 UTC, Alan Grimes

Description Alan Grimes 2005-05-11 17:13:51 UTC
Distribution: Gentoo
Hardware Environment: Dual K7.
Software Environment: Wvdial/pppd
Problem Description: 

My dialup internet connection will simply STALL....

If I ping the terminal server on the other end of the ppp connection, it will
usually come in at around 103 msec, or no higher than 5,000 msec when I'm
surfing the net. However, with no output to the xterm that is running the
dialer, the connection will simply stop cold and ping will report that it has no
buffer space. The stalling is frequently associated with attempting to go to
certain websites, with no obvious pattern. In general, uploading any
non-negligible amount of data, such as an e-mail over 8k or so, tends to break
it. On one occasion it actually recovered from this state (it usually doesn't),
and ping reported a time of a little over 51 seconds for the first return packets...

This bug was reported earlier. I submitted a patch through bugzilla which I
thought had solved the problem. Well, the problem is back. =( The patch does seem
to reduce the frequency and severity of the problem somewhat, but it's still
there. As I mentioned when I closed the earlier bug report, I'd be back if it
came up again...

The dialer program, Wvdial, behaves strangely too. 

#########################################

leenooks ~ # wvdial
--> WvDial: Internet dialer version 1.54.0
--> Initializing modem.
--> Sending: ATZ
ATZ
OK
--> Sending: AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=50
AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=50
OK
--> Modem initialized.
--> Sending: ATDT7038298111
--> Waiting for carrier.
ATDT7038298111
CONNECT 48000
--> Carrier detected.  Waiting for prompt.
** Ascend TNT2.LNHVA.MD.RCN.NET Terminal Server **
Login: 
Login: 
--> Looks like a login prompt.
--> Sending: alangrimes
alangrimes
Password: 
--> Looks like a password prompt.
--> Sending: (password)
    Entering PPP Session.
    IP address is 66.44.57.81
    MTU is 1006.
--> Looks like a welcome message.
--> Starting pppd at Wed May 11 14:16:50 2005
--> pid of pppd: 21778
--> Using interface ppp0
--> Disconnecting at Wed May 11 14:17:13 2005
--> The PPP daemon has died: A modem hung up the phone (exit code = 16)
--> man pppd explains pppd error codes in more detail.
--> Try again and look into /var/log/messages and the wvdial and pppd man pages
for more information.
--> Auto Reconnect will be attempted in 5 seconds
--> Initializing modem.
--> Sending: ATZ
NO CARRIER
--> Sending: ATQ0
ATQ0
OK
--> Re-Sending: ATZ
ATZ
OK
--> Initializing modem.
--> Sending: ATZ
ATZ
OK
--> Sending: AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=50
AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=50
OK
--> Modem initialized.
--> Initializing modem.
--> Sending: ATZ
ATZ
OK
--> Sending: AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=50
AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=50
OK
--> Modem initialized.
--> Sending: ATDT7038298111
--> Waiting for carrier.
ATDT7038298111
CONNECT 49333
--> Carrier detected.  Waiting for prompt.
** Ascend TNT3.LNHVA.MD.RCN.NET Terminal Server **
Login: 
--> Looks like a login prompt.
--> Sending: alangrimes
alangrimes
Password: 
--> Looks like a password prompt.
--> Sending: (password)
    Entering PPP Session.
    IP address is 66.44.58.13
    MTU is 1006.
--> Looks like a welcome message.
--> Starting pppd at Wed May 11 14:17:48 2005
--> pid of pppd: 21786
--> Using interface ppp0
--> local  IP address 66.44.58.13
--> remote IP address 10.65.28.28
--> primary   DNS address 207.172.3.10
--> secondary DNS address 207.172.3.11
Caught signal #2!  Attempting to exit gracefully...    
--> Terminating on signal 15
--> Connect time 0.8 minutes.

#######################################

Of course, the reason I sent it the break signal was because the connection had
stalled...

Of course this could be a hardware problem, but then Linux offers utterly no tools
that could possibly help me determine whether that is actually the case.
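
One place that may help separate a hardware or driver fault from a network one is the per-port counter file exposed by the serial driver. The sketch below is only an illustration, not something proposed in this report: it assumes the usual /proc/tty/driver/serial line layout (port number, uart type, then tx/rx and any non-zero fe/pe/brk/oe counters), and the function names are made up for the example. A growing overrun count (oe) between two reads would suggest the host is dropping bytes from the modem.

```python
import re

# Counters the serial core prints per port. The error counters
# (fe = framing, pe = parity, brk = break, oe = overrun) are assumed
# to be omitted from the line when they are zero.
COUNTER_RE = re.compile(r"\b(tx|rx|fe|pe|brk|oe):(\d+)")


def parse_serial_line(line):
    """Parse one port line from /proc/tty/driver/serial into a dict.

    Returns None for lines that do not start with a port number,
    such as the 'serinfo:' header line.
    """
    match = re.match(r"\s*(\d+):\s+uart:(\S+)", line)
    if match is None:
        return None
    info = {"port": int(match.group(1)), "uart": match.group(2)}
    # Default every counter to 0, then overwrite the ones that are present.
    for name in ("tx", "rx", "fe", "pe", "brk", "oe"):
        info[name] = 0
    for name, value in COUNTER_RE.findall(line):
        info[name] = int(value)
    return info


def read_serial_counters(path="/proc/tty/driver/serial"):
    """Read all port lines from the proc file (this usually needs root)."""
    with open(path) as handle:
        parsed = (parse_serial_line(line) for line in handle)
        return [info for info in parsed if info is not None]
```

Calling read_serial_counters() once before and once after a stall and comparing the oe and fe values for the modem's port is the intended use.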

Steps to reproduce:

Just try to use Linux version 2.6.x with an external serial modem. =\
Comment 1 Alan Grimes 2005-06-05 19:52:37 UTC
The bug is reproducible in 2.6.12-rc5.

The modem will stall out as soon as you put any non-trivial load on it, e.g.
attempting to pull a medium-large sized package from CVS or visiting a webpage
with many graphics...

This bug is seriously diminishing my quality of life. I'm going to have to
upgrade to Windows XP soon if this isn't fixed. 

The only other part of the system that exhibits a similar failure mode is OpenGL
graphics via DRI... (I have an ATI R128 ) 

Games will stall out randomly for multiples of exactly ten seconds. It is
possible that whatever kernel plumbing is shared between the serial port and the
DRI infrastructure is the culprit. 

Rarely, the modem will hang for anywhere from 17 to 51 seconds as recorded by
ping, using the terminal server at my ISP as the other end... (normal ping time
is 0.1 seconds..)
Comment 2 Russell King 2005-06-06 01:37:45 UTC
Please attach the entire kernel messages - preferably the output of dmesg
after the problem has been noticed.
Comment 3 Alan Grimes 2005-06-06 08:26:15 UTC
Created attachment 5133
The boring DMESG output. 

Kernel versions around 2.6.6 and earlier used to panic at the time of
the modem stall... Kernels 2.6.10 and later have been completely silent about
any issues.
Comment 4 Alan Grimes 2005-06-06 09:19:46 UTC
The only reliable feedback I get from the bug, aside from its obvious symptoms,
is through ping...

Here is a typical output: 

64 bytes from 10.65.28.26: icmp_seq=296 ttl=255 time=2552 ms
64 bytes from 10.65.28.26: icmp_seq=297 ttl=255 time=1561 ms
64 bytes from 10.65.28.26: icmp_seq=298 ttl=255 time=567 ms
64 bytes from 10.65.28.26: icmp_seq=299 ttl=255 time=137 ms
64 bytes from 10.65.28.26: icmp_seq=300 ttl=255 time=484 ms
## Hmm, exactly 5 minutes, though I've seen it quit after only 10 seconds...
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
## It would have continued repeating this message indefinitely... 
## Note: many more iterations have been removed from this report!!! =P  

## Below is what happens when I manually disconnect the modem by sending 
## the break signal to the dialer. 
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable

--- 10.65.28.26 ping statistics ---
337 packets transmitted, 300 received, 10% packet loss, time 355295ms
rtt min/avg/max/mdev = 122.973/687.037/6298.163/1093.624 ms, pipe 7
###################################
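
A log like the one above can be summarized mechanically, which makes it easier to compare stalls across runs. This is a rough sketch with hypothetical names (summarize_ping_log is not an existing tool); it only assumes the reply and "ping: sendmsg:" line shapes shown in the output above. With ping's default one-second interval, the last answered icmp_seq is roughly the number of seconds the link survived.

```python
import re

# A reply line carries a sequence number and a round-trip time in ms.
REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")
# Error lines that ping prints when sendmsg() fails.
ERROR_RE = re.compile(r"^ping: sendmsg: (.+)$")


def summarize_ping_log(lines):
    """Return the last answered sequence number, the slowest reply in ms,
    and a count of each sendmsg error message seen in the log."""
    last_seq = None
    max_rtt_ms = 0.0
    errors = {}
    for line in lines:
        line = line.strip()
        reply = REPLY_RE.search(line)
        if reply:
            last_seq = int(reply.group(1))
            max_rtt_ms = max(max_rtt_ms, float(reply.group(2)))
            continue
        error = ERROR_RE.match(line)
        if error:
            message = error.group(1)
            errors[message] = errors.get(message, 0) + 1
    return {"last_seq": last_seq, "max_rtt_ms": max_rtt_ms, "errors": errors}
```

Feeding it the saved output of a ping session (for example, open("ping.log") as the lines argument, where ping.log is an assumed file name) gives the point at which replies stopped and which errors followed.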

After power cycling the modem here's what the dialer does: 
leenooks ~ # wvdial
--> WvDial: Internet dialer version 1.54.0
--> Initializing modem.
--> Sending: ATZ
--> Sending: ATQ0
--> Re-Sending: ATZ 

### the dialer is hung and will report that the modem is not responding in a 
### few seconds...

### At this point I could ctrl-break the dialer and try again.
### However, this would be entirely unproductive as I'd get the same message
### each and every time.

### Only by allowing it to complete its cycle, will it return the modem to 
### functionality. I suspect that the dialer sends an IOCTL or something to the
### driver which clears the fault... 

--> Modem not responding.
leenooks ~ # wvdial
--> WvDial: Internet dialer version 1.54.0
--> Initializing modem.
--> Sending: ATZ
ATZ
OK
--> Sending: AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=60
AT&F&D2&C1X4V1Q0S7=70W2\N3&K3S11=60
OK
--> Modem initialized.
--> Sending: ATDT7038298111
--> Waiting for carrier.
ATDT7038298111
CONNECT 49333
--> Carrier detected.  Waiting for prompt.
** Ascend TNT2.LNHVA.MD.RCN.NET Terminal Server **
Login: 
--> Looks like a login prompt.
--> Sending: alangrimes
alangrimes
Password: 
--> Looks like a password prompt.
--> Sending: (password)
    Entering PPP Session.
    IP address is 66.44.56.212
    MTU is 1006.
--> Looks like a welcome message.
--> Starting pppd at Tue Jun  7 04:58:30 2005
--> pid of pppd: 19733
--> Using interface ppp0
--> local  IP address 66.44.56.212
--> remote IP address 10.65.28.27
--> primary   DNS address 207.172.3.10
--> secondary DNS address 207.172.3.11
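
On the guess above that the dialer clears the fault with an ioctl: one thing that can be checked from user space is the state of the modem control lines, since the init string requests hardware flow control on many modems (&K3) and a deasserted CTS would make the host stop transmitting. The sketch below is an assumption-laden illustration, not what WvDial actually does: it uses the standard TIOCMGET ioctl through Python's fcntl and termios modules, and /dev/ttyS0 is a placeholder because the report does not say which port the modem is on.

```python
import fcntl
import os
import struct
import termios

# Modem-control bits reported by the TIOCMGET ioctl.
LINE_BITS = (
    ("DTR", termios.TIOCM_DTR),
    ("RTS", termios.TIOCM_RTS),
    ("CTS", termios.TIOCM_CTS),
    ("DSR", termios.TIOCM_DSR),
    ("CD", termios.TIOCM_CAR),
    ("RI", termios.TIOCM_RNG),
)


def decode_modem_lines(bits):
    """Return the names of the control lines asserted in a TIOCMGET bitmask."""
    return [name for name, mask in LINE_BITS if bits & mask]


def read_modem_lines(device="/dev/ttyS0"):
    """Query the control lines of a serial port.

    The device path is a placeholder. O_NONBLOCK keeps open() from
    waiting for carrier on a port that is not connected.
    """
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        raw = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("I", 0))
        return decode_modem_lines(struct.unpack("I", raw)[0])
    finally:
        os.close(fd)
```

Running read_modem_lines() during a stall and seeing CTS missing from the result would point at flow control rather than at the network stack.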
Comment 5 Alan Grimes 2005-06-07 00:07:46 UTC
Being utterly fed up with Linux, I plugged the same modem into the back of my Athlon 800 running BeOS.

THE DAMN THING WORKS LIKE A DREAM!!!!

I can load three websites at once and download at the same time and the modem is ROCK SOLID!!!

My phone line isn't perfect and it will drop on occasion (loss of carrier), but host-modem communication is not an issue at all.

And oh god, is this 5 year old dialer program pure bliss!!!

I don't have to muck about in /etc...
I don't have to su to root.
It connects with barely two clicks on my trackball..
It provides real-time statistics in a pleasant, easy-to-read GUI...
It connects IMMEDIATELY, on the first shot.

If only I had a JavaScript/CSS web browser, I'd never need to go back... =P
Comment 6 Russell King 2005-06-07 00:16:44 UTC
TBH I've no idea what's going on here.  I don't think it's a serial problem
though, since serial will not cause the sendmsg errors.  I've asked the
networking folk to comment, and they've come back with:

> It's probably not running low of system memory.  However, it might
> be running out of things such as routing cache entries.
> 
> Check the dmesg output, it might have a clue on what went wrong.

Sorry.
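
Besides dmesg, the per-interface counters in /proc/net/dev are another place to look when sendmsg starts failing. The following is a hedged sketch rather than anything the networking folk suggested: it assumes the standard two-line header and the 8 receive plus 8 transmit column layout of that file, and parse_net_dev is a name invented for the example. Rising tx_drop or tx_errs on ppp0 during a stall would show that packets are being queued but not sent.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev text into {interface: {"rx_...": n, "tx_...": n}}."""
    # The first five columns have the same names on both sides.
    fields = ("bytes", "packets", "errs", "drop", "fifo")
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        values = [int(value) for value in rest.split()]
        if len(values) < 16:
            continue
        entry = {}
        # Columns 0-7 are receive counters, columns 8-15 are transmit counters.
        for index, field in enumerate(fields):
            entry["rx_" + field] = values[index]
            entry["tx_" + field] = values[8 + index]
        stats[name.strip()] = entry
    return stats
```

A typical call would be parse_net_dev(open("/proc/net/dev").read())["ppp0"], taken once while the link is healthy and once after it stalls.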
Comment 7 Russell King 2005-06-07 00:20:19 UTC
In addition, I notice that you say older kernels panicked, though you
don't say how.  Do you really expect me to guess what the panic was?
Comment 8 Russell King 2005-06-07 00:30:11 UTC
Bug author has lost interest in Linux.
Comment 9 Alan Grimes 2005-06-30 19:40:10 UTC
I just received a link to my earlier bug report, #1893.

Of interest is the date it was submitted... =(((( 
Comment 10 Ville Herva 2006-05-03 23:29:02 UTC
I'm seeing a similar thing with 2.6.17-rc1 and a ppp connection over ssh.

Earlier kernels (at least linux-2.6.14-rc4) did not show this with exactly the
same settings.

How it happens:

 - open up the ppp connection over ssh.
 - stress it a bit with samba
 - after a few minutes, pinging the remote end gives
     ping: sendmsg: No buffer space available
   and I have to re-establish the connection

The ssh connection seems solid afaict. Also, with a bare ssh connection, no
problems occur even under load. The ppp connection seems prone to hang
especially with SMB traffic (don't know why.)

In dmesg, there seems to be nothing relevant. /proc/slabinfo doesn't seem to
have anything alarming in it.

The underlying connection is ADSL (8/1Mbit). The driver is eepro100. ppp is
ppp-2.4.3-6.2.1.

ppp_deflate is in use:

ppp_deflate             5536  3
zlib_deflate           18360  1 ppp_deflate
bsd_comp                5312  0
ppp_async               9664  2
crc_ccitt               1952  1 ppp_async
ppp_generic            20468  11 ppp_deflate,bsd_comp,ppp_async


