Bug 8151 - Established connections not displayed in /proc/tcp/netstat
Summary: Established connections not displayed in /proc/tcp/netstat
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Stephen Hemminger
Reported: 2007-03-08 07:53 UTC by David Tung
Modified: 2007-08-23 18:23 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.17-rt7-default #3 PREEMPT
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
fdsock test program (6.42 KB, text/x-csrc)
2007-03-14 23:26 UTC, Olaf Kirch

Description David Tung 2007-03-08 07:53:53 UTC
Most recent kernel where this bug did *NOT* occur: Unknown

Distribution: SUSE 10 Pro

Hardware Environment: Intel(R) Pentium(R) 4 CPU 2.80GHz, 32 bit, 512 Megs of RAM

Software Environment: SUSE 10, 2.6.17-rt7-default with ingo molnar RT patch

Problem Description:

Established TCP network connections are not shown in /proc/net/tcp (and not
displayed by netstat).

I began to notice this when I used netstat to display all established
connections. A few connections were not shown as ESTABLISHED by netstat,
yet my application did not report a broken connection to the remote device. I
logged into the remote device and observed that the connection was ESTABLISHED
on the remote side. In addition, I verified the transfer of data to the remote
device. I searched through the proc file system for more information, and it
appears that my application still holds the socket file descriptor open.

#### LOCAL MACHINE WITH BUG. 10.1.2.71 is the IP of the remote device. Note that netstat shows no connection to it.
local:/ # netstat -a | grep 10.1.2.71

## NOTE IP of local device
local:/ # ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:03:2D:09:0E:6F
          inet addr:10.1.2.10  Bcast:10.1.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:558434 errors:0 dropped:0 overruns:0 frame:0
          TX packets:264478 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:37082257 (35.3 Mb)  TX bytes:19489061 (18.5 Mb)
          Interrupt:18

### telnet into remote machine
local:/ # telnet 10.1.2.71
Trying 10.1.2.71...
Connected to 10.1.2.71.
Escape character is '^]'.

MontaVista(R) Linux(R) Professional Edition 3.1
Linux/ppc 2.4.20_mvl31-405gp_eval

(none) login: root
Password:
Linux (none) 2.4.20_mvl31-405gp_eval #474 Tue Dec 27 16:37:25 PST 2005 ppc unknown

MontaVista(R) Linux(R) Professional Edition 3.1

BusyBox v0.60.3 (2004.01.09-22:55+0000) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

######## Remote machine. Note connection established on port 1400 with local machine
# netstat -a | grep 10.1.2.10
tcp        0      0 10.1.2.71:telnet        10.1.2.10:51555         ESTABLISHED
tcp        0      0 10.1.2.71:1400          10.1.2.10:60380         ESTABLISHED

###### Back on LOCAL machine with bug
### Note established connection to IP 10.1.2.71 on port 1400 not shown. However,
other established connections are shown to other devices running the same remote
server software.
local:/proc/2774/fd # grep 0A02010A /proc/net/tcp | more
  64: 0A02010A:EBDC 4702010A:0578 01 00000000:00000000 00:00000000 00000000     0        0 7477 1 d8ba1980 201 40 10 2 100
  70: 0A02010A:A547 4802010A:0578 01 00000000:00000000 00:00000000 00000000     0        0 7478 1 d8ba1400 201 40 10 2 100

#### Display open inodes
### I grep for the inodes of the established connections from above. My
application opens the sockets to the remote devices sequentially.
ml_alc10:/proc/2774/fd # ls -l | grep socket | grep 747[0-9]
lrwx------  1 root root 64 Mar  7 10:37 154 -> socket:[7471]
lrwx------  1 root root 64 Mar  7 10:37 155 -> socket:[7472]
lrwx------  1 root root 64 Mar  7 10:37 158 -> socket:[7473]
lrwx------  1 root root 64 Mar  7 10:37 159 -> socket:[7474]
lrwx------  1 root root 64 Mar  7 10:37 160 -> socket:[7475]
lrwx------  1 root root 64 Mar  7 10:37 161 -> socket:[7476]
lrwx------  1 root root 64 Mar  7 10:37 162 -> socket:[7477]
lrwx------  1 root root 64 Mar  7 10:37 163 -> socket:[7478]
lrwx------  1 root root 64 Mar  7 10:37 164 -> socket:[7479]

## Check if socket inode 7479 exists in /proc/net/tcp
local:/proc/2774/fd # grep 7479 /proc/net/tcp | more
local:/proc/2774/fd # ls -l | grep 7479
lrwx------  1 root root 64 Mar  7 10:37 164 -> socket:[7479]
local:/proc/2774/fd # grep 7479 /proc/net/tcp | more
local:/proc/2774/fd # 

#### NOTE Very strange behavior. The connection is intermittently shown in
/proc/net/tcp with the same local port number and inode. This leads me to
believe that the connection persists throughout and that the kernel is failing
to accurately display all established socket connections.

Please advise on how to display all ESTABLISHED connections correctly.

Thanks.
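
A plain `grep 7479 /proc/net/tcp` matches those digits anywhere on a line, so comparing against the inode column alone can be less ambiguous. The following is a minimal Python sketch of that check, assuming the usual /proc/net/tcp layout in which the inode is the tenth whitespace-separated field. The helper name `find_tcp_rows_by_inode` is hypothetical, and the sample text is just the two rows quoted in the description.

```python
def find_tcp_rows_by_inode(table_text, inode):
    """Return the rows of a /proc/net/tcp dump whose inode column equals `inode`."""
    matches = []
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a slot number such as "64:"; the header does not.
        if len(fields) < 10 or not fields[0].endswith(":"):
            continue
        # Assumed layout: field 9 (zero-based) is the socket inode.
        if fields[9] == str(inode):
            matches.append(fields)
    return matches


# The two rows shown in the report, joined back onto single lines.
SAMPLE = """\
  64: 0A02010A:EBDC 4702010A:0578 01 00000000:00000000 00:00000000 00000000     0        0 7477 1 d8ba1980 201 40 10 2 100
  70: 0A02010A:A547 4802010A:0578 01 00000000:00000000 00:00000000 00000000     0        0 7478 1 d8ba1400 201 40 10 2 100
"""

if __name__ == "__main__":
    for wanted in (7477, 7479):
        rows = find_tcp_rows_by_inode(SAMPLE, wanted)
        print(wanted, "found" if rows else "not found")
```

On a live system the same function could be fed the contents of /proc/net/tcp instead of `SAMPLE`.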
Comment 1 Anonymous Emailer 2007-03-08 10:19:54 UTC
Reply-To: akpm@linux-foundation.org



Begin forwarded message:

Date: Thu, 8 Mar 2007 07:53:57 -0800
From: bugme-daemon@bugzilla.kernel.org
To: bugme-new@lists.osdl.org
Subject: [Bugme-new] [Bug 8151] New: Established connections not displayed in /proc/tcp/netstat


http://bugzilla.kernel.org/show_bug.cgi?id=8151

           Summary: Established connections not displayed in
                    /proc/tcp/netstat
    Kernel Version: 2.6.17-rt7-default #3 PREEMPT
            Status: NEW
          Severity: normal
             Owner: acme@conectiva.com.br
         Submitter: getitupsidelines@yahoo.com
                CC: getitupsidelines@yahoo.com


Comment 2 Stephen Hemminger 2007-03-08 11:01:01 UTC
The data in /proc/net/tcp is not atomic. If there are lots of connections it is
easily possible for a list to change during the traversal and some entries will
be missed.  Is this problem occurring on an active machine with connections
coming and going, or is it a standalone test/workstation?
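
Since a single read can miss an entry while the table is changing, one way to separate a transient miss from a persistent absence is to sample the table several times and count how often the socket appears. This is a minimal Python sketch under that assumption; `count_sightings`, its parameters, and the injectable `read_table` callable are hypothetical names, and the inode column index assumes the usual /proc/net/tcp layout.

```python
import time


def read_proc_net_tcp():
    """Read the live table; only meaningful on a Linux host."""
    with open("/proc/net/tcp") as handle:
        return handle.read()


def inodes_in(table_text):
    """Collect the inode column (tenth field) of every socket row."""
    found = set()
    for line in table_text.splitlines():
        fields = line.split()
        # Skip the header, which does not start with a "NN:" slot number.
        if len(fields) >= 10 and fields[0].endswith(":"):
            found.add(fields[9])
    return found


def count_sightings(inode, read_table=read_proc_net_tcp, samples=20, delay=0.1):
    """Return how many of `samples` separate reads listed `inode`."""
    seen = 0
    for _ in range(samples):
        if str(inode) in inodes_in(read_table()):
            seen += 1
        time.sleep(delay)
    return seen
```

A count of zero over many samples points away from a traversal race, while a count strictly between zero and `samples` is consistent with entries being skipped intermittently.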

Comment 3 David Tung 2007-03-08 16:13:40 UTC
Hello and thanks for the response.

The machine is active. Connections can of course be opened at any time while
the application is running, but for the specific sockets that are "missing"
from the /proc/net/tcp file the connection is proven to persist for hours or
days at a time.

Comment 4 Herbert Xu 2007-03-08 23:13:56 UTC
Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> Date: Thu, 8 Mar 2007 07:53:57 -0800
> From: bugme-daemon@bugzilla.kernel.org
> To: bugme-new@lists.osdl.org
> Subject: [Bugme-new] [Bug 8151] New: Established connections not displayed in /proc/tcp/netstat

Please try running 'ss -nt' to see if it displays the missing
connections.  If it does then we can confirm a problem with
/proc, otherwise you might have a TCP problem instead.

Note that with a recent kernel you may have to build/load the
tcp_diag module before 'ss -nt' will work.

Cheers,
Comment 5 Olaf Kirch 2007-03-09 03:22:15 UTC
This is confusing. Your example shows

# grep 0A02010A /proc/net/tcp | more
  64: 0A02010A:EBDC 4702010A:0578 01 00000000:00000000 00:00000000 00000000    
0        0 7477 1 d8ba1980 201 40 10 2 100

and you say "established connection to IP 10.1.2.71 on port 1400 not shown".
But 4702010A:0578 really is 10.1.2.71, port 1400 (0x0578 in hex), so it _is_
shown.

The remainder of the report - you list some open sockets of your application,
and then go looking for the one with inode 7479 in /proc/net/tcp. Have you
confirmed prior to this that the socket is indeed a TCP socket? What does
lsof -p <pid-of-your-application> report?
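
The hex decoding above can be checked mechanically. This is a minimal Python sketch, assuming a little-endian host such as the reporter's i386 machine; the function name `decode_proc_net_addr` is hypothetical.

```python
import socket
import struct


def decode_proc_net_addr(field):
    """Decode 'HEXADDR:HEXPORT' as printed by /proc/net/tcp on a little-endian host."""
    addr_hex, port_hex = field.split(":")
    # The kernel formats the raw network-order address as a native 32-bit
    # integer, so on a little-endian host the octets read right to left.
    # Packing the value little-endian restores the on-the-wire byte order.
    packed = struct.pack("<I", int(addr_hex, 16))
    # The port is printed as a plain hexadecimal number.
    return socket.inet_ntoa(packed), int(port_hex, 16)


if __name__ == "__main__":
    for field in ("0A02010A:EBDC", "4702010A:0578", "4802010A:0578"):
        print(field, "->", decode_proc_net_addr(field))
```

With these inputs, 4702010A:0578 decodes to 10.1.2.71 port 1400, and 0A02010A:EBDC decodes to 10.1.2.10 port 60380, which is the same address pair the remote netstat output lists as ESTABLISHED.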
Comment 6 David Tung 2007-03-09 08:55:24 UTC
ss -nt does not show the established socket. 

In addition, lsof -p shows "can't identify protocol" for my missing sockets.

So if I understand correctly: as far as the system is concerned, the missing
sockets are not TCP sockets (according to lsof, even though they are according
to everything else I see); therefore, they are not listed in /proc/net/tcp and
netstat.

So unless I'm missing something here, I assume this is NOT a kernel bug and
should therefore be reported on another board. Please advise.

Thanks
Comment 7 Olaf Kirch 2007-03-12 02:49:57 UTC
The lsof error message could mean that lsof is broken, or that the kernel
indeed isn't printing all sockets.

Can you please try the following:

 - compile the program attached below, put the binary in /tmp/fdsock
 - pick an application where you believe netstat et al don't show
   connected sockets properly. Run the app from the command line,
   and note the PID
 - Attach to the running process using gdb:
	gdb /path/to/my/application <pid>
 - Within gdb, run
	p system("/tmp/fdsock")

This should display a listing of all file handles the application has open,
with an indication of whether it's a socket, tty, etc. Note that the output
goes to the tty of the application you have attached to, not the terminal
you're running gdb in.
For sockets, fdsock will also print out the local address, and - if
connected - the peer's address, like this:

socket inet/stream 192.168.5.32:58131 connected to 192.168.5.1:22 flags 802 (RDWR,NONBLOCK) on fd 3

Compare that to the output of /proc/net/tcp, netstat, etc.
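
For readers without the attachment, the kind of per-descriptor listing described above can be approximated as follows. This is a hedged Python sketch, not the attached fdsock program (which is C and is not reproduced in this thread): the names `describe_fd` and `describe_open_fds` are hypothetical, the output format differs from fdsock's, and it inspects the descriptors of the process it runs in rather than those inherited through gdb's `system()` call.

```python
import os
import socket
import stat


def describe_fd(fd):
    """Describe one open descriptor; for sockets include local and peer address."""
    mode = os.fstat(fd).st_mode  # raises OSError if fd is not open
    if not stat.S_ISSOCK(mode):
        return "fd %d: not a socket" % fd
    # Wrap a duplicate so that closing the wrapper leaves the original fd open.
    # Python 3.7+ detects family and type from the descriptor itself.
    with socket.socket(fileno=os.dup(fd)) as sock:
        family = getattr(sock.family, "name", sock.family)
        kind = getattr(sock.type, "name", sock.type)
        text = "fd %d: socket %s/%s local=%r" % (fd, family, kind, sock.getsockname())
        try:
            text += " connected to %r" % (sock.getpeername(),)
        except OSError:
            # getpeername() fails with ENOTCONN on an unconnected socket.
            text += " (not connected)"
    return text


def describe_open_fds(limit=1024):
    """Describe every descriptor below `limit` that is open in this process."""
    lines = []
    for fd in range(limit):
        try:
            lines.append(describe_fd(fd))
        except OSError:
            continue  # descriptor number not in use
    return lines


if __name__ == "__main__":
    print("\n".join(describe_open_fds()))
```

The point of asking the socket itself through `getsockname()` and `getpeername()` is that the answer does not depend on /proc/net/tcp, so it gives an independent view to compare against netstat.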
Comment 8 Olaf Kirch 2007-03-14 23:26:31 UTC
Created attachment 10773
fdsock test program

Well, it helps to actually attach the test program, not just talk about
doing it :-)
Comment 9 Stephen Hemminger 2007-07-02 19:20:45 UTC
Maybe the missing sockets are Unix domain, not TCP at all.
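
One way to test that hypothesis is to look for the socket's inode in /proc/net/unix instead of /proc/net/tcp. This is a minimal Python sketch, assuming the usual /proc/net/unix layout (`Num RefCount Protocol Flags Type St Inode Path`, so the inode is the seventh field). The helper name `unix_socket_inodes` is hypothetical, and the sample row is made up purely for illustration; it is not data from this report.

```python
def unix_socket_inodes(table_text):
    """Return the set of inode numbers listed in a /proc/net/unix dump."""
    inodes = set()
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a kernel address followed by ':'; the header does not.
        if len(fields) >= 7 and fields[0].endswith(":"):
            inodes.add(fields[6])
    return inodes


# Hypothetical example table, NOT taken from the reporter's machine.
EXAMPLE = """\
Num       RefCount Protocol Flags    Type St Inode Path
00000000: 00000002 00000000 00010000 0001 01 12345 /tmp/example.sock
"""

if __name__ == "__main__":
    print("12345" in unix_socket_inodes(EXAMPLE))
```

On the affected machine, the inode from `socket:[7479]` could be checked against the contents of /proc/net/unix in the same way.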
Comment 10 Stephen Hemminger 2007-07-09 12:09:38 UTC
Please give us some feedback: is this still a bug, or is it a non-problem?
Comment 11 Adrian Bunk 2007-08-23 18:23:58 UTC
Please reopen this bug if:
- it is still present with kernel 2.6.22 and
- you can provide the requested information.
