Bug 5131 - Computer hangs when default-gw becomes unreachable
Summary: Computer hangs when default-gw becomes unreachable
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Networking
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 high
Assignee: S.Mohideen
Reported: 2005-08-26 02:21 UTC by Sebastian Hyrwall
Modified: 2008-03-10 20:48 UTC
Kernel Version: 2.6.13-rc3-mm1

Description Sebastian Hyrwall 2005-08-26 02:21:29 UTC
Distribution: Gentoo 
Hardware Environment: 

Motherboard : Intel E7210 with onboard e1000

cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz

4x 512MB DIMM DDR Synchronous 333 MHz

Software Environment:
Linux amnesia 2.6.13-rc3-mm1 #3 SMP Sun Aug 14 16:38:51 CEST 2005 i686 Intel(R)
Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux

Gnu C                  3.3.5-20050130
Gnu make               3.80
binutils               2.15.92.0.2
util-linux             2.12q
mount                  2.12q
module-init-tools      3.0
e2fsprogs              1.37
reiserfsprogs          3.6.19
reiser4progs           1.0.4
xfsprogs               2.6.25
nfs-utils              1.0.7
Linux C Library        2.3.4
Dynamic linker (ldd)   2.3.4
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   058
Modules Loaded         ip_gre e1000



Problem Description:

First, I must say this is a very odd problem. It happened for the first time a
couple of weeks ago. My ISP's router became unreachable, and when it came back
online all computers were up except this one. All the others are running a
2.6.12 version.

Back then I thought it was just a weird random occurrence, but it has happened
twice since. Every time my ISP's router becomes unreachable for a few
seconds, the computer hangs completely: it takes no input, magic SysRq doesn't
work either, and the screen is black.

The problem did not occur when I tried 2.6.12.1.

I'm connected to the ISP at 1 Gbit/s using an onboard e1000 card, so I thought
it could be a driver problem where the NIC receives corrupted packets and
somehow causes a panic or something similar.

Right now I'm going to try 2.6.13-rc7 to see if the problem occurs there too.

Steps to reproduce:

Run the same kernel version and wait for the ISP's router (my default gateway)
to go down. Reproduced 3 times.
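
Since the hang leaves nothing in the logs, one way to pin down the timing is a small watcher that probes the default gateway and writes every result to disk with fsync. The last line in the file then shows roughly when the box stopped responding and whether the gateway was already down. This is only a sketch: the function names and the log path `gw-watch.log` are made up for illustration, and it assumes a Linux system with the `ip` and `ping` (iputils) commands available.

```python
import os
import subprocess
import time
from datetime import datetime, timezone


def default_gateway(route_output):
    """Return the gateway address from `ip route show` output, or None."""
    for line in route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default" and "via" in fields:
            idx = fields.index("via")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


def is_reachable(host, timeout_s=1):
    """Send one ICMP echo request with the system ping; True if answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def append_synced(path, message):
    """Append one timestamped line and fsync it so it survives a hard hang."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    with open(path, "a") as log:
        log.write(f"{stamp} {message}\n")
        log.flush()
        os.fsync(log.fileno())


def watch_gateway(log_path="gw-watch.log", interval_s=1.0):
    """Probe the default gateway forever, logging every result (hypothetical helper)."""
    routes = subprocess.run(
        ["ip", "route", "show"], capture_output=True, text=True, check=True
    ).stdout
    gateway = default_gateway(routes)
    if gateway is None:
        raise RuntimeError("no default route found")
    while True:
        state = "up" if is_reachable(gateway) else "down"
        append_synced(log_path, f"gateway {gateway} {state}")
        time.sleep(interval_s)
```

Calling `watch_gateway()` from a root shell or a screen session before the next outage would be enough. Logging every probe rather than only state changes doubles as a heartbeat for the machine itself.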
Comment 1 Andrew Morton 2005-08-26 04:44:05 UTC
bugme-daemon@kernel-bugs.osdl.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=5131
> 
>            Summary: Computer hangs when default-gw becomes unreachable
>     Kernel Version: 2.6.13-rc3-mm1
> 
> ...
>
> Right now I'm going to try 2.6.13-rc7 to see if the problem occurs there too.
> 

Thanks.  It would be useful if you could also test 2.6.13-rc6-mm2.

I assume there was nothing interesting in the kernel logs?

Comment 2 Sebastian Hyrwall 2005-08-26 11:44:50 UTC
I may be able to test 2.6.13-rc6-mm2 in a while. Unfortunately, this 
system is in production, so I cannot just take it down. But if the 
problem happens again I will switch to 2.6.13-rc6-mm2.

No, nothing interesting at all in the kernel logs.

I can, however, add that a friend with similar specs and the same kernel 
version had a similar problem, except that the computer didn't crash 
right away. After the problem happened he could log in as root on the 
console and everything seemed to be working, except that the e1000 card 
could not communicate with the gateway. The gateway did answer the 
ARP requests from the e1000 card, but it didn't respond to ICMP or route 
any traffic. However, he could communicate with all other computers in 
the same subnet, and those computers could also communicate with the 
gateway properly.
He also tried changing the MAC address on the card and the IP address. After 
that the gateway would answer a few ICMP packets and then go silent as 
before. He also tried unloading and reloading the e1000 module, without 
success. A few minutes later the computer hung like mine does.
After a power reset everything worked fine again.

Comment 3 Sebastian Hyrwall 2005-08-26 11:46:02 UTC
I don't know if this is any help, but I was able to reproduce the problem 
in another way.
I also have a second e1000 NIC in the box, a 64-bit one sitting in a 
PCI slot. The problem occurs every time I do the following.

Both NICs are connected to the same switch and are not separated by 
VLANs or anything like that.
The first NIC (eth0) has address 192.168.0.2, the second NIC (eth1) has 
192.168.0.2.
I then added a static route to a second box so that traffic to it 
would use eth1:
ip route add 192.168.0.3 dev eth1

Then the second box connects to 192.168.0.2 (eth0) via FTP and 
downloads a file, which is sent with source address 192.168.0.2 but 
transferred via eth1.
192.168.0.2(eth0) --> eth1 -> 192.168.0.3

Return packets come in on eth0.

192.168.0.3 -> 192.168.0.2(eth0)

As soon as I transferred the file, the NICs stopped transferring 
data and I was back at the problem my friend had.
For some reason the box could not communicate with the gateway 
(192.168.0.1), but it could with any other box in the subnet.
Why the problem occurred now, when the gateway had nothing to do with any 
part of the test, I have no idea.
I solved the problem by unloading the e1000 module and loading it again.

And this now happened in 2.6.13-rc7
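
For anyone repeating this test, it may help to confirm that the static route really sends traffic for the second box out of eth1 before starting the transfer. The sketch below asks the kernel via `ip route get` and picks out the interface name. The helper names are hypothetical, the addresses and interface are the ones from this comment, and the parsing assumes the usual `... dev <name> ...` layout of the command's output.

```python
import subprocess


def egress_device(route_get_output):
    """Return the interface named after `dev` in `ip route get` output, or None."""
    fields = route_get_output.split()
    if "dev" in fields:
        idx = fields.index("dev")
        if idx + 1 < len(fields):
            return fields[idx + 1]
    return None


def check_egress(peer="192.168.0.3", expected_dev="eth1"):
    """Ask the kernel which interface it would use to reach peer (hypothetical helper)."""
    output = subprocess.run(
        ["ip", "route", "get", peer], capture_output=True, text=True, check=True
    ).stdout
    return egress_device(output) == expected_dev
```

If `check_egress()` returns False after the `ip route add` above, the asymmetric path described here is probably not in effect and the test would not be exercising the same condition.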

Comment 4 Natalie Protasevich 2007-10-11 00:34:58 UTC
Any update on this problem, please?
Thanks.
Comment 5 Natalie Protasevich 2008-03-10 20:48:08 UTC
Closing the bug since there has been no recent activity.
Please reopen if confirmed with the newest kernel.
