Bug 57681 - atl1c device stops working after 15 minutes
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL: https://lkml.kernel.org/r/5153620E.40...
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-07 01:40 UTC by Bjorn Helgaas
Modified: 2016-03-19 19:52 UTC
CC List: 4 users

See Also:
Kernel Version: 3.8.4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
3.6.11 dmesg (working) (55.35 KB, text/plain)
2013-05-07 01:41 UTC, Bjorn Helgaas
3.8.11 dmesg (not working) (85.74 KB, text/plain)
2013-05-07 01:42 UTC, Bjorn Helgaas

Description Bjorn Helgaas 2013-05-07 01:40:47 UTC
On Lenovo G770, the wired ethernet device (driver atl1c) works correctly in 3.6.11.  With a 3.8.4 or later kernel, the device works correctly at first, but  after about 15 minutes, it stops responding to ping and Samba stops working.
Comment 1 Bjorn Helgaas 2013-05-07 01:41:50 UTC
Created attachment 100901
3.6.11 dmesg (working)
Comment 2 Bjorn Helgaas 2013-05-07 01:42:30 UTC
Created attachment 100911
3.8.11 dmesg (not working)
Comment 4 Arpit Chaudhary 2014-02-24 12:00:06 UTC
I have a similar problem.
I am using elementary OS Luna, based on Ubuntu 12.04.
Wi-Fi works fine, but my Ethernet stops working after some time, which makes my system hang.

I had the same problem in Linux Mint 16 and openSUSE 13.1.
I don't know the kernel versions in Linux Mint and openSUSE, but I am currently using 3.2.0-59-generic.
Comment 5 Bjorn Helgaas 2014-02-24 21:46:37 UTC
Hi Arpit, Xiong thought this was likely the same as bug #54021 [1,2].  That bug was resolved in v3.11.  Can you try a v3.11 or newer kernel, please?

[1] http://marc.info/?l=linux-kernel&m=136789513301536&w=2
[2] https://bugzilla.kernel.org/show_bug.cgi?id=54021
Comment 6 Alejandro Donato 2015-05-04 11:45:47 UTC
Same issue here: the NIC disconnects randomly. I've noticed this tends to happen under relatively heavy traffic (like copying large files or using VNC).

Extract of the relevant info.

Linux smarttv-1005HA 3.14.1-031401-generic #201404141220 SMP Mon Apr 14 16:59:42 UTC 2014 i686 i686 i686 GNU/Linux

lspci

01:00.0 Ethernet controller: Qualcomm Atheros AR8132 Fast Ethernet (rev c0)

lsmod

atl1c                  40945  0 

dmesg

[39462.757758] atl1c 0000:01:00.0: irq 45 for MSI/MSI-X
[39462.757977] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
[39463.785390] atl1c 0000:01:00.0: irq 45 for MSI/MSI-X
[39463.785569] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
[39464.770187] atl1c 0000:01:00.0: irq 45 for MSI/MSI-X
[39464.770384] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
[40674.229668] atl1c 0000:01:00.0: irq 45 for MSI/MSI-X
[40674.229845] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>

Syslog
May  4 08:24:02 smarttv-1005HA NetworkManager[744]: <info> (eth0): carrier now OFF (device state 100, deferring action for 4 seconds)
May  4 08:24:02 smarttv-1005HA NetworkManager[744]: <info> (eth0): carrier now ON (device state 100)
May  4 08:24:02 smarttv-1005HA kernel: [40917.437880] atl1c 0000:01:00.0: irq 45 for MSI/MSI-X
May  4 08:24:02 smarttv-1005HA kernel: [40917.438049] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>

The infrastructure is OK; everything was checked with other equipment.

Please tell me which tests to run and what information to collect to help solve this issue.
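
One way to quantify the link flapping visible in the dmesg extract above is to count the "NIC Link is Up" messages and measure the gaps between them. The following is a minimal Python sketch, assuming log lines in the bracketed-timestamp format shown in this comment. The function names `link_up_events` and `flap_gaps` are made up for illustration and are not part of any existing tool.

```python
import re

# Matches kernel log lines such as:
# [39462.757977] atl1c 0000:01:00.0: atl1c: eth0 NIC Link is Up<100 Mbps Full Duplex>
# It uses search(), so syslog lines with a date/host prefix also match.
LINK_UP = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+atl1c\s+\S+\s+atl1c:\s+(\S+)\s+NIC Link is Up"
)


def link_up_events(lines):
    """Return a list of (timestamp_seconds, interface) for each link-up message."""
    events = []
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def flap_gaps(events):
    """Return the seconds elapsed between consecutive link-up messages."""
    times = [timestamp for timestamp, _ in events]
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]
```

Feeding it the output of `dmesg` (for example, `link_up_events(open("dmesg.txt"))` on a saved copy) gives a rough picture of how often the link renegotiates; several link-up messages about one second apart, as in the extract, would show up as gaps close to 1.0.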
Comment 7 Bjorn Helgaas 2016-03-19 19:52:05 UTC
Since this only seems to affect the atl1c driver, my first guess is that this is a problem in that driver.

Does this problem occur with a current upstream kernel, e.g., v4.5?  If so, the maintainers should be very interested in fixing it:

Jay Cliburn <jcliburn@gmail.com> (maintainer:ATLX ETHERNET DRIVERS)
Chris Snook <chris.snook@gmail.com> (maintainer:ATLX ETHERNET DRIVERS)

If it's already fixed in v4.5, then the question is whether we need to find the fix and backport it to older kernels, e.g., for distro updates.

If somebody wanted to bisect between v3.6.11 (working) and v3.8.11 (broken), we could probably find the commit that broke it that way.  Time-consuming but effective.
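
Because the failure only shows up after roughly 15 minutes, each bisection step (build, boot, wait) needs a consistent good/bad verdict. The following is a rough Python sketch of a helper that could be run on each test kernel for that purpose. It assumes the Linux iputils `ping` command is available; the function names, the three-consecutive-failures threshold, and the injectable `clock`/`sleep` parameters are illustrative choices, not anything taken from this report.

```python
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if one ICMP echo request to host gets a reply (iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch_link(probe, duration_s, interval_s, max_failures=3,
               clock=time.monotonic, sleep=time.sleep):
    """Call probe() every interval_s seconds for up to duration_s seconds.

    Returns the elapsed seconds at which max_failures consecutive probes
    had failed, or None if that never happened within duration_s.
    """
    start = clock()
    failures = 0
    while clock() - start < duration_s:
        if probe():
            failures = 0  # any success resets the failure streak
        else:
            failures += 1
            if failures >= max_failures:
                return clock() - start
        sleep(interval_s)
    return None
```

For example, `watch_link(lambda: ping_once("192.0.2.1"), duration_s=30 * 60, interval_s=10)` would watch a peer for 30 minutes (the address here is a documentation placeholder; substitute a host on the local network). A `None` result suggests marking the commit good with `git bisect good`, and a numeric result suggests `git bisect bad`.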

I see there was some follow-up in the email thread: https://lkml.kernel.org/r/5153620E.4010608@gmail.com, but I didn't see any conclusion.
