Bug 15388

Summary: After suspend to ram, laptop doesn't connect to the wired network. Sky2 error. Marvell 88E8036
Product: Drivers Reporter: Eduardo (aberkoke)
Component: PCI Assignee: drivers_pci (drivers_pci)
Status: RESOLVED INSUFFICIENT_DATA    
Severity: normal CC: aberkoke, akpm, alan, auxsvr, bj.cardon, bjorn, jbarnes, rjw
Priority: P1    
Hardware: All   
OS: Linux   
URL: http://bbs.archlinux.org/viewtopic.php?id=91833
Kernel Version: 2.6.38.10 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216    
Attachments: lspci output
everything.log
lsmod output
git bisect log
proposed patch (reverts c82f63e411)

Description Eduardo 2010-02-24 18:54:55 UTC
Hi. I think there is a kernel problem on my laptop. After suspend to RAM, my computer doesn't connect to the wired network (eth0 doesn't connect). The problem occurs on Arch Linux 32-bit (with kernels 2.6.31, 2.6.32 and 2.6.33-rc8) and on Ubuntu Karmic 9.10 (2.6.31). I have not checked 2.6.33-rc1 to -rc7, but I suppose the problem is there too.

Arch Linux with kernel 2.6.30 and Ubuntu 9.04 (kernel 2.6.28) don't have this problem.

Since the problem is present on both Ubuntu and Arch Linux from 2.6.31 onward, I think the bug is in the kernel or something related to it. I think the problem is with the Marvell 88E8036, which is responsible for the wired network, and the sky2 module (the log file shows: kernel: sky2 0000:02:00.0: eth0: phy I/O error).

Wifi network works fine after suspend.

My computer is a Toshiba M40-285 laptop. Attached files: everything.log and the output of lsmod and lspci, all taken after suspend to RAM.

I can help to solve this problem. If you want more logs or more information, tell me.

Thanks,
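A minimal sketch for checking a saved log for the error quoted in the description. The helper name count_phy_errors is hypothetical and the log path in the usage line is an assumption; only the message text comes from the report.

```shell
# Count sky2 "phy I/O error" lines in the log file given as $1.
# count_phy_errors is a hypothetical helper name, not part of any tool.
count_phy_errors() {
    grep -c 'sky2 .*phy I/O error' "$1"
}
```

Usage, assuming the log was saved as everything.log in the current directory: count_phy_errors everything.log. A non-zero count after resume would match the symptom described above.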
Comment 1 Eduardo 2010-02-24 18:55:54 UTC
Created attachment 25195 [details]
lspci output
Comment 2 Eduardo 2010-02-24 18:57:16 UTC
Created attachment 25196 [details]
everything.log
Comment 3 Eduardo 2010-02-24 18:57:49 UTC
Created attachment 25197 [details]
lsmod output
Comment 4 Eduardo 2010-02-24 18:59:03 UTC
Comment on attachment 25195 [details]
lspci output

from archlinux 2.6.32
Comment 5 Eduardo 2010-02-24 18:59:38 UTC
Comment on attachment 25196 [details]
everything.log

from archlinux 2.6.32
Comment 6 Eduardo 2010-02-24 19:00:03 UTC
Comment on attachment 25197 [details]
lsmod output

from archlinux 2.6.32
Comment 7 Eduardo 2010-02-27 02:38:25 UTC
The problem appeared between 2.6.31-rc7 and 2.6.31-rc8.
Comment 8 Eduardo 2010-03-01 23:12:51 UTC
Hi. I hope someone can read this. With git bisect, I found the commit that causes the problem.

eduardo@eduardo-laptop:~/linux-git$ git bisect bad
c82f63e411f1b58427c103bd95af2863b1c96dd1 is first bad commit
commit c82f63e411f1b58427c103bd95af2863b1c96dd1
Author: Alek Du <alek.du@intel.com>
Date:   Sat Aug 8 08:46:19 2009 +0800

    PCI: check saved state before restore
    
    Without the check, the config space may be filled with zeros. Though
    the driver should try to avoid call restoring before saving, but the
    pci layer also should check this.
    
    Also removes the existing check in pci_restore_standard_config, since
    it's superfluous with the new check in restore_state.
    
    Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: Alek Du <alek.du@intel.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

:040000 040000 b363995a162a427fdf907059d38882036d68109d 6aca235abde6bf4545e479a87b7f7171e934a988 M	drivers

Attached is the content of the BISECT_LOG file. Because my problem appears after suspend, I think this commit is responsible for it. Can I do anything else?

Thanks
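For anyone repeating the search, a bisect over the range from comment 7 might be started as in the sketch below. The tag names v2.6.31-rc7 and v2.6.31-rc8 are assumed to match that range, and start_bisect is a hypothetical helper name; it has to run inside a kernel git tree.

```shell
# Start a bisect between a known-good and a known-bad revision.
# start_bisect is a hypothetical helper; git takes <bad> first, then <good>.
start_bisect() {
    good=${1:-v2.6.31-rc7}
    bad=${2:-v2.6.31-rc8}
    git bisect start "$bad" "$good"
}

# After each step: build and boot the kernel, suspend to RAM, resume,
# test eth0, then report the result with one of:
#   git bisect good
#   git bisect bad
# Once git names the first bad commit:
#   git bisect log > bisect.log   # save the log for the bug report
#   git bisect reset              # return to the original checkout
```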
Comment 9 Eduardo 2010-03-01 23:13:52 UTC
Created attachment 25302 [details]
git bisect log
Comment 10 Eduardo 2010-03-25 01:06:17 UTC
I can't check whether the problem is present in 2.6.33 and 2.6.34-rc1 because suspend to RAM doesn't work with those kernels. I will test with 2.6.33.1 and 2.6.34-rc2 soon.
Comment 11 Eduardo 2010-05-04 10:47:28 UTC
The problem is also present in kernel 2.6.33.3
Comment 12 Rafael J. Wysocki 2010-05-04 19:46:14 UTC
Did you try to revert the commit you found via 'git bisect' from 2.6.33.3, for example?
Comment 13 Eduardo 2010-05-05 16:59:45 UTC
Of course. With 2.6.33.3 and c82f63e411f1b58427c103bd95af2863b1c96dd1 reverted, the problem is solved!!
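A test like this could be set up roughly as follows. This is a sketch, not the exact commands used: the helper name try_revert and the branch name revert-test are hypothetical, and the v2.6.33.3 tag is assumed to be available in the tree (for example from the stable repository).

```shell
# Check out a base revision on a scratch branch and revert one commit.
# try_revert and the branch name revert-test are hypothetical.
try_revert() {
    base=$1
    commit=$2
    git checkout -b revert-test "$base" &&
        git revert --no-edit "$commit"
}
```

Usage: try_revert v2.6.33.3 c82f63e411f1b58427c103bd95af2863b1c96dd1, then rebuild, boot, and repeat the suspend test. If the revert does not apply cleanly, git stops and reports the conflicts.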
Comment 14 Rafael J. Wysocki 2011-01-16 22:24:42 UTC
Is the problem still present in 2.6.37 (works for me)?
Comment 15 bj.cardon 2011-08-16 21:07:17 UTC
I am on 2.6.38.10 and I still have this issue. I'm using this:

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8040 PCI-E Fast Ethernet Controller
        Subsystem: Hewlett-Packard Company Device 361a
        Kernel driver in use: sky2
        Kernel modules: sky2

I don't know what other information is useful.
Comment 16 Bjorn Helgaas 2012-08-23 23:31:38 UTC
Eduardo, bj, does this problem still occur on 3.5?  Sorry to just ask without any actual debugging on my part, but this has fallen through the cracks for a long time and I don't want to waste time if it's been accidentally fixed in the meantime.
Comment 17 Eduardo 2012-08-26 20:37:43 UTC
At this moment I don't know. I will test with kernel 3.5.3 and post the results here. Thank you.
Comment 18 Bjorn Helgaas 2012-10-01 20:03:38 UTC
Ping, Eduardo, bj, any update?
Comment 19 bj.cardon 2012-10-02 16:51:47 UTC
Bjorn,

I tested this on 3.5.0 (built on Kubuntu 12.10 Beta) and it seems to be working correctly finally. My netbook was virtually useless because you couldn't plug in ethernet on the fly or suspend and get your ethernet back. Both of those issues seem fixed on 3.5.

Thanks!
Comment 20 Eduardo 2012-10-02 23:24:43 UTC
Hi Bjorn. I'm so sorry for the delay.

I tested with the latest stable 3.5 kernel (3.5.4) and the problem is still present.
Comment 21 Eduardo 2012-10-02 23:37:35 UTC
With 3.6 the problem is still present on my hardware. Thanks
Comment 22 Bjorn Helgaas 2012-10-03 19:57:04 UTC
Alek, Rafael, any ideas?

Eduardo, I assume that reverting c82f63e411f1 still fixes 3.6 on your hardware?  Can you attach the complete dmesg log (covering initial boot, suspend to RAM, and resume) from both 3.6 and 3.6 with c82f63e411f1 reverted?
Comment 23 Bjorn Helgaas 2012-10-25 20:06:45 UTC
Eduardo, you said v3.6 still has the problem on your hardware.  I haven't heard any defense of c82f63e411f1, so if you confirm that v3.6 with that change reverted fixes your hardware, I'll push that revert upstream.  The last test was from 2.6.33.3, which is getting a bit old.
Comment 24 Bjorn Helgaas 2012-10-26 01:25:24 UTC
Created attachment 84911 [details]
proposed patch (reverts c82f63e411)

It's no longer trivial to revert c82f63e411 because of other changes in that area.  This patch applies to v3.7-rc2 and effectively reverts c82f63e411.

Eduardo, can you test this and verify that it fixes the problem?
Comment 25 auxsvr 2012-10-27 19:20:27 UTC
I have been having the same problem for quite some time, even received the following warning once:

------------[ cut here ]------------
WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.6.0/linux-3.6/net/sched/sch_generic.c:255 dev_watchdog+0x1e0/0x1f0()
Hardware name: HP Mini 5102
NETDEV WATCHDOG: eth0 (sky2): transmit queue 0 timed out
Modules linked in: cpufreq_stats nls_utf8 loop rfcomm bnep btusb bluetooth arc4 brcmsmac mac80211 bcma brcmutil cfg80211 cordic af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave fuse hp_wmi snd_hda_codec_idt snd_hda_intel sparse_keymap rfkill snd_hda_codec iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_core videodev videobuf2_vmalloc snd_hwdep snd_pcm videobuf2_memops sg acpi_cpufreq mperf coretemp microcode hp_accel container lis3lv02d snd_timer wmi joydev input_polldev snd lpc_ich battery serio_raw soundcore mfd_core snd_page_alloc edd ac sky2 autofs4 i915 drm_kms_helper drm i2c_algo_bit button video scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh_hp_sw scsi_dh fan processor thermal thermal_sys
Pid: 0, comm: swapper/1 Tainted: G        W    3.6.0-1-desktop #1
Call Trace:
 [<c02054a9>] try_stack_unwind+0x199/0x1b0
 [<c02041c7>] dump_trace+0x47/0xf0
 [<c020550b>] show_trace_log_lvl+0x4b/0x60
 [<c0205538>] show_trace+0x18/0x20
 [<c0714c48>] dump_stack+0x6d/0x72
 [<c0237b18>] warn_slowpath_common+0x78/0xb0
 [<c0237be3>] warn_slowpath_fmt+0x33/0x40
 [<c065a020>] dev_watchdog+0x1e0/0x1f0
 [<c0246e53>] run_timer_softirq+0x103/0x310
 [<c023fd99>] __do_softirq+0x99/0x1e0
 [<c02040a6>] do_softirq+0x76/0xb0
 [<00000003>] 0x2
---[ end trace 35cf0f09d1d8830f ]---
sky2 0000:43:00.0: eth0: tx timeout
sky2 0000:43:00.0: eth0: transmit ring 43 .. 54 report=43 done=43

If I reload sky2, then the NIC works fine.
Comment 26 Bjorn Helgaas 2012-10-29 23:30:13 UTC
auxsvr, can you test the patch in comment #24 and see whether it resolves the problem?
Comment 27 auxsvr 2012-10-30 09:25:52 UTC
In my case the problem is triggered by 

ethtool -s eth0 autoneg off

which is executed by the laptop-mode scripts when on battery. I will check the patch later, I'm too busy at the moment.
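The trigger from this comment and the module reload from comment 25 can be wrapped as below. This is a sketch that needs root; the function names are hypothetical and eth0 is assumed as the interface name.

```shell
# Trigger: turn autonegotiation off on the wired interface (default eth0).
autoneg_off() {
    ethtool -s "${1:-eth0}" autoneg off
}

# Workaround from comment 25: unload and reload the sky2 module.
reload_sky2() {
    modprobe -r sky2 && modprobe sky2
}
```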
Comment 28 auxsvr 2012-11-18 18:05:35 UTC
Just tried the patch on 3.7-rc5 and the connection stops with ethtool -s eth0 autoneg off duplex full speed 100, yet it is restored with ethtool -s eth0 autoneg on, which wouldn't work before (no module reload necessary). Also, there is no error message in the log. 

This looks fixed to me.
Comment 29 auxsvr 2012-11-19 07:53:00 UTC
Well, I just tried  3.6.3 without the patch and it has the same behaviour, i.e. with autoneg off the connection stops and resumes with autoneg on, without errors.  
Hopefully, it won't take many suspend-resume cycles to observe the problem reported originally.
Comment 30 Bjorn Helgaas 2013-09-10 22:38:32 UTC
I'm closing this for lack of information.  If the problem still occurs on a recent kernel (v3.11), please reopen the bug and attach the complete dmesg and "lspci -vv" logs.
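The requested logs could be gathered with a small helper like the one below. It is only a sketch: collect_logs and the output file names are hypothetical, and both commands may need root to show everything.

```shell
# Save the kernel log and the verbose PCI listing into directory $1
# (default: current directory). Run it after boot, suspend to RAM, and
# resume so that dmesg covers all three.
collect_logs() {
    out=${1:-.}
    dmesg > "$out/dmesg.txt" &&
        lspci -vv > "$out/lspci-vv.txt"
}
```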