Bug 14974 - tg3 does not resume from hibernation properly on BCM5787M
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_network@kernel-bugs.osdl.org
Depends on:
Blocks: 7216 14885
Reported: 2010-01-02 05:30 UTC by Chow Loong Jin
Modified: 2010-01-15 21:52 UTC
Kernel Version: 2.6.33-rc1
Tree: Mainline
Regression: Yes


Attachments
Dmesg at boot time (60.05 KB, text/plain)
2010-01-08 20:44 UTC, Detlev Casanova
lspci -vvxx (3.09 KB, text/plain)
2010-01-08 20:48 UTC, Detlev Casanova

Description Chow Loong Jin 2010-01-02 05:30:50 UTC
After a resume from hibernation, eth0 on a BCM5787M card in a Lenovo 3000 Y410 no longer registers any incoming packets. The RX packet count appears to increase, but tcpdump shows nothing incoming. ARP requests also fail: tcpdump on a remote machine shows the ARP requests being replied to, but the replies are not registered by the local machine. Broadcast packets are transmitted successfully, but nothing is received.

Restarting the machine into the same kernel does not resolve this issue, i.e. eth0 remains blind to all incoming packets. A restart into an older kernel does seem to work though.

A bisect shows that 87668d352aa8d135bd695a050f18bbfc7b50b506 is the commit which caused the regression. Reverting this on top of my local git tree solves the issue.
Comment 1 Andrew Morton 2010-01-07 22:45:19 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 2 Jan 2010 05:30:52 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=14974
> 
>            Summary: tg3 does not resume from hibernation properly on
>                     BCM5787M
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.33-rc1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: hyperair@ubuntu.com
>                 CC: mcarlson@broadcom.com
>         Regression: Yes
> 
> 
> After a resume from hibernation, no incoming packets seem to be registered by
> eth0 on a BCM5787M card on Lenovo 3000 Y410. The RX packet count appears to
> increase, but tcpdump shows nothing incoming. ARP requests also fail. tcpdump
> on a remote machine shows the ARP requests being replied to, but the replies are
> not registered by the local machine. Broadcast packets also get transmitted
> successfully, but nothing gets received.
> 
> Restarting the machine into the same kernel does not resolve this issue, i.e.
> eth0 remains blind to all incoming packets. A restart into an older kernel does
> seem to work though.
> 
> A bisect shows that 87668d352aa8d135bd695a050f18bbfc7b50b506 is the commit
> which caused the regression. Reverting this on top of my local git tree solves
> the issue.
> 

Thanks for bisecting.

Probably the tg3 developers will want to know what sort of card you
have - the relevant boot-time dmesg output and the output of `lspci
-vvxx -s <PCI address>' would help.

Rafael, this is a post-2.6.32 regression.
Comment 2 Rafael J. Wysocki 2010-01-07 22:48:21 UTC
First-Bad-Commit : 87668d352aa8d135bd695a050f18bbfc7b50b506
Comment 3 Detlev Casanova 2010-01-08 20:44:16 UTC
Created attachment 24484
Dmesg at boot time

dmesg with the Broadcom Corporation NetLink BCM5784M Gigabit Ethernet device
Comment 4 Detlev Casanova 2010-01-08 20:48:51 UTC
Created attachment 24485
lspci -vvxx

lspci for the device
Comment 5 Detlev Casanova 2010-01-08 20:50:47 UTC
I have the same problem, not with the exact same card though. Mine is the "Ethernet controller: Broadcom Corporation NetLink BCM5784M Gigabit Ethernet PCIe (rev 10)" from lspci.

I attached the dmesg and lspci -vvxx output.

Thanks,

Detlev Casanova.
Comment 6 Chow Loong Jin 2010-01-08 21:05:13 UTC
On Friday 08,January,2010 06:45 AM, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> Thanks for bisecting.
> 
> Probably the tg3 developers will want to know what sort of card you
> have - the relevant boot-time dmesg output and the output of `lspci
> -vvxx -s <PCI address>' would help.
Relevant messages extracted from kern.log:
Dec 28 15:03:37 ipwn kernel: [    2.746254] tg3.c:v3.105 (December 2, 2009)
Dec 28 15:03:37 ipwn kernel: [    2.748051] tg3 0000:06:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
Dec 28 15:03:37 ipwn kernel: [    2.749921] tg3 0000:06:00.0: setting latency timer to 64
Dec 28 15:03:37 ipwn kernel: [    2.782767]   alloc irq_desc for 22 on node -1
Dec 28 15:03:37 ipwn kernel: [    2.782771]   alloc kstat_irqs on node -1
Dec 28 15:03:37 ipwn kernel: [    2.782782] ohci1394 0000:08:06.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
Dec 28 15:03:37 ipwn kernel: [    2.800633] eth0: Tigon3 [partno(none) rev b002] (PCI Express) MAC address xx:xx:xx:xx:xx:xx
Dec 28 15:03:37 ipwn kernel: [    2.802433] eth0: attached PHY is 5787 (10/100/1000Base-T Ethernet) (WireSpeed[1])
Dec 28 15:03:37 ipwn kernel: [    2.802436] eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
Dec 28 15:03:37 ipwn kernel: [    2.802438] eth0: dma_rwctrl[76180000] dma_mask[64-bit]

lspci:
06:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5787M Gigabit Ethernet PCI Express (rev 02)
        Subsystem: Lenovo Device 3860
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 30
        Region 0: Memory at b8000000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Vendor Specific Information <?>
        Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
                Address: 00000000fee0300c  Data: 417b
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 <64us
                        ClockPM+ Suprise- LLActRep- BwNot-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100] Advanced Error Reporting <?>
        Capabilities: [13c] Virtual Channel <?>
        Capabilities: [160] Device Serial Number 8c-4b-62-fe-ff-ec-1e-00
        Capabilities: [16c] Power Budgeting <?>
        Kernel driver in use: tg3
        Kernel modules: tg3
00: e4 14 93 16 06 05 10 00 02 00 00 02 10 00 00 00
10: 04 00 00 b8 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 aa 17 60 38
30: 00 00 fe ff 48 00 00 00 00 00 00 00 0a 01 00 00

> 
> Rafael, this is a post-2.6.32 regression.


-- 
Kind regards,
Chow Loong Jin (GPG: 0x8F02A411)
Ubuntu Contributing Developer
Comment 7 Matt Carlson 2010-01-08 23:06:09 UTC
Reply to all bounced.  Here was my reply:

I have personally witnessed the problem and have a patch in-hand that
fixes the problem I was seeing.  I submitted the patch back to the bug
reporter, but I haven't heard anything from him yet.

I have rolled this patch into a small patchset that I will be submitting
upstream as soon as I get to the bottom of a similar 5755M issue.
Comment 8 Chow Loong Jin 2010-01-09 01:35:17 UTC
(In reply to comment #7)
> Reply to all bounced.  Here was my reply:
> 
> I have personally witnessed the problem and have a patch in-hand that
> fixes the problem I was seeing.  I submitted the patch back to the bug
> reporter, but I haven't heard anything from him yet.
> 
> I have rolled this patch into a small patchset that I will be submitting
> upstream as soon as I get to the bottom of a similar 5755M issue.
Sorry, I must have overlooked the e-mail. I just found it with the patch. I'm compiling it for a test now.
Comment 9 Chow Loong Jin 2010-01-09 16:04:30 UTC
On Saturday 09,January,2010 09:36 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> Sorry, I must have overlooked the e-mail. I just found it with the patch. I'm
> compiling it for a test now.
Okay, I've tested and can confirm that it works well now with the patch provided.

-- 
Kind regards,
Chow Loong Jin (GPG: 0x8F02A411)
Ubuntu Contributing Developer
Comment 10 Rafael J. Wysocki 2010-01-09 17:25:21 UTC
Can you please attach the patch here?
Comment 11 Rafael J. Wysocki 2010-01-09 17:26:01 UTC
Handled-By : Matt Carlson <mcarlson@broadcom.com>
Comment 12 Chow Loong Jin 2010-01-09 17:36:22 UTC
On Sunday 10,January,2010 01:25 AM, bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=14974
> 
> 
> 
> 
> 
> --- Comment #10 from Rafael J. Wysocki <rjw@sisk.pl>  2010-01-09 17:25:21 ---
> Can you please attach the patch here?
> 
From 90b4b05362db57cce0d187d30366f98e2ecd6a4d Mon Sep 17 00:00:00 2001
From: Matt Carlson <mcarlson@broadcom.com>
Date: Mon, 4 Jan 2010 14:58:19 -0800
Subject: [PATCH] tg3: Fix std prod ring nicaddr for 5787 and 57765

Commit 87668d352aa8d135bd695a050f18bbfc7b50b506, titled "tg3: Don't
touch RCB nic addresses", tried to avoid assigning the nic address of
the standard producer ring.  Unfortunately, the default nic address is
not correct for the 5787.  The same is also true for the 57765.  This
patch reenables the old behavior and opts out of the assignment only
for the 5717.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/tg3.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 3a74d21..5c77e6a 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -7742,7 +7742,7 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
 	     ((u64) tpr->rx_std_mapping >> 32));
 	tw32(RCVDBDI_STD_BD + TG3_BDINFO_HOST_ADDR + TG3_64BIT_REG_LOW,
 	     ((u64) tpr->rx_std_mapping & 0xffffffff));
-	if (!(tp->tg3_flags3 & TG3_FLG3_5755_PLUS))
+	if (GET_ASIC_REV(tp->pci_chip_rev_id) != ASIC_REV_5717)
 		tw32(RCVDBDI_STD_BD + TG3_BDINFO_NIC_ADDR,
 		     NIC_SRAM_RX_BUFFER_DESC);

--
1.6.4.4

-- 
Kind regards,
Chow Loong Jin (GPG: 0x8F02A411)
Ubuntu Contributing Developer
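
For readers who want the effect of the one-line change spelled out, here is a minimal sketch in plain C that models only the condition guarding the write of the standard producer ring NIC address. It is not driver code: the enum, the function names and the chip list are hypothetical stand-ins, and the set of chips treated as 5755_PLUS is an assumption inferred from the patch description above (the 5705 is included only as an example of an older chip without that flag).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ASIC revisions named in this thread. */
enum chip_model { CHIP_5705, CHIP_5787, CHIP_57765, CHIP_5717 };

/*
 * Assumption: the 5787, 57765 and 5717 all carry the 5755_PLUS flag, which
 * is what the patch description implies. The 5705 stands in for an older
 * chip without it.
 */
static bool is_5755_plus(enum chip_model chip)
{
	return chip == CHIP_5787 || chip == CHIP_57765 || chip == CHIP_5717;
}

/* Condition introduced by 87668d35: skip the write on every 5755_PLUS chip. */
static bool programs_nic_addr_before_fix(enum chip_model chip)
{
	return !is_5755_plus(chip);
}

/* Condition restored by the patch: skip the write only on the 5717. */
static bool programs_nic_addr_after_fix(enum chip_model chip)
{
	return chip != CHIP_5717;
}

/* Sanity check of the model against the behaviour described in the thread. */
static void check_model(void)
{
	assert(!programs_nic_addr_before_fix(CHIP_5787)); /* regression: write skipped */
	assert(programs_nic_addr_after_fix(CHIP_5787));   /* fix: write restored */
	assert(programs_nic_addr_after_fix(CHIP_57765));  /* fix also covers the 57765 */
	assert(!programs_nic_addr_after_fix(CHIP_5717));  /* 5717 still opted out */
}
```

Calling check_model() from a small main() exercises the four cases. In this model the 5787 is the chip whose answer flips between the two conditions, which matches the bisect result reported in this bug.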
Comment 13 Rafael J. Wysocki 2010-01-09 17:57:55 UTC
Patch : http://bugzilla.kernel.org/show_bug.cgi?id=14974#c12
Comment 14 Matt Carlson 2010-01-14 22:47:36 UTC
The official commit is 13fa95b0398d65885a79c6e95a09976ee9f8c009.
