Bug 47041

Summary: HP Pavilion ZT3000 laptop became unusable because of a kernel panic
Product: Drivers
Reporter: Giovanni Venturi (slacky)
Component: Network
Assignee: drivers_network (drivers_network)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: alan, florian, gilboad, jasowang, rggjan, romieu
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.5.3 Tree: Mainline
Regression: Yes
Attachments: dmesg from kernel 3.5.3 before to panic
[3.0.42] 8139cp: set ring address before enabling receiver
dmesg from kernel 3.4.12 compiled from sources
lsmod from kernel 3.4.12 compiled from sources

Description Giovanni Venturi 2012-09-04 23:19:33 UTC
Created attachment 79301
dmesg from kernel 3.5.3 before to panic

I booted Arch Linux with the default 3.5.3 kernel and the 8139cp no longer works. An IP address gets assigned, but the network card still doesn't work. It worked up to 3.4.x. The dmesg I attached shows a problem.

I rebooted into kernel 3.0.42 and the system went into a wonderful kernel panic.

Every new kernel release seems to introduce a new kernel panic. Lots of regressions. What is happening to the kernel source code?
Comment 1 Jason Wang 2012-09-05 06:15:25 UTC
Hi Giovanni:

I don't have an 8139cp at hand. Could you please test 3.5.3 with 0bc777bca480357941418952cf228484f5485daf reverted,
and 3.5.3 with b01af4579ec41f48e9b9c774e70bd6474ad210db reverted, to see whether one of them is the culprit?

Thanks
Comment 2 Francois Romieu 2012-09-05 21:56:29 UTC
Created attachment 79341
[3.0.42] 8139cp: set ring address before enabling receiver
Comment 3 Francois Romieu 2012-09-05 22:24:02 UTC
(In reply to comment #0)
[...]
> I rebooted to start with kernel 3.0.42 and the system gone into a wonderful
> kernel panic.

It probably needs the attached patch as a start.

I'd welcome a dmesg from a known working 3.4 kernel. IRQ 10 is used by the NIC and
some USB at the same time. I wonder whether that was always the case.

Thanks.

-- 
Ueimor
Comment 4 Jan Rüegg 2012-10-03 06:41:42 UTC
I have the same problem with my HP Compaq nx7000, kernel 3.5.4, with Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)

Everything still works fine with the old 3.2.12 kernel. Shall I provide a dmesg from that one?
Comment 5 Francois Romieu 2012-10-03 20:02:54 UTC
(In reply to comment #4)
> I have the same problem with my hp compaq nx7000, kernel 3.5.4, with Ethernet
> controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 20)
> 
> Everything still works fine in the old kernel 3.2.12, shall I provide a dmesg
> with this one?

No.

You should try Jason's suggestion in comment #1.

There are only a few changesets for the 8139cp driver. It should not be hard to pinpoint
a suspect.

-- 
Ueimor
Comment 6 Giovanni Venturi 2012-10-11 21:39:12 UTC
Created attachment 82981
dmesg from kernel 3.4.12 compiled from sources
Comment 7 Giovanni Venturi 2012-10-11 21:39:54 UTC
Created attachment 82991
lsmod from kernel 3.4.12 compiled from sources
Comment 8 Giovanni Venturi 2012-10-20 09:08:05 UTC
I gave you the information you requested. Have you fixed the bug in kernel 3.6 or not?
Comment 9 Francois Romieu 2012-10-20 19:58:51 UTC
(In reply to comment #8)
> I gave you the information you requested.

No obvious change wrt the shared IRQ.

> Have you fixed the bug into kernel 3.6 or not?

Not much will happen until someone reverts the commits as Jason suggested.

-- 
Ueimor
Comment 10 Gilboa Davara 2012-11-15 13:14:14 UTC
Jason,

I assume that you want us to test 8139cp w/ each of the reverted patches separately? (I'm on 3.6.6 FWIW).

- Gilboa
Comment 11 Gilboa Davara 2012-11-15 15:17:32 UTC
Started by reverting *10db (set ring address before enabling receiver) and at least as far as I can test, I've got a fully working network device.

Should I also revert *5daf?

- Gilboa
Comment 12 Jason Wang 2012-11-22 08:40:12 UTC
(In reply to comment #11)
> Started by reverting *10db (set ring address before enabling receiver) and at
> least as far as I can test, I've got a fully working network device.
> 
> Should I also revert *5daf?
> 
> - Gilboa

Sorry for the late response; I'm back from vacation and catching up on email. I've replied in the upstream discussion. Maybe we could just move the TX ring setup to after the C+ mode is enabled, but I think reverting should be OK.

BTW, it would be better to have some Realtek engineers explain how the card works, since its datasheet is not clear.

Thanks
Comment 13 Jason Wang 2012-11-22 08:43:34 UTC
(In reply to comment #12)
> (In reply to comment #11)
> > Started by reverting *10db (set ring address before enabling receiver) and at
> > least as far as I can test, I've got a fully working network device.
> > 
> > Should I also revert *5daf?
> > 
> > - Gilboa
> 
> Sorry for the late response, back from vacation and catch up the email. I've
> replied the upstream discussion, maybe we could just move the tx ring setup
> after enabling the C+ mode, but I think reverting should be ok.
> 
> btw. Better to let some realtek engineers to explain how the card work, since
> its datasheet is not clear.
> 
> Thanks

I spoke too fast. I didn't see *5daf and *10db; what I see is

b01af45 8139cp: set ring address before enabling receiver

So reverting b01af45 is enough to make the card work, but there may still be a subtle race.

Thanks
Comment 14 Gilboa Davara 2012-11-22 09:26:58 UTC
b01af45 (prefix) ends with 10db (suffix) :)
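
For anyone else confused by the two notations, here is a minimal sketch that checks them against the full commit IDs quoted in comment #1. It is only an illustration, not part of any kernel or git tooling: `match_abbrev` is a hypothetical helper name, and the leading `*` for a suffix is just the informal notation used in this thread (git itself abbreviates by prefix).

```python
# Full commit IDs as quoted in comment #1 of this report.
FULL_HASHES = [
    "0bc777bca480357941418952cf228484f5485daf",
    "b01af4579ec41f48e9b9c774e70bd6474ad210db",
]


def match_abbrev(abbrev, hashes=FULL_HASHES):
    """Return the full hashes that an abbreviation could refer to.

    A leading '*' marks a suffix (the informal notation from this thread);
    anything else is treated as a prefix, the way git abbreviates IDs.
    """
    abbrev = abbrev.lower()
    if abbrev.startswith("*"):
        suffix = abbrev[1:]
        return [h for h in hashes if h.endswith(suffix)]
    return [h for h in hashes if h.startswith(abbrev)]


if __name__ == "__main__":
    # Show which full commit each abbreviation used in the thread maps to.
    for name in ("b01af45", "*10db", "*5daf"):
        print(name, "->", match_abbrev(name))
```

With the two IDs above, `b01af45` and `*10db` resolve to the same commit, while `*5daf` resolves to the other one.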

Even w/ b01af45 reverted, I'm getting a weird oops within the NFS code (related or not, I'm not certain).
I'm trying to see whether it still reproduces w/ the 8139cp disabled.

- Gilboa
Comment 15 Florian Mickler 2012-11-26 18:44:17 UTC
A patch referencing this bug report has been merged in Linux v3.7-rc7:

commit b26623dab7eeb1e9f5898c7a49458789dd492f20
Author: françois romieu <romieu@fr.zoreil.com>
Date:   Wed Nov 21 10:07:29 2012 +0000

    8139cp: revert "set ring address before enabling receiver"
Comment 16 Florian Mickler 2012-12-22 09:29:08 UTC
A patch referencing a commit referencing this bug report has been merged in Linux v3.8-rc1:

commit 071e3ef4a94a021b16a2912f3885c86f4ff36b49
Author: David S. Miller <davem@davemloft.net>
Date:   Sun Nov 25 15:52:09 2012 -0500

    Revert "8139cp: revert "set ring address before enabling receiver""
Comment 17 Giovanni Venturi 2013-01-18 09:23:47 UTC
No more kernel panic on version 3.6.11.