Bug 15474

Summary: r8169 fails to bring up ethernet
Product: Drivers
Reporter: Matteo Croce (rootkit85)
Component: Network
Assignee: drivers_network (drivers_network)
Status: CLOSED CODE_FIX
Severity: normal
CC: maciej.rutecki, rjw, romieu
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.33
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14885
Attachments:
  System log
  not working lspci -vv
  working lspci -vv
  Remove MWI

Description Matteo Croce 2010-03-07 23:57:27 UTC
Created attachment 25399
System log

With the 2.6.33 release, the kernel can't bring up my Ethernet interface.
I bisected the problem and found the commit that broke it:

root@raver:/usr/src/linux-2.6# git bisect good
ac1aa47b131416a6ff37eb1005a0a1d2541aad6c is the first bad commit
commit ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Oct 26 13:20:44 2009 -0700

    PCI: determine CLS more intelligently
    
    Till now, CLS has been determined either by arch code or as
    L1_CACHE_BYTES.  Only x86 and ia64 set CLS explicitly and x86 doesn't
    always get it right.  On most configurations, the chance is that
    firmware configures the correct value during boot.
    
    This patch makes pci_init() determine CLS by looking at what firmware
    has configured.  It scans all devices and if all non-zero values
    agree, the value is used.  If none is configured or there is a
    disagreement, pci_dfl_cache_line_size is used.  arch can set the dfl
    value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
    override the actual one.
    
    ia64, x86 and sparc64 updated to set the default cls instead of the
    actual one.

    While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
    in pci.h and drop private declarations from arch code.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Acked-by: David Miller <davem@davemloft.net>
    Acked-by: Greg KH <gregkh@suse.de>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

:040000 040000 8b20ad60ed3e273b74bfb588dbea6948547e8de1 92498585770ca360c2716c0d3c55d5f4a37356a1 M      arch
:040000 040000 56d1abd61286dd303bb37c2002b699e526988f85 3f20bba2d1e107a80a738e3561fc0fa92e0c4024 M      drivers
:040000 040000 26d85393248c542ca2cea0e3ac4ceabd0ea659aa 326cbb98321cd4e490d888f48bc584b3662c8f06 M      include
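
The selection rule described in the commit message above can be sketched as follows. This is only an illustration of the stated rule (use the firmware value if all non-zero values agree, otherwise fall back to the default), not the kernel's actual code; the function name and the sample values are hypothetical.

```python
def choose_cache_line_size(firmware_values, default_cls):
    """Pick one system-wide CLS from per-device firmware settings.

    firmware_values: CLS register values read from each device
                     (0 means the firmware left it unconfigured).
    default_cls:     fallback, standing in for pci_dfl_cache_line_size.
    """
    # Ignore devices the firmware did not configure.
    configured = {value for value in firmware_values if value != 0}
    # Exactly one distinct non-zero value means all configured devices agree.
    if len(configured) == 1:
        return configured.pop()
    # Nothing configured, or a disagreement: use the default.
    return default_cls


# Hypothetical values, in the register's units of 32-bit words.
print(choose_cache_line_size([0, 16, 16], default_cls=8))  # firmware agrees
print(choose_cache_line_size([16, 32], default_cls=8))     # disagreement
```

Under this rule, a single device with an unusual firmware value is enough to force the fallback to the default for every device.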
Comment 1 Andrew Morton 2010-03-18 19:09:38 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sun, 7 Mar 2010 23:57:32 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15474

Thanks for doing the bisection - it really helps.

Guys, this is a 2.6.32 -> 2.6.33 regression.

>            Summary: r8169 fails to bring up ethernet
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.33
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: rootkit85@yahoo.it
>         Regression: Yes
Comment 2 Francois Romieu 2010-03-18 22:57:55 UTC
Matteo, does it get fixed by (post-2.6.33) commit 76b1a87b217927f905f4b01c586452b2a1d33913 ?

If it does not, please send the lspci -vv output with and without the
culprit commit, as well as a dmesg from a working kernel.

-- 
Ueimor
Comment 3 Matteo Croce 2010-03-18 23:27:45 UTC
I will tell you on Saturday; I don't have the PC with me right now.

Thanks anyway :)


-- 
Matteo Croce
OpenWrt developer
  _______                     ________        __
 |       |.-----.-----.-----.|  |  |  |.----.|  |_
 |   -   ||  _  |  -__|     ||  |  |  ||   _||   _|
 |_______||   __|_____|__|__||________||__|  |____|
          |__| W I R E L E S S   F R E E D O M
 KAMIKAZE (bleeding edge) ------------------
  * 10 oz Vodka       Shake well with ice and strain
  * 10 oz Triple sec  mixture into 10 shot glasses.
  * 10 oz lime juice  Salute!
 ---------------------------------------------------
Comment 4 Matteo Croce 2010-03-20 12:14:00 UTC
(In reply to comment #2)

patching file arch/x86/pci/common.c
Reversed (or previously applied) patch detected!  Assume -R? [n]

It seems that commit was already in 2.6.33.
Comment 5 Rafael J. Wysocki 2010-03-21 19:32:44 UTC
First-Bad-Commit : ac1aa47b131416a6ff37eb1005a0a1d2541aad6c
Handled-By : Francois Romieu <romieu@fr.zoreil.com>

Matteo, the information requested in comment #2 still hasn't been provided yet.
Comment 6 Matteo Croce 2010-03-24 01:14:45 UTC
Created attachment 25666
not working lspci -vv
Comment 7 Matteo Croce 2010-03-24 01:15:15 UTC
Created attachment 25667
working lspci -vv
Comment 8 Matteo Croce 2010-03-24 01:16:51 UTC
(In reply to comment #5)

I have attached both outputs.
Comment 9 Matteo Croce 2010-03-29 00:42:27 UTC
The bug also happens in a KVM virtual machine; it seems that qemu-kvm exposes this chip to the guest OS.
Comment 10 Francois Romieu 2010-03-29 21:50:53 UTC
Created attachment 25758
Remove MWI
Comment 11 Francois Romieu 2010-03-29 21:51:52 UTC
Can you append a 'debug=32767' option to the module and try the attached
patch ?

-- 
Ueimor
Comment 12 Matteo Croce 2010-03-29 22:40:46 UTC
I will have access to that laptop on Thursday. Can anyone try with KVM
in the meantime?

Comment 13 Matteo Croce 2010-04-01 00:00:13 UTC
It works, thanks!

r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:0b:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
r8169 0000:0b:00.0: PCI INT A disabled
r8169: probe of 0000:0b:00.0 failed with error -22
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:0b:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
r8169 0000:0b:00.0: setting latency timer to 64
  alloc irq_desc for 39 on node -1
  alloc kstat_irqs on node -1
r8169 0000:0b:00.0: irq 39 for MSI/MSI-X
eth0: RTL8168d/8111d at 0xffffc900110c2000, 00:26:b9:17:93:97, XID 081000c0 IRQ 39
r8169: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
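
The -22 in the failed probe line above is a negated errno value. A small sketch for looking up its symbolic name, assuming the errno table of the machine running the script matches the kernel's for this value (the helper name is hypothetical):

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return code such as -22 to (name, message)."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = describe_kernel_error(-22)
print(f"-22 -> {name} ({message})")
```

For -22 the symbolic name is EINVAL, i.e. the probe aborted because some call rejected an argument or setting as invalid.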

Comment 14 Francois Romieu 2010-04-02 22:07:49 UTC
Ok, I'll make MWI non-mandatory.

Can you send a hexadecimal dump of the 8169's config space, taken with
the configuration that previously failed?

I'd like to check that it failed because of a bad cache line setting.

-- 
Ueimor
Comment 15 Matteo Croce 2010-04-03 23:21:06 UTC
How do I get such a dump?

Comment 16 Francois Romieu 2010-04-04 11:50:57 UTC
Matteo :
[...]
> How do I get such a dump?

'-x' option of lspci.

-- 
Ueimor
Comment 17 Matteo Croce 2010-04-10 02:56:56 UTC
(In reply to comment #16)

0b:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
00: ec 10 68 81 03 01 10 00 03 00 00 02 10 00 00 00
10: 01 70 00 00 00 00 00 00 0c 40 80 f0 00 00 00 00
20: 0c 00 80 f0 00 00 00 00 00 00 00 00 28 10 bd 02
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
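
For readers checking the cache line setting discussed in this thread, here is a minimal sketch that decodes the relevant fields from the first row of the dump above. It relies on the standard PCI configuration header layout (command register at offset 0x04, cache line size at offset 0x0c in units of 32-bit words, MWI enable as bit 4 of the command register); the helper name is hypothetical.

```python
# First config-space row from the lspci -x dump above (offsets 0x00-0x0f).
ROW_00 = "ec 10 68 81 03 01 10 00 03 00 00 02 10 00 00 00"

PCI_COMMAND = 0x04             # 16-bit command register, little-endian
PCI_CACHE_LINE_SIZE = 0x0C     # 8-bit register, in units of 32-bit words
PCI_COMMAND_INVALIDATE = 0x10  # Memory Write and Invalidate enable bit


def decode_header(row):
    """Decode a few standard header fields from one 16-byte dump row."""
    data = bytes.fromhex(row.replace(" ", ""))
    command = int.from_bytes(data[PCI_COMMAND:PCI_COMMAND + 2], "little")
    return {
        "vendor_id": int.from_bytes(data[0:2], "little"),
        "device_id": int.from_bytes(data[2:4], "little"),
        "cache_line_bytes": data[PCI_CACHE_LINE_SIZE] * 4,
        "mwi_enabled": bool(command & PCI_COMMAND_INVALIDATE),
    }


info = decode_header(ROW_00)
print(f"{info['vendor_id']:04x}:{info['device_id']:04x} "
      f"CLS={info['cache_line_bytes']} bytes MWI={info['mwi_enabled']}")
```

For this dump the script prints 10ec:8168 CLS=64 bytes MWI=False: the cache line size register holds 0x10 (16 words, 64 bytes) and the MWI bit is clear in the command register.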