Bug 203025 - Ethernet driver r8169 exits with error -5 (kernel 4.20 and later)
Status: RESOLVED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: ARM Linux
Importance: P1 high
Assignee: Stephen Hemminger
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-23 20:28 UTC by Artur
Modified: 2019-03-26 19:06 UTC

See Also:
Kernel Version: 4.20.0-1-ARCH - 5.0.3-1-ARCH
Subsystem:
Regression: No
Bisected commit-id:


Attachments
4.19.9 kernel - full dmesg (32.51 KB, text/plain)
2019-03-24 12:54 UTC, Artur

Description Artur 2019-03-23 20:28:41 UTC
Latest working kernel version: 4.19.9-1-ARCH #1 PREEMPT Fri Dec 14 03:34:38 UTC 2018
Earliest failing kernel version: 4.20.0-1-ARCH #1 PREEMPT Fri Dec 28 08:19:47 UTC 2018
Distribution: ARCH Linux
Hardware Environment: NAS Zyxel NSA-310:
Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 03)
Software Environment: Linux x64
Problem Description:
The driver always fails to start with the -5 error - below part of dmesg:
[ 1.839771] r8169 0000:01:00.0: enabling device (0140 -> 0143) 
[ 1.846456] r8169 0000:01:00.0 (unnamed net_device) (uninitialized): rtl_ph 0 (loop: 20, delay: 25). 
[ 1.856836] r8160000:01:00.0 failed with error -5"
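For readers unfamiliar with kernel error codes, the following is a minimal sketch of how the "-5" in the log above can be decoded. It assumes it runs on Linux, where Python's errno table follows the kernel's numbering; the helper name decode_probe_error is hypothetical and only matches the "failed with error" phrase quoted in the log.

```python
import errno
import os
import re


def decode_probe_error(line):
    """Return (code, symbolic name, description) for a 'failed with error -N'
    message, or None if the line contains no such message."""
    match = re.search(r"failed with error (-?\d+)", line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # the kernel reports errno values negated
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    # Only the phrase taken from the report is used, not a full log line.
    print(decode_probe_error("failed with error -5"))
```

Error 5 is EIO, a generic I/O error, which by itself does not say whether the driver or the bus underneath it is at fault.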

The Realtek r8168 driver doesn't work either.

Additional information:
https://forum.doozan.com/read.php?2,77609,page=1

Steps to reproduce:
The same networking error on every NSA310 machine since the 4.20 kernel.
Comment 1 Heiner Kallweit 2019-03-24 12:34:38 UTC
If r8168 fails too then it doesn't seem to be an issue with r8169. Maybe something in the PCI sub-system. Could you please attach a full 4.19 dmesg log?
The platform seems to be Marvell ARM, right? Maybe the issue is platform-dependent. It would be most helpful if you could bisect the issue.
Comment 2 Artur 2019-03-24 12:54:54 UTC
Created attachment 281985
4.19.9 kernel - full dmesg

Thank you for your answer.
I attached the full dmesg.
If anything else is needed, I will gladly provide it later.
Comment 3 Artur 2019-03-24 13:06:29 UTC
Also, I'm not sure whether r8168 ever worked on this machine; I just tried both drivers after this failure.
I have now tried to blacklist r8169 (via the /etc/modprobe.d/r8169_blacklist.conf file) and rebooted, but r8169 is still being loaded (not sure why).
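As a sanity check for the blacklist attempt above, here is a small sketch that lists the modules named by `blacklist` lines in modprobe.d-style text. The file contents shown are hypothetical (the report does not quote the file), and the function name blacklisted_modules is made up for illustration. It only confirms the syntax of the entry; it cannot tell why a module still loads, for example if it is loaded from an initramfs or built into the kernel.

```python
def blacklisted_modules(conf_text):
    """Return the set of module names listed on `blacklist <name>` lines."""
    names = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            # module names treat '-' and '_' as equivalent; normalise to '_'
            names.add(parts[1].replace("-", "_"))
    return names


if __name__ == "__main__":
    # Hypothetical contents of /etc/modprobe.d/r8169_blacklist.conf
    example = "# keep r8169 from being auto-loaded\nblacklist r8169\n"
    print(blacklisted_modules(example))
```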
Comment 4 Heiner Kallweit 2019-03-24 13:17:05 UTC
It seems that PCI is broken: the system can't access the network chip.
You could build a recent linux-next kernel or 5.1-rc1 and check whether the issue still exists.
Or check the linux-pci mailing list archive for issues regarding the mvebu platform.
Last but not least, you could report the issue to the linux-pci kernel mailing list directly.
Comment 5 Artur 2019-03-24 13:52:14 UTC
I really appreciate your answer, but it's too high-level for me.
I guess this is the linux-pci mailing list?
https://www.spinics.net/lists/linux-pci/maillist.html
Forgive my lack of skills and knowledge, but I don't know what to write there (I can only paste a log), and I don't even see where to start a new thread.
I searched for "mvebu", but I can't make sense of the results.
As for new kernel revisions, I check every new release, and nothing has changed since the 4.20 kernel. I had almost lost hope; it's 5.0.3 now and still the same error.
Comment 6 Heiner Kallweit 2019-03-24 14:03:39 UTC
Yes, that's the mailing list I meant. The following sounds related:
https://www.spinics.net/lists/linux-pci/msg80510.html
These patches are included in 5.0.4 and 5.1-rc1.
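To check whether a given kernel release string is at or past the first stable release mentioned above, something like the following sketch could be used. The names FIXED_IN, parse_release and has_fix are hypothetical, and the comparison deliberately ignores suffixes such as "-rc1" or "-1-ARCH", so it is only a rough guide.

```python
import re

# First stable release named in this thread as carrying the patches.
FIXED_IN = (5, 0, 4)


def parse_release(release):
    """Turn a string like '5.0.4-1-ARCH' or '5.1-rc1' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (as in '5.1-rc1') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def has_fix(release):
    """True if the release is numerically at or after FIXED_IN."""
    return parse_release(release) >= FIXED_IN


if __name__ == "__main__":
    for rel in ("4.20.0-1-ARCH", "5.0.3-1-ARCH", "5.0.4-1-ARCH", "5.1-rc1"):
        print(rel, has_fix(rel))
```

On a running system, `platform.release()` from the standard library returns the same string as `uname -r` and can be passed to has_fix directly.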
Comment 7 Artur 2019-03-24 16:15:18 UTC
Great! It may be something, thank you for giving me hope :)
Comment 8 Artur 2019-03-25 21:45:27 UTC
WOW! Blessed day today! My network with the r8169 driver is working! Or rather the whole PCI bus, as Mr. Heiner Kallweit rightly pointed out.
uname -rv
5.0.4-1-ARCH #1 PREEMPT Sun Mar 24 23:39:21 UTC 2019

Thank you
Comment 9 Matthew Whitehead 2019-03-26 18:48:50 UTC
I am running the latest 5.1.0-rc2 and also 5.0.4, and the symptoms persist, but only on certain PCI hardware. On other hardware it works fine.

A failing system:

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 81)
02:01.0 CardBus bridge [0607]: Texas Instruments PCI1520 PC card Cardbus Controller [104c:ac55] (rev 01)
02:01.1 CardBus bridge [0607]: Texas Instruments PCI1520 PC card Cardbus Controller [104c:ac55] (rev 01)
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)

A working system:

00:00.0 Host bridge [0600]: Cyrix Corporation PCI Master [1078:0001]
00:11.0 CardBus bridge [0607]: Texas Instruments PCI1221 [104c:ac19]
00:11.1 CardBus bridge [0607]: Texas Instruments PCI1221 [104c:ac19]
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)
Comment 10 Matthew Whitehead 2019-03-26 19:02:51 UTC
More PCI data from one working and one failing system:

A failing system (same PCI bridge as above):

00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 81)
02:00.0 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus Controller [104c:ac46] (rev 01)
02:00.1 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus Controller [104c:ac46] (rev 01)
07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)

A working system:

00:01.0 PCI bridge [0604]: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge [8086:7191] (rev 02)
00:04.0 CardBus bridge [0607]: Texas Instruments PCI1220 [104c:ac17] (rev 02)
00:04.1 CardBus bridge [0607]: Texas Instruments PCI1220 [104c:ac17] (rev 02)
06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)
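The listings in comments 9 and 10 can be compared mechanically. The sketch below assumes they are `lspci -nn` style output (the comments do not name the command) and looks for vendor:device IDs that appear on every failing system but on no working one. The names device_ids and suspects are hypothetical, and a match is only a hint about where to look, not a diagnosis.

```python
import re

# One line of an `lspci -nn` style listing:
# slot, class name [class id]: device name [vendor:device] (rev xx)
LSPCI_LINE = re.compile(
    r"^(?P<slot>\S+) (?P<cls>.+?) \[(?P<cls_id>[0-9a-f]{4})\]: (?P<name>.+) "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
    r"(?: \(rev (?P<rev>[0-9a-f]+)\))?$"
)


def device_ids(listing):
    """Return the set of 'vendor:device' IDs found in a listing."""
    ids = set()
    for line in listing.strip().splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match:
            ids.add(f"{match['vendor']}:{match['device']}")
    return ids


def suspects(failing, working):
    """IDs present on every failing system and on no working system."""
    common = set.intersection(*(device_ids(f) for f in failing))
    seen_working = set().union(*(device_ids(w) for w in working))
    return common - seen_working


FAILING = [
    """00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 81)
02:01.0 CardBus bridge [0607]: Texas Instruments PCI1520 PC card Cardbus Controller [104c:ac55] (rev 01)
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)""",
    """00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev 81)
02:00.0 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus Controller [104c:ac46] (rev 01)
07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)""",
]

WORKING = [
    """00:00.0 Host bridge [0600]: Cyrix Corporation PCI Master [1078:0001]
00:11.0 CardBus bridge [0607]: Texas Instruments PCI1221 [104c:ac19]
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)""",
    """00:01.0 PCI bridge [0604]: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge [8086:7191] (rev 02)
00:04.0 CardBus bridge [0607]: Texas Instruments PCI1220 [104c:ac17] (rev 02)
06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169 PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)""",
]

if __name__ == "__main__":
    print(suspects(FAILING, WORKING))
```

With the four listings above, the only ID left is that of the Intel 82801 Mobile PCI Bridge, which matches the "same PCI bridge as above" remark in comment 10.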
Comment 11 Heiner Kallweit 2019-03-26 19:06:47 UTC
(In reply to Matthew Whitehead from comment #10)
> More PCI data from one working and one failing system:
> 
> A failing system (same PCI bridge as above):
> 
> 00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge
> [8086:2448] (rev 81)
> 02:00.0 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus
> Controller [104c:ac46] (rev 01)
> 02:00.1 CardBus bridge [0607]: Texas Instruments PCI4520 PC card Cardbus
> Controller [104c:ac46] (rev 01)
> 07:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169
> PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)
> 
> A working system:
> 
> 00:01.0 PCI bridge [0604]: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP
> bridge [8086:7191] (rev 02)
> 00:04.0 CardBus bridge [0607]: Texas Instruments PCI1220 [104c:ac17] (rev 02)
> 00:04.1 CardBus bridge [0607]: Texas Instruments PCI1220 [104c:ac17] (rev 02)
> 06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8169
> PCI Gigabit Ethernet Controller [10ec:8169] (rev 10)

This closed bug report, which turned out not to be a network driver issue, is the wrong place for these findings. Please send them to the linux-pci mailing list.
