Bug 198133

Summary: r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25) followed by connectivity problems
Product: Drivers Reporter: Marta Löfstedt (marta.lofstedt)
Component: Network Assignee: drivers_network (drivers_network)
Status: REOPENED ---    
Severity: blocking CC: lakshminarayana.vudum, martin.peres, soprwa
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.15.0-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg exposing the issue
part of "lspci -vv" output

Description Marta Löfstedt 2017-12-11 06:42:46 UTC
Created attachment 261103 [details]
dmesg exposing the issue

Repeatedly since 4.15.0-rc1, the Geminilake system used for CI testing of the i915 driver has lost contact with the testing server, causing the DUT to be rebooted. The issue appears to start after a hibernation test and is followed by dmesg spam from the Realtek r8169 driver:

<3>[  246.687298] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).

There are links to more data in the freedesktop bug:
https://bugs.freedesktop.org/show_bug.cgi?id=103359
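
For filtering CI logs, the r8169 message quoted in the description can be split into its fields. The following is only a minimal sketch and not part of any existing tooling: the function name `parse_timeout` and the field names are made up for illustration, and the regular expression is fitted only to the sample lines quoted in this report.

```python
import re

# Pattern for the r8169 polling-timeout line quoted in this report.
# The optional "<N>" prefix is the log level seen in the CI dmesg captures;
# plain "dmesg" output does not carry it.
TIMEOUT_RE = re.compile(
    r"^(?:<(?P<level>\d)>)?\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"r8169 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): (?P<cond>\w+) == (?P<value>\d+) "
    r"\(loop: (?P<loop>\d+), delay: (?P<delay>\d+)\)"
)


def parse_timeout(line):
    """Return the fields of a timeout line as a dict, or None if no match."""
    m = TIMEOUT_RE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "level": int(d["level"]) if d["level"] is not None else None,
        "timestamp": float(d["ts"]),
        "pci": d["pci"],
        "iface": d["iface"],
        "cond": d["cond"],
        "value": int(d["value"]),
        "loop": int(d["loop"]),
        "delay": int(d["delay"]),
    }


if __name__ == "__main__":
    sample = ("<3>[  246.687298] r8169 0000:01:00.0 enp1s0: "
              "rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).")
    print(parse_timeout(sample))
```

Lines that do not have this shape, such as "link down", return None, so the function can be used directly as a filter over a dmesg capture.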
Comment 1 Marta Löfstedt 2018-01-23 07:38:21 UTC
This issue is still happening frequently; drm-tip is now based on 4.15.0-rc9.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3671/shard-glkb6/igt@kms_cursor_crc@cursor-256x256-onscreen.html
Comment 2 Marta Löfstedt 2018-02-14 07:56:36 UTC
Issue is still reproducible on 4.16-rc1

It frequently starts with:
<6>[   50.872216] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[   51.133172] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).

This is followed by connection problems. It is causing serious trouble in our i915 continuous integration system, since machines keep disconnecting and giving us wrong results.

More data is kept in this freedesktop bug:
https://bugs.freedesktop.org/show_bug.cgi?id=104787
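
To flag affected runs automatically, the two-line pattern described in this comment can be searched for in a dmesg capture. This is a rough sketch under stated assumptions: the function name `find_refused_then_timeout` is hypothetical, and the 1 second window is an arbitrary choice (the excerpts in this report show gaps of roughly 0.1 to 0.3 seconds).

```python
import re

# Matches r8169 lines with or without an interface name after the PCI address.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"r8169 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"(?: \S+)?: (?P<msg>.*)$"
)


def find_refused_then_timeout(lines, window=1.0):
    """Return (pci, refused_ts, timeout_ts) tuples.

    A tuple is reported when a "Refused to change power state" line is
    followed within `window` seconds by an rtl_ocp_gphy_cond timeout on the
    same PCI device. `window` is an assumed value, not taken from the driver.
    """
    last_refused = {}
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        ts = float(m["ts"])
        pci = m["pci"]
        msg = m["msg"]
        if msg.startswith("Refused to change power state"):
            last_refused[pci] = ts
        elif msg.startswith("rtl_ocp_gphy_cond == 1") and pci in last_refused:
            # Report each refusal once, even if the timeout line repeats.
            refused_ts = last_refused.pop(pci)
            if ts - refused_ts <= window:
                hits.append((pci, refused_ts, ts))
    return hits


if __name__ == "__main__":
    log = [
        "<6>[   50.872216] r8169 0000:01:00.0: Refused to change power state, currently in D3",
        "<3>[   51.133172] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).",
    ]
    print(find_refused_then_timeout(log))
```

A capture that contains only the timeout lines, without a preceding "Refused to change power state" line, yields an empty list, so the two variants of the failure can be told apart.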
Comment 3 Przemek 2018-02-15 12:12:36 UTC
Created attachment 274181 [details]
part of "lspci -vv" output

I can confirm the issue on kernel 4.15.2-gentoo. Moreover, it occurs on an AMD system (A6 APU, Lenovo netbook).

I recently moved from the proprietary driver to the in-kernel one.

<<"dmesg | grep r8169
[    2.992557] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    3.001145] r8169 0000:02:00.0 eth0: RTL8168g/8111g at 0x00000000c8440df5, 50:7b:9d:60:46:39, XID 10900800 IRQ 32
[    3.001148] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    4.680386] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   23.362288] r8169 0000:02:00.0 enp2s0: link down
[ 9135.131327] r8169 0000:02:00.0 enp2s0: link down
[ 9146.546360] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9146.546624] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9146.546891] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9150.753344] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9150.753607] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9150.753869] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9154.814429] r8169 0000:02:00.0 enp2s0: link down">>
Comment 4 Przemek 2018-02-17 09:49:17 UTC
There has been a patch committed to linux git tree:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=086ca23d03c0d2f4088f472386778d293e15c5f6

and

https://lkml.org/lkml/2018/2/2/156

After manually applying it to 4.15.2, the problem with "rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25)" is gone.
Comment 5 Marta Löfstedt 2018-02-19 06:47:29 UTC
(In reply to Przemek from comment #4)
> There has been a patch committed to linux git tree:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/
> ?id=086ca23d03c0d2f4088f472386778d293e15c5f6
> 
> and
> 
> https://lkml.org/lkml/2018/2/2/156
> 
> After manually applying to 4.15.2 problem with "rtl_ocp_gphy_cond == 1
> (loop: 10, delay: 25)" has gone.

Thanks, we will test this on i915 CI system.

/Marta
Comment 6 Marta Löfstedt 2018-02-23 07:47:50 UTC
We can no longer reproduce this issue on our farms. Thanks for fixing!
Comment 7 Marta Löfstedt 2018-02-26 07:48:32 UTC
Unfortunately, this was just reproduced again:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4297/shard-glkb6/igt@gem_pwrite_pread@uncached-pwrite-blt-gtt_mmap-performance.html

The kernel is drm-tip based on 4.16.0-rc2.

from dmesg:
<6>[  239.274005] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[  239.368429] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 8 Marta Löfstedt 2018-02-26 07:53:02 UTC
Also here:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4292/shard-glkb6/igt@kms_cursor_legacy@pipe-a-single-bo.html

from dmesg:
<6>[  163.193959] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[  163.293452] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 9 Przemek 2018-03-05 12:42:54 UTC
Unfortunately I can confirm that the issue is back on my machine too, but only when running on battery and laptop-mode-tools enabled with ethernet throttling mode.
Comment 10 Marta Löfstedt 2018-03-05 13:00:04 UTC
(In reply to Przemek from comment #9)
> Unfortunately I can confirm that the issue is back on my machine too, but
> only when running on battery and laptop-mode-tools enabled with ethernet
> throttling mode.

The reproduction rate has gone down significantly in the i915 CI lab. However, I believe this is more due to us removing the majority of hibernation/s4 tests, than this issue being fixed.
Comment 11 Martin Peres 2018-06-19 13:58:35 UTC
We are still seeing this issue every day... What can we do to help fix it?
Comment 12 Lakshminarayana Vudum 2020-07-16 10:58:19 UTC
We haven't seen the message below in dmesg. I propose closing this issue; if it is seen again, we can reopen it.
r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 13 cibuglog 2020-07-16 10:58:28 UTC
The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.