Created attachment 261103: dmesg exposing the issue
Repeatedly since 4.15.0-rc1, the Geminilake system used for CI testing of the i915 driver has lost contact with the testing server, causing the DUT to be rebooted. The issue appears to start after a hibernation test and is followed by dmesg spam from the Realtek r8169 driver:
<3>[ 246.687298] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
There are links to more data in the freedesktop bug: https://bugs.freedesktop.org/show_bug.cgi?id=103359
This issue is still happening frequently; drm-tip is now based on 4.15.0-rc9: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3671/shard-glkb6/igt@kms_cursor_crc@cursor-256x256-onscreen.html
The issue is still reproducible on 4.16-rc1. It frequently starts with:
<6>[ 50.872216] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[ 51.133172] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
followed by connection problems. This is causing serious problems in our i915 continuous integration system, since machines keep disconnecting and giving us wrong results. More data is kept in this freedesktop bug: https://bugs.freedesktop.org/show_bug.cgi?id=104787
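For anyone triaging CI logs for this signature, the pairing described above can be searched for mechanically. The following is a minimal sketch, assuming the dmesg line format quoted in this report (optional `<N>` priority prefix, `[ seconds]` timestamp, `r8169`, PCI address). The function name `find_d3_then_timeout` and the 1-second window are illustrative choices, not part of any existing tool.

```python
import re

# Matches r8169 lines that carry a PCI address, with or without an
# interface name, e.g.
#   <6>[ 50.872216] r8169 0000:01:00.0: Refused to change power state, currently in D3
#   <3>[ 51.133172] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
LINE_RE = re.compile(
    r"^(?:<\d+>)?\[\s*(?P<ts>\d+\.\d+)\]\s+r8169\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"(?:\s+(?P<ifname>\S+))?:\s+(?P<msg>.*)$"
)


def find_d3_then_timeout(dmesg_text, window=1.0):
    """Return (pci, t_refused, t_timeout) tuples.

    A tuple is emitted for the first rtl_ocp_gphy_cond message that follows a
    'Refused to change power state' message on the same PCI device within
    `window` seconds (the window is an assumption, not a driver constant).
    """
    last_refused = {}  # PCI address -> timestamp of the latest refusal
    hits = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = float(m.group("ts"))
        pci = m.group("pci")
        msg = m.group("msg")
        if "Refused to change power state" in msg:
            last_refused[pci] = ts
        elif "rtl_ocp_gphy_cond == 1" in msg:
            t0 = last_refused.pop(pci, None)
            if t0 is not None and ts - t0 <= window:
                hits.append((pci, t0, ts))
    return hits
```

Feeding it the text of a saved dmesg (one message per line) returns an empty list when the pairing is absent, so it can serve as a simple filter over archived CI runs.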
Created attachment 274181: part of "lspci -vv" output
I can confirm the issue on kernel 4.15.2-gentoo; moreover, it occurs on an AMD system (A6 APU, Lenovo netbook). I recently moved from the proprietary driver to the in-kernel one.
Output of "dmesg | grep r8169":
[ 2.992557] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 3.001145] r8169 0000:02:00.0 eth0: RTL8168g/8111g at 0x00000000c8440df5, 50:7b:9d:60:46:39, XID 10900800 IRQ 32
[ 3.001148] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[ 4.680386] r8169 0000:02:00.0 enp2s0: renamed from eth0
[ 23.362288] r8169 0000:02:00.0 enp2s0: link down
[ 9135.131327] r8169 0000:02:00.0 enp2s0: link down
[ 9146.546360] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9146.546624] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9146.546891] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9150.753344] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9150.753607] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9150.753869] r8169 0000:02:00.0 enp2s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
[ 9154.814429] r8169 0000:02:00.0 enp2s0: link down
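When comparing reports like the one above, it can be easier to look at a per-interface summary than at the raw spam. A minimal sketch, assuming only the message format seen in this report. The helper name `summarize_timeouts` is hypothetical.

```python
import re
from collections import defaultdict

# Captures the timestamp and interface name of each rtl_ocp_gphy_cond message.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+r8169\s+\S+\s+(?P<ifname>\S+):\s+"
    r"rtl_ocp_gphy_cond == 1"
)


def summarize_timeouts(dmesg_text):
    """Map interface name -> (count, first_timestamp, last_timestamp)."""
    stamps = defaultdict(list)
    for m in TIMEOUT_RE.finditer(dmesg_text):
        stamps[m.group("ifname")].append(float(m.group("ts")))
    return {name: (len(ts), min(ts), max(ts)) for name, ts in stamps.items()}
```

Because it uses `finditer` over the whole text, it also works on logs whose line breaks were lost when pasted.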
A patch has been committed to the Linux git tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=086ca23d03c0d2f4088f472386778d293e15c5f6 and https://lkml.org/lkml/2018/2/2/156 After manually applying it to 4.15.2, the problem with "rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25)" is gone.
(In reply to Przemek from comment #4)
> There has been a patch committed to linux git tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=086ca23d03c0d2f4088f472386778d293e15c5f6
> and
> https://lkml.org/lkml/2018/2/2/156
> After manually applying to 4.15.2 problem with "rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25)" has gone.
Thanks, we will test this on the i915 CI system.
/Marta
We can no longer reproduce this issue on our farms. Thanks for fixing!
Unfortunately, this was just reproduced again: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4297/shard-glkb6/igt@gem_pwrite_pread@uncached-pwrite-blt-gtt_mmap-performance.html
The kernel is drm-tip based on 4.16.0-rc2. From dmesg:
<6>[ 239.274005] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[ 239.368429] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Also here: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4292/shard-glkb6/igt@kms_cursor_legacy@pipe-a-single-bo.html
From dmesg:
<6>[ 163.193959] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[ 163.293452] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Unfortunately, I can confirm that the issue is back on my machine too, but only when running on battery with laptop-mode-tools enabled and ethernet throttling turned on.
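Since the failures in this report correlate with power management (the "Refused to change power state, currently in D3" message, hibernation tests, battery mode), it may help to record the NIC's runtime PM settings alongside the dmesg. A minimal sketch, assuming the standard sysfs attributes `power/control` and `power/runtime_status` under `/sys/bus/pci/devices/<address>`. The helper name `read_runtime_pm` and the `sysfs_root` parameter are illustrative, and the PCI address must match the machine being checked.

```python
from pathlib import Path


def read_runtime_pm(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the runtime PM attributes of a PCI device as a dict.

    Attributes that are missing or unreadable are reported as None.
    """
    power_dir = Path(sysfs_root) / pci_addr / "power"
    result = {}
    for attr in ("control", "runtime_status"):
        try:
            result[attr] = (power_dir / attr).read_text().strip()
        except OSError:
            result[attr] = None
    return result
```

For example, `read_runtime_pm("0000:02:00.0")` uses the address from the dmesg quoted earlier in this thread. `control` reads `auto` when runtime power management is allowed for the device and `on` when it is kept powered.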
(In reply to Przemek from comment #9)
> Unfortunately I can confirm that the issue is back on my machine too, but
> only when running on battery and laptop-mode-tools enabled with ethernet
> throttling mode.
The reproduction rate has gone down significantly in the i915 CI lab. However, I believe this is due more to our removing the majority of the hibernation/S4 tests than to the issue being fixed.
We are still seeing this issue every day... What can we do to help fix it?
Haven't seen these messages in dmesg:
r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
I propose closing this issue; if it is seen again, we can re-open it.
The CI Bug Log issue associated with this bug has been archived. New failures matching the above filters will no longer be associated with this bug.