While running CI tests for the i915 driver, the e1000e driver failed to suspend:

<3>[ 395.316767] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[ 395.316772] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[ 395.316783] PM: Device 0000:00:19.0 failed to suspend async: error -2
<3>[ 395.316884] PM: Some devices failed to suspend, or early wake event detected

There are links to more occurrences and data in the original freedesktop bug:
https://bugs.freedesktop.org/show_bug.cgi?id=104550

Also, dmesg0.log contains:
<3>[ 191.130545] e1000e 0000:00:19.0 eth0: Hardware Error

However, there is no hint of hardware errors on the exact same machine in the following run; see dmesg1.log.

Here is an example where the issue happens:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3609/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

Here is the next consecutive run:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3610/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
Created attachment 273717: dmesg when the issue occurs
Created attachment 273719: dmesg on the run after, when the issue does not occur
This was just reproduced with kernel 4.16.0-rc2.

Here are links to more data:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3816/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3816/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3816/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
I got this after updating to 4.15.x. I suspended 23 times successfully, and then at some point it started failing and got 'stuck' in this mode. When I ran rmmod e1000e I got:

[ 2192.366747] e1000e 0000:00:1f.6 enp0s31f6: removed PHC
[ 2192.895875] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
[ 2192.898100] e1000e: enp0s31f6 NIC Link is Down

But suspend started working again after that, even with e1000e loaded again.
(In reply to uzytkownik2@gmail.com from comment #4)
> I got this after update to 4.15.x. I suspended 23 times successfully and
> then once it started failing and got 'stuck' in this mode. When I rmmod
> e1000e I got:
>
> [ 2192.366747] e1000e 0000:00:1f.6 enp0s31f6: removed PHC
> [ 2192.895875] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
> [ 2192.898100] e1000e: enp0s31f6 NIC Link is Down
>
> But suspend started working again even after loading e1000e again.

Sure, but our test checks rtcwake, which fails because e1000e fails to suspend. This doesn't happen frequently, but it still does happen.

If you want to see more history for this specific machine in our lab, go to
https://intel-gfx-ci.01.org/tree/drm-tip/
then click fi-ivb-3520m and you'll see the results on this machine for roughly the last 80 runs.

/Marta
(In reply to Marta Löfstedt from comment #5)
> (In reply to uzytkownik2@gmail.com from comment #4)
> > I got this after update to 4.15.x. I suspended 23 times successfully and
> > then once it started failing and got 'stuck' in this mode. When I rmmod
> > e1000e I got:
> >
> > [ 2192.366747] e1000e 0000:00:1f.6 enp0s31f6: removed PHC
> > [ 2192.895875] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
> > [ 2192.898100] e1000e: enp0s31f6 NIC Link is Down
> >
> > But suspend started working again even after loading e1000e again.
>
> Sure, but our test checks rtcwake which fail due to e1000e failing suspend.
> This doesn't happen frequently, but it still do happen.
>

I'm not sure what you're trying to correct me on. I wanted to add information:
- It happens 'in the wild' on user systems (namely mine), not only in testing.
- It is an e1000e internal state problem, not, for example, a sticky HW state that would not be reset by reloading the driver (admittedly, that was unlikely anyway).
The issue was also reproduced on Cannonlake. The kernel is drm-tip, based on Linux kernel 4.16.0-rc5.

From dmesg:
<3>[ 451.299599] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[ 451.299606] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[ 451.299642] PM: Device 0000:00:1f.6 failed to suspend async: error -2
<3>[ 451.299714] PM: Some devices failed to suspend, or early wake event detected

More data:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3918/fi-cnl-drrs/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html
This was reproduced on another CNL machine:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-cnl-y3/igt@kms_cursor_crc@cursor-64x64-suspend.html

<3>[ 218.537930] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[ 218.537936] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[ 218.537951] PM: Device 0000:00:1f.6 failed to suspend async: error -2
<3>[ 218.538017] PM: Some devices failed to suspend, or early wake event detected
Also, here:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-cnl-y3/igt@kms_vblank@pipe-c-ts-continuation-suspend.html

<3>[ 109.019792] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[ 109.019798] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[ 109.019812] PM: Device 0000:00:1f.6 failed to suspend async: error -2
<3>[ 109.019916] PM: Some devices failed to suspend, or early wake event detected
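
For anyone triaging these logs, the failing device and error code can be pulled out of a dmesg capture with a short script. This is only a sketch: the function name find_suspend_failures and the regular expression are my own and are written against the "PM: Device ... failed to suspend" message format seen in the logs in this bug, so they may need adjusting for other kernel versions. The sample lines are copied from the comments above.

```python
import re
from collections import Counter

# Pattern for the PM core message seen in this bug, e.g.
# "PM: Device 0000:00:19.0 failed to suspend async: error -2".
# The "async" word is treated as optional (an assumption, to also
# match the same message without it).
DEVICE_FAIL = re.compile(
    r"PM: Device (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed to suspend(?: async)?: error (?P<err>-?\d+)"
)


def find_suspend_failures(dmesg_text):
    """Return a list of (pci_address, error_code) for each failed suspend."""
    return [
        (m.group("dev"), int(m.group("err")))
        for m in DEVICE_FAIL.finditer(dmesg_text)
    ]


# Lines taken from the dmesg excerpts quoted in this bug.
SAMPLE = """\
<3>[ 395.316767] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[ 395.316772] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[ 395.316783] PM: Device 0000:00:19.0 failed to suspend async: error -2
<3>[ 451.299642] PM: Device 0000:00:1f.6 failed to suspend async: error -2
"""

if __name__ == "__main__":
    # Count failures per (device, error) pair and print a short summary.
    for (dev, err), count in Counter(find_suspend_failures(SAMPLE)).items():
        print(f"{dev}: error {err} ({count} occurrence(s))")
```

Running it against a saved dmesg file instead of SAMPLE only requires reading the file into a string and passing that to find_suspend_failures.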