Bug 198519 - e1000e failed to suspend
Summary: e1000e failed to suspend
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: Intel Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-01-19 12:02 UTC by Marta Löfstedt
Modified: 2018-04-03 12:51 UTC
CC List: 3 users

See Also:
Kernel Version: 4.15.0-rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg when the issue occurs (3.68 MB, text/plain)
2018-01-19 12:05 UTC, Marta Löfstedt
dmesg from the following run, when the issue does not occur (3.72 MB, text/plain)
2018-01-19 12:06 UTC, Marta Löfstedt

Description Marta Löfstedt 2018-01-19 12:02:11 UTC
While running CI tests for the i915 driver, the e1000e driver failed to suspend:

<3>[  395.316767] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[  395.316772] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[  395.316783] PM: Device 0000:00:19.0 failed to suspend async: error -2
<3>[  395.316884] PM: Some devices failed to suspend, or early wake event detected

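The -2 in these lines is a negated errno value; on Linux, errno 2 is ENOENT ("No such file or directory"). As a triage aid, here is a minimal Python sketch that pulls the failing PCI device and the decoded error out of a dmesg log such as the attached ones. The helper name `suspend_failures` and its regular expression are assumptions modeled only on the log lines quoted in this report, not an existing tool. Decoding uses Python's `errno.errorcode` and `os.strerror`, which reflect the host's errno table (matching the kernel's values when run on Linux).

```python
import errno
import os
import re

# Hypothetical pattern, modeled on the PM core line quoted above:
# "PM: Device 0000:00:19.0 failed to suspend async: error -2"
FAIL_RE = re.compile(
    r"PM: Device (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"failed to suspend(?: async)?: error (?P<err>-\d+)"
)


def suspend_failures(log_text):
    """Return (pci_address, errno_name, description) for each failed device."""
    failures = []
    for line in log_text.splitlines():
        match = FAIL_RE.search(line)
        if match is None:
            continue
        # The kernel reports errors as negative errno values; drop the sign.
        code = -int(match.group("err"))
        name = errno.errorcode.get(code, "UNKNOWN")
        failures.append((match.group("dev"), name, os.strerror(code)))
    return failures


if __name__ == "__main__":
    sample = (
        "<3>[  395.316767] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2\n"
        "<3>[  395.316783] PM: Device 0000:00:19.0 failed to suspend async: error -2\n"
    )
    for dev, name, desc in suspend_failures(sample):
        print(f"{dev}: {name} ({desc})")
```

On a Linux host this prints `0000:00:19.0: ENOENT (No such file or directory)`. The same pattern also matches the Cannonlake lines reported later (device 0000:00:1f.6), so it can be used to tally occurrences across CI runs.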
There are links to more occurrences and data in the original freedesktop.org bug: https://bugs.freedesktop.org/show_bug.cgi?id=104550

Also, dmesg0.log contains:
<3>[  191.130545] e1000e 0000:00:19.0 eth0: Hardware Error

However, there is no hint of hardware errors on the same machine in the following run; see dmesg1.log.
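To make the comparison between the two attached logs repeatable, here is a similar minimal sketch that lists every e1000e "Hardware Error" line with its timestamp. The function name `hardware_errors` and the file names in the usage comment are illustrative assumptions; the pattern is based only on the message format quoted in this report.

```python
import re
import sys

# Hypothetical pattern based on lines such as:
# "<3>[  191.130545] e1000e 0000:00:19.0 eth0: Hardware Error"
HW_ERR_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+e1000e\s+(?P<dev>\S+)\s+(?P<ifname>\S+):\s+Hardware Error"
)


def hardware_errors(log_text):
    """Return (timestamp_seconds, pci_address, interface) for each match."""
    hits = []
    for line in log_text.splitlines():
        match = HW_ERR_RE.search(line)
        if match:
            hits.append(
                (float(match.group("ts")), match.group("dev"), match.group("ifname"))
            )
    return hits


if __name__ == "__main__":
    # Usage (hypothetical file names): python hw_errors.py dmesg0.log dmesg1.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            hits = hardware_errors(f.read())
        print(f"{path}: {len(hits)} Hardware Error line(s)")
        for ts, dev, ifname in hits:
            print(f"  [{ts:.6f}] {dev} {ifname}")
```

Given the observation above, running it on dmesg0.log and dmesg1.log would be expected to report the 191.130545 line in the first log only. Lines such as "e1000e: eth0 NIC Link is Down" are deliberately not matched, because the pattern requires a PCI address and interface name before "Hardware Error".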

Here is an example where the issue happens:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3609/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

Here is an example of the next consecutive run:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3610/fi-ivb-3520m/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
Comment 1 Marta Löfstedt 2018-01-19 12:05:27 UTC
Created attachment 273717
dmesg when the issue occurs
Comment 2 Marta Löfstedt 2018-01-19 12:06:11 UTC
Created attachment 273719
dmesg from the following run, when the issue does not occur
Comment 4 uzytkownik2@gmail.com 2018-02-23 04:54:04 UTC
I got this after updating to 4.15.x. I suspended successfully 23 times, then suspend started failing and got 'stuck' in this mode. When I ran rmmod e1000e, I got:

[ 2192.366747] e1000e 0000:00:1f.6 enp0s31f6: removed PHC
[ 2192.895875] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
[ 2192.898100] e1000e: enp0s31f6 NIC Link is Down

But suspend started working again, even after reloading e1000e.
Comment 5 Marta Löfstedt 2018-02-23 06:46:51 UTC
(In reply to uzytkownik2@gmail.com from comment #4)
> I got this after updating to 4.15.x. I suspended successfully 23 times,
> then suspend started failing and got 'stuck' in this mode. When I ran
> rmmod e1000e, I got:
> 
> [ 2192.366747] e1000e 0000:00:1f.6 enp0s31f6: removed PHC
> [ 2192.895875] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error
> [ 2192.898100] e1000e: enp0s31f6 NIC Link is Down
> 
> But suspend started working again, even after reloading e1000e.

Sure, but our test checks rtcwake, which fails because e1000e fails to suspend. This doesn't happen frequently, but it does still happen.

If you want to see more history of this specific machine in our lab:

https://intel-gfx-ci.01.org/tree/drm-tip/

then click fi-ivb-3520m to see results for this machine over roughly the last 80 runs.

/Marta
Comment 6 uzytkownik2@gmail.com 2018-02-23 16:29:25 UTC
(In reply to Marta Löfstedt from comment #5)
> Sure, but our test checks rtcwake, which fails because e1000e fails to
> suspend. This doesn't happen frequently, but it does still happen.
> 

I'm not sure what you're trying to correct me on. I wanted to add two pieces of information:

 - That it happens 'in the wild' on user systems (namely mine), not only in testing.
 - That it is an e1000e internal-state problem, not, for example, sticky hardware state that wouldn't be reset by reloading the driver (admittedly, that was probably unlikely anyway).
Comment 7 Marta Löfstedt 2018-03-14 06:31:42 UTC
The issue has also been reproduced on Cannonlake.

Kernel: drm-tip, based on Linux 4.16.0-rc5.

From dmesg:

<3>[  451.299599] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[  451.299606] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[  451.299642] PM: Device 0000:00:1f.6 failed to suspend async: error -2
<3>[  451.299714] PM: Some devices failed to suspend, or early wake event detected

More data:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3918/fi-cnl-drrs/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html
Comment 8 Marta Löfstedt 2018-04-03 12:49:53 UTC
This was reproduced on another CNL machine:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-cnl-y3/igt@kms_cursor_crc@cursor-64x64-suspend.html

<3>[  218.537930] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[  218.537936] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[  218.537951] PM: Device 0000:00:1f.6 failed to suspend async: error -2
<3>[  218.538017] PM: Some devices failed to suspend, or early wake event detected
Comment 9 Marta Löfstedt 2018-04-03 12:51:41 UTC
Also, here:

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_9/fi-cnl-y3/igt@kms_vblank@pipe-c-ts-continuation-suspend.html

<3>[  109.019792] pci_pm_suspend(): e1000e_pm_suspend+0x0/0x40 [e1000e] returns -2
<3>[  109.019798] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -2
<3>[  109.019812] PM: Device 0000:00:1f.6 failed to suspend async: error -2
<3>[  109.019916] PM: Some devices failed to suspend, or early wake event detected
