Bug 198019 - iTCO_wdt breaks resume on T460s
Summary: iTCO_wdt breaks resume on T460s
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Watchdog (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: drivers_watchdog@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-28 22:19 UTC by Yaroslav Isakov
Modified: 2018-01-26 16:57 UTC (History)
1 user (show)

See Also:
Kernel Version: 4.14.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
[iTCO_wdt] restore timeout on resume (780 bytes, patch)
2018-01-21 11:56 UTC, Thomas Weißschuh
Details | Diff
My patch (962 bytes, patch)
2018-01-21 13:52 UTC, Yaroslav Isakov
Details | Diff
2nd try to fix the issue (824 bytes, patch)
2018-01-26 16:57 UTC, Thomas Weißschuh
Details | Diff

Description Yaroslav Isakov 2017-11-28 22:19:28 UTC
Hello! I found that enabling iTCO_wdt watchdog on my system leads to complete freeze after resume from suspend on my T460s. Unfortunately, I cannot provide any logs, as system is totally unresponsive after resume, but I cannot see any errors in console after resume - some messages are shown that wifi is started, RC6 enabled, but then it's freezing. There is no such problem, if the driver is loaded, but no watchdog started. I've tried with both systemd internal watchdog daemon, and standalone one.
Comment 1 Yaroslav Isakov 2017-11-28 23:21:08 UTC
It looks like I've found a cause. For some reason, suspend/resume code is working only for S0, not S3. I've patched function "need_suspend" to always return true, and after next resume, my machine reboots instead of freeze, almost immediately. So, I've also checked what is timeout value after suspend/resume, and it's only 2 seconds. I've patched iTCO_wdt_suspend_noirq to also set timeout, and now everything works perfectly.

And now I wonder, why timeout of 2 seconds after suspend/resume cycle leads to froze kernel... Maybe this problem was cause by watchdog daemons are trying to ping watchdog which is not initialized? I have no idea.

P.S. My watchdog is version 4
Comment 2 Thomas Weißschuh 2018-01-21 11:56:49 UTC
Created attachment 273769 [details]
[iTCO_wdt] restore timeout on resume

I experience the same problem.
But in my case it is enough to restore the watchdog timeout during resume.
Comment 3 Yaroslav Isakov 2018-01-21 13:52:01 UTC
Your solution is working, because you're setting timeout unconditionally, which is not quite safe. I'm setting it in under "if (p->suspended)" block, so timeout will be set only if watchdog has been suspended. But in my case, p->suspended could not be set, as need_suspend returns true only in case of suspend-to-idle.
Comment 4 Yaroslav Isakov 2018-01-21 13:52:47 UTC
Created attachment 273773 [details]
My patch

Here is my patch
Comment 5 Yaroslav Isakov 2018-01-21 13:59:00 UTC
BTW, from comment in original code, it looks like firmware doesn't properly stop watchdog on suspend, so maybe some check for HW is needed... or some kind of whitelist
Comment 6 Thomas Weißschuh 2018-01-21 16:53:03 UTC
Hi Yaroslav,

could you elaborate, why setting the watchdog timeout uncoditionally in the resume hook may not be safe?
As far as I can see, iTCO_wdt_set_timeout() is also unconditionally called from iTCO_wdt_probe().

In my opinion either the suspension from the firmware itself is broken or some other component (of the firmware) messes with the timeout during resume.
I was trying to find some hardware errata on developer.intel.com about this issues but could not find any so far.
Comment 7 Yaroslav Isakov 2018-01-23 19:13:11 UTC
Thomas,
My opinion here is that if watchdog hasn't been enabled/started, there is no point in setting the timeout. So, setting timeout should be under p->suspended check, which will be true only if watchdog was active before suspend, and iTCO_wdt_stop doesn't return error.
Comment 8 Thomas Weißschuh 2018-01-26 16:57:56 UTC
Created attachment 273875 [details]
2nd try to fix the issue

Hi Yaroslav,

As I see it, my code is too eager to set the timeout, while yours is too eager to suspend the device, even if its not needed.

What do you think about the attached patch? It should be a middle ground.

(I did not yet test the patch, maybe iTCO_wdt_stop() will make `watchdog_active() == false`)

Note You need to log in before you can comment on or make changes to this bug.