Bug 198019
Summary: | iTCO_wdt breaks resume on T460s | ||
---|---|---|---|
Product: | Drivers | Reporter: | Yaroslav Isakov (yaroslav.isakov) |
Component: | Watchdog | Assignee: | drivers_watchdog (drivers_watchdog) |
Status: | NEW --- | ||
Severity: | normal | CC: | linux |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 4.14.0 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
[iTCO_wdt] restore timeout on resume
My patch 2nd try to fix the issue |
Description
Yaroslav Isakov
2017-11-28 22:19:28 UTC
It looks like I've found a cause. For some reason, suspend/resume code is working only for S0, not S3. I've patched function "need_suspend" to always return true, and after next resume, my machine reboots instead of freeze, almost immediately. So, I've also checked what is timeout value after suspend/resume, and it's only 2 seconds. I've patched iTCO_wdt_suspend_noirq to also set timeout, and now everything works perfectly. And now I wonder, why timeout of 2 seconds after suspend/resume cycle leads to froze kernel... Maybe this problem was cause by watchdog daemons are trying to ping watchdog which is not initialized? I have no idea. P.S. My watchdog is version 4 Created attachment 273769 [details]
[iTCO_wdt] restore timeout on resume
I experience the same problem.
But in my case it is enough to restore the watchdog timeout during resume.
Your solution is working, because you're setting timeout unconditionally, which is not quite safe. I'm setting it in under "if (p->suspended)" block, so timeout will be set only if watchdog has been suspended. But in my case, p->suspended could not be set, as need_suspend returns true only in case of suspend-to-idle. Created attachment 273773 [details]
My patch
Here is my patch
BTW, from comment in original code, it looks like firmware doesn't properly stop watchdog on suspend, so maybe some check for HW is needed... or some kind of whitelist Hi Yaroslav, could you elaborate, why setting the watchdog timeout uncoditionally in the resume hook may not be safe? As far as I can see, iTCO_wdt_set_timeout() is also unconditionally called from iTCO_wdt_probe(). In my opinion either the suspension from the firmware itself is broken or some other component (of the firmware) messes with the timeout during resume. I was trying to find some hardware errata on developer.intel.com about this issues but could not find any so far. Thomas, My opinion here is that if watchdog hasn't been enabled/started, there is no point in setting the timeout. So, setting timeout should be under p->suspended check, which will be true only if watchdog was active before suspend, and iTCO_wdt_stop doesn't return error. Created attachment 273875 [details]
2nd try to fix the issue
Hi Yaroslav,
As I see it, my code is too eager to set the timeout, while yours is too eager to suspend the device, even if its not needed.
What do you think about the attached patch? It should be a middle ground.
(I did not yet test the patch, maybe iTCO_wdt_stop() will make `watchdog_active() == false`)
|