Bug 116941
Summary: | Commit ab76f7b4ab2397f x86/mm: Set NX on gap between __ex_table and rodata break resume from hibernation - Acer V5-573P-6896 Laptop (Intel i5-4200U) | ||
---|---|---|---|
Product: | Power Management | Reporter: | Logan Gunthorpe (logang+bug) |
Component: | Hibernation/Suspend | Assignee: | Stephen Smalley (sds) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | lenb, r.oburka, rui.zhang, yu.c.chen |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 4.6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg output
acpi grep
Hacked up hibernate_64.c
Patch of my current complete debug hacks to the kernel
Bisect log between v3.18 and v3.19
Patch to fix the issue in v3.19
Bisect log between v4.1 and v4.3 (with patch)
Kernel config for 4.5.2
Kernel Page Tables 4.6 - bad
Kernel Page Tables 4.6 - good
kallsyms for a working 4.6 kernel
kallsyms for a stock 4.6 kernel
Kernel config for 4.6
kallsyms for a working 4.6 kernel
kallsyms for a stock 4.6 kernel
Kernel Page Tables 4.6 - good
Kernel Page Tables 4.6 - bad |
Description
Logan Gunthorpe
2016-04-22 19:04:04 UTC
Created attachment 213721 [details]
dmesg output
Attached a dmesg output just after a failed resume with pm_trace and instrumented kernel.
I found this bug today, looks very similar but no new information: https://bugzilla.kernel.org/show_bug.cgi?id=112761

(In reply to Logan Gunthorpe from comment #2)
> I found this bug today, looks very similar but no new information:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=112761

I'm not sure if this is the same problem, because the 3.18 kernel works well for bug #112761. So please double check whether 3.18 is broken.

I just re-tested v3.18.0 and have been able to reproduce the issue.

I saw some warnings from pm_trace on resume:
[ 1.355199] calling late_resume_init+0x0/0x1a0 @ 1
[ 1.356024] Magic number: 0:212:178
[ 1.356841] hash matches arch/x86/power/hibernate_64.c:101
[ 1.357724] acpi device:0e: hash matches
[ 1.358559] platform: hash matches
Would you please provide the output of:
grep . /sys/bus/acpi/devices/device\:0e/*

Could you provide your arch/x86/power/hibernate_64.c?

(In reply to Chen Yu from comment #6)
> could you provide your arch/x86/power/hibernate_64.c

I mean the version used for comment #1.

Created attachment 214391 [details]
acpi grep
As requested, the output of the grep command.
Created attachment 214401 [details]
Hacked up hibernate_64.c
Created attachment 214411 [details]
Patch of my current complete debug hacks to the kernel
So you can see other potential places for the pm_trace to have occurred. I've attached my complete diff from 4.5.1
I'll also note that 3.18.0 seems to be more reliable. It's harder to reproduce the problem with that kernel version. But I did see it a couple times, almost in a row.

Ok, so I've found that a 3.18.0 kernel with a localmodconfig largely hibernates reliably. 3.19.0 with the same config largely does not. I managed to bisect this to commit f5b2831d654167d7. (I'll attach a bisect log). Building a v3.19 kernel with the patch reverted also produced a kernel that largely works.

I noticed a subtle logic error in the offending change such that the attached patch also fixes the problem. However, the _endlessly_ frustrating thing is it looks like a similar change has already been made by 4.5 and thus I can't apply a similar fix to a modern kernel.

I'll keep digging as I have time.

Created attachment 215361 [details]
Bisect log between v3.18 and v3.19
Created attachment 215371 [details]
Patch to fix the issue in v3.19
Looks like the patch I attached is very similar to commit 55696b1f66

Way too many kernel builds later:

3.19 through 4.2 with the attached patch work. 4.3 with the patch did not work. I bisected again (I'll attach a second log) to find the bug in commit:

[ab76f7b4ab2397ffdd2f1eb07c55697d19991d10] x86/mm: Set NX on gap between __ex_table and rodata

Reverting that commit in 4.3 (with the attached patch) and 4.5 result in working kernels.

I don't understand why that commit causes my issue. Based on my understanding, it looks like it should be pretty benign.

So, in summary, looks like hibernation on my hardware has been broken by two issues:

1. Between 3.19 and 4.3: f5b2831d6 breaks hibernation and is fixed by 55696b1f66 in 4.4.
2. From 4.3 on: ab76f7b4a breaks hibernation

I don't know what a reasonable fix would be for the second issue besides reverting the commit.

Created attachment 215471 [details]
Bisect log between v4.1 and v4.3 (with patch)
(In reply to Logan Gunthorpe from comment #16)
> Way too many kernel builds later:
>
> 3.19 through 4.2 with the attached patch work. 4.3 with the patch did not
> work. I bisected again (I'll attach a second log) to find the bug in commit:
>
> [ab76f7b4ab2397ffdd2f1eb07c55697d19991d10] x86/mm: Set NX on gap between
> __ex_table and rodata

This is a good point.

> Reverting that commit in 4.3 (with the attached patch) and 4.5 result in
> working kernels.

> 2. From 4.3 on: ab76f7b4a breaks hibernation
>
> I don't know what a reasonable fix would be for the second issue besides
> reverting the commit.

Me neither. I was wondering whether the hibernation resume process uses a function located between the end of __ex_table and the start of rodata, since this commit removed the 'x' attribute from that range. Is there any suspect function in 'sudo cat /proc/kallsyms'? On my laptop there is one:
ffffffff81829590 R __start___ex_table
ffffffff8182b658 R __stop___ex_table
ffffffff81a00000 r __func__.50233 //humm?
ffffffff81a00000 R __start_rodata

(In reply to Logan Gunthorpe from comment #17)
> Created attachment 215471 [details]
> Bisect log between v4.1 and v4.3 (with patch)

Could you also attach your kernel config?

Created attachment 215541 [details]
Kernel config for 4.5.2
As requested, the kernel config I used. For all other versions it's the same but with a 'make olddefconfig'.
Yeah, the function you quoted from kallsyms is actually in the rodata section: notice that it has the same address as __start_rodata. In theory, I don't think there can be anything between the two unless somewhere the hibernation code is occasionally specifically using it. But I can't find anything to suggest it does.

*** Bug 112761 has been marked as a duplicate of this bug. ***

What does cat /sys/kernel/debug/kernel_page_tables show?

Created attachment 216641 [details]
Kernel Page Tables 4.6 - bad
Attached: Kernel page tables on a stock 4.6 kernel without patch reverted. (So hibernation does not work)
Created attachment 216651 [details]
Kernel Page Tables 4.6 - good
Attached: Kernel page tables on a 4.6 kernel with patch reverted. (So hibernation does work)
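One way to compare the two page-table dumps is to check whether the range between __stop___ex_table and __start_rodata is still mapped executable. The following is only a rough sketch: it assumes each mapping line starts with a `0xSTART-0xEND` range and carries either an `NX` or an `x` token, and the two sample lines are made up for illustration, not copied from the attachments. The gap addresses are the ones from the kallsyms excerpt quoted earlier in this thread, which come from a different machine.

```python
import re

# Assumed line shape: "0x<start>-0x<end>  <size>  <flags...>  <level>"
RANGE_RE = re.compile(r"^0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\s+(.*)$")

def parse_mappings(lines):
    """Return (start, end, executable) tuples for every mapping line."""
    mappings = []
    for line in lines:
        match = RANGE_RE.match(line.strip())
        if not match:
            continue  # skip section headers and blank lines
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        tokens = match.group(3).split()
        executable = "x" in tokens and "NX" not in tokens
        mappings.append((start, end, executable))
    return mappings

def executable_overlaps(mappings, lo, hi):
    """List executable mappings that overlap the half-open range [lo, hi)."""
    return [(s, e) for s, e, exe in mappings if exe and s < hi and lo < e]

# Hypothetical sample lines (illustration only).
SAMPLE_BAD = [
    "0xffffffff81000000-0xffffffff81800000   8M   ro   PSE   GLB x  pmd",
    "0xffffffff81800000-0xffffffff81a00000   2M   ro   PSE   GLB NX pmd",
]
SAMPLE_GOOD = [
    "0xffffffff81000000-0xffffffff81800000   8M   ro   PSE   GLB x  pmd",
    "0xffffffff81800000-0xffffffff81a00000   2M   ro   PSE   GLB x  pmd",
]

# Gap bounds taken from the kallsyms excerpt quoted earlier in the thread.
GAP_LO = 0xFFFFFFFF8182B658  # __stop___ex_table
GAP_HI = 0xFFFFFFFF81A00000  # __start_rodata

for label, sample in (("bad", SAMPLE_BAD), ("good", SAMPLE_GOOD)):
    hits = executable_overlaps(parse_mappings(sample), GAP_LO, GAP_HI)
    print(label, [(hex(s), hex(e)) for s, e in hits])
```

To run it against the real dumps, pass `open(path)` for each attachment to `parse_mappings` and substitute the gap addresses from the matching kallsyms file.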
Note: I've reproduced the exact same thing on 4.6 kernel. On a stock kernel, hibernation does not work. With the same commit reverted it does work.

Can you report this via email to x86 at kernel.org, linux-kernel at vger.kernel.org, keescook at chromium.org, and me? Include the kernel config used to test 4.6 as well. See REPORTING-BUGS in the kernel tree. I tried adding Ingo and Kees directly to this bug but bugzilla.kernel.org doesn't seem to recognize their addresses, so maybe they don't use bugzilla.

Also, along with kernel config, can you attach your /proc/kallsyms for good and bad?

Should probably also cc the hibernation maintainers, linux-pm at vger.kernel.org

Created attachment 216831 [details]
kallsyms for a working 4.6 kernel
Created attachment 216841 [details]
kallsyms for a stock 4.6 kernel
Created attachment 216851 [details]
Kernel config for 4.6
No problem. I'll send an email shortly. Thanks for the list of who to copy.

Enable CONFIG_KALLSYMS_ALL=y so we can see all the symbols in kallsyms. Or you could use objdump -x to dump the symbols from your vmlinux file, either way.

Created attachment 216891 [details]
kallsyms for a working 4.6 kernel
With CONFIG_KALLSYMS_ALL=y
Created attachment 216901 [details]
kallsyms for a stock 4.6 kernel
With CONFIG_KALLSYMS_ALL=y
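With full kallsyms dumps available, the question raised earlier (does any symbol live between __stop___ex_table and __start_rodata?) can be checked mechanically. This is a minimal sketch, assuming the usual `address type name` layout of /proc/kallsyms; the function names are illustrative, and the sample input is the excerpt Chen Yu quoted from his laptop, not the attached files.

```python
def load_symbols(lines):
    """Parse kallsyms-style lines into (address, type, name) tuples."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue  # not a symbol line
        symbols.append((addr, parts[1], parts[2]))
    return symbols

def gap_bounds(symbols, start_name="__stop___ex_table", end_name="__start_rodata"):
    """Return (lo, hi) addresses delimiting the gap."""
    by_name = {name: addr for addr, _type, name in symbols}
    return by_name[start_name], by_name[end_name]

def symbols_in_gap(symbols, start_name="__stop___ex_table", end_name="__start_rodata"):
    """Return symbols whose address falls in [lo, hi), excluding the start marker."""
    lo, hi = gap_bounds(symbols, start_name, end_name)
    return sorted(
        (addr, typ, name)
        for addr, typ, name in symbols
        if lo <= addr < hi and name != start_name
    )

# Excerpt quoted earlier in this thread (from Chen Yu's laptop).
SAMPLE = """\
ffffffff81829590 R __start___ex_table
ffffffff8182b658 R __stop___ex_table
ffffffff81a00000 r __func__.50233
ffffffff81a00000 R __start_rodata
""".splitlines()

syms = load_symbols(SAMPLE)
lo, hi = gap_bounds(syms)
print("gap size in bytes:", hi - lo)
print("symbols inside the gap:", symbols_in_gap(syms))
```

On that excerpt the gap contains no symbols, since __func__.50233 shares its address with __start_rodata and so already belongs to rodata. To check the attached dumps instead, pass `open(path)` to `load_symbols`.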
Created attachment 216911 [details]
Kernel Page Tables 4.6 - good
With CONFIG_KALLSYMS_ALL=y
Created attachment 216921 [details]
Kernel Page Tables 4.6 - bad
CONFIG_KALLSYMS_ALL=y
Ok, I was wondering why my kallsyms had some missing symbols. Anyway, I've recompiled both kernels and attached new files.

commit 65c0554b73c920023cc8998802e508b798113b46
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Jun 30 18:11:41 2016 +0200

    x86/power/64: Fix kernel text mapping corruption during image restoration