Bug 13710

Summary: Corrupted low memory after resume
Product: ACPI
Reporter: Oleksij Rempel (fishor) (bug-track)
Component: Power-Sleep-Wake
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED CODE_FIX
Severity: normal
CC: lenb, rjw, rui.zhang, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.31-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg, dmesg _after, dmesg_after_w_patch, intel_bios.diff

Description Oleksij Rempel (fishor) 2009-07-05 05:37:53 UTC
Created attachment 22215: dmesg

It was tested with the latest linux-2.6 git and drm-linux trees. To get a clean test I used test_suspend=mem, both with acpi_sleep options (s3_bios and/or s3_mode) and without acpi_sleep. Any acpi_sleep option makes my PC freeze. Without acpi_sleep it resumes, but with a black screen; I can still access it over ssh. With KMS disabled it resumes with a black console, but Xorg re-enables video.

After resume dmesg has two traces, memory corruption and warn_slowpath_common:



[   60.816159] Corrupted low memory at ffff880000004200 (4200 phys) = 00420301
[   60.816163] ------------[ cut here ]------------
[   60.816169] WARNING: at arch/x86/kernel/check.c:134 check_for_bios_corruption+0xe4/0xf0()
[   60.816171] Hardware name:         
[   60.816172] Memory corruption detected in low memory
[   60.816173] Modules linked in: binfmt_misc kvm_intel kvm snd_hda_codec_intelhdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device psmouse snd iTCO_wdt iTCO_vendor_support serio_raw soundcore snd_page_alloc e1000e
[   60.816190] Pid: 9, comm: events/0 Tainted: G        W  2.6.31-rc1-25446-g1ae8c0a #1
[   60.816192] Call Trace:
[   60.816196]  [<ffffffff8104a8e8>] warn_slowpath_common+0x78/0xb0
[   60.816198]  [<ffffffff8104a97c>] warn_slowpath_fmt+0x3c/0x40
[   60.816200]  [<ffffffff8102aff4>] check_for_bios_corruption+0xe4/0xf0
[   60.816202]  [<ffffffff8102b000>] ? check_corruption+0x0/0x30
[   60.816204]  [<ffffffff8102b009>] check_corruption+0x9/0x30
[   60.816206]  [<ffffffff8105e4d4>] worker_thread+0x144/0x260
[   60.816209]  [<ffffffff810635c0>] ? autoremove_wake_function+0x0/0x40
[   60.816211]  [<ffffffff8105e390>] ? worker_thread+0x0/0x260
[   60.816213]  [<ffffffff810631e6>] kthread+0x96/0xa0
[   60.816215]  [<ffffffff8100c69a>] child_rip+0xa/0x20
[   60.816217]  [<ffffffff81063150>] ? kthread+0x0/0xa0
[   60.816219]  [<ffffffff8100c690>] ? child_rip+0x0/0x20
[   60.816220] ---[ end trace e934ad520ce5f9cf ]---


[   11.621573] PM: resume devices took 5.784 seconds
[   11.621574] ------------[ cut here ]------------
[   11.621579] WARNING: at kernel/power/suspend_test.c:52 suspend_test_finish+0x7c/0x80()
[   11.621580] Hardware name:         
[   11.621580] Component: resume devices
[   11.621581] Modules linked in:
[   11.621583] Pid: 1, comm: swapper Not tainted 2.6.31-rc1-25446-g1ae8c0a #1
[   11.621585] Call Trace:
[   11.621589]  [<ffffffff8104a8e8>] warn_slowpath_common+0x78/0xb0
[   11.621590]  [<ffffffff8104a97c>] warn_slowpath_fmt+0x3c/0x40
[   11.621592]  [<ffffffff8107ce7c>] suspend_test_finish+0x7c/0x80
[   11.621594]  [<ffffffff8107cac1>] suspend_devices_and_enter+0xb1/0x210
[   11.621595]  [<ffffffff8107cd4a>] enter_state+0x12a/0x150
[   11.621598]  [<ffffffff81377843>] ? rtc_time_to_tm+0xe3/0x1a0
[   11.621600]  [<ffffffff8107cd8d>] pm_suspend+0x1d/0x30
[   11.621603]  [<ffffffff81748f83>] test_suspend+0x140/0x1a7
[   11.621605]  [<ffffffff81748e43>] ? test_suspend+0x0/0x1a7
[   11.621608]  [<ffffffff8100905b>] do_one_initcall+0x4b/0x190
[   11.621610]  [<ffffffff8109d802>] ? register_irq_proc+0xe2/0x110
[   11.621613]  [<ffffffff81140000>] ? load_elf_binary+0x1b0/0x1fe0
[   11.621616]  [<ffffffff817346ef>] kernel_init+0x169/0x1bf
[   11.621618]  [<ffffffff8100c69a>] child_rip+0xa/0x20
[   11.621619]  [<ffffffff81734586>] ? kernel_init+0x0/0x1bf
[   11.621621]  [<ffffffff8100c690>] ? child_rip+0x0/0x20
[   11.621622] ---[ end trace e934ad520ce5f9ce ]---
Comment 1 ykzhao 2009-07-06 01:11:24 UTC
Hi, Alexey
    From the dmesg log it seems that the box resumes when no "acpi_sleep=" boot option is given, but there is no video. Right?
    If so, this looks like a graphics issue rather than an ACPI issue.
    Will you please run the following test to double-check? (Do not add the "test_suspend=" or "acpi_sleep=" boot options.)
    a. kill the process that is using /proc/acpi/event
    b. make sure that the i915 driver is loaded (if it is built into the kernel, skip this step)
    c. dmesg >dmesg_before; echo mem > /sys/power/state; dmesg >dmesg_after; sync;
    d. press the power button and see whether the box resumes.
    e. reboot the system and check whether the file dmesg_after exists. If it exists, it is a graphics issue. Please file a new bug at https://bugs.freedesktop.org/

    At the same time, the dmesg log shows two warning backtraces.
    1. PM: resume devices took 5.784 seconds
[   11.621574] ------------[ cut here ]------------
[   11.621579] WARNING: at kernel/power/suspend_test.c:52 suspend_test_finish+0x7c/0x80()
[   11.621580] Hardware name:       
       This warning fires because resuming the devices takes more than five seconds on this box. It is related to the following commit:
       >commit a9d7052363a6e06bb623ed1876c56c7ca5b2c6d8
       >Author: Rafael J. Wysocki <rjw@sisk.pl>
       >Date:   Wed Jun 10 01:27:12 2009 +0200

       >    PM: Separate suspend to RAM functionality from core

    2. Corrupted low memory at ffff880000004200 (4200 phys) = 00420301
[   60.816163] ------------[ cut here ]------------
[   60.816169] WARNING: at arch/x86/kernel/check.c:134
       This appears to be a hardware issue. When CHECK_CORRUPTION is enabled in the kernel configuration, the kernel checks low memory for possible corruption. When corruption is detected, it prints the warning above.
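
As a rough illustration of what such a check does, here is a minimal userspace sketch. It is not the kernel code: it assumes the check works by keeping a range of low memory zeroed and reporting any word that later becomes non-zero. The function name scan_for_corruption and its parameters are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Userspace model of a low-memory corruption scan (illustrative only).
 * `region` stands in for a range of low memory that was zeroed earlier;
 * `base_phys` is the physical address the first word would have.
 * Both names are hypothetical and do not come from the kernel.
 */
static size_t scan_for_corruption(uint32_t *region, size_t nwords,
                                  unsigned long base_phys)
{
    size_t corrupted = 0;

    assert(region != NULL);

    for (size_t i = 0; i < nwords; i++) {
        if (region[i] == 0)
            continue;           /* still zero: nobody wrote here */

        printf("Corrupted low memory at %lx phys = %08x\n",
               base_phys + (unsigned long)(i * sizeof(uint32_t)),
               (unsigned int)region[i]);

        region[i] = 0;          /* reset so the next scan starts clean */
        corrupted++;
    }

    return corrupted;           /* number of non-zero words found */
}
```

Feeding it a zeroed 64K buffer in which the word at offset 0x4200 was overwritten with 0x00420301 (the values from the report) makes it report exactly one corrupted word, and a second scan finds nothing.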
      
    Thanks.
Comment 2 Zhang Rui 2009-07-06 01:23:36 UTC
(In reply to comment #0)

> To make clean test i used test_suspend=mem with acpi_sleep options
> (s3_bios and/or s3_mode) and without acpi_sleep. Any acpi_sleep option will
> make my PC freeze.

that's right. these options don't work all the time.
sometimes they may freeze the laptop.

> Without acpi_sleep it will resume but with black screen and i can access
> it with ssh. Disabling kms will resume with black console but Xorg will
> enable video.

In order to get the video back, you MUST set CONFIG_DRM_I915_KMS.
why do you want to disable kms?

So I think the main problem here is the two warnings after resume.
 
> [   60.816159] Corrupted low memory at ffff880000004200 (4200 phys) =
> 00420301

I have no idea about this yet.

> [   11.621573] PM: resume devices took 5.784 seconds
> [   11.621574] ------------[ cut here ]------------
> [   11.621579] WARNING: at kernel/power/suspend_test.c:52
> suspend_test_finish+0x7c/0x80()

look at the suspend_test_finish source code.

        /* Warning on suspend means the RTC alarm period needs to be
         * larger -- the system was sooo slooowwww to suspend that the
         * alarm (should have) fired before the system went to sleep!
         *
         * Warning on either suspend or resume also means the system
         * has some performance issues.  The stack dump of a WARN_ON
         * is more likely to get the right attention than a printk...
         */
        WARN(msec > (TEST_SUSPEND_SECONDS * 1000), "Component: %s\n", label);

this doesn't seem like a problem to me.
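
To make the threshold concrete, here is a small standalone sketch of the same comparison. It assumes TEST_SUSPEND_SECONDS is 5, as implied by "more than five seconds" in comment #1; the helper name suspend_test_too_slow is hypothetical and is not part of the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed value, taken from "more than five seconds" in comment #1. */
#define TEST_SUSPEND_SECONDS 5

/*
 * Hypothetical helper mirroring the condition passed to WARN() above:
 * returns true when a suspend/resume phase took longer than the limit.
 * `msec` is expected to be non-negative.
 */
static bool suspend_test_too_slow(long msec, const char *label)
{
    assert(label != NULL);

    printf("PM: %s took %ld.%03ld seconds\n",
           label, msec / 1000, msec % 1000);

    return msec > (TEST_SUSPEND_SECONDS * 1000L);
}
```

With the 5784 ms from the log the condition is true, so the warning fires; at exactly 5000 ms it would not.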
Comment 3 Zhang Rui 2009-07-06 01:26:44 UTC
So the only problem left is the low memory corruption.

Alexey Fisher,
please make sure the memory corruption also occurs on a real resume.
Comment 4 Oleksij Rempel (fishor) 2009-07-06 04:23:23 UTC
Created attachment 22226: dmesg _after

- the agp and drm modules are, and were, compiled in
- after a real resume I get both warnings: slow_path and memory corruption.
Comment 5 Oleksij Rempel (fishor) 2009-07-06 04:44:39 UTC
Yakui pointed me to this patch:
https://bugs.freedesktop.org/show_bug.cgi?id=21576#c16

It solves part of the problem. Now after resume I can get video back... but I still get these two traces.
Comment 6 Oleksij Rempel (fishor) 2009-07-06 04:45:02 UTC
Created attachment 22227: dmesg_after_w_patch
Comment 7 ykzhao 2009-07-06 05:17:44 UTC
Hi, Alexey
    Thanks for the test.
    From the test it seems that the box can be resumed from S3. The remaining issue is the two warning backtraces after resume.
    These two issues are not related to ACPI.
    a. The following backtrace appears because resuming the devices takes more than five seconds.
>[   11.621573] PM: resume devices took 5.784 seconds
> [   11.621574] ------------[ cut here ]------------
> [   11.621579] WARNING: at kernel/power/suspend_test.c:52
> suspend_test_finish+0x7c/0x80()
       This is related to the following commit (sorry, the commit I gave earlier was incorrect):
       >commit 77437fd4e61f87cc94d9314baa5cbf50e3ccdf54
       >Author: David Brownell <dbrownell@users.sourceforge.net>
       >Date:   Wed Jul 23 21:28:33 2008 -0700

    b. The second backtrace comes from the memory corruption check.

thanks.
Comment 8 Oleksij Rempel (fishor) 2009-07-06 09:43:20 UTC
So if i understand correctly:
- the suspend_test trace is a sort of spam and does not mean anything serious... except that you'll get a lot of wrong bug reports?

- the second trace ... should actually be handled by CONFIG_X86_RESERVE_LOW_64K, which I have enabled. But it is not activated because the BIOS vendor is "Intel Corp."
Comment 9 Oleksij Rempel (fishor) 2009-07-06 10:23:38 UTC
Created attachment 22230: intel_bios.diff

This patch solves the memory corruption issue on Intel BIOS.
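
The attachment is not reproduced here. As a rough sketch of the general idea only: a workaround that is keyed on BIOS vendor strings never triggers on a board whose vendor string is not listed, while an entry keyed on the board name does. The types, function names, and table entries below are simplified stand-ins invented for this example, not the kernel's DMI API; the vendor strings other than "Intel Corp." and the assumption that the board identifies itself as "DG45ID" are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for DMI data; not the kernel's types. */
struct machine_id {
    const char *bios_vendor;
    const char *board_name;
};

struct quirk {
    const char *ident;        /* human-readable label for the entry */
    const char *bios_vendor;  /* NULL means "do not check this field" */
    const char *board_name;   /* NULL means "do not check this field" */
};

/* Substring match; a NULL pattern matches anything. */
static bool field_matches(const char *want, const char *have)
{
    if (want == NULL)
        return true;
    return have != NULL && strstr(have, want) != NULL;
}

/* Return the first table entry whose checked fields all match, or NULL. */
static const struct quirk *find_quirk(const struct quirk *table, size_t n,
                                      const struct machine_id *m)
{
    assert(table != NULL && m != NULL);

    for (size_t i = 0; i < n; i++) {
        if (field_matches(table[i].bios_vendor, m->bios_vendor) &&
            field_matches(table[i].board_name, m->board_name))
            return &table[i];
    }
    return NULL;
}

/* Illustrative table keyed on BIOS vendor only (entries are assumptions). */
static const struct quirk vendor_only_quirks[] = {
    { "AMI BIOS",     "American Megatrends Inc.", NULL },
    { "Phoenix BIOS", "Phoenix Technologies",     NULL },
};

/* Same table plus a board-name entry, modelled on the "DG45ID" quirk. */
static const struct quirk with_board_quirks[] = {
    { "AMI BIOS",           "American Megatrends Inc.", NULL },
    { "Phoenix BIOS",       "Phoenix Technologies",     NULL },
    { "Intel DG45ID board", NULL,                       "DG45ID" },
};
```

For a machine reporting BIOS vendor "Intel Corp." and board name "DG45ID", the lookup in the vendor-only table finds nothing, which matches the behaviour described in comment #8; the lookup in the second table hits the board-name entry.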
Comment 10 ykzhao 2009-07-08 08:59:03 UTC
(In reply to comment #8)
> So if i understand correctly:
> - suspend_test trace is sort of spam and do not mean anything serious..
> except
> you'll get a lot of wrong bug reports? 
Yes. This is not serious. It only complains that the device resume time exceeds the predefined test time.

> - second trace ... should actually be handled by CONFIG_X86_RESERVE_LOW_64K,
> which is by enabled me. But it's not activated because BIOS vendor is "Intel
> Corp."
What you have done is right. Please send your patch to mailing list.

Thanks.
Comment 11 ykzhao 2009-07-15 03:13:46 UTC
Hi, Rui
    From the log it seems that the low memory corruption issue can be fixed by the patch in comment #9.
    In fact this is not related to Linux ACPI. Even when the system is booted normally, it complains that low memory is corrupted unless the patch in comment #9 is applied.
    
    How about closing this bug?
    Thanks.
Comment 12 Oleksij Rempel (fishor) 2009-07-15 05:59:43 UTC
I sent this patch to the mailing list.

I did some more investigation and found that Windows has the same issue on this board.
Comment 13 ykzhao 2009-07-20 03:05:56 UTC
The patch has been merged into the linux-tip tree.
>http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=6aa542a694dc9ea4344a8a590d2628c33d1b9431

So this bug can be marked as resolved.

Thanks.
Comment 14 Len Brown 2009-08-30 02:34:39 UTC
6aa542a694dc9ea4344a8a590d2628c33d1b9431
x86: Add quirk for Intel DG45ID board to avoid low memory corruption

shipped in v2.6.31-rc5
closed