Bug 10568 - random crashes after resume - ASUS P5LD2-VM
Alias: None
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All Linux
Importance: P1 normal
Assignee: acpi_power-sleep-wake
Depends on:
Blocks: 7216
Reported: 2008-04-27 23:31 UTC by Oleksij Rempel (fishor)
Modified: 2008-11-20 21:43 UTC

See Also:
Kernel Version: 2.6.20 - 2.6.25-git11
Regression: ---
Bisected commit-id:

dmesg (51.86 KB, text/plain)
2008-04-27 23:32 UTC, Oleksij Rempel (fishor)
dmesg-2.6.25-git11 (46.05 KB, text/plain)
2008-04-29 02:32 UTC, Oleksij Rempel (fishor)
config (51.31 KB, application/octet-stream)
2008-05-06 22:02 UTC, Oleksij Rempel (fishor)

Description Oleksij Rempel (fishor) 2008-04-27 23:31:50 UTC
Latest working kernel version: no
Earliest failing kernel version:
Distribution: Ubuntu 8.04
Hardware Environment: ASUS P5LD2-VM (ICH7, Pentium D, i945G)
Software Environment:
Problem Description:
After resume the system becomes unstable and some applications crash at random,
for example Firefox or the whole X server. Normally I get nothing in dmesg, but after enabling CONFIG_X86_PAT there is something interesting:

[  943.744299] mencoder[5002]: segfault at 48 ip 08163f93 sp bfc682b0 error 4 in mencoder[8048000+7df000]
[ 2272.597773] Xorg:4127 freeing invalid memtype d0020000-d002a000
[ 2272.598099] Xorg:4127 freeing invalid memtype d002a000-d0032000
[ 2272.598397] Xorg:4127 freeing invalid memtype d0032000-d0033000
[ 2272.598705] Xorg:4127 freeing invalid memtype d1000000-d1a00000
[ 2272.599001] Xorg:4127 freeing invalid memtype d2000000-d3e00000
[ 2272.599296] Xorg:4127 freeing invalid memtype d4000000-d4a00000
[ 2272.599593] Xorg:4127 freeing invalid memtype d5000000-d5a00000
[ 2272.599882] Xorg:4127 freeing invalid memtype d6000000-d8000000
[ 2273.400610] Xorg:4127 freeing invalid memtype d0000000-d0020000
[ 2277.832883] Overlap at 0xd0000000-0xe0000000
[ 2277.832897] Xorg:5360 /dev/mem expected mapping type write-back for d07bf000-d1000000, got uncached
[ 2277.832907] Overlap at 0xd0000000-0xe0000000
[ 2277.832911] Xorg:5360 /dev/mem expected mapping type write-back for d1a00000-e0000000, got uncached

I think this bug is just another form of Bug 10131.
No scripts are used to suspend, only "echo mem > /sys/power/state".
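
A minimal sketch of the same script-free suspend with a guard in front of it. The function names and the POWER_STATE override are hypothetical, added only so the check can be tried against a scratch file; the real write needs root.

```shell
#!/bin/sh
# POWER_STATE is an illustrative override; by default it points at the real sysfs file.
POWER_STATE="${POWER_STATE:-/sys/power/state}"

# Succeeds if the kernel lists the given sleep state (e.g. "mem" for S3).
supports_state() {
    for s in $(cat "$POWER_STATE"); do
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

# Suspend to RAM with a plain sysfs write, no helper scripts involved.
suspend_to_ram() {
    supports_state mem || { echo "mem not supported" >&2; return 1; }
    echo mem > "$POWER_STATE"
}
```

Calling `suspend_to_ram` as root is then equivalent to the echo above, except that it refuses to write when "mem" is not listed.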
Comment 1 Oleksij Rempel (fishor) 2008-04-27 23:32:14 UTC
Created attachment 15942
Comment 2 Zhang Rui 2008-04-27 23:56:03 UTC
(In reply to comment #0)
> After resume the system becomes unstable and some applications crash at random,
> for example Firefox or the whole X server. Normally I get nothing in dmesg
That's weird, please try boot option "ignore_loglevel".
Please try S3 in a later kernel, say 2.6.24.
And attach the dmesg output after S3 resume.
Comment 3 Oleksij Rempel (fishor) 2008-04-29 02:32:30 UTC
Argh... I still can't find a 100% reliable crasher. With the latest git I can do:

1. debsums or md5deep -e -r Desktop/
2. S3
3. debsums
4. firefox
5. X will segfault

With some earlier kernels this sequence triggers nothing, but if I then try to compile a kernel, gcc segfaults.

2.6.24 GOOD
2.6.25 GOOD
2.6.25-01393-g2cca775 seems to be GOOD

2.6.25-05301-gc3bf9bc BAD
2.6.25-05561-g064922a BAD

There is still no segfault before suspend.
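
Steps 1 to 3 above can be read as checksumming files before and after S3. A rough sketch of that idea with plain md5sum, assuming GNU coreutils and findutils; make_manifest and verify_manifest are hypothetical helper names, not part of debsums or md5deep.

```shell
#!/bin/sh
# Hypothetical helpers for a before/after-suspend integrity check.

# Record an md5 line for every regular file under a directory.
make_manifest() {
    dir="$1"; manifest="$2"
    find "$dir" -type f -print0 | sort -z | xargs -0 -r md5sum > "$manifest"
}

# Re-check the manifest: prints only failures, returns non-zero if any file differs.
verify_manifest() {
    md5sum -c --quiet "$1"
}

# Example:
#   make_manifest "$HOME/Desktop" /tmp/before.md5
#   echo mem > /sys/power/state      # as root
#   verify_manifest /tmp/before.md5
```

A mismatch right after resume on files that were not modified would point at corrupted data in memory rather than at a single misbehaving application.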
Comment 4 Oleksij Rempel (fishor) 2008-04-29 02:32:57 UTC
Created attachment 15973
Comment 5 Zhang Rui 2008-04-30 01:43:05 UTC
Does the problem still exist if X is not started?
And it would be great if you could find the exact commit that causes this problem using git bisect.
Comment 6 Oleksij Rempel (fishor) 2008-04-30 02:05:15 UTC
I can't find the exact commit without a 100% reliable killer application. I can't reproduce the same crash twice; each time something different crashes. Today I tried to kill firefox or gcc with "make -j2 all" and killed the system monitor instead.

Any suggestions for a better test? Something like a memory test or a CPU benchmark?
Comment 7 Oleksij Rempel (fishor) 2008-04-30 02:06:46 UTC
It seems like some memory allocation issue. This bug is hard to reproduce with a suspend right after a fresh start.
Comment 8 Oleksij Rempel (fishor) 2008-05-01 01:36:14 UTC
So I have a 100% killer.
It is Xorg dependent; I can't reproduce it without X.

Steps to reproduce:
1. boot
2. suspend S3 && resume 
3. run in gnome-terminal 
for i in $( seq 100000 ); do 
   echo "jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj";
done && 
sleep 5s && 
cd tmp/kernel/linux-2.6.20/ && 
make -j2 all

4. compilation will segfault.

There are still 9 revisions left to bisect, but I get a kernel oops at __wake_up_common+0x21/0x58

latest BAD commit d2bcbad5f3ad38a1c09861bca7e252dde7bb8259
    x86: do not zap_low_mappings in __smp_prepare_cpus
    It was okay when cpus were cold booted before this point.
    But with the new state machine, they will not have arrived to
    the trampoline yet. zapping low mappings will have the bad effect
    of breaking it completely after paging enablement
    Signed-off-by: Glauber Costa <gcosta@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

latest GOOD commit fbac7fcbadc54cc5d374873a2e60e924a056d198
    x86: fix alloc_bootmem_pages_node macro
    missing a semicolon
    Signed-off-by: Glauber Costa <gcosta@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

Patch set authors:
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Comment 9 Oleksij Rempel (fishor) 2008-05-01 02:41:17 UTC
After some commit between v2.6.25-06492-g7663c1e and v2.6.25-07245-ge4c576b (from the last day) I can't reproduce it any more.
Comment 10 Oleksij Rempel (fishor) 2008-05-01 22:57:26 UTC
I do not know why it was not reproducible with the latest kernel, but it is reliably reproducible now with the latest 2.6.25-testing-07351-g886c35f. So I assume it was a side effect of some patch, not a fix for the bug.
Comment 11 Oleksij Rempel (fishor) 2008-05-03 23:18:55 UTC
With nosmp it is 100% stable. Suspend via HAL and GNOME normally crashes on resume, but with nosmp that works too.
Comment 12 Oleksij Rempel (fishor) 2008-05-04 00:35:20 UTC
Heh... you don't need to suspend to reproduce this bug; just take one CPU offline and bring it back online again:

echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
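
For repeated runs, the two writes above can be wrapped in a loop. This is only a sketch: the function name cycle_cpu and the CPU_SYSFS override are made up for illustration, and on real hardware it needs root.

```shell
#!/bin/sh
# CPU_SYSFS is an illustrative override so the loop can be dry-run on a scratch directory.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Take CPU number $1 offline and bring it back online, $2 times (default 1).
cycle_cpu() {
    cpu="$1"; count="${2:-1}"
    ctl="$CPU_SYSFS/cpu$cpu/online"
    [ -w "$ctl" ] || { echo "cannot write $ctl" >&2; return 1; }
    i=0
    while [ "$i" -lt "$count" ]; do
        echo 0 > "$ctl" || return 1
        echo 1 > "$ctl" || return 1
        i=$((i + 1))
    done
    echo "cycled cpu$cpu $count time(s)"
}

# Example (as root): cycle_cpu 1 10
```

Running the compile test from comment #8 after a few cycles would exercise the CPU hotplug path without involving suspend at all.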
Comment 13 Len Brown 2008-05-05 18:41:52 UTC
please boot with "nopat" and see if the problem goes away
Comment 14 Oleksij Rempel (fishor) 2008-05-05 22:12:54 UTC
No, the problem still exists.
Comment 15 Glauber Costa 2008-05-06 17:34:32 UTC

I'm trying to reproduce this issue, but without any success so far.
Can you please post your .config?
Comment 16 Oleksij Rempel (fishor) 2008-05-06 22:02:48 UTC
Created attachment 16055

I think the important factors are: 2G RAM (if I set it to 1G it works OK), SMP (with nosmp it works OK), on-board graphics? broken ACPI?
Comment 17 Oleksij Rempel (fishor) 2008-05-24 23:29:47 UTC
Seems like this bug was partly fixed somewhere between 2.6.26-rc2 and -rc3. I can't reproduce it with this "test script" any more. And for the first time in this PC's life, resume with GNOME worked! At least once; a second suspend/resume may not always work. I still assume Bug 10131 is just a worse form of this bug.
Comment 18 Zhang Rui 2008-09-27 00:57:45 UTC
Hi Alexey,
can you reproduce the bug in the latest kernel release?
Comment 19 Zhang Rui 2008-11-20 21:43:03 UTC
Please re-open this bug if you can reproduce it in the latest kernel release.
