Bug 6492 - Suspend to ram regression 2.6.17-rc6
Summary: Suspend to ram regression 2.6.17-rc6
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake
Hardware: i386 Linux
Importance: P2 normal
Assignee: Shaohua
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-04 12:02 UTC by Erik Slagter
Modified: 2006-06-25 19:59 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.17-rc6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Photo of oops (118.25 KB, image/jpeg)
2006-06-14 06:08 UTC, Erik Slagter
System map (810.69 KB, application/octet-stream)
2006-06-14 06:10 UTC, Erik Slagter
Config (32.87 KB, application/octet-stream)
2006-06-14 06:11 UTC, Erik Slagter

Description Erik Slagter 2006-05-04 12:02:40 UTC
Most recent kernel where this bug did not occur: 2.6.16.1
Distribution: FC4/5 (vanilla kernel)
Hardware Environment: Dell Inspiron 9300 laptop
Software Environment: Fedora Core 5
Problem Description: Suspend to ram broken
2.6.16-git19 during suspend says:
"kernel tried to execute NX-protected page - exploit attempt? (uid 0)
 BUG: unable to handle kernel paging request at virtual address c04ce31"

Complete oops available on request.

2.6.17-rc3 completes the suspend cycle (although it complains about usb_hcd) but
cannot be woken properly; the fans start to run, leds light, keyboard caps lock
key/led combo work, but nothing else. No display, no ip, no USB keyboard (so
gathering info using ssh or similar is not possible).

Steps to reproduce:

#!/bin/sh

logger -t acpi-suspend "acpi-suspend $*"

state=3

killall vncviewer xvncviewer gaim

service openvpn stop
ifconfig tap0 down
service ntpd stop
service network stop
service cups stop

hwclock --systohc --utc

sync
sync

echo "${state}" > /proc/acpi/sleep
logger -t acpi-suspend "awake from acpi-suspend"

hwclock --hctosys --utc

service network start
service ntpd start
service cups start
Comment 1 Andrew Morton 2006-05-04 16:43:41 UTC
Yes, the full oops trace would be useful, please.  A digital photo of the screen
is sometimes convenient.
Comment 2 Erik Slagter 2006-05-05 02:14:50 UTC
Additional info

2.6.16.1: works
2.6.16-git19: output below (hand-copied, almost complete)
2.6.17-rc3: no output, because screen is off and no network connectivity, so
also no remote syslog

hci_usb: 2-1:1.1: no suspend for driver hci_usb?
hci_usb: 2-1:1.0: no suspend for driver hci_usb?
kernel tried to execute NX-protected page - exploit attempt? (uid 0)
BUG: unable to handle kernel paging request at virtual address c04ce31
 
Printing EIP
c04ce31c
*pde = 00000000
Oops: 0011 [#1]
Modules linked in: pl2303 usbserial ehci_hcd ipw2200 uhci_hcd
CPU: 0
EIP: 0060:[<c04ce31c>] Not tainted VLI
EFLAGS: 00010046 (2.6.16-git14 #4)
EIP is at 0x04ce31c
eax: c04dec00 ebx: c0001000 ecx: c000080 edx: 00000100
esi: 00000003 edi: 00000000 ebp: 0000046 esp: f4f0bef4
ds: 007b es: 007b ss: 0068
Process zsh (pid: 3657, threadinfo=f4f40a000, task=f4c0ea90)
Stack: <0>c0244b4e c04e3f08 00000000 ....
Call trace: acpi_pm_enter+0x53/0x99     suspend_enter+0x4f/0x60
            enter_state+0xde/0x160      state_store+0x97/0xc0
            state_store+0x0/0xc0        subsys_attr_store+0x29/0x40
            sysfs_write_file+0x93/0xf0  vfs_write+0xa6/0x160
            sysfs_write+0x0/0xf0        sys_write+0x41/0x70
            sysenter_past_esp+0x54/0x75
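
The "Oops: 0011" line above can be read bit by bit. The following is a minimal sketch, assuming the standard x86 page-fault error code layout and that the kernel prints the code in hexadecimal; the function name and the wording of the descriptions are illustrative only and do not come from the kernel source.

```python
# Bit layout of the x86 page-fault error code.
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a page-table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code_text):
    """Decode the hex error code printed after 'Oops:' on x86."""
    code = int(code_text, 16)
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


if __name__ == "__main__":
    for part in decode_oops_code("0011"):
        print(part)
```

For 0011 this reports a protection violation on an instruction fetch in kernel mode, which matches the "kernel tried to execute NX-protected page" message.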
Comment 3 Andrew Morton 2006-05-05 07:28:29 UTC
We seem to have us an acpi regression since 2.6.16, even though we didn't
merge any acpi stuff :(

Erik, would you be able to resolve EIP c04ce31c please?   You can do that with

gdb vmlinux
(gdb) l *0xc04ce31c

That won't work if you didn't select CONFIG_DEBUG_INFO, in which case

gdb vmlinux
(gdb) x/10i 0xc04ce31c


Thanks.
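
If only the System.map of the kernel is at hand, the same lookup can be approximated without gdb. This is a rough sketch, assuming the usual "address type name" line format of System.map; the helper names and the sample entries are hypothetical and are not taken from the reporter's files.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        addr, sym_type, name = fields
        symbols.append((int(addr, 16), sym_type, name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset, type) of the nearest symbol at or below address."""
    addrs = [entry[0] for entry in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None  # address lies below the first symbol
    base, sym_type, name = symbols[i]
    return name, address - base, sym_type


# Hypothetical sample entries, for illustration only.
SAMPLE = """\
c0100000 T _text
c0200000 T example_text_func
c0400000 D example_data_object
""".splitlines()

if __name__ == "__main__":
    table = load_system_map(SAMPLE)
    print(resolve(table, 0xC0400010))
```

With a real file, pass the open file object to load_system_map instead of SAMPLE. The type letter follows the nm convention (T/t for text, D/d for data), so a faulting EIP that resolves to a D symbol points at code placed in a data section. An address above the last symbol still resolves to that last symbol, so large offsets should be treated with suspicion.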



Comment 4 Shaohua 2006-05-07 23:23:46 UTC
If you use another shell (instead of zsh), do you still see the oops in 2.6.16-git14?
Comment 5 Erik Slagter 2006-05-15 12:35:58 UTC
Unfortunately I no longer have this kernel image (2.6.16-git19), nor do I
have the exact config file. Also please note that this is a "production" laptop
that I need for my work. Every time it crashes there is a chance the fs will become
corrupt, which I cannot afford. If no one else has this problem, it's fine with
me to close that part of the bug.

The 2.6.17-rc3 problem still stands. Any recommendations on how I can obtain usable
debugging info?
Comment 6 Erik Slagter 2006-06-14 06:07:32 UTC
Today I tried 2.6.17-rc6; exactly the same behaviour. I removed all more or less
suspicious modules (uhci_hcd, ehci_hcd, pl2303, ipw2200 and ieee80211) but it made no
difference. I recompiled the kernel with preemption off, still without success.

I do have a photo of the screen this time, although it's a bit blurry; I only have a
phone at my disposal at the moment. I now also have config and symbol files
for this kernel.
Comment 7 Erik Slagter 2006-06-14 06:08:31 UTC
Created attachment 8300
Photo of oops
Comment 8 Erik Slagter 2006-06-14 06:10:15 UTC
Created attachment 8301
System map
Comment 9 Erik Slagter 2006-06-14 06:11:42 UTC
Created attachment 8302
Config
Comment 10 Erik Slagter 2006-06-14 06:17:45 UTC
This is what gdb says about the EIP address:

(gdb) x/10i 0xc045d35c
0xc045d35c <do_suspend_lowlevel>:
    call   0xc032b010 <save_processor_state>
0xc045d361 <do_suspend_lowlevel+5>:     call   0xc045d308 <save_registers>
0xc045d366 <do_suspend_lowlevel+10>:    push   $0x3
0xc045d368 <do_suspend_lowlevel+12>:
    call   0xc024bd20 <acpi_enter_sleep_state>
0xc045d36d <do_suspend_lowlevel+17>:    add    $0x4,%esp
0xc045d370 <do_suspend_lowlevel+20>:    ret
0xc045d371 <ret_point>: call   0xc045d33b <restore_registers>
0xc045d376 <ret_point+5>:       call   0xc032afc0 <restore_processor_state>
0xc045d37b <ret_point+10>:      ret
0xc045d37c <saved_gdt>: add    %al,(%eax)
Comment 11 Shaohua 2006-06-14 19:52:04 UTC
Ok, thanks! 'do_suspend_lowlevel' really shouldn't be in .data; I'll fix it.
But this alone can't explain why we see the failure. IIRC, even kernel data
(< __init_end) is mapped executable. I'll take a closer look.
Comment 12 Shaohua 2006-06-14 20:45:09 UTC
Ok, I could reproduce this bug here and I already sent a patch to lkml, please 
try.
http://marc.theaimsgroup.com/?l=linux-kernel&m=115034292902654&w=2
Comment 13 Erik Slagter 2006-06-15 03:17:35 UTC
Yep, confirmed, this patch resolves the issue.

Please note that I also had a similar issue when the kernel was compiled
_without_ NX support; the kernel would simply crash then. I guess this boils
down to the same problem, presumably some memory corruption?
Comment 14 Shaohua 2006-06-15 17:46:54 UTC
> down to the same problem, presumably some memory corruption?
Probably not. Please open a new bug for that issue, and see if you can
provide some info.
Comment 15 Shaohua 2006-06-25 19:59:15 UTC
It's in base kernel now. Closed!
