Most recent kernel where this bug did not occur: 2.6.16.1
Distribution: FC4/5 (vanilla kernel)
Hardware Environment: Dell Inspiron 9300 laptop
Software Environment: Fedora Core 5
Problem Description:
Suspend to RAM is broken.

During suspend, 2.6.16-git19 says:
"kernel tried to execute NX-protected page - exploit attempt? (uid 0)
BUG: unable to handle kernel paging request at virtual address c04ce31"
The complete oops is available on request.

2.6.17-rc3 completes the suspend cycle (although it complains about usb_hcd) but cannot be woken properly: the fans start running, the LEDs light up and the Caps Lock key/LED combination works, but nothing else. There is no display, no IP connectivity and no USB keyboard, so gathering info over ssh or similar is not possible.

Steps to reproduce:
    #!/bin/sh
    logger -t acpi-suspend "acpi-suspend $*"
    state=3
    killall vncviewer xvncviewer gaim
    service openvpn stop
    ifconfig tap0 down
    service ntpd stop
    service network stop
    service cups stop
    hwclock --systohc --utc
    sync
    sync
    echo "${state}" > /proc/acpi/sleep
    logger -t acpi-suspend "awake from acpi-suspend"
    hwclock --hctosys --utc
    service network start
    service ntpd start
    service cups start
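The script writes the ACPI sleep state number to /proc/acpi/sleep. 2.6 kernels also provide the generic /sys/power/state file, which lists the supported states and accepts "mem" for suspend to RAM; the state_store entry in the call trace further down this thread is the handler for writes to that file. A minimal sketch, assuming that interface: the helper name supports_mem is hypothetical, and the file path is a parameter only so the check can be tried against a test file.

```shell
# Hypothetical helper: succeed if the kernel offers suspend to RAM ("mem").
# $1 optionally overrides the default /sys/power/state path (e.g. for testing).
supports_mem() {
    state_file=${1:-/sys/power/state}
    [ -r "$state_file" ] || return 1
    # The file holds space-separated state names such as "standby mem disk";
    # split them one per line and look for an exact "mem" entry.
    tr ' ' '\n' < "$state_file" | grep -qx mem
}
```

On a real system, run as root, the suspend line could then read something like `supports_mem && echo mem > /sys/power/state`.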
Yes, the full oops trace would be useful, please. A digital photo of the screen is sometimes convenient.
Additional info:
2.6.16.1: works
2.6.16-git19: output below (hand-copied, almost complete)
2.6.17-rc3: no output, because the screen is off and there is no network connectivity, so no remote syslog either

    hci_usb: 2-1:1.1: no suspend for driver hci_usb?
    hci_usb: 2-1:1.0: no suspend for driver hci_usb?
    kernel tried to execute NX-protected page - exploit attempt? (uid 0)
    BUG: unable to handle kernel paging request at virtual address c04ce31
    Printing EIP c04ce31c
    *pde = 00000000
    Oops: 0011 [#1]
    Modules linked in: pl2303 usbserial ehci_hcd ipw2200 uhci_hcd
    CPU:    0
    EIP:    0060:[<c04ce31c>]    Not tainted VLI
    EFLAGS: 00010046   (2.6.16-git14 #4)
    EIP is at 0x04ce31c
    eax: c04dec00   ebx: c0001000   ecx: c000080    edx: 00000100
    esi: 00000003   edi: 00000000   ebp: 0000046    esp: f4f0bef4
    ds: 007b   es: 007b   ss: 0068
    Process zsh (pid: 3657, threadinfo=f4f40a000, task=f4c0ea90)
    Stack: <0>c0244b4e c04e3f08 00000000 ....
    Call trace:
     acpi_pm_enter+0x53/0x99
     suspend_enter+0x4f/0x60
     enter_state+0xde/0x160
     state_store+0x97/0xc0
     state_store+0x0/0xc0
     subsys_attr_store+0x29/0x40
     sysfs_write_file+0x93/0xf0
     vfs_write+0xa6/0x160
     sysfs_write+0x0/0xf0
     sys_write+0x41/0x70
     sysenter_past_esp+0x54/0x75
We seem to have ourselves an acpi regression since 2.6.16, even though we didn't merge any acpi stuff :(

Erik, would you be able to resolve EIP c04ce31c, please? You can do that with

    gdb vmlinux
    (gdb) l *0xc04ce31c

That won't work if you didn't select CONFIG_DEBUG_INFO, in which case use

    gdb vmlinux
    (gdb) x/10i 0xc04ce31c

Thanks.

Begin forwarded message:

Date: Fri, 5 May 2006 02:14:59 -0700
From: bugme-daemon@bugzilla.kernel.org
To: akpm@osdl.org
Subject: [Bug 6492] Suspend to ram regression 2.6.17-rc3

http://bugzilla.kernel.org/show_bug.cgi?id=6492

------- Additional Comments From erik@slagter.name 2006-05-05 02:14 -------
[...]
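If a vmlinux with symbols isn't at hand, the System.map from the same build gives a coarser answer: the owning function is the symbol with the highest address not above EIP, and the offset is the difference between the two. A minimal sketch, assuming the usual System.map layout ("address type name", fixed-width lowercase hex) and a hypothetical helper name resolve_eip; it only yields symbol+offset, not the disassembly gdb shows.

```shell
# Hypothetical helper: resolve a kernel address to "symbol+0xoffset"
# using a System.map file. Returns 1 if no symbol lies at or below it.
resolve_eip() {
    # Normalise the address: drop an optional 0x prefix and lowercase it.
    addr=$(printf '%s' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    map_file=$2
    # System.map addresses are fixed-width lowercase hex, so string order
    # equals numeric order. Concatenating "" forces awk to compare strings.
    # Keep the highest address that is still <= the target (file need not be sorted).
    best=$(awk -v a="$addr" '
        length($1) == length(a) && ($1 "") <= (a "") && ($1 "") >= (base "") {
            base = $1; sym = $3
        }
        END { if (sym != "") print base, sym }
    ' "$map_file")
    [ -n "$best" ] || return 1
    base=${best% *}
    sym=${best#* }
    # Shell arithmetic understands 0x-prefixed hex constants.
    printf '%s+0x%x\n' "$sym" $(( 0x$addr - 0x$base ))
}
```

For example, `resolve_eip 0xc04ce31c System.map` run against the map of the crashing build would name the function containing the faulting EIP.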
If you use another shell (instead of zsh), do you still see the oops in 2.6.16-git14?
Unfortunately I don't have this kernel image (2.6.16-git19) anymore, nor do I have the exact config file. Also please note that this is a "production" laptop that I need for my work. Every time it crashes there is a chance the file system will become corrupt, which I cannot afford. If no-one else has this problem, it's fine with me to close that part of the bug. The 2.6.17-rc3 problem still stands; any recommendations on how I can obtain usable debugging info?
Today I tried 2.6.17-rc6: exactly the same behaviour. I removed all more or less suspicious modules (uhci_hcd, ehci_hcd, pl2303, ipw2200 and ieee80211), but it made no difference. I recompiled the kernel with preemption off, still without success. I do have a screenshot this time, although it's a bit blurry; I only have a phone at my disposal at the moment. I now also have the config and symbol files for this kernel.
Created attachment 8300: Photo of oops
Created attachment 8301: System map
Created attachment 8302: Config
This is what gdb says about the EIP address:

    (gdb) x/10i 0xc045d35c
    0xc045d35c <do_suspend_lowlevel>:     call 0xc032b010 <save_processor_state>
    0xc045d361 <do_suspend_lowlevel+5>:   call 0xc045d308 <save_registers>
    0xc045d366 <do_suspend_lowlevel+10>:  push $0x3
    0xc045d368 <do_suspend_lowlevel+12>:  call 0xc024bd20 <acpi_enter_sleep_state>
    0xc045d36d <do_suspend_lowlevel+17>:  add $0x4,%esp
    0xc045d370 <do_suspend_lowlevel+20>:  ret
    0xc045d371 <ret_point>:               call 0xc045d33b <restore_registers>
    0xc045d376 <ret_point+5>:             call 0xc032afc0 <restore_processor_state>
    0xc045d37b <ret_point+10>:            ret
    0xc045d37c <saved_gdt>:               add %al,(%eax)
OK, thanks! 'do_suspend_lowlevel' really shouldn't be in .data; I'll fix that. But this can't explain why we see the failure: IIRC, even kernel data (below __init_end) is set to executable, so code sitting in .data should still run. I'll take a closer look.
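Which section a symbol ended up in can be read off the type letter that nm records in System.map: T/t for text (code), D/d for initialised data, B/b for bss, R/r for read-only data. A minimal sketch, assuming the standard three-column System.map layout; symbol_section is a hypothetical helper name, and its answer only reflects the build the map came from.

```shell
# Hypothetical helper: report which kind of section a symbol lives in,
# based on the nm type letter in System.map ("address type name").
# Returns 1 if the symbol is not found.
symbol_section() {
    awk -v s="$1" '
        $3 == s {
            t = $2
            if (t == "T" || t == "t") print "text"
            else if (t == "D" || t == "d") print "data"
            else if (t == "B" || t == "b") print "bss"
            else if (t == "R" || t == "r") print "rodata"
            else print "other (" t ")"
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$2"
}
```

For example, `symbol_section do_suspend_lowlevel System.map` prints "data" if the routine was assembled into .data and "text" once it lives in .text.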
OK, I could reproduce this bug here and have already sent a patch to lkml; please try it. http://marc.theaimsgroup.com/?l=linux-kernel&m=115034292902654&w=2
Yep, confirmed, this patch resolves the issue. Please note that I also had a similar issue when the kernel was compiled _without_ NX support; the kernel would simply crash then. I guess this boils down to the same problem, presumably some memory corruption?
> I guess this boils down to the same problem, presumably some memory corruption?
Probably not. Please open a new bug for that issue, and see if you can provide some info.
It's in the base kernel now. Closed!