Bug 5962

Summary: CPU lock up during suspend to disk
Product: ACPI
Reporter: Markus Walser (markus.walser)
Component: Power-Sleep-Wake
Assignee: Rafael J. Wysocki (rjwysocki)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, bunk, jlp.bugs, pavel, rjwysocki
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.16-rc1-mm3
Subsystem:
Regression: ---
Bisected commit-id:
Bug Depends on: 5534
Bug Blocks:

Description Markus Walser 2006-01-26 02:45:13 UTC
Most recent kernel where this bug did not occur: ?

Distribution: 
*SuSE 10

Hardware Environment: 
*nx6125
*PCI: http://homepage.hispeed.ch/hb9xcg/lspci
*CPU: AMD Turion(tm) 64 Mobile ML-34 stepping: 2

Software Environment: 

*gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)
*Config:    http://homepage.hispeed.ch/hb9xcg/config-2.6.16-rc1-mm3
*SystemMap: http://homepage.hispeed.ch/hb9xcg/System.map-2.6.16-rc1-mm3-mw
*dmesg:     http://homepage.hispeed.ch/hb9xcg/resume-2.6.16-rc1-mm3.log

Problem Description: 
During suspend to disk, the system hangs for a while and, after a few
moments, prints a call trace:
http://homepage.hispeed.ch/hb9xcg/img_0410.jpg
After the call trace, suspend to disk completes, and the system can even be resumed.

Steps to reproduce:
1) boot in single-user mode
2) enable the swap partition
3) echo 4 > /proc/acpi/sleep

This may be irrelevant, but during normal operation I also get these messages in
the kernel log:
"     osl-0822 [77] os_wait_semaphore     : Failed to acquire semaphore[ffff81005733ad40|1|0], AE_TIME"

This message also appears with the SuSE kernel 2.6.13-15.7-default, especially
when accessing procfs, e.g. "cat /proc/acpi/thermal_zone/TZ1/temperature".
After such an access, kacpi spins at 100% CPU usage for a while. This may
be connected to http://bugzilla.kernel.org/show_bug.cgi?id=5534
Comment 1 Andrew Morton 2006-01-26 19:20:33 UTC
Suspend oopsed in the ACPI code. Maybe
a bug in Len's devel tree?
Comment 2 Shaohua 2006-02-07 23:52:40 UTC
>soft lockup
This should have been fixed in 2.6.16-rc2. Please try it!
Comment 3 Markus Walser 2006-02-08 12:33:56 UTC
With plain 2.6.16-rc2 dmesg still reports messages like:
"os_wait_semaphore     : Failed to acquire semaphore[ffff81005733ad40|1|0], AE_TIME"

Suspend to disk also locks up with 2.6.16-rc2, but prints only a few debug messages:
http://homepage.hispeed.ch/hb9xcg/suspend-2.6.16-rc2.JPG

Andrew, is there special debug code enabled in your kernel? Apparently
2.6.16-rc1-mm3 is much more verbose during suspend, even though the same .config
was used. If so, is there a way to enable this on vanilla 2.6.16-rc2?
Comment 4 Shaohua 2006-02-08 16:57:08 UTC
>>soft lockup
>This should have been fixed in 2.6.16-rc2. Please try it!
Ha, the patch is in the latest git tree, but not in 2.6.16-rc2, sorry. Please
try the latest git tree, or try the patch at:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0dd2ea9af8f0eca43cf6200baa182b3aba307049
Comment 5 Shaohua 2006-02-13 17:32:30 UTC
I suppose it's fixed in 2.6.16-rc3. If not, please reopen this bug and provide
the dmesg output. Thanks!
Comment 6 Markus Walser 2006-02-14 15:02:18 UTC
Hi
Unfortunately, 2.6.16-rc3 behaves no differently than rc2 did.
The dmesg output after booting looks like this:

    http://homepage.hispeed.ch/hb9xcg/linux-2.6.16-rc3.dmesg

During suspend (not resume), the system hangs as described in comment #3.
At least 2.6.16-rc1-mm3 reported the lockup, while the vanilla rc kernels remain silent.
Keyboard events are still processed, e.g. switching back to vt1 and pressing CR
works, so the system is evidently not hard-locked.
Comment 7 Shaohua 2006-02-14 17:01:04 UTC
How about unloading the USB driver before suspend?
Comment 8 Markus Walser 2006-02-15 12:21:58 UTC
...hm, interesting things happen without the USB modules loaded.

lsmod:
Module                  Size  Used by
usbcore               132220  1
reiserfs              232560  1
fan                     7304  0
thermal                23884  0
ide_cd                 40864  0
cdrom                  36344  1 ide_cd
processor              42392  1 thermal
atiixp                  6288  0 [permanent]
ide_disk               16384  3
ide_core              142616  3 ide_cd,atiixp,ide_disk

Suspend hangs with a screen like this:
   http://homepage.hispeed.ch/hb9xcg/without_usb_before_pressing_key.jpg

After pressing carriage return, suspend continues with this screen:
   http://homepage.hispeed.ch/hb9xcg/without_usb_after_pressing_key.jpg

With USB loaded, pressing a key does not let suspend continue.

After resuming, the system is in a strange state, e.g. a non-stopping bell,
but that's a different story.

Do you have any ideas on how to narrow this bug down further?
Comment 9 Shaohua 2006-02-22 19:33:20 UTC
So the suspend hangs during shutdown processing.
Does a normal shutdown work on this system?
Comment 10 Markus Walser 2006-03-04 05:35:54 UTC
Yes, a normal shutdown works on this system. But I suspect this bug is closely
connected to bug http://bugzilla.kernel.org/show_bug.cgi?id=5534. There seems
to be trouble processing ACPI events with this BIOS.
Some interesting investigation is described in
http://bugzilla.kernel.org/show_bug.cgi?id=5534#c65 and
http://bugzilla.kernel.org/show_bug.cgi?id=5534#c66. Couldn't this also lock up
other ACPI subsystems during suspend/resume?
Comment 11 Alexey Starikovskiy 2006-05-05 03:38:56 UTC
Markus,

Please verify whether the patch for #5534 fixes your problem as well.
Comment 12 Markus Walser 2006-05-06 08:29:35 UTC
Hi Alexey
My first attempt, applying the second version of your patch to a 2.6.16.14 kernel
and trying to suspend, didn't solve the problem; the machine kept hanging as in
earlier attempts.
After switching back to a 32-bit installation, I tried again, with the same
hang as always.
After adding the "noapic" boot parameter, suspend suddenly worked. I have no idea
why "noapic" suddenly works, but I'm sure that I tried "noapic" with earlier
kernels on the 64-bit installation; those kernels never even booted.
A lot changed between the two attempts: compiler 4.0 vs. 4.1, and 32-bit vs. 64-bit.
What I could do is reinstall a 64-bit/gcc 4.1 system and check whether "noapic"
still works.
Comment 13 Rafael J. Wysocki 2006-09-27 00:15:39 UTC
Markus, can you please verify whether this is a duplicate of Bug #5534 and whether
the final patches for that bug fix your problem?

Comment 14 Rafael J. Wysocki 2006-10-25 10:13:18 UTC
I'm closing this entry.  Please reopen if you think that's necessary.