Bug 14320

Summary: Various BUGs on resume
Product: Drivers  Reporter: Alan Jenkins (alan-jenkins)
Component: Video(DRI - Intel)  Assignee: drivers_video-dri-intel (drivers_video-dri-intel)
Status: RESOLVED INSUFFICIENT_DATA    
Severity: normal CC: jbarnes, rjw
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32-rc1-git Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216    
Attachments: Current kconfig
Dmesg showing "bad swap file entry" on current kernel
Another dmesg showing "bad swap file entry" on current kernel
Dmesg showing BUG in fget_light() on slightly earlier kernel
/var/log/messages showing freezer/scheduler debug spew after suspend to ram
Dmesg showing hung tasks after suspend to ram

Description Alan Jenkins 2009-10-04 16:32:29 UTC
References: http://lkml.org/lkml/2009/10/4/64


I'm seeing a variety of BUGs on my EeePC 701 after hibernation. Sometimes they cause a hang during resume; sometimes they happen just after resume. It doesn't happen every time, either: I've just hibernated three times in a row with no problems. It's most perplexing.

One resume hang showed a series of SCSI backtraces and errors.  Unfortunately I wasn't able to capture it at the time.  They were most probably related to the root device, an SSD controlled by ata_piix.

I have full dmesgs for two different sets of backtraces following resume: "bad swap file entry" backtraces, and BUGs in fget_light(). The fget_light() BUG was on a slightly older kernel (still a child of 2.6.32-rc1). I will attach them forthwith.
Comment 1 Alan Jenkins 2009-10-04 16:33:22 UTC
Created attachment 23258 [details]
Current kconfig
Comment 2 Alan Jenkins 2009-10-04 16:35:46 UTC
Created attachment 23259 [details]
Dmesg showing "bad swap file entry" on current kernel
Comment 3 Alan Jenkins 2009-10-04 16:36:55 UTC
Created attachment 23260 [details]
Another dmesg showing "bad swap file entry" on current kernel
Comment 4 Alan Jenkins 2009-10-04 16:38:09 UTC
Created attachment 23261 [details]
Dmesg showing BUG in fget_light() on slightly earlier kernel
Comment 5 Alan Jenkins 2009-10-04 16:42:20 UTC
And then suspend-to-ram failed with a long freezer/scheduler debug spew, followed by a set of hung task warnings. I had used suspend-to-ram a couple of times in the past few days before hitting this problem.
Comment 6 Alan Jenkins 2009-10-04 16:45:14 UTC
Created attachment 23262 [details]
/var/log/messages showing freezer/scheduler debug spew after suspend to ram

Ah, I should also have said that the suspend itself actually failed, unlike in the hibernation problems.
Comment 7 Alan Jenkins 2009-10-04 16:49:17 UTC
Created attachment 23263 [details]
Dmesg showing hung tasks after suspend to ram

This is the same event as the /var/log/messages attachment.  Dmesg shows the "hung task" message which /var/log/messages misses.

(/var/log/messages is otherwise more complete, since the kernel log buffer has overflowed.)
Comment 8 Alan Jenkins 2009-10-04 16:55:53 UTC
And here's ps showing the hung tasks:

$ ps ax | grep D
  PID TTY      STAT   TIME COMMAND
 3521 ?        D      0:00 /usr/lib/ConsoleKit/run-session.d/udev-acl.ck session_active_changed
 3691 ?        D      0:00 setfacl -m u:1000:rw /dev/audio -m u:1000:rw /dev/dsp -m u:1000:rw /dev/mixer -m u:1000:rw /dev/snd/controlC0 -m u:1000:rw /dev/snd/hwC0D0 -m u:1000:rw /dev/snd/pcmC0D0c -m u:1000:rw /dev/snd/pcmC0D0p -m u:1000:rw /dev/snd/timer -m u:1000:rw /dev/video0
 3874 ?        D      0:00 /usr/lib/ConsoleKit/run-session.d/udev-acl.ck session_active_changed
 4050 ?        D      0:00 setfacl -m u:1000:rw /dev/audio -m u:1000:rw /dev/dsp -m u:1000:rw /dev/mixer -m u:1000:rw /dev/snd/controlC0 -m u:1000:rw /dev/snd/hwC0D0 -m u:1000:rw /dev/snd/pcmC0D0c -m u:1000:rw /dev/snd/pcmC0D0p -m u:1000:rw /dev/snd/timer -m u:1000:rw /dev/video0
Comment 9 Rafael J. Wysocki 2009-10-04 20:30:35 UTC
Is this reproducible without KMS?
Comment 10 Rafael J. Wysocki 2009-10-06 00:06:02 UTC
The issue doesn't appear to be reproducible without KMS.
Comment 11 Alan Jenkins 2009-10-06 09:03:45 UTC
> 2. Did it work with 2.6.31 (and with KMS)? 

I thought so, but my recollection is hazy.  I didn't test it for very
long if I did.  I've tried it now and 2.6.31 behaves pretty similarly. 
Sorry for ringing the regression bell.  (Corrected in bugzilla).


Firstly the _hibernation_ process hung (after a couple of suspend-to-ram
cycles).

No text on the console (despite using s2disk). It echoed keypresses and
responded to SysRq keys. No messages from lockdep or the hung task
detector (after waiting 5 minutes). SysRq-P said we were in the idle
loop; SysRq-T said both events/0 and hald-addon-input were runnable.


Then suspend-to-ram hung (following a hibernation cycle).  This time it
showed the contents of vt1, but didn't appear to respond to anything
short of SysRq+B.
Comment 12 Jesse Barnes 2009-12-02 19:12:42 UTC
Can you bisect it?  Sounds like we may be corrupting memory...
Comment 13 Alan Jenkins 2009-12-03 12:16:17 UTC
I don't have a "last known good version" with KMS enabled.

It seems to be working fine now (2.6.32-rc8).  I use hibernation frequently and I've left KMS enabled, so I'll notice if it breaks again :).
Comment 14 Jesse Barnes 2009-12-03 17:13:26 UTC
Ok, thanks for the update.