Bug 53591 - core dump through pipe during restoring from suspend-to-disk hangs the system somewhere in I/O area
Summary: core dump through pipe during restoring from suspend-to-disk hangs the system...
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Aaron Lu
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-11 10:20 UTC by Ivan Matylitski
Modified: 2013-04-10 01:24 UTC (History)
2 users (show)

See Also:
Kernel Version: 3.5.0-24-generic #37-Ubuntu SMP Thu Feb 7 05:32:22 UTC 2013 i686 i686 i686 GNU/Linux
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Kernel log (113.92 KB, text/x-log)
2013-02-11 10:20 UTC, Ivan Matylitski
Details
dmesg dump (63.43 KB, application/octet-stream)
2013-02-11 10:23 UTC, Ivan Matylitski
Details
Previous dmesg dump (62.39 KB, application/octet-stream)
2013-02-11 10:23 UTC, Ivan Matylitski
Details
entire syslog (188.08 KB, application/octet-stream)
2013-02-11 10:26 UTC, Ivan Matylitski
Details
DMI decode (10.86 KB, text/plain)
2013-02-11 10:28 UTC, Ivan Matylitski
Details
PCI devices (2.18 KB, text/plain)
2013-02-11 10:29 UTC, Ivan Matylitski
Details
USB devices (530 bytes, text/plain)
2013-02-11 10:30 UTC, Ivan Matylitski
Details

Description Ivan Matylitski 2013-02-11 10:20:31 UTC
Created attachment 93031 [details]
Kernel log

CUPS system provided with Ubuntu has issues with restoring from suspend state. It's just a beginning of the story to provide you with some ideas with what happens during my restoring-from-hibernate. Sometimes CUPS crashes with SIGSEGV. Ubuntu has it's own reporting system named apport. Point of entrance is /usr/share/apport/apport which is being called by kernel because of following core_dump settings:

$ cat /proc/sys/kernel/core_pattern 
|/usr/share/apport/apport %p %s %c

From here I have just assumptions about architecture. I assume that kernel-invoked apport program looks for available user session for displaying nice GUI dialog talking to user about problem.

When this happens (80% of times) just after restarting user processes, troubles strike. System dives in I/O wait which is being accounted as 100% CPU using. Sometimes, system gets to X session. Sometimes not. Nevertheless, X session is totally unusable because of high level of I/O wait. I used to think that I have trouble with my HD. But I've checked it several times with rigorous SMART tests and with badblocks utility. Everything looks fine with my HD.

One of justification of my assumption might be the fact that I started experiencing problem just after I had installed full CUPS which is suffering from random SIGSEGV during wake-up from hibernation.

I will try to provide all possible system info. Let me know if I haven't covered something.

Here is uname output:

Linux badtop 3.5.0-24-generic #37-Ubuntu SMP Thu Feb 7 05:32:22 UTC 2013 i686 i686 i686 GNU/Linux

Here is fragment with observable failure from kern.log:
...
Feb 11 11:44:25 badtop kernel: [89118.281400] Restarting tasks ... done.
Feb 11 11:44:25 badtop kernel: [89118.286213] PM: Basic memory bitmaps freed
Feb 11 11:44:25 badtop kernel: [89118.286388] video LNXVIDEO:00: Restoring backlight state
Feb 11 11:44:25 badtop kernel: [89118.346456] sdhci-pci 0000:01:01.1: Will use DMA mode even though HW doesn't fully claim to support it.
Feb 11 11:44:25 badtop kernel: [89118.410782] sdhci-pci 0000:01:01.1: Will use DMA mode even though HW doesn't fully claim to support it.
Feb 11 11:44:25 badtop kernel: [89118.479685] sdhci-pci 0000:01:01.1: Will use DMA mode even though HW doesn't fully claim to support it.
Feb 11 11:44:25 badtop kernel: [89118.551730] sdhci-pci 0000:01:01.1: Will use DMA mode even though HW doesn't fully claim to support it.
Feb 11 11:44:42 badtop kernel: [89161.777914] cupsd[16764]: segfault at 4 ip b756e0d2 sp b704ac98 error 6 in libdbus-1.so.3.7.2[b7544000+48000]
Feb 11 11:46:42 badtop kernel: [89282.116090] INFO: task cupsd:16759 blocked for more than 120 seconds.
Feb 11 11:46:42 badtop kernel: [89282.116095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 11 11:46:42 badtop kernel: [89282.116098] cupsd           D f7bdce00     0 16759      1 0x00000080
Feb 11 11:46:42 badtop kernel: [89282.116103]  f3ab1de0 00000082 f7bdce00 f7bdce00 f3ab1d78 c1841240 c93d0000 00005117
Feb 11 11:46:42 badtop kernel: [89282.116111]  c196de00 c196de00 00000000 00005117 f7bdce00 f392e580 c50b2a0a c93d1db4
Feb 11 11:46:42 badtop kernel: [89282.116118]  c93d1e58 00000001 f3ab1da8 c1076370 f3ab1dcc c106db65 00000000 00000000
Feb 11 11:46:42 badtop kernel: [89282.116126] Call Trace:
Feb 11 11:46:42 badtop kernel: [89282.116136]  [<c1076370>] ? default_wake_function+0x10/0x20
Feb 11 11:46:42 badtop kernel: [89282.116141]  [<c106db65>] ? __wake_up_common+0x45/0x70
Feb 11 11:46:42 badtop kernel: [89282.116145]  [<c1070c6e>] ? complete+0x4e/0x60
Feb 11 11:46:42 badtop kernel: [89282.116150]  [<c15cac73>] schedule+0x23/0x60
Feb 11 11:46:42 badtop kernel: [89282.116154]  [<c104a345>] exit_mm+0x75/0x110
Feb 11 11:46:42 badtop kernel: [89282.116158]  [<c104a505>] do_exit+0x125/0x7a0
Feb 11 11:46:42 badtop kernel: [89282.116163]  [<c1161785>] ? switch_names+0x95/0xa0
Feb 11 11:46:42 badtop kernel: [89282.116167]  [<c1056cc7>] ? recalc_sigpending+0x17/0x50
Feb 11 11:46:42 badtop kernel: [89282.116176]  [<c1056f72>] ? dequeue_signal+0x32/0x180
Feb 11 11:46:42 badtop kernel: [89282.116180]  [<c15cb888>] ? _raw_spin_lock_irq+0x18/0x20
Feb 11 11:46:42 badtop kernel: [89282.116184]  [<c104ae24>] do_group_exit+0x34/0xa0
Feb 11 11:46:42 badtop kernel: [89282.116189]  [<c1059095>] get_signal_to_deliver+0x175/0x5a0
Feb 11 11:46:42 badtop kernel: [89282.116193]  [<c1169a8a>] ? mntput_no_expire+0x3a/0x120
Feb 11 11:46:42 badtop kernel: [89282.116197]  [<c11589bb>] ? putname+0x2b/0x40
Feb 11 11:46:42 badtop kernel: [89282.116202]  [<c101091d>] do_signal+0x2d/0x890
Feb 11 11:46:42 badtop kernel: [89282.116207]  [<c10cc1cc>] ? call_rcu_sched+0x1c/0x20
Feb 11 11:46:42 badtop kernel: [89282.116210]  [<c11618fd>] ? d_free+0x3d/0x50
Feb 11 11:46:42 badtop kernel: [89282.116215]  [<c10b6cde>] ? __audit_syscall_exit+0x36e/0x3c0
Feb 11 11:46:42 badtop kernel: [89282.116218]  [<c10b6cde>] ? __audit_syscall_exit+0x36e/0x3c0
Feb 11 11:46:42 badtop kernel: [89282.116222]  [<c10b6cde>] ? __audit_syscall_exit+0x36e/0x3c0
Feb 11 11:46:42 badtop kernel: [89282.116226]  [<c10113a5>] do_notify_resume+0x85/0xb0
Feb 11 11:46:42 badtop kernel: [89282.116230]  [<c15cbd2d>] work_notifysig+0x30/0x37
Feb 11 11:46:42 badtop kernel: [89282.116234] INFO: task cupsd:20779 blocked for more than 120 seconds.
Feb 11 11:46:42 badtop kernel: [89282.116236] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 11 11:46:42 badtop kernel: [89282.116238] cupsd           D 000080d0     0 20779      1 0x00000080
Feb 11 11:46:42 badtop kernel: [89282.116242]  f25ddde0 00000082 c10cfc95 000080d0 f4801588 f48958d0 f6ea6000 00005117
Feb 11 11:46:42 badtop kernel: [89282.116249]  c196de00 c196de00 981a9bec 00005117 f7beae00 f22c4c20 00000000 800c000b
Feb 11 11:46:42 badtop kernel: [89282.116256]  f25ddf4c 00000000 00000000 f25dddac ffffffec 00000800 f3afea80 f22c4c20
Feb 11 11:46:42 badtop kernel: [89282.116263] Call Trace:
Feb 11 11:46:42 badtop kernel: [89282.116267]  [<c10cfc95>] ? taskstats_exit+0x235/0x3a0
Feb 11 11:46:42 badtop kernel: [89282.116273]  [<c104290f>] ? mm_release+0x3f/0x120
Feb 11 11:46:42 badtop kernel: [89282.116276]  [<c10cef42>] ? __delayacct_add_tsk+0x212/0x2c0
Feb 11 11:46:42 badtop kernel: [89282.116280]  [<c15cac73>] schedule+0x23/0x60
Feb 11 11:46:42 badtop kernel: [89282.116283]  [<c104a345>] exit_mm+0x75/0x110
Feb 11 11:46:42 badtop kernel: [89282.116287]  [<c104a505>] do_exit+0x125/0x7a0
Feb 11 11:46:42 badtop kernel: [89282.116292]  [<c12d4bd1>] ? _copy_from_user+0x41/0x60
Feb 11 11:46:42 badtop kernel: [89282.116296]  [<c1056cc7>] ? recalc_sigpending+0x17/0x50
Feb 11 11:46:42 badtop kernel: [89282.116299]  [<c1056f72>] ? dequeue_signal+0x32/0x180
Feb 11 11:46:42 badtop kernel: [89282.116303]  [<c104ae24>] do_group_exit+0x34/0xa0
Feb 11 11:46:42 badtop kernel: [89282.116307]  [<c1165f14>] ? file_update_time+0x84/0xb0
Feb 11 11:46:42 badtop kernel: [89282.116311]  [<c1059095>] get_signal_to_deliver+0x175/0x5a0
Feb 11 11:46:42 badtop kernel: [89282.116315]  [<c1157737>] ? pipe_write+0x377/0x480
Feb 11 11:46:42 badtop kernel: [89282.116319]  [<c101091d>] do_signal+0x2d/0x890
Feb 11 11:46:42 badtop kernel: [89282.116323]  [<c1186929>] ? fsnotify+0x1e9/0x2d0
Feb 11 11:46:42 badtop kernel: [89282.116327]  [<c10b6cde>] ? __audit_syscall_exit+0x36e/0x3c0
Feb 11 11:46:42 badtop kernel: [89282.116331]  [<c10b6cde>] ? __audit_syscall_exit+0x36e/0x3c0
Feb 11 11:46:42 badtop kernel: [89282.116335]  [<c10113a5>] do_notify_resume+0x85/0xb0
Feb 11 11:46:42 badtop kernel: [89282.116339]  [<c15cbd2d>] work_notifysig+0x30/0x37
Feb 11 11:48:05 badtop kernel: [89364.433933] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Feb 11 11:48:05 badtop kernel: [89364.435893] 8139too 0000:01:07.0: eth0: link up, 100Mbps, full-duplex, lpa 0x43E1
Feb 11 11:52:06 badtop kernel: imklog 5.8.6, log source = /proc/kmsg started.
Feb 11 11:52:06 badtop kernel: [    0.000000] Initializing cgroup subsys cpuset
Feb 11 11:52:06 badtop kernel: [    0.000000] Initializing cgroup subsys cpu
Feb 11 11:52:06 badtop kernel: [    0.000000] Linux version 3.5.0-24-generic (buildd@akateko) (gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-2ubuntu1) ) #37-Ubuntu SMP Thu Feb 7 05:32:22 UTC 2013 (Ubuntu 3.5.0-24.37-generic 3.5.7.4)
...
Comment 1 Ivan Matylitski 2013-02-11 10:23:13 UTC
Created attachment 93041 [details]
dmesg dump
Comment 2 Ivan Matylitski 2013-02-11 10:23:44 UTC
Created attachment 93051 [details]
Previous dmesg dump
Comment 3 Ivan Matylitski 2013-02-11 10:26:08 UTC
Created attachment 93061 [details]
entire syslog
Comment 4 Ivan Matylitski 2013-02-11 10:28:26 UTC
Created attachment 93071 [details]
DMI decode
Comment 5 Ivan Matylitski 2013-02-11 10:29:17 UTC
Created attachment 93081 [details]
PCI devices
Comment 6 Ivan Matylitski 2013-02-11 10:30:07 UTC
Created attachment 93091 [details]
USB devices
Comment 7 Aaron Lu 2013-02-21 06:48:10 UTC
Hi Ivan,

Thanks for the report. Can you please try to identify which kernels have this problem and which don't(e.g. 3.1, 3.2, etc.)? Thanks.
Comment 8 Ivan Matylitski 2013-02-21 08:00:03 UTC
(In reply to comment #7)
> Hi Ivan,
> 
> Thanks for the report. Can you please try to identify which kernels have this
> problem and which don't(e.g. 3.1, 3.2, etc.)? Thanks.

Will be done definitely. Not sure about terms
Comment 9 Aaron Lu 2013-04-07 06:31:19 UTC
Hi Ivan,

Is there any update? Thanks.
Comment 10 Ivan Matylitski 2013-04-09 19:08:27 UTC
(In reply to comment #9)
> Hi Ivan,
> 
> Is there any update? Thanks.
Hi Aaron,

Really sorry for delay. Looks like it was the last song of my died hard-drive. A month after this issue post, SMART signaled me that I got unrecoverable and unwearable bad sectors. Just after replacing my old spindle HD with brand new SSD (accidentally, from Intel :-) ), several problems with programs (including CUPS daemon) crashes during wake-up disappeared. Everything is fine now.

Thank you for your answer and persistence in improving Linux kernel!
Comment 11 Aaron Lu 2013-04-10 01:24:01 UTC
Hi Ivan,

It's great to see you don't have problems, thanks for reporting back.

Note You need to log in before you can comment on or make changes to this bug.