Bug 15941

Summary:	Thinkpad T61 after suspend to ram ide cdrom doesn't work
Product:	IO/Storage	Reporter:	Ivan Volosyuk (ivan.volosyuk)
Component:	IDE	Assignee:	io_ide (io_ide)
Status:	CLOSED WILL_NOT_FIX
Severity:	normal	CC:	akpm, hancockrwd, rjw
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	2.6.34 - 2.6.37	Subsystem:
Regression:	No	Bisected commit-id:
Bug Depends on:
Bug Blocks:	7216
Attachments:	dmesg output dmesg from 2.6.34-rc7 dmesg output on Linux 2.6.37

Description Ivan Volosyuk 2010-05-08 09:16:32 UTC

Created attachment 26279 [details]
dmesg output

If mplayer or other player access DVD (/dev/hda) when doing suspend to ram following way:
echo mem >/sys/power/state
the task hangs and dvd drive is no longer accessible.
If no tasks use DVD (/dev/hda) - dvd is accessible after resume and works well.
Distribution: Gentoo
Kernel: 2.6.33.2 (vanilla)

Comment 1 Ivan Volosyuk 2010-05-08 09:23:11 UTC

The bug is also reproducible in 2.6.32.12 kernel.
I can provide System.map and .config if needed.

Comment 2 Andrew Morton 2010-05-10 22:40:37 UTC

(recategorised as ACPI, although I don't think that's right).

Rafael, could you please take a look?  Shouldn't all the tasks be stopped at this stage?  How can any task run to access the DVD while a suspend is in progress?

Comment 3 Rafael J. Wysocki 2010-05-10 23:13:02 UTC

Apparently, the driver doesn't survive suspend/resume if an application is attached to it while suspending.

Ivan, does 2.6.34-rc7 still have this problem?

Comment 4 Ivan Volosyuk 2010-05-11 03:58:40 UTC

Created attachment 26329 [details]
dmesg from 2.6.34-rc7

Yes, it is reproducible in 2.6.34-rc7.
'top' shows that one core is 100% busy doing io:
Cpu0  :  0.3%us,  0.3%sy,  0.0%ni,  0.0%id, 99.3%wa,  0.0%hi,  0.0%si,  0.0%st

Device becomes unusable - every process accessing it blocks. New process 'cat' trying to access the device hangs with following stacktrace:

[ 2040.744572] INFO: task cat:14994 blocked for more than 120 seconds.
[ 2040.744576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2040.744581] cat           D c135f640     0 14994  14310 0x00000000
[ 2040.744590]  f5942890 00200086 00000000 c135f640 f5942a24 00000001 c135f640 c135f640
[ 2040.744603]  f5942890 00000001 f6fd4e10 f5942890 f6fd4e18 f6fd4e14 c120e107 00000002
[ 2040.744616]  00300000 00000000 f6fd4e18 f6fd4e18 f5942890 ffffffff 00000000 00000000
[ 2040.744629] Call Trace:
[ 2040.744638]  [<c120e107>] ? __mutex_lock_common+0x14a/0x1f2
[ 2040.744648]  [<c120e1be>] ? __mutex_lock_slowpath+0xf/0x11
[ 2040.744655]  [<c120df33>] ? mutex_lock+0x10/0x1d
[ 2040.744663]  [<c120df33>] ? mutex_lock+0x10/0x1d
[ 2040.744671]  [<c109dbed>] ? __blkdev_get+0x50/0x2bf
[ 2040.744681]  [<c109e3c8>] ? blkdev_open+0x0/0x87
[ 2040.744688]  [<c109e424>] ? blkdev_open+0x5c/0x87
[ 2040.744697]  [<c107f2af>] ? __dentry_open+0x120/0x207
[ 2040.744706]  [<c107f426>] ? nameidata_to_filp+0x29/0x39
[ 2040.744714]  [<c10884af>] ? do_last+0x376/0x46e
[ 2040.744721]  [<c108894c>] ? do_filp_open+0x3a5/0x460
[ 2040.744732]  [<c107f097>] ? do_sys_open+0x49/0xe2
[ 2040.744740]  [<c107f174>] ? sys_open+0x1e/0x23
[ 2040.744747]  [<c1002650>] ? sysenter_do_call+0x12/0x26

I am able to pull the git version of kernel if you make a fix for this.

Comment 5 Ivan Volosyuk 2010-05-17 11:50:09 UTC

Reproducible on 2.6.34.

Is it possible to workaround the problem to invalidate all open files before suspend? Or does the ide-cd driver normally survives the suspend/resume cycle? I'm going to try force module unload for 'ide_cd_mod'.

Comment 6 Rafael J. Wysocki 2011-01-16 22:32:40 UTC

Is the problem still present in 2.6.37?

Comment 7 Ivan Volosyuk 2011-01-17 00:04:53 UTC

Let me build the 2.6.37. For now I can say that it is reproducible on 2.6.36.2 with tuxonice-3.2-rc2 patch.

Comment 8 Ivan Volosyuk 2011-01-17 01:58:25 UTC

Still reproducible on vanilla Linux 2.6.37 (3c0eee3fe6)
"cat /dev/hda" is hang after resuming, while top shows that one processor core busy with iowait. Access to /dev/hda hangs processes

$ uname -a
Linux vol-1 2.6.37 #27 SMP PREEMPT Mon Jan 17 11:43:42 EST 2011 i686 Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz GenuineIntel GNU/Linux

$ gcc --version
gcc (Gentoo 4.4.4-r2 p1.2, pie-0.4.5) 4.4.4

$ ld -V
GNU ld (GNU Binutils) 2.20.1.20100303

$ top | head -n 5
top - 12:53:44 up 15 min,  4 users,  load average: 0.99, 1.19, 1.68
Tasks: 118 total,   1 running, 117 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 50.0%id, 49.8%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2057680k total,   117936k used,  1939744k free,    29284k buffers
Swap:  5681788k total,        0k used,  5681788k free,    47216k cached

$ cat /proc/9002/stack
[<c10b1930>] __blkdev_get+0x39/0x289
[<c10b1b8a>] blkdev_get+0xa/0xc
[<c10b1c15>] blkdev_open+0x89/0xbd
[<c108ff97>] __dentry_open+0x126/0x20a
[<c109013c>] nameidata_to_filp+0x46/0x52
[<c1099805>] do_last+0x31c/0x406
[<c109aba9>] do_filp_open+0x194/0x3d4
[<c108fd73>] do_sys_open+0x51/0xdd
[<c108fe49>] sys_open+0x23/0x29
[<c1002850>] sysenter_do_call+0x12/0x26
[<ffffffff>] 0xffffffff

Comment 9 Ivan Volosyuk 2011-01-17 02:05:15 UTC

Created attachment 43762 [details]
dmesg output on Linux 2.6.37

Interesting part from attached dmesg output:
[  645.984069] PM: Entering mem sleep
....

[ 1080.436094] INFO: task cat:8617 blocked for more than 120 seconds.
[ 1080.436096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.436098] cat           D f2839c5c     0  8617   8611 0x00000004
[ 1080.436102]  f2839c6c 00000046 00000002 f2839c5c f2839be4 c108e7a2 f5c03640 f5c14a40
[ 1080.436108]  c13eba80 f2a2ee20 c13eba80 f2a2f08c f2a2f088 f2a2f088 c13eba80 c13eba80
[ 1080.436114]  90090723 000000ce 8fd114ab 000000ce f2a2ee20 00000002 ffffffff f6428e80
[ 1080.436121] Call Trace:
[ 1080.436125]  [<c108e7a2>] ? slab_destroy+0x2e/0x42
[ 1080.436130]  [<c1062c53>] ? call_rcu+0xd/0xf
[ 1080.436134]  [<c124137a>] io_schedule+0x2b/0x3d
[ 1080.436138]  [<c106fb7d>] sync_page+0x3c/0x40
[ 1080.436141]  [<c1241608>] __wait_on_bit_lock+0x34/0x71
[ 1080.436144]  [<c106fb41>] ? sync_page+0x0/0x40
[ 1080.436147]  [<c106fb2e>] __lock_page+0x71/0x78
[ 1080.436151]  [<c103d74a>] ? wake_bit_function+0x0/0x39
[ 1080.436155]  [<c1076fe3>] truncate_inode_pages_range+0x1d7/0x22f
[ 1080.436164]  [<c1077055>] truncate_inode_pages+0x1a/0x1c
[ 1080.436167]  [<c10b0c90>] kill_bdev+0x2c/0x2f
[ 1080.436170]  [<c10b17f3>] __blkdev_put+0x3e/0xee
[ 1080.436173]  [<c10b18ad>] blkdev_put+0xa/0xc
[ 1080.436175]  [<c10b18d4>] blkdev_close+0x25/0x29
[ 1080.436178]  [<c1092493>] fput+0xda/0x160
[ 1080.436181]  [<c108fc7a>] filp_close+0x55/0x5d
[ 1080.436184]  [<c102c48a>] put_files_struct+0x66/0xb4
[ 1080.436187]  [<c102c50f>] exit_files+0x37/0x3c
[ 1080.436189]  [<c102d82a>] do_exit+0x1c7/0x5ba
[ 1080.436192]  [<c1034c75>] ? __dequeue_signal+0x13/0xd7
[ 1080.436195]  [<c1035065>] ? dequeue_signal+0xcd/0x13b
[ 1080.436197]  [<c102dc7f>] do_group_exit+0x62/0x85
[ 1080.436200]  [<c103607d>] get_signal_to_deliver+0x340/0x357
[ 1080.436203]  [<c10022ab>] do_signal+0x5d/0x505
[ 1080.436206]  [<c1091129>] ? do_sync_read+0x9c/0xd6
[ 1080.436210]  [<c1090def>] ? fsnotify_modify+0x60/0x68
[ 1080.436212]  [<c109108d>] ? do_sync_read+0x0/0xd6
[ 1080.436215]  [<c10919f3>] ? vfs_read+0x79/0xad
[ 1080.436217]  [<c1091ad5>] ? sys_read+0x55/0x5e
[ 1080.436220]  [<c1002763>] do_notify_resume+0x10/0x6d
[ 1080.436223]  [<c1242c2a>] work_notifysig+0x13/0x19

Comment 10 Ivan Volosyuk 2011-01-17 02:11:01 UTC

Sorry, but I can't change status of the bug from NEEDINFO to ACCEPTED or something like that.

Comment 11 Rafael J. Wysocki 2011-01-17 22:13:43 UTC

IDE drivers are in a "cold maintenance" mode, so to speak, and I don't think
this issue is going to be fixed.

Any chance to try PATA/libata instead?

Comment 12 Ivan Volosyuk 2011-01-17 22:19:59 UTC

Do I understand you correctly, the ide_cd_mod is legacy ide driver?
On Jan 18, 2011 9:13 AM, <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15941
>
>
>
>
>
> --- Comment #11 from Rafael J. Wysocki <rjw@sisk.pl> 2011-01-17 22:13:43
---
> IDE drivers are in a "cold maintenance" mode, so to speak, and I don't
think
> this issue is going to be fixed.
>
> Any chance to try PATA/libata instead?
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You reported the bug.

Comment 13 Rafael J. Wysocki 2011-01-17 22:58:48 UTC

Yes, it is.  The new driver is cdrom depending on sr_mod (SCSI stack).

Can you try this one, please?  You'll need the right PATA driver for
your controller too.

Comment 14 Ivan Volosyuk 2011-01-18 05:44:06 UTC

With libata the bug is not reproducible, no hang processes, but
current read fails with:

[  332.456512] sr 0:0:0:0: [sr0] Media Changed
[  332.456514] sr 0:0:0:0: [sr0]  Result: hostbyte=0x00 driverbyte=0x08
[  332.456517] sr 0:0:0:0: [sr0]  Sense Key : 0x6 [current]
[  332.456521] sr 0:0:0:0: [sr0]  ASC=0x28 ASCQ=0x0
[  332.456524] sr 0:0:0:0: [sr0] CDB: cdb[0]=0x28: 28 00 00 00 51 60 00 00 40 00
[  332.456532] end_request: I/O error, dev sr0, sector 83328

This is probably expected behavior.

It could be fine with me, but AFAIR the libsata driver had pretty
fragile error handling on sense errors. When using 'safecopy' program
the kernel or device often get stuck in some weird state. Could you
take a look at it? I can try to reproduce it, but I'm not sure what
information will be useful for a bug report...

On Tue, Jan 18, 2011 at 9:58 AM,  <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15941
>
>
>
>
>
> --- Comment #13 from Rafael J. Wysocki <rjw@sisk.pl>  2011-01-17 22:58:48 ---
> Yes, it is.  The new driver is cdrom depending on sr_mod (SCSI stack).
>
> Can you try this one, please?  You'll need the right PATA driver for
> your controller too.
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You reported the bug.
>

Comment 15 Rafael J. Wysocki 2011-01-18 19:03:38 UTC

I'm not a libata developer.  Apart from this, I think it's better if you file
a separate bug under Storage IO/SATA for that issue (please put my address
into the CC list in there so that I can reassign it to the right people).

I'm closing this one as "won't fix", because I don't think anyone is going to
do anything about it at this point.

Comment 16 Robert Hancock 2011-01-19 00:13:21 UTC

(In reply to comment #14)
> It could be fine with me, but AFAIR the libsata driver had pretty
> fragile error handling on sense errors. When using 'safecopy' program
> the kernel or device often get stuck in some weird state. Could you
> take a look at it? I can try to reproduce it, but I'm not sure what
> information will be useful for a bug report...

If you see problems in current kernels, please open a new bug report and include the full dmesg log.