Summary: Thinkpad T61 after suspend to ram ide cdrom doesn't work
Product: IO/Storage
Reporter: Ivan Volosyuk (ivan.volosyuk)
Severity: normal
CC: akpm, hancockrwd, rjw
Kernel Version: 2.6.34 - 2.6.37
Tree: Mainline
Bug Depends on:
dmesg from 2.6.34-rc7
dmesg output on Linux 2.6.37
Description Ivan Volosyuk 2010-05-08 09:16:32 UTC
Created attachment 26279 [details]
dmesg output

If mplayer or another player is accessing the DVD (/dev/hda) when I suspend to RAM with:
echo mem >/sys/power/state
the task hangs and the DVD drive is no longer accessible. If no task is using the DVD (/dev/hda), the drive is accessible after resume and works well.

Distribution: Gentoo
Kernel: 220.127.116.11 (vanilla)
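The reproduction steps above can be sketched as a small script. This is only an illustration under stated assumptions: the function name `suspend_with_open_device` and its parameters are hypothetical, and the demo at the bottom runs against temporary files so that it does not actually suspend the machine. On the affected system one would pass `/dev/hda` and `/sys/power/state` (as root), where the final read would be expected to hang.

```python
import os
import tempfile


def suspend_with_open_device(device_path, state_path, read_size=2048):
    """Hold device_path open, request suspend-to-RAM, then read again.

    Hypothetical helper: on real hardware the write to state_path returns
    only after resume, so the second read exercises the resumed drive.
    """
    fd = os.open(device_path, os.O_RDONLY)
    try:
        os.read(fd, read_size)            # make sure the device is in use
        with open(state_path, "w") as state:
            state.write("mem")            # same effect as: echo mem > state_path
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, read_size)     # hangs on the affected kernels
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Dry run against ordinary files; nothing is suspended here.
    with tempfile.TemporaryDirectory() as tmp:
        fake_dev = os.path.join(tmp, "hda")
        fake_state = os.path.join(tmp, "state")
        with open(fake_dev, "wb") as f:
            f.write(b"\0" * 4096)
        open(fake_state, "w").close()
        data = suspend_with_open_device(fake_dev, fake_state)
        print("read", len(data), "bytes after the simulated resume")
```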
Comment 1 Ivan Volosyuk 2010-05-08 09:23:11 UTC
The bug is also reproducible in 18.104.22.168 kernel. I can provide System.map and .config if needed.
Comment 2 Andrew Morton 2010-05-10 22:40:37 UTC
(recategorised as ACPI, although I don't think that's right). Rafael, could you please take a look? Shouldn't all the tasks be stopped at this stage? How can any task run to access the DVD while a suspend is in progress?
Comment 3 Rafael J. Wysocki 2010-05-10 23:13:02 UTC
Apparently, the driver doesn't survive suspend/resume if an application is attached to it while suspending. Ivan, does 2.6.34-rc7 still have this problem?
Comment 4 Ivan Volosyuk 2010-05-11 03:58:40 UTC
Created attachment 26329 [details]
dmesg from 2.6.34-rc7

Yes, it is reproducible in 2.6.34-rc7. 'top' shows that one core is 100% busy doing io:
Cpu0 : 0.3%us, 0.3%sy, 0.0%ni, 0.0%id, 99.3%wa, 0.0%hi, 0.0%si, 0.0%st

Device becomes unusable - every process accessing it blocks. New process 'cat' trying to access the device hangs with following stacktrace:
[ 2040.744572] INFO: task cat:14994 blocked for more than 120 seconds.
[ 2040.744576] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2040.744581] cat D c135f640 0 14994 14310 0x00000000
[ 2040.744590] f5942890 00200086 00000000 c135f640 f5942a24 00000001 c135f640 c135f640
[ 2040.744603] f5942890 00000001 f6fd4e10 f5942890 f6fd4e18 f6fd4e14 c120e107 00000002
[ 2040.744616] 00300000 00000000 f6fd4e18 f6fd4e18 f5942890 ffffffff 00000000 00000000
[ 2040.744629] Call Trace:
[ 2040.744638] [<c120e107>] ? __mutex_lock_common+0x14a/0x1f2
[ 2040.744648] [<c120e1be>] ? __mutex_lock_slowpath+0xf/0x11
[ 2040.744655] [<c120df33>] ? mutex_lock+0x10/0x1d
[ 2040.744663] [<c120df33>] ? mutex_lock+0x10/0x1d
[ 2040.744671] [<c109dbed>] ? __blkdev_get+0x50/0x2bf
[ 2040.744681] [<c109e3c8>] ? blkdev_open+0x0/0x87
[ 2040.744688] [<c109e424>] ? blkdev_open+0x5c/0x87
[ 2040.744697] [<c107f2af>] ? __dentry_open+0x120/0x207
[ 2040.744706] [<c107f426>] ? nameidata_to_filp+0x29/0x39
[ 2040.744714] [<c10884af>] ? do_last+0x376/0x46e
[ 2040.744721] [<c108894c>] ? do_filp_open+0x3a5/0x460
[ 2040.744732] [<c107f097>] ? do_sys_open+0x49/0xe2
[ 2040.744740] [<c107f174>] ? sys_open+0x1e/0x23
[ 2040.744747] [<c1002650>] ? sysenter_do_call+0x12/0x26

I am able to pull the git version of kernel if you make a fix for this.
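For anyone comparing several dmesg logs, the hung-task reports can be pulled out mechanically. A minimal sketch, assuming the message format shown in the trace above; the names `HUNG_TASK` and `find_hung_tasks` are made up for this illustration and are not part of any kernel tooling.

```python
import re

# Matches lines such as:
#   INFO: task cat:14994 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(dmesg_text):
    """Return (command, pid, timeout_seconds) for each hung-task report."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]


if __name__ == "__main__":
    sample = "[ 2040.744572] INFO: task cat:14994 blocked for more than 120 seconds."
    for comm, pid, secs in find_hung_tasks(sample):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```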
Comment 5 Ivan Volosyuk 2010-05-17 11:50:09 UTC
Reproducible on 2.6.34. Is it possible to work around the problem by invalidating all open files before suspend? Or does the ide-cd driver normally survive the suspend/resume cycle? I'm going to try a forced module unload for 'ide_cd_mod'.
Comment 6 Rafael J. Wysocki 2011-01-16 22:32:40 UTC
Is the problem still present in 2.6.37?
Comment 7 Ivan Volosyuk 2011-01-17 00:04:53 UTC
Let me build the 2.6.37. For now I can say that it is reproducible on 22.214.171.124 with tuxonice-3.2-rc2 patch.
Comment 8 Ivan Volosyuk 2011-01-17 01:58:25 UTC
Still reproducible on vanilla Linux 2.6.37 (3c0eee3fe6).
"cat /dev/hda" hangs after resuming, while top shows one processor core busy with iowait. Any access to /dev/hda hangs the process.

$ uname -a
Linux vol-1 2.6.37 #27 SMP PREEMPT Mon Jan 17 11:43:42 EST 2011 i686 Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz GenuineIntel GNU/Linux
$ gcc --version
gcc (Gentoo 4.4.4-r2 p1.2, pie-0.4.5) 4.4.4
$ ld -V
GNU ld (GNU Binutils) 126.96.36.19900303
$ top | head -n 5
top - 12:53:44 up 15 min, 4 users, load average: 0.99, 1.19, 1.68
Tasks: 118 total, 1 running, 117 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 50.0%id, 49.8%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2057680k total, 117936k used, 1939744k free, 29284k buffers
Swap: 5681788k total, 0k used, 5681788k free, 47216k cached
$ cat /proc/9002/stack
[<c10b1930>] __blkdev_get+0x39/0x289
[<c10b1b8a>] blkdev_get+0xa/0xc
[<c10b1c15>] blkdev_open+0x89/0xbd
[<c108ff97>] __dentry_open+0x126/0x20a
[<c109013c>] nameidata_to_filp+0x46/0x52
[<c1099805>] do_last+0x31c/0x406
[<c109aba9>] do_filp_open+0x194/0x3d4
[<c108fd73>] do_sys_open+0x51/0xdd
[<c108fe49>] sys_open+0x23/0x29
[<c1002850>] sysenter_do_call+0x12/0x26
[<ffffffff>] 0xffffffff
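The hung processes can also be found without waiting for the 120-second hung-task report, by looking for tasks in uninterruptible sleep (state "D"). A rough sketch, assuming the usual `/proc/<pid>/stat` layout of `pid (comm) state ...`; the helper names `parse_stat` and `list_blocked_tasks` are hypothetical.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm may itself contain spaces or parentheses, so split on the
    last ')' rather than on whitespace.
    """
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def list_blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for every task currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the task exited, or the line was not parseable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm in list_blocked_tasks():
            print(pid, comm)
```

For each PID reported, `cat /proc/<pid>/stack` (as used above) shows where in the kernel the task is waiting.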
Comment 9 Ivan Volosyuk 2011-01-17 02:05:15 UTC
Created attachment 43762 [details]
dmesg output on Linux 2.6.37

Interesting part from attached dmesg output:
[ 645.984069] PM: Entering mem sleep
....
[ 1080.436094] INFO: task cat:8617 blocked for more than 120 seconds.
[ 1080.436096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.436098] cat D f2839c5c 0 8617 8611 0x00000004
[ 1080.436102] f2839c6c 00000046 00000002 f2839c5c f2839be4 c108e7a2 f5c03640 f5c14a40
[ 1080.436108] c13eba80 f2a2ee20 c13eba80 f2a2f08c f2a2f088 f2a2f088 c13eba80 c13eba80
[ 1080.436114] 90090723 000000ce 8fd114ab 000000ce f2a2ee20 00000002 ffffffff f6428e80
[ 1080.436121] Call Trace:
[ 1080.436125] [<c108e7a2>] ? slab_destroy+0x2e/0x42
[ 1080.436130] [<c1062c53>] ? call_rcu+0xd/0xf
[ 1080.436134] [<c124137a>] io_schedule+0x2b/0x3d
[ 1080.436138] [<c106fb7d>] sync_page+0x3c/0x40
[ 1080.436141] [<c1241608>] __wait_on_bit_lock+0x34/0x71
[ 1080.436144] [<c106fb41>] ? sync_page+0x0/0x40
[ 1080.436147] [<c106fb2e>] __lock_page+0x71/0x78
[ 1080.436151] [<c103d74a>] ? wake_bit_function+0x0/0x39
[ 1080.436155] [<c1076fe3>] truncate_inode_pages_range+0x1d7/0x22f
[ 1080.436164] [<c1077055>] truncate_inode_pages+0x1a/0x1c
[ 1080.436167] [<c10b0c90>] kill_bdev+0x2c/0x2f
[ 1080.436170] [<c10b17f3>] __blkdev_put+0x3e/0xee
[ 1080.436173] [<c10b18ad>] blkdev_put+0xa/0xc
[ 1080.436175] [<c10b18d4>] blkdev_close+0x25/0x29
[ 1080.436178] [<c1092493>] fput+0xda/0x160
[ 1080.436181] [<c108fc7a>] filp_close+0x55/0x5d
[ 1080.436184] [<c102c48a>] put_files_struct+0x66/0xb4
[ 1080.436187] [<c102c50f>] exit_files+0x37/0x3c
[ 1080.436189] [<c102d82a>] do_exit+0x1c7/0x5ba
[ 1080.436192] [<c1034c75>] ? __dequeue_signal+0x13/0xd7
[ 1080.436195] [<c1035065>] ? dequeue_signal+0xcd/0x13b
[ 1080.436197] [<c102dc7f>] do_group_exit+0x62/0x85
[ 1080.436200] [<c103607d>] get_signal_to_deliver+0x340/0x357
[ 1080.436203] [<c10022ab>] do_signal+0x5d/0x505
[ 1080.436206] [<c1091129>] ? do_sync_read+0x9c/0xd6
[ 1080.436210] [<c1090def>] ? fsnotify_modify+0x60/0x68
[ 1080.436212] [<c109108d>] ? do_sync_read+0x0/0xd6
[ 1080.436215] [<c10919f3>] ? vfs_read+0x79/0xad
[ 1080.436217] [<c1091ad5>] ? sys_read+0x55/0x5e
[ 1080.436220] [<c1002763>] do_notify_resume+0x10/0x6d
[ 1080.436223] [<c1242c2a>] work_notifysig+0x13/0x19
Comment 10 Ivan Volosyuk 2011-01-17 02:11:01 UTC
Sorry, but I can't change status of the bug from NEEDINFO to ACCEPTED or something like that.
Comment 11 Rafael J. Wysocki 2011-01-17 22:13:43 UTC
IDE drivers are in a "cold maintenance" mode, so to speak, and I don't think this issue is going to be fixed. Any chance to try PATA/libata instead?
Comment 12 Ivan Volosyuk 2011-01-17 22:19:59 UTC
Do I understand you correctly, the ide_cd_mod is legacy ide driver?
Comment 13 Rafael J. Wysocki 2011-01-17 22:58:48 UTC
Yes, it is. The new driver is cdrom depending on sr_mod (SCSI stack). Can you try this one, please? You'll need the right PATA driver for your controller too.
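One quick way to tell which of the two stacks is currently driving the optical drive is to look for the modules named in this thread (`ide_cd_mod` versus `sr_mod`) in `/proc/modules`. A rough sketch only: the function name `cdrom_stack` is hypothetical, and a driver built into the kernel rather than loaded as a module will not appear in `/proc/modules`, so an "unknown" result is not conclusive.

```python
def loaded_modules(modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def cdrom_stack(modules_text):
    """Guess which CD-ROM driver stack is loaded, by module name only."""
    names = loaded_modules(modules_text)
    found = []
    if "ide_cd_mod" in names:
        found.append("legacy IDE (ide_cd_mod, /dev/hdX)")
    if "sr_mod" in names:
        found.append("SCSI/libata (sr_mod, /dev/srN)")
    return found or ["unknown (drivers may be built in)"]


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            print("\n".join(cdrom_stack(f.read())))
    except OSError:
        print("/proc/modules is not available on this system")
```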
Comment 14 Ivan Volosyuk 2011-01-18 05:44:06 UTC
With libata the bug is not reproducible and no processes hang, but the read in progress fails with:
[ 332.456512] sr 0:0:0:0: [sr0] Media Changed
[ 332.456514] sr 0:0:0:0: [sr0] Result: hostbyte=0x00 driverbyte=0x08
[ 332.456517] sr 0:0:0:0: [sr0] Sense Key : 0x6 [current]
[ 332.456521] sr 0:0:0:0: [sr0] ASC=0x28 ASCQ=0x0
[ 332.456524] sr 0:0:0:0: [sr0] CDB: cdb=0x28: 28 00 00 00 51 60 00 00 40 00
[ 332.456532] end_request: I/O error, dev sr0, sector 83328

This is probably expected behavior. It could be fine with me, but AFAIR the libata driver had pretty fragile error handling on sense errors. When using the 'safecopy' program, the kernel or device often gets stuck in some weird state. Could you take a look at it? I can try to reproduce it, but I'm not sure what information will be useful for a bug report...
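The log lines above can be decoded by hand, which supports the reading that this is a plain "media changed" condition. Opcode 0x28 is READ(10), sense key 0x6 is UNIT ATTENTION, and ASC/ASCQ 0x28/0x00 is "not ready to ready change, medium may have changed". The sketch below unpacks the CDB; the function name `decode_read10` is made up for this example, and the 2048-byte logical block size is an assumption (typical for data CD/DVD media) that happens to be consistent with the sector number the kernel printed.

```python
def decode_read10(cdb_hex):
    """Return (lba, transfer_length_blocks) from a READ(10) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


if __name__ == "__main__":
    lba, blocks = decode_read10("28 00 00 00 51 60 00 00 40 00")
    block_size = 2048                          # assumed logical block size
    sector_512 = lba * (block_size // 512)     # block layer counts 512-byte sectors
    print(f"LBA {lba}, {blocks} blocks, 512-byte sector {sector_512}")
```

With these assumptions the LBA 0x5160 (20832) corresponds to 512-byte sector 83328, the same sector reported in the `end_request` line.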
Comment 15 Rafael J. Wysocki 2011-01-18 19:03:38 UTC
I'm not a libata developer. Apart from this, I think it's better if you file a separate bug under Storage IO/SATA for that issue (please put my address into the CC list in there so that I can reassign it to the right people). I'm closing this one as "won't fix", because I don't think anyone is going to do anything about it at this point.
Comment 16 Robert Hancock 2011-01-19 00:13:21 UTC
(In reply to comment #14)
> It could be fine with me, but AFAIR the libsata driver had pretty
> fragile error handling on sense errors. When using 'safecopy' program
> the kernel or device often get stuck in some weird state. Could you
> take a look at it? I can try to reproduce it, but I'm not sure what
> information will be useful for a bug report...

If you see problems in current kernels, please open a new bug report and include the full dmesg log.