Bug 6811
Summary: | cifs: suspend2ram fails if CIFS server is unreachable at suspend2ram time | | |
---|---|---|---|
Product: | File System | Reporter: | Xu (development--bugzilla.kernel.org) |
Component: | Other | Assignee: | Steve French (sfrench) |
Status: | RESOLVED CODE_FIX | | |
Severity: | normal | CC: | akpm, bunk, pavel, rjwysocki |
Priority: | P2 | | |
Hardware: | i386 | | |
OS: | Linux | | |
Kernel Version: | 2.6.18-rc1 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments:
Experimental patch
Modified patch (with try_to_freeze())
Modified patch (with try_to_freeze() in wait_for_connect())
Treat the freezing of current like a pending signal in tcp_recvmsg
Fix candidate
Description
Xu
2006-07-10 17:47:49 UTC
We'll need to find out where cifs_demultiplex_thread is stuck. Could you please get the machine into that state (ie: unplug the cable) and then run echo t > /proc/sysrq-trigger and then locate the trace for cifsd and include it in this report? Thanks. other debug information that might be useful (the stack trace is most useful) 1) echo 1 > /proc/fs/cifs/cifsFYI 2) try to suspend 3) save the dmesg info (e.g. "dmesg > debugdata") Hello, doing "echo t > /proc/sysrq-trigger" to catch "cifsd" in the act is not that easy because at suspend time, all processes are suspended, too. That's why I used "while true; do echo t > /proc/sysrq-trigger; sleep 2; done". Note, that, at the same time, I tried to unmount the cifs share. I suppose that cifsd is stuck at "inet_stream_connect". This is an excerpt of my /var/log/messages: Jul 11 18:16:50 notebook2 kernel: cifsd S 00000000 0 14561 1 14697 5716 (L-TLB) Jul 11 18:16:50 notebook2 kernel: 82adde84 00000000 00000000 00000000 00000006 bd01f5e5 00000000 00000009 Jul 11 18:16:50 notebook2 kernel: b6852560 d93b9747 00000000 0000a10e b6852670 0200a8c0 0700a8c0 01a880d5 Jul 11 18:16:50 notebook2 kernel: 00000000 ffffff8d 85cb4500 7fffffff 82addeb8 80322006 00000006 bd010000 Jul 11 18:16:50 notebook2 kernel: Call Trace: Jul 11 18:16:50 notebook2 kernel: [<80322006>] schedule_timeout+0x76/0xc0 Jul 11 18:16:50 notebook2 kernel: [<80301a23>] inet_stream_connect+0x153/0x270 Jul 11 18:16:50 notebook2 kernel: [<e0da68b9>] ipv4_connect+0x1a9/0x310 [cifs] Jul 11 18:16:50 notebook2 kernel: [<e0da6cbc>] cifs_reconnect+0x29c/0x500 [cifs] Jul 11 18:16:50 notebook2 kernel: [<e0da99bd>] cifs_demultiplex_thread+0x4fd/0xd16 [cifs] Jul 11 18:16:50 notebook2 kernel: umount.cifs S 00000000 0 16259 16258 (NOTLB) Jul 11 18:16:50 notebook2 kernel: d8ca1da0 9fd55c94 00000156 00000000 d8ca1d6c 8020306d 9ffe4f68 00000006 Jul 11 18:16:50 notebook2 kernel: ae965560 e9842ab3 00000010 000909c8 ae965670 d8ca1da0 80126d8e 001480aa Jul 11 18:16:50 notebook2 kernel: 
00000000 d8ca1db4 05ace962 00000032 d8ca1dd4 80321fdd d8ca1e4c 9fd55c94 Jul 11 18:16:50 notebook2 kernel: Call Trace: Jul 11 18:16:50 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 11 18:16:50 notebook2 kernel: [<e0d9acbf>] smb_init+0xff/0x2b0 [cifs] Jul 11 18:16:50 notebook2 kernel: [<e0d9af44>] CIFSSMBQFSPosixInfo+0x64/0x250 [cifs] Jul 11 18:16:50 notebook2 kernel: [<e0d9a622>] cifs_statfs+0x122/0x140 [cifs] Jul 11 18:16:50 notebook2 kernel: [<80169ca8>] vfs_statfs+0x68/0x80 Jul 11 18:16:50 notebook2 kernel: [<80169dc8>] vfs_statfs64+0x18/0x30 Jul 11 18:16:50 notebook2 kernel: [<8016b19e>] sys_statfs64+0x5e/0x90 Jul 11 18:16:50 notebook2 kernel: [<80103147>] syscall_call+0x7/0xb Jul 11 18:16:51 notebook2 kernel: cifsoplockd S BDCE7A90 0 5953 6 5954 5259 (L-TLB) Jul 11 18:16:51 notebook2 kernel: be169f54 be169f20 80101c51 bdce7a90 b9dc7e40 191e691d 00000013 0000000a Jul 11 18:16:51 notebook2 kernel: bdce7a90 191e691d 00000013 00000fde bdce7ba0 000018d9 00000000 0038edca Jul 11 18:16:51 notebook2 kernel: 00000000 be169f68 05ad7f2f 00000000 be169f88 80321fdd 00011a06 00000000 Jul 11 18:16:51 notebook2 kernel: Call Trace: Jul 11 18:16:51 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 11 18:16:51 notebook2 kernel: [<e0d9aa5d>] cifs_oplock_thread+0x17d/0x265 [cifs] Jul 11 18:16:51 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0 Jul 11 18:16:51 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 11 18:16:51 notebook2 kernel: cifsdnotifyd S BDCE7560 0 5954 6 13234 5953 (L-TLB) Jul 11 18:16:51 notebook2 kernel: bdc39f80 bdc39f4c 80101c51 bdce7560 b9dc7e40 649bd5e3 00000012 0000000a Jul 11 18:16:51 notebook2 kernel: bdce7560 649bd5e3 00000012 000010f3 bdce7670 00001bf2 00000000 007d99d8 Jul 11 18:16:51 notebook2 kernel: 00000000 bdc39f94 05ad15b7 00000000 bdc39fb4 80321fdd 0001a9d3 00000000 Jul 11 18:16:51 notebook2 kernel: Call Trace: Jul 11 18:16:51 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 11 18:16:51 
notebook2 kernel: [<e0d9a291>] cifs_dnotify_thread+0x41/0xd0 [cifs] Jul 11 18:16:51 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0 Jul 11 18:16:51 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 11 18:16:52 notebook2 kernel: cifsd S 00000000 0 14561 1 14697 5716 (L-TLB) Jul 11 18:16:52 notebook2 kernel: 82adde84 00000000 00000000 00000000 00000006 bd01f5e5 00000000 00000009 Jul 11 18:16:52 notebook2 kernel: b6852560 d93b9747 00000000 0000a10e b6852670 0200a8c0 0700a8c0 01a880d5 Jul 11 18:16:52 notebook2 kernel: 00000000 ffffff8d 85cb4500 7fffffff 82addeb8 80322006 00000006 bd010000 Jul 11 18:16:52 notebook2 kernel: Call Trace: Jul 11 18:16:52 notebook2 kernel: [<80322006>] schedule_timeout+0x76/0xc0 Jul 11 18:16:52 notebook2 kernel: [<80301a23>] inet_stream_connect+0x153/0x270 Jul 11 18:16:52 notebook2 kernel: [<e0da68b9>] ipv4_connect+0x1a9/0x310 [cifs] Jul 11 18:16:52 notebook2 kernel: [<e0da6cbc>] cifs_reconnect+0x29c/0x500 [cifs] Jul 11 18:16:52 notebook2 kernel: [<e0da99bd>] cifs_demultiplex_thread+0x4fd/0xd16 [cifs] Jul 11 18:16:52 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 11 18:16:52 notebook2 kernel: umount.cifs R running 0 16259 16258 (NOTLB) Jul 11 18:16:52 notebook2 kernel: prepare_suspe S 8015A6DA 0 16267 5754 16288 16289 (NOTLB) Jul 11 18:16:52 notebook2 kernel: c6e09efc 03879045 c6e09f00 8015a6da b5acb940 0b24cc60 00000013 00000001 Jul 11 18:16:52 notebook2 kernel: b8e14a90 0b24cc60 00000013 00021f19 b8e14ba0 003410e5 00000000 03b21ee2 Jul 11 18:16:52 notebook2 kernel: 00000000 ffffffff df237030 00000001 c6e09f8c 80120144 00000001 080cb0e0 Jul 11 18:16:52 notebook2 kernel: Call Trace: Jul 11 18:16:52 notebook2 kernel: [<80120144>] do_wait+0x754/0xc20 Jul 11 18:16:52 notebook2 kernel: [<80120642>] sys_wait4+0x32/0x40 Jul 11 18:16:52 notebook2 kernel: [<80120677>] sys_waitpid+0x27/0x30 Jul 11 18:16:52 notebook2 kernel: [<80103147>] syscall_call+0x7/0xb Jul 11 18:17:16 notebook2 kernel: PM: Preparing system for 
mem sleep Jul 11 18:17:16 notebook2 kernel: Stopping tasks: ================================================================================================================================================== =================================================================<7>TKIP: replay detected: STA=00:11:d8:8d:57:04 previous TSC 000000000033 received TSC 000000000001 Jul 11 18:17:16 notebook2 kernel: Jul 11 18:17:16 notebook2 kernel: stopping tasks timed out after 20 seconds (1 tasks remaining): Jul 11 18:17:16 notebook2 kernel: cifsd Jul 11 18:17:16 notebook2 kernel: Restarting tasks...<6> Strange, cifsd not stopped Jul 11 18:17:16 notebook2 kernel: SysRq : Show State Jul 11 18:17:16 notebook2 kernel: cifsoplockd S BDCE7A90 0 5953 6 5954 5259 (L-TLB) Jul 11 18:17:16 notebook2 kernel: be169f54 be169f20 80101c51 bdce7a90 b8e00e40 bdce7a90 00000000 0000000a Jul 11 18:17:16 notebook2 kernel: bdce7a90 30dc7097 00000019 00000a37 bdce7ba0 be169f54 80126d8e 0039a149 Jul 11 18:17:16 notebook2 kernel: 00000000 be169f68 05ade565 00000000 be169f88 80321fdd 00017671 00000000 Jul 11 18:17:16 notebook2 kernel: Call Trace: Jul 11 18:17:16 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 11 18:17:16 notebook2 kernel: [<e0d9aa5d>] cifs_oplock_thread+0x17d/0x265 [cifs] Jul 11 18:17:16 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0 Jul 11 18:17:16 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 11 18:17:16 notebook2 kernel: cifsdnotifyd S BDCE7560 0 5954 6 13234 5953 (L-TLB) Jul 11 18:17:16 notebook2 kernel: bdc39f80 bdc39f4c 80101c51 bdce7560 b8e00e40 bdce7560 00000000 0000000a Jul 11 18:17:16 notebook2 kernel: bdce7560 30dc7769 00000019 000006d2 bdce7670 bdc39f80 80126d8e 007e4665 Jul 11 18:17:16 notebook2 kernel: 00000000 bdc39f94 05ad87a5 00000000 bdc39fb4 80321fdd 0001c686 00000000 Jul 11 18:17:16 notebook2 kernel: Call Trace: Jul 11 18:17:16 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 11 18:17:16 notebook2 kernel: [<e0d9a291>] 
cifs_dnotify_thread+0x41/0xd0 [cifs]
Jul 11 18:17:16 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0
Jul 11 18:17:16 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10
Jul 11 18:17:17 notebook2 kernel: cifsd D DF159834 0 14561 1 14697 5716 (L-TLB)
Jul 11 18:17:17 notebook2 kernel: 82addef8 00200246 9ffe7240 df159834 82addec8 00200246 c3f79824 00000009
Jul 11 18:17:17 notebook2 kernel: b6852560 b9fd516c 00000018 0000345d b6852670 82addef8 80126d8e 01b089ab
Jul 11 18:17:17 notebook2 kernel: 00000000 82addf0c 05ad50fb bdce9525 82addf2c 80321fdd 80341440 80341440
Jul 11 18:17:17 notebook2 kernel: Call Trace:
Jul 11 18:17:17 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:17:17 notebook2 kernel: [<80322068>] schedule_timeout_uninterruptible+0x18/0x20
Jul 11 18:17:17 notebook2 kernel: [<80127e3c>] msleep+0x1c/0x30
Jul 11 18:17:17 notebook2 kernel: [<e0da6c8c>] cifs_reconnect+0x26c/0x500 [cifs]
Jul 11 18:17:17 notebook2 kernel: [<e0da99bd>] cifs_demultiplex_thread+0x4fd/0xd16 [cifs]
It is almost certainly stuck in schedule_timeout_uninterruptible at 0x80322068, which puts the process in the D state and prevents suspend from succeeding.
Created attachment 8548 [details]
Experimental patch
Could you please test if this patch helps (or breaks anything)?
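The traces quoted in this report can also be scanned mechanically for tasks in the uninterruptible D state. The following is a minimal Python sketch, not part of the original report. It assumes the SysRq-t layout seen in the excerpts above (syslog prefix, a task header line, then "[<addr>] symbol+0x..." frames) and treats the fifth header field as the PID, which is an assumption about that layout. The names parse_tasks and SAMPLE are hypothetical.

```python
import re

# Syslog prefix as seen in this report, e.g. "Jul 11 18:17:17 notebook2 kernel: ".
PREFIX = re.compile(r"[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ kernel: ")
# Task header, e.g. "cifsd D DF159834 0 14561 1 14697 5716 (L-TLB)".
# Assumption: the field after the "0" column is the PID.
HEADER = re.compile(
    r"(\S+)\s+([RSDTZ])\s+(?:[0-9A-Fa-f]{8}|running)\s+\d+\s+(\d+)\b"
)
# Stack frame, e.g. "[<80322068>] schedule_timeout_uninterruptible+0x18/0x20".
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x")


def parse_tasks(text):
    """Group SysRq-t output into tasks with state, PID and frame symbols.

    Works on one-entry-per-line logs as well as on flattened text, because
    it splits on the syslog prefix rather than on newlines.
    """
    tasks = []
    for entry in PREFIX.split(text):
        entry = entry.strip()
        header = HEADER.match(entry)
        if header:
            tasks.append({
                "name": header.group(1),
                "state": header.group(2),
                "pid": int(header.group(3)),
                "frames": [],
            })
            continue
        frame = FRAME.match(entry)
        if frame and tasks:
            tasks[-1]["frames"].append(frame.group(1))
    return tasks


# Entries copied from the traces above (most register dump lines omitted).
SAMPLE = (
    "Jul 11 18:17:16 notebook2 kernel: cifsdnotifyd S BDCE7560 0 5954 6 13234 5953 (L-TLB) "
    "Jul 11 18:17:16 notebook2 kernel: Call Trace: "
    "Jul 11 18:17:16 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 "
    "Jul 11 18:17:16 notebook2 kernel: [<e0d9a291>] cifs_dnotify_thread+0x41/0xd0 [cifs] "
    "Jul 11 18:17:17 notebook2 kernel: cifsd D DF159834 0 14561 1 14697 5716 (L-TLB) "
    "Jul 11 18:17:17 notebook2 kernel: 82addef8 00200246 9ffe7240 df159834 82addec8 00200246 c3f79824 00000009 "
    "Jul 11 18:17:17 notebook2 kernel: Call Trace: "
    "Jul 11 18:17:17 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 "
    "Jul 11 18:17:17 notebook2 kernel: [<80322068>] schedule_timeout_uninterruptible+0x18/0x20 "
    "Jul 11 18:17:17 notebook2 kernel: [<80127e3c>] msleep+0x1c/0x30 "
    "Jul 11 18:17:17 notebook2 kernel: [<e0da6c8c>] cifs_reconnect+0x26c/0x500 [cifs] "
    "Jul 11 18:17:17 notebook2 kernel: [<e0da99bd>] cifs_demultiplex_thread+0x4fd/0xd16 [cifs]"
)

# Report only the tasks that cannot be frozen while they sleep.
for task in parse_tasks(SAMPLE):
    if task["state"] == "D":
        print(task["name"], task["pid"], "blocked in", " <- ".join(task["frames"]))
```

Splitting on the syslog prefix rather than on newlines is a deliberate choice here, since pasted logs often lose their line breaks.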
Hello, I tried your patch. It does not seem to break anything, but it also does not seem to solve the problem (fully). During my suspend2ram attempt, I ran "while true; do echo t > /proc/sysrq-trigger; sleep 2; done" again and I got, among other things,: Jul 14 17:56:54 notebook2 kernel: cifsd S 00000000 0 10167 1 6284 (L-TLB) Jul 14 17:56:54 notebook2 kernel: 974d1e84 00000000 00000000 00000000 00000006 ee260cd2 00000123 0000000a Jul 14 17:56:54 notebook2 kernel: 83c39560 ee2d26b7 00000123 00006154 83c39670 0002ff8e 00000000 a7e7a8c0 Jul 14 17:56:54 notebook2 kernel: 00000004 ffffff8d b6fff700 7fffffff 974d1eb8 80322006 00000006 8b007188 Jul 14 17:56:54 notebook2 kernel: Call Trace: Jul 14 17:56:54 notebook2 kernel: [<80322006>] schedule_timeout+0x76/0xc0 Jul 14 17:56:54 notebook2 kernel: [<80301a23>] inet_stream_connect+0x153/0x270 Jul 14 17:56:54 notebook2 kernel: [<e0df18f4>] ipv4_connect+0x1e4/0x310 [cifs] Jul 14 17:56:54 notebook2 kernel: [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs] Jul 14 17:56:54 notebook2 kernel: [<e0df486a>] cifs_demultiplex_thread+0x3aa/0xd16 [cifs] Jul 14 17:56:54 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 14 17:56:54 notebook2 kernel: PM: Preparing system for mem sleep Jul 14 17:57:14 notebook2 kernel: Stopping tasks: ================================================================================================================================================== ==================================================================== Jul 14 17:57:14 notebook2 kernel: stopping tasks timed out after 20 seconds (1 tasks remaining): Jul 14 17:57:14 notebook2 kernel: cifsd Jul 14 17:57:14 notebook2 kernel: Restarting tasks...<6> Strange, cifsd not stopped Jul 14 17:57:14 notebook2 kernel: done Jul 14 17:57:17 notebook2 kernel: cifsd S C6A2A434 0 10167 1 6284 (L-TLB) Jul 14 17:57:17 notebook2 kernel: 974d1ef8 00200246 9ffe7240 c6a2a434 974d1ec8 7d5c231d 00000129 00000001 Jul 14 17:57:17 notebook2 kernel: 83c39560 86273850 
00000129 00002eda 83c39670 03b6750a 00000000 52ed2370 Jul 14 17:57:17 notebook2 kernel: 00000009 974d1f0c 0f4e35d1 8610a925 974d1f2c 80321fdd 80341440 80341440 Jul 14 17:57:17 notebook2 kernel: Call Trace: Jul 14 17:57:17 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 14 17:57:17 notebook2 kernel: [<80322088>] schedule_timeout_interruptible+0x18/0x20 Jul 14 17:57:17 notebook2 kernel: [<8012726f>] msleep_interruptible+0x2f/0x40 Jul 14 17:57:17 notebook2 kernel: [<e0df1dfc>] cifs_reconnect+0x26c/0x500 [cifs] Jul 14 17:57:17 notebook2 kernel: [<e0df486a>] cifs_demultiplex_thread+0x3aa/0xd16 [cifs] Jul 14 17:57:17 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 14 17:57:22 notebook2 kernel: Call Trace: Jul 14 17:57:22 notebook2 kernel: [<80322006>] schedule_timeout+0x76/0xc0 Jul 14 17:57:22 notebook2 kernel: [<80301a23>] inet_stream_connect+0x153/0x270 Jul 14 17:57:22 notebook2 kernel: [<e0df18f4>] ipv4_connect+0x1e4/0x310 [cifs] Jul 14 17:57:22 notebook2 kernel: [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs] Jul 14 17:57:22 notebook2 kernel: [<e0df486a>] cifs_demultiplex_thread+0x3aa/0xd16 [cifs] Jul 14 17:57:22 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 14 17:57:20 notebook2 kernel: cifsd S 00000000 0 10167 1 6284 (L-TLB) Jul 14 17:57:20 notebook2 kernel: 974d1e84 00000000 00000000 00000000 00000006 39477c07 0000012a 0000000a Jul 14 17:57:20 notebook2 kernel: 83c39560 39477c07 0000012a 0000a375 83c39670 0000ab2a 00000000 52edc6e5 Jul 14 17:57:20 notebook2 kernel: 00000009 ffffff8d baea9b80 7fffffff 974d1eb8 80322006 00000006 bd010000 Jul 14 17:57:20 notebook2 kernel: Call Trace: Jul 14 17:57:20 notebook2 kernel: [<80322006>] schedule_timeout+0x76/0xc0 Jul 14 17:57:20 notebook2 kernel: [<80301a23>] inet_stream_connect+0x153/0x270 Jul 14 17:57:20 notebook2 kernel: [<e0df18b9>] ipv4_connect+0x1a9/0x310 [cifs] Jul 14 17:57:20 notebook2 kernel: [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs] Jul 14 17:57:20 notebook2 
kernel: [<e0df486a>] cifs_demultiplex_thread+0x3aa/0xd16 [cifs] Created attachment 8554 [details]
Modified patch (with try_to_freeze())
Well, perhaps we should put try_to_freeze() in there too.
Hello Rafael. The problem still happens, but maybe it is a little bit better: two '=' signs are now added to the "Stopping tasks: ===============" list after a short time (<3 seconds). Shortly after the failed suspend2ram, I took these stack traces:
Jul 15 02:14:21 notebook2 kernel: cifsoplockd S 9E321560 0 10164 6 10166 4868 (L-TLB)
Jul 15 02:14:21 notebook2 kernel: ac779f54 ac779f20 80101c51 9e321560 b7c23740 9e321560 00000000 0000000a
Jul 15 02:14:21 notebook2 kernel: 9e321560 347d533c 00000018 00000984 9e321670 ac779f54 80126d8e 002aa607
Jul 15 02:14:21 notebook2 kernel: 00000000 ac779f68 1115ddef 00000000 ac779f88 80321fdd 00015366 00000000
Jul 15 02:14:21 notebook2 kernel: Call Trace:
Jul 15 02:14:21 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 02:14:21 notebook2 kernel: [<e0de5a5d>] cifs_oplock_thread+0x17d/0x265 [cifs]
Jul 15 02:14:21 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0
Jul 15 02:14:21 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 02:14:21 notebook2 kernel: cifsdnotifyd S D47B4A90 0 10166 6 10164 (L-TLB)
Jul 15 02:14:21 notebook2 kernel: ab335f80 ab335f4c 80101c51 d47b4a90 b7c23740 d47b4a90 00000000 0000000a
Jul 15 02:14:21 notebook2 kernel: d47b4a90 347d5b0b 00000018 000007cf d47b4ba0 ab335f80 80126d8e 007061a4
Jul 15 02:14:21 notebook2 kernel: 00000000 ab335f94 1115802f 00000000 ab335fb4 80321fdd 0000803d 00000000
Jul 15 02:14:21 notebook2 kernel: Call Trace:
Jul 15 02:14:21 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 02:14:21 notebook2 kernel: [<e0de5291>] cifs_dnotify_thread+0x41/0xd0 [cifs]
Jul 15 02:14:21 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0
Jul 15 02:14:21 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 02:14:21 notebook2 kernel: cifsd S 00000000 0 10167 1 10806 6284 (L-TLB)
Jul 15 02:14:21 notebook2 kernel: 974d1e84 00000000 00000000 00000000 00000006 024e14d3 00000019 0000000a
Jul 15 02:14:21 notebook2 kernel: 83c39560 024e14d3 00000019 0000a30b 83c39670 0002b01e 00000000 0c06ac52
Jul 15 02:14:21 notebook2 kernel: 00000013 ffffff8d c72c6e00 7fffffff 974d1eb8 80322006 00000006 8b000000
Jul 15 02:14:21 notebook2 kernel: Call Trace:
Jul 15 02:14:21 notebook2 kernel: [<80322006>] schedule_timeout+0x76/0xc0
Jul 15 02:14:21 notebook2 kernel: [<80301a23>] inet_stream_connect+0x153/0x270
Jul 15 02:14:21 notebook2 kernel: [<e0df18b9>] ipv4_connect+0x1a9/0x310 [cifs]
Jul 15 02:14:21 notebook2 kernel: [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 15 02:14:21 notebook2 kernel: [<e0df49bd>] cifs_demultiplex_thread+0x4fd/0xd16 [cifs]
Jul 15 02:14:21 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10
Are you sure that suspend2ram is possible at all during the call to inet_stream_connect()? Or should inet_stream_connect() return quickly, allowing the execution of cifs_reconnect() to continue? Maybe inet_stream_connect() does not return for a long time, and that is the reason suspend2ram fails?
Yes, inet_stream_connect() may be one reason, but there is also the cifsoplockd thread. Perhaps we should place try_to_freeze() in there too. I'll have a look at it in a while.
Created attachment 8556 [details]
Modified patch (with try_to_freeze() in wait_for_connect())
Well, cifs_oplock_thread already contains try_to_freeze() so it most likely is
innocent.
Let's place try_to_freeze() in net/ipv4/af_inet.c#inet_wait_for_connect() and
see what happens.
Hello Rafael! I think there is progress. :-) Some suspend2rams succeeded (after waiting about 10 seconds in "Stopping tasks"), and some did not. A short time before invoking suspend2ram, I started "ls -la /remote/cifs/directory" already in the state where the network is not available, so the "ls -la" was doomed to fail and it failed with message "/bin/ls: /remote/cifs/directory: Host is down" However, getting this message takes some time (because "cifs" tries to communicate with the CIFS server) and it seems that this time is the same time I have to wait for suspend2ram to succeed. So, if I wait long after invoking "ls -la", I can suspend, and if I wait short after invoking "ls -la", I cannot. I made some other stack traces and got: Jul 15 10:28:57 notebook2 kernel: cifsoplockd S D47B4A90 0 21325 6 21326 4868 (L-TLB) Jul 15 10:28:57 notebook2 kernel: ab335f54 ab335f20 80101c51 d47b4a90 df13ce40 d47b4a90 00000000 0000000a Jul 15 10:28:57 notebook2 kernel: d47b4a90 f2a3fec1 00000009 00000970 d47b4ba0 ab335f54 80126d8e 000389f1 Jul 15 10:28:57 notebook2 kernel: 00000000 ab335f68 12da3b11 00000000 ab335f88 80321fdd 00014e38 00000000 Jul 15 10:28:57 notebook2 kernel: Call Trace: Jul 15 10:28:57 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 15 10:28:57 notebook2 kernel: [<e0de5a5d>] cifs_oplock_thread+0x17d/0x265 [cifs] Jul 15 10:28:57 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0 Jul 15 10:28:57 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 15 10:28:57 notebook2 kernel: cifsdnotifyd S 00000001 0 21326 6 21325 (L-TLB) Jul 15 10:28:57 notebook2 kernel: ac779f80 00000000 00000000 00000001 8042e420 5544982b 00000012 0000000a Jul 15 10:28:57 notebook2 kernel: 9e321560 5544982b 00000012 0000144a 9e321670 00001a33 00000000 00045c70 Jul 15 10:28:57 notebook2 kernel: 00000000 ac779f94 12da69f1 00000000 ac779fb4 80321fdd 00000000 00000003 Jul 15 10:28:57 notebook2 kernel: Call Trace: Jul 15 10:28:57 notebook2 kernel: [<80321fdd>] 
schedule_timeout+0x4d/0xc0 Jul 15 10:28:57 notebook2 kernel: [<e0de5291>] cifs_dnotify_thread+0x41/0xd0 [cifs] Jul 15 10:28:57 notebook2 kernel: [<80132055>] kthread+0xd5/0xe0 Jul 15 10:28:57 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 15 10:28:57 notebook2 kernel: cifsd S 8797DD30 0 21329 1 5724 (L-TLB) Jul 15 10:28:57 notebook2 kernel: 8797dd68 8797dd24 80126de4 8797dd30 802b350f e524fabe 00000011 0000000a Jul 15 10:28:57 notebook2 kernel: 966f7030 e524fabe 00000011 00002de8 966f7140 000b10a2 00000000 0123f72a Jul 15 10:28:57 notebook2 kernel: 00000000 8797dd7c 12da435e 8797de34 8797dd9c 80321fdd d40a107c 9ffec140 Jul 15 10:28:57 notebook2 kernel: Call Trace: Jul 15 10:28:57 notebook2 kernel: [<80321fdd>] schedule_timeout+0x4d/0xc0 Jul 15 10:28:57 notebook2 kernel: [<802b3cb4>] sk_wait_data+0x84/0xc0 Jul 15 10:28:57 notebook2 kernel: [<802e742e>] tcp_recvmsg+0x48e/0xb30 Jul 15 10:28:57 notebook2 kernel: [<802b2d03>] sock_common_recvmsg+0x43/0x60 Jul 15 10:28:57 notebook2 kernel: [<802b07be>] sock_recvmsg+0x10e/0x130 Jul 15 10:28:57 notebook2 kernel: [<802b2915>] kernel_recvmsg+0x35/0x50 Jul 15 10:28:57 notebook2 kernel: [<e0df462c>] cifs_demultiplex_thread+0x15c/0xd16 [cifs] Jul 15 10:28:57 notebook2 kernel: [<80101005>] kernel_thread_helper+0x5/0x10 Jul 15 10:28:57 notebook2 kernel: ls D 9034DC24 0 21702 26704 (NOTLB) Jul 15 10:28:57 notebook2 kernel: 9034dc24 ffffffa8 00000058 9034dc24 e0dff810 bc10fd6c 00000011 00000007 Jul 15 10:28:57 notebook2 kernel: da584a90 57b8fa2d 00000012 0000138c da584ba0 038e5b0d 9034dc24 002fd41b Jul 15 10:28:57 notebook2 kernel: 00000000 b21de9c0 12da6080 9034dc64 9034dc84 e0e00798 960a3634 00000ffc Jul 15 10:28:57 notebook2 kernel: Call Trace: Jul 15 10:28:57 notebook2 kernel: [<e0e00798>] SendReceive+0x3e8/0x80c [cifs] Jul 15 10:28:57 notebook2 kernel: [<e0de89d7>] CIFSSMBUnixQPathInfo+0x187/0x250 [cifs] Jul 15 10:28:57 notebook2 kernel: [<e0dfaf14>] cifs_get_inode_info_unix+0x64/0x610 [cifs] Jul 15 10:28:57 
notebook2 kernel: [<e0dfbf5b>] cifs_revalidate+0x11b/0x3c0 [cifs] Jul 15 10:28:57 notebook2 kernel: [<e0df52f5>] cifs_d_revalidate+0x15/0x100 [cifs] Jul 15 10:28:57 notebook2 kernel: [<8017a2dc>] do_lookup+0x3c/0x140 Jul 15 10:28:57 notebook2 kernel: [<8017be40>] __link_path_walk+0x140/0x1060 Jul 15 10:28:57 notebook2 kernel: [<8017cdaa>] link_path_walk+0x4a/0xe0 Jul 15 10:28:57 notebook2 kernel: [<8017d073>] do_path_lookup+0xb3/0x2c0 Jul 15 10:28:57 notebook2 kernel: [<8017dae8>] __user_walk_fd+0x38/0x50 Jul 15 10:28:57 notebook2 kernel: [<80175e3e>] vfs_lstat_fd+0x1e/0x50 Jul 15 10:28:57 notebook2 kernel: [<80175eb1>] vfs_lstat+0x11/0x20 Jul 15 10:28:57 notebook2 kernel: [<80175ed4>] sys_lstat64+0x14/0x30 Jul 15 10:28:57 notebook2 kernel: [<80103147>] syscall_call+0x7/0xb So maybe sk_wait_data() also needs a try_to_freeze()? Created attachment 8560 [details]
Treat the freezing of current like a pending signal in tcp_recvmsg
I'd like to try a different approach first.
Let's treat the freezing of the current task like a pending signal in
net/ipv4/tcp.c#tcp_recvmsg and see what happens.
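The idea of treating a freeze request like a pending signal can be illustrated with a userspace toy model. The sketch below is neither kernel code nor the attached patch. It is a hypothetical Python analogy in which threading.Event stands in for the freeze request, Event.wait() for an interruptible sleep, and time.sleep() for an uninterruptible one. The 20-second freezer timeout from the logs is scaled down, and all names and timing constants (worker, try_freeze, FREEZE_TIMEOUT, BLOCK_TIME) are assumptions made for illustration.

```python
import threading
import time

FREEZE_TIMEOUT = 0.1  # scaled-down stand-in for the 20 s "stopping tasks" timeout
BLOCK_TIME = 0.6      # stand-in for a long wait on an unreachable server


def worker(freeze_request, frozen, interruptible):
    """Block once, then acknowledge a pending freeze request."""
    if interruptible:
        # Returns early as soon as the freeze request is raised,
        # much like a sleep that a pending signal can cut short.
        freeze_request.wait(timeout=BLOCK_TIME)
    else:
        # Cannot be woken early; the request is only noticed afterwards.
        time.sleep(BLOCK_TIME)
    if freeze_request.is_set():
        # Analogous to reaching a try_to_freeze() call.
        frozen.set()


def try_freeze(interruptible):
    """Return True if the worker acknowledged the freeze before the timeout."""
    freeze_request = threading.Event()
    frozen = threading.Event()
    thread = threading.Thread(
        target=worker, args=(freeze_request, frozen, interruptible)
    )
    thread.start()
    freeze_request.set()
    in_time = frozen.wait(timeout=FREEZE_TIMEOUT)
    thread.join()
    return in_time


print("interruptible wait froze in time:  ", try_freeze(True))
print("uninterruptible wait froze in time:", try_freeze(False))
```

In this model the freeze only succeeds in time when the blocking wait itself can be cut short by the request; adding a check after an uninterruptible wait is not enough if that wait outlasts the timeout.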
Hello Rafael, I applied your last two patches. However, I do not think that the last patch brings an improvement. If I freshly mount a CIFS share, disconnect the network, try to access a directory of that share using "ls -la", and then try to suspend2ram, this suspend2ram attempt fails (after 20 seconds of waiting). Then "ls -la" returns with "Host is down". However, the following suspend2ram attempts succeed immediately. (I assume that this is because "cifs" is already in a "host is down" state and cifs_reconnect() allows suspend2ram to proceed immediately.)
Evidently, on the first attempt it is unable to freeze processes (20s is the timeout for that).
Well, it looks like I have to set up a test box with a cifs share.
Created attachment 8598 [details]
Fix candidate
This patch seems to be sufficient to fix the issue on my test box.
If 'ls' is run on the inaccessible share right before suspend, it takes more
time to suspend, and I have to wait until the CIFS timeout expires to get the
command prompt back after resume, but in the end it works.
Could you please verify?
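One way to verify a candidate fix is to check the kernel log for the freezer failure message after each suspend attempt. The sketch below is a hypothetical Python helper, not something used in this report. It assumes the message format quoted earlier ("stopping tasks timed out after 20 seconds (1 tasks remaining):" followed by one task name per log entry and then "Restarting tasks..."). The names freeze_failures and SAMPLE are assumptions for illustration.

```python
import re

# Syslog prefix as seen in this report, e.g. "Jul 14 17:57:14 notebook2 kernel: ".
PREFIX = re.compile(r"[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ kernel: ")
# Freezer failure message as quoted in this report.
TIMED_OUT = re.compile(
    r"stopping tasks timed out after (\d+) seconds \((\d+) tasks remaining\):"
)


def freeze_failures(text):
    """Return one (seconds, [task names]) tuple per failed freeze attempt."""
    failures = []
    collecting = False
    for entry in PREFIX.split(text):
        entry = entry.strip()
        match = TIMED_OUT.search(entry)
        if match:
            failures.append((int(match.group(1)), []))
            collecting = True
        elif collecting and entry.startswith("Restarting tasks"):
            # The list of unfrozen tasks ends here.
            collecting = False
        elif collecting and entry:
            failures[-1][1].append(entry)
    return failures


# Entries copied from the Jul 14 log above (the "Stopping tasks: ===" entry is omitted).
SAMPLE = (
    "Jul 14 17:56:54 notebook2 kernel: PM: Preparing system for mem sleep "
    "Jul 14 17:57:14 notebook2 kernel: stopping tasks timed out after 20 seconds (1 tasks remaining): "
    "Jul 14 17:57:14 notebook2 kernel: cifsd "
    "Jul 14 17:57:14 notebook2 kernel: Restarting tasks...<6> Strange, cifsd not stopped "
    "Jul 14 17:57:14 notebook2 kernel: done"
)

for seconds, names in freeze_failures(SAMPLE):
    print("freeze failed after", seconds, "s; still running:", ", ".join(names))
```

An empty result after a suspend attempt would suggest that all tasks froze within the timeout, although it says nothing about whether resume worked.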
I got suspend to ram working while at OLS (strange problem with the Gnome button not working, but s2ram worked at the command line as root). The patch you suggest seems quite reasonable. Will try it tomorrow.
Added fix to reconnect path in cifs to cifs git tree.