Bug 6811

Summary: cifs: suspend2ram fails if CIFS server is unreachable at suspend2ram time
Product: File System
Reporter: Xu (development--bugzilla.kernel.org)
Component: Other
Assignee: Steve French (sfrench)
Status: RESOLVED CODE_FIX
Severity: normal
CC: akpm, bunk, pavel, rjwysocki
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.18-rc1
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
  Experimental patch
  Modified patch (with try_to_freeze())
  Modified patch (with try_to_freeze() in wait_for_connect())
  Treat the freezing of current like a pending signal in tcp_recvmsg
  Fix candidate

Description Xu 2006-07-10 17:47:49 UTC
When trying suspend2ram with a cifs share mounted while the CIFS server is not
reachable anymore (e.g.: network cable is already manually disconnected or CIFS
server has gone offline in the meantime), suspend2ram fails. That is: the system
does not get into ACPI S3 state, rather it "resumes" prematurely.

The correct behaviour is to go into suspend2ram mode as fast as possible (so
that the user can put the laptop into a bag or backpack as soon as possible,
without risking hard disk damage from disks that are still spinning while the
laptop is being moved).
Comment 1 Andrew Morton 2006-07-10 21:31:56 UTC
We'll need to find out where cifs_demultiplex_thread is stuck.

Could you please get the machine into that state (ie: unplug the cable) and then run

echo t > /proc/sysrq-trigger

and then locate the trace for cifsd and include it in this report?

Thanks.
Comment 2 Steve French 2006-07-11 09:13:49 UTC
other debug information that might be useful (the stack trace is most useful)
 1) echo 1 > /proc/fs/cifs/cifsFYI
 2) try to suspend
 3) save the dmesg info (e.g. "dmesg > debugdata")
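
A minimal sketch of steps 1 and 3 as shell helpers. The function names and the
optional path arguments are assumptions added for illustration; only the
cifsFYI path and the dmesg redirection come from the steps above. Step 2 (the
suspend attempt itself) is left to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper names; only the default paths come from the steps above.

# Step 1: turn on verbose CIFS logging (needs root on a real system).
# The target path is a parameter so the helper can be tried on a scratch file.
enable_cifs_debug() {
    fyi_file="${1:-/proc/fs/cifs/cifsFYI}"
    echo 1 > "$fyi_file"
}

# Step 3: save the kernel log after the failed suspend attempt.
save_debug_data() {
    out_file="${1:-debugdata}"
    dmesg > "$out_file"
}
```

Typical use would be enable_cifs_debug, then the suspend attempt, then
save_debug_data.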
Comment 3 Xu 2006-07-11 09:24:50 UTC
Hello,

doing "echo t > /proc/sysrq-trigger" to catch "cifsd" in the act is not that
easy because at suspend time, all processes are suspended, too. 

That's why I used "while true; do echo t > /proc/sysrq-trigger; sleep 2; done".
Note that, at the same time, I tried to unmount the cifs share.

I suppose that cifsd is stuck at "inet_stream_connect".
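
Picking one task out of the repeated SysRq dumps can be sketched as follows.
This is only an illustration: the function name extract_task_trace is made up,
and the matching rules assume the syslog layout seen in this report (a task
header with a single space after "kernel:", then indented stack words and
trace entries).

```shell
#!/bin/sh
# Print every "Show State" block for one task from a syslog-style file.
# Usage: extract_task_trace cifsd /var/log/messages
extract_task_trace() {
    task="$1"
    log_file="$2"
    awk -v task="$task" '
        # A task header looks like "kernel: cifsd   S 00000000 ...".
        $0 ~ ("kernel: " task " +[A-Z] ") { printing = 1; print; next }
        # Stack words and trace entries are indented; "Call Trace:" is not.
        printing && ($0 ~ /kernel:  / || $0 ~ /kernel: Call Trace:/) { print; next }
        # Any other line ends the current block.
        { printing = 0 }
    ' "$log_file"
}
```

The task name is used as a regular expression, so a name such as umount.cifs
matches slightly more loosely than a literal comparison would.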

This is an excerpt of my /var/log/messages:


Jul 11 18:16:50 notebook2 kernel: cifsd         S 00000000     0 14561      1  
      14697  5716 (L-TLB)
Jul 11 18:16:50 notebook2 kernel:        82adde84 00000000 00000000 00000000
00000006 bd01f5e5 00000000 00000009
Jul 11 18:16:50 notebook2 kernel:        b6852560 d93b9747 00000000 0000a10e
b6852670 0200a8c0 0700a8c0 01a880d5
Jul 11 18:16:50 notebook2 kernel:        00000000 ffffff8d 85cb4500 7fffffff
82addeb8 80322006 00000006 bd010000
Jul 11 18:16:50 notebook2 kernel: Call Trace:
Jul 11 18:16:50 notebook2 kernel:  [<80322006>] schedule_timeout+0x76/0xc0
Jul 11 18:16:50 notebook2 kernel:  [<80301a23>] inet_stream_connect+0x153/0x270
Jul 11 18:16:50 notebook2 kernel:  [<e0da68b9>] ipv4_connect+0x1a9/0x310 [cifs]
Jul 11 18:16:50 notebook2 kernel:  [<e0da6cbc>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 11 18:16:50 notebook2 kernel:  [<e0da99bd>]
cifs_demultiplex_thread+0x4fd/0xd16 [cifs]

Jul 11 18:16:50 notebook2 kernel: umount.cifs   S 00000000     0 16259  16258  
                  (NOTLB)
Jul 11 18:16:50 notebook2 kernel:        d8ca1da0 9fd55c94 00000156 00000000
d8ca1d6c 8020306d 9ffe4f68 00000006
Jul 11 18:16:50 notebook2 kernel:        ae965560 e9842ab3 00000010 000909c8
ae965670 d8ca1da0 80126d8e 001480aa
Jul 11 18:16:50 notebook2 kernel:        00000000 d8ca1db4 05ace962 00000032
d8ca1dd4 80321fdd d8ca1e4c 9fd55c94
Jul 11 18:16:50 notebook2 kernel: Call Trace:
Jul 11 18:16:50 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:16:50 notebook2 kernel:  [<e0d9acbf>] smb_init+0xff/0x2b0 [cifs]
Jul 11 18:16:50 notebook2 kernel:  [<e0d9af44>] CIFSSMBQFSPosixInfo+0x64/0x250
[cifs]
Jul 11 18:16:50 notebook2 kernel:  [<e0d9a622>] cifs_statfs+0x122/0x140 [cifs]
Jul 11 18:16:50 notebook2 kernel:  [<80169ca8>] vfs_statfs+0x68/0x80
Jul 11 18:16:50 notebook2 kernel:  [<80169dc8>] vfs_statfs64+0x18/0x30
Jul 11 18:16:50 notebook2 kernel:  [<8016b19e>] sys_statfs64+0x5e/0x90
Jul 11 18:16:50 notebook2 kernel:  [<80103147>] syscall_call+0x7/0xb

Jul 11 18:16:51 notebook2 kernel: cifsoplockd   S BDCE7A90     0  5953      6  
       5954  5259 (L-TLB)
Jul 11 18:16:51 notebook2 kernel:        be169f54 be169f20 80101c51 bdce7a90
b9dc7e40 191e691d 00000013 0000000a
Jul 11 18:16:51 notebook2 kernel:        bdce7a90 191e691d 00000013 00000fde
bdce7ba0 000018d9 00000000 0038edca
Jul 11 18:16:51 notebook2 kernel:        00000000 be169f68 05ad7f2f 00000000
be169f88 80321fdd 00011a06 00000000
Jul 11 18:16:51 notebook2 kernel: Call Trace:
Jul 11 18:16:51 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:16:51 notebook2 kernel:  [<e0d9aa5d>] cifs_oplock_thread+0x17d/0x265
[cifs]
Jul 11 18:16:51 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 11 18:16:51 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 11 18:16:51 notebook2 kernel: cifsdnotifyd  S BDCE7560     0  5954      6  
      13234  5953 (L-TLB)
Jul 11 18:16:51 notebook2 kernel:        bdc39f80 bdc39f4c 80101c51 bdce7560
b9dc7e40 649bd5e3 00000012 0000000a
Jul 11 18:16:51 notebook2 kernel:        bdce7560 649bd5e3 00000012 000010f3
bdce7670 00001bf2 00000000 007d99d8
Jul 11 18:16:51 notebook2 kernel:        00000000 bdc39f94 05ad15b7 00000000
bdc39fb4 80321fdd 0001a9d3 00000000
Jul 11 18:16:51 notebook2 kernel: Call Trace:
Jul 11 18:16:51 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:16:51 notebook2 kernel:  [<e0d9a291>] cifs_dnotify_thread+0x41/0xd0 [cifs]
Jul 11 18:16:51 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 11 18:16:51 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10

Jul 11 18:16:52 notebook2 kernel: cifsd         S 00000000     0 14561      1  
      14697  5716 (L-TLB)
Jul 11 18:16:52 notebook2 kernel:        82adde84 00000000 00000000 00000000
00000006 bd01f5e5 00000000 00000009
Jul 11 18:16:52 notebook2 kernel:        b6852560 d93b9747 00000000 0000a10e
b6852670 0200a8c0 0700a8c0 01a880d5
Jul 11 18:16:52 notebook2 kernel:        00000000 ffffff8d 85cb4500 7fffffff
82addeb8 80322006 00000006 bd010000
Jul 11 18:16:52 notebook2 kernel: Call Trace:
Jul 11 18:16:52 notebook2 kernel:  [<80322006>] schedule_timeout+0x76/0xc0
Jul 11 18:16:52 notebook2 kernel:  [<80301a23>] inet_stream_connect+0x153/0x270
Jul 11 18:16:52 notebook2 kernel:  [<e0da68b9>] ipv4_connect+0x1a9/0x310 [cifs]
Jul 11 18:16:52 notebook2 kernel:  [<e0da6cbc>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 11 18:16:52 notebook2 kernel:  [<e0da99bd>]
cifs_demultiplex_thread+0x4fd/0xd16 [cifs]
Jul 11 18:16:52 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10

Jul 11 18:16:52 notebook2 kernel: umount.cifs   R running     0 16259  16258   
                 (NOTLB)
Jul 11 18:16:52 notebook2 kernel: prepare_suspe S 8015A6DA     0 16267   5754
16288   16289       (NOTLB)
Jul 11 18:16:52 notebook2 kernel:        c6e09efc 03879045 c6e09f00 8015a6da
b5acb940 0b24cc60 00000013 00000001
Jul 11 18:16:52 notebook2 kernel:        b8e14a90 0b24cc60 00000013 00021f19
b8e14ba0 003410e5 00000000 03b21ee2
Jul 11 18:16:52 notebook2 kernel:        00000000 ffffffff df237030 00000001
c6e09f8c 80120144 00000001 080cb0e0
Jul 11 18:16:52 notebook2 kernel: Call Trace:
Jul 11 18:16:52 notebook2 kernel:  [<80120144>] do_wait+0x754/0xc20
Jul 11 18:16:52 notebook2 kernel:  [<80120642>] sys_wait4+0x32/0x40
Jul 11 18:16:52 notebook2 kernel:  [<80120677>] sys_waitpid+0x27/0x30
Jul 11 18:16:52 notebook2 kernel:  [<80103147>] syscall_call+0x7/0xb

Jul 11 18:17:16 notebook2 kernel: PM: Preparing system for mem sleep
Jul 11 18:17:16 notebook2 kernel: Stopping tasks:
==================================================================================================================================================
=================================================================<7>TKIP: replay
detected: STA=00:11:d8:8d:57:04 previous TSC 000000000033 received TSC 000000000001
Jul 11 18:17:16 notebook2 kernel:
Jul 11 18:17:16 notebook2 kernel:  stopping tasks timed out after 20 seconds (1
tasks remaining):
Jul 11 18:17:16 notebook2 kernel:   cifsd
Jul 11 18:17:16 notebook2 kernel: Restarting tasks...<6> Strange, cifsd not stopped
Jul 11 18:17:16 notebook2 kernel: SysRq : Show State

Jul 11 18:17:16 notebook2 kernel: cifsoplockd   S BDCE7A90     0  5953      6  
       5954  5259 (L-TLB)
Jul 11 18:17:16 notebook2 kernel:        be169f54 be169f20 80101c51 bdce7a90
b8e00e40 bdce7a90 00000000 0000000a
Jul 11 18:17:16 notebook2 kernel:        bdce7a90 30dc7097 00000019 00000a37
bdce7ba0 be169f54 80126d8e 0039a149
Jul 11 18:17:16 notebook2 kernel:        00000000 be169f68 05ade565 00000000
be169f88 80321fdd 00017671 00000000
Jul 11 18:17:16 notebook2 kernel: Call Trace:
Jul 11 18:17:16 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:17:16 notebook2 kernel:  [<e0d9aa5d>] cifs_oplock_thread+0x17d/0x265
[cifs]
Jul 11 18:17:16 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 11 18:17:16 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 11 18:17:16 notebook2 kernel: cifsdnotifyd  S BDCE7560     0  5954      6  
      13234  5953 (L-TLB)
Jul 11 18:17:16 notebook2 kernel:        bdc39f80 bdc39f4c 80101c51 bdce7560
b8e00e40 bdce7560 00000000 0000000a
Jul 11 18:17:16 notebook2 kernel:        bdce7560 30dc7769 00000019 000006d2
bdce7670 bdc39f80 80126d8e 007e4665
Jul 11 18:17:16 notebook2 kernel:        00000000 bdc39f94 05ad87a5 00000000
bdc39fb4 80321fdd 0001c686 00000000
Jul 11 18:17:16 notebook2 kernel: Call Trace:
Jul 11 18:17:16 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:17:16 notebook2 kernel:  [<e0d9a291>] cifs_dnotify_thread+0x41/0xd0 [cifs]
Jul 11 18:17:16 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 11 18:17:16 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10

Jul 11 18:17:17 notebook2 kernel: cifsd         D DF159834     0 14561      1  
      14697  5716 (L-TLB)
Jul 11 18:17:17 notebook2 kernel:        82addef8 00200246 9ffe7240 df159834
82addec8 00200246 c3f79824 00000009
Jul 11 18:17:17 notebook2 kernel:        b6852560 b9fd516c 00000018 0000345d
b6852670 82addef8 80126d8e 01b089ab
Jul 11 18:17:17 notebook2 kernel:        00000000 82addf0c 05ad50fb bdce9525
82addf2c 80321fdd 80341440 80341440
Jul 11 18:17:17 notebook2 kernel: Call Trace:
Jul 11 18:17:17 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 11 18:17:17 notebook2 kernel:  [<80322068>]
schedule_timeout_uninterruptible+0x18/0x20
Jul 11 18:17:17 notebook2 kernel:  [<80127e3c>] msleep+0x1c/0x30
Jul 11 18:17:17 notebook2 kernel:  [<e0da6c8c>] cifs_reconnect+0x26c/0x500 [cifs]
Jul 11 18:17:17 notebook2 kernel:  [<e0da99bd>]
cifs_demultiplex_thread+0x4fd/0xd16 [cifs]
Comment 4 Rafael J. Wysocki 2006-07-14 04:00:32 UTC
It is almost certainly stuck in schedule_timeout_uninterruptible at 0x80322068,
which puts the process in the D state and prevents suspend from succeeding.
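
To check for tasks in the D state without going through SysRq, something like
the following may help. It is a sketch only: the helper name filter_d_state is
made up, and it assumes a ps that accepts "-eo pid,stat,comm" (as procps does),
so that the state is in the second column.

```shell
#!/bin/sh
# Keep the header row plus every row whose STAT column starts with "D"
# (uninterruptible sleep). Reads ps-style output on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live system:
#   ps -eo pid,stat,comm | filter_d_state
```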
Comment 5 Rafael J. Wysocki 2006-07-14 05:52:20 UTC
Created attachment 8548 [details]
Experimental patch

Could you please test if this patch helps (or breaks anything)?
Comment 6 Xu 2006-07-14 09:02:54 UTC
Hello,

I tried your patch. It does not seem to break anything, but it also does not
seem to solve the problem (fully).

During my suspend2ram attempt, I ran "while true; do echo t >
/proc/sysrq-trigger; sleep 2; done" again and got, among other things:

Jul 14 17:56:54 notebook2 kernel: cifsd         S 00000000     0 10167      1  
             6284 (L-TLB)
Jul 14 17:56:54 notebook2 kernel:        974d1e84 00000000 00000000 00000000
00000006 ee260cd2 00000123 0000000a
Jul 14 17:56:54 notebook2 kernel:        83c39560 ee2d26b7 00000123 00006154
83c39670 0002ff8e 00000000 a7e7a8c0
Jul 14 17:56:54 notebook2 kernel:        00000004 ffffff8d b6fff700 7fffffff
974d1eb8 80322006 00000006 8b007188
Jul 14 17:56:54 notebook2 kernel: Call Trace:
Jul 14 17:56:54 notebook2 kernel:  [<80322006>] schedule_timeout+0x76/0xc0
Jul 14 17:56:54 notebook2 kernel:  [<80301a23>] inet_stream_connect+0x153/0x270
Jul 14 17:56:54 notebook2 kernel:  [<e0df18f4>] ipv4_connect+0x1e4/0x310 [cifs]
Jul 14 17:56:54 notebook2 kernel:  [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 14 17:56:54 notebook2 kernel:  [<e0df486a>]
cifs_demultiplex_thread+0x3aa/0xd16 [cifs]
Jul 14 17:56:54 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10

Jul 14 17:56:54 notebook2 kernel: PM: Preparing system for mem sleep
Jul 14 17:57:14 notebook2 kernel: Stopping tasks:
==================================================================================================================================================
====================================================================
Jul 14 17:57:14 notebook2 kernel:  stopping tasks timed out after 20 seconds (1
tasks remaining):
Jul 14 17:57:14 notebook2 kernel:   cifsd
Jul 14 17:57:14 notebook2 kernel: Restarting tasks...<6> Strange, cifsd not stopped
Jul 14 17:57:14 notebook2 kernel:  done

Jul 14 17:57:17 notebook2 kernel: cifsd         S C6A2A434     0 10167      1  
             6284 (L-TLB)
Jul 14 17:57:17 notebook2 kernel:        974d1ef8 00200246 9ffe7240 c6a2a434
974d1ec8 7d5c231d 00000129 00000001
Jul 14 17:57:17 notebook2 kernel:        83c39560 86273850 00000129 00002eda
83c39670 03b6750a 00000000 52ed2370
Jul 14 17:57:17 notebook2 kernel:        00000009 974d1f0c 0f4e35d1 8610a925
974d1f2c 80321fdd 80341440 80341440
Jul 14 17:57:17 notebook2 kernel: Call Trace:
Jul 14 17:57:17 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 14 17:57:17 notebook2 kernel:  [<80322088>]
schedule_timeout_interruptible+0x18/0x20
Jul 14 17:57:17 notebook2 kernel:  [<8012726f>] msleep_interruptible+0x2f/0x40
Jul 14 17:57:17 notebook2 kernel:  [<e0df1dfc>] cifs_reconnect+0x26c/0x500 [cifs]
Jul 14 17:57:17 notebook2 kernel:  [<e0df486a>]
cifs_demultiplex_thread+0x3aa/0xd16 [cifs]
Jul 14 17:57:17 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10


Jul 14 17:57:22 notebook2 kernel: Call Trace:
Jul 14 17:57:22 notebook2 kernel:  [<80322006>] schedule_timeout+0x76/0xc0
Jul 14 17:57:22 notebook2 kernel:  [<80301a23>] inet_stream_connect+0x153/0x270
Jul 14 17:57:22 notebook2 kernel:  [<e0df18f4>] ipv4_connect+0x1e4/0x310 [cifs]
Jul 14 17:57:22 notebook2 kernel:  [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 14 17:57:22 notebook2 kernel:  [<e0df486a>]
cifs_demultiplex_thread+0x3aa/0xd16 [cifs]
Jul 14 17:57:22 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10

Jul 14 17:57:20 notebook2 kernel: cifsd         S 00000000     0 10167      1  
             6284 (L-TLB)
Jul 14 17:57:20 notebook2 kernel:        974d1e84 00000000 00000000 00000000
00000006 39477c07 0000012a 0000000a
Jul 14 17:57:20 notebook2 kernel:        83c39560 39477c07 0000012a 0000a375
83c39670 0000ab2a 00000000 52edc6e5
Jul 14 17:57:20 notebook2 kernel:        00000009 ffffff8d baea9b80 7fffffff
974d1eb8 80322006 00000006 bd010000
Jul 14 17:57:20 notebook2 kernel: Call Trace:
Jul 14 17:57:20 notebook2 kernel:  [<80322006>] schedule_timeout+0x76/0xc0
Jul 14 17:57:20 notebook2 kernel:  [<80301a23>] inet_stream_connect+0x153/0x270
Jul 14 17:57:20 notebook2 kernel:  [<e0df18b9>] ipv4_connect+0x1a9/0x310 [cifs]
Jul 14 17:57:20 notebook2 kernel:  [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 14 17:57:20 notebook2 kernel:  [<e0df486a>]
cifs_demultiplex_thread+0x3aa/0xd16 [cifs]
Comment 7 Rafael J. Wysocki 2006-07-14 13:46:14 UTC
Created attachment 8554 [details]
Modified patch (with try_to_freeze())

Well, perhaps we should put try_to_freeze() in there too.
Comment 8 Xu 2006-07-14 17:18:09 UTC
Hello Rafael. The problem still happens (but maybe it is a little bit better, as
two '=' signs in the "Stopping tasks: ===============" list are added after a
short time (<3 seconds)). Shortly after the failed suspend2ram, I took these
stack traces:


Jul 15 02:14:21 notebook2 kernel: cifsoplockd   S 9E321560     0 10164      6  
      10166  4868 (L-TLB)
Jul 15 02:14:21 notebook2 kernel:        ac779f54 ac779f20 80101c51 9e321560
b7c23740 9e321560 00000000 0000000a
Jul 15 02:14:21 notebook2 kernel:        9e321560 347d533c 00000018 00000984
9e321670 ac779f54 80126d8e 002aa607
Jul 15 02:14:21 notebook2 kernel:        00000000 ac779f68 1115ddef 00000000
ac779f88 80321fdd 00015366 00000000
Jul 15 02:14:21 notebook2 kernel: Call Trace:
Jul 15 02:14:21 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 02:14:21 notebook2 kernel:  [<e0de5a5d>] cifs_oplock_thread+0x17d/0x265
[cifs]
Jul 15 02:14:21 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 15 02:14:21 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 02:14:21 notebook2 kernel: cifsdnotifyd  S D47B4A90     0 10166      6  
            10164 (L-TLB)
Jul 15 02:14:21 notebook2 kernel:        ab335f80 ab335f4c 80101c51 d47b4a90
b7c23740 d47b4a90 00000000 0000000a
Jul 15 02:14:21 notebook2 kernel:        d47b4a90 347d5b0b 00000018 000007cf
d47b4ba0 ab335f80 80126d8e 007061a4
Jul 15 02:14:21 notebook2 kernel:        00000000 ab335f94 1115802f 00000000
ab335fb4 80321fdd 0000803d 00000000
Jul 15 02:14:21 notebook2 kernel: Call Trace:
Jul 15 02:14:21 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 02:14:21 notebook2 kernel:  [<e0de5291>] cifs_dnotify_thread+0x41/0xd0 [cifs]
Jul 15 02:14:21 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 15 02:14:21 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 02:14:21 notebook2 kernel: cifsd         S 00000000     0 10167      1  
      10806  6284 (L-TLB)
Jul 15 02:14:21 notebook2 kernel:        974d1e84 00000000 00000000 00000000
00000006 024e14d3 00000019 0000000a
Jul 15 02:14:21 notebook2 kernel:        83c39560 024e14d3 00000019 0000a30b
83c39670 0002b01e 00000000 0c06ac52
Jul 15 02:14:21 notebook2 kernel:        00000013 ffffff8d c72c6e00 7fffffff
974d1eb8 80322006 00000006 8b000000
Jul 15 02:14:21 notebook2 kernel: Call Trace:
Jul 15 02:14:21 notebook2 kernel:  [<80322006>] schedule_timeout+0x76/0xc0
Jul 15 02:14:21 notebook2 kernel:  [<80301a23>] inet_stream_connect+0x153/0x270
Jul 15 02:14:21 notebook2 kernel:  [<e0df18b9>] ipv4_connect+0x1a9/0x310 [cifs]
Jul 15 02:14:21 notebook2 kernel:  [<e0df1e2c>] cifs_reconnect+0x29c/0x500 [cifs]
Jul 15 02:14:21 notebook2 kernel:  [<e0df49bd>]
cifs_demultiplex_thread+0x4fd/0xd16 [cifs]
Jul 15 02:14:21 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10


Are you sure that any suspend2ram is possible during the call to
inet_stream_connect()? Or should inet_stream_connect() return quickly, allowing
the execution of cifs_reconnect() to continue?

Maybe inet_stream_connect() does not return for a long time, and that is the
reason for the suspend2ram to fail?
Comment 9 Rafael J. Wysocki 2006-07-15 00:25:57 UTC
Yes, inet_stream_connect() may be one reason, but there also is the cifsoplockd 
thread.  Perhaps we should place try_to_freeze() in there too.  I'll have a 
look at it in a while.
Comment 10 Rafael J. Wysocki 2006-07-15 01:11:08 UTC
Created attachment 8556 [details]
Modified patch (with try_to_freeze() in wait_for_connect())

Well, cifs_oplock_thread already contains try_to_freeze() so it most likely is
innocent.

Let's place try_to_freeze() in net/ipv4/af_inet.c#inet_wait_for_connect() and
see what happens.
Comment 11 Xu 2006-07-15 01:50:44 UTC
Hello Rafael!

I think there is progress. :-) Some suspend2rams succeeded (after waiting about
10 seconds in "Stopping tasks"), and some did not.

A short time before invoking suspend2ram, I started "ls -la
/remote/cifs/directory" already in the state where the network is not available,
so the "ls -la" was doomed to fail and it failed with message

"/bin/ls: /remote/cifs/directory: Host is down"

However, getting this message takes some time (because "cifs" tries to
communicate with the CIFS server), and this seems to be the same time I have to
wait for suspend2ram to succeed. So if I wait a long time after invoking "ls
-la", I can suspend, and if I wait only a short time, I cannot.


I made some other stack traces and got:


Jul 15 10:28:57 notebook2 kernel: cifsoplockd   S D47B4A90     0 21325      6  
      21326  4868 (L-TLB)
Jul 15 10:28:57 notebook2 kernel:        ab335f54 ab335f20 80101c51 d47b4a90
df13ce40 d47b4a90 00000000 0000000a
Jul 15 10:28:57 notebook2 kernel:        d47b4a90 f2a3fec1 00000009 00000970
d47b4ba0 ab335f54 80126d8e 000389f1
Jul 15 10:28:57 notebook2 kernel:        00000000 ab335f68 12da3b11 00000000
ab335f88 80321fdd 00014e38 00000000
Jul 15 10:28:57 notebook2 kernel: Call Trace:
Jul 15 10:28:57 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 10:28:57 notebook2 kernel:  [<e0de5a5d>] cifs_oplock_thread+0x17d/0x265
[cifs]
Jul 15 10:28:57 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 15 10:28:57 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 10:28:57 notebook2 kernel: cifsdnotifyd  S 00000001     0 21326      6  
            21325 (L-TLB)
Jul 15 10:28:57 notebook2 kernel:        ac779f80 00000000 00000000 00000001
8042e420 5544982b 00000012 0000000a
Jul 15 10:28:57 notebook2 kernel:        9e321560 5544982b 00000012 0000144a
9e321670 00001a33 00000000 00045c70
Jul 15 10:28:57 notebook2 kernel:        00000000 ac779f94 12da69f1 00000000
ac779fb4 80321fdd 00000000 00000003
Jul 15 10:28:57 notebook2 kernel: Call Trace:
Jul 15 10:28:57 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 10:28:57 notebook2 kernel:  [<e0de5291>] cifs_dnotify_thread+0x41/0xd0 [cifs]
Jul 15 10:28:57 notebook2 kernel:  [<80132055>] kthread+0xd5/0xe0
Jul 15 10:28:57 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 10:28:57 notebook2 kernel: cifsd         S 8797DD30     0 21329      1  
             5724 (L-TLB)
Jul 15 10:28:57 notebook2 kernel:        8797dd68 8797dd24 80126de4 8797dd30
802b350f e524fabe 00000011 0000000a
Jul 15 10:28:57 notebook2 kernel:        966f7030 e524fabe 00000011 00002de8
966f7140 000b10a2 00000000 0123f72a
Jul 15 10:28:57 notebook2 kernel:        00000000 8797dd7c 12da435e 8797de34
8797dd9c 80321fdd d40a107c 9ffec140
Jul 15 10:28:57 notebook2 kernel: Call Trace:
Jul 15 10:28:57 notebook2 kernel:  [<80321fdd>] schedule_timeout+0x4d/0xc0
Jul 15 10:28:57 notebook2 kernel:  [<802b3cb4>] sk_wait_data+0x84/0xc0
Jul 15 10:28:57 notebook2 kernel:  [<802e742e>] tcp_recvmsg+0x48e/0xb30
Jul 15 10:28:57 notebook2 kernel:  [<802b2d03>] sock_common_recvmsg+0x43/0x60
Jul 15 10:28:57 notebook2 kernel:  [<802b07be>] sock_recvmsg+0x10e/0x130
Jul 15 10:28:57 notebook2 kernel:  [<802b2915>] kernel_recvmsg+0x35/0x50
Jul 15 10:28:57 notebook2 kernel:  [<e0df462c>]
cifs_demultiplex_thread+0x15c/0xd16 [cifs]
Jul 15 10:28:57 notebook2 kernel:  [<80101005>] kernel_thread_helper+0x5/0x10
Jul 15 10:28:57 notebook2 kernel: ls            D 9034DC24     0 21702  26704  
                  (NOTLB)
Jul 15 10:28:57 notebook2 kernel:        9034dc24 ffffffa8 00000058 9034dc24
e0dff810 bc10fd6c 00000011 00000007
Jul 15 10:28:57 notebook2 kernel:        da584a90 57b8fa2d 00000012 0000138c
da584ba0 038e5b0d 9034dc24 002fd41b
Jul 15 10:28:57 notebook2 kernel:        00000000 b21de9c0 12da6080 9034dc64
9034dc84 e0e00798 960a3634 00000ffc
Jul 15 10:28:57 notebook2 kernel: Call Trace:
Jul 15 10:28:57 notebook2 kernel:  [<e0e00798>] SendReceive+0x3e8/0x80c [cifs]
Jul 15 10:28:57 notebook2 kernel:  [<e0de89d7>] CIFSSMBUnixQPathInfo+0x187/0x250
[cifs]
Jul 15 10:28:57 notebook2 kernel:  [<e0dfaf14>]
cifs_get_inode_info_unix+0x64/0x610 [cifs]
Jul 15 10:28:57 notebook2 kernel:  [<e0dfbf5b>] cifs_revalidate+0x11b/0x3c0 [cifs]
Jul 15 10:28:57 notebook2 kernel:  [<e0df52f5>] cifs_d_revalidate+0x15/0x100 [cifs]
Jul 15 10:28:57 notebook2 kernel:  [<8017a2dc>] do_lookup+0x3c/0x140
Jul 15 10:28:57 notebook2 kernel:  [<8017be40>] __link_path_walk+0x140/0x1060
Jul 15 10:28:57 notebook2 kernel:  [<8017cdaa>] link_path_walk+0x4a/0xe0
Jul 15 10:28:57 notebook2 kernel:  [<8017d073>] do_path_lookup+0xb3/0x2c0
Jul 15 10:28:57 notebook2 kernel:  [<8017dae8>] __user_walk_fd+0x38/0x50
Jul 15 10:28:57 notebook2 kernel:  [<80175e3e>] vfs_lstat_fd+0x1e/0x50
Jul 15 10:28:57 notebook2 kernel:  [<80175eb1>] vfs_lstat+0x11/0x20
Jul 15 10:28:57 notebook2 kernel:  [<80175ed4>] sys_lstat64+0x14/0x30
Jul 15 10:28:57 notebook2 kernel:  [<80103147>] syscall_call+0x7/0xb

So maybe sk_wait_data() also needs a try_to_freeze()?
Comment 12 Rafael J. Wysocki 2006-07-15 14:57:40 UTC
Created attachment 8560 [details]
Treat the freezing of current like a pending signal in tcp_recvmsg

I'd like to try a different approach first.

Let's treat the freezing of the current task like a pending signal in
net/ipv4/tcp.c#tcp_recvmsg and see what happens.
Comment 13 Xu 2006-07-17 17:44:09 UTC
Hello Rafael,

I applied your last two patches. However, I do not think that the last patch
brings any improvement.

If I freshly mount a CIFS share, disconnect the network, try to access a
directory of that share using "ls -la", and then try to suspend2ram, this
suspend2ram attempt fails (after 20 seconds of waiting). Then "ls -la" returns
with "Host is down". However, the next suspend2ram attempts succeed immediately.
(I assume this is because "cifs" is already in a "host is down" state and
cifs_reconnect() lets suspend2ram proceed immediately.)
Comment 14 Rafael J. Wysocki 2006-07-17 23:26:19 UTC
Evidently on the first attempt it's unable to freeze processes (20s is the 
timeout for that).
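
Which tasks refused to freeze can be read back from the log after a failed
attempt. A small sketch, assuming the message layout quoted earlier in this
report ("stopping tasks timed out ...", then one indented task name per line,
then "Restarting tasks..."); the function name list_unfrozen_tasks is made up.

```shell
#!/bin/sh
# Print the names of tasks the freezer could not stop, one per line.
# Usage: list_unfrozen_tasks /var/log/messages
list_unfrozen_tasks() {
    awk '
        /stopping tasks timed out/ { collecting = 1; next }
        /Restarting tasks/         { collecting = 0 }
        # Task names follow "kernel:" with three spaces of indentation.
        collecting && /kernel:   / { print $NF }
    ' "$1"
}
```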

Well, it looks like I have to set up a test box with a cifs share.
Comment 15 Rafael J. Wysocki 2006-07-22 06:47:22 UTC
Created attachment 8598 [details]
Fix candidate

This patch seems to be sufficient to fix the issue on my test box.

If 'ls' is run on the inaccessible share right before suspend, it takes more
time to suspend and I have to wait until the CIFS timeout expires to get the
command prompt after resume, but in the end it works.

Could you please verify?
Comment 16 Steve French 2006-07-23 20:15:23 UTC
I got suspend to ram working while at OLS (there is a strange problem with the
Gnome button not working, but s2ram worked from the command line as root). The
patch you suggest seems quite reasonable. I will try it tomorrow.
Comment 17 Steve French 2006-07-31 15:43:21 UTC
Added the fix for the reconnect path in cifs to the cifs git tree.