Bug 11050
| Summary: | software suspend does not work when mounted cifs share inaccessible | | |
|---|---|---|---|
| Product: | File System | Reporter: | Michal Suchanek (hramrach) |
| Component: | CIFS | Assignee: | Steve French (sfrench) |
| Status: | CLOSED OBSOLETE | | |
| Severity: | normal | CC: | alan, anpaza, bunk, jeremy.william.murphy, rjw, shirishp |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.31-rc4 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Bug Depends on: | | | |
| Bug Blocks: | 7216 | | |
Attachments:
the kernel messages from my log that seem relevant
patch to allow cifsnotifyd thread to be suspended
kernel output during suspend attempt
patch -- set send and receive timeouts before attempting to connect
Description
Michal Suchanek
2008-07-07 03:25:33 UTC
Created attachment 16758 [details]
the kernel messages from my log that seem relevant
Created attachment 17515 [details]
patch to allow cifsnotifyd thread to be suspended
Let me know if this fixes the problem
I have not been able to reproduce the original problem yet (which may be a good sign as I am running current code, 2.6.27-rc4), but it may turn out that suspend is failing when cifs is in the process of reconnecting to the server (which might happen in the situation you describe). You note that this fails to localhost as well. Could you describe the steps you followed for failure when mounted to localhost?
You cannot make it fail with localhost, at least not with my current kernel + cifs. However, you may try to reproduce with a local interface alias address as described above in B). It may take some time to apply the patch and build a kernel; I currently do not have compiled kernel sources due to disk space limitations, sorry.
With 2.6.27-rc5 I can reproduce the problem using the above steps (under B). AFAICT the patch makes no difference.
I ran the command 'suspend' on a Linux workstation and it tells me:
-bash: suspend: cannot suspend a login shell
Do I need a Linux laptop to try this out?
No, it should work equally poorly on any hardware. However, this problem would not normally manifest on a desktop because it is typically connected to a single network only and does not move elsewhere. If you cannot suspend your machine to disk, either your suspend script or your kernel is broken. The command to suspend is probably not 'suspend', however. 'suspend' in bash is the equivalent of Ctrl+Z. It would be something like '/sbin/hibernate' or 'echo disk > /sys/power/state'.
linux-1t9o:~ # uname -r
2.6.25-rc6-default
//192.168.1.71/winshare on /mnt/smb_a type cifs (rw,mand)
linux-1t9o:~ # echo disk > /sys/power/state
bash: echo: write error: Function not implemented
linux-1t9o:~ # suspend
The pty is not responding to any commands. Do not know what to make out of this. I could do other things on this desktop, i.e. open a new tty and execute commands in it.
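A note for anyone retracing the steps: the "Function not implemented" error above may indicate that the running kernel does not offer hibernation at all. On kernels that expose /sys/power/state, reading the file lists the sleep states that can be written back to it, and "disk" is the one used for hibernation. The following is a minimal Python sketch for checking this, not something used in this report; the function names are made up for illustration.

```python
def parse_sleep_states(text: str) -> list[str]:
    """Split the contents of /sys/power/state into individual state names."""
    return text.split()


def hibernation_available(text: str) -> bool:
    """Return True if 'disk' (hibernation) is among the advertised states."""
    return "disk" in parse_sleep_states(text)
```

To use it on a live system, pass in the file contents, for example `hibernation_available(open("/sys/power/state").read())`. If this returns False, writing "disk" to the file is not expected to work regardless of any cifs mounts.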
I created an alias like 10.0.0.1 on my eth0 interface. I can ping it, but a command like this fails:
mount -t cifs //10.0.0.1/<sambashare> <mountpoint> -o user=user,pass=password
It fails with the error: mount error 112 = Host is down
I did start the samba daemons with a command like /usr/sbin/smbd start on this very same machine.
(In reply to comment #8)
> linux-1t9o:~ # uname -r
> 2.6.25-rc6-default
>
> //192.168.1.71/winshare on /mnt/smb_a type cifs (rw,mand)
>
> linux-1t9o:~ # echo disk > /sys/power/state
> bash: echo: write error: Function not implemented
Perhaps you did not compile software suspend into your kernel?
> linux-1t9o:~ # suspend
This is a bash command that is equivalent to sending SIGSTOP to the current shell.
> The pty is not responding to any commands. Do not know what to make
> out of this. I could do other things on this desktop i.e. open a new tty
> and execute commands in it.
That means it did exactly what it is supposed to do.
In case it was not clear enough from comment #7: in the steps to reproduce I use 'suspend' to mean putting the system into a deep sleep state, such as using 'echo disk > /sys/power/state', invoking a suspend script such as '/usr/sbin/hibernate', invoking the 'Hibernate' option in some GUI, or pressing a hardware button which is configured to perform one of the above. It does *not* mean invoking the bash command 'suspend', which is completely unrelated to system state and only stops that single shell.
(In reply to comment #9)
> I created an alias like 10.0.0.1 on my eth0 interface, I can ping it but a
> command like this fails:
>
> mount -t cifs //10.0.0.1/<sambashare> <mountpoint> -o user=user,pass=password
>
> fails with the error mount error 112 = Host is down
>
> I did start samba daemons with command like this /usr/sbin/smbd start on
> this very same machine.
Did you actually configure your samba server to export the share? Can you mount the share on any other IP (127.0.0.1, your normal outside IP)?
Do you have any firewall in place? Is the server actually running (some distributions require modifying some settings to actually start services)? For me mounting a share like this works, so there must be something wrong with your networking or samba server.
Thanks Michael, much clearer. I can mount the samba share on a Windows XP box (over eth0) but can't mount over eth0, eth0 alias, and loopback interfaces on this 2.6.25-rc6 kernel box. Strange. On a 2.6.27.8, I can do all of them, i.e. mount a local samba share over eth0, eth0 alias, and loopback interfaces. Will continue debugging.
I am using a 2.6.24 kernel and everything is fine, i.e. I can create an alias and mount a share using that alias. But the .config does not have the ip_tables module, so the iptables command fails. Then I issue the command echo disk > /sys/power/state; the whole system suspends (i.e. the screen blanks out and the system does not respond to the keyboard/mouse etc.) and comes back when I push the power button. Will repeat this on a 2.6.27 machine I have which has the ip_tables module built.
Scenario B listed in the first comment does not quite recreate on 2.6.27. For suspend, I am using the command echo disk > /sys/power/state. With or without a cifs mount, after the echo disk command is executed, the screen blanks out but returns with a message on the pty:
bash: echo: write error: Device or resource busy
Investigating.
I take it back. I did execute scenario B with all the steps on this 2.6.27 kernel and, like I see on 2.6.24, the kernel suspends and resumes when the power button is pressed.
It is broken on 2.6.27-rc5 but I cannot reproduce on another machine running 2.6.28.3. I will try to upgrade the laptop to a later kernel. I'm sorry I put you through so much trouble to test a problem that seems fixed in current kernels.
Thanks
Michal
I could not reproduce it (scenario b) on 2.6.29-rc3.
Suspend worked fine (although the subsequent wakeup from the suspended state failed a few minutes later, in what I believe is an unrelated problem with video drivers). It is possible that the change to the send handling or the change to remove dir_notify from the kernel (which affected one of cifs's background threads) may be the reason why it now works.
I cannot test on the laptop because 2.6.28.4 does not recognize my dm volumes.
Michael, any luck on testing this on later kernels on laptop or otherwise as per your comment #18?
Yes, I installed a 2.6.29 from Debian which sees my volumes. However, it still has the bug (using scenario A to reproduce). Perhaps I should look at what patches Debian applies and how the other scenario works.
Created attachment 22577 [details]
kernel output during suspend attempt
I've hit this accidentally on 2.6.31-rc4 by doing the following:
1. Connect to a friend's Windows laptop using smb4k through his WLAN.
2. Suspend. (no problem yet)
3. Go to my home and continue working on my WLAN.
4. Suspend triggers the bug now (because my friend's laptop cannot be reached).
So far I haven't tried to reproduce it, but I will, if that helps.
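When retracing steps like these, it can help to record which CIFS shares are still mounted just before the suspend attempt. The following is a small illustrative Python sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options, ...); the function name is hypothetical and the sample line in the usage note reuses the share shown earlier in this report.

```python
def cifs_mounts(mounts_text: str) -> list[tuple[str, str]]:
    """Return (share, mountpoint) pairs for each cifs entry in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "cifs":
            found.append((fields[0], fields[1]))
    return found
```

For example, feeding it the text "//192.168.1.71/winshare /mnt/smb_a cifs rw,mand 0 0" yields one pair with the share and /mnt/smb_a. On a live system the input would be the contents of /proc/mounts.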
If this is reproducible let us know (and if possible let us know the stack trace of the failing process)
*** Bug 20442 has been marked as a duplicate of this bug. ***
This bug is 100% reproducible; just mount a share from another computer; then shut down (or suspend) that computer, then try to suspend the first computer; it won't suspend.
Created attachment 43632 [details]
patch -- set send and receive timeouts before attempting to connect
Does this patch happen to fix the issue? It should apply cleanly to current mainline tree.
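The attachment itself is not reproduced in this report, so the following is only a user-space illustration of the idea named in its title, not the kernel patch. It is a Python sketch that sets SO_RCVTIMEO and SO_SNDTIMEO on a TCP socket before any connect() call, so that a blocking operation against an unreachable server gives up after a bounded time. The helper names are hypothetical, and the "ll" timeval layout is an assumption that holds on typical 64-bit Linux.

```python
import socket
import struct

# Assumption: struct timeval is two native longs (true on typical 64-bit Linux).
TIMEVAL_FORMAT = "ll"


def make_socket_with_timeouts(seconds: int) -> socket.socket:
    """Create a TCP socket with send and receive timeouts set before connecting."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    timeval = struct.pack(TIMEVAL_FORMAT, seconds, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)
    return sock


def read_timeout(sock: socket.socket, option: int) -> float:
    """Read back a timeout option (SO_RCVTIMEO or SO_SNDTIMEO) in seconds."""
    raw = sock.getsockopt(socket.SOL_SOCKET, option, struct.calcsize(TIMEVAL_FORMAT))
    sec, usec = struct.unpack(TIMEVAL_FORMAT, raw)
    return sec + usec / 1_000_000
```

A caller would create the socket with `make_socket_with_timeouts(5)` and only then call `connect()` on it; on Linux the send timeout also bounds a blocking connect, which then fails with an OSError instead of waiting indefinitely.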
Bug #15165 refers to much the same symptoms but with NFS (and it's not actually resolved). I wonder if it's worth making it a duplicate of this bug? They're obviously quite related. Although, sometimes I see the bug even without a share mounted: my suspicion in this case is that the nfs.mount is trying to reach the server at the time suspend is run. Steve, I would love to see this bug (assuming it is the same one) resolved and can afford a bit of time to work on the code if I understand it. Please let me/us know where/how to direct our efforts! Thanks, cheers.
OK, please ignore what I said about bug #15165, I must have been high (or low) when I wrote that. What I was trying to say is that I have seen the same symptoms that this bug describes but with NFS instead of CIFS.
Jeff, I tried to find the equivalent code in the NFS client that your patch for CIFS affects but without success. If you (or someone) can provide a patch for NFS, I can test it fairly soon. Or, should I open a new bug, since I'm talking about NFS, not CIFS?