Bug 11050 - software suspend does not work when mounted cifs share inaccessible
Status: CLOSED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: CIFS
Hardware: All Linux
Importance: P1 normal
Assignee: Steve French
Duplicates: 20442
Blocks: 7216
Reported: 2008-07-07 03:25 UTC by Michal Suchanek
Modified: 2012-05-22 12:44 UTC
Kernel Version: 2.6.31-rc4
Regression: No

Attachments
the kernel messages from my log that seem relevant (5.22 KB, text/plain)
2008-07-07 03:26 UTC, Michal Suchanek
patch to allow cifsnotifyd thread to be suspended (310 bytes, patch)
2008-08-28 16:27 UTC, Steve French
kernel output during suspend attempt (354.84 KB, text/plain)
2009-08-02 13:53 UTC, Thiemo Nagel
patch -- set send and receive timeouts before attempting to connect (3.90 KB, patch)
2011-01-14 20:26 UTC, Jeff Layton

Description Michal Suchanek 2008-07-07 03:25:33 UTC
Latest working kernel version: unknown
Earliest failing kernel version: 2.6.25
Distribution: Debian
Hardware Environment: hp nc4010 notebook
Problem Description:

Software suspend does not work when there is a mounted cifs share, and the server that serves the share is inaccessible.
 
Steps to reproduce:

A)

0) make a local network where the local DHCP server points to a local DNS server that resolves a name such as nonexistent.nodomain to a private IP address such as 10.0.0.1,
then connect the PC to the local network.

1) mount //nonexistent/someshare 

2) move the PC to a public network

3) suspend

B)

1) make an alias address such as

ifconfig eth0:0 10.0.0.1

Note: does not work with 127.0.0.1 for some reason (another cifs bug?)

2) mount //10.0.0.1/someshare

3) disable access to the server: iptables -P OUTPUT DROP

4) suspend

In either case the suspend does not complete - it attempts to stop the tasks, prints a lot of backtraces and scheduler debug info, and then resumes the tasks again.
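For anyone who wants to retry scenario B, the four steps can be collected into a small script. This is only a sketch: the interface name, the mount point /mnt/cifs-test and the helper names (run, hibernate, scenario_b, restore_network) are placeholders made up here, and only the 10.0.0.1 address and the share name come from the steps above. By default it prints the commands instead of running them; with DRY_RUN=0 it needs root, and mount may ask for credentials.

```shell
#!/bin/sh
# Dry-run sketch of scenario B from the report.
# DRY_RUN=1 (default) only prints each step; DRY_RUN=0 executes it as root.
# IFACE, SHARE and MOUNTPOINT are illustrative placeholders.
DRY_RUN="${DRY_RUN:-1}"
IFACE="${IFACE:-eth0:0}"
ALIAS_IP="${ALIAS_IP:-10.0.0.1}"
SHARE="${SHARE:-someshare}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/cifs-test}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Ask the kernel to suspend to disk (needs a redirect, so not done via run).
hibernate() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' 'echo disk > /sys/power/state'
    else
        echo disk > /sys/power/state
    fi
}

scenario_b() {
    run ifconfig "$IFACE" "$ALIAS_IP"                     # 1) alias address
    run mount -t cifs "//$ALIAS_IP/$SHARE" "$MOUNTPOINT"  # 2) mount the share
    run iptables -P OUTPUT DROP                           # 3) cut off the server
    hibernate                                             # 4) suspend
}

# Undo step 3 afterwards; assumes the OUTPUT policy was ACCEPT before the test.
restore_network() {
    run iptables -P OUTPUT ACCEPT
}

scenario_b
restore_network
```

Run it once in dry-run mode first to check that the printed commands match your setup.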
Comment 1 Michal Suchanek 2008-07-07 03:26:57 UTC
Created attachment 16758
the kernel messages from my log that seem relevant
Comment 2 Steve French 2008-08-28 16:27:15 UTC
Created attachment 17515
patch to allow cifsnotifyd thread to be suspended

Let me know if this fixes the problem
Comment 3 Steve French 2008-08-29 13:20:49 UTC
I have not been able to reproduce the original problem yet (which may be a good sign as I am running current code, 2.6.27-rc4) - but it may turn out that suspend is failing when cifs is in the process of reconnecting to the server (which might happen in the situation you describe).

You note that this fails to localhost as well - could you describe the steps you followed for failure when mounted to localhost?
Comment 4 Michal Suchanek 2008-08-30 08:26:28 UTC
You cannot make it fail with localhost, at least not with my current kernel + cifs.

However, you may try to reproduce with a local interface alias address as described above in B).

It may take some time to apply the patch and build a kernel; I currently do not have compiled kernel sources due to disk space limitations, sorry.
Comment 5 Michal Suchanek 2008-09-11 06:54:56 UTC
With 2.6.27-rc5 I can reproduce the problem using the above steps (under B).

AFAICT the patch makes no difference.
Comment 6 Shirish Pargaonkar 2009-01-25 04:53:18 UTC
I ran the command 'suspend' on a Linux workstation and it tells me
 -bash: suspend: cannot suspend a login shell

Do I need a Linux laptop to try this out?
Comment 7 Michal Suchanek 2009-01-26 06:43:18 UTC
No, it should work equally poorly on any hardware.

However, this problem would not normally manifest on a desktop because it is typically connected to a single network only and does not move elsewhere.

If you cannot suspend your machine to disk, either your suspend script or your kernel is broken.

The command to suspend is probably not 'suspend', however. 'suspend' in bash is the equivalent of Ctrl+Z.

It would be something like '/sbin/hibernate' or 'echo disk > /sys/power/state'
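Whether 'echo disk > /sys/power/state' can work at all depends on which sleep states the running kernel offers; reading /sys/power/state lists them as space-separated words. A minimal sketch of that check, where the helper name supports_state is made up for illustration:

```shell
#!/bin/sh
# supports_state STATE "LIST": succeed if STATE is one of the
# space-separated words in LIST (the format of /sys/power/state).
supports_state() {
    for s in $2; do
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

# Read the states this kernel offers; empty if the file is missing.
states="$(cat /sys/power/state 2>/dev/null)"

if supports_state disk "$states"; then
    echo "suspend to disk is offered: echo disk > /sys/power/state"
else
    echo "suspend to disk is not offered by this kernel (states: ${states:-none})"
fi
```

If 'disk' is missing from the list, the reproduction steps cannot be tested on that kernel regardless of any cifs mount.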
Comment 8 Shirish Pargaonkar 2009-02-16 13:32:47 UTC
linux-1t9o:~ # uname -r
2.6.25-rc6-default

//192.168.1.71/winshare on /mnt/smb_a type cifs (rw,mand)

linux-1t9o:~ # echo disk > /sys/power/state
bash: echo: write error: Function not implemented
linux-1t9o:~ # suspend



The pty is not responding to any commands.  Do not know what to make
out of this.  I could do other things on this desktop i.e. open a new tty
and execute commands in it.
Comment 9 Shirish Pargaonkar 2009-02-17 03:07:43 UTC
I created an alias like 10.0.0.1 on my eth0 interface, I can ping it but a
command like this fails:

mount -t cifs //10.0.0.1/<sambashare> <mountpoint> -o user=user,pass=password

fails with the error    mount error 112 = Host is down

I did start samba daemons with command like this /usr/sbin/smbd start on
this very same machine.
Comment 10 Michal Suchanek 2009-02-17 04:40:20 UTC
(In reply to comment #8)
> linux-1t9o:~ # uname -r
> 2.6.25-rc6-default
> 
> //192.168.1.71/winshare on /mnt/smb_a type cifs (rw,mand)
> 
> linux-1t9o:~ # echo disk > /sys/power/state
> bash: echo: write error: Function not implemented

Perhaps you did not compile software suspend into your kernel?

> linux-1t9o:~ # suspend
This is a bash command that is equivalent to sending SIGSTOP to the current shell.
> 
> 
> 
> The pty is not responding to any commands.  Do not know what to make
> out of this.  I could do other things on this desktop i.e. open a new tty
> and execute commands in it.

Then it did exactly what it is supposed to do.

In case it was not clear enough from comment #7:

In the steps to reproduce I use 'suspend' to mean putting the system into a deep sleep state such as using 'echo disk > /sys/power/state', invoking a suspend script such as '/usr/sbin/hibernate', invoking the 'Hibernate' option in some GUI or pressing a hardware button which is configured to perform one of the above.

It does *not* mean invoking the bash command 'suspend' which is completely unrelated to system state and only stops that single shell.

(In reply to comment #9)
> I created an alias like 10.0.0.1 on my eth0 interface, I can ping it but a
> command like this fails:
> 
> mount -t cifs //10.0.0.1/<sambashare> <mountpoint> -o user=user,pass=password
> 
> fails with the error    mount error 112 = Host is down
> 
> I did start samba daemons with command like this /usr/sbin/smbd start on
> this very same machine.
> 

Did you actually configure your samba server to export the share?
Can you mount the share on any other IP (127.0.0.1, your normal outside IP)?
Do you have any firewall in place?
Is the server actually running (some distributions require modifying some settings to actually start services)?

For me mounting a share like this works so there must be something wrong with your networking or samba server.
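Some of these questions can be checked from the client side by testing whether the SMB ports answer at all. A rough sketch, assuming bash and the coreutils 'timeout' command are installed; the function names port_open and check_smb are made up for illustration. It only tests TCP reachability, not whether the share is actually exported.

```shell
#!/bin/sh
# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`
# (both are assumptions about the test machine).
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_smb HOST: report the two ports an SMB/CIFS client normally uses
# (445 = SMB directly over TCP, 139 = NetBIOS session service).
check_smb() {
    for port in 445 139; do
        if port_open "$1" "$port"; then
            echo "$1:$port open"
        else
            echo "$1:$port unreachable"
        fi
    done
}

# Default to loopback; pass the alias address (e.g. 10.0.0.1) as argument.
check_smb "${1:-127.0.0.1}"
```

If both ports are unreachable on every local address, the problem is more likely the server or a firewall than the cifs client.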
Comment 11 Shirish Pargaonkar 2009-02-17 07:13:06 UTC
Thanks Michal, much clearer.

I can mount the samba share on a Windows XP box (over eth0) but can't mount it
over the eth0, eth0 alias, or loopback interfaces on this 2.6.25-rc6 kernel box.
Strange.

On 2.6.27.8, I can do all of these, i.e. mount a local samba share over the eth0,
eth0 alias, and loopback interfaces.

Will continue debugging.
Comment 12 Shirish Pargaonkar 2009-02-17 10:05:48 UTC
I am using a 2.6.24 kernel and everything is fine, i.e. I can create an alias and
mount a share using that alias.
But the .config does not have the ip_tables module, so the iptables command fails.
Then I issue the command echo disk > /sys/power/state; the whole system
suspends (i.e. the screen blanks out and the system does not respond to the
keyboard/mouse etc.) and comes back when I push the power button.

Will repeat this on a 2.6.27 machine I have which has the ip_tables module built.
Comment 13 Shirish Pargaonkar 2009-02-17 12:01:27 UTC
Scenario B, listed in the first comment, does not quite reproduce on 2.6.27.
For suspend, I am using the command echo disk > /sys/power/state.

With or without a cifs mount, after the echo disk command is executed, the
screen blanks out but returns with a message on the pty:

bash: echo: write error. Device or resource busy

Investigating.
Comment 14 Shirish Pargaonkar 2009-02-17 12:24:04 UTC
I take it back. I did execute scenario B with all the steps on this 2.6.27 kernel
and, as I saw on 2.6.24, the kernel suspends and resumes when the power button is
pressed.
Comment 15 Michal Suchanek 2009-02-17 14:56:40 UTC
It is broken on 2.6.27-rc5 but I cannot reproduce on another machine running 2.6.28.3.

I will try to upgrade the laptop to a later kernel.

I'm sorry I put you through so much trouble to test a problem that seems fixed in current kernels.

Thanks

Michal
Comment 16 Steve French 2009-02-18 14:14:52 UTC
I could not reproduce it (scenario B) on 2.6.29-rc3.  Suspend worked fine (although the subsequent wakeup from the suspended state failed a few minutes later, in what I believe is an unrelated problem with video drivers).
Comment 17 Steve French 2009-02-18 14:15:53 UTC
It is possible that the change to the send handling or the change to remove dir_notify from the kernel (which affected one of cifs's background threads) may be the reason why it now works.
Comment 18 Michal Suchanek 2009-02-27 09:27:23 UTC
I cannot test on the laptop because 2.6.28.4 does not recognize my dm volumes.
Comment 19 Shirish Pargaonkar 2009-04-13 17:34:12 UTC
Michal, any luck testing this on later kernels, on the laptop or otherwise,
as per your comment #18?
Comment 20 Michal Suchanek 2009-05-12 12:21:16 UTC
Yes, I installed a 2.6.29 from Debian which sees my volumes.

However, it still has the bug (using scenario A to reproduce).

Perhaps I should look at what patches Debian applies and how the other scenario works.
Comment 21 Thiemo Nagel 2009-08-02 13:53:48 UTC
Created attachment 22577
kernel output during suspend attempt

I've hit this accidentally on 2.6.31-rc4 by doing the following:

1. Connect to a friend's Windows laptop using smb4k through his WLAN.
2. Suspend.  (no problem yet)
3. Go to my home and continue working on my WLAN.
4. Suspend triggers the bug now (because my friend's laptop cannot be reached)

So far I haven't tried to reproduce it, but I will if that helps.
Comment 22 Steve French 2009-08-03 02:35:16 UTC
If this is reproducible let us know (and if possible let us know the stack trace of the failing process)
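One way to find candidate processes is to list tasks in uninterruptible sleep (state D), which is where a task stuck on network I/O often sits. That is an assumption about the failing task, not something confirmed in this report, and the helper name blocked_tasks is made up for illustration.

```shell
#!/bin/sh
# blocked_tasks: read `ps -eo pid,stat,comm` output on stdin and print
# "PID COMMAND" for tasks whose state starts with D (uninterruptible sleep).
blocked_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

ps -eo pid,stat,comm | blocked_tasks
```

For a PID it prints, /proc/<PID>/stack (readable as root on kernels that provide it) shows the kernel stack of that task.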
Comment 23 Jeff Layton 2010-11-23 12:52:00 UTC
*** Bug 20442 has been marked as a duplicate of this bug. ***
Comment 24 Andrew Zabolotny 2010-11-25 13:48:34 UTC
This bug is 100% reproducible; just mount a share from another computer; then shut down (or suspend) that computer, then try to suspend the first computer; it won't suspend.
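When repeating this, it may help to confirm which cifs shares are still mounted just before suspending. A small sketch that filters /proc/mounts; the helper name cifs_mounts is made up for illustration.

```shell
#!/bin/sh
# cifs_mounts: read /proc/mounts-style lines on stdin and print
# "source mountpoint" for every entry whose filesystem type is cifs.
cifs_mounts() {
    awk '$3 == "cifs" { print $1, $2 }'
}

if [ -r /proc/mounts ]; then
    cifs_mounts < /proc/mounts
fi
```

Any share listed here whose server is switched off or unreachable matches the situation described above.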
Comment 25 Jeff Layton 2011-01-14 20:26:03 UTC
Created attachment 43632
patch -- set send and receive timeouts before attempting to connect

Does this patch happen to fix the issue? It should apply cleanly to current mainline tree.
Comment 26 Jeremy Murphy 2011-04-09 11:27:33 UTC
Bug #15165 refers to much the same symptoms but with NFS (and it's not actually resolved).  I wonder if it's worth making it a duplicate of this bug?  They're obviously quite related.  Although, sometimes I see the bug even without a share mounted: my suspicion in this case is that the nfs.mount is trying to reach the server at the time suspend is run.

Steve, I would love to see this bug (assuming it is the same one) resolved and can afford a bit of time to work on the code if I understand it.  Please let me/us know where/how to direct our efforts!  Thanks, cheers.
Comment 27 Jeremy Murphy 2011-04-12 23:53:57 UTC
OK, please ignore what I said about bug #15165, I must have been high (or low) when I wrote that.

What I was trying to say is that I have seen the same symptoms that this bug describes but with NFS instead of CIFS.

Jeff, I tried to find the equivalent code in the NFS client that your patch for CIFS affects but without success.  If you (or someone) can provide a patch for NFS, I can test it fairly soon.

Or, should I open a new bug, since I'm talking about NFS, not CIFS?
