Bug 15925 - Empty file creation corruption on CIFS filesystem
Status: RESOLVED WILL_NOT_FIX
Product: File System
Classification: Unclassified
Component: CIFS
Hardware: All   OS: Linux
Importance: P1 high
Assigned To: Jeff Layton
Depends on:
Blocks:
Reported: 2010-05-06 22:43 UTC by Nuno Lucas
Modified: 2011-01-07 21:18 UTC (History)
CC: 4 users

See Also:
Kernel Version: 2.6.31.0, 2.6.32, 2.6.33, 2.6.34
Tree: Mainline
Regression: Yes


Attachments
patch -- fix noserverino handling when unix extensions are enabled (3.71 KB, patch)
2010-05-14 14:10 UTC, Jeff Layton
Details | Diff
patchset 2 -- fix noserverino handling when unix extensions are enabled (3.71 KB, patch)
2010-05-14 15:16 UTC, Jeff Layton
Details | Diff
patchset 2 -- fix noserverino handling when unix extensions are enabled (7.06 KB, patch)
2010-05-14 15:18 UTC, Jeff Layton
Details | Diff
Patch rebased for linux2.6 git HEAD (3.49 KB, patch)
2010-05-14 16:14 UTC, Nuno Lucas
Details | Diff
2.6.34-HEAD + linux2.6 patch kernel oops (34.11 KB, text/plain)
2010-05-14 17:12 UTC, Nuno Lucas
Details
cifs_writepages() assembly and source (2.93 KB, text/plain)
2010-05-14 18:44 UTC, Nuno Lucas
Details
patchset 3 (6.69 KB, patch)
2010-05-14 20:09 UTC, Jeff Layton
Details | Diff

Description Nuno Lucas 2010-05-06 22:43:58 UTC
There is a bad regression of the CIFS driver on 2.6.32 kernels (tested 2.6.32.11, 2.6.32.12 and Ubuntu 10.04 LTS 2.6.32-22-generic).

The version of the server doesn't seem to matter (tested 2.6.27, 2.6.32 and Ubuntu 8.04 LTS 2.6.24-27-generic), as long as the client is 2.6.32-something.

Empty files created on the server become non-empty (with the contents of some earlier written file) when read by the client. This happens when using lockf() and without using it.

The following bash script shows the problem:

-----------< test-cifs.sh >------------------------
#!/bin/bash

while [ 1 ];
do
    rnd=$(( $RANDOM % 2 + 1 ))
    if [ -f $rnd ]; then
        v=$( cat $rnd  )
        if [ -n "$v" -a "$v" != "$rnd" ]; then
            echo "ERROR!!! rnd=$rnd val=$v"
            exit -1
        fi
        rm -f $rnd
    else
        touch $rnd
        echo $rnd > $rnd
    fi
done
-----------< test-cifs.sh >------------------------

When this script runs both on the server and on the client on the same shared directory, after a few seconds (sometimes almost right away), the client will exit with the error.

In a nutshell, the script picks a random name (1 or 2). If the file doesn't exist, it creates it empty and writes its name into it; if it does exist, it checks that the contents are either empty or equal to the file name, then removes the file.

The script running on the server will never fail, as expected, but the client will sometimes see the file with the wrong contents.

This only occurs when the client is running 2.6.32 (.11-12 or the Ubuntu 10.04 one). Any other kernel version I tested will work as expected.

A workaround we found for this problem is to never delete the file, but instead truncate its size to zero. Done this way, the problem doesn't show.

Replacing the 'rm -f $rnd' with 'echo -n "" > $rnd' is the script equivalent of this workaround.
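As a small sketch of that workaround (reusing the `$rnd` variable from the script above), the cleanup branch becomes a truncation instead of a deletion:

```shell
#!/bin/bash
# Workaround sketch: empty the file instead of deleting it, so the
# server never frees -- and therefore can never recycle -- the inode.
rnd=1
echo "$rnd" > "$rnd"      # file now has contents

# Instead of:  rm -f "$rnd"
: > "$rnd"                # same effect as: echo -n "" > "$rnd"

# The file still exists, but is now empty.
ls -l "$rnd"
```

The `: > file` redirection is the idiomatic no-op truncation in shell; `truncate -s 0 "$rnd"` would do the same where coreutils is available.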

I haven't tested 2.6.33 yet; we have to work with 2.6.32 for the time being, so a newer kernel is not a solution for me.

I'm available for any further information.
Comment 1 Jeff Layton 2010-05-07 07:34:23 UTC
Thanks for the bug report. Does this problem go away if you mount with '-o noserverino' ?
Comment 2 Nuno Lucas 2010-05-07 11:03:28 UTC
No. "noserverino" doesn't make any difference.
For the record, I'm mounting a public share with the simple command:

mount -t cifs //<IP-ADDRESS>/<SHARE> dir/ -o pass=

I only pass "pass=" to avoid the password prompt.

But no other mount option I tried made a difference, and since I tried on different systems (a minimal busybox-based system, Ubuntu 8.04, 9.10 & 10.04), I don't think the samba version matters either.
Comment 3 Jeff Layton 2010-05-07 14:30:57 UTC
Shirish, do you have some time to look at this? I sort of suspect that this may be related to the create on lookup stuff...
Comment 4 Shirish Pargaonkar 2010-05-07 14:38:33 UTC
Jeff, I need to spend the next two hours on a stop-ship issue; the old problem you had fixed, "Busy inodes after unmount of cifs", seems to be back.
I will spend time on this after that meeting is over and update this bug.
Comment 5 Nuno Lucas 2010-05-07 14:56:52 UTC
Just for information, as I also tested this using Ubuntu 10.04 I opened bug 577031 on their launchpad [1].

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/577031
Comment 6 Shirish Pargaonkar 2010-05-07 18:21:16 UTC
Looking.  Not sure if you have access to a non-unix server like a Windows
server to recreate this problem (actually I will try that myself too).
You could also see whether the problem recreates against a Samba server
when you disable unix extensions (and I will try that myself too):
echo 0 > /proc/fs/cifs/LinuxExtensionsEnabled before you mount a Samba server.
Comment 7 Nuno Lucas 2010-05-07 23:47:40 UTC
(seems like bugzilla is acting strange today)

I tried your suggestion of the proc LinuxExtensionsEnabled value and it works! At least for the 10-15 minutes I let it run before stopping it.

But I have to say that my minimalistic samba server where I saw the error was already configured with Linux extensions disabled. Could the client be using Linux extensions even if the server refused?

I don't have a Windows machine here, so I can only test this with one on Monday.

The test was made with a Ubuntu 9.10 machine acting as server and Ubuntu 10.04 (the one with the 2.6.32 kernel) acting as the client (after the proc thing).

I'm going to let the test run for some more time to make sure it works, but you seem to have nailed something...
Comment 8 Nuno Lucas 2010-05-08 00:38:19 UTC
The test ran for 40 minutes with no error, with the proc setting in place.
Comment 9 Shirish Pargaonkar 2010-05-08 16:17:13 UTC
(In reply to comment #7)
> But I have to say that my minimalistic samba server were I saw the error was
> already configured with Linux extensions disabled. 

Have you disabled unix extensions on Samba server in global section?

> Could the client be using
> Linux Extensions even if the server refused?

No.  Can you verify that unix extensions are indeed turned off in capabilities
field in the negotiate protocol response from server and in the subsequent 
session setup request from client?
Comment 10 Nuno Lucas 2010-05-08 19:01:57 UTC
I checked my config and it seems that at some point I removed that configuration option and forgot about it. Sorry about that.
Comment 11 Nuno Lucas 2010-05-10 17:54:46 UTC
I retested two machines with 2.6.32.11.

Both had "/proc/fs/cifs/LinuxExtensionsEnabled" set to 0 and the samba configuration included "unix extensions = no".

The script failed two times right away, as before, but on the third time it continued running until I stopped it.

I tried again, unmounting and remounting the share. This time I didn't do an "ls" on the shared dir, and now it failed 4 or 5 times and then continued running for a long time until I stopped it.

If I stop it and re-run it without unmounting, it seems to continue without error.

So it seems the "Unix Extensions" configuration isn't the real culprit, and maybe the "stat" system call has some effect on it.

I'm thinking that I had done several directory listings on the share before running the test, and that made the error go away in my earlier test.

Any ideas on what to test next?
Comment 12 Shirish Pargaonkar 2010-05-10 18:05:37 UTC
I copied the script/testcase you have listed in a file, mounted a share
from a Samba server (Version 3.4.0-GIT-1dadf17-devel), cd'ed to the mount
point and executed the script.

It has been running for nearly three hours now, generating files 1 and 2.
No errors, i.e. the script has not stopped running.
Comment 13 Shirish Pargaonkar 2010-05-10 18:20:03 UTC
I am running with linux extensions enabled both on the server and the client.

But can you confirm that the smb.conf option
 unix extensions = no
is in the global section and not in the stanza of this particular share?
Also, it would be useful to double-check that unix extensions are indeed off
in the packet exchange, in a trace file such as a wireshark capture.
Comment 14 Nuno Lucas 2010-05-10 19:01:57 UTC
I can confirm that "unix extensions=no" is set on the "[global]" section.

I will set up wireshark to check the packet exchange.

Just to confirm something: are you running the script BOTH on the client and the server? If it's running only on the client it will not stop.
Comment 15 Shirish Pargaonkar 2010-05-10 22:16:31 UTC
Yes, unix extensions bits are off.
And I was running the script only on client. I should be running on both.
Will resume this problem_recreation/debugging tomorrow.
Comment 16 Shirish Pargaonkar 2010-05-10 22:20:05 UTC
It did not take too long: when running on both machines, client and server,
the script on the client stopped in no time.
Will resume debugging later today/tomorrow.
Comment 17 Nuno Lucas 2010-05-13 18:56:06 UTC
Did some more testing and can add that this bug also applies to 2.6.31.13, 2.6.32.13 and 2.6.33.4.

The kernel I know is unaffected is 2.6.27.46.

I will narrow down the exact version where it appeared, but probably only tomorrow.
Comment 18 Nuno Lucas 2010-05-13 19:18:45 UTC
Didn't have to test much more. 2.6.30.10 doesn't show the bug.

So something in 2.6.31 broke the driver.

I'll test 2.6.31.0 next just to make sure it wasn't a stable update on 2.6.31.
Comment 19 Nuno Lucas 2010-05-13 19:27:46 UTC
Confirmed. 2.6.31.0 shows the bug.

I have never done a git bisection test before, so I hope this narrows down the problem enough to solve this.
Comment 20 Jeff Layton 2010-05-13 19:30:24 UTC
Reassigning to Shirish while he works on this.
Comment 21 Jeff Layton 2010-05-14 12:53:47 UTC
I have a hunch that I know what this is. Here's my suspicion:

Most likely the underlying filesystem that samba is serving out is one that recycles inode numbers quickly. It ends up in a situation like this:

* There's an inode on the share, and the client is aware of it and its contents.
* The server deletes that inode.
* The client creates a new file; the server's filesystem recycles the inode number and returns it.
* The client gets confused and thinks the inode is the original one that was deleted.
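This recycling is easy to observe locally with stat(1) (a sketch; whether the two numbers actually match depends on the local filesystem's inode allocator -- ext2-era allocators tend to hand back the lowest free inode right away, while ext4/xfs usually do not):

```shell
#!/bin/sh
# Create, delete, and recreate a file, printing the inode number each time.
cd "$(mktemp -d)" || exit 1

touch a
first=$(stat -c %i a)    # inode number of the original file
rm a
touch b
second=$(stat -c %i b)   # inode number of the replacement

echo "first=$first second=$second"
# If the two numbers match, a remote client that caches attributes keyed
# by inode number can mistake the new file for the deleted one.
```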

Now, one would think that -o noserverino should work around this, but it seems like there are a couple of bugs involving that and unix extensions (I'll pass along a patch for that in a bit).

Also, one would think that this shouldn't be a problem if the filesystem being served has fine-grained timestamps.

Nuno, can you tell us what sort of local filesystem the server is serving out? ext3? ext4? xfs? something else?
Comment 22 Jeff Layton 2010-05-14 13:06:22 UTC
Oddly enough, we could probably do a better job with this against windows. Windows has a proper create time on each file, and we could conceivably use that as a way to verify the "uniqueness" of the inode.

This might be the death-knell for keeping serverino the default. I don't see a good way to reliably detect this situation given what the protocol provides.
Comment 23 Nuno Lucas 2010-05-14 13:15:19 UTC
Server file systems tested were ext2 (1GB IDE compact flash), ext3 or reiserfs (the last ones for real rotating disks).

The one I tested most as server was reiserfs (not reiser4).
Comment 24 Nuno Lucas 2010-05-14 13:18:45 UTC
Now that I think about it, I don't think I actually tested ext3.
So it was either ext2 or reiserfs.
Comment 25 Shirish Pargaonkar 2010-05-14 13:21:29 UTC
Testing against something like jfs or ext4 would be useful too.
Comment 26 Jeff Layton 2010-05-14 13:25:32 UTC
Ok, I think both of those filesystems have 1s timestamp granularity.

As Shirish says, it might be an interesting datapoint to know whether something with sub-second timestamps (ext4, xfs, etc.) fares better here.
Comment 27 Nuno Lucas 2010-05-14 13:27:26 UTC
I can't easily test ext4 in the machine I am right now (Ubuntu 8.04 doesn't support ext4 yet) but I will try xfs and jfs and post the results.
Comment 28 Nuno Lucas 2010-05-14 13:59:50 UTC
XFS doesn't seem to suffer from the problem. I only let it run for 5 minutes, but in all other cases it stopped right away.

JFS also doesn't seem to suffer from the problem. More 5 minutes without stopping.

EXT3 also didn't stop, so I guess I had actually never tested it before.

I then retested with ext2 just to make sure it was something I did different this time. It stopped as expected.
Comment 29 Jeff Layton 2010-05-14 14:04:44 UTC
Interesting. ext3 also has 1s timestamp granularity, so it seems likely that the inode number allocation scheme might be different between it and ext2. That could also affect the outcome here.
Comment 30 Jeff Layton 2010-05-14 14:10:06 UTC
Created attachment 26380 [details]
patch -- fix noserverino handling when unix extensions are enabled

These two patches should fix the handling of the "noserverino" option when unix extensions are enabled. Nuno, could you test this patch with your reproducer? The share will most likely need to be mounted with the '-o noserverino' option in order for it to help.
Comment 31 Nuno Lucas 2010-05-14 14:31:21 UTC
The patch doesn't apply on 2.6.32 or 2.6.33 so I assume it's 2.6.34 material.
My current kernel build scripts don't accept rc kernels so give me some time to do it manually ;-)
Comment 32 Nuno Lucas 2010-05-14 15:13:20 UTC
I "git cloned" the kernel with:

"git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6"

and tried to apply the patch with:

$ patch -p1 --dry-run -i /home/lucas/Desktop/cifs-unix-noserverino.patch
patching file fs/cifs/inode.c
patching file fs/cifs/cifsproto.h
Hunk #1 FAILED at 110.
1 out of 1 hunk FAILED -- saving rejects to file fs/cifs/cifsproto.h.rej
patching file fs/cifs/dir.c
Hunk #1 FAILED at 248.
1 out of 1 hunk FAILED -- saving rejects to file fs/cifs/dir.c.rej
patching file fs/cifs/inode.c
Hunk #1 succeeded at 170 (offset 1 line).
Hunk #2 succeeded at 334 (offset 1 line).
Hunk #3 succeeded at 1210 (offset 18 lines).

Am I doing something weird?

"git log" shows:

$ git log
commit 6a251b0ab67989f468f4cb65179e0cf40cf8c295
Merge: 9e766d8... 5051d41...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu May 13 14:48:10 2010 -0700

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
      mfd: Clean up after WM83xx AUXADC interrupt if it arrives late
Comment 33 Jeff Layton 2010-05-14 15:16:30 UTC
Created attachment 26381 [details]
patchset 2 -- fix noserverino handling when unix extensions are enabled

Here's the patchset I think we should consider for upstream. It fixes some problems with noserverino handling when unix extensions are disabled, and disables server inode numbers by default.

I hate to go full circle on this, but given the problems we've seen with leaving them enabled I don't think we can reasonably keep them enabled by default.

Thoughts?
Comment 34 Jeff Layton 2010-05-14 15:18:57 UTC
Created attachment 26382 [details]
patchset 2 -- fix noserverino handling when unix extensions are enabled

Oops, uploaded the wrong one. This one should be correct.
Comment 35 Jeff Layton 2010-05-14 15:24:18 UTC
Ahh, the patch is against the tip of Steve French's tree (he's the kernel CIFS maintainer). I believe there are some commits in his tree that aren't yet in Linus'.
Comment 36 Nuno Lucas 2010-05-14 15:28:53 UTC
Ok. But I'm no kernel hacker, so if you could give me a primer on "How to test this patch for dummies" I would be most grateful ;-)
Comment 37 Nuno Lucas 2010-05-14 15:39:32 UTC
I applied the patch by hand. I'll let you know of the results...
Comment 38 Nuno Lucas 2010-05-14 16:14:36 UTC
Created attachment 26383 [details]
Patch rebased for linux2.6 git HEAD

The patch seems to work. I attached my hand-applied patch rebased onto linux-2.6 git HEAD, just for information.

I have to do something else now, but I'll retest later with more time.
Comment 39 Jeff Layton 2010-05-14 16:28:04 UTC
Thanks for testing it. I think there are still a couple of small bugs in the patch, but it's good that it fixed the problem for you.

I talked to Steve F. on the phone and he is opposed to changing the default back to noserverino. That still leaves us a problem though: In this situation, how do we tell that a delete/recreate happened on the server? Unfortunately, with unix extensions, I don't believe that we reasonably can at this time.

I'll respin the patchset with some other changes when I have time.
Comment 40 Jeff Layton 2010-05-14 16:29:30 UTC
I'll go ahead and take this bug back now. Shirish, if you want it back, feel free to take it and run with it...
Comment 41 Nuno Lucas 2010-05-14 17:12:07 UTC
Created attachment 26385 [details]
2.6.34-HEAD + linux2.6 patch  kernel oops

The test was running for some time (more than 30 minutes) and when I stopped it and did a "cd" to another directory, this kernel bug was emitted.

Not sure whether the reason is that it wasn't applied on top of the other changes in the maintainer's tree.
Comment 42 Jeff Layton 2010-05-14 18:00:52 UTC
[ 3280.188367] VFS: Busy inodes after unmount of cifs. Self-destruct in 5 seconds.  Have a nice day...

That's probably the cause of the oops. Icky, but most likely unrelated to this I think.

Can you do this though? Open the kernel module with gdb:

$ gdb /path/to/cifs.ko

...and then do:

(gdb) list *(cifs_writepages+0x35)

and paste the output here? That should tell us where it panicked.
Comment 43 Nuno Lucas 2010-05-14 18:14:16 UTC
The cifs module was compiled built-in, so I can't do that.
Is there some way to do the same thing on the kernel itself?
I looked around but couldn't find a valid kernel object with symbols for gdb to read.
Comment 44 Shirish Pargaonkar 2010-05-14 18:20:10 UTC
(In reply to comment #42)

It's not very clear: did the cifs module oops after the "Busy inodes" message was logged?
Comment 45 Nuno Lucas 2010-05-14 18:27:26 UTC
Yes, but not right away.
Looking at the kernel dmesg I can say that that message was logged some 10 minutes before I stopped the test (with Ctrl+C), did a "cd" (to go to the home dir) and almost right away the oops was emitted.
Comment 46 Nuno Lucas 2010-05-14 18:44:56 UTC
Created attachment 26386 [details]
cifs_writepages() assembly and source

I'm attaching the start of cifs_writepages() and the corresponding source code. I believe it should be enough for someone used to this kind of thing to see where the oops occurred.

I'm not that person, unfortunately.

I have to stop for today. I'll try to help in what I can over the weekend.
Comment 47 Jeff Layton 2010-05-14 18:49:37 UTC
Probably panicked dereferencing this:

	if (cifs_sb->wsize < PAGE_CACHE_SIZE)
		return generic_writepages(mapping, wbc);

Most likely cifs_sb was bogus. Not too surprising since the superblock had been freed. Unfortunately, busy inodes after umount problems are very difficult to track down. What would help most for that is a reliable way to reproduce the problem.
Comment 48 Shirish Pargaonkar 2010-05-14 19:25:18 UTC
Could this be some kind of race condition?
Some inodes off of an unmounted filesystem with pages to be 
written out to the backing store?

I think Jeff is right, 

EIP: [<c110d057>] cifs_writepages+0x35/0x60a SS:ESP 0068:f7885dcc

EAX: 00000400 

2f73:       81 78 14 ff 0f 00 00    cmpl   $0xfff,0x14(%eax)

if (cifs_sb->wsize < PAGE_CACHE_SIZE)
Comment 49 Jeff Layton 2010-05-14 20:03:52 UTC
Maybe. Are we not reliably flushing on close()?

I'll need to take a look and see if I can reproduce it on my more recent kernel.
Comment 50 Jeff Layton 2010-05-14 20:05:47 UTC
Oh, and btw...you can also run gdb on the uncompressed vmlinux image that the kernel build produces...
Comment 51 Jeff Layton 2010-05-14 20:09:34 UTC
Created attachment 26389 [details]
patchset 3

Here's a third patchset that's probably pretty close to what I'll propose upstream. The main change is that it also forces revalidation of the file after the last close. I doubt this will help the reproducer here, but it might.

The default setting of serverino will need to be dealt with and discussed separately. I still think it's the safest course of action, but we should probably take it to the list.

I'll also need to see if I can reproduce this problem myself, especially the busy inodes after umount problem.
Comment 52 Jeff Layton 2010-05-14 20:32:14 UTC
Tested this with an ext4 fs on the server and it ran for a long time. I did get these sorts of errors when it ran concurrently, but it never exited. I suppose they are expected?

./test-cifs.sh: line 15: 2: Invalid argument
./test-cifs.sh: line 15: 1: Invalid argument
./test-cifs.sh: line 15: 1: No such file or directory
cat: 2: No such file or directory
./test-cifs.sh: line 15: 2: Invalid argument
./test-cifs.sh: line 15: 2: Invalid argument
cat: 1: No such file or directory
cat: 2: No such file or directory
cat: 2: No such file or directory
./test-cifs.sh: line 15: 2: No such file or directory
cat: 2: No such file or directory
./test-cifs.sh: line 15: 1: Invalid argument
touch: setting times of `2': Invalid argument
./test-cifs.sh: line 15: 2: No such file or directory
./test-cifs.sh: line 15: 1: No such file or directory
./test-cifs.sh: line 15: 1: No such file or directory
./test-cifs.sh: line 15: 1: No such file or directory

With an ext3 fs on the server, the run lasted a little longer with my latest patchset, but still eventually the script exited. I also tried setting cifs_i->invalid_mapping to true in the last cifs_close for an inode and it still failed. Not sure what to make of that yet.
Comment 53 Nuno Lucas 2010-05-14 23:25:46 UTC
I only tested ext3 for about 5 minutes so it's possible I just didn't wait long enough.

Those errors are normal. I did think of making the script less noisy, but that would make it longer and less readable, and the important thing was for it to stop when it read the wrong file's contents.

Another thing I thought was to write both the value and the pid, so one could know what process created those contents, but the same reasons apply.

Tomorrow I will test ext3 more thoroughly with your third patchset.
While one could dismiss ext2 or reiserfs as unimportant for the majority of "modern" users (although they are important for my use case), ext3 is still a major player out there.
Comment 54 Suresh Jayaraman 2010-05-17 06:28:22 UTC
(In reply to comment #21)
> I have a hunch that I know what this is. Here's my suspicion:
> 
> Most likely the underlying filesystem that samba is serving out is one that
> recycles inode numbers quickly. It ends up in a situation like this:
> 
> There's an inode on the share and client is aware of it and its contents
> server deletes that inode
> client creates an inode, server's filesystem recycles the inode number and
> returns it
> client gets confused and thinks that the inode is the original one that was
> deleted
> 
> Now, one would think that -o noserverino should work around this, but it seems
> like there are a couple of bugs involving that and unix extensions (I'll pass
> along a patch for that in a bit).

Jeff, would fixing the bugs mentioned above make noserverino a little more dependable? Given that recycling of inode numbers is not all that common, could we still suggest that users hitting this sort of problem switch to 'noserverino'? Are there any more issues or gaps?
Comment 55 Suresh Jayaraman 2010-05-17 08:05:26 UTC
(In reply to comment #42)
> [ 3280.188367] VFS: Busy inodes after unmount of cifs. Self-destruct in 5
> seconds.  Have a nice day...
> 
> That's probably the cause of the oops. Icky, but most likely unrelated to this
> I think.
> 

Yes, the 'VFS: Busy inodes' error seems unrelated to this patch. I saw it while trying out test-cifs.sh on a box without Jeff's patchset.
Comment 56 Jeff Layton 2010-05-17 11:25:14 UTC
(In reply to comment #54)

> Jeff, fixing the bugs mentioned above would make serverino little more
> dependable? Given that recycling of inode number is not all that common, we
> still could suggest users hitting this sort of problem to switch to
> 'serverino'? Are there any more issues or gaps..?

I've gone ahead and posted some of the patches to fix the noserverino problems to the list.
serverino has its own problems when inode numbers are recycled quickly. While it's a pain, the sad fact is that a lot of deployments still use ext3, so this is a problem for a large number of existing setups.

I'm still of the opinion that we should revert the default to "noserverino", but we should probably discuss that on-list to see what the right course of action is.
Comment 57 Suresh Jayaraman 2010-05-17 15:11:54 UTC
I think I have a reliable way of reproducing this "VFS: Busy inode error". On a 2.6.34-rc6 kernel (without any of Jeff's recent patches), I could reproduce this with the following steps:

* mount the Samba exported cifs share (unix extensions enabled)
* Run on the server on the share
    while true; do touch 1 2; echo 1 > 1;echo 2 > 2; rm 1 2; done
* Run on the client on the cifs mount point
    while true; do touch 1 2; echo 1 > 1;echo 2 > 2; rm 1 2; done
* Stop the script on the client
* Stop on the server
* cd to a different dir other than mountpoint on the client
* umount the mountpoint on the client
* VFS: Busy inode error appears on the dmesg/syslog

I'm yet to narrow down the cause, will keep looking..
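The loop in the steps above can be sketched as a standalone script (the file name `churn.sh` and the iteration argument are additions for illustration; run it simultaneously in the exported directory on the server and in the CIFS mount on the client):

```shell
#!/bin/sh
# churn.sh -- create, write, and delete two files in a tight loop, as in
# the reproduction steps above for the "Busy inodes" unmount warning.
# Usage: ./churn.sh [iterations]   (default 0 = run until interrupted)
n=${1:-0}
i=0
while [ "$n" -eq 0 ] || [ "$i" -lt "$n" ]; do
    touch 1 2
    echo 1 > 1
    echo 2 > 2
    rm -f 1 2
    i=$((i + 1))
done
```

Running it with a finite count (e.g. `./churn.sh 1000`) makes it easier to stop both sides in a controlled order before attempting the umount.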
Comment 58 Nuno Lucas 2010-05-27 15:15:26 UTC
Any news on this?
I had to give up on a quick fix and worked around the bug on the software side.
But I'm still interested, as even if the workaround works 99% of the time, there is still a race condition that can trigger this bug, and then "bad things" could happen...
Comment 59 Nuno Lucas 2010-05-27 15:21:20 UTC
Updated the kernel version field where this bug applies.
If there is some better syntax to mark multiple kernel versions, please fix it.
Comment 60 Jeff Layton 2011-01-02 12:45:26 UTC
Is this still an issue in recent kernels? We've patched a number of problems in this area in the last few months.
Comment 61 Nuno Lucas 2011-01-03 19:28:21 UTC
I have been running a test for 15 minutes and the bug has not been triggered yet.
Server is running 2.6.36.2 and the client is 2.6.32.27 (Ubuntu 10.04 variant).

Don't have much more time today, but will check this better tomorrow...
Comment 62 Nuno Lucas 2011-01-04 19:07:18 UTC
Results using the reiserfs file system (v3.6):

- Server with kernel 2.6.32-27.49 (Ubuntu 10.04 current stable version)
  Samba server 3.4.7~dfsg-1ubuntu3.2.
  Client with kernel 2.6.36.2 triggers the bug in a few seconds.
- Server with kernel 2.6.36.2 (vanilla kernel)
  Samba server version 3.5.6.
  Client with 2.6.32-27.49 (Ubuntu 10.04) does NOT trigger the bug after
  more than 40 minutes of testing.
- Server with kernel 2.6.35.10 (vanilla kernel)
  Samba server version 3.5.6.
  Client with 2.6.32-27.49 (Ubuntu 10.04) does NOT trigger the bug after
  more than 30 minutes of testing.

And, just to make sure: a server with the 2.6.34.7 kernel does trigger the bug in a few seconds too, as expected.

One thing I noticed with 2.6.34.7 is that if the server starts the test first and then the client, it takes longer to trigger the bug (or never does). If the client starts running the test first, and only afterwards do I start the test on the server, then it is almost guaranteed to trigger (on the client).

I re-tested 2.6.35.10 using this last piece of info and could not reproduce, so it seems 2.6.35.10 does FIX the bug.

Now all that may be lacking is understanding exactly what fixes the bug, so a patch can be made for distros that ship kernels prior to 2.6.35 (like Ubuntu 10.04 LTS and Debian 6.0), but I suppose that is their job.
Comment 63 Jeff Layton 2011-01-04 19:35:11 UTC
Ok, thanks for testing it. I'll close this with a resolution of CODE_FIX. Please reopen if the bug resurfaces and I'll take another look.
Comment 64 Nuno Lucas 2011-01-04 20:12:45 UTC
FYI, I could reproduce the bug with 2.6.35.0, so either I didn't test 2.6.35.10 thoroughly - and, eventually, 2.6.36.2 - or it was fixed by a stable update.
Will follow up on this tomorrow...
Comment 65 Nuno Lucas 2011-01-05 13:46:00 UTC
I did something very wrong while testing this yesterday. I was doing several things at the same time, and it shows...

I can now reproduce in all kernels, including the 2.6.37 version released today.

Will add another comment shortly, with a comprehensive test-case for reproduction.
Comment 66 Nuno Lucas 2011-01-05 14:15:05 UTC
I'm using two machines: one with a standard Ubuntu 10.04 LTS version and another (Intel 945GSE board) with a busybox based minimal system (running directly from memory from the initramfs), including samba version 3.5.7 and, now, with a vanilla 2.6.37 kernel.

I'm sharing a directory located on a disk partition formatted with Reiserfs 3.6.

The relevant "smb.conf" config is:
-------------------------------------------
[GLOBAL]
security = share
show add printer wizard = no
store dos attributes = no
unix charset = ISO-8859-1
dos filemode = yes
preserve case = no
short preserve case = no
map archive = no
map hidden = no
kernel oplocks = yes
nt acl support = no
unix extensions = no

[PDATA]
force user = root
create mode = 0666
guest ok = yes
writable = yes
-------------------------------------------

(1) On the server machine (with the 2.6.37 kernel), start the test script:

root@sdinux ~/data # ./test.sh

(2) Mount the shared folder on the Ubuntu machine (assuming I already created the "xx" directory) with the command:

lucas@fonseca:~/mtn$ sudo mount -t cifs //10.0.0.101/PDATA xx/ -o user=guest,pass=,uid=lucas
lucas@fonseca:~/mtn$ cd xx

(3) Now, on the Ubuntu machine, run the test script:

lucas@fonseca:~/mtn/xx$ ./test.sh

(4) After a very few seconds (if any at all), the Ubuntu client should stop with an output like this:

cat: 1: No such file or directory
cat: 2: No such file or directory
cat: 1: No such file or directory
touch: setting times of `2': No such file or directory
./test.sh: line 15: 2: No such file or directory
cat: 2: No such file or directory
ERROR!!! rnd=2 val=1

If this doesn't happen right away, it can run for hours without triggering the bug, so unmount the folder and retry (go back to step (2)).

Hope this helps!
Comment 67 Jeff Layton 2011-01-05 18:35:45 UTC
A little...

The problem with the testing that you're doing is that you seem to be varying the kernel on the server rather than the client. The server's kernel is really meaningless here. What really matters is whether updating the kernel on the *client* machine makes this behave better.

I've tried reproducing this quite a few times over the last hour or so and have not been able to do so on a 2.6.37 kernel on the client.
Comment 68 Jeff Layton 2011-01-05 18:46:05 UTC
Also, when you run this, do you pretty consistently see a message like this pop?

-----------[snip]------------
CIFS VFS: Autodisabling the use of server inode numbers on \\server\share. This server doesn't seem to support them properly. Hardlinks will not be recognized on this mount. Consider mounting with the "noserverino" option to silence this message.
-----------[snip]------------
Comment 69 Nuno Lucas 2011-01-05 19:03:41 UTC
(had to use a special tool to remove my hand from my forehead)

Sometimes it's better to not have holidays...
My head is still not functioning right on the second day after restarting work...

Ok, sorry for the noise.
Will check this again, as so far I have only concluded that Ubuntu 10.04 is still buggy, as expected...

...

Now I think it's final:
 - Ubuntu 10.04 (2.6.32) as server
 - Linux 2.6.37 as client

Following my earlier test case (with the server and client machines inverted), the client triggers the bug right away.

At least it was easy to confirm, even if 2 days too late...

In relation to your last question: no, I never saw it. dmesg doesn't show it on either machine.
Comment 70 Nuno Lucas 2011-01-05 19:15:02 UTC
One thing I noticed with 2.6.37 is that the bug now triggers relatively fast every time I've tried.
Comment 71 Jeff Layton 2011-01-05 19:47:45 UTC
Well, doubtful we'd have made 2.6.37 with this anyway. I'm still having a hard time understanding what's happening as I'm unable to reproduce it so far.

One more question: With a 2.6.37 kernel on the client, is this reproducible if you do the cifs mount with '-o noserverino' ?
Comment 72 Nuno Lucas 2011-01-06 16:53:54 UTC
I could not trigger the bug when "noserverino" was added, with either 2.6.36.2 or 2.6.37.

I will work on making available the minimal RAM-based system I use for testing (a busybox-based initramfs including Samba and other things -- 17MB compressed).
Right now it assumes a specific hard disk layout, but with some small changes it will work without needing any disk.
Comment 73 Nuno Lucas 2011-01-07 00:19:42 UTC
I've made the minimal system I use for testing available on the web [1].
In that directory listing you will find "vmlinuz" and "initrd.gz" files for it, which you can use via pxeboot or any other way.

The login and password are both "root".

Also available is the config I used to build the kernel.

You can easily run it via qemu with "qemu -kernel vmlinuz -initrd initrd.gz".
It has dropbear (an ssh server) available, but by default it doesn't allow password entry (it runs with "-s"), so you need to kill and restart it to use ssh/scp.

I tested it first on a netbook and could trigger the bug, although, again, I failed to trigger it if I add "noserverino" to the mount options.


[1] http://www.sdilab.pt/tmp/
Comment 74 Jeff Layton 2011-01-07 15:25:22 UTC
Ok, I think I understand what the problem is here. When the client does a lookup (to hook up a filename to an inode), it checks to see if the inode is a duplicate of an existing inode. To do this, it fetches the "uniqueid" of the inode, which is a unique identifier across the scope of a share. Samba generally manufactures this value from the st_ino and st_dev fields in a stat() call.

The problem is that the underlying filesystem quickly reuses inode numbers. To see this, do:

$ touch a; stat a; rm a; touch b; stat b; rm b

...you'll see that 'a' and 'b' end up with the same inode number. This makes it very difficult for the client to tell that these are not the same file across a create/delete cycle.
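Jeff's one-liner can be wrapped into a small check script (a sketch; the filenames are placeholders, and note that it must be run in a directory on the filesystem under test -- /tmp is often tmpfs, which hands out fresh inode numbers):

```shell
#!/bin/sh
# Check whether the current filesystem reuses inode numbers across a
# create/delete cycle. Run it in a directory on the fs you care about.
touch cifs_ino_a
ino_a=$(stat -c '%i' cifs_ino_a)
rm cifs_ino_a
touch cifs_ino_b
ino_b=$(stat -c '%i' cifs_ino_b)
rm cifs_ino_b
if [ "$ino_a" = "$ino_b" ]; then
    echo "inode number reused ($ino_a) -- a client can mistake new files for old ones"
else
    echo "inode numbers differ ($ino_a vs $ino_b)"
fi
```

On reiserfs the script reports a reused inode number; on ramfs/tmpfs the numbers differ.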

In my situation, I find that the client code generally autodisables server inode numbers quickly when I run this reproducer. The client does a QUERY_PATH_INFO call, and then another call to try to get the inode number. Usually I hit a race where the first call works and the second doesn't, because the file has been deleted on the server. This is arguably a bug, but it usually causes server inode numbers to be disabled quickly in my testing.

Now, what may help in this case is this patch, which I've been trying to get the maintainer to take for the last 6 months:

http://git.kernel.org/?p=linux/kernel/git/jlayton/linux.git;a=commitdiff;h=00f1d9a051ec30f11464486e5f250dd9fe442251

...could you test it and let me know if it helps?
Comment 75 Jeff Layton 2011-01-07 16:04:51 UTC
The other problem you likely have is that reiserfs3 only has 1s timestamp granularity. So, if the create/delete cycles all happen within the same second (and that's easily possible), the client can't detect the changes.

You may have better luck with a filesystem that has more granular timestamps (e.g. ext4, xfs, btrfs, etc.). Another alternative is to consider Pavel Shilovsky's "strictcache" patches, which should make 2.6.38. Those bring the semantics of Linux CIFS closer into line with the core CIFS protocol (by disabling client-side caching unless you hold an oplock).
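A quick way to probe the timestamp granularity of a filesystem (a sketch; a sub-second field of all zeros suggests whole-second timestamps, as on reiserfs3 or ext2, while ext4/xfs/btrfs normally show nonzero digits):

```shell
#!/bin/sh
# Print the sub-second part of a fresh file's mtime; stat's %y format
# looks like "2011-01-07 15:25:22.123456789 +0000".
f=$(mktemp granularity.XXXXXX)
subsec=$(stat -c '%y' "$f" | sed -n 's/.*\.\([0-9]*\) .*/\1/p')
echo "sub-second mtime digits: $subsec"
rm -f "$f"
```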
Comment 76 Nuno Lucas 2011-01-07 19:47:30 UTC
I tried 2.6.37 with your patch applied; it doesn't seem to help when "serverino" is enabled (the default).
I still can't trigger the bug with "noserverino".

Your description of the problem makes sense.
On the reiserfs filesystem your small test does indeed show the same inode number for the two files, while on ramfs and tmpfs it doesn't.

I'm inclined to think the fault really lies with the default being changed to "serverino", but several kernel versions later I'm not sure (have no hope) they will change it back, even though it's a regression.

It's the second time we've had to change our software implementation because of cifs regressions; the first was because non-LFS file access was broken, so we had to recompile the software with "large file support" enabled to make it work.
I don't remember the details, but I think it also had to do with the introduction of the "serverino" mount option -- something like we either recompiled the program or forced the option, which is ironic, because if we had done the latter we couldn't opt for "noserverino" now.

This is more difficult because we can never be sure it won't happen again at some point. We only changed the implementation to fix the place where it was frequently causing problems, and hope we don't hit some rare case that consistently fails.

The choice of filesystem is complicated. On compact-flash-based machines ext2 is the only one that minimally reassures us, because we can't be sure of the reliability of the flash wear levelling (we can't be picky with what we get). On disk-based machines we ended up choosing reiserfs, because fsck for ext3 (and I believe ext4) forces a power user to run it interactively to fix more serious errors, and we can't expect to have one in front of the computer.
I was waiting for btrfs to stabilize to replace reiserfs. Maybe now it's the right time...

Anyway, at least it seems "noserverino" does fix the problem now, which didn't with the 2.6.32 kernel.
Comment 77 Jeff Layton 2011-01-07 21:18:40 UTC
Yes, reverting the serverino default would introduce yet another regression (problems dealing with hardlinks). I think the best recommendation we can give is for you to just use noserverino until you can migrate to a more modern filesystem with better timestamp granularity.

Network filesystems (including NFSv2/3 and CIFS) that don't have cache consistency mechanisms built into the protocol are highly dependent on server timestamp granularity. Even with "noserverino" you may still see cache coherency problems if you have multiple clients operating on the same files.

Much like with NFSv2/3, it's possible to get a sequence of events where the file is changed but the timestamps don't appear to change, so the cache is considered valid when it shouldn't be.
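The window is easy to demonstrate locally (a sketch: two writes in the same second leave the whole-second mtime identical, so any client comparing mtimes at 1s granularity sees no change even though the contents differ):

```shell
#!/bin/sh
# Rewrite a file twice in quick succession and compare whole-second
# mtimes (stat %Y); equal values mean the change is invisible to a
# 1-second-granularity timestamp comparison.
f=$(mktemp)
echo one > "$f"
t1=$(stat -c '%Y' "$f")
echo two > "$f"
t2=$(stat -c '%Y' "$f")
echo "mtime before=$t1 after=$t2"
if [ "$t1" = "$t2" ]; then
    echo "change invisible at 1-second granularity"
fi
rm -f "$f"
```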

I'll go ahead and close this as resolved. Please reopen it if you want to discuss it further.
