There is a bad regression of the CIFS driver on 2.6.32 kernels (tested 2.6.32.11, 2.6.32.12 and Ubuntu 10.04 LTS 2.6.32-22-generic). The version of the server doesn't seem to matter (tested 2.6.27, 2.6.32 and Ubuntu 8.04 LTS 2.6.24-27-generic), as long as the client is 2.6.32-something. Empty files created on the server become non-empty (with the contents of some earlier written file) when read by the client. This happens both with and without lockf(). The following bash script shows the problem:

-----------< test-cifs.sh >------------------------
#!/bin/bash

while [ 1 ]; do
    rnd=$(( $RANDOM % 2 + 1 ))

    if [ -f $rnd ]; then
        v=$( cat $rnd )
        if [ -n "$v" -a "$v" != "$rnd" ]; then
            echo "ERROR!!! rnd=$rnd val=$v"
            exit -1
        fi
        rm -f $rnd
    else
        touch $rnd
        echo $rnd > $rnd
    fi
done
-----------< test-cifs.sh >------------------------

When this script runs both on the server and on the client on the same shared directory, after a few seconds (sometimes almost right away) the client will exit with the error. In a nutshell, the script creates a random (1 or 2) empty file if it doesn't exist, otherwise it makes sure its contents are either empty or the same as the file name; after the check it removes the file. The script running on the server will never fail, as expected, but the client will sometimes see the file with the wrong contents. This only occurs when the client is running 2.6.32 (.11, .12 or the Ubuntu 10.04 one); any other kernel version I tested works as expected. A workaround we found for this problem is to never delete the file, just truncate its size to zero; this way the problem doesn't show. Replacing 'rm -f $rnd' with 'echo -n "" > $rnd' is the script equivalent of this workaround. I haven't tested 2.6.33 yet because we will have to work with 2.6.32 for the time being, so it's not a solution for me. I'm available for any further information.
Thanks for the bug report. Does this problem go away if you mount with '-o noserverino' ?
No. "noserverino" doesn't make any difference. For the record, I'm mounting a public share with the simple command:

mount -t cifs //<IP-ADDRESS>/<SHARE> dir/ -o pass=

I only pass "pass=" to avoid the password prompt. No other mount option I tried made a difference, and as I tried on different systems (minimal busybox based system, Ubuntu 8.04, 9.10 & 10.04), I don't think the samba version matters either.
Shirish, do you have some time to look at this? I sort of suspect that this may be related to the create on lookup stuff...
Jeff, I need to spend the next two hours on a stop-ship issue; the old problem you had fixed ("Busy inodes after unmount of cifs") seems to be back. I will spend time on this after that meeting is over and update this bug.
Just for information, as I also tested this using Ubuntu 10.04 I opened bug 577031 on their launchpad [1]. [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/577031
Looking. Not sure if you have access to a non-unix server like a Windows server to recreate this problem (I will try that myself too). You can also see whether the problem recreates against a Samba server when you disable unix extensions (and I will try that myself too):

echo 0 > /proc/fs/cifs/LinuxExtensionsEnabled

before you mount a Samba server.
(Seems like bugzilla is acting strange today.) I tried your suggestion with the proc LinuxExtensionsEnabled value and it works! At least for the 10-15 minutes I let it run before stopping it. But I have to say that my minimalistic samba server where I saw the error was already configured with Linux extensions disabled. Could the client be using Linux extensions even if the server refused? I don't have a windows machine here, and only on Monday will I be able to test this with one. The test was made with an Ubuntu 9.10 machine acting as server and Ubuntu 10.04 (the one with the 2.6.32 kernel) acting as the client (after the proc change). I'm going to let the test run for some more time to make sure it works, but you seem to have nailed something...
The test ran for 40 minutes with no error, with the proc setting.
(In reply to comment #7)
> But I have to say that my minimalistic samba server were I saw the error was
> already configured with Linux extensions disabled.

Have you disabled unix extensions on the Samba server in the global section?

> Could the client be using
> Linux Extensions even if the server refused?

No. Can you verify that unix extensions are indeed turned off in the capabilities field of the negotiate protocol response from the server, and in the subsequent session setup request from the client?
I checked my config and it seems that at some point I removed that configuration option and forgot about it. Sorry about that.
I retested two machines with 2.6.32.11. Both had "/proc/fs/cifs/LinuxExtensionsEnabled" set to 0, and the samba configuration included "unix extensions = no". The script failed twice right away, as before, but on the third try it kept running until I stopped it. I tried again, unmounting and remounting the share. This time I didn't do an "ls" on the shared dir first, and now it failed 4 or 5 times and then kept running for a long time until I stopped it. If I stop it and re-run without unmounting, it seems to continue without error. So it seems the "unix extensions" configuration isn't the real culprit, and maybe the "stat" system call has some effect on it. I think I had done several directory listings on the share before running the earlier tests, and that made the error go away. Any ideas on what to test next?
I copied the script/testcase you listed into a file, mounted a share from a Samba server (Version 3.4.0-GIT-1dadf17-devel), cd'ed to the mount point and executed the script. It has been running for nearly three hours now, generating files 1 and 2. No errors, i.e. the script has not stopped running.
I am running with linux extensions enabled both on the server and the client. But can you confirm that the smb.conf option "unix extensions = no" is in the global section and not in a stanza of this particular share? Also, it would be useful to double-check that unix extensions are indeed off in the packet exchange, e.g. in a wireshark trace.
I can confirm that "unix extensions=no" is set on the "[global]" section. I will set up wireshark to check the packet exchange. Just to confirm something: are you running the script BOTH on the client and the server? If it's running only on the client it will not stop.
Yes, the unix extensions bits are off. And I was running the script only on the client; I should be running it on both. Will resume this problem recreation/debugging tomorrow.
It did not take too long: when running on both machines, client and server, the script on the client stopped in no time. Will resume debugging later today/tomorrow.
Did some more testing and can add that this bug also applies to 2.6.31.13, 2.6.32.13 and 2.6.33.4. The kernel I know doesn't have it is 2.6.27.46. I will narrow down the exact version where it appeared, but probably only tomorrow.
Didn't have to test much more. 2.6.30.10 doesn't show the bug. So something in 2.6.31 broke the driver. I'll test 2.6.31.0 next just to make sure it wasn't a stable update on 2.6.31.
Confirmed. 2.6.31.0 shows the bug. I have never done a git bisection test before, so I hope this narrows down the problem enough to solve this.
Reassigning to Shirish while he works on this.
I have a hunch that I know what this is. Here's my suspicion:

Most likely the underlying filesystem that samba is serving out is one that recycles inode numbers quickly. It ends up in a situation like this:

- There's an inode on the share, and the client is aware of it and its contents
- The server deletes that inode
- The client creates an inode; the server's filesystem recycles the inode number and returns it
- The client gets confused and thinks that the inode is the original one that was deleted

Now, one would think that -o noserverino should work around this, but it seems like there are a couple of bugs involving that and unix extensions (I'll pass along a patch for that in a bit). Also, one would think that this shouldn't be a problem if the filesystem being served has fine-grained timestamps.

Nuno, can you tell us what sort of local filesystem the server is serving out? ext3? ext4? xfs? something else?
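To check whether a server filesystem hands freed inode numbers straight back, a rough probe along these lines could be run in the exported directory. This is just a sketch; the file names are arbitrary and it assumes GNU coreutils stat (the `-c %i` format).

```shell
#!/bin/sh
# Rough probe: does this filesystem recycle the inode number of a
# just-deleted file on the next create? Run inside the directory
# that Samba exports. Requires GNU stat for '-c %i'.
touch probe-a
ino_a=$(stat -c %i probe-a)
rm -f probe-a
touch probe-b
ino_b=$(stat -c %i probe-b)
rm -f probe-b
if [ "$ino_a" = "$ino_b" ]; then
    echo "inode number $ino_a was recycled immediately"
else
    echo "inode numbers differ: $ino_a vs $ino_b"
fi
```

On a filesystem that recycles numbers immediately, a CIFS client still holding the old inode cached can mistake the freshly created file for the deleted one, which is exactly the confusion described above.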
Oddly enough, we could probably do a better job with this against windows. Windows has a proper create time on each file, and we could conceivably use that as a way to verify the "uniqueness" of the inode. This might be the death-knell for keeping serverino the default. I don't see a good way to reliably detect this situation given what the protocol provides.
Server file systems tested were ext2 (1GB IDE compact flash), ext3 or reiserfs (the last ones for real rotating disks). The one I tested most as server was reiserfs (not reiser4).
Now that I think about it more, I don't think I actually tested ext3. So it was either ext2 or reiserfs.
Testing against something like jfs or ext4 would be useful too.
Ok, I think both of those filesystems have 1s timestamp granularity. As Shirish says, it might be an interesting datapoint to know whether something with sub-second timestamps (ext4, xfs, etc.) fares better here.
I can't easily test ext4 in the machine I am right now (Ubuntu 8.04 doesn't support ext4 yet) but I will try xfs and jfs and post the results.
XFS doesn't seem to suffer from the problem. I only let it run for 5 minutes, but in all the other cases it stopped right away. JFS also doesn't seem to suffer from the problem: another 5 minutes without stopping. EXT3 also didn't stop, so I guess I had actually never tested it before. I then retested with ext2, just to make sure it wasn't something I did differently this time. It stopped as expected.
Interesting. ext3 also has 1s timestamp granularity, so it seems likely that the inode number allocation scheme might be different between it and ext2. That could also affect the outcome here.
Created attachment 26380 [details] patch -- fix noserverino handling when unix extensions are enabled These two patches should fix the handling of the "noserverino" option when unix extensions are enabled. Nuno, could you test this patch with your reproducer? The share will most likely need to be mounted with the '-o noserverino' option in order for it to help.
The patch doesn't apply on 2.6.32 or 2.6.33 so I assume it's 2.6.34 material. My current kernel build scripts don't accept rc kernels so give me some time to do it manually ;-)
I "git cloned" the kernel with:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

and tried to apply the patch with:

$ patch -p1 --dry-run -i /home/lucas/Desktop/cifs-unix-noserverino.patch
patching file fs/cifs/inode.c
patching file fs/cifs/cifsproto.h
Hunk #1 FAILED at 110.
1 out of 1 hunk FAILED -- saving rejects to file fs/cifs/cifsproto.h.rej
patching file fs/cifs/dir.c
Hunk #1 FAILED at 248.
1 out of 1 hunk FAILED -- saving rejects to file fs/cifs/dir.c.rej
patching file fs/cifs/inode.c
Hunk #1 succeeded at 170 (offset 1 line).
Hunk #2 succeeded at 334 (offset 1 line).
Hunk #3 succeeded at 1210 (offset 18 lines).

Am I doing something weird? "git log" shows:

$ git log
commit 6a251b0ab67989f468f4cb65179e0cf40cf8c295
Merge: 9e766d8... 5051d41...
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu May 13 14:48:10 2010 -0700

    Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
      mfd: Clean up after WM83xx AUXADC interrupt if it arrives late
Created attachment 26381 [details] patchset 2 -- fix noserverino handling when unix extensions are enabled Here's the patchset I think we should consider for upstream. It fixes some problems with noserverino handling when unix extensions are disabled, and disables server inode numbers by default. I hate to go full circle on this, but given the problems we've seen with leaving them enabled I don't think we can reasonably keep them enabled by default. Thoughts?
Created attachment 26382 [details] patchset 2 -- fix noserverino handling when unix extensions are enabled Oops, uploaded the wrong one. This one should be correct.
Ahh, the patch is against the tip of Steve French's tree (he's the kernel CIFS maintainer). I believe there are some commits in his tree that aren't yet in Linus'.
Ok. But I'm no kernel hacker, so if you could give me a primer on "How to test this patch for dummies" I would be most grateful ;-)
I applied the patch by hand. I'll let you know of the results...
Created attachment 26383 [details] Patch rebased for linux2.6 git HEAD The patch seems to work. I attached my hand made patch applied to linux2.6 git HEAD, just for information. I have to do something else now, but I'll latter retest with more time.
Thanks for testing it. I think there are still a couple of small bugs in the patch, but it's good that it fixed the problem for you. I talked to Steve F. on the phone and he is opposed to changing the default back to noserverino. That still leaves us a problem though: In this situation, how do we tell that a delete/recreate happened on the server? Unfortunately, with unix extensions, I don't believe that we reasonably can at this time. I'll respin the patchset with some other changes when I have time.
I'll go ahead and take this bug back now. Shirish, if you want it back, feel free to take it and run with it...
Created attachment 26385 [details] 2.6.34-HEAD + linux2.6 patch kernel oops

The test ran for some time (more than 30 minutes), and when I stopped it and did a "cd" to another directory, this kernel oops was emitted. Not sure if the reason is that the patch wasn't applied on top of the other changes in the maintainer's tree.
[ 3280.188367] VFS: Busy inodes after unmount of cifs. Self-destruct in 5 seconds. Have a nice day...

That's probably the cause of the oops. Icky, but most likely unrelated to this, I think. Can you do this though? Open the kernel module with gdb:

$ gdb /path/to/cifs.ko

...and then do:

(gdb) list *(cifs_writepages+0x35)

...and paste the output here? That should tell us where it panicked.
The cifs module was compiled built-in, so can't do it. Is there some way to do the same thing on the kernel? I looked around but couldn't find a valid kernel object with symbols for gdb to read.
(In reply to comment #42) Not very clear: did the cifs module oops after the "Busy inodes" message was logged?
Yes, but not right away. Looking at the kernel dmesg I can say that that message was logged some 10 minutes before I stopped the test (with Ctrl+C), did a "cd" (to go to the home dir) and almost right away the oops was emitted.
Created attachment 26386 [details] cifs_writepages() assembly and source I'm attaching start of cifs_writepages() and corresponding source code. I believe it should be enough for someone used to this kind of things to see where the oops occurred. I'm not that person, unfortunately. I have to stop for today. I'll try to help in what I can over the weekend.
Probably panicked dereferencing this:

	if (cifs_sb->wsize < PAGE_CACHE_SIZE)
		return generic_writepages(mapping, wbc);

Most likely cifs_sb was bogus. Not too surprising since the superblock had been freed. Unfortunately, busy inodes after umount problems are very difficult to track down. What would help most for that is a reliable way to reproduce the problem.
Could this be some kind of race condition? Some inodes off an unmounted filesystem with pages still to be written out to the backing store? I think Jeff is right:

EIP: [<c110d057>] cifs_writepages+0x35/0x60a SS:ESP 0068:f7885dcc
EAX: 00000400

2f73:	81 78 14 ff 0f 00 00	cmpl $0xfff,0x14(%eax)
	if (cifs_sb->wsize < PAGE_CACHE_SIZE)
Maybe. Are we not reliably flushing on close()? I'll need to take a look and see if I can reproduce it on my more recent kernel.
Oh, and btw...you can also run gdb on the uncompressed vmlinux image that the kernel build produces...
Created attachment 26389 [details] patchset 3 Here's a third patchset that's probably pretty close to what I'll propose upstream. The main change is that it also forces revalidation of the file after the last close. I doubt this will help the reproducer here, but it might. The default setting of serverino will need to be dealt with and discussed separately. I still think it's the safest course of action, but we should probably take it to the list. I'll also need to see if I can reproduce this problem myself, especially the busy inodes after umount problem.
Testing this with an ext4 fs on the server, and it ran for a long time. I did get these sorts of errors when it ran concurrently, but it never exited. I suppose that they are expected?

./test-cifs.sh: line 15: 2: Invalid argument
./test-cifs.sh: line 15: 1: Invalid argument
./test-cifs.sh: line 15: 1: No such file or directory
cat: 2: No such file or directory
./test-cifs.sh: line 15: 2: Invalid argument
./test-cifs.sh: line 15: 2: Invalid argument
cat: 1: No such file or directory
cat: 2: No such file or directory
cat: 2: No such file or directory
./test-cifs.sh: line 15: 2: No such file or directory
cat: 2: No such file or directory
./test-cifs.sh: line 15: 1: Invalid argument
touch: setting times of `2': Invalid argument
./test-cifs.sh: line 15: 2: No such file or directory
./test-cifs.sh: line 15: 1: No such file or directory
./test-cifs.sh: line 15: 1: No such file or directory
./test-cifs.sh: line 15: 1: No such file or directory

With an ext3 fs on the server, the run lasted a little longer with my latest patchset, but the script still eventually exited. I also tried setting cifs_i->invalid_mapping to true in the last cifs_close for an inode, and it still failed. Not sure what to make of that yet.
I only tested ext3 for about 5 minutes, so it's possible I just didn't wait long enough. Those errors are normal. I did think of making the script less noisy, but that would make it longer and less readable, and the important thing was for it to stop when the contents of the wrong file were read. Another thing I thought of was to write both the value and the pid, so one could know which process created the contents, but the same reasons apply. Tomorrow I will test ext3 more thoroughly with your third patchset. While one could dismiss ext2 or reiserfs as unimportant for the majority of "modern" users (although they are important for my use case), ext3 is still a major player out there.
(In reply to comment #21)
> I have a hunch that I know what this is. Here's my suspicion:
>
> Most likely the underlying filesystem that samba is serving out is one that
> recycles inode numbers quickly. It ends up in a situation like this:
>
> There's an inode on the share and client is aware of it and its contents
> server deletes that inode
> client creates an inode, server's filesystem recycles the inode number and
> returns it
> client gets confused and thinks that the inode is the original one that was
> deleted
>
> Now, one would think that -o noserverino should work around this, but it
> seems like there are a couple of bugs involving that and unix extensions
> (I'll pass along a patch for that in a bit).

Jeff, would fixing the bugs mentioned above make serverino a little more dependable? Given that recycling of inode numbers is not all that common, could we still suggest that users hitting this sort of problem switch to 'serverino'? Are there any more issues or gaps?
(In reply to comment #42)
> [ 3280.188367] VFS: Busy inodes after unmount of cifs. Self-destruct in 5
> seconds. Have a nice day...
>
> That's probably the cause of the oops. Icky, but most likely unrelated to
> this I think.

Yes, the "VFS: Busy inodes" error seems unrelated to this patch. I saw it while trying out test_cifs.sh on a box without Jeff's patchset.
(In reply to comment #54)
> Jeff, fixing the bugs mentioned above would make serverino little more
> dependable? Given that recycling of inode number is not all that common, we
> still could suggest users hitting this sort of problem to switch to
> 'serverino'? Are there any more issues or gaps..?

I've gone ahead and posted some of the patches to fix noserverino problems to the list. serverino has its own problems when inode numbers are recycled quickly. While it's a pain, the sad fact is that a lot of existing deployments still use ext3, so this is a problem for a large number of them. I'm still of the opinion that we should revert the default to "noserverino", but we should probably discuss that on-list to see what the right course of action is.
I think I have a reliable way of reproducing this "VFS: Busy inodes" error. On a 2.6.34-rc6 kernel (without any of Jeff's recent patches), I could reproduce it with the following steps:

* mount the Samba exported cifs share (unix extensions enabled)
* Run on the server on the share:
  while true; do touch 1 2; echo 1 > 1; echo 2 > 2; rm 1 2; done
* Run on the client on the cifs mount point:
  while true; do touch 1 2; echo 1 > 1; echo 2 > 2; rm 1 2; done
* Stop the script on the client
* Stop it on the server
* cd to a different dir other than the mountpoint on the client
* umount the mountpoint on the client
* The "VFS: Busy inodes" error appears in dmesg/syslog

I'm yet to narrow down the cause, will keep looking.
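The client-side portion of the steps above can be condensed into one script. This is only a sketch: the mount point variable is hypothetical, and the loop is bounded here so it terminates on its own (the original uses `while true` and is stopped manually).

```shell
#!/bin/sh
# Client side of the busy-inodes reproducer sketched above.
# MNT is a hypothetical cifs mount point; it defaults to the
# current directory so the file churn itself runs anywhere.
MNT=${MNT:-.}
cd "$MNT" || exit 1
i=0
while [ $i -lt 200 ]; do
    touch 1 2
    echo 1 > 1
    echo 2 > 2
    rm -f 1 2
    i=$((i + 1))
done
# On a real run: stop this loop (and the server-side one), cd out
# of $MNT, then umount it and watch dmesg for "VFS: Busy inodes".
```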
Any news on this? I had to give up on a quick fix and work around the bug on the software side. But I'm still interested: even if the workaround works 99% of the time, there is still a race condition that can trigger this bug, and then "bad things" could happen...
Updated the kernel version field with the versions where this bug applies. If there is some better syntax to mark multiple kernel versions, please fix it.
Is this still an issue in recent kernels? We've patched a number of problems in this area in the last few months.
I have been running a test for 15 minutes and the bug has not been triggered yet. The server is running 2.6.36.2 and the client 2.6.32.27 (Ubuntu 10.04 variant). I don't have much more time today, but will check this better tomorrow...
Results using the reiserfs file system (v3.6):

- Server with kernel 2.6.32-27.49 (Ubuntu 10.04 current stable version), Samba server 3.4.7~dfsg-1ubuntu3.2. A client with kernel 2.6.36.2 triggers the bug in a few seconds.
- Server with kernel 2.6.36.2 (vanilla kernel), Samba server version 3.5.6. A client with 2.6.32-27.49 (Ubuntu 10.04) does NOT trigger the bug after more than 40 minutes of testing.
- Server with kernel 2.6.35.10 (vanilla kernel), Samba server version 3.5.6. A client with 2.6.32-27.49 (Ubuntu 10.04) does NOT trigger the bug after more than 30 minutes of testing.

And, just to make sure: a server with the 2.6.34.7 kernel does trigger the bug in a few seconds too, as expected.

One thing I noticed with 2.6.34.7 is that if the server starts the test first and then the client, it takes longer to trigger the bug (or never does). If the client starts running the test first and only then do I start the test on the server, it is almost guaranteed to trigger (on the client). I re-tested 2.6.35.10 using this last piece of info and could not reproduce it, so it seems 2.6.35.10 does FIX the bug.

Now all that may be lacking is understanding exactly what fixes the bug, so a patch can be made for distros that ship kernels prior to 2.6.35 (like Ubuntu 10.04 LTS and Debian 6.0), but I suppose that is their job.
Ok, thanks for testing it. I'll close this with a resolution of CODE_FIX. Please reopen if the bug resurfaces and I'll take another look.
FYI, I could reproduce the bug with 2.6.35.0, so either I didn't test 2.6.35.10 thoroughly (and, eventually, 2.6.36.2) or it was fixed by a stable update. Will follow up on this tomorrow...
I did something very wrong while testing this yesterday. I was doing several things at the same time, and it shows... I can now reproduce the bug on all kernels, including the 2.6.37 version released today. Will add another comment shortly, with a comprehensive test case for reproduction.
I'm using two machines: one with a standard Ubuntu 10.04 LTS install and another (Intel 945GSE board) with a busybox based minimal system (running directly from memory from the initramfs), including samba version 3.5.7 and, now, a vanilla 2.6.37 kernel. I'm sharing a directory located in a disk partition formatted with Reiserfs 3.6. The relevant "smb.conf" config is:

-------------------------------------------
[GLOBAL]
    security = share
    show add printer wizard = no
    store dos attributes = no
    unix charset = ISO-8859-1
    dos filemode = yes
    preserve case = no
    short preserve case = no
    map archive = no
    map hidden = no
    kernel oplocks = yes
    nt acl support = no
    unix extensions = no

[PDATA]
    force user = root
    create mode = 0666
    guest ok = yes
    writable = yes
-------------------------------------------

(1) On the server machine (with the 2.6.37 kernel), start the test script:

root@sdinux ~/data # ./test.sh

(2) Mount the shared folder on the Ubuntu machine (assuming I already created the "xx" directory):

lucas@fonseca:~/mtn$ sudo mount -t cifs //10.0.0.101/PDATA xx/ -o user=guest,pass=,uid=lucas
lucas@fonseca:~/mtn$ cd xx

(3) Now, on the Ubuntu machine, run the test script:

lucas@fonseca:~/mtn/xx$ ./test.sh

(4) After a very few seconds (if any at all), the Ubuntu client should stop with output like this:

cat: 1: No such file or directory
cat: 2: No such file or directory
cat: 1: No such file or directory
touch: setting times of `2': No such file or directory
./test.sh: line 15: 2: No such file or directory
cat: 2: No such file or directory
ERROR!!! rnd=2 val=1

If this doesn't happen right away, it can work for hours without triggering the bug, so unmount the folder and retry (go back to step (2)).

Hope this helps!
A little... The problem with the testing that you're doing is that you seem to be varying the kernel on the server rather than on the client. The server's kernel is really meaningless here; what matters is whether updating the kernel on the *client* machine makes this behave better. I've tried reproducing this quite a few times over the last hour or so and have not been able to do so with a 2.6.37 kernel on the client.
Also, when you run this, do you pretty consistently see a message like this pop?

-----------[snip]------------
CIFS VFS: Autodisabling the use of server inode numbers on \\server\share.
This server doesn't seem to support them properly. Hardlinks will not be
recognized on this mount. Consider mounting with the "noserverino" option
to silence this message.
-----------[snip]------------
(I had to use a special tool to remove my hand from my forehead.) Sometimes it's better not to have holidays... My head is still not functioning right on the second day back at work... Ok, sorry for the noise. I will check this again, as all I had concluded was that Ubuntu 10.04 is still buggy, as expected...

Now I think it's final:

- Ubuntu 10.04 (2.6.32) as server
- Linux 2.6.37 as client

Following my earlier test case (with the server and client machines inverted), the client triggers the bug right away. At least it was easy to confirm, even if 2 days too late... In relation to your last question: no, I never saw that message. dmesg doesn't show it on either machine.
One thing I noticed with 2.6.37 is that the bug now triggers relatively fast every time I tried.
Well, it's doubtful we'd have made 2.6.37 with this anyway. I'm still having a hard time understanding what's happening, as I'm unable to reproduce it so far. One more question: with a 2.6.37 kernel on the client, is this reproducible if you do the cifs mount with '-o noserverino'?
I could not trigger the bug when "noserverino" was added, with either 2.6.36.2 or 2.6.37. I will work on making the minimal ram based system I use for testing available (a busybox based initramfs including Samba and other things; 17MB compressed). Right now it assumes a special hard disk layout, but with some small changes it will work without needing any disk.
I made the minimal system I use for testing available on the web [1]. In that directory listing you will find "vmlinuz" and "initrd.gz" files, which you can use via pxeboot or any other way. The login and password are "root". Also available is the config I used to build the kernel. You can easily run it via qemu with:

qemu -kernel vmlinuz -initrd initrd.gz

It has dropbear (an ssh server) available, but by default it doesn't allow password entry (it runs with "-s"), so you need to kill and restart it to use ssh/scp. I tested it first on a netbook and could trigger the bug, though again I failed to trigger it if I add "noserverino" to the mount options.

[1] http://www.sdilab.pt/tmp/
Ok, I think I understand what the problem is here. When the client does a lookup (to hook up a filename to an inode), it checks to see if the inode is a duplicate of an existing inode. To do this, it fetches the "uniqueid" of the inode, which is a unique identifier across the scope of a share. Samba generally manufactures this value from the st_ino and st_dev fields of a stat() call.

The problem here is that the underlying filesystem quickly reuses inode numbers. To see this, do:

$ touch a; stat a; rm a; touch b; stat b; rm b

...and you'll see that 'a' and 'b' end up with the same inode number. This makes it very difficult for the client to tell that these are not the same file after a delete/create cycle.

In my situation, I find that the client code generally autodisables server inode numbers quickly when I run this reproducer. The client does a QUERY_PATH_INFO call, and then another call to try to get the inode number. Usually, I hit a race where the first call works and the second doesn't because the file has been deleted on the server. This is arguably a bug, but it usually causes server inode numbers to be quickly disabled in my testing.

Now, what may help in this case is this patch, which I've been trying to get the maintainer to take for the last 6 months:

http://git.kernel.org/?p=linux/kernel/git/jlayton/linux.git;a=commitdiff;h=00f1d9a051ec30f11464486e5f250dd9fe442251

...could you test it and let me know if it does?
The other problem you likely have is that reiserfs3 only has 1s timestamp granularity. So, if the create/delete cycles all happen within the same second (and that's easily possible) then the client can't detect changes. You may have better luck with a filesystem that has more granular timestamp capability (e.g. ext4, xfs, btrfs, etc.). Another alternative is to consider using Pavel Shilovsky's "strictcache" patches that should make 2.6.38. Those bring the semantics of Linux CIFS closer into line with the core cifs protocol (by disabling client-side caching unless you hold an oplock).
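A quick way to see what timestamp granularity a filesystem records is to print a fresh file's mtime with full precision. This is a sketch assuming GNU coreutils stat (`%y` is its human-readable modification-time format); the probe file name is arbitrary.

```shell
#!/bin/sh
# Print a new file's mtime with full precision. On a filesystem with
# 1s granularity (ext3, reiserfs3) the fractional part prints as all
# zeros; ext4/xfs/btrfs normally show real sub-second digits.
touch ts-probe
mtime=$(stat -c '%y' ts-probe)
rm -f ts-probe
echo "$mtime"
```

If the fractional part is always zero, two writes landing in the same second are indistinguishable by timestamp, which is exactly the cache-validation gap described above.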
I tried 2.6.37 with your patch applied; it doesn't seem to help when "serverino" is enabled (the default). I still can't trigger the bug with "noserverino". Your description of the problem makes sense: on the reiserfs file system your small test does indeed show the same inode number for the two files, while on ramfs and tmpfs it doesn't.

I'm inclined to think the fault really lies with the default being changed to "serverino", but several kernel versions later I'm not sure (have no hope) they will change it back, even if it's a regression. It's the second time we have had to change our software because of cifs regressions; the first was because non-LFS file access was broken, so we had to recompile the software with "large file support" enabled to make it work. I don't remember the details, but I think it also had to do with the introduction of the "serverino" mount option: something like we either recompiled the program or forced the option, which is ironic, as if we had done the latter we couldn't opt for "noserverino" now.

This is more difficult because we can never be sure it won't happen again. We only changed the implementation to fix the place where it was frequently causing problems, and hope we don't have some rare cases that consistently fail.

The choice of file system is complicated. On compact flash based machines, ext2 is the only one that minimally reassures us, because we can't be sure of the reliability of the flash wear (we can't be picky about what we get). On disk based machines we ended up choosing reiserfs, because fsck for ext3 (and I believe ext4) forces a power user to run it to fix more serious errors, and we can't count on being in front of the computer. I was waiting for btrfs to stabilize before replacing reiserfs; maybe now is the right time...

Anyway, at least it seems "noserverino" does fix the problem now, which it didn't with the 2.6.32 kernel.
Yes, reverting the serverino default would introduce yet another regression (problems dealing with hardlinks). I think the best recommendation we can give is for you to just use noserverino until you can migrate to a more modern filesystem with better timestamp granularity. Network filesystems (including NFSv2/3 and CIFS) that don't have cache consistency mechanisms built into the protocol are highly dependent on server timestamp granularity. Even with "noserverino" you may still see cache coherency problems if you have multiple clients operating on the same files. Much like with NFSv2/3, it's possible to get a sequence of events where the file will be changed but the timestamps won't appear to have changed and the cache will be considered valid when it shouldn't be. I'll go ahead and close this as resolved. Please reopen it if you want to discuss it further.