I've tested this on various Debian and Ubuntu kernels, 2.6.24 - 2.6.29, 32 and 64 bit, and it consistently reproduces. So far it does not reproduce on older kernels such as Debian's 2.6.18 or RedHat 5's 2.6.16.

The issue: when NFS-exporting multiple snapshots of an LVM volume, NFS clients receive the contents of a different snapshot than the one mounted on the export they requested. In fact, all of them receive the contents of the *same snapshot*, regardless of which export they mount.

It may be best explained by example. I take 3 snapshots "snap1", "snap2" and "snap3" of an LV (no special options of any kind), then mount and export the 3 snaps individually:

/nfs/snap1 *(rw,no_root_squash,subtree_check,async)
/nfs/snap2 *(rw,no_root_squash,subtree_check,async)
/nfs/snap3 *(rw,no_root_squash,subtree_check,async)

Mount all three exports on a client:

root@poolshark:/mnt# df -k | egrep 'snap[1-3]$'
xamot:/nfs/snap1        516096     16768    473088   4% /mnt/snap1
xamot:/nfs/snap2        516096     16768    473088   4% /mnt/snap2
xamot:/nfs/snap3        516096     16768    473088   4% /mnt/snap3
root@poolshark:/mnt#

Next, on the client, write some data to one of the three snaps:

root@poolshark:/mnt# dd if=/dev/zero bs=1024k count=50 of=/mnt/snap1/outfile
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.565292 s, 92.7 MB/s
root@poolshark:/mnt#

Automagically, data written to any one of the NFS mounts shows up on either *all* of them or *none*. Note that the Used/Avail sizes have changed on all three:

root@poolshark:/mnt# df -k | egrep 'snap[1-3]$'
xamot:/nfs/snap1        516096     68096    421888  14% /mnt/snap1
xamot:/nfs/snap2        516096     68096    421888  14% /mnt/snap2
xamot:/nfs/snap3        516096     68096    421888  14% /mnt/snap3
root@poolshark:/mnt#

Just to be sure, md5 the output file:

f1c9645dbc14efddc7d8a322685f26eb  /mnt/snap1/outfile
f1c9645dbc14efddc7d8a322685f26eb  /mnt/snap2/outfile
f1c9645dbc14efddc7d8a322685f26eb  /mnt/snap3/outfile

Yep, same file.
Check from the server side:

root@xamot:/nfs# ls -l snap*/outfile
-rw-r--r-- 1 root root 10485760 2009-04-04 09:37 snap1/outfile
root@xamot:/nfs#

So the client mounted /nfs/snap1, /nfs/snap2 and /nfs/snap3, but it is apparently receiving /nfs/snap1 on all three mounts. Check the server again, writing a marker file to each of the three snaps:

root@xamot:/nfs# touch snap1/this-is-snap1
root@xamot:/nfs# touch snap2/this-is-snap2
root@xamot:/nfs# touch snap3/this-is-snap3

Check the client:

root@poolshark:/mnt# find /mnt/snap*/this*
/mnt/snap1/this-is-snap1
/mnt/snap2/this-is-snap1
/mnt/snap3/this-is-snap1
root@poolshark:/mnt#

And that's it. Behavior of the snapshots is completely normal on the server, i.e. they are separate block devices as expected, nothing unusual. The weirdness only occurs when the snaps are NFS-exported.

Thanks in advance
-Mike
Have you tried using the 'fsid' export option? The point is that the snapshots will (by definition) have the same filesystem UUID, and so by default the NFS server will construct filehandles with the same filesystem identifier for all three exports. If you explicitly set

/nfs/snap1 *(rw,no_root_squash,subtree_check,async,fsid=1)
/nfs/snap2 *(rw,no_root_squash,subtree_check,async,fsid=2)
/nfs/snap3 *(rw,no_root_squash,subtree_check,async,fsid=3)

then I'd expect this kind of setup to work correctly.
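The failure mode described above can be illustrated with a toy model (Python here, purely illustrative -- the real nfsd filehandle code is in C and considerably more involved): if the server keys its export lookup on the filesystem identifier, and identical-UUID snapshots all register under the same key, then the first export registered under that key wins for every mount.

```python
# Toy model (NOT real nfsd code): resolve incoming filehandles by
# filesystem identifier.  The first export registered under a given
# fsid "wins"; later exports with the same fsid become unreachable.

def build_export_table(exports):
    """Map fsid -> export path; first registration under an fsid wins."""
    table = {}
    for path, fsid in exports:
        table.setdefault(fsid, path)  # duplicate fsids are ignored
    return table

# All three snapshots share one (hypothetical) UUID-derived identifier:
same_uuid = build_export_table([
    ("/nfs/snap1", "uuid-1234"),
    ("/nfs/snap2", "uuid-1234"),
    ("/nfs/snap3", "uuid-1234"),
])
print(same_uuid["uuid-1234"])  # every mount resolves to /nfs/snap1

# Explicit, distinct fsid= values keep the exports apart:
distinct = build_export_table([
    ("/nfs/snap1", 1),
    ("/nfs/snap2", 2),
    ("/nfs/snap3", 3),
])
print(distinct[2])  # /nfs/snap2
```

This is only a sketch of the collision, not of nfsd's actual data structures, but it shows why distinct fsid= values should (in principle) disambiguate the exports.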
That makes sense, and I would have thought so as well, but unfortunately it doesn't help. In my 3-mount test setup, the server seems to hand out the first export in the list to all mount requests, with or without 'fsid' set per export. In fact, when I reverse the order in /etc/exports, i.e.:

/nfs/snap3 *(rw,no_root_squash,subtree_check,async,fsid=3)
/nfs/snap2 *(rw,no_root_squash,subtree_check,async,fsid=2)
/nfs/snap1 *(rw,no_root_squash,subtree_check,async,fsid=1)

the client now receives the contents of /nfs/snap3 on all its mounts:

root@poolshark:/mnt# df -k | egrep 'snap[1-3]$'
xamot:/nfs/snap1        516096     16768    473088   4% /mnt/snap1
xamot:/nfs/snap2        516096     16768    473088   4% /mnt/snap2
xamot:/nfs/snap3        516096     16768    473088   4% /mnt/snap3
root@poolshark:/mnt# find /mnt/snap*/this*
/mnt/snap1/this-is-snap3
/mnt/snap2/this-is-snap3
/mnt/snap3/this-is-snap3
root@poolshark:/mnt#

It does seem entirely possible that UUIDs are getting crossed somewhere...

-Mike
Did you both re-export on the server (exportfs -rv) and umount the volumes from the client? If you didn't umount the volumes on the client, then it will continue to use the cached filehandles with the fsid=UUID, and the server will continue to accept those filehandles.
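For reference, the full reset sequence being suggested would look something like this (hostnames and mount points follow the earlier examples; run as root on the respective machines):

```shell
# On the client: unmount all three NFS mounts so the cached
# filehandles (which embed the old fsid) are discarded
umount /mnt/snap1 /mnt/snap2 /mnt/snap3

# On the server: after editing /etc/exports, re-export everything
exportfs -rv

# On the client: mount again to obtain fresh filehandles
mount -t nfs xamot:/nfs/snap1 /mnt/snap1
mount -t nfs xamot:/nfs/snap2 /mnt/snap2
mount -t nfs xamot:/nfs/snap3 /mnt/snap3
```

Without the client-side umount, the old filehandles stay in use and the new fsid= settings never take effect for existing mounts.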
Yes, I unmounted on the client, modified exports, restarted NFS and re-mounted on the client. I've even gone so far as unmounting everything on the server, removing NFS state files on both systems, remounting on the server, re-exporting and re-mounting on the client. The behavior stays the same in all these cases.

I haven't found the minimum affected kernel version yet, but I should clarify that older kernels (<= 2.6.18) don't exhibit this problem even with the same userspace, i.e. I booted the same NFS server with an older kernel and the NFS exports behave as expected, with or without fsid set.

By the way, thanks for the quick attention!

-Mike
A patch went into the kernel recently (30fa8c0157e4591ee2227aaa0b17cd3b0da5e6cb, "NFSD: FIDs need to take precedence over UUIDs") which might address this; however, that patch is in v2.6.29, which you say you've tried? (Did you try that with fsid= set on the exports?)
I can't swear that I tested 2.6.29 on the server side, but I do have a couple boxes running it. I'll test it out later today.
2.6.29 on the server with fsid specified in the exports works properly:

client mnt # df -k | grep test
server:/nfs/test1       5160576    325376   4573056   7% /mnt/test1
server:/nfs/test2       5160576    141312   4757120   3% /mnt/test2
server:/nfs/test3       5160576    451840   4446592  10% /mnt/test3
server:/nfs/test4       5160576    325376   4573056   7% /mnt/test4
server:/nfs/test5       5160576    325376   4573056   7% /mnt/test5

client mnt # ls -l test*/this*
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test1/this-is-test1
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test2/this-is-test2
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test3/this-is-test3
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test4/this-is-test4
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test5/this-is-test5
client mnt #

So this looks to be fine in < 2.6.20 and >= 2.6.29.
OK, I'm assuming Steved's patch fixed this problem, then. Thanks!