Bug 13011 - NFS exports of LVM snapshots get very confused
Summary: NFS exports of LVM snapshots get very confused
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: NFS
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: bfields
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-04 17:58 UTC by Mike Culbertson
Modified: 2010-04-20 22:58 UTC
CC: 2 users

See Also:
Kernel Version: 2.6.24+ , possibly some earlier
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Mike Culbertson 2009-04-04 17:58:34 UTC
I've tested this on various Debian and Ubuntu kernels, 2.6.24 through 2.6.29, both 32- and 64-bit, and it consistently reproduces.  So far it does not reproduce on older kernels such as Debian's 2.6.18 or RedHat 5's 2.6.16.

The issue is that when NFS-exporting multiple snapshots of an LVM volume, NFS clients receive the contents of a different snapshot than the one backing the export they requested. In fact, all clients receive the contents of the *same snapshot* regardless of which export they mount.  It may be best explained by example:

I take 3 snapshots "snap1", "snap2" and "snap3" of an LV (no special options of any kind).  I mount and export the 3 snaps individually:

/nfs/snap1      *(rw,no_root_squash,subtree_check,async)
/nfs/snap2      *(rw,no_root_squash,subtree_check,async)
/nfs/snap3      *(rw,no_root_squash,subtree_check,async) 
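For reference, the server-side setup can be reproduced along these lines. This is a sketch only: the volume group name `vg0`, origin LV name `data`, and snapshot sizes are hypothetical, not taken from the report.

```shell
# Hypothetical names: volume group "vg0", origin LV "data".
lvcreate -s -L 1G -n snap1 /dev/vg0/data
lvcreate -s -L 1G -n snap2 /dev/vg0/data
lvcreate -s -L 1G -n snap3 /dev/vg0/data

# Mount each snapshot at its export point
mkdir -p /nfs/snap1 /nfs/snap2 /nfs/snap3
mount /dev/vg0/snap1 /nfs/snap1
mount /dev/vg0/snap2 /nfs/snap2
mount /dev/vg0/snap3 /nfs/snap3

# Export everything listed in /etc/exports
exportfs -ra
```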

Mount all three exports on a client.  

root@poolshark:/mnt# df -k | egrep test[1-3]$
xamot:/nfs/test1        516096     16768    473088   4% /mnt/test1
xamot:/nfs/test2        516096     16768    473088   4% /mnt/test2
xamot:/nfs/test3        516096     16768    473088   4% /mnt/test3
root@poolshark:/mnt# 

Next, from the client, write some data to one of the three snaps:

root@poolshark:/mnt# dd if=/dev/zero bs=1024k count=50 of=/mnt/test1/outfile
50+0 records in
50+0 records out
52428800 bytes (52 MB) copied, 0.565292 s, 92.7 MB/s
root@poolshark:/mnt#

Automagically, data written to any one of the NFS mounts shows up on either *all* of them or *none*.  Note that the Used/Avail sizes have changed on all three:

root@poolshark:/mnt# df -k | egrep test[1-3]$
xamot:/nfs/test1        516096     68096    421888  14% /mnt/test1
xamot:/nfs/test2        516096     68096    421888  14% /mnt/test2
xamot:/nfs/test3        516096     68096    421888  14% /mnt/test3
root@poolshark:/mnt#

Just to be sure, md5 the output file:

f1c9645dbc14efddc7d8a322685f26eb  /mnt/snap1/outfile
f1c9645dbc14efddc7d8a322685f26eb  /mnt/snap2/outfile
f1c9645dbc14efddc7d8a322685f26eb  /mnt/snap3/outfile

Yep, same file.  Check from the server side:

root@xamot:/nfs# ls -l snap*/outfile
-rw-r--r-- 1 root root 10485760 2009-04-04 09:37 snap1/outfile
root@xamot:/nfs#

So, the client mounted /nfs/snap1, /nfs/snap2 and /nfs/snap3, but it is apparently receiving /nfs/snap1 on all three mounts.

Check the server again, writing data to all three snaps:

root@xamot:/nfs# touch snap1/this-is-snap1
root@xamot:/nfs# touch snap2/this-is-snap2
root@xamot:/nfs# touch snap3/this-is-snap3

Check the client:

root@poolshark:/mnt# find /mnt/test*/this*
/mnt/test1/this-is-snap1
/mnt/test2/this-is-snap1
/mnt/test3/this-is-snap1
root@poolshark:/mnt#

And that's it.  Behavior of the snapshots is completely normal on the server, i.e. they are separate block devices as expected, nothing unusual.  The weirdness only occurs when the snaps are NFS-exported.

Thanks in advance

-Mike
Comment 1 Trond Myklebust 2009-04-04 18:13:01 UTC
Have you tried using the 'fsid' export option?

The point is that the snapshots will (by definition) have the same UUID, and so
by default, the NFS server will construct filehandles with the same filesystem
identifier for all three exports.

If you explicitly set

/nfs/snap1      *(rw,no_root_squash,subtree_check,async,fsid=1)
/nfs/snap2      *(rw,no_root_squash,subtree_check,async,fsid=2)
/nfs/snap3      *(rw,no_root_squash,subtree_check,async,fsid=3)

then I'd expect this kind of setup to work correctly.
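The shared identifier is easy to see on the server: snapshots carry the same filesystem UUID as their origin. A hypothetical check (device names are illustrative, matching the sketch above rather than the reporter's actual setup):

```shell
# Each snapshot device reports the origin's filesystem UUID --
# all three lines show the same UUID="..." value, which is what
# nfsd folds into the filehandle unless fsid= overrides it.
blkid /dev/vg0/snap1 /dev/vg0/snap2 /dev/vg0/snap3
```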
Comment 2 Mike Culbertson 2009-04-04 18:45:38 UTC
That makes sense, and I would have thought so as well, but unfortunately it doesn't help.  In my 3-mount test setup, the server seems to be handing out the first export in the list to all mount requests, with or without 'fsid' set per export.  In fact, when I switch the order around in /etc/exports, i.e.: 

/nfs/snap3      *(rw,no_root_squash,subtree_check,async,fsid=3)
/nfs/snap2      *(rw,no_root_squash,subtree_check,async,fsid=2)
/nfs/snap1      *(rw,no_root_squash,subtree_check,async,fsid=1)

The client now receives the contents of /nfs/snap3 on all its mounts:

root@poolshark:/mnt# df -k | egrep snap[1-3]$
xamot:/nfs/snap1        516096     16768    473088   4% /mnt/snap1
xamot:/nfs/snap2        516096     16768    473088   4% /mnt/snap2
xamot:/nfs/snap3        516096     16768    473088   4% /mnt/snap3
root@poolshark:/mnt# find /mnt/snap*/this*
/mnt/snap1/this-is-snap3
/mnt/snap2/this-is-snap3
/mnt/snap3/this-is-snap3
root@poolshark:/mnt# 

It does seem entirely possible that UUIDs are getting crossed somewhere...

-Mike
Comment 3 Trond Myklebust 2009-04-04 18:52:13 UTC
Did you both re-export on the server (exportfs -rv) and umount the volumes from
the client?
If you didn't umount the volumes on the client, then it will continue to use the
cached filehandles with the fsid=UUID, and the server will continue to accept
those filehandles.
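The full cycle being described here, sketched as commands (mount points as in the reporter's session; exact paths may differ):

```shell
# On the client: unmount so the cached filehandles are dropped
umount /mnt/snap1 /mnt/snap2 /mnt/snap3

# On the server: re-export so the new fsid= options take effect
exportfs -rv

# On the client: remount so fresh filehandles are issued
mount xamot:/nfs/snap1 /mnt/snap1
mount xamot:/nfs/snap2 /mnt/snap2
mount xamot:/nfs/snap3 /mnt/snap3
```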
Comment 4 Mike Culbertson 2009-04-04 19:16:50 UTC
Yes, I unmounted the client, modified exports, restarted NFS and re-mounted on the client. I've even gone so far as unmounting everything on the server, removing nfs state files on both systems, remounting the server, re-exporting and re-mounting on the client.  The behavior stays the same in all these cases.

I haven't found the minimum affected kernel version yet, but I should clarify that older kernels (<= 2.6.18) don't exhibit this problem, even with the same userspace; i.e., I booted the same NFS server with an older kernel and the NFS exports behave as expected, with or without fsid set.

By the way, thanks for the quick attention!

-Mike
Comment 5 bfields 2009-04-05 19:30:32 UTC
A patch went into the kernel recently (30fa8c0157e4591ee2227aaa0b17cd3b0da5e6cb, "NFSD: FIDs need to take precedence over UUIDs") which might address this; however, that patch is in v2.6.29, which you say you've tried?  (Did you try that with fsid= set on the exports?)
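Assuming, as the comment says, that this hash is a mainline kernel commit, its presence in a given release can be verified from a local clone of the kernel tree:

```shell
# Exits 0 (and prints) if the fix is an ancestor of the v2.6.29 tag
git merge-base --is-ancestor 30fa8c0157e4591ee2227aaa0b17cd3b0da5e6cb v2.6.29 \
  && echo "fix present in v2.6.29"
```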
Comment 6 Mike Culbertson 2009-04-05 21:55:40 UTC
I can't swear that I tested 2.6.29 on the server side, but I do have a couple boxes running it.  I'll test it out later today.
Comment 7 Mike Culbertson 2009-04-06 17:44:56 UTC
2.6.29 on the server with fsid specified in exports works properly:

client mnt # df -k |grep test
server:/nfs/test1      5160576    325376   4573056   7% /mnt/test1
server:/nfs/test2      5160576    141312   4757120   3% /mnt/test2
server:/nfs/test3      5160576    451840   4446592  10% /mnt/test3
server:/nfs/test4      5160576    325376   4573056   7% /mnt/test4
server:/nfs/test5      5160576    325376   4573056   7% /mnt/test5
client mnt # ls -l test*/this* 
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test1/this-is-test1
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test2/this-is-test2
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test3/this-is-test3
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test4/this-is-test4
-rw-r--r-- 1 root root 0 2009-04-06 10:19 test5/this-is-test5
client mnt #

So, this looks to be fine in < 2.6.20 and >= 2.6.29.
Comment 8 bfields 2009-04-07 23:01:49 UTC
OK, I'm assuming Steved's patch fixed this problem, then.  Thanks!
