Subject : regression: am-utils stopped working in 2.6.25-rc*
Submitter : Meelis Roos <mroos@linux.ee>
Date : 2008-03-28 15:20
References : http://lkml.org/lkml/2008/3/28/174

This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline.
I'll bet you it is this one:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=f9c3a3802119a2d30f3e4a69aef30a81e09d0209

Furthermore, if I'm reading the am-utils code right, then the bug is theirs: they appear to be advertising a mount structure version of 'NFS_MOUNT_VERSION' (which they appear to take from /usr/include/linux/nfs_mount.h), while actually using their private 'struct nfs_args', which is a copy of struct nfs_mount version 4.

IOW: when NFS_MOUNT_VERSION == 6, then they are failing to initialise both the 'pseudoflavor' and 'context' fields.
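To make the mismatch concrete, here is a minimal sketch of the failure mode. The struct layouts, field sizes, and the `demo_*` names are all hypothetical and heavily simplified; they are not the real kernel or am-utils definitions. The version at which each extra field starts being read is also an assumption made for illustration. The point is only that a caller who fills a version-4-sized structure but advertises version 6 promises more bytes than it initialised.

```c
#include <stddef.h>

/* Hypothetical, simplified layouts. NOT the real kernel or am-utils structs. */
struct demo_args_v4 {
    int  version;
    int  flags;
    char hostname[64];
};

struct demo_args_v6 {
    int  version;
    int  flags;
    char hostname[64];
    int  pseudoflavor;   /* assumed here to be read when version >= 5 */
    char context[32];    /* assumed here to be read when version >= 6 */
};

/* Number of bytes a consumer would read for a given advertised version. */
size_t demo_expected_len(int version)
{
    if (version >= 6)
        return sizeof(struct demo_args_v6);
    if (version >= 5)
        return offsetof(struct demo_args_v6, context);
    return sizeof(struct demo_args_v4);
}

/* Returns 1 if the caller initialised at least as many bytes as the
 * advertised version promises, 0 otherwise. */
int demo_args_consistent(int version, size_t filled_len)
{
    return filled_len >= demo_expected_len(version);
}
```

With these definitions, `demo_args_consistent(6, sizeof(struct demo_args_v4))` returns 0, which mirrors the situation described above: the trailing fields are left uninitialised. Advertising version 4 with the same structure is consistent.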
Adding Eric Paris as Cc, in case he has any comments.
> Furthermore, if I'm reading the am-utils code right, then the bug is theirs:
> they appear to be advertising a mount structure version of 'NFS_MOUNT_VERSION'
> (which they appear to take from /usr/include/linux/nfs_mount.h), while actually
> using their private 'struct nfs_args', which is a copy of struct nfs_mount
> version 4.
>
> IOW: when NFS_MOUNT_VERSION == 6, then they are failing to initialise both the
> 'pseudoflavor' and 'context' fields.

I changed the assignment to be 4 instead of NFS_MOUNT_VERSION and recompiled am-utils. The result is definitely different: it now emits

    Invalid hostname "pid5765@rhn:/net" in NFS lock request

in dmesg, like -rc3 did, but it still does not work. I have not had time for more bisecting or kernel compilation on that machine :(
OK. That looks as if it is falling afoul of the sanity checking in the new __nsm_find() routine. Why is am-utils stuffing "pid5765@rhn:/net" into the 'hostname' field of the mount structure instead of providing the _server_ hostname as it is supposed to? The comment at the top of 'libamu/mount_fs.c:compute_nfs_args()' even says that the argument is supposed to be the 'name of remote NFS host'. Sigh...
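As a rough illustration of why that string is rejected, here is a sketch of a hostname sanity check. It assumes the check amounts to refusing names that contain a '/' character; that is an assumption about what __nsm_find() does, not a copy of the kernel code, and `demo_hostname_ok` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the string could plausibly be a server
 * host name, 0 otherwise. Assumes (for illustration only) that the sanity
 * check rejects empty names and any name containing '/'. */
int demo_hostname_ok(const char *hostname)
{
    if (hostname == NULL || hostname[0] == '\0')
        return 0;
    return strchr(hostname, '/') == NULL;
}
```

Under that assumption, `demo_hostname_ok("pid5765@rhn:/net")` returns 0 because of the "/net" part, while a plain server name such as "rhn" passes.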
AFAICS, amd-mounted partitions have been broken w.r.t. locking since 2.6.19, when

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=8dead0dbd478f35fd943f3719591e5af1ac0950d

went into the kernel. The only difference now is that we report it at mount time as a result of

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=9289e7f91add1c09c3ec8571a2080f7507730b8d

A workaround would be to force amd to set the 'nolock' flag on all these mounts.
This seems to be a bug in am-utils: https://bugzilla.am-utils.org/show_bug.cgi?id=612 so I'm closing the bug.