Bug 10349
Summary: | regression: am-utils stopped working in 2.6.25-rc* | | |
---|---|---|---|
Product: | File System | Reporter: | Rafael J. Wysocki (rjw) |
Component: | NFS | Assignee: | Trond Myklebust (trondmy) |
Status: | CLOSED DOCUMENTED | | |
Severity: | normal | CC: | eparis, mroos |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 2.6.25-rc3-git | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | | | |
Bug Blocks: | 9832 | | |
Description
Rafael J. Wysocki
2008-03-28 09:22:10 UTC
I'll bet you it is this one:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=f9c3a3802119a2d30f3e4a69aef30a81e09d0209

Furthermore, if I'm reading the am-utils code right, then the bug is theirs: they appear to be advertising a mount structure version of 'NFS_MOUNT_VERSION' (which they appear to take from /usr/include/linux/nfs_mount.h), while actually using their private 'struct nfs_args', which is a copy of struct nfs_mount version 4.

IOW: when NFS_MOUNT_VERSION == 6, then they are failing to initialise both the 'pseudoflavor' and 'context' fields.

Adding Eric Paris as Cc, in case he has any comments.

> Furthermore, if I'm reading the am-utils code right, then the bug is theirs:
> they appear to be advertising a mount structure version of
> 'NFS_MOUNT_VERSION'
> (which they appear to take from /usr/include/linux/nfs_mount.h), while
> actually
> using their private 'struct nfs_args', which is a copy of struct nfs_mount
> version 4.
>
> IOW: when NFS_MOUNT_VERSION == 6, then they are failing to initialise both
> the
> 'pseudoflavor' and 'context' fields.
I changed the assignment to be 4 instead of NFS_MOUNT_VERSION and
recompiled am-utils. The result is definitely different - it now emits
Invalid hostname "pid5765@rhn:/net" in NFS lock request
in dmesg like -rc3 did, but it still does not work.
Have not had time for more bisecting or kernel compilation on that
machine :(
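
For illustration, the version mismatch described above can be sketched roughly as follows. This is only a sketch under stated assumptions: the struct layouts, field sizes and helper names are hypothetical simplifications, not the real definitions from linux/nfs_mount.h or am-utils. The point is that a caller which fills in a version 4 sized structure but advertises version 6 leaves the newer fields ('pseudoflavor', 'context') for the consumer to read as garbage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layouts; NOT the real <linux/nfs_mount.h>. */
struct mount_args_v4 {
    int  version;
    int  flags;
    char hostname[256];
};

struct mount_args_v6 {
    int  version;
    int  flags;
    char hostname[256];
    int  pseudoflavor;   /* assumed to exist only in layouts newer than v4 */
    char context[257];   /* assumed to exist only in layouts newer than v4 */
};

/* How many bytes a consumer would read for a given advertised version
 * (simplified: every version above 4 is treated as the extended layout). */
static size_t bytes_expected(int version)
{
    return version > 4 ? sizeof(struct mount_args_v6)
                       : sizeof(struct mount_args_v4);
}

/* Returns 1 if a caller that initialised `provided` bytes may safely
 * advertise `version`, 0 if the consumer would read past its data. */
static int version_is_safe(int version, size_t provided)
{
    return provided >= bytes_expected(version);
}
```

With these hypothetical layouts, advertising 4 alongside a v4-sized structure is consistent, which matches the change tried above; advertising 6 alongside the same structure is not.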
OK. That looks as if it is falling afoul of the sanity checking in the new __nsm_find() routine.

Why is am-utils stuffing "pid5765@rhn:/net" into the 'hostname' field of the mount structure instead of providing the _server_ hostname as it is supposed to? The comment at the top of 'libamu/mount_fs.c:compute_nfs_args()' even says that the argument is supposed to be the 'name of remote NFS host'.

Sigh... AFAICS, amd mounted partitions have been broken w.r.t. locking since 2.6.19 when
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=8dead0dbd478f35fd943f3719591e5af1ac0950d
went into the kernel. The only difference now is that we report it at mount time as a result of
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=9289e7f91add1c09c3ec8571a2080f7507730b8d

A workaround would be to force amd to set the 'nolock' flag on all these mounts.

This seems to be a bug in am-utils:
https://bugzilla.am-utils.org/show_bug.cgi?id=612
so I'm closing the bug.
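
To make the hostname complaint concrete, here is a minimal sketch of the kind of sanity check that would reject the string am-utils passes. It assumes the check simply refuses names containing a '/'; the function name is hypothetical and this is not the kernel's __nsm_find() code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` could plausibly be a server
 * hostname, 0 otherwise. */
static int hostname_looks_valid(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return 0;
    /* Assumption: a '/' means a path was passed instead of a host name. */
    return strchr(name, '/') == NULL;
}
```

Under that assumption, a plain server name such as "rhn" passes, while "pid5765@rhn:/net" is rejected because of the embedded path.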