Bug 10349

Summary: regression: am-utils stopped working in 2.6.25-rc*
Product: File System
Component: NFS
Reporter: Rafael J. Wysocki (rjw)
Assignee: Trond Myklebust (trondmy)
Status: CLOSED DOCUMENTED
Severity: normal
CC: eparis, mroos
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc3-git
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832

Description Rafael J. Wysocki 2008-03-28 09:22:10 UTC
Subject    : regression: am-utils stopped working in 2.6.25-rc*
Submitter  : Meelis Roos <mroos@linux.ee>
Date       : 2008-03-28 15:20
References : http://lkml.org/lkml/2008/3/28/174

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Trond Myklebust 2008-03-28 10:30:42 UTC
I'll bet you it is this one:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=f9c3a3802119a2d30f3e4a69aef30a81e09d0209

Furthermore, if I'm reading the am-utils code right, then the bug is theirs:
they appear to be advertising a mount structure version of 'NFS_MOUNT_VERSION'
(which they appear to take from /usr/include/linux/nfs_mount.h), while actually
using their private 'struct nfs_args', which is a copy of struct nfs_mount
version 4.

IOW: when NFS_MOUNT_VERSION == 6, then they are failing to initialise both the
'pseudoflavor' and 'context' fields.
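
To illustrate the mismatch, here is a minimal sketch. The structures below are simplified, hypothetical stand-ins, not the real layouts from linux/nfs_mount.h or am-utils, and the names mount_data_v4, mount_data_v6, fill_mount_data() and bytes_receiver_will_read() are made up for this example. The point is only that the advertised version has to match the structure the caller actually fills in:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layouts; NOT the real kernel or am-utils structs. */
struct mount_data_v4 {
    int  version;
    int  flags;
    char hostname[256];
};

struct mount_data_v6 {
    int  version;
    int  flags;
    char hostname[256];
    int  pseudoflavor;   /* not present in the version 4 layout */
    char context[257];   /* not present in the version 4 layout */
};

/* The version whose fields the caller's private structure really contains. */
#define CALLER_STRUCT_VERSION 4

/* Advertise the version the private struct implements, not whatever
 * constant a newer system header happens to define. */
static void fill_mount_data(struct mount_data_v4 *d, const char *server)
{
    memset(d, 0, sizeof(*d));
    d->version = CALLER_STRUCT_VERSION;
    strncpy(d->hostname, server, sizeof(d->hostname) - 1);
}

/* A receiver that trusts 'version' will look at more bytes than a
 * version 4 caller ever initialised if it is told "version 6". */
static size_t bytes_receiver_will_read(int version)
{
    return version >= 6 ? sizeof(struct mount_data_v6)
                        : sizeof(struct mount_data_v4);
}
```

With version set to 4, the receiver only looks at fields the caller filled in. With version set to 6 on the same buffer, it would read the extra fields from uninitialised memory.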
Comment 2 Trond Myklebust 2008-03-28 10:34:18 UTC
Adding Eric Paris as Cc, in case he has any comments.
Comment 3 Meelis Roos 2008-04-03 03:25:36 UTC
> Furthermore, if I'm reading the am-utils code right, then the bug is theirs:
> they appear to be advertising a mount structure version of 'NFS_MOUNT_VERSION'
> (which they appear to take from /usr/include/linux/nfs_mount.h), while actually
> using their private 'struct nfs_args', which is a copy of struct nfs_mount
> version 4.
> 
> IOW: when NFS_MOUNT_VERSION == 6, then they are failing to initialise both the
> 'pseudoflavor' and 'context' fields.

I changed the assignment to be 4 instead of NFS_MOUNT_VERSION and 
recompiled am-utils. The result is definitely different: it now emits
Invalid hostname "pid5765@rhn:/net" in NFS lock request
in dmesg, like -rc3 did, but it still does not work.

Have not had time for more bisecting or kernel compilation on that 
machine :(
Comment 4 Trond Myklebust 2008-04-03 11:26:43 UTC
OK. That looks as if it is falling afoul of the sanity checking in
the new __nsm_find() routine.

Why is am-utils stuffing "pid5765@rhn:/net" into the 'hostname' field of the mount
structure instead of providing the _server_ hostname as it is supposed to? The
comment at the top of 'libamu/mount_fs.c:compute_nfs_args()' even says that the
argument is supposed to be the 'name of remote NFS host'.
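
For reference, a rough sketch of the kind of sanity check that would reject this string. It assumes the check amounts to "a server hostname must not contain a '/'". That is my reading of the warning, not a copy of __nsm_find(), and hostname_looks_sane() is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: assumes a valid server hostname never contains '/'.
 * This is an illustration, not the kernel's actual code. */
static bool hostname_looks_sane(const char *hostname, size_t len)
{
    return hostname != NULL && len > 0 &&
           memchr(hostname, '/', len) == NULL;
}
```

Under that assumption, "pid5765@rhn:/net" fails because of the '/' in ":/net", while a plain server name such as "rhn" passes.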

Sigh...
Comment 5 Trond Myklebust 2008-04-03 11:47:54 UTC
AFAICS, amd mounted partitions have been broken w.r.t. locking since 2.6.19, when

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=8dead0dbd478f35fd943f3719591e5af1ac0950d

went into the kernel. The only difference now is that we report it at mount time
as a result of

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=9289e7f91add1c09c3ec8571a2080f7507730b8d

A workaround would be to force amd to set the 'nolock' flag on all these
mounts.
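
A minimal sketch of what that could look like on the caller's side. EXAMPLE_MOUNT_NONLM and force_nolock() are hypothetical names used only for illustration. As far as I can tell the corresponding flag in linux/nfs_mount.h is NFS_MOUNT_NONLM, but check the header rather than relying on the value below:

```c
/* Hypothetical stand-in for the "no lock manager" mount flag; the value
 * here is an assumption, take the real one from linux/nfs_mount.h. */
#define EXAMPLE_MOUNT_NONLM 0x0200

/* Return the mount flags with the nolock bit forced on, leaving all
 * other bits untouched. */
static int force_nolock(int mount_flags)
{
    return mount_flags | EXAMPLE_MOUNT_NONLM;
}
```

The flags word produced this way would then be stored in the mount structure before calling mount(2), so that lockd is never consulted for these mounts.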
Comment 6 Rafael J. Wysocki 2008-04-06 14:06:05 UTC
This seems to be a bug in am-utils:

https://bugzilla.am-utils.org/show_bug.cgi?id=612

so I'm closing the bug.