Bug 75341

Summary: Kernel Oops in acl.c, __jfs_set_acl(). NULL derefence could not be handled.
Product: File System Reporter: wbkernel
Component: JFSAssignee: Dave Kleikamp (shaggy)
Status: RESOLVED CODE_FIX    
Severity: high CC: alan, wbkernel, xerofoify
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 3.14 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Patch for Oops with NULL acl value

Description wbkernel 2014-05-03 02:34:23 UTC
For some reason, the acl parameter passed into __jfs_set_acl() is NULL at some point while the JFS partition is in use.  The NULL is passed to posix_acl_equiv_mode(), causing a subsequent Ooops.  Any assistance on where the posix_acl...() call is would be interesting.  Also, why would the acl struct ever be NULL (race condition?)?

The inode that is affected by the Oops is a temporary file owned by rdiff-backup during its operation.

This bug occurred during an upgrade to 3.14, it did not affect the older kernel in use in Debian stable, v3.2.  There were no patches to acl.c in 3.14.2.  On googling it was noted that there were patches in February for POSIX acl regressions.  It is not clear if those patches are in the kernel shipped from kernel.org or not.  The patch in question can be seen here, please advise if it should be applied or not:

  http://filibusta.crema.unimi.it/~cavok/kbts/kr402.html

A few choice printk() statements showed that the code executed until it hit the posix_acl_equiv_mode() call around line 88, which was passed a NULL for the struct acl.

System is an AMD 6-core processor, disk in question is attached USB 3.0 hard drive.

Backtrace (from notes):

__jfs_set_acl
jfs_set_acl
posix_acl_xattr_set
legitimize_mnt
generic_removexattr
jfs_removexattr
vfs_removexattr
remove_xattr

Any further information or requests, please let me know.
Comment 1 wbkernel 2014-05-03 02:45:42 UTC
Looks like the code for the v3.12 kernel does not contain this bug.  Will test it Saturday.
Comment 2 wbkernel 2014-05-04 02:29:29 UTC
Confirmed that the latest release of the previous kernel, 3.13.11 does not have the issue.
Comment 3 wbkernel 2014-05-25 00:49:23 UTC
Created attachment 137311 [details]
Patch for Oops with NULL acl value

The change suggested is to add a check for NULL before using the acl in the switch statement.  This seems to be consistent with what is done in the JFFS and ext4 filesystems and with the behaviour of JFS in the 3.13 kernel.  The bug seemed to be introduced in commit 2cc6a5a0.  If acl is NULL, it seems clear that the inode time need not be updated (ibid JFFS, ext4).

It compiles now and I will test it Sunday.
Comment 4 wbkernel 2014-05-25 16:28:24 UTC
Verified that the proposed patch as applied to 3.14 now allows rdiff-backup to successfully complete its run without causing a kernel Oops.

To see the code as used in JFFS, ext4 or other filesystems, usually there is a file called acl.c with similar code that can be found by searching for posix_acl_equiv_mode.

Hope this helps ease a fix into the kernel.
Comment 5 Dave Kleikamp 2014-05-27 14:00:40 UTC
Sorry it's taken me so long to get back to you.

The patch looks good. Could you please send it to me with a proper Signed-off-by: line? You can send it directly to dave.kleikamp@oracle.com.

Thanks,
Dave
Comment 6 Dave Kleikamp 2014-05-29 02:26:00 UTC
I've pushed the patch to my git tree for the linux-next build.
Comment 7 xerofoify 2014-06-17 19:20:48 UTC
Hey David is this bug closed or not? 
Seems to me due to you having a patch 
for it in your linux tree.
Cheers Nick
Comment 8 Dave Kleikamp 2014-06-17 20:29:06 UTC
Yes, the patch is in kernel 3.16-rc1. Closing

Thanks.
Comment 9 xerofoify 2014-06-17 21:26:06 UTC
No problem ,
Just trying to help you out as you may have 
your hands full with other work.
Nick