For some reason, the acl parameter passed into __jfs_set_acl() is sometimes NULL while the JFS partition is in use. The NULL is passed on to posix_acl_equiv_mode(), which causes the subsequent Oops. Any pointers on where the posix_acl...() call originates would be appreciated. Also, why would the acl struct ever be NULL (a race condition?)? The inode affected by the Oops is a temporary file owned by rdiff-backup during its run.

The bug appeared after upgrading to 3.14; it did not affect the older kernel in Debian stable, v3.2. There were no changes to acl.c in 3.14.2. Searching online turned up patches from February for POSIX ACL regressions, but it is not clear whether those patches are in the kernel shipped from kernel.org. The patch in question can be seen here; please advise whether it should be applied: http://filibusta.crema.unimi.it/~cavok/kbts/kr402.html

A few choice printk() statements showed that the code ran until it hit the posix_acl_equiv_mode() call around line 88 of acl.c, which was passed NULL for the acl argument.

The system has an AMD 6-core processor; the disk in question is a hard drive attached over USB 3.0.

Backtrace (from notes):
__jfs_set_acl
jfs_set_acl
posix_acl_xattr_set
legitimize_mnt
generic_removexattr
jfs_removexattr
vfs_removexattr
remove_xattr

If you need any further information, please let me know.
Looks like the code for the v3.12 kernel does not contain this bug. Will test it Saturday.
Confirmed that the latest release of the previous kernel series, 3.13.11, does not have the issue.
Created attachment 137311: Patch for Oops with NULL acl value

The suggested change adds a check for NULL before acl is used in the switch statement. This is consistent with what the JFFS and ext4 filesystems do, and with the behaviour of JFS in the 3.13 kernel. The bug appears to have been introduced in commit 2cc6a5a0. If acl is NULL, the inode time clearly does not need to be updated (as in JFFS and ext4). It compiles now, and I will test it on Sunday.
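To illustrate the shape of the change, here is a small userspace model of the ACL_TYPE_ACCESS case. It is a sketch under stated assumptions, not the attached patch or the real JFS acl.c: the model_* names, the two-field ACL struct, and the ctime counter are hypothetical simplifications. The guard reflects how the generic POSIX ACL code behaves on the path in the backtrace above: removexattr passes a NULL value down through posix_acl_xattr_set(), which then hands a NULL acl to the filesystem's set_acl hook to mean "remove the ACL". An assert() stands in for the NULL dereference inside posix_acl_equiv_mode().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_acl_type { MODEL_ACL_TYPE_ACCESS, MODEL_ACL_TYPE_DEFAULT };

struct model_acl {
	unsigned int perm_bits;	/* rwx bits the owner/group/other entries reduce to */
	int has_named_entries;	/* nonzero if named user/group entries exist */
};

struct model_inode {
	unsigned int i_mode;
	int ctime_updates;	/* counts simulated i_ctime + mark_inode_dirty() */
};

/*
 * Model of posix_acl_equiv_mode(): like the real helper it uses acl
 * without a NULL check, so the assert stands in for the Oops.
 * Folds the ACL's permission bits into *mode and returns 0 if the ACL
 * is fully expressed by the mode bits, 1 if it is not.
 */
static int model_acl_equiv_mode(const struct model_acl *acl, unsigned int *mode)
{
	assert(acl != NULL);
	*mode = (*mode & ~0777u) | (acl->perm_bits & 0777u);
	return acl->has_named_entries ? 1 : 0;
}

/*
 * Guarded ACL_TYPE_ACCESS case. A NULL acl means "remove the ACL", so
 * there is nothing to fold into i_mode and no ctime update. On return,
 * *to_store is the ACL that would be written as an xattr; NULL means the
 * xattr would be removed. (Directory checks for default ACLs omitted.)
 */
static int model_jfs_set_acl(struct model_inode *inode,
			     enum model_acl_type type,
			     const struct model_acl *acl,
			     const struct model_acl **to_store)
{
	int rc;

	switch (type) {
	case MODEL_ACL_TYPE_ACCESS:
		if (acl) {	/* the fix: only inspect a real ACL */
			rc = model_acl_equiv_mode(acl, &inode->i_mode);
			if (rc < 0)	/* never taken in this model */
				return rc;
			inode->ctime_updates++;
			if (rc == 0)
				acl = NULL;	/* mode bits suffice; no xattr needed */
		}
		break;
	case MODEL_ACL_TYPE_DEFAULT:
		break;
	}
	*to_store = acl;
	return 0;
}
```

For example, calling model_jfs_set_acl() with a NULL acl leaves i_mode untouched and reports nothing to store, while an ACL with no named entries is folded into the mode bits and dropped. Removing the if (acl) guard makes the NULL call trip the assert, the userspace analogue of the Oops.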
Verified that the proposed patch, applied to 3.14, now allows rdiff-backup to complete its run without causing a kernel Oops. To see the equivalent code in JFFS, ext4, or other filesystems, look for the filesystem's acl.c, which can usually be found by searching for posix_acl_equiv_mode. Hope this helps ease a fix into the kernel.
Sorry it's taken me so long to get back to you. The patch looks good. Could you please send it to me with a proper Signed-off-by: line? You can send it directly to dave.kleikamp@oracle.com. Thanks, Dave
I've pushed the patch to my git tree for the linux-next build.
Hey David, is this bug closed or not? It seems so to me, since you have a patch for it in your Linux tree. Cheers, Nick
Yes, the patch is in kernel 3.16-rc1. Closing. Thanks.
No problem, just trying to help out, as you may have your hands full with other work. Nick