Bug 14826
Summary: | jdm-20002 reiserfs_xattr_get: Invalid hash for xattr | ||
---|---|---|---|
Product: | File System | Reporter: | Christian Kujau (kernel) |
Component: | ReiserFS | Assignee: | ReiseFS developers team (reiserfs-devel) |
Status: | CLOSED OBSOLETE | ||
Severity: | normal | CC: | alan, andrex, gregsurbey, jeffm, kernel, kernel, marco.gatti, ohnobinki |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32 | Subsystem: | |
Regression: | Yes | Bisected commit-id: |
Description
Christian Kujau
2009-12-17 09:00:02 UTC
I can confirm that these messages keep popping up, seemingly at random, since I upgraded to 2.6.32. This is a *huge* problem when using ACLs, as the file becomes inaccessible:

    $ cat thefile.xlsx
    cat: thefile.xlsx: Input/output error
    $

And in the logs:

    Feb 14 01:24:58 victoria [1149610.665902] REISERFS warning (device dm-0): jdm-20002 reiserfs_xattr_get: Invalid hash for xattr (system.posix_acl_access) associated with [1953724787 1882090853 0x7869736f UNKNOWN]

I first noticed this because my automated backups failed. At first the problem seemed to affect a given file only for a short period: the backup never complained about the same file twice in a row, and I can access the previously failing files now. For some time now, however, I have been getting the error on one particular file.

My organization's main file server started experiencing this issue when we upgraded from 2.6.24-gentoo-r8 to 2.6.31-gentoo-r6. This is how the error manifests itself on our server's console:

    fs1 Equipment # pwd
    /home/share/Projects/Equipment
    fs1 Equipment # cat Equipment\ List.xls
    cat: Equipment List.xls: Input/output error
    fs1 Equipment # dmesg
    REISERFS warning (device dm-0): jdm-20002 reiserfs_xattr_get: Invalid hash for xattr (system.posix_acl_access) associated with [1953724787 1882090853 0x7869736f UNKNOWN]

We can fix each of these errors individually:

    # For directories, we need to run this extra step first:
    # setfattr -x system.posix_acl_default Equipment\ Directory
    setfattr -x system.posix_acl_access Equipment\ List.xls
    # Copy the permissions from the current directory to fix the file
    getfacl . | setfacl -M - Equipment\ List.xls

For many instances within a directory structure:

    # Find corrupted ACLs recursively, starting from the current directory
    getfattr -Rd -m '.*' . 2>&1 | grep 'Input/output error' | cut -d: -f1

I slapped together a poor man's recursive version, since setfattr does not support a '-R' option:

    # First, find a good ACL and back it up
    getfacl . > backup.acl
    # The next two commands are almost equivalent to "setfacl -Rb ."
    # Note: you may need to run them several times to fix everything.
    # -print0 with xargs -0 keeps names containing spaces, quotes or apostrophes intact
    find . -print0 | xargs -0 -I{} setfattr -x system.posix_acl_default {}
    find . -print0 | xargs -0 -I{} setfattr -x system.posix_acl_access {}
    # Now recursively restore all the ACLs from the good ACL
    cat backup.acl | setfacl -RM - .

Google shows me that this issue is not new: http://old.nabble.com/jdm-20002-reiserfs_xattr_get:-Invalid-hash-for-xattr-td26786353.html

It might be caused by a locking error: http://markmail.org/thread/zrrenmncxisqlooc

Looking towards a permanent solution, I e-mailed Jeff Mahoney at SUSE, who was the last one to work on the reiserfs xattr code: http://ftp.suse.com/pub/people/jeffm/reiserfs/xattr-rework/

Linus committed Jeff's new code to the mainline "stable" kernel on March 30, 2009: http://kerneltrap.org/mailarchive/git-commits-head/2009/3/30/5337714

It was then included in this patch: http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.29-git7.log

It seems to me that ACL handling went bad with the release of kernel 2.6.30. Check this discussion if you haven't already: http://marc.info/?t=126979018700003&r=1&w=2

In particular, see this post with a simple test case and results: http://marc.info/?l=reiserfs-devel&m=127056179512450&w=2

Is there something I can do to help, just as a Linux user?

Considering that this bug eventually leads to completely unrepairable filesystem corruption, I strongly recommend that the kernel revert to the pre-2.6.30 reiserfs xattr code for the time being. I realize that these code improvements are a worthy goal; however, this is supposed to be a stable kernel release with a stable filesystem. Losing data is the worst thing that can happen, but losing data silently and at random is even worse.
Unless this issue is fixed today, this patch set needs to go back to testing/development for the time being.

I'm looking into this today. Once I've identified the source, I can write something up to repair the damage.

OK, I've found the problem. It turns out expose_privroot is useful after all. The loss isn't random: it affects xattrs that have been shrunk. The test case in comment #2 demonstrated the issue perfectly, because it removes a single ACL entry, which shrinks the xattr, before removing all of them. When an xattr was shrunk, the size of its header wasn't accounted for, so the xattr was shrunk by 8 bytes too many.

The test case is sensitive to memory pressure because the ACL is cached with the in-memory inode. Accesses use the cached value until the inode is dropped from memory and has to be re-read from disk. On that re-read, the corruption is noticed and -EIO is returned.

Unfortunately, I won't be able to repair this type of damage. I'll be able to fix the checksums, but the data inside a shrunken xattr will be lost.
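The off-by-8 shrink can be modeled with a small shell sketch. It never touches ReiserFS or any real xattr. It assumes, as the comment says, an 8-byte header in front of the value. In the kernel this is presumably the magic number plus hash in struct reiserfs_xattr_header. The sketch uses `cksum` purely as a stand-in for the kernel's own hash. All function names are hypothetical.

```shell
#!/bin/sh
# Illustrative model of the xattr shrink bug; it never touches a real
# filesystem. Assumptions (not kernel code): an 8-byte header sits in
# front of the value, and cksum stands in for the kernel's xattr hash.
# All function names are hypothetical.

HDR_SIZE=8  # header size from the bug comment (assumed: magic + hash)

# Correct size of an xattr file that holds a value of $1 bytes.
correct_size() {
    echo $(( HDR_SIZE + $1 ))
}

# Size produced by the buggy shrink path: the header is not accounted
# for, so the file is truncated to the value length alone.
buggy_size() {
    echo $(( $1 ))
}

# Bytes of the value that the buggy truncation cuts off.
bytes_lost() {
    echo $(( $(correct_size "$1") - $(buggy_size "$1") ))
}

# Build "header + value", truncate it to the buggy size, and print what
# is left of the value once the 8 header bytes are skipped.
simulate_shrink() {
    value=$1
    len=${#value}                      # byte count, assuming ASCII input
    printf 'HHHHHHHH%s' "$value" \
        | head -c "$(buggy_size "$len")" \
        | tail -c +"$(( HDR_SIZE + 1 ))"
}

# Re-read check: the stored hash covers the intended value ($1), but the
# value read back from disk ($2) is the truncated one, so they differ.
reread_check() {
    stored=$(printf '%s' "$1" | cksum)
    actual=$(printf '%s' "$2" | cksum)
    if [ "$stored" = "$actual" ]; then
        echo ok
    else
        echo EIO   # stands in for the -EIO the kernel returns
    fi
}
```

For a 12-byte value such as `ABCDEFGHIJKL`, `correct_size 12` prints 20 while `buggy_size 12` prints 12, and `simulate_shrink ABCDEFGHIJKL` leaves only `ABCD`. `reread_check ABCDEFGHIJKL ABCD` then prints `EIO`. Recomputing the hash over `ABCD` would make the check pass again, but the last 8 bytes stay gone. This mirrors why fixing the checksums cannot recover a shrunken xattr.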