Bug 68411
| Summary: | general protection fault in read_extent_buffer (from readdir) (possibly corrupt fs) | | |
|---|---|---|---|
| Product: | File System | Reporter: | Zack Weinberg (zackw) |
| Component: | btrfs | Assignee: | Josef Bacik (josef) |
| Status: | RESOLVED OBSOLETE | | |
| Severity: | normal | CC: | dsterba, simoncion |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 3.12.6 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | kernel oops log | | |
Description
Zack Weinberg
2014-01-09 20:42:22 UTC
Comment 1
This is a duplicate of bug 63701, the patch is on the way.

I don't see the code in fsck to fix that. As an ugly manual fix you can try to locate the files

$ btrfs inspect ino 4493802 /path
(and same for the rest from the fsck output)

and truncate them to 0 if you can afford to lose/recreate the files

$ truncate -s 0 /path/to/file-4493802

(This worked on the reproducer I have at my disposal.)

Comment 2
Thanks. I am traveling right now but I will try that on Friday.

Comment 3
(In reply to David Sterba from comment #1)
> As an ugly manual fix you can try to locate the files
>
> $ btrfs inspect ino 4493802 /path
> (and same for the rest from the fsck output)
>
> and truncate them to 0 if you can afford to lose/recreate the files
>
> $ truncate -s 0 /path/to/file-4493802

This did *not* clear up the errors. I still get the same failures from btrfs check (after unmounting the filesystem). I also tried mounting the fs again and running a scrub, which found no errors and did not make the problem go away.

I am now trying a force clear of the space cache and unlinking the offending inodes. Will update on results later today.

Comment 4
Truncating *and* unlinking all the affected files seems to have cleared the problem.

Also, now that I know which files were affected, I can say that the cause was significantly different from bug 63701 (although it may well be the same error-in-the-code). My computer did not experience a power failure at any time in recent memory, *until* I had to forcibly power it off as a *response* to the crash I posted (because `umount` following the crash got stuck in D-state). I am prepared to believe that the damaged files were secondary to that force-poweroff, though; they had all been modified shortly before the crash.
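The manual workaround described in the comments above (resolve each inode number reported by fsck to a path, truncate the file, then unlink it) could be sketched as a small shell helper. This is only an illustrative sketch, not a tested repair tool: the names `zap_inode`, `run`, and `DRY_RUN` are hypothetical, and `btrfs inspect-internal inode-resolve` is assumed to be the long form of the `btrfs inspect ino` abbreviation used in the thread. Because both steps destroy file contents, the sketch defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the manual workaround from this thread: for one inode number
# reported by "btrfs check", resolve it to its path(s), then truncate and
# unlink each path. Hypothetical helper; DRY_RUN=1 (the default) only
# prints the commands it would run.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Usage: zap_inode INODE MOUNTPOINT
zap_inode() {
    ino="$1"
    mnt="$2"
    # inode-resolve prints one path per line for the given inode number.
    btrfs inspect-internal inode-resolve "$ino" "$mnt" |
    while IFS= read -r path; do
        run truncate -s 0 -- "$path"   # step suggested in comment 1
        run rm -f -- "$path"           # unlink, as reported in comment 4
    done
}
```

Calling `zap_inode 4493802 /path` once per inode from the fsck output lists the commands; setting `DRY_RUN=0` in the environment makes the helper execute them, which should only be done for files you can afford to lose.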
Comment 5
(In reply to Zack Weinberg from comment #3)
> (In reply to David Sterba from comment #1)
> > As an ugly manual fix you can try to locate the files
> >
> > $ btrfs inspect ino 4493802 /path
> > (and same for the rest from the fsck output)
> >
> > and truncate them to 0 if you can afford to lose/recreate the files
> >
> > $ truncate -s 0 /path/to/file-4493802
>
> This did *not* clear up the errors. I still get the same failures from
> btrfs check (after unmounting the filesystem).

fsck is not yet able to fix this error, the point of truncating the file was to stop crashing. Which apparently worked.

> I also tried mounting the fs
> again and running a scrub, which found no errors and did not make the
> problem go away.

This is not something scrub could fix, it only verifies the checksums. The bug is a structural inconsistency in the extent items and has to be fixed as such.

> I am now trying a force clear of the space cache and unlinking the offending
> inodes. Will update on results later today.

Space cache should not be affected by this, I'm not completely sure here, though.

Comment 6
(In reply to Zack Weinberg from comment #4)
> Truncating *and* unlinking all the affected files seems to have cleared the
> problem.

Yeah. I hadn't noticed before that even if the file is 0 in size, it has an incorrect nbytes, which tracks the actually allocated space.

> Also, now that I know which files were affected, I can say that the cause
> was significantly different from bug 63701 (although it may well be the same
> error-in-the-code).

AFAIK the bug is in the code, unrelated to crash failures. Until recently we didn't know what the real cause was, so the reports may point to different areas.

> I am prepared to believe that the damaged files were
> secondary to that force-poweroff, though; they had all been modified shortly
> before the crash.

This sounds correct.
Comment 7
(In reply to David Sterba from comment #5)
> (In reply to Zack Weinberg from comment #3)
> > (In reply to David Sterba from comment #1)
> > > $ truncate -s 0 /path/to/file-4493802
> >
> > This did *not* clear up the errors. I still get the same failures from
> > btrfs check (after unmounting the filesystem).
>
> fsck is not yet able to fix this error, the point of truncating the file was
> to stop crashing. Which apparently worked.

Ah, I misunderstood you. I thought the truncate would either remove the inconsistency altogether or convert it into something fsck could fix. I would have preferred not to mount the damaged filesystem again until I got a clean result from 'btrfs check'; as is, I did have to mount it to do some of the manual repair actions but I didn't do anything with it other than that. So I can't really say whether or not the crash would have recurred.

In any case I only got the one crash, and we're agreed that the damage was caused by the crash rather than the other way around. This doesn't leave me feeling very good about my chances for stability in the future. How's that patch coming?

> > I am now trying a force clear of the space cache and unlinking the offending
> > inodes. Will update on results later today.
>
> Space cache should not be affected by this, I'm not completely sure here,
> though.

There were *also* a bunch of complaints about mismatched space cache generation numbers in the fsck reports. A mount with nospace_cache,clear_cache seems to have corrected that. (clear_cache by itself did not.)

Comment 8
(In reply to Zack Weinberg from comment #7)
> How's that patch coming?

Runtime fix is in 3.14 pull,

Btrfs: don't use ram_bytes for uncompressed inline items

Comment 9
Do you still need more information from me in order to address the kernel bug(s)? Should I file a new bug (and if so, where?) re the missing fsck features?

Comment 10
*** Bug 63701 has been marked as a duplicate of this bug. ***

Comment 11
This is a semi-automated bugzilla cleanup, report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.
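For reference, the space-cache step reported in comment 7 (a mount with both `nospace_cache` and `clear_cache`) might look like the following. This is a hedged sketch: the function name `clear_space_cache_cmd` and the device and mount-point arguments are placeholders, and the function only prints the command so that nothing is mounted by accident.

```shell
#!/bin/sh
# Sketch: build the mount invocation described in comment 7.
# The device and mount point are placeholders supplied by the caller;
# the command is printed, not executed.
clear_space_cache_cmd() {
    dev="$1"
    mnt="$2"
    # nospace_cache: do not use the free-space cache for this mount
    # clear_cache:   discard the existing free-space cache contents
    echo "mount -t btrfs -o nospace_cache,clear_cache $dev $mnt"
}
```

For example, `clear_space_cache_cmd /dev/sdX /mnt/point` prints the command line to review before running it as root on the unmounted filesystem.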