Bug 12151
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | Unexplained fsck errors on a ext4 filesystem | | |
| Product | File System | Reporter | Nathan Grennan (kernel-bugzilla) |
| Component | ext4 | Assignee | fs_ext4 (fs_ext4) |
| Status | RESOLVED CODE_FIX | | |
| Severity | normal | CC | alan, flo, sandeen, tytso |
| Priority | P1 | | |
| Hardware | All | | |
| OS | Linux | | |
| Kernel Version | 2.6.27.5 | Subsystem | |
| Regression | No | Bisected commit-id | |
Description
Nathan Grennan
2008-12-03 13:38:30 UTC
Was this a one-time occurrence? Have you encountered this again?

Any updates on this bug report? Has this happened again for you?

At least a related problem seems to have happened on another reboot. This reboot was a hard reset, because the system had gone into some mostly hung state. I saw it respond somewhat a few times: ping still worked, ssh would return the comment string, and I even managed to log in once, but it hung before the shell. On the next boot I found I seem to have a corrupt superblock. It sounds a lot like this reboot. http://kerneltrap.org/mailarchive/linux-ext4/2009/1/5/4598534

If you can try the 2.6.29 kernels out of koji and see if you still hit this, it'd be great. As I mentioned on IRC, I have found another race w/ inode alloc/free, but it's not likely to lead to as much damage as your above fsck found. Have you been hitting this reliably?

Oh, and: a boot-time fsck on Fedora root filesystems probably means the fs was marked w/ errors prior to the shutdown. You might look in your logs to see if there's anything there.

Same problem here with 2.6.29.2. I have tried ext4 on a Linux software RAID 0; it was working flawlessly for about a week. I booted the computer yesterday and had exactly the same errors as described above. Running e2fsck from an Arch Linux boot stick took about one hour and left me with an ext4 filesystem that was mountable but stripped of any directory structure. I am running that RAID 0 on two 500GB SATA disks. I can provide you with more information on that this evening.

I have been running Fedora's kernels the whole time, and I haven't seen this issue with 2.6.29 kernels. I don't think I saw it with 2.6.28 kernels either. I wonder if Fedora has been putting more patches for ext4 in over what goes into Linus's tree.

We're desperately looking for a reliable reproduction case for this problem.
My suggestion at this point is, if you have a large filesystem (the reports for this seem to come from users with > 1TB filesystems), to take a periodic e2image backup of your filesystem before any corruption occurs, and save it on some other filesystem so there is a backup of your filesystem metadata. This will help recover the filesystem after the corruption.

If you can reproduce this reliably, please let us know. We haven't been able to get this problem reproduced yet.

(In reply to comment #7)
> I have been running Fedora's kernels the whole time, and I haven't seen this
> issue with 2.6.29 kernels. I don't think I saw it with 2.6.28 kernels either.
> I wonder if Fedora has been putting more patches for ext4 in over what goes
> into Linus's tree.

Nathan - well, not really. I'll never put something in Fedora that hasn't been sent upstream; it's not how we work. One difference may be that the 2.6.27 kernels in F10 did have the ext4 "stable" backports that Ted was doing... If you're running .29 kernels from Fedora, it should be equivalent to what's upstream. The only changes in F11, for example, are:

Patch2920: linux-2.6-ext4-flush-on-close.patch
Patch2921: linux-2.6-ext4-really-print-warning-once.patch

Here is my basic experience with ext4. I had two basic problems. One was where the system would just go off into a hang. The other was this issue. This issue went away when I went with a 2.6.28+ kernel. Backporting patches didn't work for me. I say this because at the time you guys were telling me there were no new patches, but I would have issues with 2.6.27 kernels and not with 2.6.28 kernels. Later cebbert said there was something nasty in 2.6.28.1 kernels, so I upgraded to 2.6.29. I have had zero issues with ext4 since upgrading to 2.6.29.

I just looked through my IRC logs, and found the errors that I think caused this problem. Sandeen, I have mentioned these to you before.
How I think it would go is that I would get one of these errors and the system would continue, because that is the crazy default. Then a few days later, not having noticed these errors, I would reboot the system and get the fsck issue above. From what I remember reading, this issue was fixed.

Feb 16 12:03:19 proton kernel: EXT4-fs error (device md3): ext4_mb_generate_buddy: EXT4-fs: group EXT4-fs error (device md3): mb_free_blocks: double-free of inode 0's block 321550248(bit 30632 in group 9812)

Hi guys (and gals), please don't assume that your problem is the same as a previously reported bug, especially if the bug report title is as vague as "unexplained fsck errors". That could mean software bugs or hardware bugs, and just because you have an "unexplained fsck error", please don't assume your problem is the same as another person's. It could be, if the kernel versions are the same, the symptoms are exactly the same, and especially if the way to reproduce it is the same.

The original bug report dated from a 2.6.27 kernel, and there have been a huge number of bugs fixed since then. To be honest, not all bug fixes have been backported to the 2.6.27.x series, either. In some cases it was just way too difficult to do.

So Nathan, if you're at 2.6.29, and you're not seeing any problem, then we're probably better off closing this bug. Florian, my guess is that whatever problem Nathan reported back in the 2.6.27 kernel is very different from what you're seeing. May I suggest that you open a new bugzilla entry for your problems, and please give us as much detail as possible? Many thanks.
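
For anyone scanning their own logs for the mb_free_blocks double-free message quoted in this thread, here is a rough Python sketch that pulls the fields out of such a line and checks that the block number agrees with the reported group and bit. It assumes 4 KiB blocks (so 32768 blocks per group) and a first data block of 0; neither value is stated in this report, and `dumpe2fs -h` on the real device would be the way to confirm them. The function and constant names are made up for illustration.

```python
import re

# Assumptions, not taken from the report: 4 KiB blocks, so one block bitmap
# (4096 bytes * 8 bits) covers 32768 blocks per group, and first data block 0.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0

# Matches the mb_free_blocks double-free message in the form quoted in this thread.
DOUBLE_FREE_RE = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)]+)\): mb_free_blocks: "
    r"double-free of inode (?P<inode>\d+)'s block (?P<block>\d+)"
    r"\(bit (?P<bit>\d+) in group (?P<group>\d+)\)"
)


def parse_double_free(line):
    """Return the fields of a double-free message as a dict, or None if absent."""
    match = DOUBLE_FREE_RE.search(line)
    if match is None:
        return None
    fields = {"device": match.group("device")}
    for key in ("inode", "block", "bit", "group"):
        fields[key] = int(match.group(key))
    return fields


def expected_block(group, bit,
                   blocks_per_group=BLOCKS_PER_GROUP,
                   first_data_block=FIRST_DATA_BLOCK):
    """Block number implied by a (group, bit) pair under the assumed geometry."""
    return first_data_block + group * blocks_per_group + bit


# The log line exactly as pasted in the comment above.
LOG_LINE = (
    "Feb 16 12:03:19 proton kernel: EXT4-fs error (device md3): "
    "ext4_mb_generate_buddy: EXT4-fs: group EXT4-fs error (device md3): "
    "mb_free_blocks: double-free of inode 0's block 321550248"
    "(bit 30632 in group 9812)"
)

if __name__ == "__main__":
    info = parse_double_free(LOG_LINE)
    print(info)
    print(expected_block(info["group"], info["bit"]) == info["block"])
```

Under those assumptions the numbers in the quoted message are self-consistent: 9812 * 32768 + 30632 = 321550248. That only confirms the message refers to the block it names; it says nothing about what caused the double free.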