Distribution: Gentoo Linux. Hardware environment: x86 (AMD), linux-2.6.18-gentoo-r6, xfsprogs-2.8.11, mdadm-2.6.1, RAID5 (5 x 250 GB disks). One of the disks died and XFS corruption ensued. Before corruption: ~600 GB of data. After corruption: 200 GB lost, 150 GB in lost+found, 250 GB "okay" but with files missing here and there. Already posted this on the XFS bugzilla (http://oss.sgi.com/bugzilla/show_bug.cgi?id=741) and the Gentoo bugzilla (http://bugs.gentoo.org/show_bug.cgi?id=169667).
Created attachment 10648 [details] Error messages from /var/log/messages
There was a bug in raid5 in 2.6.19 and earlier whereby error returns weren't properly recognised by the filesystem (depending on the filesystem): we cleared the BIO_UPTODATE bit but passed a '0' error code. In this case it was probably a read-ahead request that failed due to lack of resources, as much of the stripe cache was tied up with retries on the failed drive. I don't know if this analysis meshes with the reality of how XFS works; the code is a bit too complex for me to follow easily. I think this bug should possibly be assigned to someone with XFS knowledge to comment on whether that is a possible explanation.... I wonder how I do that...
Maybe I do it like that.... accept the bug first, then reassign...
Neil: I still have the machine off and in a "broken" state. I am planning on redoing the array soon and was wondering if you need any other info before I do this. Thanks.
Have you been able to bring up your RAID and maybe do more testing with newer kernels? Is the problem still there? Thanks.