Most recent kernel where this bug did *NOT* occur: 2.6.19
Distribution: Gentoo (mainline kernel directly from kernel.org)
Hardware Environment: AMD64, Tyan S2885, dual Opteron 242, 8 GB memory, 4 SATA drives
Software Environment: md-RAID-1, -6 (partitioned), and -0, all across all four SATA drives; root (and a backup root snapshot) on RAID-6 partitions; LVM2 on a third RAID-6 partition; reiserfs for everything.

Problem Description:

I'm having the chunk_aligned_read issue, continuing on -rc5. I googled chunk_aligned_read and see the pre-rc1 commits, but nothing since. I searched the bugs and nothing on it showed up, though I'm not especially skilled at bugzilla queries.

From Google's results I see you were aware of the issue pre-rc1 and submitted patches for it. I'm still seeing the errors at boot, loading root off the RAID-6, though based on the changelog entries that's only the first try; the second, through the stripe cache, goes through. Still, it's -rc5 and I'm still seeing the issue, so whatever was done to fix that bad RAID-6 assumption (I read changelog entries but don't read C code so well, though I can sometimes get the gist, so it's "whatever" from my perspective) apparently didn't fix it for me.

I gather from the changelogs and kernel list postings that before the changes in 2.6.19-git18, the kernel /always/ read through the stripe cache, so falling back to that now isn't a big deal. However, those warnings still look rather alarming, "harmless" or not, and I can't help but worry that I'm putting system stability on the line.

Bottom line, what's the status? Will proper RAID-6 assumptions be merged, curing the issue? Will the direct-read changes be rolled back? Since it's now using as a fallback what it did before, will the warnings just be quieted, at least for 2.6.20, with a real fix presumably for 2.6.21? A different solution? Did you think it was all fixed and this is new data? This RAID-6 user needs to know. His data depends on it! =8^)

Hopefully this bug report will head off other inquiries as others run into it. Thanks for all you kernel hackers do for us users. =8^)

Duncan
What is "the chunk_aligned_read issue"?
I figured you'd be familiar with it. This makes way more sense than I'm likely to; it's a Google search for "chunk_aligned_read":

http://www.google.com/linux?lr=lang_en&hl=en&q=%22chunk_aligned_read%22

In particular, from the 2.6.19-git18 long-format changelog:

commit 46031f9a38a9773021f1872abc713d62467ac22e
Author: Raz Ben-Jehuda(caro) <raziebe@gmail.com>
Date:   Sun Dec 10 02:20:47 2006 -0800

    [PATCH] md: allow reads that have bypassed the cache to be retried on failure

    If a bypass-the-cache read fails, we simply try again through the
    cache. If it fails again it will trigger normal recovery procedures.

    update 1:

    From: NeilBrown <neilb@suse.de>

    1/ chunk_aligned_read and retry_aligned_read assume that
       data_disks == raid_disks - 1, which is not true for raid6.
       So when an aligned read request bypasses the cache, we can get
       the wrong data.

    2/ The cloned bio is being used-after-free in raid5_align_endio
       (to test BIO_UPTODATE).

    3/ We forgot to add rdev->data_offset when submitting a bio for
       aligned-read.

    4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
       so we need to invalidate the segment counts.

    5/ We don't de-reference the rdev when the read completes. This
       means we need to record the rdev so it is still available in the
       end_io routine. Fortunately bi_next in the original bio is
       unused at this point so we can stuff it in there.

    6/ We leak a cloned bio if the target rdev is not usable.

    update 2:

    From: NeilBrown <neilb@suse.de>

    1/ When aligned requests fail (read error) they need to be retried
       via the normal method (stripe cache). As we cannot be sure that
       we can process a single read in one go (we may not be able to
       allocate all the stripes needed) we store a bio-being-retried
       and a list of bioes-that-still-need-to-be-retried. When we find
       a bio that needs to be retried, we should add it to the list,
       not to the single bio...

    2/ We were never incrementing 'scnt' when resubmitting failed
       aligned requests.

    [akpm@osdl.org: build fix]
    Signed-off-by: Neil Brown <neilb@suse.de>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

OK, so as I said, that (and the other references) seems to indicate that 2.6.19 and earlier read from the (stripe?) cache -- new in .20-rc1 is the direct-read code, which bypasses the cache. Only it made a wrong assumption about RAID-6 stripe alignment, assuming it was the same as RAID-5.

As best I, a non-coder, can decipher, the situation now is that if the direct read fails, it tries again through the cache -- which seems to work here, thankfully. Only, apparently either the RAID-5/RAID-6 mismatch wasn't entirely fixed, or the fix didn't take, or there's a related issue still hanging around, because while the second try through the cache (which AFAIK was the only try, previously) works, I'm continuing to get "chunk_aligned_read : non aligned" errors on the console at boot, and continuing in syslog. The second try through the cache works, so I'm operational (and, I think/hope, data-safe), but the first try is still failing, and all those errors stacking up remain rather alarming, in addition to spamming my logs.

So I'm just asking what the status is. Am I supposed to still be getting those errors, and will they be fixed, or at least quieted, by the .20 main release? After all, it's simply falling back to the old cached path where needed, which might be good enough for .20, with the real fix tackled in .21.
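[Editor's note: to make point 1/ of the changelog concrete, here is a minimal standalone C sketch, not actual kernel code -- the function and constants are illustrative assumptions -- of why treating data_disks as raid_disks - 1 for both levels is wrong. A RAID-5 stripe carries one parity block, a RAID-6 stripe two, so a 4-disk RAID-6 has only 2 data disks per stripe, and the RAID-5 arithmetic maps an aligned read to the wrong disk/offset.]

#include <stdio.h>

#define LEVEL_RAID5 5
#define LEVEL_RAID6 6

static int data_disks(int level, int raid_disks)
{
    /* RAID-5 reserves one disk's worth of each stripe for parity,
     * RAID-6 two.  The pre-fix code effectively used
     * raid_disks - 1 for both levels. */
    return raid_disks - (level == LEVEL_RAID6 ? 2 : 1);
}

int main(void)
{
    int raid_disks = 4;  /* matches the reporter's 4-drive array */

    printf("RAID-5: %d data disks per stripe\n",
           data_disks(LEVEL_RAID5, raid_disks));   /* prints 3 */
    printf("RAID-6: %d data disks per stripe\n",
           data_disks(LEVEL_RAID6, raid_disks));   /* prints 2, not 3 */
    return 0;
}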
I expect that if it isn't, even if it's a relatively harmless error, a lot of RAID-6 folks will be rather alarmed when they start seeing these messages after they upgrade, so to prevent needless bug noise if nothing else, something needs to be done about this by the .20 release. Duncan (hope I'm being helpful)
Yes, you are being helpful. The particularly helpful thing you say is:

    I'm continuing to get "chunk_aligned_read : non aligned" errors on the console

This helps me understand what is happening. I'll attach a patch which fixes this. It just removes the printk, as it is reporting something that can normally happen.
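[Editor's note: a hedged sketch of the shape of such a fix, as a self-contained C program rather than the actual raid5.c source -- the function signature, return-value convention, and message text placement are assumptions based only on this thread. The point is that a non-aligned read is a normal case, so the code just hands it back to the stripe-cache path without logging.]

#include <stdio.h>
#include <stdbool.h>

/* Returns 1 if the read was submitted directly (bypassing the
 * stripe cache), 0 if the caller should fall back to the cache. */
static int chunk_aligned_read(bool aligned)
{
    if (!aligned) {
        /* The patch deletes a printk that reported this case:
         *     printk("chunk_aligned_read : non aligned\n");
         * Falling back here is normal, not an error; every kernel
         * before 2.6.20-rc1 serviced all reads through the stripe
         * cache anyway. */
        return 0;
    }
    /* ... the bypass-the-cache fast path would go here ... */
    return 1;
}

int main(void)
{
    printf("aligned read handled directly? %d\n", chunk_aligned_read(true));
    printf("non-aligned read falls back?   %d\n", chunk_aligned_read(false));
    return 0;
}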
Created attachment 10179 [details] Patch to remove annoying message
I applied the patch to -rc6 before I compiled it. The patch is confirmed to work as promised. =8^) Looking forward to seeing it in the .20 release! Duncan
The patch from this bug is now in Linus' tree.