Most recent kernel where this bug did *NOT* occur: 2.6.19
Distribution: Gentoo (mainline kernel directly from kernel.org)
Hardware Environment: AMD64, Tyan s2885, dual Opteron 242, 8 gig memory, 4 SATA
Software Environment: md-RAID-1, -6 (partitioned), -0, all across all four SATA
drives. root (and a backup root snapshot) on RAID-6 partitions. LVM2 on third
RAID-6 partition. reiserfs for everything.
Problem Description: I'm having the chunk_aligned_read issue, continuing
on -rc5. I googled chunk_aligned_read and see the pre-rc1 commits, but nothing
since. I searched bugs and nothing on it showed up, tho I'm not especially
skilled at bugzilla queries.
From google's results I see you were aware of the issue pre-rc1, and submitted
patches for it. I'm still seeing the errors at boot, loading root off the
RAID-6, tho based on the changelog entries that's the first try and the second,
thru the stripe cache, goes thru. Still, it's -rc5, and I'm still seeing the
issue, so whatever was done to fix that bad RAID-6 assumption (I read changelog
entries but not C code so well tho I can sometimes get the gist, so
it's "whatever" from my perspective) apparently didn't fix it for me.
I gather from the changelogs and kernel list postings that pre-changes in
2.6.19-git18, the kernel /always/ read thru the stripe cache, so falling back
to that now isn't a big deal. However, those warnings still look rather
alarming, "harmless" or not, and I can't help but be worried if I'm putting
system stability on the line.
Bottom line, what's the status? Will proper RAID-6 assumptions be merged,
curing the issue? Will the direct read changes be rolled back? Since it's now
using as a fallback what it did before, will the warnings just be quited, at
least for 2.6.20, with a real fix presumably for 2.6.21? A different solution?
Did you think it was all fixed and this is new data? This RAID-6 users needs
to know. His data depends on it! =8^)
Hopefully this bug will avoid other inquiries as others find it. Thanks for
all you kernel hackers do for us users. =8^)
What is "the chunk_aligned_read issue"?
I figured you'd be familiar with it. This makes way more sense than I'm likely
to. It's a google for "chunk_aligned_read"
In particular, from the 2.6.19-git18 long format changelog:
Author: Raz Ben-Jehuda(caro) <firstname.lastname@example.org>
Date: Sun Dec 10 02:20:47 2006 -0800
[PATCH] md: allow reads that have bypassed the cache to be retried on
If a bypass-the-cache read fails, we simply try again through the cache.
it fails again it will trigger normal recovery precedures.
From: NeilBrown <email@example.com>
chunk_aligned_read and retry_aligned_read assume that
data_disks == raid_disks - 1
which is not true for raid6.
So when an aligned read request bypasses the cache, we can get the wrong
2/ The cloned bio is being used-after-free in raid5_align_endio
(to test BIO_UPTODATE).
3/ We forgot to add rdev->data_offset when submitting
a bio for aligned-read
4/ clone_bio calls blk_recount_segments and then we change bi_bdev,
so we need to invalidate the segment counts.
5/ We don't de-reference the rdev when the read completes.
This means we need to record the rdev to so it is still
available in the end_io routine. Fortunately
bi_next in the original bio is unused at this point so
we can stuff it in there.
6/ We leak a cloned bio if the target rdev is not usable.
From: NeilBrown <firstname.lastname@example.org>
1/ When aligned requests fail (read error) they need to be retried
via the normal method (stripe cache). As we cannot be sure that
we can process a single read in one go (we may not be able to
allocate all the stripes needed) we store a bio-being-retried
and a list of bioes-that-still-need-to-be-retried.
When find a bio that needs to be retried, we should add it to
the list, not to single-bio...
2/ We were never incrementing 'scnt' when resubmitting failed
[email@example.com: build fix]
Signed-off-by: Neil Brown <firstname.lastname@example.org>
Signed-off-by: Andrew Morton <email@example.com>
Signed-off-by: Linus Torvalds <firstname.lastname@example.org>
OK, so as I said, that (and the other references) seems to indicate that 2.6.19
and earlier read from the (stripe?) cache -- new to .20-rc1 is the direct-read
code, bypassing cache. Only it made a wrong assumption about RAID-6 stripe
alignment, assuming it was the same as RAID-5. As best I as a non-coder can
decipher, the situation now is that if the direct-read fails, it tries again
thru the cache -- which seems to work here, thankfully.
Only, apparently either the RAID-5/RAID-6 mismatch wasn't entirely fixed or it
didn't take, or there's a related issue still hanging around, because while the
second try thru the cache (which AFAIK was the only try, previously) works, I'm
continuing to get "chunk_aligned_read : non aligned" errors on the console at
boot and continuing in syslog. The second try thru the cache works, so I'm
operational (and I think/hope data-safe), but the first try is still failing,
and all those errors stacking up remain rather alarming, in addition to
spamming my logs.
So I'm just asking what the status is. Am I supposed to still be getting those
errors, and will it be fixed or at least the errors quieted (since after all
it's simply falling back to the old cached option where needed, which might be
good enough for .20, with the error to be tackled in .21) by .20 main release?
I expect if not, even if it's a relatively harmless error, a lot of RAID-6
folks will be rather alarmed when they start seeing these after they upgrade,
so to prevent needless bug noise if anything, something needs done with this
by .20 release.
Duncan (hope I'm being helpful)
Yes, you are being helpful.
The particular helpful thing you say is that:
I'm continuing to get "chunk_aligned_read : non aligned" errors on the console
This helps me understand what is happening. I'll attach a patch which
fixes this. It just removes the printk as it is reporting something
that can normally happen.
Created attachment 10179 [details]
Patch to remove annoying message
I applied the patch to -rc6 b4 I compiled it.
The patch is confirmed to work as promised. =8^)
Looking forward to seeing it in .20 release!
Te patch from this bug is now in Linus' tree.