Bug 4946
Summary: | Random data returned in portions of certain large files when reading | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Matthew Stapleton (matthew4196) |
Component: | Block Layer | Assignee: | Jens Axboe (axboe) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | akpm, brewt-bugzilla.kernel.org, bunk |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | >=2.6.11-rc2, >=2.6.10-ac10, >2.6.10-hardened-r3, 2.6.12-gentoo- | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | bio_clone fix |
Description
Matthew Stapleton
2005-07-26 21:17:55 UTC
Update: kernel 2.6.11-rc1 doesn't have this bug but 2.6.11-rc2 does.

That's helpful. There are no significant reiserfs changes between 2.6.11-rc1 and 2.6.11-rc2. There's one small change in drivers/block/md.c but it looks fairly innocent. To test that, you could run a 2.6.11-rc2 kernel with 2.6.11-rc1's md.c. But I'd be more inclined to suspect block and/or device driver problems. What device driver are you using there? And have you eliminated the "hardened" patches from the picture?

I am using the IDE VIA82CXXX driver and Device Mapper for LVM2. Since 2.6.10-hardened-r3 uses 2.6.10-ac10 as its base, I tested 2.6.10-ac9 and 2.6.10-ac10. 2.6.10-ac9 doesn't have the bug but 2.6.10-ac10 does. Then, by examining the few files that had changed, I found that the change in fs/bio.c is what is causing the problem. This same change occurred between 2.6.11-rc1 and 2.6.11-rc2. Also, the problem still occurs in 2.6.13-rc3. In case it helps, I have one LVM2 physical volume spanning five 20008832-block RAID 1 partitions and one 14056768-block partition. The LVM2 logical volume containing the 'corrupt' test file is 35GB, spans 2 segments and uses a ReiserFS filesystem. Unfortunately, I can't seem to get any other copies of large files to trigger the bug at the moment.

Fantastic work, thanks. I'll ping a few people.

Created attachment 5394: bio_clone fix

Could you please try this fix?
Andrew, please pass that patch on to Linus right away.

I tested that patch and it fixes the problem on 2.6.11, 2.6.12, and 2.6.13-rc3.

The patch is now included in 2.6.13-rc and scheduled for inclusion in 2.6.12.4.

What about that comment a few lines up?

    /*
     * notes -- maybe just leave bi_idx alone. assume identical mapping
     * for the clone
     */

From that comment, it sounds like the original author had been doing the bi_idx copy, but it was removed later on (and the comment was left in). And now bi_idx is being copied again.

The author is me and the change is fine. The comment should probably be killed as well, but let's just leave it. Most users will expect the clone to have the same index as the original, so it was a bad judgement.

heh, yeah, thanks. I'll kill that comment.
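
For readers following the bi_idx discussion, here is a minimal userspace sketch of why a clone that does not copy the current segment index can return data from the wrong place. It is not the kernel code and not the attached patch: `toy_vec`, `toy_bio`, and all the `toy_*` functions are hypothetical, heavily simplified stand-ins for `struct bio_vec`, `struct bio`, and the clone path, written only to illustrate the idea that the clone should start at the same index as the original.

```c
#include <assert.h>

/* Hypothetical toy model; field names only loosely mirror the kernel's. */
#define TOY_MAX_VECS 8

struct toy_vec {
    int page_id;        /* stands in for bv_page */
    unsigned int len;   /* stands in for bv_len */
};

struct toy_bio {
    struct toy_vec vecs[TOY_MAX_VECS];
    unsigned short vcnt;    /* number of valid segments (like bi_vcnt) */
    unsigned short idx;     /* current segment (like bi_idx) */
    unsigned int size;      /* bytes left from idx onward (like bi_size) */
};

/* Clone that copies the segments but leaves the index at 0. */
static void toy_clone_without_idx(struct toy_bio *dst, const struct toy_bio *src)
{
    unsigned short i;

    assert(src->vcnt <= TOY_MAX_VECS);
    for (i = 0; i < src->vcnt; i++)
        dst->vecs[i] = src->vecs[i];
    dst->vcnt = src->vcnt;
    dst->size = src->size;
    dst->idx = 0;           /* clone restarts at the first segment */
}

/* Clone that also carries over the current index, as the thread describes. */
static void toy_clone_with_idx(struct toy_bio *dst, const struct toy_bio *src)
{
    toy_clone_without_idx(dst, src);
    dst->idx = src->idx;    /* clone resumes where the original is */
}

/* Page that the next transfer on this bio would touch. */
static int toy_next_page(const struct toy_bio *bio)
{
    assert(bio->idx < bio->vcnt);
    return bio->vecs[bio->idx].page_id;
}
```

In this toy model, cloning a partially advanced request (say `idx == 1`) without copying the index makes the clone point at the first page again while `size` still describes only the remaining bytes, so the transfer is directed at the wrong page. That is one plausible way to picture the "random data in portions of large files" symptom, under the assumptions stated above.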