Bug 7912
Summary: | [patch 1/1] Disk Errors during boot-time caused by probe of invalid partitions | ||
---|---|---|---|
Product: | File System | Reporter: | TJ (linux) |
Component: | Other | Assignee: | fs_other |
Status: | CLOSED OBSOLETE | ||
Severity: | high | CC: | alan, rdunlap |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.27 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Patch against fs/partitions/msdos.c |
Description
TJ
2007-01-31 14:46:05 UTC
**Correction to paths mentioned in body text** The paths should read: fs/partitions/check.c fs/partitions/msdos.c Paths in the patch file *are* correct. Thanks. But we prefer to receive patches via email - this one was wordwrapped and probably had tabs replaced by spaces. Please email the aptch to myself, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>, Neil Brown <neilb@suse.de> and linux-kernel@vger.kernel.org as per http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt Created attachment 10246 [details]
Patch against fs/partitions/msdos.c
Also distributed to maintainers & kernel developers mailing list
Disks do range checks and the kernel also does range checks for the full media size. This bug is still biting (other users) now. I no longer use dmraid but we're seeing reports from those that do. For example: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/329880 On Wed, 18 Feb 2009, TJ wrote: > > In early 2007 I submitted a patch for a bug that only seemed to show up > when using dmraid and RAID 1 stripes. I tracked the bug down and posted > a patch to the -mm tree. > On 8th May 2007 Andrew Morton submitted it to you and you rejected it. > > Since then it seems nothing has been done to address the underlying > issue. I no longer use the dmraid module but we've just had a report at > Ubuntu about the same issue still affecting dmraid. And that report is obviously pure garbage. If the kernel says [ 38.168279] sda: rw=0, want=1250274689, limit=625142448 [ 38.168284] attempt to access beyond end of device then by definition the kernel NEVER DID THE IO. That's the warning path for "hey, I'm not going to do this IO because it's past the end". The IO was never actually done. So when somebody does a bug-report claiming This rare but critical bug has the potential to cause a hardware failure to disk drives by allowing the system to repeatedly attempt to seek to sectors beyond the end of the physical disk, causing sustained 'head banging'. I'm just not too damn impressed. Clearly no head banging took place, since we never did the IO, since we saw the error message. So quite frankly, your email and your bug report are just annoying. They're especially annoying since _if_ the problem had been real, you shouldn't even have sent email to me privately, but instead try to find the appropriate relevant parties. Because I would quite possibly have lost it. In other words - if you find a real bug, fine. It needs to get fixed. But the patch you propose seems to not actually do anything really interesting, and it is VERY COMPLICATED in not doing that interesting thing. You also don't seem to worry at all about my original worry: > BTW - my name is just "TJ" (pronounced Teej) - single word, no > first/bio_check_eodlast format. Just out of interest, what does it say on your tax returns and/or on your drivers license? > > IIRC, we cannot trust the "capacity" data, because not all disks report it > > correctly. If we did, we'd just do the check in read_dev_sector() instead. > > > > So I'm dropping this. I might be wrong about the capacity thing, we may > > have fixed it (Jens cc'd). But if the capacity is trustworthy, why not > > just do the trivial check in read_dev_sector to protect against invalid > > extended ones? And in add_partitions()? IOW, I think that if we want to avoid the warning message, we could either: - Just remove the warning message Do we really care? We probably do, because it's quite possibly interesting as a warnign about real corrupted filesystems. - Do the check at "read_dev_sector()" time, and avoid the warning that way (for _all_ partitioning schemes) - I still wonder whether the capacity is trustworthy at all at this point, because I have this dim - but possibly incorrect - memory that some broken devices don't do proper capacity stuff, and we actually then depend on the partitioning info. See? So I'm open to getting rid of the warning, but I am ABSOLUTELY NOT interested in bug reports that try to make scary claims without bothering to verify whether they are true, and the bug reporter then ignoring everything I say. Linus On Wed, 2009-02-18 at 12:28 -0800, Linus Torvalds wrote: > And that report is obviously pure garbage. I wouldn't have made the original report if the effect wasn't seriously disturbing. As I said, I no longer use dmraid but at the time of the report (Jan 2007 - 2.6.20) I was converting a Windows server that used a Promise Fastrak TX2000 controller in RAID 1+0 (4 drive) configuration and had to preserve the existing file-systems. > If the kernel says > > [ 38.168279] sda: rw=0, want=1250274689, limit=625142448 > [ 38.168284] attempt to access beyond end of device > > then by definition the kernel NEVER DID THE IO. That's the warning path > for "hey, I'm not going to do this IO because it's past the end". The IO > was never actually done. The reports you quote here are not from my original bug but from a recent report against 2.6.27 which indicates the primary issue now is the syslog being filled with error reports. At the time of my original report the I/O was being attempted. At some point later there was work done to prevent the I/O. > So when somebody does a bug-report claiming > > This rare but critical bug has the potential to cause a hardware failure > to disk drives by allowing the system to repeatedly attempt to seek to > sectors beyond the end of the physical disk, causing sustained 'head > banging'. > > I'm just not too damn impressed. Clearly no head banging took place, since > we never did the IO, since we saw the error message. The drives would 'thrash' and make a lot of noise for many seconds that sounded like repeated banging/tapping/knocking and the boot would either be delayed for a long time or sometimes the drives would lock-up and only a cold power off would revive the system. I tried to record the noises via camcorder when people questioned the banging but it was too indistinct with the noises of fans as well. > So quite frankly, your email and your bug report are just annoying. > They're especially annoying since _if_ the problem had been real, you > shouldn't even have sent email to me privately, but instead try to find > the appropriate relevant parties. I replied directly to you since you emailed me directly and privately (no copy to the lkml) in May 2007. > Because I would quite possibly have > lost it. I don't understand why you should be so aggressive. In my email I said "...so I thought I'd chase it up and ask what you think the solution should be if my original patch isn't acceptable". It seemed appropriate now the same cause has resurfaced for some users, and when the original patch approach was rejected by you, to ask what an acceptable approach might be. At the time I created the patch I was new to Linux kernel programming and did my best with what I could find out, not just to solve the immediate problem I was seeing, but also to help ensure the knowledge gained would be used to improve the kernel for others. > In other words - if you find a real bug, fine. It needs to get fixed. But > the patch you propose seems to not actually do anything really > interesting, and it is VERY COMPLICATED in not doing that interesting > thing. I've not disputed that. At the time the issue was relevant to me (early 2007) I had a very steep learning-curve over a few days to figure out the kernel call-paths, not least because the work-queues made it difficult to associate the originator of the disk access request (which turned out to be the partition code). It was my first time hacking the kernel code. By the time Andrew Morton forwarded the patch to you from the -mm tree in May 2007 and it was rejected it was no longer an issue for me (moved away from dmraid) and I had no time or desire to investigate it further. > You also don't seem to worry at all about my original worry: > > > BTW - my name is just "TJ" (pronounced Teej) - single word, no > > first/last format. > > Just out of interest, what does it say on your tax returns and/or on your > drivers license? "TJ" > > > IIRC, we cannot trust the "capacity" data, because not all disks report > it > > > correctly. If we did, we'd just do the check in read_dev_sector() > instead. > > > > > > So I'm dropping this. I might be wrong about the capacity thing, we may > > > have fixed it (Jens cc'd). But if the capacity is trustworthy, why not > > > just do the trivial check in read_dev_sector to protect against invalid > > > extended ones? And in add_partitions()? > > IOW, I think that if we want to avoid the warning message, we could > either: > > - Just remove the warning message > > Do we really care? We probably do, because it's quite possibly > interesting as a warnign about real corrupted filesystems. > > - Do the check at "read_dev_sector()" time, and avoid the warning that > way (for _all_ partitioning schemes) > > - I still wonder whether the capacity is trustworthy at all at this > point, because I have this dim - but possibly incorrect - memory that > some broken devices don't do proper capacity stuff, and we actually > then depend on the partitioning info. > > See? Thank-you. Your last point is an example of why I emailed you. There was an inference in your 2007 email but nothing specific that at the time I was able to understand, and when I originally reported the issue on LKML in Jan/Feb 2007 there were no other comments that gave me a clue to that. > So I'm open to getting rid of the warning, It may be that rate-limiting the warning would be sufficient. Examining the dmesg log provided in the Ubuntu bug #329880 comment #6 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/329880/comments/6 I noticed that many of the error messages are in groups of 8 consecutive sectors. It has me wondering if this is an MD superblock search (which is, I believe, 4KB). It would explain why the access attempts always seem to be at the end of a partition's range (superblock in the last 64KB usually I think). I recall noticing that in 2007 but at that time didn't know the low-level organisation of MD. I'll try to reproduce the symptoms using a KVM to understand the reasons for the seek. [ Added Neil and Jens to cc, since they are in charge of MD/block layer respectively ] On Wed, 18 Feb 2009, TJ wrote: > > > So I'm open to getting rid of the warning, > > It may be that rate-limiting the warning would be sufficient. Yes. Just rate-limiting it might be sufficient. On the other hand, if there actually are cases where you see the disk actually _seeking_, then that is actually the bigger problem, and means that there is either (a) a path that submits IO without the check for capacity or (b) a case where the capacity simply isn't set correctly in the first place. So I'd worry about that one more. Getting scary messages is not all that dangerous. The fact that that apparently happened with dmraid, and I suspect _your_ case was a matter of (a), ie some MD layer that just took a request to the MD layer, and mapped it into a request for a real disk without doing the same sanity checking that __generic_make_request does with that whole "bio_check_eod()". However, your report is from over two years ago, and really fundamentally doesn't smell like Ubuntu bug #329880 at all, except in the (very limited) sense that it's probably triggered by the same/similar probe. The ubuntu bug is harmless as long as it's just that "attempt to access beyond end of device" message. > Examining the dmesg log provided in the Ubuntu bug #329880 comment #6 > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/329880/comments/6 > > I noticed that many of the error messages are in groups of 8 consecutive > sectors. It has me wondering if this is an MD superblock search (which > is, I believe, 4KB). I suspect you're right about the MD superblock search, but the 4kB pattern is likely simpler, and more fundamental, than that. We'll see 4kB IO patterns for almost _anything_, simply because we almost always strive to do IO on PAGE_SIZE blocks (4kB on x86), since that's how the page cache works. All the generic kernel IO routines are geared towards reading a page (or a set of pages) of data. So even something that _tries_ to read just one sector (like "read_dev_setor()") will often read a whole page, because it ends up using "readpage()" rather than trying to generate any direct IO (why? Because we will re-read the thing over and over as we try different partitioning schemes, so we want to do it cached). And then it's purely a matter of whether we read that page as 8x 512-byte blocks, or as 1x 4kB request, which will depend on what we decided the block size of the device will be. And that, in turn, will depend on things like alignment and size. If some device has an odd number of sectors or starts at an odd sector boundary, for example, we'll decide that it must be accessed as 512-byte blocks, and then you'd see 8 warnings. > It would explain why the access attempts always seem to be at the end of > a partition's range (superblock in the last 64KB usually I think). I > recall noticing that in 2007 but at that time didn't know the low-level > organisation of MD. I do agree that the "end of the partition range" implies that it's probably the MD superblock search. And then it's caught by the logic in __make_generic_request(), and won't actually ever even hit the disk - just cause the warning. But if you can find the case where we generate requests _without_ checking the size of the underlying device, then that is interesting. Neil? Linus On Thu, February 19, 2009 11:06 am, Linus Torvalds wrote: > > [ Added Neil and Jens to cc, since they are in charge of MD/block layer > respectively ] Isn't this fixed (or avoided) by commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac which is in .28-rc ??? It is certainly a problem that I have seen before. mdadm destroy all partitions in a device that it includes in an array to try to stop other code getting confused. Presumably dmraid doesn't. But from .28 it probably shouldn't need to. NeilBrown commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac Author: Kay Sievers <kay.sievers@vrfy.org> Date: Wed Oct 15 22:04:21 2008 -0700 block: sanitize invalid partition table entries We currently follow blindly what the partition table lies about the disk, and let the kernel create block devices which can not be accessed. Trying to identify the device leads to kernel logs full of: sdb: rw=0, want=73392, limit=28800 attempt to access beyond end of device ........
On Thu, 19 Feb 2009, NeilBrown wrote:
>
> Isn't this fixed (or avoided) by
> commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac
No.
Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is
totally harmless.
But the serious problem is not.
Why do people think that the partition check is IN ANY WAY SPECIAL?
It's not. It just does a read from the disk. _Exactly_ the same way that a
plain
fd = open("/dev/xyzzy", O_RDWR);
pread(fd, buf, size, offset_past_the_end);
does.
Now, the warning is not emitted when you do that, because the code in
fs/block_dev.c acts slightly differently - it caches the disk capacity in
the inode size, and it checks that size against the offset read. And it
just returns zero - without any warning - if you pass it.
But my point in this whole discussion has been that the warning was never
the problem to begin with! When the warning happened, nobody cares. It's a
warning, and there's a totally unimportant bugzilla associated with it,
but it's totally meaningless.
So getting the warning actually means that something _caught_ the error.
The problem case is actually when you do NOT get the warning, and get an
IO error instead.
So the _real_ bug (and the only one I can find myself caring about - the
warning really doesn't matter) is if you can actually make the disk seek
past the end. You definitely cannot make it do that with /dev/sda or any
simple device like that, because of the checks in block/blk-core.c
(bio_check_eod()).
And THAT is the problem. The fact that the partition code probes the end
of the disk is just a small incidental detail. Fixing the partition code
FIXES NOTHING, because if that code could trigger a real disk seek past
the end, then so can potentially other code that just does a blind read.
So ignore the warning. The warning was for a case that we already handled,
and isn't interesting. The interesting case is if we can insert a requests
_without_ doing that bio_check_eod() handling, and judging by TJ's
original report, we probably can (or at least could, two years ago).
Linus
On Fri, February 20, 2009 3:48 am, Linus Torvalds wrote: > > > On Thu, 19 Feb 2009, NeilBrown wrote: >> >> Isn't this fixed (or avoided) by >> commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac > > No. > > Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is > totally harmless. Depends on your perspective. Warnings can worry people, people can complain about warnings that worry them. Developers time can be spent explaining the warnings, and so taken away from development. That could be seen as 'harm' :-) > > But the serious problem is not. The serious problem was fixed by commit 5ddfe9691c91a244e8d1be597b6428fcefd58103 Author: NeilBrown <neilb@suse.de> Date: Mon Oct 30 22:07:21 2006 -0800 [PATCH] md: check bio address after mapping through partitions. Partitions are not limited to live within a device. So we should range check after partition mapping. ?? Though that is more that 2 years ago... but then the bug report doesn't mention explicit kernel versions so it is hard to be sure what was tested. But I think this problem should be considered fixed... unless some problem can be demostrated on a more recent kernel. NeilBrown Hi Neil, thanks for your enlightenment on this.
On Fri, 2009-02-20 at 06:35 +1100, NeilBrown wrote:
> The serious problem was fixed by
>
> commit 5ddfe9691c91a244e8d1be597b6428fcefd58103
> Author: NeilBrown <neilb@suse.de>
> Date: Mon Oct 30 22:07:21 2006 -0800
>
> [PATCH] md: check bio address after mapping through partitions.
>
> Partitions are not limited to live within a device. So we should range
> check after partition mapping.
>
> ??
>
> Though that is more that 2 years ago... but then the bug report doesn't
> mention explicit kernel versions so it is hard to be sure what was tested.
The kernel-version the original report was against is 2.6.20. When I
returned to it a few days ago I updated the affected kernel version to
the latest reported.
I'm going to be doing two things to conclude this:
1) Try to reproduce the scenario in a KVM set-up (to make it easier to
test)
2) Rebuild a physical machine with the same Fastrak TX2000, disks, and
partition scheme as the PC in 2007 had
and then boot with various kernel releases starting with the original
2.6.20 and working forward to test against the commits mentioned in this
bug report.
If (2) can reproduce the physical disk head-banging I'll try once again
to capture the noise on microphone (just so folks can hear how alarming
it sounds).
Hopefully this will confirm that the physical I/O issue is no longer
present as well as identifying the root cause of the attempts.
TJ.
On Fri, 20 Feb 2009, NeilBrown wrote: > > > > Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is > > totally harmless. > > Depends on your perspective. Warnings can worry people, people can complain > about warnings that worry them. Developers time can be spent explaining > the warnings, and so taken away from development. That could be seen > as 'harm' :-) Sure. But relative to perhaps eating your disk.. > > But the serious problem is not. > > The serious problem was fixed by > > commit 5ddfe9691c91a244e8d1be597b6428fcefd58103 > Author: NeilBrown <neilb@suse.de> > Date: Mon Oct 30 22:07:21 2006 -0800 > > [PATCH] md: check bio address after mapping through partitions. Goodie. So md doesn't ever remap sectors and then just insert them on some raw queue without going through make_request()? > Though that is more that 2 years ago... but then the bug report doesn't > mention explicit kernel versions so it is hard to be sure what was tested. Yes. It apparently made it into 2.6.19-rc5, an 2.6.29 was released November 2006, but if it was a distro kernel it would easily not have been used by users until much later in 2007. > But I think this problem should be considered fixed... unless some problem > can be demostrated on a more recent kernel. Ack. Linus Reply-To: jens.axboe@oracle.com On Thu, Feb 19 2009, Linus Torvalds wrote: > > > On Fri, 20 Feb 2009, NeilBrown wrote: > > > > > > Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is > > > totally harmless. > > > > Depends on your perspective. Warnings can worry people, people can > complain > > about warnings that worry them. Developers time can be spent explaining > > the warnings, and so taken away from development. That could be seen > > as 'harm' :-) > > Sure. But relative to perhaps eating your disk.. > > > > But the serious problem is not. > > > > The serious problem was fixed by > > > > commit 5ddfe9691c91a244e8d1be597b6428fcefd58103 > > Author: NeilBrown <neilb@suse.de> > > Date: Mon Oct 30 22:07:21 2006 -0800 > > > > [PATCH] md: check bio address after mapping through partitions. > > Goodie. So md doesn't ever remap sectors and then just insert them on some > raw queue without going through make_request()? With a bio, the only way is through ->make_request() for the device. I only see generic_make_request() or submit_bio() being used, which will check limits before diving into the lower layer ->make_request(). So should be safe in that regard. Closing old bugs |