Bug 7912 - [patch 1/1] Disk Errors during boot-time caused by probe of invalid partitions
Summary: [patch 1/1] Disk Errors during boot-time caused by probe of invalid partitions
Status: CLOSED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: Other (show other bugs)
Hardware: i386 Linux
: P2 high
Assignee: fs_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-31 14:46 UTC by TJ
Modified: 2012-05-17 14:23 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.27
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Patch against fs/partitions/msdos.c (3.86 KB, patch)
2007-01-31 18:52 UTC, TJ
Details | Diff

Description TJ 2007-01-31 14:46:05 UTC
Most recent kernel where this bug did *NOT* occur: exists in all 2.6 kernels and
probably 2.4 and earlier, too.
Distribution: Ubuntu Edgy 6.10
Hardware Environment: i686 AMD Athlon SMP
Software Environment: 
Problem Description: 

This rare but critical bug has the potential to cause a hardware failure to disk
drives by allowing the system to repeatedly attempt to seek to sectors beyond
the end of the physical disk, causing sustained 'head banging'.

It particularly affects dmraid-managed RAID 1 stripes of the type hde+hdf where
the first physical disk hde contains a standard partition table which relates to
the larger logical disk represented by hde+hdf.

I'm not sure if this bug will affect mdraid RAID-1 stripes, or other software
RAID configurations.

The bug was discovered on a RAID 1+0 array consisting of 4x60GB drives on a
Promise FastTrak PDC20271 2-channel IDE controller (hde+hdf mirrored to hdg+hdh)
with logical block addressing (LBA).

There are 3 prolonged periods of disk-probing each lasting about 20 seconds
during which the 'head banging' is quite scary. The first two occur during the
kernel boot, and the last will occur when a GUI environment such as Gnome
initialises.

In the system where this bug appeared this caused thousands of disk-read errors
during boot (which overflowed dmesg log), and 'head bangs' the drive(s) so hard
that sometimes the system has to be powered off for a considerable time before
the disk(s) will re-initialise.

Cause:

At boot-time when drives are being probed the disks are scanned for partition
tables by fs/check.c:check_partition() which makes calls to all registered
partition-types.

In the case of the commonly used "msdos" partition-type used for Linux, BSD,
Solaris, MS-DOS, extended and others, the checking is done in
fs/msdos.c:msdos_partition().

The partition table is only checked for validity based on the 'magic bytes' 55AA
in the boot sector. The sector values in the partition table are copied without
any checks to ensure they are within the bounds of the disk device.

As a result, block devices are created based on the partition structures and
then various file-systems are given the task of scanning the partition to
determine if it is one they will manage.

This scanning, in a partition that has sector numbers outside the bounds of the
device, causes the errors.


Steps to reproduce:

1) Create a partition table with sector offsets/sizes that point to locations
beyond the end of the physical disk, and boot the PC.

2) Install a RAID-1 stripe using a dmraid-managed controller with partitions
that straddle or are beyond the physical boundary of the first disk.


Solution:

Before the partitions are built scan the sector numbers in the table to ensure
they are within the limits of the device.


Patch:

My patch creates a new function in fs/partitions/msdos.c "check_sane_values()"
and calls that function from within msdos_partition() *before* it enters the
partition-table build loop.

If insane values are found the function prints a detailed report of each error
to the kernel log (printk() to dmesg) and returns an error which results in the
partition table scan being aborted and msdos_partition() reporting "unable to
read partition table".

Created against 2.6.19 (applies to at least 2.6.17-2.6.20-rc7)

--- fs/partitions/msdos.c	2006-11-29 21:57:37.000000000 +0000
+++ fs/partitions/msdos.tj.c	2007-01-31 21:29:06.000000000 +0000
@@ -399,6 +399,77 @@ static struct {
 	{NEW_SOLARIS_X86_PARTITION, parse_solaris_x86},
 	{0, NULL},
 };
+
+/* 
+ * Check that *all* sector offsets are valid before actually building the
partition structure.
+ *
+ * This prevents physical damage to disks and boot-time problems caused by an
apparently valid
+ * partition table causing attempts to read sectors beyond the end of the
physical disk.
+ *
+ * This is especially important where this is the first physical disk in a
striped RAID array
+ * and the partition table contains sector offsets into the larger logical disk
(beyond the end
+ * of this physical disk).
+ *
+ * The RAID module will correctly manage the disks.
+ *
+ * The function is re-entrant so it can call itself to check extended partitions.
+ * 
+ * @param p partition table
+ * @param bdev block device
+ * @returns -1 if insane values found; 0 otherwise
+ * @copy Copyright 31 January 2007
+ * @author TJ <linux@tjworld.net>
+ */ 
+int check_sane_values(struct partition *p, struct block_device *bdev) {
+	unsigned char *data;
+	struct partition *ext;
+	Sector sect;
+	int slot;
+	int insane;
+	int sector_size = bdev_hardsect_size(bdev) / 512;
+	int ret = 0; /* default is to report ok */
+
+	/* don't return early; allow all partition entries to be checked */
+	for (slot = 1 ; slot <= 4 ; slot++, p++) { 
+		insane = 0; /* track sanity within each table entry */
+
+		if (NR_SECTS(p) == 0)
+			continue; /* ignore zero-sized entries */
+
+		if (START_SECT(p) > bdev->bd_disk->capacity-1) { /* invalid - beyond end of
disk */
+			insane |= 1; /* bit-0 flags insane start */
+		}
+		if (START_SECT(p)+NR_SECTS(p)-1 > bdev->bd_disk->capacity-1) { /* invalid -
beyond end of disk */
+			insane |= 2; /* bit-1 flags insane end */
+		}
+		if (!insane && is_extended_partition(p)) { /* check the extended partition */
+			data = read_dev_sector(bdev, START_SECT(p)*sector_size, &sect); /* fetch
sector from cache */
+			if (data) {
+				if (msdos_magic_present(data + 510)) { /* check for signature */
+					ext = (struct partition *) (data + 0x1be);
+					ret = check_sane_values(ext, bdev); /* recursive call */
+					if (ret == -1) /* insanity found */
+						insane |= 4; /* bit-2 flags insane extended partition contents */
+				}
+				put_dev_sector(sect); /* release sector to cache */
+			}
+			else ret = -1; /* failed to read sector from cache */
+
+		}
+		if (insane) { /* insanity found; report it */
+			ret = -1; /* error code */
+			printk("\n"); /* start error report on a fresh line */
+			if (insane | 1)
+				printk(" partition %d: start (sector %d) beyond end of disk (sector %d)\n",
slot, START_SECT(p), (unsigned int) bdev->bd_disk->capacity-1);
+			if (insane | 2)
+				printk(" partition %d: end (sector %d) beyond end of disk (sector %d)\n",
slot, START_SECT(p)+NR_SECTS(p)-1, (unsigned int) bdev->bd_disk->capacity-1);
+			if (insane | 4)
+				printk(" partition %d: insane extended contents\n", slot);
+		}
+	}
+	return ret;
+}
+
  
 int msdos_partition(struct parsed_partitions *state, struct block_device *bdev)
 {
@@ -448,6 +519,18 @@ int msdos_partition(struct parsed_partit
 #endif
 	p = (struct partition *) (data + 0x1be);
 
+	/* 
+	 * Check that *all* sector offsets are valid before actually building the
partition structure
+	 * Do it now rather than inside the loop that builds the partition entries to
avoid having to
+	 * unwind an unknown number of put_partition() calls in this loop and in the
(possible) calls
+	 * to parse_extended()
+	 * Added by TJ <linux@tjworld.net>, 31 January 2007.
+	 */
+	if (check_sane_values(p, bdev) == -1) {
+		put_dev_sector(sect); /* release to cache */
+		return -1; /* report invalid partition table */
+	}
+
 	/*
 	 * Look for partitions in two passes:
 	 * First find the primary and DOS-type extended partitions.
Comment 1 TJ 2007-01-31 14:49:59 UTC
**Correction to paths mentioned in body text**

The paths should read:

fs/partitions/check.c
fs/partitions/msdos.c

Paths in the patch file *are* correct.
Comment 2 Andrew Morton 2007-01-31 15:00:40 UTC
Thanks.  But we prefer to receive patches via email - this one was wordwrapped
and probably had tabs replaced by spaces.

Please email the aptch to myself, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
Neil Brown <neilb@suse.de> and linux-kernel@vger.kernel.org as per
http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt
Comment 3 TJ 2007-01-31 18:52:57 UTC
Created attachment 10246 [details]
Patch against fs/partitions/msdos.c

Also distributed to maintainers & kernel developers mailing list
Comment 4 Alan 2008-09-22 12:44:31 UTC
Disks do range checks and the kernel also does range checks for the full media size.
Comment 5 TJ 2009-02-18 11:54:09 UTC
This bug is still biting (other users) now. I no longer use dmraid but we're seeing reports from those that do. For example:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/329880
Comment 6 Linus Torvalds 2009-02-18 12:30:31 UTC

On Wed, 18 Feb 2009, TJ wrote:
> 
> In early 2007 I submitted a patch for a bug that only seemed to show up
> when using dmraid and RAID 1 stripes. I tracked the bug down and posted
> a patch to the -mm tree.
> On 8th May 2007 Andrew Morton submitted it to you and you rejected it.
> 
> Since then it seems nothing has been done to address the underlying
> issue. I no longer use the dmraid module but we've just had a report at
> Ubuntu about the same issue still affecting dmraid.

And that report is obviously pure garbage.

If the kernel says

	[ 38.168279] sda: rw=0, want=1250274689, limit=625142448
	[ 38.168284] attempt to access beyond end of device

then by definition the kernel NEVER DID THE IO. That's the warning path 
for "hey, I'm not going to do this IO because it's past the end". The IO 
was never actually done.

So when somebody does a bug-report claiming

  This rare but critical bug has the potential to cause a hardware failure 
  to disk drives by allowing the system to repeatedly attempt to seek to 
  sectors beyond the end of the physical disk, causing sustained 'head 
  banging'.

I'm just not too damn impressed. Clearly no head banging took place, since 
we never did the IO, since we saw the error message.

So quite frankly, your email and your bug report are just annoying. 
They're especially annoying since _if_ the problem had been real, you 
shouldn't even have sent email to me privately, but instead try to find 
the appropriate relevant parties. Because I would quite possibly have 
lost it.

In other words - if you find a real bug, fine. It needs to get fixed. But 
the patch you propose seems to not actually do anything really 
interesting, and it is VERY COMPLICATED in not doing that interesting 
thing. 

You also don't seem to worry at all about my original worry:

> BTW - my name is just "TJ" (pronounced Teej) - single word, no
> first/bio_check_eodlast format.

Just out of interest, what does it say on your tax returns and/or on your 
drivers license?

> > IIRC, we cannot trust the "capacity" data, because not all disks report it 
> > correctly. If we did, we'd just do the check in read_dev_sector() instead.
> > 
> > So I'm dropping this. I might be wrong about the capacity thing, we may 
> > have fixed it (Jens cc'd). But if the capacity is trustworthy, why not 
> > just do the trivial check in read_dev_sector to protect against invalid 
> > extended ones? And in add_partitions()?

IOW, I think that if we want to avoid the warning message, we could 
either:

 - Just remove the warning message

   Do we really care? We probably do, because it's quite possibly 
   interesting as a warnign about real corrupted filesystems.

 - Do the check at "read_dev_sector()" time, and avoid the warning that 
   way (for _all_ partitioning schemes)

 - I still wonder whether the capacity is trustworthy at all at this 
   point, because I have this dim - but possibly incorrect - memory that 
   some broken devices don't do proper capacity stuff, and we actually 
   then depend on the partitioning info.

See? 

So I'm open to getting rid of the warning, but I am ABSOLUTELY NOT 
interested in bug reports that try to make scary claims without bothering 
to verify whether they are true, and the bug reporter then ignoring 
everything I say.

			Linus
Comment 7 TJ 2009-02-18 15:33:19 UTC
On Wed, 2009-02-18 at 12:28 -0800, Linus Torvalds wrote:

> And that report is obviously pure garbage.

I wouldn't have made the original report if the effect wasn't seriously
disturbing.

As I said, I no longer use dmraid but at the time of the report (Jan
2007 - 2.6.20) I was converting a Windows server that used a Promise
Fastrak TX2000 controller in RAID 1+0 (4 drive) configuration and had to
preserve the existing file-systems.

> If the kernel says
> 
>       [ 38.168279] sda: rw=0, want=1250274689, limit=625142448
>       [ 38.168284] attempt to access beyond end of device
> 
> then by definition the kernel NEVER DID THE IO. That's the warning path 
> for "hey, I'm not going to do this IO because it's past the end". The IO 
> was never actually done.

The reports you quote here are not from my original bug but from a
recent report against 2.6.27 which indicates the primary issue now is
the syslog being filled with error reports.

At the time of my original report the I/O was being attempted. At some
point later there was work done to prevent the I/O.

> So when somebody does a bug-report claiming
> 
>   This rare but critical bug has the potential to cause a hardware failure 
>   to disk drives by allowing the system to repeatedly attempt to seek to 
>   sectors beyond the end of the physical disk, causing sustained 'head 
>   banging'.
> 
> I'm just not too damn impressed. Clearly no head banging took place, since 
> we never did the IO, since we saw the error message.

The drives would 'thrash' and make a lot of noise for many seconds that
sounded like repeated banging/tapping/knocking and the boot would either
be delayed for a long time or sometimes the drives would lock-up and
only a cold power off would revive the system.
I tried to record the noises via camcorder when people questioned the
banging but it was too indistinct with the noises of fans as well.

> So quite frankly, your email and your bug report are just annoying. 
> They're especially annoying since _if_ the problem had been real, you 
> shouldn't even have sent email to me privately, but instead try to find 
> the appropriate relevant parties.

I replied directly to you since you emailed me directly and privately
(no copy to the lkml) in May 2007.

>  Because I would quite possibly have 
> lost it.

I don't understand why you should be so aggressive. In my email I said
"...so I thought I'd chase it up and ask what you think the solution
should be if my original patch isn't acceptable".

It seemed appropriate now the same cause has resurfaced for some users,
and when the original patch approach was rejected by you, to ask what an
acceptable approach might be.

At the time I created the patch I was new to Linux kernel programming
and did my best with what I could find out, not just to solve the
immediate problem I was seeing, but also to help ensure the knowledge
gained would be used to improve the kernel for others.

> In other words - if you find a real bug, fine. It needs to get fixed. But 
> the patch you propose seems to not actually do anything really 
> interesting, and it is VERY COMPLICATED in not doing that interesting 
> thing.

I've not disputed that. At the time the issue was relevant to me (early
2007) I had a very steep learning-curve over a few days to figure out
the kernel call-paths, not least because the work-queues made it
difficult to associate the originator of the disk access request (which
turned out to be the partition code). It was my first time hacking the
kernel code.

By the time Andrew Morton forwarded the patch to you from the -mm tree
in May 2007 and it was rejected it was no longer an issue for me (moved
away from dmraid) and I had no time or desire to investigate it further.

> You also don't seem to worry at all about my original worry:
> 
> > BTW - my name is just "TJ" (pronounced Teej) - single word, no
> > first/last format.
> 
> Just out of interest, what does it say on your tax returns and/or on your 
> drivers license?

"TJ"

> > > IIRC, we cannot trust the "capacity" data, because not all disks report
> it 
> > > correctly. If we did, we'd just do the check in read_dev_sector()
> instead.
> > > 
> > > So I'm dropping this. I might be wrong about the capacity thing, we may 
> > > have fixed it (Jens cc'd). But if the capacity is trustworthy, why not 
> > > just do the trivial check in read_dev_sector to protect against invalid 
> > > extended ones? And in add_partitions()?
> 
> IOW, I think that if we want to avoid the warning message, we could 
> either:
> 
>  - Just remove the warning message
> 
>    Do we really care? We probably do, because it's quite possibly 
>    interesting as a warnign about real corrupted filesystems.
> 
>  - Do the check at "read_dev_sector()" time, and avoid the warning that 
>    way (for _all_ partitioning schemes)
> 
>  - I still wonder whether the capacity is trustworthy at all at this 
>    point, because I have this dim - but possibly incorrect - memory that 
>    some broken devices don't do proper capacity stuff, and we actually 
>    then depend on the partitioning info.
> 
> See?

Thank-you. Your last point is an example of why I emailed you. There was
an inference in your 2007 email but nothing specific that at the time I
was able to understand, and when I originally reported the issue on LKML
in Jan/Feb 2007 there were no other comments that gave me a clue to
that.

> So I'm open to getting rid of the warning,

It may be that rate-limiting the warning would be sufficient.

Examining the dmesg log provided in the Ubuntu bug #329880 comment #6

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/329880/comments/6

I noticed that many of the error messages are in groups of 8 consecutive
sectors. It has me wondering if this is an MD superblock search (which
is, I believe, 4KB). It would explain why the access attempts always
seem to be at the end of a partition's range (superblock in the last
64KB usually I think). I recall noticing that in 2007 but at that time
didn't know the low-level organisation of MD.

I'll try to reproduce the symptoms using a KVM to understand the reasons
for the seek.
Comment 8 Linus Torvalds 2009-02-18 16:06:47 UTC
[ Added Neil and Jens to cc, since they are in charge of MD/block layer 
  respectively ]

On Wed, 18 Feb 2009, TJ wrote:
> 
> > So I'm open to getting rid of the warning,
> 
> It may be that rate-limiting the warning would be sufficient.

Yes. Just rate-limiting it might be sufficient.

On the other hand, if there actually are cases where you see the disk 
actually _seeking_, then that is actually the bigger problem, and means 
that there is either (a) a path that submits IO without the check for 
capacity or (b) a case where the capacity simply isn't set correctly in 
the first place.

So I'd worry about that one more. Getting scary messages is not all that 
dangerous.

The fact that that apparently happened with dmraid, and I suspect _your_ 
case was a matter of (a), ie some MD layer that just took a request to the 
MD layer, and mapped it into a request for a real disk without doing the 
same sanity checking that __generic_make_request does with that whole 
"bio_check_eod()".

However, your report is from over two years ago, and really fundamentally 
doesn't smell like Ubuntu bug #329880 at all, except in the (very limited) 
sense that it's probably triggered by the same/similar probe. The ubuntu 
bug is harmless as long as it's just that "attempt to access beyond end of 
device" message.

> Examining the dmesg log provided in the Ubuntu bug #329880 comment #6
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/329880/comments/6
> 
> I noticed that many of the error messages are in groups of 8 consecutive
> sectors. It has me wondering if this is an MD superblock search (which
> is, I believe, 4KB).

I suspect you're right about the MD superblock search, but the 4kB pattern 
is likely simpler, and more fundamental, than that.

We'll see 4kB IO patterns for almost _anything_, simply because we almost 
always strive to do IO on PAGE_SIZE blocks (4kB on x86), since that's how 
the page cache works. All the generic kernel IO routines are geared 
towards reading a page (or a set of pages) of data.

So even something that _tries_ to read just one sector (like 
"read_dev_setor()") will often read a whole page, because it ends up using 
"readpage()" rather than trying to generate any direct IO (why? Because we 
will re-read the thing over and over as we try different partitioning 
schemes, so we want to do it cached).

And then it's purely a matter of whether we read that page as 8x 512-byte 
blocks, or as 1x 4kB request, which will depend on what we decided the 
block size of the device will be.

And that, in turn, will depend on things like alignment and size. If some 
device has an odd number of sectors or starts at an odd sector boundary, 
for example, we'll decide that it must be accessed as 512-byte blocks, and 
then you'd see 8 warnings.

> It would explain why the access attempts always seem to be at the end of 
> a partition's range (superblock in the last 64KB usually I think). I 
> recall noticing that in 2007 but at that time didn't know the low-level 
> organisation of MD.

I do agree that the "end of the partition range" implies that it's 
probably the MD superblock search. And then it's caught by the logic in 
__make_generic_request(), and won't actually ever even hit the disk - just 
cause the warning.

But if you can find the case where we generate requests _without_ checking 
the size of the underlying device, then that is interesting. Neil?

		Linus
Comment 9 Neil Brown 2009-02-19 01:56:57 UTC
On Thu, February 19, 2009 11:06 am, Linus Torvalds wrote:
>
> [ Added Neil and Jens to cc, since they are in charge of MD/block layer
>   respectively ]

Isn't this fixed (or avoided) by
 commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac

which is in .28-rc ???

It is certainly a problem that I have seen before.
mdadm destroy all partitions in a device that it includes in an array
to try to stop other code getting confused.  Presumably dmraid
doesn't.  But from .28 it probably shouldn't need to.

NeilBrown

commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac
Author: Kay Sievers <kay.sievers@vrfy.org>
Date:   Wed Oct 15 22:04:21 2008 -0700

    block: sanitize invalid partition table entries

    We currently follow blindly what the partition table lies about the
    disk, and let the kernel create block devices which can not be accessed.
    Trying to identify the device leads to kernel logs full of:
      sdb: rw=0, want=73392, limit=28800
      attempt to access beyond end of device
 ........
Comment 10 Linus Torvalds 2009-02-19 08:48:35 UTC

On Thu, 19 Feb 2009, NeilBrown wrote:
> 
> Isn't this fixed (or avoided) by
>  commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac

No.

Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is 
totally harmless.

But the serious problem is not.

Why do people think that the partition check is IN ANY WAY SPECIAL?

It's not. It just does a read from the disk. _Exactly_ the same way that a 
plain

	fd = open("/dev/xyzzy", O_RDWR);
	pread(fd, buf, size, offset_past_the_end);

does.

Now, the warning is not emitted when you do that, because the code in 
fs/block_dev.c acts slightly differently - it caches the disk capacity in 
the inode size, and it checks that size against the offset read. And it 
just returns zero - without any warning - if you pass it. 

But my point in this whole discussion has been that the warning was never 
the problem to begin with! When the warning happened, nobody cares. It's a 
warning, and there's a totally unimportant bugzilla associated with it, 
but it's totally meaningless.

So getting the warning actually means that something _caught_ the error. 
The problem case is actually when you do NOT get the warning, and get an 
IO error instead.

So the _real_ bug (and the only one I can find myself caring about - the 
warning really doesn't matter) is if you can actually make the disk seek 
past the end. You definitely cannot make it do that with /dev/sda or any 
simple device like that, because of the checks in block/blk-core.c 
(bio_check_eod()).

And THAT is the problem. The fact that the partition code probes the end 
of the disk is just a small incidental detail. Fixing the partition code 
FIXES NOTHING, because if that code could trigger a real disk seek past 
the end, then so can potentially other code that just does a blind read.

So ignore the warning. The warning was for a case that we already handled, 
and isn't interesting. The interesting case is if we can insert a requests 
_without_ doing that bio_check_eod() handling, and judging by TJ's 
original report, we probably can (or at least could, two years ago).

			Linus
Comment 11 Neil Brown 2009-02-19 11:35:28 UTC
On Fri, February 20, 2009 3:48 am, Linus Torvalds wrote:
>
>
> On Thu, 19 Feb 2009, NeilBrown wrote:
>>
>> Isn't this fixed (or avoided) by
>>  commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac
>
> No.
>
> Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is
> totally harmless.

Depends on your perspective.  Warnings can worry people, people can complain
about warnings that worry them.  Developers time can be spent explaining
the warnings, and so taken away from development.  That could be seen
as 'harm' :-)

>
> But the serious problem is not.

The serious problem was fixed by

commit 5ddfe9691c91a244e8d1be597b6428fcefd58103
Author: NeilBrown <neilb@suse.de>
Date:   Mon Oct 30 22:07:21 2006 -0800

    [PATCH] md: check bio address after mapping through partitions.

    Partitions are not limited to live within a device.  So we should range
    check after partition mapping.

??

Though that is more that 2 years ago... but then the bug report doesn't
mention explicit kernel versions so it is hard to be sure what was tested.

But I think this problem should be considered fixed... unless some problem
can be demostrated on a more recent kernel.

NeilBrown
Comment 12 TJ 2009-02-19 11:56:03 UTC
Hi Neil, thanks for your enlightenment on this.

On Fri, 2009-02-20 at 06:35 +1100, NeilBrown wrote:
> The serious problem was fixed by
> 
> commit 5ddfe9691c91a244e8d1be597b6428fcefd58103
> Author: NeilBrown <neilb@suse.de>
> Date:   Mon Oct 30 22:07:21 2006 -0800
> 
>     [PATCH] md: check bio address after mapping through partitions.
> 
>     Partitions are not limited to live within a device.  So we should range
>     check after partition mapping.
> 
> ??
> 
> Though that is more that 2 years ago... but then the bug report doesn't
> mention explicit kernel versions so it is hard to be sure what was tested.

The kernel-version the original report was against is 2.6.20. When I
returned to it a few days ago I updated the affected kernel version to
the latest reported.

I'm going to be doing two things to conclude this:

1) Try to reproduce the scenario in a KVM set-up (to make it easier to
test)

2) Rebuild a physical machine with the same Fastrak TX2000, disks, and
partition scheme as the PC in 2007 had

and then boot with various kernel releases starting with the original
2.6.20 and working forward to test against the commits mentioned in this
bug report.

If (2) can reproduce the physical disk head-banging I'll try once again
to capture the noise on microphone (just so folks can hear how alarming
it sounds).

Hopefully this will confirm that the physical I/O issue is no longer
present as well as identifying the root cause of the attempts.

TJ.
Comment 13 Linus Torvalds 2009-02-19 11:59:24 UTC

On Fri, 20 Feb 2009, NeilBrown wrote:
> >
> > Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is
> > totally harmless.
> 
> Depends on your perspective.  Warnings can worry people, people can complain
> about warnings that worry them.  Developers time can be spent explaining
> the warnings, and so taken away from development.  That could be seen
> as 'harm' :-)

Sure. But relative to perhaps eating your disk..

> > But the serious problem is not.
> 
> The serious problem was fixed by
> 
> commit 5ddfe9691c91a244e8d1be597b6428fcefd58103
> Author: NeilBrown <neilb@suse.de>
> Date:   Mon Oct 30 22:07:21 2006 -0800
> 
>     [PATCH] md: check bio address after mapping through partitions.

Goodie. So md doesn't ever remap sectors and then just insert them on some 
raw queue without going through make_request()?

> Though that is more that 2 years ago... but then the bug report doesn't
> mention explicit kernel versions so it is hard to be sure what was tested.

Yes. It apparently made it into 2.6.19-rc5, an 2.6.29 was released 
November 2006, but if it was a distro kernel it would easily not have been 
used by users until much later in 2007.

> But I think this problem should be considered fixed... unless some problem
> can be demostrated on a more recent kernel.

Ack.

		Linus
Comment 14 Anonymous Emailer 2009-02-19 12:12:30 UTC
Reply-To: jens.axboe@oracle.com

On Thu, Feb 19 2009, Linus Torvalds wrote:
> 
> 
> On Fri, 20 Feb 2009, NeilBrown wrote:
> > >
> > > Well, maybe the _symptoms_ are fixed - ie it fixed the warning that is
> > > totally harmless.
> > 
> > Depends on your perspective.  Warnings can worry people, people can
> complain
> > about warnings that worry them.  Developers time can be spent explaining
> > the warnings, and so taken away from development.  That could be seen
> > as 'harm' :-)
> 
> Sure. But relative to perhaps eating your disk..
> 
> > > But the serious problem is not.
> > 
> > The serious problem was fixed by
> > 
> > commit 5ddfe9691c91a244e8d1be597b6428fcefd58103
> > Author: NeilBrown <neilb@suse.de>
> > Date:   Mon Oct 30 22:07:21 2006 -0800
> > 
> >     [PATCH] md: check bio address after mapping through partitions.
> 
> Goodie. So md doesn't ever remap sectors and then just insert them on some 
> raw queue without going through make_request()?

With a bio, the only way is through ->make_request() for the device. I
only see generic_make_request() or submit_bio() being used, which will
check limits before diving into the lower layer ->make_request(). So
should be safe in that regard.
Comment 15 Alan 2012-05-17 14:23:44 UTC
Closing old bugs

Note You need to log in before you can comment on or make changes to this bug.