Bug 13252

Summary:	"bio too big device md0" and possible data corruption
Product:	IO/Storage	Reporter:	Tim Connors (tim.w.connors)
Component:	LVM2/DM	Assignee:	Alasdair G Kergon (agk)
Status:	CLOSED DUPLICATE
Severity:	high	CC:	agk
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	2.6.29	Subsystem:
Regression:	No	Bisected commit-id:

Description Tim Connors 2009-05-06 00:13:21 UTC

I had a RAID1 device consisting of two 1 TB eSATA drives with
/sys/block/sdc/queue/max_sectors_kb=512
and
/sys/block/sdc/queue/max_hw_sectors_kb=32767
LVM sits on top of the md0 device.

I removed one device and, because of a bug (http://marc.info/?l=linux-ide&m=124153367903104&w=2), had to add it back as a USB device instead.
Over USB, both max_hw_sectors_kb and max_sectors_kb become 120.

Thereafter, I get a whole lot of:
[55125.228693] bio too big device md0 (248 > 240)
although the resync and subsequent usage go ahead without a glitch.
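The numbers in that message are 512-byte sectors: a 120 KB limit is 240 sectors, so a 248-sector (124 KB) bio no longer fits. The Python sketch below shows only that arithmetic. It assumes that md0's limit becomes the minimum of its members' limits once the USB disk joins, and that the oversized bio was presumably sized by a layer above md0 (here LVM) that still assumed the earlier, larger limit. The function names are hypothetical illustrations, not kernel code.

```python
from typing import Iterable

SECTOR_BYTES = 512  # the "bio too big" message reports sizes in 512-byte sectors


def kb_to_sectors(kb: int) -> int:
    """Convert a sysfs queue limit in KB (e.g. max_hw_sectors_kb) to 512-byte sectors."""
    return kb * 1024 // SECTOR_BYTES


def stacked_limit(member_limits_kb: Iterable[int]) -> int:
    """Assumption: a stacked device such as md0 is limited by its most restrictive member."""
    return min(kb_to_sectors(kb) for kb in member_limits_kb)


def bio_fits(bio_sectors: int, limit_sectors: int) -> bool:
    """True if a bio of this size is within the device's limit."""
    return bio_sectors <= limit_sectors


if __name__ == "__main__":
    # eSATA member (max_hw_sectors_kb=32767) paired with the USB member (120)
    limit = stacked_limit([32767, 120])
    print(limit)                # 240
    print(bio_fits(248, limit))  # False: matches "(248 > 240)"
```

With both eSATA members the same calculation gives 65534 sectors, so 248-sector bios were never a problem until the USB disk lowered the limit.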

This is a long-standing bug that, according to reports such as https://lists.ubuntu.com/archives/kernel-bugs/2009-January/thread.html#46929, eats data. I lost some data when I first set up RAID on these disks a month ago without knowing the cause at the time, so I can easily imagine I hit the same circumstances then too.

The bitmap usage had been hovering at about 60/233 572KB pages for quite some time, even though not much writing was happening on the disk. I initially took that to mean that some data wasn't being written out to the USB disk, but after some time it dropped to 0. Since I no longer trust the integrity of the RAID device, I have removed the USB device from the array and will probably zero its superblock, so that it has to do a full resync rather than relying on the bitmap, which may not reflect that parts of the USB drive weren't written properly. Is that a possibility?

Comment 1 Alasdair G Kergon 2009-10-18 18:42:33 UTC


*** This bug has been marked as a duplicate of bug 9401 ***