Bug 13252

Summary: "bio too big device md0" and possible data corruption
Product: IO/Storage
Reporter: Tim Connors (tim.w.connors)
Component: LVM2/DM
Assignee: Alasdair G Kergon (agk)
Status: CLOSED DUPLICATE
Severity: high
CC: agk
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29
Subsystem:
Regression: No
Bisected commit-id:

Description Tim Connors 2009-05-06 00:13:21 UTC
I had a RAID1 device consisting of two 1TB eSATA drives with
/sys/block/sdc/queue/max_sectors_kb=512
and
/sys/block/sdc/queue/max_hw_sectors_kb=32767
I then have LVM on top of the md0 device.

I removed one device and, because of a bug (http://marc.info/?l=linux-ide&m=124153367903104&w=2), had to add it back as a USB device instead.
There, both max_hw_sectors_kb and max_sectors_kb became 120.
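
For reference, a minimal Python sketch of how the two queue limits quoted above could be read for each member disk and compared. The function names and the sysfs_root parameter are hypothetical and exist only for this illustration; the sysfs paths are the ones given in this report.

```python
from pathlib import Path

# The two per-device queue attributes quoted in this report (values in KB).
QUEUE_ATTRS = ("max_sectors_kb", "max_hw_sectors_kb")


def read_queue_limits(devices, sysfs_root="/sys/block"):
    """Return {device: {attribute: value_in_kb}} for each named block device.

    `sysfs_root` is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real /sys/block.
    """
    limits = {}
    for dev in devices:
        queue_dir = Path(sysfs_root) / dev / "queue"
        limits[dev] = {
            attr: int((queue_dir / attr).read_text().strip())
            for attr in QUEUE_ATTRS
        }
    return limits


def smallest_max_sectors_kb(limits):
    """Return the smallest max_sectors_kb among the given devices."""
    return min(values["max_sectors_kb"] for values in limits.values())
```

Calling read_queue_limits(["sdc"]) on the original setup would be expected to return the 512 and 32767 values listed earlier. The name of the USB-attached disk is not given in this report, so it would have to be filled in by hand.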

Thereafter, I get a whole lot of:
[55125.228693] bio too big device md0 (248 > 240)
although the resync and subsequent usage goes ahead without a glitch.
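
The numbers in that message can be related to the sysfs values with a little arithmetic. The sketch below assumes the two figures are counts of 512-byte sectors, which is consistent with the limit of 240 matching the 120 KB reported for the USB device. The function name and the regular expression are hypothetical helpers, not part of any kernel tool.

```python
import re

SECTOR_BYTES = 512  # assumption: the message counts 512-byte sectors

# Matches lines such as: "[55125.228693] bio too big device md0 (248 > 240)"
BIO_TOO_BIG = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")


def parse_bio_too_big(line):
    """Extract the device and both sizes (converted to KB) from a log line.

    Returns None when the line is not a "bio too big" message.
    """
    match = BIO_TOO_BIG.search(line)
    if match is None:
        return None
    device = match.group(1)
    requested_sectors = int(match.group(2))
    limit_sectors = int(match.group(3))
    return {
        "device": device,
        "requested_kb": requested_sectors * SECTOR_BYTES / 1024,
        "limit_kb": limit_sectors * SECTOR_BYTES / 1024,
    }


info = parse_bio_too_big("[55125.228693] bio too big device md0 (248 > 240)")
print(info)
```

Under that assumption, the rejected request was 124 KB against a 120 KB limit, i.e. 4 KB over what the USB-attached member accepts.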

This is a long-standing bug that, according to reports such as https://lists.ubuntu.com/archives/kernel-bugs/2009-January/thread.html#46929, eats data.  Since I lost some data when I was first dealing with RAID on these disks a month ago, but didn't know at the time what caused it, I can easily imagine I ran into the same circumstances then too.

The bitmap usage had been hovering at about 60/233 572KB pages for quite some time, even on a disk with not much writing happening at the time.  I initially took that to mean that it wasn't writing out some data to the USB disk, but then after some time, it dropped to 0.  Since I no longer trust the integrity of the RAID device, I have removed the USB device from the array, and will probably zero its superblock so that it has to do a full resync rather than using the bitmap, which may not be aware that parts of the USB drive weren't written properly.  Is that a possibility?

Comment 1 Alasdair G Kergon 2009-10-18 18:42:33 UTC

*** This bug has been marked as a duplicate of bug 9401 ***