Most recent kernel where this bug did *NOT* occur:
Distribution: Debian unstable
Hardware Environment: Asus M2N-E (nVidia nForce 570 Ultra) motherboard
Plextor PX-712SA serial ATA DVD writer
Software Environment:
Problem Description:
When I mount a DVD, sometimes I am unable to read some files from it, getting errors like the following:

attempt to access beyond end of device
sr0: rw=0, want=5373636, limit=2097151
Buffer I/O error on device sr0, logical block 1343408

It seems that the kernel gets the DVD size wrong, because if I try to take an image using dd, the result is a much smaller file compared to the contents of the DVD. On the other hand, if I use readom (the Debian cdrkit equivalent to cdrecord's readcd) to take the image, which seems to bypass the kernel and send commands directly to the drive, the result is of the correct size, and if I mount the image with loopback I can read all files just fine.

Steps to reproduce:
Jens, I could swear this problem cropped up a few months ago and we almost got to the bottom of it, but didn't. Do you recall?
Does this occur if the volume is mounted in UDF format or just iso9660 ?
Yes, the problem occurs with UDF too, I just tested. Not that I expected otherwise -- as I said, it occurs even when dd'ing the raw block device (the resulting image is smaller than it should be), so the filesystem type wasn't very likely to affect the result. By the way, I just performed all tests with 2.6.22-rc4, and the problem still persists.
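A quick way to see the size that a plain reader such as dd will run into, without copying the whole disc, is to open the block device and seek to its end. The following is a minimal Python sketch, assuming that seeking to the end of a block device on Linux returns the size the block layer currently holds for it; the helper name `size_via_seek` is illustrative, and `/dev/sr0` is only the device named in this report.

```python
import os


def size_via_seek(path):
    """Return the size in bytes found by seeking to the end of `path`.

    Works for regular files and, on Linux, for block devices such as
    /dev/sr0 (assumption: the result reflects the size the kernel
    currently uses for bounds checks on that device).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Needs read permission on the device and a disc in the drive.
    print(size_via_seek("/dev/sr0"))
```

If the number printed is far smaller than the size of the image produced by readom, that would be consistent with the truncated dd image described above.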
When that occurs, please run 'dmesg' and 'fuser -v /dev/srX' and post the result. I saw a similar problem where udev was holding onto the device node thus preventing revalidation of the device.
What am I looking for? I just tested, fuser's output is always empty, and dmesg only reports errors like the ones I mentioned already (attempt to access beyond end of device). Keep in mind that I'm running Debian unstable, so userspace is very up-to-date.
Thanks. Hmmm... so it's not udev. Does it happen after a fresh reboot if the specific DVD is inserted? If so, can you send me the iso?
The limit is set to 0x1FFFFF (2097151), which is the default media size, so it seems media revalidation never occurred. It could be that the device doesn't report check condition properly, so that sr/cdrom skips revalidation. Anyway, please report the result of dmesg and describe in detail how you can trigger the error condition.
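The arithmetic behind that observation can be checked directly. This is a small Python sketch, assuming the `want` and `limit` values in the kernel message are counted in 512-byte sectors; the constant and function names are illustrative.

```python
SECTOR_BYTES = 512        # assumption: want/limit are 512-byte sectors
LIMIT_SECTORS = 2097151   # "limit=" value from the reported messages
WANT_SECTORS = 5373636    # "want=" value from the first report


def limit_in_bytes(limit_sectors=LIMIT_SECTORS):
    """Convert a sector limit into bytes."""
    return limit_sectors * SECTOR_BYTES


print(hex(LIMIT_SECTORS))            # 0x1fffff
print(limit_in_bytes())              # 1073741312, just under 1 GiB
print(WANT_SECTORS > LIMIT_SECTORS)  # True: the read lies past the limit
```

A limit of just under 1 GiB is far below the capacity of a DVD, which matches the symptom of a truncated image.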
This happens straight after reboot too (first DVD inserted), and with pretty much any DVD I try to insert, i.e. not a particular one. And as I said, the dmesg error messages are of this form (here taken from my latest attempt):

attempt to access beyond end of device
sr0: rw=0, want=8483296, limit=2097151
printk: 2 messages suppressed.
Buffer I/O error on device sr0, logical block 2120823
attempt to access beyond end of device
sr0: rw=0, want=8483300, limit=2097151
Buffer I/O error on device sr0, logical block 2120824
attempt to access beyond end of device
sr0: rw=0, want=8483304, limit=2097151
Buffer I/O error on device sr0, logical block 2120825
attempt to access beyond end of device
sr0: rw=0, want=8483308, limit=2097151
Buffer I/O error on device sr0, logical block 2120826
attempt to access beyond end of device
sr0: rw=0, want=8483312, limit=2097151
Buffer I/O error on device sr0, logical block 2120827
attempt to access beyond end of device
sr0: rw=0, want=8483316, limit=2097151
Buffer I/O error on device sr0, logical block 2120828
attempt to access beyond end of device
sr0: rw=0, want=8483320, limit=2097151
Buffer I/O error on device sr0, logical block 2120829
attempt to access beyond end of device
sr0: rw=0, want=8483324, limit=2097151
Buffer I/O error on device sr0, logical block 2120830
attempt to access beyond end of device
sr0: rw=0, want=8483296, limit=2097151
Buffer I/O error on device sr0, logical block 2120823
attempt to access beyond end of device
sr0: rw=0, want=8483300, limit=2097151
Buffer I/O error on device sr0, logical block 2120824

The "Buffer I/O error" messages appear 8 times for consecutive logical blocks, then 2 more times for the first 2 blocks again, then they disappear altogether (but the same pattern continues):

attempt to access beyond end of device
sr0: rw=0, want=8493732, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493736, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493740, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493744, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493748, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493752, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493756, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493760, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493732, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493736, limit=2097151
...
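The numbers in these messages follow a regular pattern that is easy to verify mechanically. The Python sketch below pulls the `want` and `limit` values out of log text and relates `want` to the reported logical block. It assumes 512-byte sectors and 2048-byte logical blocks, which is inferred from the figures in this report rather than stated anywhere in it; the function names and the regular expression are illustrative.

```python
import re

# Assumption inferred from the log: one logical block spans 4 sectors.
SECTORS_PER_BLOCK = 2048 // 512

ACCESS_RE = re.compile(r"sr0: rw=\d+, want=(\d+), limit=(\d+)")


def parse_access_errors(log_text):
    """Return a list of (want, limit) sector pairs found in the log text."""
    return [(int(w), int(l)) for w, l in ACCESS_RE.findall(log_text)]


def block_for_want(want):
    """Logical block whose final sector ends at `want`.

    Assumes each failing request covers exactly one logical block.
    """
    return want // SECTORS_PER_BLOCK - 1


sample = (
    "attempt to access beyond end of device\n"
    "sr0: rw=0, want=8483296, limit=2097151\n"
    "Buffer I/O error on device sr0, logical block 2120823\n"
)

for want, limit in parse_access_errors(sample):
    print(want, limit, block_for_want(want))
```

The same relation holds for the values in the original report (want=5373636, logical block 1343408), so every failing read targets a block well beyond the stale limit.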
Please post the full dmesg output, including the boot log, so that we can see which drivers are loaded, etc.
Hm, the plot thickens. Since I was going to do tests with a clean reboot and recording dmesg etc., I thought I'd try with init=/bin/bash (nothing having run, not even udev) and single (udev running, basic setup but not much else). Well, the problem didn't manifest in either of these cases. So unless the bug is extremely inconsistent in its appearance, it's quite likely that there's something started afterwards (at runlevel 3) that's causing the problem. Likely culprits seem to be either hal or the packet writing setup stuff. I'll post more details when I'm able to run more tests, either today or tomorrow.
Something is hanging onto /dev/sr0. udev creates a temporary device node before the actual device name is determined. That may be why fuser doesn't show the user. You run fuser as root, right?
OK, it's the packet writing stuff (CONFIG_CDROM_PKTCDVD). This whole bug only manifests after my init scripts have run:

pktsetup 0 /dev/scd0

which produces the following output on dmesg:

pktcdvd: writer pktcdvd0 mapped to sr0

Gah, this is so annoying. Have I been wasting this subsystem's maintainers' time then? Sorry about that. If it's Someone Else's Problem (TM), can you recategorize / reassign? Thanks.
My best bet is the PKTCDVD maintainer. Hmmm... he doesn't have a bugzilla account. I'll ping him via email.
Okay, pinging again.
I continue to see this with 2.6.25-2-amd64 as packaged by Debian (block device size being severely limited [limit=2097151], and dd-ing off the optical media resulting in a truncated image). Turning off packet writing works around the issue. (Other things that _might_ be attributable to packet writing are the occasional hard-lock only when burning media, as well as a reluctance of the drive to eject. Said drive works perfectly in another OS.)
The cause of this bug is described in commit 7b3d9545f9ac8b31528dd2d6d8ec8d19922917b8 (Revert "scsi: revert "[SCSI] Get rid of scsi_cmnd->done"").

Mr. Torvalds suggests:

" The proper fix for that is probably to just do something like

bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9;

in fs/block_dev.c:do_open() even for the cases where we're not the original opener (but *not* call bd_set_size(), since that will also change the block size of the device). "
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7b3d9545f9ac8b31528dd2d6d8ec8d19922917b8
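The idea behind the quoted suggestion can be illustrated with a toy model. The Python sketch below is not kernel code: the class, its attributes, and the DVD capacity figure are all hypothetical, and the sequence of events (a first opener such as pktcdvd holding the device while the media size is still the default) is an assumption based on this thread. It only shows why refreshing the cached size on every open, rather than on the first open alone, would give later readers the real size.

```python
class ToyBlockDevice:
    """Toy model only: a cached byte size plus an opener count."""

    def __init__(self, capacity_sectors):
        self.capacity_sectors = capacity_sectors  # what the driver reports
        self.cached_size = None                   # stand-in for i_size
        self.openers = 0

    def open(self, refresh_on_every_open=False):
        # Without the fix, only the first opener sets the cached size.
        if self.openers == 0 or refresh_on_every_open:
            self.cached_size = self.capacity_sectors << 9  # sectors to bytes
        self.openers += 1
        return self.cached_size


DEFAULT_SECTORS = 0x1FFFFF   # limit seen in the error messages
REAL_SECTORS = 8_500_000     # hypothetical DVD capacity, for illustration

dev = ToyBlockDevice(DEFAULT_SECTORS)
dev.open()                           # first opener, size still the default
dev.capacity_sectors = REAL_SECTORS  # media size becomes known later

stale = dev.open()                              # later opener, no refresh
fresh = dev.open(refresh_on_every_open=True)    # later opener, with refresh
print(stale, fresh)
```

In this model the second opener is stuck with the default size for as long as the first opener keeps the device open, which mirrors the observation that turning off packet writing works around the issue.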