Bug 7805

Summary: files cannot be read off mounted DVDs, because they are considered to be beyond the end of the device
Product: IO/Storage Reporter: Vasilis Vasaitis (v.vasaitis)
Component: Serial ATA Assignee: Tejun Heo (htejun)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, alan, axboe, bunk, cub.uanic, gurdasani, htejun
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.20-rc3 Subsystem:
Regression: --- Bisected commit-id:

Description Vasilis Vasaitis 2007-01-10 16:06:02 UTC
Most recent kernel where this bug did *NOT* occur:
Distribution: Debian unstable
Hardware Environment:
Asus M2N-E (nVidia nForce 570 Ultra) motherboard
Plextor PX-712SA serial ATA DVD writer
Software Environment:
Problem Description:

  When I mount a DVD, sometimes I am unable to read some files from it, getting
errors like the following:

attempt to access beyond end of device
sr0: rw=0, want=5373636, limit=2097151
Buffer I/O error on device sr0, logical block 1343408

  It seems that the kernel gets the DVD size wrong, because if I try to take an
image using dd, the result is a much smaller file compared to the contents of
the DVD. On the other hand, if I use readom (the Debian cdrkit equivalent to
cdrecord's readcd) to take the image, which seems to bypass the kernel and send
commands directly to the drive, the result is of the correct size, and if I
mount the image with loopback I can read all files just fine.
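
The numbers in the error lines above can be decoded with a little arithmetic. The following is a minimal sketch, assuming that `want` and `limit` are counted in 512-byte sectors (as the block layer's end-of-device check normally does); the helper name is made up for illustration.

```python
SECTOR_SIZE = 512  # assumption: want/limit are in 512-byte sectors


def sectors_to_bytes(sectors: int) -> int:
    """Convert a 512-byte sector count to a byte count."""
    return sectors * SECTOR_SIZE


# Values copied from the first error report in the description.
want = 5373636           # end of the rejected request, in sectors
limit = 2097151          # size the kernel believes the device has, in sectors
logical_block = 1343408  # block named in the "Buffer I/O error" line

limit_bytes = sectors_to_bytes(limit)
want_bytes = sectors_to_bytes(want)

# The request ends right after the failing logical block, so the ratio
# gives the number of sectors per logical block implied by the log.
sectors_per_block = want // (logical_block + 1)

print("kernel's idea of the device size:", limit_bytes, "bytes")
print("offset the read needed to reach: ", want_bytes, "bytes")
print("implied logical block size:      ", sectors_per_block * SECTOR_SIZE, "bytes")
```

Under that assumption the kernel treats the disc as just under 1 GiB, while the file being read sits well past that point, which would also explain why dd stops early.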

Steps to reproduce:
Comment 1 Andrew Morton 2007-01-31 00:46:46 UTC
Jens, I could swear this problem cropped up a few months ago and we almost
got to the bottom of it, but didn't.  Do you recall?
Comment 2 Alan 2007-06-05 07:04:18 UTC
Does this occur if the volume is mounted in UDF format or just iso9660 ?
Comment 3 Vasilis Vasaitis 2007-06-05 16:32:28 UTC
Yes, the problem occurs with UDF too, I just tested. Not that I expected
otherwise -- as I said, it occurs even when dd'ing the raw block device (the
resulting image is smaller than it should be), so the filesystem type wasn't
very likely to affect the result.

By the way, I just performed all tests with 2.6.22-rc4, and the problem still
persists.
Comment 4 Tejun Heo 2007-06-05 21:55:43 UTC
When that occurs, please run 'dmesg' and 'fuser -v /dev/srX' and post the
result.  I saw a similar problem where udev was holding onto the device node
thus preventing revalidation of the device.
Comment 5 Vasilis Vasaitis 2007-06-06 04:11:35 UTC
What am I looking for? I just tested, fuser's output is always empty, and dmesg
only reports errors like the ones I mentioned already (attempt to access beyond
end of device). Keep in mind that I'm running Debian unstable, so userspace is
very up-to-date.
Comment 6 Tejun Heo 2007-06-06 23:27:27 UTC
Thanks.  Hmmm... so it's not udev.  Does it happen after a fresh reboot when a
specific DVD is inserted?  If so, can you send me the iso?
Comment 7 Tejun Heo 2007-06-07 00:06:18 UTC
The limit is set to 0x1FFFFF (2097151), which is the default media size, so it
seems media revalidation never occurred.  It could be that the device doesn't
report a check condition properly, making sr/cdrom skip revalidation.  Anyway,
please post the output of dmesg and describe in detail how you trigger the
error condition.
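
The limit=2097151 in the logs is 0x1FFFFF in hexadecimal. As a rough sketch, a dmesg capture can be scanned for that signature; the regular expression is based only on the line format quoted in this report, and the function name is hypothetical.

```python
import re

# 2097151, the limit value that appears in every error line in this report.
DEFAULT_LIMIT = 0x1FFFFF

# Matches lines such as "sr0: rw=0, want=5373636, limit=2097151".
LINE_RE = re.compile(
    r"^(?P<dev>\w+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)$"
)


def stale_capacity_devices(dmesg_text: str) -> list:
    """Return the devices whose reported limit equals DEFAULT_LIMIT."""
    devices = set()
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group("limit")) == DEFAULT_LIMIT:
            devices.add(match.group("dev"))
    return sorted(devices)


sample = """attempt to access beyond end of device
sr0: rw=0, want=5373636, limit=2097151
Buffer I/O error on device sr0, logical block 1343408"""

print(stale_capacity_devices(sample))
```

A device showing up here only suggests that its size was never refreshed after the disc was inserted; it does not say what prevented the refresh.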
Comment 8 Vasilis Vasaitis 2007-06-07 03:46:50 UTC
This happens straight after reboot too (first DVD inserted), and with pretty
much any DVD I try to insert, i.e. not a particular one. And as I said, the
dmesg error messages are of this form (here taken from my latest attempt):

attempt to access beyond end of device
sr0: rw=0, want=8483296, limit=2097151
printk: 2 messages suppressed.
Buffer I/O error on device sr0, logical block 2120823
attempt to access beyond end of device
sr0: rw=0, want=8483300, limit=2097151
Buffer I/O error on device sr0, logical block 2120824
attempt to access beyond end of device
sr0: rw=0, want=8483304, limit=2097151
Buffer I/O error on device sr0, logical block 2120825
attempt to access beyond end of device
sr0: rw=0, want=8483308, limit=2097151
Buffer I/O error on device sr0, logical block 2120826
attempt to access beyond end of device
sr0: rw=0, want=8483312, limit=2097151
Buffer I/O error on device sr0, logical block 2120827
attempt to access beyond end of device
sr0: rw=0, want=8483316, limit=2097151
Buffer I/O error on device sr0, logical block 2120828
attempt to access beyond end of device
sr0: rw=0, want=8483320, limit=2097151
Buffer I/O error on device sr0, logical block 2120829
attempt to access beyond end of device
sr0: rw=0, want=8483324, limit=2097151
Buffer I/O error on device sr0, logical block 2120830
attempt to access beyond end of device
sr0: rw=0, want=8483296, limit=2097151
Buffer I/O error on device sr0, logical block 2120823
attempt to access beyond end of device
sr0: rw=0, want=8483300, limit=2097151
Buffer I/O error on device sr0, logical block 2120824

The "Buffer I/O error" messages appear 8 times for consecutive logical blocks,
then 2 more times for the first 2 blocks again, then they disappear altogether
(but the same pattern continues):

attempt to access beyond end of device
sr0: rw=0, want=8493732, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493736, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493740, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493744, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493748, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493752, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493756, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493760, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493732, limit=2097151
attempt to access beyond end of device
sr0: rw=0, want=8493736, limit=2097151
...
Comment 9 Tejun Heo 2007-06-07 04:53:13 UTC
Please post the full dmesg output, including the boot log, so that we can see
which drivers are loaded and so on.
Comment 10 Vasilis Vasaitis 2007-06-08 06:20:22 UTC
Hm, the plot thickens. Since I was going to do tests with a clean reboot and
recording dmesg etc., I thought I'd try with init=/bin/bash (nothing having run,
not even udev) and single (udev running, basic setup but not much else). Well,
the problem didn't manifest in either of these cases. So unless the bug is
extremely inconsistent in its appearance, it's quite likely that there's
something started afterwards (at runlevel 3) that's causing the problem. Likely
culprits seem to be either hal or the packet writing setup stuff. I'll post more
details when I'm able to run more tests, either today or tomorrow.
Comment 11 Tejun Heo 2007-06-08 07:28:01 UTC
Something is hanging onto /dev/sr0.  udev creates a temporary device node before
the actual device name is determined.  That may be why fuser doesn't show the
user.  You ran fuser as root, right?
Comment 12 Vasilis Vasaitis 2007-06-08 11:15:55 UTC
OK, it's the packet writing stuff (CONFIG_CDROM_PKTCDVD). This whole bug only 
manifests after my init scripts have run:

pktsetup 0 /dev/scd0

which produces the following output on dmesg:

pktcdvd: writer pktcdvd0 mapped to sr0

Gah, this is so annoying. Have I been wasting this subsystem's maintainers' 
time then? Sorry about that. If it's Someone Else's Problem (TM), can you 
recategorize / reassign? Thanks.
Comment 13 Tejun Heo 2007-06-08 22:26:44 UTC
My best bet is the PKTCDVD maintainer.  Hmmm... he doesn't have a bugzilla
account.  I'll ping him via email.
Comment 14 Tejun Heo 2007-07-01 23:20:11 UTC
Okay, pinging again.
Comment 15 Amit Gurdasani 2008-05-22 14:03:38 UTC
I continue to see this with 2.6.25-2-amd64 as packaged by Debian (block device size being severely limited [limit=2097151], and dd-ing off the optical media resulting in a truncated image). Turning off packet writing works around the issue.

(Other things that _might_ be attributable to packet writing are the occasional hard-lock only when burning media, as well as a reluctance of the drive to eject. Said drive works perfectly in another OS.)
Comment 16 Thomas Meyer 2008-05-25 13:19:44 UTC
The cause of this bug is described in commit 7b3d9545f9ac8b31528dd2d6d8ec8d19922917b8 (Revert "scsi: revert "[SCSI] Get rid of scsi_cmnd->done"")

Mr. Torvalds suggests:
"
    The proper fix for that is probably to just do something like
    
        bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9;
    
    in fs/block_dev.c:do_open() even for the cases where we're not the
    original opener (but *not* call bd_set_size(), since that will also
    change the block size of the device).
"
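
The effect of that suggestion can be illustrated with a toy model. This is not kernel code: the classes, the `refresh_on_every_open` flag and the capacity value of the inserted disc are all invented for illustration, and the model assumes that the size is only recorded by the first opener while something (here, the packet writing setup) keeps the device open.

```python
SECTOR_SHIFT = 9  # capacity is counted in 512-byte sectors, hence "<< 9"


class ToyDisk:
    """Stands in for the disk; `capacity` plays the role of get_capacity()."""

    def __init__(self, capacity_sectors):
        self.capacity = capacity_sectors


class ToyBlockDevice:
    """Stands in for the block device whose i_size bounds all reads."""

    def __init__(self, disk, refresh_on_every_open):
        self.disk = disk
        self.refresh_on_every_open = refresh_on_every_open
        self.openers = 0
        self.i_size = 0  # size in bytes

    def open(self):
        # Assumed behaviour behind the bug: only the first opener records
        # the size. The suggested fix refreshes it for later openers too.
        if self.openers == 0 or self.refresh_on_every_open:
            self.i_size = self.disk.capacity << SECTOR_SHIFT
        self.openers += 1

    def close(self):
        self.openers -= 1


def limit_seen_after_media_change(refresh_on_every_open):
    disk = ToyDisk(0x1FFFFF)          # placeholder size before any disc is read
    bdev = ToyBlockDevice(disk, refresh_on_every_open)
    bdev.open()                       # long-lived holder, never closes
    disk.capacity = 9000000           # hypothetical size of the inserted DVD
    bdev.open()                       # a later opener, e.g. mount or dd
    return bdev.i_size >> SECTOR_SHIFT


print("without refresh:", limit_seen_after_media_change(False))
print("with refresh:   ", limit_seen_after_media_change(True))
```

In this model the later opener keeps seeing the stale placeholder size unless the size is refreshed on every open, which matches the limit=2097151 symptom and the observation that turning off packet writing works around it.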