Subject : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device Submitter : Thomas Meyer <thomas@m3y3r.de> References : http://lkml.org/lkml/2007/11/13/250
Probably caused by: commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d Author: Matthew Wilcox <matthew@wil.cx> Date: Tue Sep 25 12:42:04 2007 -0400 [SCSI] Get rid of scsi_cmnd->done http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6f5391c283d7fdcf24bf40786ea79061919d1e1d
On Tue, Nov 13, 2007 at 01:13:29PM -0800, bugme-daemon@bugzilla.kernel.org wrote: > Probably caused by: > > commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d > Author: Matthew Wilcox <matthew@wil.cx> > Date: Tue Sep 25 12:42:04 2007 -0400 > > [SCSI] Get rid of scsi_cmnd->done > > > http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6f5391c283d7fdcf24bf40786ea79061919d1e1d Since the reporter says that reverting this commit fixes the problem, it must have caused the problem. For the benefit of those on linux-scsi who are just tuning into this report, the original mail is at http://lkml.org/lkml/2007/11/13/250 and the report that reverting this commit fixes it is a followup to it. I've stared at the patch, trying to figure out what's wrong. The only thing which makes any sense to me is that a request comes in, tagged as REQ_TYPE_FS. Later, it becomes REQ_TYPE_BLOCK_PC. This would have caused us to call sr_done() before the patch, and after the patch, we would not call sr_done(). Further eyes are welcome. James has suggested that the problem may be that ide-scsi does just this. Thomas, can you confirm whether or not you were using ide-scsi?
$ grep _IDE .config # CONFIG_IDE is not set $ udevinfo --path=/block/sr0 -a Udevinfo starts with the device specified by the devpath and then walks up the chain of parent devices. It prints for every device found, all possible attributes in the udev rules key format. A rule to match, can be composed by the attributes of the device and the attributes from one single parent device. looking at device '/block/sr0': KERNEL=="sr0" SUBSYSTEM=="block" DRIVER=="" ATTR{dev}=="11:0" ATTR{range}=="1" ATTR{removable}=="1" ATTR{size}=="40944" ATTR{stat}==" 12 52 504 3336 0 0 0 0 0 3070 3336" ATTR{capability}=="19" looking at parent device '/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0': KERNELS=="0:0:0:0" SUBSYSTEMS=="scsi" DRIVERS=="sr" ATTRS{device_blocked}=="0" ATTRS{type}=="5" ATTRS{scsi_level}=="6" ATTRS{vendor}=="MATSHITA" ATTRS{model}=="DVD-R UJ-857 " ATTRS{rev}=="HAEA" ATTRS{state}=="running" ATTRS{timeout}=="0" ATTRS{iocounterbits}=="32" ATTRS{iorequest_cnt}=="0x48" ATTRS{iodone_cnt}=="0x47" ATTRS{ioerr_cnt}=="0x0" ATTRS{modalias}=="scsi:t-0x05" ATTRS{evt_media_change}=="0" ATTRS{queue_depth}=="1" ATTRS{queue_type}=="none" looking at parent device '/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0': KERNELS=="target0:0:0" SUBSYSTEMS=="" DRIVERS=="" ATTRS{uevent}=="" looking at parent device '/devices/pci0000:00/0000:00:1f.1/host0': KERNELS=="host0" SUBSYSTEMS=="" DRIVERS=="" ATTRS{uevent}=="" looking at parent device '/devices/pci0000:00/0000:00:1f.1': KERNELS=="0000:00:1f.1" SUBSYSTEMS=="pci" DRIVERS=="ata_piix" ATTRS{vendor}=="0x8086" ATTRS{device}=="0x27df" ATTRS{subsystem_vendor}=="0x8086" ATTRS{subsystem_device}=="0x7270" ATTRS{class}=="0x01018a" ATTRS{irq}=="18" ATTRS{local_cpus}=="3" ATTRS{modalias}=="pci:v00008086d000027DFsv00008086sd00007270bc01sc01i8a" ATTRS{broken_parity_status}=="0" ATTRS{msi_bus}=="" looking at parent device '/devices/pci0000:00': KERNELS=="pci0000:00" SUBSYSTEMS=="" DRIVERS=="" ATTRS{uevent}==""
~/source/linux-2.6/drivers/block $ grep REQ_TY * [schnipp] pktcdvd.c: rq->cmd_type = REQ_TYPE_BLOCK_PC; [schnipp] I am using the pktcdvd driver on /dev/sr0 resulting in /dev/pktcdvd/sr0 Maybe some bad interaction?
ping: requested debug info was posted. Guilty commit was identified. No response for more than 2 weeks and guilty commit is still not reverted upstream.
steps to reproduce this error: 1.) setup pktsetup mapping, similar to this: # pktsetup -s 0 : 253:0 -> 11:0 # ls -ld /dev/sr0 brw-rw---- 1 root cdrom 11, 0 17. Dez 18:50 /dev/sr0 # ls -ld /dev/pktcdvd/sr0 brw-rw---- 1 root cdrw 253, 0 17. Dez 18:51 /dev/pktcdvd/sr0 i.e. /dev/sr0 is mapped to /dev/pktcdvd/sr0 2.) mount dvd with filesystem: # mount /dev/sr0 /mnt/xchange 3.) copy some file to your location of choice. # cp /mnt/xchange/testfile1.bin /root/ 4.) and there is the error!
after removing the pktcdvd setup with # pktsetup -d 253:0 i can now mount the /dev/sr0 again and i can copy files without any problems from the dvd. the error seems to have to do with the paket cd/dvd setup for the device!
(Thomas, I'm sure Matthew and James are looking into this issue. Well, a bit of holiday rush these days... They'll get to it, no worries :) Matthew, as I understand this patch can't be simply backed out? Do you think Thomas can try some debugging of sr_done(), based on #2?
Reply-To: James.Bottomley@HansenPartnership.com On Tue, 2007-12-18 at 00:25 -0800, bugme-daemon@bugzilla.kernel.org wrote: > ------- Comment #8 from protasnb@gmail.com 2007-12-18 00:25 ------- > (Thomas, > I'm sure Matthew and James are looking into this issue. Well, a bit of > holiday > rush these days... They'll get to it, no worries :) > > Matthew, as I understand this patch can't be simply backed out? > Do you think Thomas can try some debugging of sr_done(), based on #2? Erm, actually no ... this is the first I've heard of it, except as a passing question from matthew. It's usually safe to assume if it's not on linux-scsi I haven't seen it. Do you have the subject line of the email thread (which should have been on linux-scsi but apparently isn't). lkml.org can't be reached currently so I'll need to look it up on marc.info. It would also have been really helpful to have some actual details in the bug report rather than references to mail threads. But, if you've identified pktcdvd as the likely culprit, that's most likely to be something Jens understands. To summarise, the commit in question (the remove scsi_done one) changed the way returning commands are processed in that they need to preserve req->cmd_type across their execution because it is used to select post processing. The speculation is that something in pktcdvd is violating this assumption. James
Reply-To: jens.axboe@oracle.com On Tue, Dec 18 2007, James Bottomley wrote: > > On Tue, 2007-12-18 at 00:25 -0800, bugme-daemon@bugzilla.kernel.org > wrote: > > ------- Comment #8 from protasnb@gmail.com 2007-12-18 00:25 ------- > > (Thomas, > > I'm sure Matthew and James are looking into this issue. Well, a bit of > holiday > > rush these days... They'll get to it, no worries :) > > > > Matthew, as I understand this patch can't be simply backed out? > > Do you think Thomas can try some debugging of sr_done(), based on #2? > > Erm, actually no ... this is the first I've heard of it, except as a > passing question from matthew. It's usually safe to assume if it's not > on linux-scsi I haven't seen it. > > Do you have the subject line of the email thread (which should have been > on linux-scsi but apparently isn't). lkml.org can't be reached > currently so I'll need to look it up on marc.info. > > It would also have been really helpful to have some actual details in > the bug report rather than references to mail threads. > > But, if you've identified pktcdvd as the likely culprit, that's most > likely to be something Jens understands. To summarise, the commit in > question (the remove scsi_done one) changed the way returning commands > are processed in that they need to preserve req->cmd_type across their > execution because it is used to select post processing. The speculation > is that something in pktcdvd is violating this assumption. pktcdvd doesn't change the cmd_type after the request has been issued. Where does this thread originate from?
It looks like the bug has come from Thomas Meyer, and the mail thread translates to: i get these errors in the kernel log while trying to copy a file from an iso9660 file system (/dev/sr0) to my intenal hard disk. This is the second cd/dvd that gives me this error. kernel 2.6.23 works without any problems, so i think this is not an medium error: ISO 9660 Extensions: Microsoft Joliet Level 3 ISOFS: changing to secondary root attempt to access beyond end of device sr0: rw=0, want=81220, limit=40944 Buffer I/O error on device sr0, logical block 20304 attempt to access beyond end of device sr0: rw=0, want=81224, limit=40944 Buffer I/O error on device sr0, logical block 20305 attempt to access beyond end of device sr0: rw=0, want=81228, limit=40944 Buffer I/O error on device sr0, logical block 20306 attempt to access beyond end of device sr0: rw=0, want=81232, limit=40944 Buffer I/O error on device sr0, logical block 20307 attempt to access beyond end of device sr0: rw=0, want=81236, limit=40944 Buffer I/O error on device sr0, logical block 20308 attempt to access beyond end of device sr0: rw=0, want=81240, limit=40944 Buffer I/O error on device sr0, logical block 20309 attempt to access beyond end of device sr0: rw=0, want=81244, limit=40944 Buffer I/O error on device sr0, logical block 20310 attempt to access beyond end of device sr0: rw=0, want=81248, limit=40944 Buffer I/O error on device sr0, logical block 20311 attempt to access beyond end of device sr0: rw=0, want=81220, limit=40944 Buffer I/O error on device sr0, logical block 20304 attempt to access beyond end of device sr0: rw=0, want=81224, limit=40944 Buffer I/O error on device sr0, logical block 20305 attempt to access beyond end of device sr0: rw=0, want=1028968, limit=40944 Buffer I/O error on device sr0, logical block 257241 attempt to access beyond end of device sr0: rw=0, want=1028972, limit=40944 Buffer I/O error on device sr0, logical block 257242 attempt to access beyond end of device sr0: rw=0, want=1028976, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028980, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028984, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028988, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028992, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028996, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028968, limit=40944 attempt to access beyond end of device sr0: rw=0, want=1028972, limit=40944 does anybody have similiar problems with current linus kernel? mfg thomas Further there is a commit identified that Ingo mentioned, reverting which fixes the problem: i now discovered (using gitk and a lot of scolling) that above commit > is after v2.6.23 but before v2.6.24-rc1 (so it is a possible offender) > > You see i am just too lazy to bisect (and reboot 20th times or so) and > i just wanted to try my luck and revert the last > commit and see if this fixes my problem. > > By the way: Reverting commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d > makes the same cd medium readable again on v2.6.24-rc2-409-g9418d5d.
Reverted in commit ac40532ef0b8649e6f7f83859ea0de1c4ed08a19.