Bug 9370

Summary: v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device
Product: IO/Storage Reporter: Rafael J. Wysocki (rjwysocki)
Component: SCSIAssignee: Matthew Wilcox (matthew)
Status: CLOSED CODE_FIX    
Severity: normal CC: axboe, bunk, eatnumber1, jejb, matthew, mingo, protasnb, spock, thomas
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.24-rc2-409-g9418d5d Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9243    

Description Rafael J. Wysocki 2007-11-13 13:11:18 UTC
Subject         : v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device
Submitter       : Thomas Meyer <thomas@m3y3r.de>
References      : http://lkml.org/lkml/2007/11/13/250
Comment 1 Rafael J. Wysocki 2007-11-13 13:13:28 UTC
Probably caused by:

commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d
Author: Matthew Wilcox <matthew@wil.cx>
Date:   Tue Sep 25 12:42:04 2007 -0400

    [SCSI] Get rid of scsi_cmnd->done

http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6f5391c283d7fdcf24bf40786ea79061919d1e1d
Comment 2 Matthew Wilcox 2007-11-13 14:08:54 UTC
On Tue, Nov 13, 2007 at 01:13:29PM -0800, bugme-daemon@bugzilla.kernel.org wrote:
> Probably caused by:
> 
> commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d
> Author: Matthew Wilcox <matthew@wil.cx>
> Date:   Tue Sep 25 12:42:04 2007 -0400
> 
>     [SCSI] Get rid of scsi_cmnd->done
> 
>
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6f5391c283d7fdcf24bf40786ea79061919d1e1d

Since the reporter says that reverting this commit fixes the problem, it
must have caused the problem.  For the benefit of those on linux-scsi
who are just tuning into this report, the original mail is at
http://lkml.org/lkml/2007/11/13/250 and the report that reverting this
commit fixes it is a followup to it.

I've stared at the patch, trying to figure out what's wrong.  The only
thing which makes any sense to me is that a request comes in, tagged as
REQ_TYPE_FS.  Later, it becomes REQ_TYPE_BLOCK_PC.  This would have
caused us to call sr_done() before the patch, and after the patch, we
would not call sr_done().  Further eyes are welcome.

James has suggested that the problem may be that ide-scsi does just this.
Thomas, can you confirm whether or not you were using ide-scsi?
Comment 3 Thomas Meyer 2007-11-14 08:30:19 UTC
$ grep _IDE .config
# CONFIG_IDE is not set

$ udevinfo --path=/block/sr0 -a

Udevinfo starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

  looking at device '/block/sr0':
    KERNEL=="sr0"
    SUBSYSTEM=="block"
    DRIVER==""
    ATTR{dev}=="11:0"
    ATTR{range}=="1"
    ATTR{removable}=="1"
    ATTR{size}=="40944"
    ATTR{stat}=="      12       52      504     3336        0        0        0        0        0     3070     3336"
    ATTR{capability}=="19"

  looking at parent device '/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0/0:0:0:0':
    KERNELS=="0:0:0:0"
    SUBSYSTEMS=="scsi"
    DRIVERS=="sr"
    ATTRS{device_blocked}=="0"
    ATTRS{type}=="5"
    ATTRS{scsi_level}=="6"
    ATTRS{vendor}=="MATSHITA"
    ATTRS{model}=="DVD-R   UJ-857  "
    ATTRS{rev}=="HAEA"
    ATTRS{state}=="running"
    ATTRS{timeout}=="0"
    ATTRS{iocounterbits}=="32"
    ATTRS{iorequest_cnt}=="0x48"
    ATTRS{iodone_cnt}=="0x47"
    ATTRS{ioerr_cnt}=="0x0"
    ATTRS{modalias}=="scsi:t-0x05"
    ATTRS{evt_media_change}=="0"
    ATTRS{queue_depth}=="1"
    ATTRS{queue_type}=="none"

  looking at parent device '/devices/pci0000:00/0000:00:1f.1/host0/target0:0:0':
    KERNELS=="target0:0:0"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{uevent}==""

  looking at parent device '/devices/pci0000:00/0000:00:1f.1/host0':
    KERNELS=="host0"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{uevent}==""

  looking at parent device '/devices/pci0000:00/0000:00:1f.1':
    KERNELS=="0000:00:1f.1"
    SUBSYSTEMS=="pci"
    DRIVERS=="ata_piix"
    ATTRS{vendor}=="0x8086"
    ATTRS{device}=="0x27df"
    ATTRS{subsystem_vendor}=="0x8086"
    ATTRS{subsystem_device}=="0x7270"
    ATTRS{class}=="0x01018a"
    ATTRS{irq}=="18"
    ATTRS{local_cpus}=="3"
    ATTRS{modalias}=="pci:v00008086d000027DFsv00008086sd00007270bc01sc01i8a"
    ATTRS{broken_parity_status}=="0"
    ATTRS{msi_bus}==""

  looking at parent device '/devices/pci0000:00':
    KERNELS=="pci0000:00"
    SUBSYSTEMS==""
    DRIVERS==""
    ATTRS{uevent}==""
Comment 4 Thomas Meyer 2007-11-14 09:04:31 UTC
~/source/linux-2.6/drivers/block $ grep REQ_TY *
[schnipp]
pktcdvd.c:      rq->cmd_type = REQ_TYPE_BLOCK_PC;
[schnipp]

I am using the pktcdvd driver on /dev/sr0 resulting in /dev/pktcdvd/sr0
Maybe some bad interaction?
Comment 5 Ingo Molnar 2007-12-04 03:33:43 UTC
ping: requested debug info was posted. Guilty commit was identified. No response for more than 2 weeks and guilty commit is still not reverted upstream.
Comment 6 Thomas Meyer 2007-12-17 11:05:07 UTC
steps to reproduce this error:

1.) setup pktsetup mapping, similar to this:

# pktsetup -s
 0 : 253:0 -> 11:0

# ls -ld /dev/sr0
brw-rw---- 1 root cdrom 11, 0 17. Dez 18:50 /dev/sr0

# ls -ld /dev/pktcdvd/sr0
brw-rw---- 1 root cdrw 253, 0 17. Dez 18:51 /dev/pktcdvd/sr0

i.e. /dev/sr0 is mapped to /dev/pktcdvd/sr0

2.) mount dvd with filesystem:

# mount /dev/sr0 /mnt/xchange

3.) copy some file to your location of choice.
# cp /mnt/xchange/testfile1.bin /root/

4.) and there is the error!
Comment 7 Thomas Meyer 2007-12-17 11:08:08 UTC
after removing the pktcdvd setup with
# pktsetup -d 253:0

i can now mount the /dev/sr0 again and i can copy files without any problems from the dvd.

the error seems to have to do with the paket cd/dvd setup for the device!
Comment 8 Natalie Protasevich 2007-12-18 00:25:05 UTC
(Thomas,
I'm sure Matthew and James are looking into this issue. Well, a bit of holiday rush these days... They'll get to it, no worries :)

Matthew, as I understand this patch can't be simply backed out?
Do you think Thomas can try some debugging of sr_done(), based on #2? 
Comment 9 Anonymous Emailer 2007-12-18 07:10:17 UTC
Reply-To: James.Bottomley@HansenPartnership.com


On Tue, 2007-12-18 at 00:25 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> ------- Comment #8 from protasnb@gmail.com  2007-12-18 00:25 -------
> (Thomas,
> I'm sure Matthew and James are looking into this issue. Well, a bit of
> holiday
> rush these days... They'll get to it, no worries :)
> 
> Matthew, as I understand this patch can't be simply backed out?
> Do you think Thomas can try some debugging of sr_done(), based on #2? 

Erm, actually no ... this is the first I've heard of it, except as a
passing question from matthew.  It's usually safe to assume if it's not
on linux-scsi I haven't seen it.

Do you have the subject line of the email thread (which should have been
on linux-scsi but apparently isn't).  lkml.org can't be reached
currently so I'll need to look it up on marc.info.

It would also have been really helpful to have some actual details in
the bug report rather than references to mail threads.

But, if you've identified pktcdvd as the likely culprit, that's most
likely to be something Jens understands.  To summarise, the commit in
question (the remove scsi_done one) changed the way returning commands
are processed in that they need to preserve req->cmd_type across their
execution because it is used to select post processing.  The speculation
is that something in pktcdvd is violating this assumption.

James
Comment 10 Anonymous Emailer 2007-12-19 04:37:32 UTC
Reply-To: jens.axboe@oracle.com

On Tue, Dec 18 2007, James Bottomley wrote:
> 
> On Tue, 2007-12-18 at 00:25 -0800, bugme-daemon@bugzilla.kernel.org
> wrote:
> > ------- Comment #8 from protasnb@gmail.com  2007-12-18 00:25 -------
> > (Thomas,
> > I'm sure Matthew and James are looking into this issue. Well, a bit of
> holiday
> > rush these days... They'll get to it, no worries :)
> > 
> > Matthew, as I understand this patch can't be simply backed out?
> > Do you think Thomas can try some debugging of sr_done(), based on #2? 
> 
> Erm, actually no ... this is the first I've heard of it, except as a
> passing question from matthew.  It's usually safe to assume if it's not
> on linux-scsi I haven't seen it.
> 
> Do you have the subject line of the email thread (which should have been
> on linux-scsi but apparently isn't).  lkml.org can't be reached
> currently so I'll need to look it up on marc.info.
> 
> It would also have been really helpful to have some actual details in
> the bug report rather than references to mail threads.
> 
> But, if you've identified pktcdvd as the likely culprit, that's most
> likely to be something Jens understands.  To summarise, the commit in
> question (the remove scsi_done one) changed the way returning commands
> are processed in that they need to preserve req->cmd_type across their
> execution because it is used to select post processing.  The speculation
> is that something in pktcdvd is violating this assumption.

pktcdvd doesn't change the cmd_type after the request has been issued.
Where does this thread originate from?
Comment 11 Natalie Protasevich 2008-01-02 02:20:40 UTC
It looks like the bug has come from Thomas Meyer, and the mail thread translates to:

i get these errors in the kernel log while trying to copy a file from an
iso9660 file system (/dev/sr0) to my intenal hard disk. This is the
second cd/dvd that gives me this error. kernel 2.6.23 works without any
problems, so i think this is not an medium error:

ISO 9660 Extensions: Microsoft Joliet Level 3
ISOFS: changing to secondary root
attempt to access beyond end of device
sr0: rw=0, want=81220, limit=40944
Buffer I/O error on device sr0, logical block 20304
attempt to access beyond end of device
sr0: rw=0, want=81224, limit=40944
Buffer I/O error on device sr0, logical block 20305
attempt to access beyond end of device
sr0: rw=0, want=81228, limit=40944
Buffer I/O error on device sr0, logical block 20306
attempt to access beyond end of device
sr0: rw=0, want=81232, limit=40944
Buffer I/O error on device sr0, logical block 20307
attempt to access beyond end of device
sr0: rw=0, want=81236, limit=40944
Buffer I/O error on device sr0, logical block 20308
attempt to access beyond end of device
sr0: rw=0, want=81240, limit=40944
Buffer I/O error on device sr0, logical block 20309
attempt to access beyond end of device
sr0: rw=0, want=81244, limit=40944
Buffer I/O error on device sr0, logical block 20310
attempt to access beyond end of device
sr0: rw=0, want=81248, limit=40944
Buffer I/O error on device sr0, logical block 20311
attempt to access beyond end of device
sr0: rw=0, want=81220, limit=40944
Buffer I/O error on device sr0, logical block 20304
attempt to access beyond end of device
sr0: rw=0, want=81224, limit=40944
Buffer I/O error on device sr0, logical block 20305
attempt to access beyond end of device
sr0: rw=0, want=1028968, limit=40944
Buffer I/O error on device sr0, logical block 257241
attempt to access beyond end of device
sr0: rw=0, want=1028972, limit=40944
Buffer I/O error on device sr0, logical block 257242
attempt to access beyond end of device
sr0: rw=0, want=1028976, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028980, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028984, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028988, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028992, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028996, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028968, limit=40944
attempt to access beyond end of device
sr0: rw=0, want=1028972, limit=40944

does anybody have similiar problems with current linus kernel?

mfg
thomas

Further there is a commit identified that Ingo mentioned, reverting which fixes the problem:

 i now discovered (using gitk and a lot of scolling) that above commit 
> is after v2.6.23 but before v2.6.24-rc1 (so it is a possible offender)
> 
> You see i am just too lazy to bisect (and reboot 20th times or so) and 
> i just wanted to try my luck and revert the last
>  commit and see if this fixes my problem.
> 
> By the way: Reverting commit 6f5391c283d7fdcf24bf40786ea79061919d1e1d 
> makes the same cd medium readable again on v2.6.24-rc2-409-g9418d5d.
Comment 12 Adrian Bunk 2008-01-02 16:25:43 UTC
Reverted in commit ac40532ef0b8649e6f7f83859ea0de1c4ed08a19.