Most recent kernel where this bug did *NOT* occur: Distribution: Hardware Environment: Software Environment: Problem Description: Steps to reproduce:
Created attachment 10881 [details] dmesg from 20.03.2007
Created attachment 10882 [details] New dmesg output with the same 2.6.20.3 kernel with applied Albert's patches
Hi Yarema, Thanks a lot. But the part before [ 39.188609] ata4: protocol 5 task_state 1 [ 39.188614] ata4: host_stat 0x0 is truncated... Could you please 1. reboot the system into the kernel with the IDE driver (the kernel that has all drives working) 2. Run "cdrecord -scanbus", check which one is the AOpen 56X/AKH drive. 3. Run something like "cdrecord dev=3,0,0 -inq -vvVV" (Please change dev=3,0,0 to the real value returned by "cdrecord -scanbus" for the AOpen drive.) 4. Please attach the output of "cdrecord dev=?,0,0 -inq -vvVV" to the bugzilla. Thanks.
cdrecord -scanbus give me no output, but here is the output from "cdrecord --devices" : Beginning native device scan. This may take a while if devices are busy... Error trying to open /dev/hde exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hde exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hde exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hde exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hde exclusively (Device or resource busy)... giving up. Error trying to open /dev/hdf exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hdf exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hdf exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hdf exclusively (Device or resource busy)... retrying in 1 second. Error trying to open /dev/hdf exclusively (Device or resource busy)... giving up. wodim: Warning: controller returns wrong size for CD capabilities page. wodim: Overview of accessible drives (2 found) : ---------------------------------------------------------------------- 0 dev='/dev/hda' rwrw-- : 'HP' 'DVD Writer 200j' 1 dev='/dev/hdh' rwrw-- : 'E-IDE' 'CD-ROM 56X/AKH' ---------------------------------------------------------------------- Output from "cdrecord dev=/dev/hdh -inq -vvVV" is in attachment.
Created attachment 10887 [details] otput from "cdrecord dev=/dev/hdh -inq -vvVV"
The INQUIRY looks good. The drive said it's 36 bytes. I don't know why the SCSI subsystem ever sent INQUIRY of 254 bytes to the AOpen drive... ========= Executing 'inquiry' command on Bus 3 Target 1, Lun 0 timeout 40s CDB: 12 00 00 00 24 00 cmd finished after 0.000s timeout 40s Got 36 (0x24), expecting 36 (0x24) bytes of data. Received Data: 05 80 00 01 1F 00 00 00 45 2D 49 44 45 20 20 20 43 44 2D 52 4F 4D 20 35 36 58 2F 41 4B 48 20 20 41 38 30 20 Inquiry Data : 05 80 00 01 1F 00 00 00 45 2D 49 44 45 20 20 20 43 44 2D 52 4F 4D 20 35 36 58 2F 41 4B 48 20 20 41 38 30 20 Inquiry Data : ........E-IDE CD-ROM 56X/AKH A80
Created attachment 10888 [details] output from "cdrecord dev=/dev/hda -inq -vvVV"
For the HP drive, the INQUIRY looks good, too. Notice that the HP driver has longer INQUIRY (74 bytes) than the AOpen drive (32 bytes) =========== Executing 'inquiry' command on Bus 0 Target 0, Lun 0 timeout 40s CDB: 12 00 00 00 24 00 cmd finished after 0.001s timeout 40s Got 36 (0x24), expecting 36 (0x24) bytes of data. Received Data: 05 80 00 32 45 00 00 00 48 50 20 20 20 20 20 20 44 56 44 20 57 72 69 74 65 72 20 32 30 30 6A 20 31 2E 33 36 Inquiry Data : 05 80 00 32 45 00 00 00 48 50 20 20 20 20 20 20 44 56 44 20 57 72 69 74 65 72 20 32 30 30 6A 20 31 2E 33 36 Executing 'inquiry' command on Bus 0 Target 0, Lun 0 timeout 40s CDB: 12 00 00 00 4A 00 cmd finished after 0.001s timeout 40s Got 74 (0x4A), expecting 74 (0x4A) bytes of data. Received Data: 05 80 00 32 45 00 00 00 48 50 20 20 20 20 20 20 44 56 44 20 57 72 69 74 65 72 20 32 30 30 6A 20 31 2E 33 36 30 32 30 37 32 34 00 00 00 00 43 4E 32 42 36 33 32 33 37 4A FF FF 52 49 43 44 41 33 33 31 31 30 34 38 34 20 00 00 Inquiry Data : 05 80 00 32 45 00 00 00 48 50 20 20 20 20 20 20 44 56 44 20 57 72 69 74 65 72 20 32 30 30 6A 20 31 2E 33 36 30 32 30 37 32 34 00 00 00 00 43 4E 32 42 36 33 32 33 37 4A FF FF 52 49 43 44 41 33 33 31 31 30 34 38 34 20 00 00 Inquiry Data : ...2E...HP DVD Writer 200j 1.36020724....CN2B63237J..RICDA33110484 ..
So, the AOpen 56X/AKH looks fine. The question now is, why the SCSI midlayer send INQUIRY of "0xfe" length to the AOpen drive, like below? ========= [ 99.289088] CDB (4:0,1,0) 12 01 00 00 fe 00 00 00 00 [ 99.289144] ata4: protocol 5 task_state 4 (dev_stat 0x58) [ 99.289592] ata4: protocol 5 task_state 1 [ 99.289596] ata4: host_stat 0x4 [ 99.289601] ata4: protocol 5 task_state 1 (dev_stat 0x51) [ 99.289604] ata4: protocol 5 task_state 2 (dev_stat 0x51) [ 99.289607] ata4: protocol 5 task_state 3 (dev_stat 0x51) [ 99.289684] ata4: protocol 5 task_state 4 (dev_stat 0x58) [ 102.023103] ata4: protocol 5 task_state 1 [ 102.023114] ata4: host_stat 0x0 [ 104.367468] ata4: protocol 5 task_state 1 [ 104.367479] ata4: host_stat 0x0 [ 104.368073] ata4: protocol 5 task_state 1 [ 104.368077] ata4: host_stat 0x0 [ 104.368681] ata4: protocol 5 task_state 1 [ 104.368685] ata4: host_stat 0x0
Albert, here are my output of dmesg and messages before and after experiment with "rmmod ata_piix" and "modprobe ata_piix" (see below)
Created attachment 10900 [details] dmesg 21.03.2007 18.38 (before experiment)
Created attachment 10901 [details] messages 21.03.2007 18.38 (before experiment)
Created attachment 10902 [details] messages 21.03.2007 18.53 (after experiment)
Created attachment 10903 [details] dmesg 21.03.2007 18.55 (after experiment)
From the trace, SCSI mid-layer only issues one good INQUIRY at the beginning. I've also checked the scsi_scan.c source code, it also looks good. So, it looks like the bad INQUIRY command "CDB (4:0,1,0) 12 01 00 00 fe 00 00 00 00" is coming from the user space, not the SCSI mid-layer. Two problems together caused this bug: 1. Ubuntu Linux issues an incorrect INQUIRY command to the drive. (Other distros seem to have the INQUIRY correct.) 2. The incorrect INQUIRY happens to cause the AOpen drive frozen. (The HP drive is immune from the incorrect INQUIRY command.)
We have two possible solutions here: 1. Fix Ubuntu, such that the incorrect INQUIRY is fixed. 2. Patch kernel, such that the AOpen drives are blacklisted. Each INQUIRY is inspected for the blacklisted drives. If the INQUIRY looks wrong, the INQUIRY is rejected. I guess 1. is the preferred solution. Will ask Jeff and others advice.
Thank you very much!!! I posted this bug to kubuntu forum and sent it also to ubuntu-developers list, we'll see what developers say about it.
(Adding Alan to cc list.) Hi Yarema, It seems the INQUIRY comes from the "scsi_id" utility. Could you please attach "/etc/scsi_id.config" to bugzilla, thanks.
Hm, but I don't have it, this file I have on my ubuntu box in /usr/share/doc/udev/examples/scsi_id.config, so my box doesn't use it.
Hmm, Could you please try 1. mv /sbin/scsi_id /sbin/scsi_id.ori 2. rmmod ata_piix; modprobe ata_piix and see if the "qc time out" problem persists?
Well, in ubuntu it is /lib/udev/scsi_id, I changed it to /lib/udev/scsi_id.orig, and after "rmmod ata_piix; modprobe ata_piix" I've got this: [ 501.777352] ata1.00: disabled [ 501.840307] ata_piix 0000:00:1f.1: version 2.10ac1 [ 501.840772] PCI: Setting latency timer of device 0000:00:1f.1 to 64 [ 501.841124] ata5: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001f000 irq 14 [ 501.860461] ata6: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001f008 irq 15 [ 501.860926] scsi4 : ata_piix [ 502.177256] ata5.00: ATAPI, max UDMA/33 [ 502.341037] ata5.00: configured for UDMA/33 [ 502.341165] scsi5 : ata_piix [ 502.498977] scsi 4:0:0:0: CD-ROM HP DVD Writer 200j 1.36 PQ: 0 ANSI: 5 [ 502.503937] sr0: scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray [ 502.504499] sr 4:0:0:0: Attached scsi CD-ROM sr0 [ 502.505038] sr 4:0:0:0: Attached scsi generic sg0 type 5 Note, there nothing about Aopen at all! Aopen cdrom attached to Promise adapter as it was before in the previous tests.
> Note, there nothing about Aopen at all! Aopen cdrom attached to > Promise adapter as it was before in the previous tests. Ah, sorry I forgot the AOpen is still connected to Promise. But we cannot rmmod pata_pdc2027x since it's where the root hard drive connected to. So, may need a reboot when convient.
Created attachment 10921 [details] dmesg scsi_id removed I rebooted comp with scsi_id removed. It looks like nothing changed. I attached dmesg with output. Kernel is without your patches. BTW, standard scsi_id.config in ubuntu has only this: # # scsi_id configuration # # lower or upper case has no affect on the left side. Quotes (") are # required for spaces in values. Model is the same as the SCSI # INQUIRY product identification field. Per the SCSI INQUIRY, the vendor # is limited to 8 bytes, model to 16 bytes. # # The first matching line found is used. Short matches match longer ones, # if you do not want such a match space fill the extra bytes. If no model # is specified, only the vendor string need match. # # options=<any scsi_id command line options> # vendor=string[,model=string],options=<per-device scsi_id options> # some libata drives require vpd page 0x80 vendor="ATA",options=-p 0x80 Yarema
Ok, maybe the device id probe is done by hald on your system, not scsi_id? (So, we can mv scsi_id.ori back to scsi_id now.) I've no idea about how to prevent hald from sending INQUIRY... I will try to make a debug patch to filter out the INQUIRY with EVPD first for your test.
Hi Albert, I think you are right. Every time I do modprobe ata_piix I see (ps xuaw) that hald polling /dev/scd1 (cdrom drive): 107 4868 0.0 0.1 2096 900 ? S 10:37 0:00 hald-addon-storage: polling /dev/scd1 After that I can use /dev/scd1. Yarema
Created attachment 10922 [details] Debug patch to filter out INQUIRY with EVPD = 1 Hi Yarema, Please try this patch and check if it makes the "qc timeout" go away.
Created attachment 10923 [details] dmesg from kernel with scsi patch It is hard to believe, but it works! What did that patch? I attached dmesg. One thing I noticed that start k3b takes long time because of hald polling. ps xuaw | grep hald 107 4330 0.6 1.6 10412 8712 ? Ss 15:04 0:03 /usr/sbin/hald root 4331 0.0 0.1 2876 1016 ? S 15:04 0:00 hald-runner 107 4342 0.0 0.1 2100 892 ? S 15:04 0:00 hald-addon-keyboard: listening on /dev/input/event1 107 4370 0.0 0.1 2104 900 ? S 15:04 0:00 hald-addon-storage: polling /dev/scd0 107 4372 0.0 0.1 2104 900 ? D 15:04 0:00 hald-addon-storage: polling /dev/scd1 Thanks, Yarema
Life is not so easy. This is happened when I tried to mount cdrom: [ 361.792873] ata4.01: qc timeout (cmd 0xa0) [ 361.792890] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 361.792902] ata4.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x46 data 32 in [ 361.792904] res 51/51:03:00:0c:00/00:00:00:00:00/b0 Emask 0x5 (timeout) [ 368.782440] ata4: port is slow to respond, please be patient (Status 0xd8) [ 391.764019] ata4: port failed to respond (30 secs, Status 0xd8) [ 391.764033] ata4: soft resetting port [ 391.931178] ATA: abnormal status 0x7F on port 0xE08815DF [ 391.942565] ATA: abnormal status 0x7F on port 0xE08815DF [ 392.103558] ata4.01: revalidation failed (errno=-2) [ 392.103571] ata4: failed to recover some devices, retrying in 5 secs [ 397.100065] ata4: soft resetting port [ 397.267219] ATA: abnormal status 0x7F on port 0xE08815DF [ 397.278623] ATA: abnormal status 0x7F on port 0xE08815DF [ 397.603616] ata4.01: configured for MWDMA2 [ 397.603654] ata4: EH complete [ 427.558506] ata4.01: qc timeout (cmd 0xa0) [ 427.558522] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 427.558534] ata4.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x46 data 32 in [ 427.558536] res 51/51:03:00:00:02/00:00:00:00:00/b0 Emask 0x5 (timeout) [ 434.548060] ata4: port is slow to respond, please be patient (Status 0xd8) [ 457.529657] ata4: port failed to respond (30 secs, Status 0xd8) [ 457.529670] ata4: soft resetting port [ 457.696812] ATA: abnormal status 0x7F on port 0xE08815DF [ 457.708194] ATA: abnormal status 0x7F on port 0xE08815DF [ 457.869183] ata4.01: revalidation failed (errno=-2) [ 457.869195] ata4: failed to recover some devices, retrying in 5 secs [ 462.865694] ata4: soft resetting port [ 463.032849] ATA: abnormal status 0x7F on port 0xE08815DF [ 463.044229] ATA: abnormal status 0x7F on port 0xE08815DF [ 463.369250] ata4.01: configured for MWDMA2 [ 463.369288] ata4: EH complete Cdrom cannot mounted. Yarema
Created attachment 10924 [details] Debug patch that filters both 0x12 and 0x46 So, 0x46 also times out? Too bad. Please check if this new debug patch helps. Thanks.
Created attachment 10925 [details] dmesg after boot second scsi patch Boot was OK, dmesg is attached. I put cd into cdrom drive, dmesg otput is here: [ 294.820052] ata4.01: qc timeout (cmd 0xa0) [ 294.820070] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 294.820082] ata4.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x51 data 32 in [ 294.820084] res 51/51:03:00:00:08/00:00:00:00:00/b0 Emask 0x5 (timeout) [ 301.809610] ata4: port is slow to respond, please be patient (Status 0xd8) [ 324.791245] ata4: port failed to respond (30 secs, Status 0xd8) [ 324.791260] ata4: soft resetting port [ 324.958391] ATA: abnormal status 0x7F on port 0xE08815DF [ 324.969852] ATA: abnormal status 0x7F on port 0xE08815DF [ 325.130746] ata4.01: revalidation failed (errno=-2) [ 325.130757] ata4: failed to recover some devices, retrying in 5 secs [ 330.127244] ata4: soft resetting port [ 330.294428] ATA: abnormal status 0x7F on port 0xE08815DF [ 330.305872] ATA: abnormal status 0x7F on port 0xE08815DF [ 330.630839] ata4.01: configured for MWDMA2 [ 330.630894] ata4: EH completeBTW, what does that mean? Immediately after that I tried to mount cdrom: [ 528.047522] ISO 9660 Extensions: Microsoft Joliet Level 3 [ 528.048443] ISOFS: changing to secondary root Cdrom is mounted! And I even can read data on it. But hald still polling: hald-addon-storage: polling /dev/scd1. I tried also k3b, k3b tried to get info about cd drives by means of hald: [ 681.633498] ata4.01: qc timeout (cmd 0xa0) [ 681.633516] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 681.633528] ata4.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x5a data 8 in [ 681.633530] res 51/51:03:00:24:00/00:00:00:00:00/b0 Emask 0x5 (timeout) [ 688.623052] ata4: port is slow to respond, please be patient (Status 0xd8) [ 711.604630] ata4: port failed to respond (30 secs, Status 0xd8) [ 711.604643] ata4: soft resetting port [ 711.771791] ATA: abnormal status 0x7F on port 0xE08815DF [ 711.783200] ATA: abnormal status 0x7F on port 0xE08815DF [ 711.944183] ata4.01: revalidation failed (errno=-2) [ 711.944196] ata4: failed to recover some devices, retrying in 5 secs [ 716.940703] ata4: soft resetting port [ 717.107826] ATA: abnormal status 0x7F on port 0xE08815DF [ 717.119220] ATA: abnormal status 0x7F on port 0xE08815DF [ 717.444226] ata4.01: configured for MWDMA2 [ 717.444275] ata4: EH complete [ 747.427071] ata4.01: qc timeout (cmd 0xa0) [ 747.427089] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 747.427100] ata4.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x5a data 8 in [ 747.427102] res 51/51:03:00:1c:00/00:00:00:00:00/b0 Emask 0x5 (timeout) [ 754.416641] ata4: port is slow to respond, please be patient (Status 0xd8) [ 777.398223] ata4: port failed to respond (30 secs, Status 0xd8) [ 777.398237] ata4: soft resetting port [ 777.565383] ATA: abnormal status 0x7F on port 0xE08815DF [ 777.576766] ATA: abnormal status 0x7F on port 0xE08815DF [ 777.737749] ata4.01: revalidation failed (errno=-2) [ 777.737761] ata4: failed to recover some devices, retrying in 5 secs [ 782.734265] ata4: soft resetting port [ 782.901419] ATA: abnormal status 0x7F on port 0xE08815DF [ 782.912808] ATA: abnormal status 0x7F on port 0xE08815DF [ 783.229866] ata4.01: configured for MWDMA2 [ 783.229917] ata4: EH complete [ 834.237242] ata4.01: qc timeout (cmd 0xa0) [ 834.237261] ata4.01: limiting speed to MWDMA1 [ 834.237266] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [ 834.237277] ata4.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x51 data 2 in [ 834.237279] res 51/51:03:00:08:00/00:00:00:00:00/b0 Emask 0x5 (timeout) [ 841.226788] ata4: port is slow to respond, please be patient (Status 0xd8) [ 864.208379] ata4: port failed to respond (30 secs, Status 0xd8) [ 864.208392] ata4: soft resetting port [ 864.375538] ATA: abnormal status 0x7F on port 0xE08815DF [ 864.386916] ATA: abnormal status 0x7F on port 0xE08815DF [ 864.547909] ata4.01: failed to IDENTIFY (I/O error, err_mask=0x1) [ 864.547918] ata4.01: revalidation failed (errno=-5) [ 864.547928] ata4: failed to recover some devices, retrying in 5 secs [ 869.544433] ata4: soft resetting port [ 869.711572] ATA: abnormal status 0x7F on port 0xE08815DF [ 869.722951] ATA: abnormal status 0x7F on port 0xE08815DF [ 870.047997] ata4.01: configured for MWDMA1 [ 870.048043] ata4: EH complete Albert, I think you are very close to solution. I'm ready to try another patches. Regards, Yarema
Sigh, this drive is really broken. Updating the broken firmware maybe the best thing to do. I will ask AOpen if any firmware update available. > cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x5a ... > cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x51 > res 51/51:03:00:08:00/00:00:00:00:00/b0 Emask 0x5(timeout) So, it is not 0x12, 0x46, 0x5a and 0x51 that causes trouble. The dev_stat is 0x51; it seems the device returns "check condition" and timeout upon "Request Sense"... I will try to do a debug patch to blacklist 0x12, 0x46, 0x5a and 0x51 in the atapi request sense code path.
Created attachment 10946 [details] Patch to do REQUEST SENSE by polling PIO From previous trace, it seems the "qc timeout" happens during REQUEST SENSE. Maybe the drive just lost irq? Hi Yarema, Could you please try this patch alone (without previous INQUIRY/EVPD filter patches). Thanks.
Created attachment 10947 [details] dmesg from the proposed Albert's patch Hard to say... Same shit.
> Hard to say... Same shit. So polling REQUEST SENSE doesn't help here. I was hoping this simple fix can save us from command filtering in the hot path... too bad. Ok, let's go back to command filtering...
Created attachment 10957 [details] Debug patch to show actual CDB Hi Yarema, Could you please apply this debug patch and collect the "CDB" output. (This can help us to filter the command more precisely if we know what parameter inside the CDB caused the problem.)
Albert, where are you planning to put the filtering? In SCSI or libata? Do you think we can fix this by modifying userspace? Kernel itself doesn't have problem detecting and using the device and I'm not sure whether kernel should be filtering direct CDBs from userland. Thanks.
Hi Tejun, Currently just trying to narrow down the problematic command/parameter pairs. No plan for adding the filtering to the kernel. BTW, I've contacted AOpen, the support engineer (DarrenHsieh@aopen.com.tw) said this CD-956E is no longer in production and low chance AOpen would put in R&D resource into it. :(
I see. Thanks a lot for all the effort. Yeah, I don't expect AOpen (or any other vendor for that matter) to fix firmware of discontinued products.
CC'd scsi_id author Patrick Mansfield. Hello.
Summary of the current problem/status: 1. The hald/scsi_id issues a valid INQUIRY with EVPD = 1, page 0x00 2. The AOpen 56X/AKH drive returns "check condition" (status 0x51) 3. Libata EH issues "REQUEST SENSE" to the AOpen drive, but the AOpen drive times out. 4. Besides INQUIRY, the AOpen drive also has problem on - GET_CONFIGURATION (0x46) => Maybe from kernel space? not from hald/scsi_id? - READ_DISC_INFO (0x51) - MODE_SENSE_10 (0x5a) and - INQUIRY with EVPD=1 The time out seems also occurs on REQUEST SENSE, not on the commands itself. We're waiting for Yarema's CDB debug data for the actual parameters that causes the timeout. 5. Polling "REQUEST SENSE" doesn't help. => the timeout is not caused by lost irq.
To which kernel version I have to apply this patch? Mine is 2.6.20.3 downloaded from kernel.org. Patch is succeeded, but: Hunk #1 succeeded at 2384 (offset -44 lines). Hunk #2 succeeded at 2416 (offset -44 lines). Hunk #3 succeeded at 2477 (offset -44 lines). Next, I can't compile modules: CC [M] drivers/ata/libata-core.o CC [M] drivers/ata/libata-scsi.o drivers/ata/libata-scsi.c: In function
Created attachment 10958 [details] Debug patch to show CDB parameters Hi Yurema, Sorry, I mixed-up the 2.6.21-rc one. Please use this one for 2.6.20.3 instead, thanks.
Created attachment 10959 [details] Debug patch to show CDB parameters Please try this one. Sorry I messed it up.
Created attachment 10960 [details] dmesg from filtering patch Here is truncated (don't know why) dmesg, but with the useful info I believe. Yarema
Created attachment 10966 [details] one more dmesg with the latest patch
I have not been following scsi_id changes for a while, it is part of udev, post to linux-hotplug-devel@lists.sourceforge.net if you have any patches, changes or questions. There are so many dmesg attached, I don't know which one to look at. I'm also not up to date on scsi/ata work. Is this a SATA bridge to a cdrom? Those are problematic, we probably don't want to use scsi_id at all for IDE CDROM drives attached that way - the output is likely meaningless when it does work. I would try scsi_id from the command line, and see if it works or not and then update scsi_id.config or udev rules. It might be not be possible to properly black list just your device and not affect others, but that does not matter much since no one cares about id's from CDROM. Looks like you have vendor E-IDE product CD-ROM 56X/AKH. To debug easier, run scsi_id from the command line, and maybe try hdparm -w /dev/sr1 to reset it on hang. I don't have any systems SCSI/SATA CDROM, I'm not sure of the sysfs path for sr (argument to the '-s'). Like: scsi_id -g -s /block/sr1 And try this: scsi_id -g -p 0x80 -s /block/sr1 And then black list out completely (spaces and names might be wrong ...) in scsi_id.config: vendor="E-IDE",options="-b" Use page 0x80 in scsi_id.config: vendor="E-IDE",options="-p 0x80" A udev rule to ignore cdrom on SATA/ATA bridge or such might be possible. Hope that helps ...
To Patrick Mansfield: Well, everything is OK, but how can I do that, if I have sr1 in /dev/.udev/failed/block@sr1 only? Anyway thanks for help. Yarema
> Well, everything is OK, but how can I do that, if I have sr1 in > /dev/.udev/failed/block@sr1 only? What do you mean by "everything is OK"? Did you run scsi_id per the other comment? I don't know why udev puts stuff in the .failed directory, you probably need to look at (or post) the log messages not just dmesg. I see lots of dmesg output, but no log files. Last I knew, even if there is no /dev scsi_id should still work OK (when run as root), now it normally uses udev tmp dev $tempnod
Listen, I need argument as /dev/sr1, but I don't have it at all! Device is recognized, but udev or something else put it into the /dev/.udev/failed/block@sr1 and that's it. Consequently, how can I run commands mentioned by you if I don't have this device? And for sure I looked up into logs to find where my devices are. I can post here my logs if you want. Just tell my what you need. Regards, Yarema
> Listen, I need argument as /dev/sr1, but I don't have it at all! You did not answer the question: Did you run scsi_id per the other comment? Do you have /sys/block entries, like /sys/block/sr1? For example, on my laptop with a USB storage device attached: [root ~]# ls -ld /sys/block/sda/ drwxr-xr-x 6 root root 0 Mar 27 15:25 /sys/block/sda/ [root ~]# scsi_id -g -p 0x80 -s /block/sda SGeneric STORAGE DEVICE [root ~]# ls -l /dev/sda brw-r----- 1 root disk 8, 0 Mar 27 15:19 /dev/sda [root ~]# scsi_id -g -p 0x80 -s /block/sda SGeneric STORAGE DEVICE [root ~]# rm /dev/sda [root ~]# scsi_id -g -p 0x80 -s /block/sda SGeneric STORAGE DEVICE > Device is recognized, but udev or something else put it into the > /dev/.udev/failed/block@sr1 and that's it. Consequently, how can I run commands > mentioned by you if I don't have this device? AFAIUI, that means udev did not create a /dev entry, not that the kernel driver(s) failed to detect the device. > And for sure I looked up into logs > to find where my devices are. I can post here my logs if you want. Just tell my > what you need.
See: ls -l /sys/block/sr1/ total 0 -r--r--r-- 1 root root 4096 2007-03-28 12:03 dev lrwxrwxrwx 1 root root 0 2007-03-28 12:03 device -> ../../devices/pci0000:00/0000:00:1e.0/0000:02:01.0/host3/target3:0:1/3:0:1:0 drwxr-xr-x 2 root root 0 2007-03-28 12:03 holders drwxr-xr-x 3 root root 0 2007-03-28 12:03 queue -r--r--r-- 1 root root 4096 2007-03-28 12:03 range -r--r--r-- 1 root root 4096 2007-03-28 12:03 removable -r--r--r-- 1 root root 4096 2007-03-28 12:03 size drwxr-xr-x 2 root root 0 2007-03-28 12:03 slaves -r--r--r-- 1 root root 4096 2007-03-28 12:03 stat lrwxrwxrwx 1 root root 0 2007-03-28 12:03 subsystem -> ../../block --w------- 1 root root 4096 2007-03-28 12:03 uevent And I don't have sr1 in /dev/ directory. Do you think that it is udev problem? BTW, what linux do you have? Yarema
Dear all, AOpen kindly send me a 952E/AKH for test today. And I can reproduce Yarema's problem locally on my box: ata4.00: qc timeout (cmd 0xa0) ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata4.00: cmd a0/00:00:00:00:20/00:00:00:00:00/a0 tag 0 cdb 0x46 data 32 in res 51/51:03:00:0c:00/00:00:00:00:00/a0 Emask 0x5 (timeout) ata4: port is slow to respond, please be patient (Status 0xd8) ata4: port failed to respond (30 secs, Status 0xd8) ata4: soft resetting port I will test more with this drive to collect more info about the failing SCSI commands, etc.
Good news!!! Now I calmed down - my cdrom is not broken. Thank you very much, Albert! Yarema
Created attachment 10972 [details] Proposed patch to simulate REQUEST SENSE This patch simulate REQUEST SENSE for the AOpen drive, command 0x12, 0x46, 0x51 and 0x5a. Hi Yarema, Could you please try this patch (against clean 2.6.20.3). Please try k3b, etc. and see if any "qc timeout" in dmesg. We need to check whether any other commands (besides 0x12, 0x46, 0x51 and 0x5a) require such simulated request sense. Thanks.
Created attachment 10976 [details] dmesg from proposed simulation REQUEST SENSE It looks like everything is OK... I tried k3d and monitored dmesg and syslogs, and I haven't seen any "qc timeout". I attached here just in case my dmesg for more precisely review by specialists. If everything is OK can we expect that it would be implemented in the next kernel tree? Alan, thanks. Yarema
Created attachment 10977 [details] dmesg from proposed simulation REQUEST SENSE It looks like everything is OK... I tried k3d and monitored dmesg and syslogs, and I haven't seen any "qc timeout". I attached here just in case my dmesg for more precisely review by specialists. If everything is OK can we expect that it would be implemented in the next kernel tree? Albert, thanks. Yarema
Oh, man. That's just a weird patch. Well, I guess we'll need it to get the AOpen device to work on linux. Do you happen to know how the ide layer fares with it? Thanks. :-)
> Do you happen to know how the ide layer fares with it? No idea. Maybe just lucky or something? Now that I have the drive at hand, I will reboot to IDE and check it.
> I will reboot to IDE and check it. With the AOpen drive connected to the Promise adapter, IDE got "irq 10: nobody cared...". However, with the AOpen drive connected to the onboard ICH4, IDE works flawlessly with the AOpen drive. Even REQUEST SENSE works well on IDE: CDB 46 00 00 00 00 00 00 00 20 CDB 03 00 00 00 12 00 00 00 00 It turns out in libata, the "ap->ops->tf_read(ap, &tf)" in atapi_eh_request_sense() leaves some garbage in the taskfile and the AOpen drive doesn't like it. Patch to follow.
Created attachment 10989 [details] Patch to remove tf_read() from atapi_eh_request_sense(). Hi Yarema, Could you please also try this patch (against clean 2.6.20.3) on both Promise and onboard ICHx. Please check if anything like "irq xx: nobody cared" with this patch: Thanks. ====================== (e.g.) irq 10: nobody cared (try booting with the "irqpoll" option) [<c013d714>] __report_bad_irq+0x24/0x7f [<c013d990>] note_interrupt+0x221/0x24c [<c027a649>] usb_hcd_irq+0x26/0x4f [<c013cb7c>] handle_IRQ_event+0x25/0x4a [<c013e2f5>] handle_level_irq+0xe7/0xea [<c013e20e>] handle_level_irq+0x0/0xea [<c0104675>] do_IRQ+0x87/0xf2 [<c0103007>] common_interrupt+0x23/0x28 [<c011d20c>] __do_softirq+0x37/0x96 [<c010459b>] do_softirq+0x60/0xb3 [<c013e20e>] handle_level_irq+0x0/0xea [<c011d0ee>] irq_exit+0x38/0x3a [<c010467c>] do_IRQ+0x8e/0xf2 [<c031c153>] schedule+0x2dc/0x62e [<c0103007>] common_interrupt+0x23/0x28 [<c01019ef>] default_idle+0x0/0x39 [<c0101a19>] default_idle+0x2a/0x39 [<c0101a6c>] cpu_idle+0x44/0x70 [<c03e76d8>] start_kernel+0x28d/0x315 [<c03e71e8>] unknown_bootoption+0x0/0x263 ======================= handlers: [<e0846b9d>] (ata_interrupt+0x0/0x253 [libata]) [<c027a623>] (usb_hcd_irq+0x0/0x4f) [<c027a623>] (usb_hcd_irq+0x0/0x4f) Disabling IRQ #10
Yeah, reading TF before doing REQUEST_SENSE and writing it back have always been a bit suspicious. I never understood why it was done that way but kept the code there anyway. So, removing ->tf_read() call fixes the problem?
> removing ->tf_read() call fixes the problem? Yes. No more qc timeout on REQUEST_SENSE. But we still have the "irq 10: nobody cared ..." problem. :( I guess that is another problem. At least the qc time is gone.
Created attachment 10992 [details] dmesg I attached here my dmesg for analyzing. I didn't find there "nobody cared" for irq, maybe my drive is specific:-) Yarema
Created attachment 10993 [details] dmesg, cdrom attached to ide port Here cdrom attached to onboard ide port. Still no "irq nobody cared".
> I didn't find there "nobody cared" for irq, maybe my drive is specific:-) dmesg looks good. The "irq nobody cared" maybe due to faulty device, etc. on my box.
Created attachment 10999 [details] Proposed patch to clear task file before REQUEST SENSE It seems the purpose of ->tf_read() is reading the error register to initialize sense_buf. So, ->tf_read() is actually needed. Attached the revised patch. Hi Yarema, Could you please try this patch (against clean 2.6.20.3) on both Promise and onboard ICHx. Hopefully this is the last one. :)
Created attachment 11005 [details] dmesg: cdrom and dvd are on on cable attached to onboard ide port
Created attachment 11006 [details] dmasg - cdrom attached to Promise port, DVD - to onboard ide pord Looks like everything is all right. Yarema
Hi, I've already submitted a bug report to Ubuntu, but I'm wondering if the problem I'm experiencing is related with this thread. Could somebody check the following link and comment please? Thanks. https://launchpad.net/ubuntu/+bug/99018
Hi Yarema, Thanks. The dmesgs look good. I'm closing this bug now. (BTW, it seems your AOpen drive is configured to MWDMA2: [ 31.867388] ata2.01: ATAPI, max MWDMA2 [ 32.031146] ata2.01: configured for MWDMA2 Maybe you could try turn on UDMA/33 for the AOpen drive by placing the jumper to the left most position which is labeled "Turbo".)
> Hi, I've already submitted a bug report to Ubuntu, but I'm wondering > if the problem I'm experiencing is related with this thread. > Could somebody check the following link and comment please? Thanks. > > https://launchpad.net/ubuntu/+bug/99018 Hi Sonar, Your problem looks different. You have some trailing data with your drive: [ 4295.256000] ata2.00: 12 bytes trailing data But maybe it is worth a try. Could you please try the following patch agaist 2.6.21-rcX (http://marc.info/?l=linux-ide&m=117548454130046&w=2) and see if it fixes your problem. If not, please open a new bug and attach the full dmesg to that bug, thanks.
Albert and all, Thank you very much! Yarema
Hi Albert, thank you very much, that patch fixed my problem too! I'm not sure if this fixes the problem completely, but so far it seems perfect. To be exact, I installed the source package for kernel 2.6.20.3-ubuntu1, and manually applied the patch. Hopefully, this patch will be in the new kernels from now on, so I won't have to build a custom kernel again myself. Thanks again,
Created attachment 11098 [details] Complete dmesg output This is my complete dmesg output. The part on the very bottom, after the "gspca" message shows the dmesg during which the hang occurs.
Created attachment 11101 [details] lspci -vvv output
Created attachment 11103 [details] cdrecord dev=/dev/sr0 -inq -vvVV output
I am using Arch Linux, with kernel 2.6.20.6 with the patch - http://marc.info/?l=linux-ide&m=117548454130046&w=2 - but it does not solve the problem. I have an Acer TravelMate 3260 laptop. Symptoms are - periodically the PC hangs for about 30 seconds, and then all is back to normal. The complete dmesg output is attached, and the dmesg output during the hang are all the messages after the last "gspca" message. I am also attaching the lspci -vvv output. The CDROM is a TSSTCorp CD/DVDW TS-L632D. Also attached is the "cdrecord dev=/dev/sr0 -inq -vvVV" output. dmesg output - http://bugzilla.kernel.org/attachment.cgi?id=11098&action=view lspci output - http://bugzilla.kernel.org/attachment.cgi?id=11101&action=view cdrecord output - http://bugzilla.kernel.org/attachment.cgi?id=11103&action=view I would have opened a new bug, but the dmesg output is EXACTLY the same as output attached in the bug. Please tell me if any more info is needed, or if I should open a new bug.
> I am using Arch Linux, with kernel 2.6.20.6 with the patch - > http://marc.info/?l=linux-ide&m=117548454130046&w=2 - but it does not solve > the problem. Hmm, looks like another problem since the patch doesn't fix it. > I would have opened a new bug, but the dmesg output is EXACTLY the same as > output attached in the bug. Please tell me if any more info is needed, or > if I should open a new bug. The timeout could be caused by something like device not responding or lost irq, etc. Please open a new bug and attach the full dmesg, etc. to that bug, thanks.
Albert, here is the new bug that I have filed - http://bugzilla.kernel.org/show_bug.cgi?id=8316 Thanks ! :)