Bug 8926 - (ata_piix) Repeated CDROM device timeouts when no disk in tray
Summary: (ata_piix) Repeated CDROM device timeouts when no disk in tray
Status: CLOSED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Jeff Garzik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-08-23 02:05 UTC by D. Angelis
Modified: 2008-02-07 00:23 UTC (History)
5 users (show)

See Also:
Kernel Version: 2.6.22.2-42.fc6
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
/sbin/lspci -nnvvvxxx outout (35.38 KB, text/plain)
2007-08-23 02:09 UTC, D. Angelis
Details
dmesg log of all ata2.01 timeouts. (5.17 KB, text/plain)
2007-08-23 02:20 UTC, D. Angelis
Details
Patch for sr_media_change() (1.63 KB, patch)
2007-08-27 19:49 UTC, Albert Lee
Details | Diff
Trace for commands sent after the debug patch (175.25 KB, text/plain)
2007-08-28 20:16 UTC, Albert Lee
Details
sr_media_change: avoid READ TOC/CAPACITY if no media present (5.09 KB, patch)
2007-09-05 00:10 UTC, Albert Lee
Details | Diff
Patch to issue GET_EVENT_STATUS_NOTIFICATION (2.40 KB, patch)
2007-09-06 03:51 UTC, Albert Lee
Details | Diff

Description D. Angelis 2007-08-23 02:05:11 UTC
Most recent kernel where this bug did not occur: 2.6.20-1.2962.fc6
Distribution: FC6
Hardware Environment: 
Clevo 570U laptop
CPU = Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz
SATA Controler = ICH7M
Hard Disk= ata1.00 : ATA-7: ST9100824AS, 3.04, max UDMA/133
CDROM= ata2.01 : ATAPI: TSSTcorpCD/DVDW TS-L632D, TI03, max UDMA/33
Software Environment:
Problem Description: With kernels 2.6.22.1 and newer I get repeated timeout messages for the CDROM device like the following :
 ata2.01: qc timeout (cmd 0xa0)
 ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
 ata2.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0 
 ata2: port is slow to respond, please be patient (Status 0xd0)
 ata2: device not ready (errno=-16), forcing hardreset
 ata2: soft resetting port
 ata2.01: configured for UDMA/33
 ata2: EH complete

It only happens when there is no disk in the CD tray.

I will attach full dmesg log and lspci -nnvvvxxxx

Steps to reproduce:
Boot any kernel newer than 2.6.22.1 and just keep the CD tray empty.
Comment 1 D. Angelis 2007-08-23 02:09:49 UTC
Created attachment 12496 [details]
 /sbin/lspci -nnvvvxxx outout
Comment 2 D. Angelis 2007-08-23 02:20:14 UTC
Created attachment 12497 [details]
dmesg log of all ata2.01 timeouts.


There was no CDROM activity since boot (No insert/eject of any media in CD tray)
Comment 3 Albert Lee 2007-08-23 19:33:19 UTC
This bug 8926 looks like yet another TSSTcorp TS-L632D firmware problem similar to bug 8316, bug 8856 and bug 8904. Currently there is little we can do from libata. We are trying to have a workaround tip on linux-ata.org.

==============================================
Known hardware compatibility issue of TS-L632D:

Problem description:
There is a known hardware issue when libata is used with certain TSSTcorp TS-L623D drives. The user might experience random system freezes for a few minutes periodically. The problem mostly occurs on Acer and Asus laptop computers.

Root cause:
  It seems that the firmware of the TS-L623D stops responding after being continuously polled by hald-addon-storage.

Affected firmware versions:
  Acer:    All firmware versions including AC00 and AC01
  Asus:    All firmware versions including AS05 and AS99
  Clevo:   TI03 and possibly other versions
  HP:      HH15 and possibly other versions
  Samsung: SC02

Good firmware versions:
  Samsung SC03

Workaround solutions:
  a. keep a CD in the TS-L632D drive or
  b. kill the hald-addon-storage polling process or
  c. cross-flash the drive firmware of TS-L632D to the SC03 version.

How to cross-flash:
(Be warned that you are at your own risk to cross-flash the firmware of the drive. Also it might void the warranty.)
1. Download the Samsung firmware update utility
  (http://www.samsungodd.com/KorLib/File/sfdnwin.exe)
2. Download the Samsung SC03 firmware
  (http://www.samsungodd.com/korlib/download.asp?no=&fname=200702081154530571_TS-L632D_SC03.BIN&path=/UploadFiles/FW/FWDOWNLOAD/KOR/)
3. Run "sfdnwin -nocheck" to crossflash.
Comment 4 D. Angelis 2007-08-24 10:35:56 UTC
Thanks a lot for the pointers.

I have filed this report following a suggestion by Tejun just to let you know that I am willing to help you fix this problem.

I can live with the problem since I am not experience any system freezes.

I will not cross-flash the firmware in order to have a hardware for testing any patches you come up with in the future.

Just as a final remark I can assure you that the problem was not existing/appearing in previous kernels (2.6.20 series) witch I was using for months.
Comment 5 Albert Lee 2007-08-27 19:49:01 UTC
Created attachment 12580 [details]
Patch for sr_media_change()

This patch modifies sr_media_change() such that it won't call sr_cd_check() and get_sectorsize() if the device is not ready.
Comment 6 Albert Lee 2007-08-27 19:56:38 UTC
Hi,

Could you please give the patch a try and see if the timeout still happens, thanks.
Comment 7 Albert Lee 2007-08-27 20:33:24 UTC
BTW, I've looked at the commands issued to the drive after the patch. There are three commands sent to the drive per second:

1. the drive got TEST_UNIT_READY from sr_media_change() on open.
2. the drive got another TEST_UNIT_READY from sr_drive_status() on ioctl.
3. later the drive got ALLOW_MEDIUM_REMOVAL from sd_release() on close().

The behavior of TS-L632D under such command sequence needs testing, especially the effect of ALLOW_MEDIUM_REMOVAL if no disc in the drive.
Comment 8 Alan 2007-08-28 04:06:08 UTC
3 a second is nuts, current HAL polls once every 5-10 seconds or so. Is this a box with an ancient crap version of Gnome ?
Comment 9 Albert Lee 2007-08-28 20:16:29 UTC
Created attachment 12588 [details]
Trace for commands sent after the debug patch

> 3 a second is nuts, current HAL polls once every 5-10 seconds or so.
> Is this a box with an ancient crap version of Gnome ?

The box is running the old Fedora Core 4. The commands look coming from hald-addon-storage not Gnome. After killing hald-addon-storage, the commands are stopped. BTW, I was wrong in the previous comment. It's 3 command every 2 seconds, not per second. Detailed trace attached for your reference.

For those 3 commands:
1. the TEST_UNIT_READY on open() looks necessary for medium change detection.
2. the other TEST_UNIT_READY seems coming from hald-addon-storage by ioctl.
3. the final ALLOW_MEDIUM_REMOVAL on close() looks unnecessary. Maybe when closing the device, we can check if medium present and only send ALLOW_MEDIUM_REMOVAL if medium is there in the drive.
Comment 10 Albert Lee 2007-08-28 20:32:35 UTC
The 3rd command, ALLOW_MEDIUM_REMOVAL (0x1e), comes from:

sys_close()
  => sr_block_release()
     => cdrom_release()
        => cdo->lock_door(cdi, 0)
           => sr_lock_door()
              => scsi_set_medium_removal()
Comment 11 Alan 2007-08-29 03:45:27 UTC
FC4 as an old Gnome (and its correspondingly old HAL). The newer HAL is a bit more civilised (5 seconds or so) but still send stoo many commands. Killing the bogus ALLOW_MEDIUM_REMOVAL might be good too as that seems to make some drives drop out of power saving
Comment 12 Albert Lee 2007-09-05 00:10:29 UTC
Created attachment 12708 [details]
sr_media_change: avoid READ TOC/CAPACITY if no media present

(Patch revised per http://marc.info/?l=linux-kernel&m=115412002912837&w=2)

Hi D. Angelis,

Could you please test if the attached patch fixed the TS-L632D timeout problem?
Thanks!
Comment 13 D. Angelis 2007-09-05 08:43:24 UTC
(In reply to comment #12)
> Created an attachment (id=12708) [details]
> sr_media_change: avoid READ TOC/CAPACITY if no media present
> 
> (Patch revised per http://marc.info/?l=linux-kernel&m=115412002912837&w=2)
> 
> Hi D. Angelis,
> 
> Could you please test if the attached patch fixed the TS-L632D timeout
> problem?
> Thanks!

I have already tested the previous patch and it was ok: No timeouts for a day.

Give me another day to test the new one.
> 
Comment 14 D. Angelis 2007-09-05 09:28:36 UTC
Oops no good !!
Here is the first timeout just 22 minutes after boot 2.6.22.6 with he new patch:

Sep  5 18:47:43 localhost kernel: ata2.01: ATAPI: TSSTcorpCD/DVDW TS-L632D, TI03, max UDMA/33

Sep  5 19:09:54 localhost kernel: ata2.01: qc timeout (cmd 0xa0)
Sep  5 19:09:54 localhost kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  5 19:09:54 localhost kernel: ata2.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0 
Sep  5 19:09:59 localhost kernel: ata2: port is slow to respond, please be patient (Status 0xd0)
Sep  5 19:10:04 localhost kernel: ata2: device not ready (errno=-16), forcing hardreset
Sep  5 19:10:04 localhost kernel: ata2: soft resetting port
Sep  5 19:10:05 localhost kernel: ata2.01: configured for UDMA/33
Sep  5 19:10:05 localhost kernel: ata2: EH complete
Comment 15 Albert Lee 2007-09-05 22:12:29 UTC
> Oops no good !!
> Here is the first timeout just 22 minutes after boot 2.6.22.6 with
> the new patch

Hmm, the timed out command is TEST_UNIT_READY (cdb 0x0). So, it looks like even if we only issues three commands (TUR, TUR and ALLOW_MEDIUM_REMOVAL), the timeout error still happens.

Does this bug happen if the old IDE subsystem is used to drive the CD-ROM?
Comment 16 Albert Lee 2007-09-05 22:18:49 UTC
BTW, the following is a comparison of commands sent by ide-cd v.s. libata + sr:

IDE + no disc (total 7 commands):
 TEST_UNIT_READY (followed by REQUEST_SENSE)
 TEST_UNIT_READY (followed by REQUEST_SENSE)
 READ_CAPACITY   (followed by REQUEST_SENSE)
 READ_TOC        (followed by REQUEST_SENSE)
 TEST_UNIT_READY (followed by REQUEST_SENSE)
 GET_EVENT_STATUS_NOTIFICATION
 ALLOW_MEDIUM_REMOVAL

libata + no disc (total 8 commands):
 TEST_UNIT_READY
 READ_TOC
 READ_TOC
 READ_CAPACITY
 READ_CAPACITY
 READ_CAPACITY
 TEST_UNIT_READY
 ALLOW_MEDIUM_REMOVAL

With disc, IDE and libata issues the same commands:
 TEST_UNIT_READY
 TEST_UNIT_READY
 TEST_UNIT_READY
 ALLOW_MEDIUM_REMOVAL
Comment 17 Albert Lee 2007-09-05 22:25:26 UTC
We're not sure if this problem happens with ide-cd or not yet. There seems to be similar timeout reports related to the old IDE on web. If IDE is immune, then maybe the bug is related to sr.c not issuing GET_EVENT_STATUS_NOTIFICATION to the drive?

Hi Angelis, could you please check if the problem happens with old IDE too? Thanks!
Comment 18 D. Angelis 2007-09-06 01:46:06 UTC
(In reply to comment #17)

> Hi Angelis, could you please check if the problem happens with old IDE too?
> Thanks!
> 

Well testing the old IDE is a bit of a hassle because the only way to get my CDROM detected by IDE is to boot from the CDROM and the system is very slow with old IDE too.

I will give it a try thought.
Comment 19 Albert Lee 2007-09-06 03:51:44 UTC
Created attachment 12731 [details]
Patch to issue GET_EVENT_STATUS_NOTIFICATION
Comment 20 Albert Lee 2007-09-06 03:54:01 UTC
> Well testing the old IDE is a bit of a hassle because the only way
> to get my CDROM detected by IDE is to boot from the CDROM and the
> system is very slow with old IDE too.

Ok. Maybe we can just test more with libata first.

Please try the attached "GET_EVENT_STATUS_NOTIFICATION" patch + the previous "sr_media_change: avoid READ TOC/CAPACITY" patch with libata and see if it makes any difference.

Thanks.
Comment 21 D. Angelis 2007-09-06 09:47:37 UTC
(In reply to comment #20)

> Please try the attached "GET_EVENT_STATUS_NOTIFICATION" patch + the previous
> "sr_media_change: avoid READ TOC/CAPACITY" patch with libata and see if it
> makes any difference.
> 
> Thanks.
> 

No good either.

Here is the first one (25 mins after boot): 

Sep  6 19:02:28 localhost kernel: ata2.01: ATAPI: TSSTcorpCD/DVDW TS-L632D, TI03, max UDMA/33

Sep  6 19:27:20 localhost kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep  6 19:27:20 localhost kernel: ata2.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0 
Sep  6 19:27:25 localhost kernel: ata2: port is slow to respond, please be patient (Status 0xd0)
Sep  6 19:27:30 localhost kernel: ata2: device not ready (errno=-16), forcing hardreset
Sep  6 19:27:30 localhost kernel: ata2: soft resetting port
Sep  6 19:27:30 localhost kernel: ata2.01: configured for UDMA/33
Sep  6 19:27:30 localhost kernel: ata2: EH complete
Comment 22 Albert Lee 2007-09-06 19:23:45 UTC
> No good either.
> ...
> Sep  6 19:27:20 localhost kernel: ata2.01: cmd
> a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0
>

Ok, TEST_UNIT_READY times out again. Adding GET_EVENT_STATUS_NOTIFICATION doesn't help. :(

I wonder what is the secret recipe to make this device work?
Comment 23 D. Angelis 2007-09-06 23:29:01 UTC
(In reply to comment #22)

> I wonder what is the secret recipe to make this device work?
> 

Let me remind you that the patch in http://bugzilla.kernel.org/attachment.cgi?id=12580&action=view
worked fine (see comment #13).
Comment 24 Jeff Garzik 2007-11-08 11:32:02 UTC
a patch from Alan just went in for this device, too (IVB fix)

commit 6bbfd53d47abd1fb20d7c93a9b19a75970b66f49
Comment 25 Natalie Protasevich 2008-02-07 00:23:10 UTC
Closing this bug, since the patch was committed.

Note You need to log in before you can comment on or make changes to this bug.