Bug 8926
Summary: | (ata_piix) Repeated CDROM device timeouts when no disk in tray | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | D. Angelis (dangelis) |
Component: | Serial ATA | Assignee: | Jeff Garzik (jgarzik) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | alan, albertcc, bunk, htejun, protasnb |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22.2-42.fc6 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
/sbin/lspci -nnvvvxxx outout
dmesg log of all ata2.01 timeouts. Patch for sr_media_change() Trace for commands sent after the debug patch sr_media_change: avoid READ TOC/CAPACITY if no media present Patch to issue GET_EVENT_STATUS_NOTIFICATION |
Description
D. Angelis
2007-08-23 02:05:11 UTC
Created attachment 12496 [details]
/sbin/lspci -nnvvvxxx outout
Created attachment 12497 [details]
dmesg log of all ata2.01 timeouts.
There was no CDROM activity since boot (No insert/eject of any media in CD tray)
This bug 8926 looks like yet another TSSTcorp TS-L632D firmware problem similar to bug 8316, bug 8856 and bug 8904. Currently there is little we can do from libata. We are trying to have a workaround tip on linux-ata.org. ============================================== Known hardware compatibility issue of TS-L632D: Problem description: There is a known hardware issue when libata is used with certain TSSTcorp TS-L623D drives. The user might experience random system freezes for a few minutes periodically. The problem mostly occurs on Acer and Asus laptop computers. Root cause: It seems that the firmware of the TS-L623D stops responding after being continuously polled by hald-addon-storage. Affected firmware versions: Acer: All firmware versions including AC00 and AC01 Asus: All firmware versions including AS05 and AS99 Clevo: TI03 and possibly other versions HP: HH15 and possibly other versions Samsung: SC02 Good firmware versions: Samsung SC03 Workaround solutions: a. keep a CD in the TS-L632D drive or b. kill the hald-addon-storage polling process or c. cross-flash the drive firmware of TS-L632D to the SC03 version. How to cross-flash: (Be warned that you are at your own risk to cross-flash the firmware of the drive. Also it might void the warranty.) 1. Download the Samsung firmware update utility (http://www.samsungodd.com/KorLib/File/sfdnwin.exe) 2. Download the Samsung SC03 firmware (http://www.samsungodd.com/korlib/download.asp?no=&fname=200702081154530571_TS-L632D_SC03.BIN&path=/UploadFiles/FW/FWDOWNLOAD/KOR/) 3. Run "sfdnwin -nocheck" to crossflash. Thanks a lot for the pointers. I have filed this report following a suggestion by Tejun just to let you know that I am willing to help you fix this problem. I can live with the problem since I am not experience any system freezes. I will not cross-flash the firmware in order to have a hardware for testing any patches you come up with in the future. Just as a final remark I can assure you that the problem was not existing/appearing in previous kernels (2.6.20 series) witch I was using for months. Created attachment 12580 [details]
Patch for sr_media_change()
This patch modifies sr_media_change() such that it won't call sr_cd_check() and get_sectorsize() if the device is not ready.
Hi, Could you please give the patch a try and see if the timeout still happens, thanks. BTW, I've looked at the commands issued to the drive after the patch. There are three commands sent to the drive per second: 1. the drive got TEST_UNIT_READY from sr_media_change() on open. 2. the drive got another TEST_UNIT_READY from sr_drive_status() on ioctl. 3. later the drive got ALLOW_MEDIUM_REMOVAL from sd_release() on close(). The behavior of TS-L632D under such command sequence needs testing, especially the effect of ALLOW_MEDIUM_REMOVAL if no disc in the drive. 3 a second is nuts, current HAL polls once every 5-10 seconds or so. Is this a box with an ancient crap version of Gnome ? Created attachment 12588 [details] Trace for commands sent after the debug patch > 3 a second is nuts, current HAL polls once every 5-10 seconds or so. > Is this a box with an ancient crap version of Gnome ? The box is running the old Fedora Core 4. The commands look coming from hald-addon-storage not Gnome. After killing hald-addon-storage, the commands are stopped. BTW, I was wrong in the previous comment. It's 3 command every 2 seconds, not per second. Detailed trace attached for your reference. For those 3 commands: 1. the TEST_UNIT_READY on open() looks necessary for medium change detection. 2. the other TEST_UNIT_READY seems coming from hald-addon-storage by ioctl. 3. the final ALLOW_MEDIUM_REMOVAL on close() looks unnecessary. Maybe when closing the device, we can check if medium present and only send ALLOW_MEDIUM_REMOVAL if medium is there in the drive. The 3rd command, ALLOW_MEDIUM_REMOVAL (0x1e), comes from: sys_close() => sr_block_release() => cdrom_release() => cdo->lock_door(cdi, 0) => sr_lock_door() => scsi_set_medium_removal() FC4 as an old Gnome (and its correspondingly old HAL). The newer HAL is a bit more civilised (5 seconds or so) but still send stoo many commands. Killing the bogus ALLOW_MEDIUM_REMOVAL might be good too as that seems to make some drives drop out of power saving Created attachment 12708 [details] sr_media_change: avoid READ TOC/CAPACITY if no media present (Patch revised per http://marc.info/?l=linux-kernel&m=115412002912837&w=2) Hi D. Angelis, Could you please test if the attached patch fixed the TS-L632D timeout problem? Thanks! (In reply to comment #12) > Created an attachment (id=12708) [details] > sr_media_change: avoid READ TOC/CAPACITY if no media present > > (Patch revised per http://marc.info/?l=linux-kernel&m=115412002912837&w=2) > > Hi D. Angelis, > > Could you please test if the attached patch fixed the TS-L632D timeout > problem? > Thanks! I have already tested the previous patch and it was ok: No timeouts for a day. Give me another day to test the new one. > Oops no good !! Here is the first timeout just 22 minutes after boot 2.6.22.6 with he new patch: Sep 5 18:47:43 localhost kernel: ata2.01: ATAPI: TSSTcorpCD/DVDW TS-L632D, TI03, max UDMA/33 Sep 5 19:09:54 localhost kernel: ata2.01: qc timeout (cmd 0xa0) Sep 5 19:09:54 localhost kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Sep 5 19:09:54 localhost kernel: ata2.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0 Sep 5 19:09:59 localhost kernel: ata2: port is slow to respond, please be patient (Status 0xd0) Sep 5 19:10:04 localhost kernel: ata2: device not ready (errno=-16), forcing hardreset Sep 5 19:10:04 localhost kernel: ata2: soft resetting port Sep 5 19:10:05 localhost kernel: ata2.01: configured for UDMA/33 Sep 5 19:10:05 localhost kernel: ata2: EH complete > Oops no good !!
> Here is the first timeout just 22 minutes after boot 2.6.22.6 with
> the new patch
Hmm, the timed out command is TEST_UNIT_READY (cdb 0x0). So, it looks like even if we only issues three commands (TUR, TUR and ALLOW_MEDIUM_REMOVAL), the timeout error still happens.
Does this bug happen if the old IDE subsystem is used to drive the CD-ROM?
BTW, the following is a comparison of commands sent by ide-cd v.s. libata + sr: IDE + no disc (total 7 commands): TEST_UNIT_READY (followed by REQUEST_SENSE) TEST_UNIT_READY (followed by REQUEST_SENSE) READ_CAPACITY (followed by REQUEST_SENSE) READ_TOC (followed by REQUEST_SENSE) TEST_UNIT_READY (followed by REQUEST_SENSE) GET_EVENT_STATUS_NOTIFICATION ALLOW_MEDIUM_REMOVAL libata + no disc (total 8 commands): TEST_UNIT_READY READ_TOC READ_TOC READ_CAPACITY READ_CAPACITY READ_CAPACITY TEST_UNIT_READY ALLOW_MEDIUM_REMOVAL With disc, IDE and libata issues the same commands: TEST_UNIT_READY TEST_UNIT_READY TEST_UNIT_READY ALLOW_MEDIUM_REMOVAL We're not sure if this problem happens with ide-cd or not yet. There seems to be similar timeout reports related to the old IDE on web. If IDE is immune, then maybe the bug is related to sr.c not issuing GET_EVENT_STATUS_NOTIFICATION to the drive? Hi Angelis, could you please check if the problem happens with old IDE too? Thanks! (In reply to comment #17) > Hi Angelis, could you please check if the problem happens with old IDE too? > Thanks! > Well testing the old IDE is a bit of a hassle because the only way to get my CDROM detected by IDE is to boot from the CDROM and the system is very slow with old IDE too. I will give it a try thought. Created attachment 12731 [details]
Patch to issue GET_EVENT_STATUS_NOTIFICATION
> Well testing the old IDE is a bit of a hassle because the only way
> to get my CDROM detected by IDE is to boot from the CDROM and the
> system is very slow with old IDE too.
Ok. Maybe we can just test more with libata first.
Please try the attached "GET_EVENT_STATUS_NOTIFICATION" patch + the previous "sr_media_change: avoid READ TOC/CAPACITY" patch with libata and see if it makes any difference.
Thanks.
(In reply to comment #20) > Please try the attached "GET_EVENT_STATUS_NOTIFICATION" patch + the previous > "sr_media_change: avoid READ TOC/CAPACITY" patch with libata and see if it > makes any difference. > > Thanks. > No good either. Here is the first one (25 mins after boot): Sep 6 19:02:28 localhost kernel: ata2.01: ATAPI: TSSTcorpCD/DVDW TS-L632D, TI03, max UDMA/33 Sep 6 19:27:20 localhost kernel: ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen Sep 6 19:27:20 localhost kernel: ata2.01: cmd a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0 Sep 6 19:27:25 localhost kernel: ata2: port is slow to respond, please be patient (Status 0xd0) Sep 6 19:27:30 localhost kernel: ata2: device not ready (errno=-16), forcing hardreset Sep 6 19:27:30 localhost kernel: ata2: soft resetting port Sep 6 19:27:30 localhost kernel: ata2.01: configured for UDMA/33 Sep 6 19:27:30 localhost kernel: ata2: EH complete > No good either.
> ...
> Sep 6 19:27:20 localhost kernel: ata2.01: cmd
> a0/00:00:00:00:20/00:00:00:00:00/b0 tag 0 cdb 0x0 data 0
>
Ok, TEST_UNIT_READY times out again. Adding GET_EVENT_STATUS_NOTIFICATION doesn't help. :(
I wonder what is the secret recipe to make this device work?
(In reply to comment #22) > I wonder what is the secret recipe to make this device work? > Let me remind you that the patch in http://bugzilla.kernel.org/attachment.cgi?id=12580&action=view worked fine (see comment #13). a patch from Alan just went in for this device, too (IVB fix) commit 6bbfd53d47abd1fb20d7c93a9b19a75970b66f49 Closing this bug, since the patch was committed. |