Latest working kernel version: unknown
Earliest failing kernel version: 2.4
Distribution: all tried
Hardware Environment: notebook "Fujitsu Siemens Amilo Xa-2528", output from `lspci -vvv` attached
Software Environment: n/a

Problem Description:
I want to report a problem with certain CD/DVD drives or controllers which, as far as I can see, is kernel-related. I ran into it on my new notebook (Fujitsu Siemens Amilo Xa-2528). I tried the installation CDs of several distributions (Gentoo, Ubuntu, DSL, ...) with different kernel versions, including some from the 2.4 and 2.6 series, and all of them failed to boot. Here is part of the log:

    Mar 7 20:51:59 (none) Freeing unused kernel memory: 376k freed
    Mar 7 20:51:59 (none) ide-cd: cmd 0x5a timed out
    Mar 7 20:51:59 (none) hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
    Mar 7 20:51:59 (none) ide: failed opcode was: unknown
    Mar 7 20:51:59 (none) hda: ATAPI CD-ROM drive, 0kB Cache
    Mar 7 20:51:59 (none) Uniform CD-ROM driver Revision: 3.20
    Mar 7 20:51:59 (none) hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
    Mar 7 20:51:59 (none) ide: failed opcode was: unknown
    Mar 7 20:51:59 (none) hda: drive not ready for command
    Mar 7 20:51:59 (none) hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
    Mar 7 20:51:59 (none) ide: failed opcode was: unknown
    Mar 7 20:51:59 (none) hda: drive not ready for command

The last error messages are printed repeatedly until the computer freezes. Searching for the error message showed that my computer is not the only one affected (http://ubuntuforums.org/showthread.php?t=699244 , https://bugs.launchpad.net/ubuntu/+source/linux/+bug/182996 , http://forums.gentoo.org/viewtopic-t-666680.html?sid=36e4b5379601abe70a5f2fc5420e1765 , ...). As far as I can tell, the problem appears as soon as the CD/DVD drive is accessed, which of course is done by almost any Linux installation or "live" CD.
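A short decoding aid for the log above. `status=0x58` is the raw ATA Status register. The names in braces are its set bits: 0x58 = 0x40 (DriveReady) + 0x10 (SeekComplete) + 0x08 (DataRequest). The failing `cmd 0x5a` is the MMC MODE SENSE(10) opcode. The sketch below is a user-space decoder that mimics the format of those kernel messages. `decode_ata_status` and the table are hypothetical illustration, not kernel API. While BSY is set, the other status bits are not valid, so only "Busy" is printed in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ATA Status register: BSY is bit 7; the table lists the other bits with
 * the names used in messages like "status=0x58 { DriveReady ... }". */
#define ATA_STATUS_BUSY 0x80

static const struct {
    unsigned char bit;
    const char *name;
} ata_status_bits[] = {
    { 0x40, "DriveReady" },
    { 0x20, "DeviceFault" },
    { 0x10, "SeekComplete" },
    { 0x08, "DataRequest" },
    { 0x04, "CorrectedError" },
    { 0x02, "Index" },
    { 0x01, "Error" },
};

/* Append " word" to buf, silently truncating if buf is full. */
static void append_word(char *buf, size_t len, const char *word)
{
    size_t used = strlen(buf);

    if (used + 1 < len)
        snprintf(buf + used, len - used, " %s", word);
}

/* Hypothetical helper: render a raw Status byte as "{ Name Name ... }". */
void decode_ata_status(unsigned char stat, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "{");
    if (stat & ATA_STATUS_BUSY) {
        /* Other bits are not valid while the drive is busy. */
        append_word(buf, len, "Busy");
    } else {
        for (size_t i = 0; i < sizeof(ata_status_bits) / sizeof(ata_status_bits[0]); i++)
            if (stat & ata_status_bits[i].bit)
                append_word(buf, len, ata_status_bits[i].name);
    }
    append_word(buf, len, "}");
}
```

For example, decoding 0x58 yields "{ DriveReady SeekComplete DataRequest }", matching the log. The drive reports that it is ready and requesting a data transfer at the moment the driver gives up waiting for the interrupt.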
So I tried a Gentoo-based boot CD called "unattended gui", which does not mount the CD filesystem (kernel config: http://unattended-gui.sourceforge.net/wiki/index.php?title=Bootcd:kernel ). Once it had booted, lspci reported:

    00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev f1)
    00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller (rev f1)

The drive inside my notebook is recognized as "hda: Optiarc DVD RW AD-7540A, ATAPI CD/DVD-ROM drive". Other reports show the same (or a similar) problem with "PHILIPS DVDR1660P1", "BENQ DVD DD DW1620", "LITE-ON DVD SOHD-16P6P9S", "RICOH CD-R/RW MP7040A" and others. The pre-installed Windows Vista can read and write DVDs.

Some logs (from the "unattended" boot CD):
http://www.winterfeld.de/Lars/amalio/lspci-vvv
http://www.winterfeld.de/Lars/amalio/messages
http://www.winterfeld.de/Lars/amalio/dmesg

Other information produced on this system:
http://www.winterfeld.de/Lars/amalio/cpuinfo
http://www.winterfeld.de/Lars/amalio/iomem
http://www.winterfeld.de/Lars/amalio/ioports
http://www.winterfeld.de/Lars/amalio/modules
http://www.winterfeld.de/Lars/amalio/scsi
http://www.winterfeld.de/Lars/amalio/uname-a

Since I also program in C, I looked at some of the ide*.c files where the error messages come from, but I have no idea how the kernel works internally. I would be really grateful for any suggestions. If more information is needed, please just ask.

Lars

Steps to reproduce: boot problem...
Could you try a newer kernel (2.6.25 or 2.6.26-rc6)? There have been a lot of fixes to ATA/ATAPI support since 2.6.24.
Created attachment 16766 [details] dmesg from 2.6.25-gentoo-r6 kernel
For 2.6.24 I found a config with which the error did not appear (see the bottom of http://gentoo-wiki.com/HARDWARE_Fujitsu-Siemens_Amilo_Xa2528 ). I could even read from the drive. Now I'm using 2.6.25-gentoo-r6 and the error appears again (dmesg attached above). The device is no longer present in /dev, but the notebook boots after a delay of 6 minutes or so. I'm still looking at the diff of the kernel configs and haven't yet found anything that could have caused the problem. (So it could be the fault of the "lot of fixes" ;-) If someone could point me to options that might be related to the problem, I would try combinations of them and post the log files.
Created attachment 16767 [details] config diff between 2.6.24-gentoo-r8 and 2.6.25-gentoo-r6
This actually seems to be a controller-specific issue (quite a surprising one). Could you try the fix proposed for bug #11659? In hindsight it seems to be the same issue: http://bugzilla.kernel.org/attachment.cgi?id=18748&action=view
I wanted to try it with 2.6.25.9, but include/linux/ide.h already contains a (1 << 26) flag, IDE_HFLAG_ABUSE_SET_DMA_MODE, and all the others up to (1 << 31) are also used. Because I do not want to break anything else, I would try to store the "use workaround" flag somewhere else. What would you advise?
Hmm, the safest bet would be to define the IDE_HFLAG_BROKEN_ALTSTATUS flag as an alias for some flag that is set unconditionally in the amd74xx host driver, i.e. by adding

    IDE_HFLAG_BROKEN_ALTSTATUS = IDE_HFLAG_PIO_NO_BLACKLIST,

at the end of the enum { } clause.
Any news on this?
I'll try it tomorrow. I'm going to post dmesg output from the kernel with and without the patch. Otherwise I'll do it on Sunday at the latest.
You are my hero! ;-) It worked. I tried it with the original linux-2.6.25.9 source. Without the patch the described problem appears. With the patch (and nothing else changed) the kernel boots just fine, /dev/cdrom gets created and I can read files from it. I'm now going to attach some files (dmesg, maybe from other kernel versions, etc.) just for reference. I think afterwards you can mark this bug as closed or resolved. I would like to see this in the next kernel version. :-)
Created attachment 18979 [details] working .config for 2.6.24-gentoo-r8
Created attachment 18980 [details] config for 2.6.25-gentoo-r9 for both patched and unpatched
Created attachment 18981 [details] config for 2.6.26-gentoo-r3 for both patched and unpatched
Created attachment 18982 [details] config for 2.6.27-gentoo-r4 for both patched and unpatched
Created attachment 18983 [details] dmesg from unpatched kernel 2.6.24-gentoo-r8
Created attachment 18984 [details] dmesg from unpatched kernel 2.6.25-gentoo-r9
Created attachment 18985 [details] dmesg from patched kernel 2.6.25-gentoo-r9
Created attachment 18986 [details] dmesg from unpatched kernel 2.6.26-gentoo-r3
Created attachment 18987 [details] dmesg from patched kernel 2.6.26-gentoo-r3
Note the "Clocksource tsc unstable" message from the patched kernel, presumably while probing ide0.
Created attachment 18988 [details] dmesg from unpatched kernel 2.6.27-gentoo-r4
Created attachment 18989 [details] dmesg from patched kernel 2.6.27-gentoo-r4
Created attachment 18990 [details] dmesg from unpatched original kernel 2.6.25.9
Created attachment 18991 [details] dmesg from patched original kernel 2.6.25.9
Looking through the dmesg output, you can see that the patch http://bugzilla.kernel.org/attachment.cgi?id=18748&action=view did it for me in all tested versions from 2.6.25 to 2.6.27. I patched the original kernel ("vanilla-sources", as it's called in Gentoo) and several source trees patched by Gentoo developers (the main driver code should be the same). I patched drivers/ide/ide-iops.c, drivers/ide/ide-probe.c and drivers/ide/pci/amd74xx.c as described, and include/linux/ide.h like this:

         IDE_HFLAG_NO_UNMASK_IRQS = (1 << 31),
      +  /* AltStatus register can be unreliable */
      +  IDE_HFLAG_BROKEN_ALTSTATUS = IDE_HFLAG_PIO_NO_BLACKLIST,
       };

What is that "Clocksource tsc unstable" message about?
(In reply to comment #25)
> what is that "Clocksource tsc unstable" about?

Presumably a very long delay with interrupts blocked.
Bart, it looks like ide_wait_not_busy() needs to call clocksource_touch_watchdog().
Indeed, it seems that it's the same problem as here http://bugzilla.kernel.org/show_bug.cgi?id=11659
Lars: thanks for testing, I updated patch description accordingly. Sergei: seems so, could you make a patch for it?
(In reply to comment #29)
> Lars: thanks for testing, I updated patch description accordingly.
> Sergei: seems so, could you make a patch for it?

I've been overloaded for months already...
Sergei: I looked into it (under QEMU for now). Adding msleep(1000) + printk() to ide_init() causes the "Clocksource tsc unstable" message to appear before IDE initialization. So it seems the issue is caused by some other kernel code, and IDE is just unlucky enough to be initialized at the same time as the clocksource watchdog triggers.
(In reply to comment #31)
> Sergei: I looked into it (under QEMU for now) and adding msleep(1000) +
> printk() to ide_init() causes the "Clocksource tsc unstable" to appear before
> IDE initialization so it seems that the issue is caused by some other kernel
> code and IDE is just unlucky to be initialized at same time as clocksource
> watchdog triggers.

I guess it's caused by the msleep(1000) call itself. If ide_wait_not_busy() is touching the other watchdogs anyway, I don't see why it shouldn't touch the clocksource watchdog too.
(In reply to comment #32)
> (In reply to comment #31)
> > Sergei: I looked into it (under QEMU for now) and adding msleep(1000) +
> > printk() to ide_init() causes the "Clocksource tsc unstable" to appear before
> > IDE initialization so it seems that the issue is caused by some other kernel
> > code and IDE is just unlucky to be initialized at same time as clocksource
> > watchdog triggers.
> I guess it's caused by msleep(1000) call itself.

Looks like it wasn't a wild guess: the clocksource watchdog timer handler runs every half second. :-)
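The arithmetic behind that last remark: if the watchdog handler runs every 500 ms, any 1000 ms window contains exactly two runs, wherever it starts. An msleep(1000) added to ide_init() therefore guarantees that the watchdog fires during the test delay, which is consistent with the guess above. The helper below is hypothetical. It assumes strictly periodic runs at multiples of the period, counted from t = 0, and counts the runs that fall inside a delay.

```c
#include <assert.h>

/* Number of periodic runs (at t = k * period_ms) that fall inside the
 * interval (start_ms, start_ms + delay_ms]. */
unsigned long ticks_during(unsigned long start_ms, unsigned long delay_ms,
                           unsigned long period_ms)
{
    assert(period_ms > 0);
    return (start_ms + delay_ms) / period_ms - start_ms / period_ms;
}
```

With a 500 ms period, a 1000 ms sleep always spans two runs. A 400 ms delay may span zero or one run, depending on where it starts.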