Bug 10216
Description
lars.winterfeld
2008-03-10 10:38:05 UTC
Could you try a newer kernel (2.6.25 or 2.6.26-rc6)? There have been a lot of fixes for ATA/ATAPI support since 2.6.24.
Created attachment 16766 [details]
dmesg from 2.6.25-gentoo-r6 kernel
For 2.6.24 I found a config that made the error go away (see the bottom of http://gentoo-wiki.com/HARDWARE_Fujitsu-Siemens_Amilo_Xa2528 ). I could even read from the drive. Now I'm using 2.6.25-gentoo-r6 and the error appears again (dmesg attached above); the device is no longer present in /dev, but the notebook boots (with a delay of 6 minutes or so). I'm still looking at the diff of the kernel configs and couldn't yet find anything that could have caused the problem. (So it could be the fault of the "lot of fixes" ;-) If someone could tell me which options might be related to the problem, I would try combinations of them and post the log files.
Created attachment 16767 [details]
config diff between 2.6.24-gentoo-r8 and 2.6.25-gentoo-r6
This actually seems to be a controller-specific issue (quite surprising at that). Could you try the fix proposed for bug #11659 (in hindsight it seems to be the same issue): http://bugzilla.kernel.org/attachment.cgi?id=18748&action=view

wanted to try it with 2.6.25.9, but include/linux/ide.h already contains a (1 << 26) flag: IDE_HFLAG_ABUSE_SET_DMA_MODE. all others up to (1 << 31) are also used... because i do not want to break something else, i would try to store the "use-workaround" flag somewhere else. what would you advise?

Hmm, the safest bet would be to define the IDE_HFLAG_BROKEN_ALTSTATUS flag as an alias of some flag that is set unconditionally in the amd74xx host driver, i.e. adding:
    IDE_HFLAG_BROKEN_ALTSTATUS = IDE_HFLAG_PIO_NO_BLACKLIST,
at the end of the enum { } clause.

Any news on this?

i'll try it tomorrow. i'm going to post dmesg from the kernel with and without the patch. otherwise i'll do it on sunday, latest.

you are my hero! ;-) it worked. i tried it with the original source of linux-2.6.25.9: without the patch the described problem appears; with the patch (and nothing else changed) the kernel boots just fine, /dev/cdrom gets created and i can read files from it. i'm now going to attach some files (dmesg, maybe from other kernel versions, etc.) just for reference... i think afterwards you can mark this bug as closed or resolved... would like to see this in the next kernel version... :-)
Created attachment 18979 [details]
working .config for 2.6.24-gentoo-r8
Created attachment 18980 [details]
config for 2.6.25-gentoo-r9 for both patched and unpatched
Created attachment 18981 [details]
config for 2.6.26-gentoo-r3 for both patched and unpatched
Created attachment 18982 [details]
config for 2.6.27-gentoo-r4 for both patched and unpatched
Created attachment 18983 [details]
dmesg from unpatched kernel 2.6.24-gentoo-r8
Created attachment 18984 [details]
dmesg from unpatched kernel 2.6.25-gentoo-r9
Created attachment 18985 [details]
dmesg from patched kernel 2.6.25-gentoo-r9
Created attachment 18986 [details]
dmesg from unpatched kernel 2.6.26-gentoo-r3
Created attachment 18987 [details]
dmesg from patched kernel 2.6.26-gentoo-r3
Note the "Clocksource tsc unstable" message from the patched kernel, presumably while probing ide0.
Created attachment 18988 [details]
dmesg from unpatched kernel 2.6.27-gentoo-r4
Created attachment 18989 [details]
dmesg from patched kernel 2.6.27-gentoo-r4
Created attachment 18990 [details]
dmesg from unpatched original kernel 2.6.25.9
Created attachment 18991 [details]
dmesg from patched original kernel 2.6.25.9
by looking through the dmesg output, you can see that the patch http://bugzilla.kernel.org/attachment.cgi?id=18748&action=view did it for me in all tested versions from 2.6.25 to 2.6.27. i patched the original kernel ("vanilla-sources" as it's called in gentoo) and different sources which are patched by gentoo developers. (the main drivers part should be the same...) i patched drivers/ide/ide-iops.c, drivers/ide/ide-probe.c and drivers/ide/pci/amd74xx.c as described, and include/linux/ide.h like this:
     IDE_HFLAG_NO_UNMASK_IRQS = (1 << 31),
    + /* AltStatus register can be unreliable */
    + IDE_HFLAG_BROKEN_ALTSTATUS = IDE_HFLAG_PIO_NO_BLACKLIST,
     };
what is that "Clocksource tsc unstable" about?

(In reply to comment #25)
> what is that "Clocksource tsc unstable" about?
Presumably a very long delay with interrupts blocked. Bart, looks like ide_wait_not_busy() needs to call clocksource_touch_watchdog().

Indeed, it seems that it's the same problem as here: http://bugzilla.kernel.org/show_bug.cgi?id=11659

Lars: thanks for testing, I updated the patch description accordingly.
Sergei: seems so, could you make a patch for it?

(In reply to comment #29)
> Lars: thanks for testing, I updated patch description accordingly.
> Sergei: seems so, could you make a patch for it?
I've been overloaded for months already...

Sergei: I looked into it (under QEMU for now) and adding msleep(1000) + printk() to ide_init() causes the "Clocksource tsc unstable" message to appear before IDE initialization, so it seems that the issue is caused by some other kernel code and IDE is just unlucky to be initialized at the same time as the clocksource watchdog triggers.

(In reply to comment #31)
> Sergei: I looked into it (under QEMU for now) and adding msleep(1000) +
> printk() to ide_init() causes the "Clocksource tsc unstable" to appear before
> IDE initialization so it seems that the issue is caused by some other kernel
> code and IDE is just unlucky to be initialized at same time as clocksource
> watchdog triggers.
I guess it's caused by the msleep(1000) call itself. If ide_wait_not_busy() is touching the other watchdogs anyway, I don't see why it shouldn't touch the clocksource watchdog too.

(In reply to comment #32)
> (In reply to comment #31)
> > Sergei: I looked into it (under QEMU for now) and adding msleep(1000) +
> > printk() to ide_init() causes the "Clocksource tsc unstable" to appear before
> > IDE initialization so it seems that the issue is caused by some other kernel
> > code and IDE is just unlucky to be initialized at same time as clocksource
> > watchdog triggers.
> I guess it's caused by msleep(1000) call itself.
Looks like it hasn't been a wild guess. The clocksource watchdog timer handler gets run every half second. :-)
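
For reference, the flag aliasing used in the include/linux/ide.h change earlier in this thread can be sketched in standalone C. This is a minimal illustration under stated assumptions, not the kernel code: only the IDE_HFLAG_* names come from the thread, while the trimmed enum, the bit position chosen for IDE_HFLAG_PIO_NO_BLACKLIST, and the helper needs_altstatus_workaround() are hypothetical and exist only for this example.

```c
#include <assert.h>

/*
 * Trimmed-down stand-in for the host-flag enum in include/linux/ide.h.
 * ASSUMPTION: the bit position of IDE_HFLAG_PIO_NO_BLACKLIST is made up
 * for this sketch; only the flag names are taken from the thread.
 */
enum {
    IDE_HFLAG_PIO_NO_BLACKLIST   = (1 << 2),  /* assumed bit position */
    IDE_HFLAG_ABUSE_SET_DMA_MODE = (1 << 26), /* bit quoted in the thread */
    /* AltStatus register can be unreliable */
    IDE_HFLAG_BROKEN_ALTSTATUS   = IDE_HFLAG_PIO_NO_BLACKLIST, /* alias, no new bit */
};

/*
 * Hypothetical helper: returns 1 if the given host flags request the
 * AltStatus workaround, 0 otherwise.
 */
int needs_altstatus_workaround(unsigned long host_flags)
{
    return (host_flags & IDE_HFLAG_BROKEN_ALTSTATUS) != 0;
}

/* Small self-check showing how the alias behaves for two flag sets. */
void altstatus_alias_self_check(void)
{
    /* A host that sets PIO_NO_BLACKLIST also gets the workaround. */
    assert(needs_altstatus_workaround(IDE_HFLAG_PIO_NO_BLACKLIST) == 1);
    /* A host that sets only an unrelated flag does not. */
    assert(needs_altstatus_workaround(IDE_HFLAG_ABUSE_SET_DMA_MODE) == 0);
}
```

Because the alias shares its bit with a flag that the amd74xx host driver is said to set unconditionally, the check is true for that driver without using up a new bit. The trade-off is that any other host driver setting IDE_HFLAG_PIO_NO_BLACKLIST would pick up the workaround as well.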