Bug 8169 - IDE "lost interrupt" messages on bootup
IDE "lost interrupt" messages on bootup
Status: CLOSED CODE_FIX
Product: IO/Storage
Classification: Unclassified
Component: IDE
i386 Linux
: P2 normal
Assigned To: Bartlomiej Zolnierkiewicz
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-03-10 15:37 UTC by Patrick Horn
Modified: 2007-03-22 10:51 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.20-2.6.21-rc3
Tree: Mainline
Regression: ---


Attachments
Output from dmesg on a bad kernel with DMA. (30.37 KB, text/plain)
2007-03-11 19:15 UTC, Patrick Horn
Details
.config from my git tree. (44.36 KB, text/plain)
2007-03-11 19:18 UTC, Patrick Horn
Details
don't enable DMA if CONFIG_IDEDMA_AUTO=n (536 bytes, patch)
2007-03-12 03:53 UTC, Bartlomiej Zolnierkiewicz
Details | Diff
.config for the old kernel. (43.94 KB, text/plain)
2007-03-12 16:51 UTC, Patrick Horn
Details
dmesg from 2.6.19.7 (25.55 KB, text/plain)
2007-03-12 16:54 UTC, Patrick Horn
Details
hdparm --Istdout /dev/hda /dev/hdb (2.52 KB, text/plain)
2007-03-12 16:54 UTC, Patrick Horn
Details
Clippings from five different boots to the kernel. (8.59 KB, text/plain)
2007-03-18 15:00 UTC, Patrick Horn
Details
fix locking for DMA enable/disable (1.89 KB, patch)
2007-03-21 03:08 UTC, Bartlomiej Zolnierkiewicz
Details | Diff

Description Patrick Horn 2007-03-10 15:37:42 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.18.1
Distribution: Debian Testing/etch
Hardware Environment: pentium4, Intel ICH5 IDE Controller
hda: Western Digital 200GB drive (using 48-bit LBA)
hdb: Maxtor 80GB

Software Environment: 

Problem Description:
It detects all drives correctly, however right when it starts populating udev
devices, it starts having trouble with the drives:
[   29.743553] hdb: set_drive_speed_status: status=0xd0 { Busy }
[   29.746934] ide: failed opcode was: unknown
[   39.716197] hda: lost interrupt
... message repeated at regular 30-second intervals until ...
[  249.205566] hda: lost interrupt
[  249.209282] hdb: cache flushes supported
[  249.817471] hda: set_drive_speed_status: status=0x58 { DriveReady
SeekComplete DataRequest }
[  249.820925] ide: failed opcode was: unknown
[  249.919031] hda: cache flushes supported

In addition, the last time I booted it up, the /lib/modules/2.6.21-rc3 directory
entry was misread (probably related to the lost interrupts), and that single
directory entry was corrupt, though everything returned to normal the next time
it booted, so it was just reading from IDE.

The system works completely normally after it gets past the "cache flushes
supported" except for that one time it couldn't load modules.

Steps to reproduce:
I only know that it is completely reproducible (tried 3 times).  It happens
after a page of entries, and always right after displaying "Waiting for /dev to
be fully populated", and ends a after the udev script shows "timeout".

I will try to test an intermediate kernel like 2.6.20 to see if it happens there.
Comment 1 Bartlomiej Zolnierkiewicz 2007-03-10 15:48:16 UTC
Thanks for the report!  Unfortunately I have no idea what it could be ATM so
please proceed with testing the intermediate kernels.
Comment 2 Patrick Horn 2007-03-10 23:17:00 UTC
I tested different kernels, and I found that 2.6.19.7 works perfectly fine, but
the bug starts happening on 2.6.20.

I also noticed, thanks to SysRq-W that the one task that was causing most of the
waiting is an instance of "hdparm".
In fact there is a file /etc/udev/hdparm.rules that runs hdparm.

Here are the settings from my hdparm.conf for both hda and hdb:
        io32_support = 1
        dma = on
        write_cache = on
        mult_sect_io = 16
Comment 3 Bartlomiej Zolnierkiewicz 2007-03-11 06:51:05 UTC
Thanks but I still have a problem with narrowing down the problem to a specific
patch.  Please try to use git bisect to isolate guilty change.

good practical example of using git bisect:
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
Comment 4 Patrick Horn 2007-03-11 17:27:33 UTC
git bisect tracked it down to this change:

44854add66811124a5667466245b6824f751143a is first bad commit
commit 44854add66811124a5667466245b6824f751143a
Author: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Date:   Fri Dec 29 16:49:26 2006 -0800

    [PATCH] PIIX/SLC90E66: PIO mode fallback fix

    The fallback to PIO mode in the hwif->dma_check() handler doesn't work in
    the Intel PIIX and SMsC SLC90E66 IDE drivers because:

    - config_drive_for_dma() calls the hwif->speedproc() handler with a wrong
      mode number (unbiased by XFER_PIO_0) in case of the PIO fallback;

    - hwif->tuneproc() handler doesn't really set the drive's own speed (this
      is not fixed as yet).

    Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

:040000 040000 f44e8fbf7c450deb36442182266a5960215ba803
be83326ac987971a1c557450cbd5199a11f55da7 M      drivers
Comment 5 Bartlomiej Zolnierkiewicz 2007-03-11 17:49:46 UTC
Thanks for tracking it down!  I'm adding Sergei to cc:

The bug is quite interesting because the code affected by the commit shouldn't
be (in theory) even executed on your system... could you send dmesg command
output and kernel config file?
Comment 6 Patrick Horn 2007-03-11 19:15:33 UTC
Created attachment 10702 [details]
Output from dmesg on a bad kernel with DMA.

This shows the output from dmesg when I only enabled DMA for hdb, not hda.

For some reason, doing this made it worse, and hdb was completely unreadable
until I manually disabled dma using hdparm.
Comment 7 Patrick Horn 2007-03-11 19:18:50 UTC
Created attachment 10703 [details]
.config from my git tree.

I used defaults (hit return) for all options after kernel 2.6.18.1.
Comment 8 Patrick Horn 2007-03-11 19:42:57 UTC
Here is the IDE item from "lspci -vvv":
----
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller
(rev 02) (prog-if 8a [Master SecP PriP])
        Subsystem: ABIT Computer Corp. Unknown device 1014
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at <unassigned>
        Region 1: I/O ports at <unassigned>
        Region 2: I/O ports at <unassigned>
        Region 3: I/O ports at <unassigned>
        Region 4: I/O ports at f000 [size=16]
        Region 5: Memory at 50000000 (32-bit, non-prefetchable) [size=1K]
----
My motherboard is an ABIT IC7-G
Comment 9 Bartlomiej Zolnierkiewicz 2007-03-12 03:50:54 UTC
Thanks.

I see that CONFIG_IDEDMA_PCI_AUTO is not set in the kernel config, that would
probably explain the issue.  My hypothesis is the following:

"The code responsible for enabling DMA has a bug which always causes DMA to be
enabled despite CONFIG_IDEDMA_PCI_AUTO=n (so the device/host is not programmed
for DMA) and the fact that DMA is explicitly disabled in ide_set_dma().  It is
so because "0" return value is incorrectly passed to ide.c:set_using_dma() from
ide-dma.c:ide_set_dma().

Before Sergei's fix if the device/host were programmed for DMA by BIOS/firmware
the bug went unnoticed.  However now that the piix driver was fixed to set host
and device for the PIO mode so enabling DMA bits results in DMA timeouts."

Could you try attached patch for 2.6.21-rc3?  It will make drive go in PIO but
this is expected.  Please also send dmesg/config for "good" 2.6.19. Oh and also
hdparm --Istdout from 2.6.19 would be helpful.

It seems that we need to remove CONFIG_IDEDMA_PCI_AUTO config option (not a
problem since =n case has been obsoleted ages ago and DMA can be
disabled/enabled manually later + all distros are running with =y anyway)
Comment 10 Bartlomiej Zolnierkiewicz 2007-03-12 03:53:15 UTC
Created attachment 10708 [details]
don't enable DMA if CONFIG_IDEDMA_AUTO=n
Comment 11 Sergei Shtylyov 2007-03-12 09:09:26 UTC
> I see that CONFIG_IDEDMA_PCI_AUTO is not set in the kernel config, that would
> probably explain the issue.  My hypothesis is the following:

> "The code responsible for enabling DMA has a bug which always causes DMA to be
> enabled despite CONFIG_IDEDMA_PCI_AUTO=n (so the device/host is not programmed
> for DMA) and the fact that DMA is explicitly disabled in ide_set_dma().  It is
> so because "0" return value is incorrectly passed to ide.c:set_using_dma() from
> ide-dma.c:ide_set_dma().

    Wait... that code went into 2.6.21-rc1 and the bug is against 2.6.20.

> Before Sergei's fix if the device/host were programmed for DMA by BIOS/firmware
> the bug went unnoticed.

    But my fix predates ide_set_dma() et al. and I'm not seeing a bug in the 
older code that you're fixing in the updated IDE core... wait, I got it: 
you're talking about the bug in dma_check() methods that you've fixed only in 
the updated code -- they return result of ide_dma_off_quietly() which leads 
set_using_dma() into thinking that DMA was enabled...

> However now that the piix driver was fixed to set host
> and device for the PIO mode so enabling DMA bits results in DMA timeouts."

    I'm still not seeing how my fix could have changed anything...

Comment 12 Sergei Shtylyov 2007-03-12 09:14:34 UTC
Hello.

Sergei Shtylyov wrote:

>> I see that CONFIG_IDEDMA_PCI_AUTO is not set in the kernel config, 
>> that would
>> probably explain the issue.  My hypothesis is the following:

>> "The code responsible for enabling DMA has a bug which always causes 
>> DMA to be
>> enabled despite CONFIG_IDEDMA_PCI_AUTO=n (so the device/host is not 
>> programmed
>> for DMA) and the fact that DMA is explicitly disabled in 
>> ide_set_dma().  It is
>> so because "0" return value is incorrectly passed to 
>> ide.c:set_using_dma() from
>> ide-dma.c:ide_set_dma().

>    Wait... that code went into 2.6.21-rc1 and the bug is against 2.6.20.

>> Before Sergei's fix if the device/host were programmed for DMA by 
>> BIOS/firmware
>> the bug went unnoticed.

>    But my fix predates ide_set_dma() et al. and I'm not seeing a bug in 
> the older code that you're fixing in the updated IDE core... wait, I got 
> it: you're talking about the bug in dma_check() methods that you've 
> fixed only in the updated code -- they return result of 
> ide_dma_off_quietly() which leads set_using_dma() into thinking that DMA 
> was enabled...

    I think you really should have such fix done *before* the massive IDE 
cleanups, and pushed into the stable series or something.

>> However now that the piix driver was fixed to set host
>> and device for the PIO mode so enabling DMA bits results in DMA 
>> timeouts."

>    I'm still not seeing how my fix could have changed anything...

    Ah, ide_config_drive_speed() was never called before, so id->dma_* fields 
remained as left by BIOS, and ide_dma_enable() returned 1 as if DMA was indeed 
enabled.

Comment 13 Bartlomiej Zolnierkiewicz 2007-03-12 09:20:19 UTC
>    But my fix predates ide_set_dma() et al. and I'm not seeing a bug in the 

ide_set_dma() is a no-op when it comes to the behavior of the code, the bug has
existed forever but was hidden in ->ide_dma_check implementations: 
->ide_dma_check() returned the return value of ->ide_dma_off_quietly() call
which had always been "0" (as it turned out in my "make ->ide_dma_off_quietly()
void" patch)

I described it in terms of the current code because I'm a lazy bastard and if my
theory is true I'll just cut'n'paste it into the patch description... :-)
Comment 14 Sergei Shtylyov 2007-03-12 09:24:03 UTC
> I see that CONFIG_IDEDMA_PCI_AUTO is not set in the kernel config, that would
> probably explain the issue.  My hypothesis is the following:

> "The code responsible for enabling DMA has a bug which always causes DMA to be
> enabled despite CONFIG_IDEDMA_PCI_AUTO=n (so the device/host is not programmed
> for DMA) and the fact that DMA is explicitly disabled in ide_set_dma().  It is
> so because "0" return value is incorrectly passed to ide.c:set_using_dma() from
> ide-dma.c:ide_set_dma().

Actually, I think the issues is in set_using_dma() itself -- there's no need 
to call ide_dma_on() method as it's done by ide_dma_check() in the old code, 
and ide_set_dma() in the new one.

Comment 15 Sergei Shtylyov 2007-03-12 09:35:48 UTC
>>> However now that the piix driver was fixed to set host
>>> and device for the PIO mode so enabling DMA bits results in DMA 
>>> timeouts."

>>    I'm still not seeing how my fix could have changed anything...

>    Ah, ide_config_drive_speed() was never called before, so id->dma_* 
> fields remained as left by BIOS, and ide_dma_enable() returned 1 as if 
> DMA was indeed enabled.

    Well, not only that! The driver's speedproc() method is now disabling 
UltraDMA bacause it's (wrongly) coupled with PIO there -- that's the real 
culprit that I'm still going to send a patch for. That must be the actual 
reason of DMA timeouts (well, the ultimate one :-).

Comment 16 Bartlomiej Zolnierkiewicz 2007-03-12 10:19:12 UTC
>   I think you really should have such fix done *before* the massive IDE
> cleanups, and pushed into the stable series or something.

I become aware of the bug _thanks_ to cleanups.  Doing it before "make
->ide_dma_off void" cleanup patch would require me to go over all ->dma_host_off
and ->ide_dma_off_quietly implementations and fixing the return values (+ audit
of all users besides ->ide_dma_check), and "make ->ide_dma_off void" patch
depended on ide_set_dma() patch for not having to go all over host drivers once
again when cleaning ->ide_dma_check implementations.

When it comes to -stable tree sorry but I lack the time for caring about it and
since I'm not working for a distro I also lack interest (but I review -stable
patches cc:ed to me).

The conclusion is that from my POV it would require much more work for dubious
results => no thanks.  I try to use _limited_ time spend on maintaining IDE in
the most efficent way, I think this was the case w.r.t. cleanup patches.  Linux
kernel development model is IMO a meritrocracy in the theory but an
efficientocracy in the practice.

I would like to end this rather long rant with the reminder that I _do_ accept
patches so if you think that something could be done better you know what to do. 

;-)

>     Ah, ide_config_drive_speed() was never called before, so id->dma_* fields 
> remained as left by BIOS, and ide_dma_enable() returned 1 as if DMA was indeed 
> enabled.

Hmm but ide_dma_enable() is not executed in this code patch (ide_use_dma()
returns "0" because of drive->autodma == 0) and id->dma_* vields are unimportant
when calling ->ide_dma_on.  The problem is really in the bad interaction of
->ide_dma_check passing the return value of ide_dma_on() to set_using_dma() and
the return value incorrectly always being "0" (I'm describing the old code here).

> Actually, I think the issues is in set_using_dma() itself -- there's no need 
> to call ide_dma_on() method as it's done by ide_dma_check() in the old code, 
> and ide_set_dma() in the new one.

Nice catch, we can cleanup the extra ide_dma_on().

>    Well, not only that! The driver's speedproc() method is now disabling 
> UltraDMA bacause it's (wrongly) coupled with PIO there -- that's the real 
> culprit that I'm still going to send a patch for. That must be the actual 
> reason of DMA timeouts (well, the ultimate one :-).

I'm lost on this one, please send patch...
Comment 17 Sergei Shtylyov 2007-03-12 10:30:52 UTC
Bart, concerning your patch: we may just break instead of return -1.
Comment 18 Sergei Shtylyov 2007-03-12 10:33:08 UTC
>>Actually, I think the issues is in set_using_dma() itself -- there's no need 
>>to call ide_dma_on() method as it's done by ide_dma_check() in the old code, 
>>and ide_set_dma() in the new one.

> Nice catch, we can cleanup the extra ide_dma_on().

    Except the (quite imaginary?) case where it doesn't call either 
ide_dma_on() or ide_dma_off_quietly()...

>>   Well, not only that! The driver's speedproc() method is now disabling 
>>UltraDMA bacause it's (wrongly) coupled with PIO there -- that's the real

    I.e. it disables UDMA if any *valid* non-UDMA mode is passed to it -- 
before my patch it wasn't getting a valid PIO mode, so was just bailing out.

>>culprit that I'm still going to send a patch for. That must be the actual 
>>reason of DMA timeouts (well, the ultimate one :-).

> I'm lost on this one, please send patch...

    I will -- just intended to test it (which turned out to be troublesome to 
do in the usual way, i.e. on remote target).

Comment 19 Bartlomiej Zolnierkiewicz 2007-03-12 10:47:01 UTC
>    Except the (quite imaginary?) case where it doesn't call either 
> ide_dma_on() or ide_dma_off_quietly()...

I'll audit the code later...

>>    Well, not only that! The driver's speedproc() method is now disabling 
>> UltraDMA bacause it's (wrongly) coupled with PIO there -- that's the real

>     I.e. it disables UDMA if any *valid* non-UDMA mode is passed to it -- 
> before my patch it wasn't getting a valid PIO mode, so was just bailing out.

Ah, now I get it, thanks!

update to previous comment:

>    But my fix predates ide_set_dma() et al. and I'm not seeing a bug in 
> the older code that you're fixing in the updated IDE core... wait, I got 
> it: you're talking about the bug in dma_check() methods that you've 
> fixed only in the updated code -- they return result of 
> ide_dma_off_quietly() which leads set_using_dma() into thinking that DMA 
> was enabled...

I misunderstood this part slightly - this was different bug (calling ->ide_dma
_always_ for non-IORDY case) and was fixed *before* cleanups...

The sad fact about current bug is that I had the fix for a long time in my
"to-be-completed" directory but since I completely forgot about
CONFIG_IDEDMA_AUTO and I thought that it is unlikely to happen in real world it
was a low-prio patch... *sigh*...
Comment 20 Bartlomiej Zolnierkiewicz 2007-03-12 10:51:12 UTC
/me typing too fast and making typos:

> ->ide_dma_check passing the return value of ide_dma_on() to set_using_dma() and

s/ide_dma_on()/ide_dma_off_quietly()/

> I misunderstood this part slightly - this was different bug (calling ->ide_dma

ide_dma_on()

>>    Except the (quite imaginary?) case where it doesn't call either 
>> ide_dma_on() or ide_dma_off_quietly()...
>
> I'll audit the code later...

can't check it ATM now but AFAIR it is not imaginary case...
Comment 21 Sergei Shtylyov 2007-03-12 11:31:04 UTC
>>>   Well, not only that! The driver's speedproc() method is now disabling 
>>>UltraDMA bacause it's (wrongly) coupled with PIO there -- that's the real

>>    I.e. it disables UDMA if any *valid* non-UDMA mode is passed to it -- 
>>before my patch it wasn't getting a valid PIO mode, so was just bailing out.

> Ah, now I get it, thanks!

And it finally blew up on somebody. :-)

> update to previous comment:

>>   But my fix predates ide_set_dma() et al. and I'm not seeing a bug in 
>>the older code that you're fixing in the updated IDE core... wait, I got 
>>it: you're talking about the bug in dma_check() methods that you've 
>>fixed only in the updated code -- they return result of 
>>ide_dma_off_quietly() which leads set_using_dma() into thinking that DMA 
>>was enabled...

> I misunderstood this part slightly - this was different bug (calling ->ide_dma

I guess you did mean ide_dma_off_quietly() here... :-)

> _always_ for non-IORDY case) and was fixed *before* cleanups...

I don't underatand how that fix could have changed anything WRT this issue -- 
this path shouldn't have been taken (and well, the fix wasn't quite correct 
anyway, concerning the return value ;-).

Comment 22 Sergei Shtylyov 2007-03-12 11:34:37 UTC
>> update to previous comment:

>>>   But my fix predates ide_set_dma() et al. and I'm not seeing a bug 
>>> in the older code that you're fixing in the updated IDE core... wait, 
>>> I got it: you're talking about the bug in dma_check() methods that 
>>> you've fixed only in the updated code -- they return result of 
>>> ide_dma_off_quietly() which leads set_using_dma() into thinking that 
>>> DMA was enabled...

>> I misunderstood this part slightly - this was different bug (calling 
>> ->ide_dma

> I guess you did mean ide_dma_off_quietly() here... :-)

>> _always_ for non-IORDY case) and was fixed *before* cleanups...

> I don't underatand how that fix could have changed anything WRT this 
> issue -- this path shouldn't have been taken (and well, the fix wasn't 
> quite correct anyway, concerning the return value ;-).

I meant excatly this "always successful" result of ide_dma_check() which
was discovered/fixed only after (or with) cleanups.

Comment 23 Bartlomiej Zolnierkiewicz 2007-03-12 13:27:06 UTC
>>> update to previous comment:
>
>>>>   But my fix predates ide_set_dma() et al. and I'm not seeing a bug 
>>>> in the older code that you're fixing in the updated IDE core... wait, 
>>>> I got it: you're talking about the bug in dma_check() methods that 
>>>> you've fixed only in the updated code -- they return result of 
>>>> ide_dma_off_quietly() which leads set_using_dma() into thinking that 
>>>> DMA was enabled...
>
>>> I misunderstood this part slightly - this was different bug (calling 
>>> ->ide_dma
>
>> I guess you did mean ide_dma_off_quietly() here... :-)
>
>>> _always_ for non-IORDY case) and was fixed *before* cleanups...
>
>> I don't underatand how that fix could have changed anything WRT this 

Yep, it doesn't change anything WRT this issue.

>> issue -- this path shouldn't have been taken (and well, the fix wasn't 
>> quite correct anyway, concerning the return value ;-).

Well, it fixed the usual case, namely ide-probe.c:probe_hwif() code path 
(tuning after probing) but not ide.c:set_using_dma() (hdparm -d) ;-).

> I meant excatly this "always successful" result of ide_dma_check() which
> was discovered/fixed only after (or with) cleanups.

Both bugs were discovered thanks to cleanups:

* the one discussed above was fixed *before* cleanups,
  commit d8f4469d5f2c35e0b8272f254b42737a2b8ce877

* for the current one I had a patch for a long time but due to my overlook 
  of CONFIG_IDEDMA_AUTO=n case it was low-prio (patch was planned to be merged
  *after* cleanups)

Some of my previous comments may be confusing and mixing stuff (not even
mentioning typos and broken english) - somebody please hit me with a stick in a
head next time I get the brilliant idea of checking my mail before going home... 

[ "hmm, I'll quickly reply and leave", quickly turned out to be ~1h30min ]

Anyway I hope that we understand things much better now and we can move to
actually fixing them.  I also hope that we didn't scare Patrick away... :-)
Comment 24 Patrick Horn 2007-03-12 16:51:35 UTC
Created attachment 10734 [details]
.config for the old kernel.
Comment 25 Patrick Horn 2007-03-12 16:54:00 UTC
Created attachment 10735 [details]
dmesg from 2.6.19.7
Comment 26 Patrick Horn 2007-03-12 16:54:44 UTC
Created attachment 10736 [details]
hdparm --Istdout /dev/hda /dev/hdb
Comment 27 Patrick Horn 2007-03-12 17:02:35 UTC
That patch worked... it made it refuse to set DMA mode (giving an Input/Output
error instead).
Comment 28 Bartlomiej Zolnierkiewicz 2007-03-15 13:41:20 UTC
Thanks.

Was there any particular reason (kernel issue) behind using
CONFIG_IDEDMA_PCI_AUTO=n and not using DMA by default?
Comment 29 Patrick Horn 2007-03-18 01:00:55 UTC
I did not have a reason for using it, aside from that it was selected by default
at the time (I did not use my distro's .config), and I never noticed it since
DMA worked fine on older kernels.

However, I just tried 2.6.20 and 2.6.21-rc4 with the CONFIG_IDEDMA_PCI_AUTO=y
config set, and it is better, but still has some problems... It will give
strange error messages.  Sometimes, it shows up with an "ide0: reset: success"
after which there are no more errors and DMA is enabled correctly by hdparm
afterwards.
Comment 30 Bartlomiej Zolnierkiewicz 2007-03-18 08:15:47 UTC
Thanks.

Do you still use "hdparm -d"?  Does Debian use hdparm udev scripts by default?
If so this could explain some strange messages (please try to capture them and
post here).

[ Using hdparm by default is risky since kernel should configure the
  host/device for the best possible settings anyway and there are known
  kernel/hdparm bugs... Eh, more fixes needed for 2.6.21-final. ]
Comment 31 Patrick Horn 2007-03-18 15:00:10 UTC
Created attachment 10823 [details]
Clippings from five different boots to the kernel.

I cut out all the IDE-related messages from 5 boots.  I tried using Alt_Sysrq-W
in one of them to show where hdparm was waiting.  This always only happened
during the UDEV scripts, and running hdparm any other time during the boot
process is fine (it seems to happen twice).

In all of the cases, the system finished booting despite the wait, though
occasionally it had a few other strange symptoms, and DMA was usually enabled
for all drives anyway (according to hdparm -d).
Comment 32 Patrick Horn 2007-03-18 15:04:19 UTC
I disabled "dma=on" from my hdparm.conf, and this made it work correctly (with
DMA still enabled).

It seems that the bug only occurs when hdparm -d is run during UDEV
initialization.  Running it manually after booting, or letting the kernel
automatically enable DMA works fine.
Comment 33 Bartlomiej Zolnierkiewicz 2007-03-21 03:08:03 UTC
Created attachment 10885 [details]
fix locking for DMA enable/disable

Patrick, please test if this patch (together with the previous one and setting
CONFIG_IDEDMA_PCI_AUTO to "y") fixes these problems.  Thanks.
Comment 34 Patrick Horn 2007-03-22 10:08:02 UTC
That seems to have fixed the problem... I don't seem to be getting any weird
messages during the udev scripts any more.

Is there a reason hdparm will enable DMA again if it is already enabled at boot?
 Also, does running "hdparm -d 1" after boot do anything different from the
CONFIG_IDEDMA_PCI_AUTO option?
Comment 35 Bartlomiej Zolnierkiewicz 2007-03-22 10:51:46 UTC
> That seems to have fixed the problem... I don't seem to be getting any weird
> messages during the udev scripts any more.

OK, thanks for testing the patch.  I'm closing this bugzilla bug.

> Is there a reason hdparm will enable DMA again if it is already enabled at
> boot?

It seems that hdparm just passes the parameter to the kernel without checking
first if DMA is enabled and kernel doesn't check it either (sorry but this can't
 be changed yet because some strange hdparm -d/-X combinations still may trigger
to-be-fixed kernel bugs in the code responsible for transfer mode filtering).

>  Also, does running "hdparm -d 1" after boot do anything different from the
> CONFIG_IDEDMA_PCI_AUTO option?

It does exactly the same thing as done previously by the kernel.

Note You need to log in before you can comment on or make changes to this bug.