Bug 7422 - [pata_jmicron, ata_piix, tainted] on 2.6.19-rc2-mm2, "ata3.00: qc timeout (cmd 0xa0)" messages appear from time to time and the HDD LED stays lit permanently
Status: REJECTED UNREPRODUCIBLE
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: i386 Linux
Importance: P2 normal
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-26 08:39 UTC by Matthew
Modified: 2007-06-22 04:34 UTC

See Also:
Kernel Version: 2.6.19-rc2-mm2
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
dmesg 2.6.19-rc2-mm2 (x86), with "irqpoll" (30.18 KB, text/plain)
2006-11-01 04:04 UTC, Matthew
dmesg 2.6.19-rc2-mm2 (x86), without "irqpoll" (30.21 KB, text/plain)
2006-11-01 04:06 UTC, Matthew
better failed qc reporting (1.55 KB, patch)
2006-11-15 05:11 UTC, Tejun Heo
oops this is the one (1.56 KB, patch)
2006-11-15 05:17 UTC, Tejun Heo
dmesg of 2.6.19-rc5-mm1 with provided libata, patch, timeout (29.97 KB, text/plain)
2006-11-16 04:29 UTC, Matthew
output of dmesg on 2.6.22-rc5 (+ some patches such as cfs, reiser4, ...) (30.62 KB, text/plain)
2007-06-22 03:48 UTC, Matthew

Description Matthew 2006-10-26 08:39:23 UTC
Most recent kernel where this bug did not occur: 2.6.18-mm2
Distribution: GNU/Gentoo Linux
Hardware Environment: Asus P5W DH Deluxe, Intel Core2 Duo (Conroe), Intel ICH7R,
JMicron JMB363
Software Environment: Gentoo x86 (stable)
Problem Description: from time to time this message:

ata3.00: qc timeout (cmd 0xa0)
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.00: tag 0 cmd 0xa0 Emask 0x5 stat 0x51 err 0x20 (timeout)
ata3: soft resetting port
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: soft resetting port
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: soft resetting port
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3.00: disabled
ata3: EH complete

shows up in syslog and the HDD LED stays lit. I don't know what is
connected to that port, but it may be my DVD burner, so perhaps it
tries to identify a non-existent medium ...
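
As a rough aid for reading the log above, the sketch below decodes the command opcodes and the "stat" / "err" values. It assumes the standard ATA status register bit layout and, for the ATAPI PACKET command (0xa0), that the upper four bits of the error register carry a SCSI sense key. The helper names are hypothetical and not part of libata, and after a timeout the register contents may not be meaningful at all.

```python
# Sketch only: decode "cmd", "stat" and "err" from a libata error line.
# Helper names are hypothetical; the sense key table is abbreviated.

ATA_COMMANDS = {
    0xA0: "PACKET",
    0xA1: "IDENTIFY PACKET DEVICE",
    0xEC: "IDENTIFY DEVICE",
}

# Status register bits, highest bit first.
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]

# Assumption: for ATAPI devices, error register bits 7:4 hold a sense key.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}


def decode_status(stat):
    """Return the names of all status bits set in stat."""
    return [name for bit, name in STATUS_BITS if stat & bit]


def atapi_sense_key(err):
    """Return the sense key name encoded in the upper nibble of err."""
    key = (err >> 4) & 0xF
    return SENSE_KEYS.get(key, "sense key 0x%x" % key)
```

Under these assumptions, stat 0x51 reads as DRDY, DSC and ERR set, and err 0x20 as sense key NOT READY, which would be consistent with a drive that has no medium loaded.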

It could also be related to the JMicron controller, since when I boot the
kernel without "irqpoll" (in earlier kernel versions too) I get messages such as:

failed to IDENTIFY (I/O error, err_mask=0x4)

which are similar to the error reported ...
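
The err_mask and Emask values are bit masks. The sketch below decodes them, assuming the AC_ERR_* bit assignments of include/linux/libata.h; that table is an assumption here and should be checked against the kernel tree in use, since it may differ between versions. The decode_err_mask helper is hypothetical.

```python
# Sketch only: decode a libata err_mask / Emask value into flag names.
# Assumption: bit assignments follow enum ata_completion_errors in
# include/linux/libata.h; verify against the kernel version in use.

AC_ERR_FLAGS = [
    (1 << 0, "AC_ERR_DEV"),         # device reported error
    (1 << 1, "AC_ERR_HSM"),         # host state machine violation
    (1 << 2, "AC_ERR_TIMEOUT"),     # timeout
    (1 << 3, "AC_ERR_MEDIA"),       # media error
    (1 << 4, "AC_ERR_ATA_BUS"),     # ATA bus error
    (1 << 5, "AC_ERR_HOST_BUS"),    # host bus error
    (1 << 6, "AC_ERR_SYSTEM"),      # system error
    (1 << 7, "AC_ERR_INVALID"),     # invalid argument
    (1 << 8, "AC_ERR_OTHER"),       # unknown
]


def decode_err_mask(mask):
    """Return the names of all known flags set in mask, lowest bit first."""
    return [name for bit, name in AC_ERR_FLAGS if mask & bit]
```

With this table, err_mask=0x4 decodes to AC_ERR_TIMEOUT alone, which suggests the IDENTIFY command never completed rather than the device reporting an error.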

Steps to reproduce:
1. boot into 2.6.19-rc2-mm2 with "irqpoll"
2. work with the computer for some time
3. the message should appear "randomly" (I can't say after which time, sometimes
it appears, sometimes it doesn't)
4. putting the harddrives connected to the JMicron into standby / sleep mode
doesn't improve anything: hdparm -Y /dev/sdb && hdparm -Y /dev/sdc
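
Because the message appears at unpredictable times, it can help to scan saved kernel logs for it. A minimal sketch, assuming the log has been saved as text (for example from dmesg or syslog); the function name and the regular expression are illustrative only:

```python
import re

# Matches lines such as "ata3.00: qc timeout (cmd 0xa0)", with or without
# a leading "[   63.269376] " timestamp.
QC_TIMEOUT = re.compile(r"(ata\d+\.\d+): qc timeout \(cmd (0x[0-9a-fA-F]+)\)")


def find_qc_timeouts(lines):
    """Return a (device, command opcode) tuple for every qc timeout line."""
    events = []
    for line in lines:
        match = QC_TIMEOUT.search(line)
        if match:
            events.append((match.group(1), int(match.group(2), 16)))
    return events
```

For example, find_qc_timeouts(open("dmesg.txt")) would list each affected device and the command that timed out; the file name here is hypothetical.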

These messages don't show up on boot, so the boot process isn't affected.

Could this be a problem similar to the following?
http://article.gmane.org/gmane.linux.ide/13434
Comment 1 Matthew 2006-10-26 08:42:01 UTC
I forgot to make clear:

>it could also be related to the JMicron-controller since when I boot up the
>kernel without "irqpoll" (in earlier kernel-versions too) I got such messages:

>failed to IDENTIFY (I/O error, err_mask=0x4)

>which are similar to the error reported ...

if I don't boot up the kernel with "irqpoll" 

IDENTIFY (I/O error, err_mask=0x4) 

and other errors show up, and the kernel takes quite long to bring up the JMicron
(JMB363) controller and the devices connected to it (since 2.6.18-rc4-mm3)

Comment 2 Tejun Heo 2006-10-31 16:56:13 UTC
Can you post full dmesg w/ and w/o irqpoll?  Thanks.
Comment 3 Matthew 2006-11-01 04:04:09 UTC
Created attachment 9391
dmesg 2.6.19-rc2-mm2 (x86), with "irqpoll"
Comment 4 Matthew 2006-11-01 04:06:35 UTC
Created attachment 9392
dmesg 2.6.19-rc2-mm2 (x86), without "irqpoll"

It takes 2 additional minutes to boot this kernel, which is a big improvement:

earlier kernels took more than 9 additional minutes!

Thanks for your help
Comment 5 Matthew 2006-11-03 13:58:52 UTC
update:

I get the same behavior on 2.6.19-rc4-mm1 with slightly different output:

ata3.00: qc timeout (cmd 0xa0)
ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata3.00: tag 0 cmd 0xa0 Emask 0x5 stat 0x51 err 0x20 (timeout)
ata3: soft resetting port
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: soft resetting port
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3: failed to recover some devices, retrying in 5 secs
ata3: soft resetting port
ata3.00: qc timeout (cmd 0xa1)
ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata3.00: revalidation failed (errno=-5)
ata3.00: disabled
ata3: EH complete
Comment 6 Matthew 2006-11-07 01:28:34 UTC
update2: 

I also get this on 2.6.19-rc4-mm2

today I wanted to access a CD-ROM (with this message / behavior appearing)
and mount only said: 

mount: /dev/cdrom: can't read superblock

(ata3 is the port the CD/DVD drive is connected to)
Comment 7 Tejun Heo 2006-11-15 05:10:31 UTC
Yeap, ata3 is ata_piix PATA port where "TSSTcorp CD/DVDW SH-S182D SB04" is
connected.  Something makes your dvd-w unresponsive and libata eventually
offlines it.  Can you apply the following patch and post full dmesg when the
error occurs?
Comment 8 Tejun Heo 2006-11-15 05:11:56 UTC
Created attachment 9520
better failed qc reporting
Comment 9 Tejun Heo 2006-11-15 05:17:16 UTC
Created attachment 9521
oops this is the one
Comment 10 Matthew 2006-11-16 04:07:09 UTC
Thanks ! :)

Do you mind if I test it on 2.6.19-rc5-mm1 ?
(it shows the same behavior as 2.6.19-rc2-mm2)
Comment 11 Matthew 2006-11-16 04:29:45 UTC
Created attachment 9529
dmesg of 2.6.19-rc5-mm1 with provided libata, patch, timeout

this time it happened rather fast, strange ...

I just sent my 2 drives connected to the JMicron via "hdparm -Y" to sleep,...
(those 2 Western Digital drives)

I renamed the kernel in Makefile, to that long name to be able to distinguish
it from "clean mm-sources":
the only changes are:
- in cpufreq (doesn't normally work) (should work with newer kernels), patch
from bug #7383
- your libata-patch
Comment 12 Tejun Heo 2007-06-20 00:27:15 UTC
Can you test 2.6.22-rc5 and see whether the problem is still there?  Thanks.
Comment 13 Matthew 2007-06-22 03:48:36 UTC
Created attachment 11847
output of dmesg on 2.6.22-rc5 (+ some patches such as cfs, reiser4, ...)

it doesn't seem to happen anymore (if I recall correctly, since around 2.6.20)

the problem, though, is that a lot in my configuration has changed, so I'm not sure if it really disappeared / is fixed:

before:
2 IDE-HDD @ JMicron IDE-port (2nd ?)
1 DVD-Drive @ ICH7R IDE-port (1st ?)
2 S-ATA-HDD @ EZ-Backup (silicon image hardware raid something, raid0)

now:
3 S-ATA-HDD @ ICH7R S-ATA-ports (hope all 3 are on it)
0 on the EZ-Backup
1 @ JMicron S-ATA-port
1 DVD-Drive @ ICH7R IDE-port (1st ?)

what still annoys me is:

[   63.269376] ata2.00: qc timeout (cmd 0xec)
[   63.273910] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x104)
[   71.260399] ata2: port is slow to respond, please be patient (Status 0x80)
[   94.219379] ata2: port failed to respond (30 secs, Status 0x80)
[   94.223980] ata2: COMRESET failed (device not ready)
[   94.228527] ata2: hardreset failed, retrying in 5 secs

This "forced" wait on bootup seems to be related to no hard drives being attached to the EZ-Backup (Silicon Image hardware RAID). I don't know if there is already an open bug for it on bugzilla.
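
To quantify the wait, the kernel timestamps in such an excerpt can be subtracted. A minimal sketch, assuming dmesg-style "[seconds.microseconds]" prefixes at the start of each line; the function name is hypothetical:

```python
import re

# Matches the timestamp prefix of a dmesg line, e.g. "[   63.269376] ".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def elapsed_seconds(lines):
    """Seconds between the first and last timestamped line, or None."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    if len(stamps) < 2:
        return None
    return stamps[-1] - stamps[0]


# First and last line of the excerpt quoted in this comment.
boot_log = [
    "[   63.269376] ata2.00: qc timeout (cmd 0xec)",
    "[   94.228527] ata2: hardreset failed, retrying in 5 secs",
]
delay = elapsed_seconds(boot_log)
```

For the excerpt in this comment that comes to roughly 31 seconds between the IDENTIFY timeout and the failed hardreset, before the 5 second retry even starts.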

I even tried removing the jumper and connecting only 1 hard drive, but to no avail. It seems it has to run in RAID or JBOD mode, whichever is activated, for this 'failed to IDENTIFY (I/O error, err_mask=0x104)' not to appear (i.e. at least 2 hard drives need to be connected to it).

thanks again for your work :)
Comment 14 Tejun Heo 2007-06-22 04:34:04 UTC
Okay, please reopen if it occurs again.  Also the forced wait problem is known and I'm working on it right now.  Probably will be fixed in 2.6.23.
