Bug 10239

Summary: PLEXTOR DVDR PX-740A drive not ready for command
Product: IO/Storage
Reporter: Richard Genoud (richard.genoud)
Component: IDE
Assignee: Patrice Vetsel (ubuntu)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: bp, bzolnier, htejun, kernel, lakostis, larsbratthall, ubuntu
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc5
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9243
Attachments:
full 2.6.25-rc5 dmesg
full 2.6.23 dmesg
lspci -vvvxxx for 2.6.25-rc5 kernel CONFIG_BLK_DEV_AMD74XX=y
2.6.25-rc5 dmesg with CONFIG_BLK_DEV_GENERIC=y + "ide_pci_generic.all_generic_ide"
2.6.25-rc5 lspci with CONFIG_BLK_DEV_GENERIC=y + "ide_pci_generic.all_generic_ide"
dmesg 2.6.25-rc5 kernel CONFIG_BLK_DEV_AMD74XX=y without commit b140b99c413ce410197cfcd4014e757cd745226a
lspci 2.6.25-rc5 kernel CONFIG_BLK_DEV_AMD74XX=y without commit b140b99c413ce410197cfcd4014e757cd745226a
dmesg output on Intel (cdrom is working)
amd74xx-fix-address-setup-time-for-nvidia-hosts.patch
2.6.25-rc5 dmesg with patch amd74xx-fix-address-setup-time-for-nvidia-hosts
2.6.25-rc5 lspci with patch amd74xx-fix-address-setup-time-for-nvidia-hosts
libata-fix-cable-detection.patch
2.6.25-rc5 dmesg with patch libata-fix-cable-detection.patch and pata_amd
2.6.25-rc5 lspci with patch libata-fix-cable-detection.patch and pata_amd

Description Richard Genoud 2008-03-13 12:30:40 UTC
Latest working kernel version:2.6.23
Earliest failing kernel version:2.6.24-rc1
Distribution:N/A
Hardware Environment:
MB : ASUS A7N8X-E deluxe
DVD : PLEXTOR DVDR PX-740A (FW 1.02)
Software Environment: N/A

Problem Description:
During kernel boot, the DVD drive initialisation fails with these errors (2.6.25-rc5):
hda: status error: error=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status error: error=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status error: error=0xd0 { Busy }
ide: failed opcode was: unknown
hda: drive not ready for command

Steps to reproduce:
plug a plextor DVDR PX-740A as primary ide master.
boot a kernel >= 2.6.24-rc1

NB: if the plextor is on primary ide slave, there's no error.
(the jumpers are ok).

I bisected to find out where the bug was introduced, and the result is:

b140b99c413ce410197cfcd4014e757cd745226a is first bad commit
commit b140b99c413ce410197cfcd4014e757cd745226a
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Sat Oct 13 17:47:51 2007 +0200

    ide: change master/slave IDENTIFY order
    
    Need to probe slave device first to make it release PDIAG-
    (this is required for correct device side cable detection).
    
    Based on libata commit f31f0cc2f0b7527072d94d02da332d9bb8d7d94c.
    
    Thanks to Craig for testing this patch.
    
    Cc: Craig Block <chblock3@yahoo.com>
    Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

:040000 040000 fa17f9e66a6e551184b1ec00049cea6fb87f7a1c 89a175312afe235fbdb04a3a593a4b981bdd9188 M      drivers
Comment 1 Richard Genoud 2008-03-13 13:13:12 UTC
Created attachment 15256 [details]
full 2.6.25-rc5 dmesg
Comment 2 Bartlomiej Zolnierkiewicz 2008-03-13 13:27:24 UTC
Thanks, the drive survives the probe (it is identified correctly) but fails on the first attempt to use it with ide-cd.

Could you also send a dmesg from 2.6.23 (so we can compare these two)?

| Steps to reproduce:
| plug a plextor DVDR PX-740A as primary ide master.
| boot a kernel >= 2.6.24-rc1
|
| NB: if the plextor is on primary ide slave, there's no error.
| (the jumpers are ok).

Please confirm that I get this correctly - if the drive is plugged as master and configured by _jumpers_ as master it also fails?

[ Also: is the drive connected to the "other" cable end (_not_ in the middle)
  when used as master? ]

PS I'm cc:ed on all IDE bugs anyway so I've removed myself from cc:
Comment 3 Bartlomiej Zolnierkiewicz 2008-03-13 13:37:40 UTC
| [ Also: is the drive connected to the "other" cable end (_not_ in the middle)
|   when used as master? ]

One more thing - please also verify that the cable is plugged correctly (not in the reverse order) - the connector farther from the middle one (blue one) should go to the controller.

[ Sorry for asking so many stupid questions but I would really like to exclude
  hardware configuration issue first. ]
Comment 4 Richard Genoud 2008-03-14 01:30:29 UTC
> Please confirm that I get this correctly - if the drive is plugged as master
> and configured by _jumpers_ as master it also fails?
> [ Also: is the drive connected to the "other" cable end (_not_ in the middle)
>   when used as master? ]
yep!

doesn't work:
IDE drive configured by jumper as master AND connected to the end of the cable.

does work:
IDE drive configured by jumper as slave AND connected to the middle of the cable.

By the way, the presence of another drive on the cable doesn't seem to change anything.

[I didn't try:
IDE drive configured by jumper as master AND connected to the middle of the cable.
IDE drive configured by jumper as slave AND connected to the end of the cable.
but I can try if you want]

And the cable I use is an 80-conductor one.
Comment 5 Richard Genoud 2008-03-14 01:40:45 UTC
(In reply to comment #3)
> One more thing - please also verify that the cable is plugged correctly (not in
> the reverse order) - the connector farther from the middle one (blue one) should
> go to the controller.
yes, it is.
 
> [ Sorry for asking so many stupid questions but I would really like to exclude
>   hardware configuration issue first. ]
It's all right, everything should be double-checked.
Comment 6 Richard Genoud 2008-03-14 12:56:58 UTC
Created attachment 15269 [details]
full 2.6.23 dmesg
Comment 7 Bartlomiej Zolnierkiewicz 2008-03-16 07:02:34 UTC
Thanks.

I've rechecked with the known ATAPI errata but so far I don't see a reason for drive getting stuck - I wonder whether it could be controller related.

Could you try using ide_pci_generic driver instead of amd74xx (you need to disable CONFIG_BLK_DEV_AMD74XX, enable CONFIG_BLK_DEV_GENERIC and boot kernel with "ide_pci_generic.all_generic_ide" option) and see if it helps?
Comment 8 Bartlomiej Zolnierkiewicz 2008-03-16 08:02:27 UTC
Also booting with "hdb=noprobe" should workaround the problem (please try it).
Comment 9 Anonymous Emailer 2008-03-16 08:03:58 UTC
Reply-To: dani@ngrt.de

On Sun, 16 Mar 2008 07:02:35 -0700 (PDT),
bugme-daemon@bugzilla.kernel.org wrote:

>I've rechecked with the known ATAPI errata but so far I don't see a reason for
>drive getting stuck - I wonder whether it could be controller related.
>
>Could you try using ide_pci_generic driver instead of amd74xx (you need to
>disable CONFIG_BLK_DEV_AMD74XX, enable CONFIG_BLK_DEV_GENERIC and boot kernel
>with "ide_pci_generic.all_generic_ide" option) and see if it helps?

This problem is widespread! I am experiencing it - like other users of
the Knoppix 5.3 live cd that came with a recent issue of the c't
magazine - here on a Nvidia MCP55 and a BenQ DW1655 which is master to
a Plextor PX-708A. Booting off the BenQ fails whereas booting off the
Plextor (slave) works.

Ciao,
  Dani
Comment 10 Bartlomiej Zolnierkiewicz 2008-03-16 08:25:04 UTC
MCP-55 controller again, hmm...

- What error messages are you seeing?

- If you boot from slave drive does the master drive work?
Comment 11 Richard Genoud 2008-03-16 08:30:47 UTC
(In reply to comment #7)

> Could you try using ide_pci_generic driver instead of amd74xx (you need to
> disable CONFIG_BLK_DEV_AMD74XX, enable CONFIG_BLK_DEV_GENERIC and boot kernel
> with "ide_pci_generic.all_generic_ide" option) and see if it helps?

When I disable CONFIG_BLK_DEV_AMD74XX and enable CONFIG_BLK_DEV_GENERIC, the kernel boots without any error (whether I add ide_pci_generic.all_generic_ide or hdb=noprobe).
Comment 12 Patrice Vetsel 2008-03-16 08:32:01 UTC
I have the same bug :
https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/181561
and
http://bugzilla.kernel.org/show_bug.cgi?id=9837

I don't think that this is a controller problem. I have two different PCs (one with an nVidia controller, one with an Intel controller), and the bug follows my Plextor PX-740A wherever I put it (set as master, and alone on the IDE cable).
Setting it as slave works around the problem temporarily.
Comment 13 Bartlomiej Zolnierkiewicz 2008-03-16 08:53:25 UTC
*** Bug 9837 has been marked as a duplicate of this bug. ***
Comment 14 Bartlomiej Zolnierkiewicz 2008-03-16 09:11:34 UTC
Richard, please send 'lspci -vvvxxx' outputs for:

- kernel using CONFIG_BLK_DEV_AMD74XX
- kernel using CONFIG_BLK_DEV_GENERIC + "ide_pci_generic.all_generic_ide"
  (+ dmesg output for this one)

and also:

- "bad" kernel using amd74xx
  (one with commit b140b99c413ce410197cfcd4014e757cd745226a)
- "good" kernel using amd74xx
  (one without commit b140b99c413ce410197cfcd4014e757cd745226a)

I'll see if there are any differences in the way controller is programmed visible in PCI configuration.

Patrice, please send dmesg output for the system with the Intel controller
(for the "bad" kernel).

Thanks.
Comment 15 Richard Genoud 2008-03-16 09:39:20 UTC
Created attachment 15296 [details]
lspci -vvvxxx for 2.6.25-rc5 kernel CONFIG_BLK_DEV_AMD74XX=y
Comment 16 Richard Genoud 2008-03-16 09:54:24 UTC
Created attachment 15297 [details]
2.6.25-rc5 dmesg with CONFIG_BLK_DEV_GENERIC=y + "ide_pci_generic.all_generic_ide"
Comment 17 Richard Genoud 2008-03-16 09:55:14 UTC
Created attachment 15298 [details]
2.6.25-rc5 lspci with CONFIG_BLK_DEV_GENERIC=y + "ide_pci_generic.all_generic_ide"
Comment 18 Richard Genoud 2008-03-16 10:14:49 UTC
Created attachment 15299 [details]
dmesg 2.6.25-rc5 kernel CONFIG_BLK_DEV_AMD74XX=y without commit b140b99c413ce410197cfcd4014e757cd745226a
Comment 19 Richard Genoud 2008-03-16 10:16:49 UTC
Created attachment 15300 [details]
lspci 2.6.25-rc5 kernel CONFIG_BLK_DEV_AMD74XX=y without commit b140b99c413ce410197cfcd4014e757cd745226a
Comment 20 Richard Genoud 2008-03-16 10:18:10 UTC
I think that's all for the attachments.
Comment 21 Anonymous Emailer 2008-03-16 23:38:29 UTC
Reply-To: dani@ngrt.de

On Sun, 16 Mar 2008 08:25:05 -0700 (PDT),
bugme-daemon@bugzilla.kernel.org wrote:

>http://bugzilla.kernel.org/show_bug.cgi?id=10239
>
>------- Comment #10 from bzolnier@gmail.com  2008-03-16 08:25 -------
>MCP-55 controller again, hmm...
>
>- What error messages are you seeing?

Lots of these:
hda: status error: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: status error: error=0x00 { }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status error: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: status error: error=0x00 { }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status error: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: status error: error=0x00 { }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status error: status=0x59 { DriveReady SeekComplete DataRequest Error }
hda: status error: error=0x00 { }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: UDMA/33 mode selected
hda: set_drive_speed_status: status=0x58 { DriveReady SeekComplete DataRequest }
hda: host max PIO5 wanted PIO255(auto-tune) selected PIO4
hda: set_drive_speed_status: status=0x58 { DriveReady SeekComplete DataRequest }
hdb: UDMA/33 mode selected
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown

full dmesg on request.
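
The status values in these messages can be checked by hand against the ATA status register layout. The following is a minimal sketch of that decoding, not the kernel's own code: the helper name decode_status is made up here, and the names for bits that do not appear in the logs above (DeviceFault, CorrectedError, Index) are assumptions about how the IDE layer would print them.

```python
# ATA status register bits, most significant first.
# Names for 0x20, 0x04 and 0x02 are assumed; the others appear in the logs.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of the bits set in an ATA status byte.

    While Busy (bit 7) is set the remaining bits are not meaningful,
    so only "Busy" is reported, which matches the "{ Busy }" log lines.
    """
    if status & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


for value in (0x59, 0x58, 0xD0):
    names = " ".join(decode_status(value))
    print(f"status=0x{value:02x} {{ {names} }}")
```

Read this way, 0x59 and 0x58 both show the drive ready with DataRequest still asserted, while 0xd0 (seen in the original report) has the Busy bit set.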

>- If you boot from slave drive does the master drive work?

No.

Ciao,
  Dani
Comment 22 Patrice Vetsel 2008-03-17 05:51:20 UTC
Created attachment 15314 [details]
dmesg output on Intel (cdrom is working)

@Bartlomiej: I'm sorry, I see no bug on the Intel chipset.
Here is the information:
root@ubuntu:~# uname -r
2.6.24-12-generic (Hardy system up to date)
Comment 23 Patrice Vetsel 2008-03-17 05:52:06 UTC
root@ubuntu:~# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 02)
00:01.0 PCI bridge: Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801IB (ICH9) 2 port SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc RV370 [Sapphire X550 Silent]
01:00.1 Display controller: ATI Technologies Inc RV370 secondary [Sapphire X550 Silent]
02:00.0 Ethernet controller: Attansic Technology Corp. L1 Gigabit Ethernet Adapter (rev b0)
03:00.0 SATA controller: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
03:00.1 IDE interface: JMicron Technologies, Inc. JMicron 20360/20363 AHCI Controller (rev 03)
05:03.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev c0)
root@ubuntu:~# 
Comment 24 Daniel Drake 2008-03-18 05:31:43 UTC
downstream bug report: https://bugs.gentoo.org/show_bug.cgi?id=213615
Comment 25 Lars Bratthall 2008-03-19 05:09:26 UTC
Additional info sought at http://www.jmicron.com/Support_FAQ.html

Reason: Bug [probably] reproduced on an ASUS Maximus Extreme SE motherboard with the latest BIOS, with dual DVD drives connected to the JMicron controller on the motherboard and a separate 3ware 9620 RAID controller for HDs. Bug reproduced on Etch, Sabayon, Knoppix 5.1.1. The HW setup works fine for another OS.

The bug exhibits itself as follows: a bit further into the installation, the CD-ROM cannot be found, despite booting fine from the device at first.
Booting DVD: LiteOn DH-16D2S (IDE)
Second DVD: Samsung SMSHS203 (SATA) (? May be a 202 which is IDE)

Though the use of all_ide_generic may work (not yet tried), it seems there are some performance side effects:

"Q6: What are the differences between Legacy mode (IDE) and AHCI mode? 
Ans: Legacy mode supports OS through legacy IDE driver. Most SATA functions are not supported in Legacy mode, like SATA II 3G, NCQ, HotPlug and etc. JMicron Technology Corp. delivers the worldwide first AHCI compliant eSATA controller and now most of the Operating Systems are "Native Support", which enables SATA II 3G, NCQ, and Hotplug on JMB36X SATA / eSATA controllers."

If I read correctly, I will disable SATA II speed enhancements (and then also on the nice RAID-controller?) if I use all_ide_generic or ide_pci_generic.all_generic_ide. That will be a problem.

Found workarounds so far:
Switch Slave/Master
Use "ide_pci_generic.all_generic_ide"
Remove CD and insert a USB-drive early in process
hdb=noprobe
Use network install
Let the distribution retry the mounting multiple times, "it then resolves by itself"

I have not tried them yet, but thought a summary might aid those wiser than me. Apologies if the description is not good enough. First-time writer here.
Comment 26 Bartlomiej Zolnierkiewicz 2008-03-29 11:11:53 UTC
[ sorry for the delay ]

I investigated lspci outputs sent by Richard:

--- lspci.amd74xx	2008-03-29 17:16:31.000000000 +0100
+++ 2.6.25-rc5-generic_ide.lspci	2008-03-29 17:17:06.000000000 +0100

the chunk corresponding to 00:09.0 IDE interface...

@@ -400,10 +400,10 @@
 20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 10 11 0c
 30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01
 40: 43 10 11 0c 01 00 02 00 00 00 00 00 00 09 00 00
-50: 03 f0 00 00 00 00 00 00 a8 20 a8 20 22 00 20 20
+50: 03 f0 00 00 00 00 00 00 a8 20 a8 20 66 00 20 20
                                         ^^
0x5c: 0x22 -> 0x66

0x5c == 0x4c (AMD_ADDRESS_SETUP) + 0x10 (for nVidia)

amd74xx.c::amd_set_speed():
...
	pci_read_config_byte(dev, AMD_ADDRESS_SETUP + offset, &t);
	t = (t & ~(3 << ((3 - dn) << 1))) | ((FIT(timing->setup, 1, 4) - 1) << ((3 - dn) << 1));
	pci_write_config_byte(dev, AMD_ADDRESS_SETUP + offset, t);
...

0x4c: Address Setup Time Register:
	7:6 P0ADD Primary Drive 0 Address Setup Time
	5:4 P1ADD Primary Drive 1 Address Setup Time
	3:2 S0ADD Secondary Drive 0 Address Setup Time
	1:0 S1ADD Secondary Drive 1 Address Setup Time

0x22 == 00100010b
0x66 == 01100110b
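
To make the comparison easier to read, the two register values can be split into the four 2-bit fields listed above. This is only an illustrative sketch; the helper name decode_address_setup is hypothetical and the field layout is taken from the register description in this comment.

```python
# Field names in register order: bits 7:6, 5:4, 3:2, 1:0.
FIELDS = ["P0ADD", "P1ADD", "S0ADD", "S1ADD"]


def decode_address_setup(value):
    """Split the 8-bit Address Setup Time register into its 2-bit fields."""
    return {
        name: (value >> (6 - 2 * index)) & 0x3
        for index, name in enumerate(FIELDS)
    }


for raw in (0x22, 0x66):
    print(f"0x{raw:02x} == {raw:08b}b -> {decode_address_setup(raw)}")
```

Only the drive 0 fields (P0ADD and S0ADD) differ between the two values: 0 with 0x22 versus 1 with 0x66. The drive 1 fields are 2 in both cases.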

For some reason the BIOS sets a higher address setup time than the
amd74xx driver (when ide_pci_generic is used the timings are not
programmed by the driver, so the values left by the BIOS stay in effect).

The timing used by amd74xx is correct w.r.t. drive requirements,
the ATA spec and the AMD datasheet, but it could be that nVidia hosts
need the higher timing for some reason (or maybe nVidia has different
programming requirements for this register).

Richard, could you try the attached patch?

[ There is also another change in PCI configuration space at offsets 0x8d-0x8e... ]
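
The read-modify-write from the amd_set_speed() excerpt quoted above can be replayed outside the kernel to see how one register value turns into the other. The sketch below is a Python transcription under two assumptions: FIT() clamps its first argument to the given range, and dn numbers the drives 0..3 in the order primary master, primary slave, secondary master, secondary slave. The function names are made up for illustration.

```python
def fit(value, low, high):
    """Clamp value to [low, high]; assumed behaviour of the driver's FIT()."""
    return max(low, min(value, high))


def update_address_setup(reg, dn, setup):
    """Return the new register value after programming one drive.

    reg   -- current 8-bit register value
    dn    -- drive number 0..3 (assumed order: P0, P1, S0, S1)
    setup -- requested address setup time in clocks
    """
    shift = (3 - dn) << 1                  # bit position of the drive's field
    cleared = reg & ~(3 << shift) & 0xFF   # zero the old 2-bit field
    return cleared | ((fit(setup, 1, 4) - 1) << shift)


# Starting from 0x66, programming one clock of setup time for both
# drive 0 fields yields 0x22.
value = update_address_setup(0x66, 0, 1)
value = update_address_setup(value, 2, 1)
print(f"0x{value:02x}")
```

Under these assumptions the field holds the setup time minus one, so the difference between 0x22 and 0x66 corresponds to one versus two clocks of address setup for drive 0 on each channel.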
Comment 27 Bartlomiej Zolnierkiewicz 2008-03-29 11:13:15 UTC
Created attachment 15505 [details]
amd74xx-fix-address-setup-time-for-nvidia-hosts.patch
Comment 28 Bartlomiej Zolnierkiewicz 2008-03-29 11:13:50 UTC
Daniel, thanks for the link - now it is clear that it is not nVidia specific. However, I still wonder whether this is a generic IDE problem or whether the commit just uncovered some other problems - in any case I'm going to revert the patch for 2.6.25 (this has the cost of making cable detection less reliable...).

To Ubuntu developers: please help us by letting us know about problems early.
I'm a bit unhappy that the issue was initially reported at the beginning of January and I learned about it only 2 weeks ago (during the 2.6.25 stabilization phase)...
Comment 29 Patrice Vetsel 2008-03-29 11:31:57 UTC
@Bartlomiej: I'm not an Ubuntu dev, but I reported this bug on kernel.org on 2008-01-28, after reporting it on Launchpad on 2008-01-09 (the time when I began to suspect a kernel bug).

https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/181561
http://bugzilla.kernel.org/show_bug.cgi?id=9837

Best regards
Comment 30 Bartlomiej Zolnierkiewicz 2008-03-29 19:41:46 UTC
@Patrice: Yeah, I know + thanks for reporting it.

The unfortunate thing is that it was all the time under Product: Platform Specific/Hardware + Component: i386 (instead of IO/Storage + IDE).  As a result it never reached linux-ide@ ML or me so we've learned about this bug entry only on 2008-03-16 when the link to the previous bug was mentioned under this bug.
Comment 31 Bartlomiej Zolnierkiewicz 2008-03-29 19:44:31 UTC
@Patrice: PS could you please re-assign the bug to me.
Comment 32 Bartlomiej Zolnierkiewicz 2008-03-29 20:27:19 UTC
I was quite puzzled why the same problem is not reported for corresponding libata host drivers as actually the "guilty" change was based on libata changes (+ amd74xx and pata_amd host drivers are very similar nowadays).

It seems that commit f58229f8060055b08b34008ea08f31de1e2f003c ("libata-link: implement and use link/device iterators"), which went into 2.6.24, accidentally reverted the "guilty" change (thus making cable detection less reliable):

@@ -2134,18 +2132,16 @@ int ata_bus_probe(struct ata_port *ap)
        /* after the reset the device state is PIO 0 and the controller
           state is undefined. Record the mode */
 
-       for (i = 0; i < ATA_MAX_DEVICES; i++)
-               ap->link.device[i].pio_mode = XFER_PIO_0;
+       ata_link_for_each_dev(dev, &ap->link)
+               dev->pio_mode = XFER_PIO_0;
 
        /* read IDENTIFY page and configure devices. We have to do the identify
           specific sequence bass-ackwards so that PDIAG- is released by
           the slave device */
 
-       for (i = ATA_MAX_DEVICES - 1; i >=  0; i--) {
-               dev = &ap->link.device[i];
-
-               if (tries[i])
-                       dev->class = classes[i];
+       ata_link_for_each_dev(dev, &ap->link) {
+               if (tries[dev->devno])
+                       dev->class = classes[dev->devno];
 
                if (!ata_dev_enabled(dev))
                        continue;

[ the code above should use ata_link_for_each_dev_reverse() instead ]
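
The effect of the hunk above on the IDENTIFY order can be shown with a toy model. This is a sketch only: it assumes two devices per link (devno 0 = master, devno 1 = slave) and that ata_link_for_each_dev() walks the devices in ascending devno order; the Python function names are invented for illustration.

```python
ATA_MAX_DEVICES = 2  # assumed: master (devno 0) and slave (devno 1)


def identify_order_before():
    """Order of the removed loop: i runs from ATA_MAX_DEVICES - 1 down to 0."""
    return list(range(ATA_MAX_DEVICES - 1, -1, -1))


def identify_order_after():
    """Order of a forward iterator over the link's devices."""
    return list(range(ATA_MAX_DEVICES))


print("before the commit:", identify_order_before())
print("after the commit: ", identify_order_after())
```

In this model the old loop identifies the slave first, which is the order the comment about releasing PDIAG- asks for, while the forward iterator identifies the master first.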

N.B. ata_eh_revalidate_and_attach() gets it right, so libata may also currently be affected by the problem, but it would be triggered only on suspend/resume or if somebody decides to rescan/plug in devices.

I guess that we want to fix libata but at the same time verify that the issue that we've hit with IDE amd74xx/piix does/doesn't happen with pata_amd/ata_piix?

Tejun?
Comment 33 Richard Genoud 2008-03-30 03:29:11 UTC
The patch amd74xx-fix-address-setup-time-for-nvidia-hosts.patch doesn't seem to correct the bug.
I'm attaching dmesg and lspci.
Comment 34 Richard Genoud 2008-03-30 03:30:48 UTC
Created attachment 15507 [details]
2.6.25-rc5 dmesg with patch amd74xx-fix-address-setup-time-for-nvidia-hosts
Comment 35 Richard Genoud 2008-03-30 03:31:30 UTC
Created attachment 15508 [details]
2.6.25-rc5 lspci with patch amd74xx-fix-address-setup-time-for-nvidia-hosts
Comment 36 Bartlomiej Zolnierkiewicz 2008-03-30 05:50:54 UTC
Created attachment 15509 [details]
libata-fix-cable-detection.patch

Thanks for testing, it looks more like generic problem with some drives now...

Could you also check if pata_amd with the attached patch works?

You need to:
- disable IDE completely (CONFIG_IDE=n)
- enable libata (CONFIG_ATA=y) and pata_amd (CONFIG_PATA_AMD=y)
- enable SCSI disk (CONFIG_BLK_DEV_SD=y) and CD-ROM support
  (CONFIG_BLK_DEV_SR=y)

[ as a side-effect device names will change from /dev/hd* to /dev/sd* ]
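
Before rebuilding, the option list above can be checked against a kernel .config. The sketch below is a hypothetical helper, not an existing tool: it relies on the usual .config conventions, where a built-in option appears as NAME=y and a disabled one as "# NAME is not set", and it treats anything other than =y (including =m) as not matching, since built-in options are what is requested here.

```python
# Options from the list above: False means "must be disabled".
REQUIRED = {
    "CONFIG_IDE": False,
    "CONFIG_ATA": True,
    "CONFIG_PATA_AMD": True,
    "CONFIG_BLK_DEV_SD": True,
    "CONFIG_BLK_DEV_SR": True,
}

NOT_SET = " is not set"


def parse_config(text):
    """Map each option found in a .config text to True (=y) or False."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and line.endswith(NOT_SET):
            state[line[2:-len(NOT_SET)]] = False
        elif "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            state[name] = value == "y"
    return state


def check_config(text):
    """Return the options whose state differs from REQUIRED."""
    state = parse_config(text)
    return [
        name for name, wanted in REQUIRED.items()
        if state.get(name, False) != wanted
    ]


sample = """\
# CONFIG_IDE is not set
CONFIG_ATA=y
CONFIG_PATA_AMD=y
CONFIG_BLK_DEV_SD=y
CONFIG_BLK_DEV_SR=y
"""
print(check_config(sample))
```

An empty list means the configuration matches the list; any names returned still need to be changed.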
Comment 37 Dan 2008-03-30 12:12:21 UTC
The following bug report says that adding vga=788 to the boot parameters will bring up a normal boot:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=469189

I can confirm that, but I don't have any idea why it works that way.
Comment 38 Tejun Heo 2008-03-30 20:54:56 UTC
Bartlomiej, thanks for finding out the accidental order change, but ata_bus_probe() is deprecated and currently only used by sata_sx4 and SAS.  None of the PATA drivers use that path anymore.  Everything goes through ata_eh_revalidate_and_attach().  Richard, can you be persuaded into trying pata_amd?
Comment 39 Richard Genoud 2008-03-31 10:46:08 UTC
with pata_amd, the dvd writer is working. (I applied libata-fix-cable-detection.patch anyway).
Comment 40 Richard Genoud 2008-03-31 10:47:20 UTC
Created attachment 15537 [details]
2.6.25-rc5 dmesg with patch libata-fix-cable-detection.patch and pata_amd
Comment 41 Richard Genoud 2008-03-31 10:48:18 UTC
Created attachment 15538 [details]
2.6.25-rc5 lspci with patch libata-fix-cable-detection.patch and pata_amd
Comment 42 Tejun Heo 2008-04-01 18:21:18 UTC
So, pata_amd works fine.  It doesn't look like the reversed IDENTIFY order is the actual culprit in the probing failure.  pata_amd has been doing it that way for a long time now.  It was converted to the new EH pretty early and has been using the reverse order IDENTIFY since it was first applied to EH.  I'll forward the fix for the obsolete path upstream.  Thanks.
Comment 43 sergey 2008-05-23 13:55:54 UTC
Hi all! Sorry for my ... and for my bad English :-)
Could you please look at this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480679
It was reported to Debian, but it is exactly what I see on my Slackware 12.1 with kernel 2.6.24.5, and I can't find it at http://bugzilla.kernel.org. The key words of the dmesg ("hdc: status timeout: status=0xd0 { Busy }  ide: failed opcode was: unknown  drive not ready for command") are equivalent to this topic.
Is Debian bug 480679 equivalent to kernel bug 10239, or should it be reported separately?

P.S. In addition to what the Debian bug report describes, I see the following on Slackware:
- not only hald-addon-storage, but also k3b can freeze the system;
- trying to renice the k3b process that eats ~99% CPU has no effect.
Comment 44 Bartlomiej Zolnierkiewicz 2008-06-18 15:26:49 UTC
This bug was fixed months ago by reverting the change that caused the regression (sorry for the late update):

commit f367bed005b06db7067fc378a5f2253fac54e5d9
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Sat Mar 29 19:48:21 2008 +0100

    Revert "ide: change master/slave IDENTIFY order"

    This reverts commit b140b99c413ce410197cfcd4014e757cd745226a.
...

[ also in the end it looked like a generic timing issue in IDE code triggered
  by my change - the underlying issue may have been fixed in newer kernels
  (there were a ton of fixes since then) so it would be useful if somebody tried
  to revert-the-revert and see if the kernel still breaks ]

Richard: thanks for all your help on this. Also, while analyzing the lspci outputs from you I incidentally found a real problem with pata_amd:

amd74xx:

00: de 10 65 00 05 00 b0 00 a2 8a 01 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 10 11 0c
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01
40: 43 10 11 0c 01 00 02 00 00 00 00 00 00 09 00 00
50: 03 f0 00 00 00 00 00 00 a8 20 a8 20 66 00 20 20

pata_amd:

00: de 10 65 00 05 00 b0 00 a2 8a 01 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 f0 11 0c
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01
40: 43 f0 11 0c 01 00 02 00 00 00 00 00 00 09 00 00
50: 03 f0 00 00 00 00 00 00 99 20 99 20 22 00 20 20

pata_amd incorrectly programs FIFO settings at offset 0x41 instead of 0x51
(which seems to be shadowed at 0x2d so it also results in wrong "Subsystem" of PCI device being reported)

amd74xx:	Subsystem: ASUSTeK Computer Inc. Unknown device 0c11
pata_amd:	Subsystem: Unknown device f043:0c11
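
The two dumps above can also be compared mechanically. The sketch below parses them as pasted and lists every offset that differs; the helper names are made up, and the only PCI knowledge it relies on is that the subsystem vendor ID is the little-endian 16-bit value at offset 0x2c.

```python
AMD74XX = """\
00: de 10 65 00 05 00 b0 00 a2 8a 01 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 10 11 0c
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01
40: 43 10 11 0c 01 00 02 00 00 00 00 00 00 09 00 00
50: 03 f0 00 00 00 00 00 00 a8 20 a8 20 66 00 20 20
"""

PATA_AMD = """\
00: de 10 65 00 05 00 b0 00 a2 8a 01 01 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 f0 00 00 00 00 00 00 00 00 00 00 43 f0 11 0c
30: 00 00 00 00 44 00 00 00 00 00 00 00 00 00 03 01
40: 43 f0 11 0c 01 00 02 00 00 00 00 00 00 09 00 00
50: 03 f0 00 00 00 00 00 00 99 20 99 20 22 00 20 20
"""


def parse_dump(text):
    """Turn 'offset: byte byte ...' hex lines into {offset: value}."""
    data = {}
    for line in text.strip().splitlines():
        base, _, rest = line.partition(":")
        for index, byte in enumerate(rest.split()):
            data[int(base, 16) + index] = int(byte, 16)
    return data


def diff_dumps(a, b):
    """Return (offset, a_value, b_value) for every offset that differs."""
    return [(off, a[off], b.get(off)) for off in sorted(a) if a[off] != b.get(off)]


def subsystem_vendor(data):
    """Little-endian 16-bit subsystem vendor ID at offset 0x2c."""
    return data[0x2C] | (data[0x2D] << 8)


amd = parse_dump(AMD74XX)
pata = parse_dump(PATA_AMD)
for offset, old, new in diff_dumps(amd, pata):
    print(f"0x{offset:02x}: {old:02x} -> {new:02x}")
print(f"subsystem vendor: {subsystem_vendor(amd):04x} vs {subsystem_vendor(pata):04x}")
```

For these two dumps the differing offsets are 0x2d, 0x41, 0x58, 0x5a and 0x5c. The first two are the stray 0xf0 described above, which is also what changes the reported subsystem vendor from 1043 to f043.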

Tejun: feel free to fix (or forward to Alan) the above issue (I'm too busy with other stuff + amd74xx doesn't have the problem)

sergey: please try some recent kernel (preferably 2.6.26-rc6) and if the issue still happens there open a new bug
Comment 45 Bartlomiej Zolnierkiewicz 2008-06-18 15:29:15 UTC
Patrice: please close this bug