Bug 3352

Summary: (sata nv) module fails to find drives
Product: IO/Storage
Reporter: John Stebbins (stebbins)
Component: Serial ATA
Assignee: Andrew Chew (achew)
Status: REJECTED INSUFFICIENT_DATA
Severity: normal
CC: alan, apa3a, benny+bugzilla, bunk, eric, herbert, jgarzik, johann, kiall, martin, nagendra.cl, pklong, stanmuffin, u288
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.8.1
Subsystem:
Regression: ---
Bisected commit-id:

Description John Stebbins 2004-09-06 14:42:13 UTC
Distribution: 
Fedora core 2 x86_64

Hardware Environment:
Epox 8KDA3+ (nForce3 250Gb chipset)
200G Seagate sata (ST3200822AS)

Problem Description:
The sata_nv driver does not see attached drives.

Problem seen when upgrading to the 2.6.8.1 fedora core 2 x86_64 kernel. This is
the kernel supplied by the fedora project. I'm not building kernels.

I am not alone in seeing this problem.  Someone else with an Asus K8N-E Deluxe
(also nForce3-250Gb chipset) sees this problem as well.  See this AMDZone forum
thread for details:
http://www.amdzone.com/modules.php?op=modload&name=PNphpBB2&file=viewtopic&p=29775

When upgrading from the previous 2.6.6 kernel it is necessary to run kudzu
manually to get the system to configure the sata_nv module. After doing so,
booting up 2.6.8.1 still fails to recognize drives on the nvidia sata
controller. Loading the sata_nv driver with modprobe takes a long time. Looking
at dmesg, I see:

sata_nv version 0.02
...
ata5 is slow to respond, please be patient
ata5 failed to respond (30 secs)
scsi4 : sata_nv
ata6 is slow to respond, please be patient
ata6 failed to respond (30 secs)
scsi5 : sata_nv
...

SCSI info: /proc/scsi/scsi is empty

PCI info (lspci -vvv output):
---------------------
00:00.0 Host bridge: nVidia Corporation: Unknown device 00e1 (rev a1)
        Subsystem: Unknown device 1695:100c
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Region 0: Memory at e0000000 (32-bit, prefetchable)
        Capabilities: [44] #08 [01c0]
        Capabilities: [c0] AGP version 3.0
                Status: RQ=32 Iso- ArqSz=2 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8
                Command: RQ=1 ArqSz=0 Cal=0 SBA- AGP- GART64- 64bit- FW- Rate=x4

00:01.0 ISA bridge: nVidia Corporation: Unknown device 00e0 (rev a2)
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0

00:01.1 SMBus: nVidia Corporation: Unknown device 00e4 (rev a1)
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 5
        Region 0: I/O ports at e000
        Region 4: I/O ports at 4c00 [size=64]
        Region 5: I/O ports at 4c40 [size=64]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:02.0 USB Controller: nVidia Corporation: Unknown device 00e7 (rev a1) (prog-if 10 [OHCI])
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at ec003000 (32-bit, non-prefetchable)
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:02.1 USB Controller: nVidia Corporation: Unknown device 00e7 (rev a1) (prog-if 10 [OHCI])
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin B routed to IRQ 11
        Region 0: Memory at ec004000 (32-bit, non-prefetchable)
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:02.2 USB Controller: nVidia Corporation: Unknown device 00e8 (rev a2) (prog-if 20 [EHCI])
        Subsystem: Unknown device 1695:100c
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin C routed to IRQ 11
        Region 0: Memory at ec005000 (32-bit, non-prefetchable)
        Capabilities: [44] #0a [2098]
        Capabilities: [80] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:05.0 Bridge: nVidia Corporation: Unknown device 00df (rev a2)
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (250ns min, 5000ns max)
        Interrupt: pin A routed to IRQ 5
        Region 0: Memory at ec000000 (32-bit, non-prefetchable)
        Region 1: I/O ports at b400 [size=8]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:06.0 Multimedia audio controller: nVidia Corporation: Unknown device 00ea (rev a1)
        Subsystem: Unknown device 1695:100b
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (500ns min, 1250ns max)
        Interrupt: pin A routed to IRQ 5
        Region 0: I/O ports at b800
        Region 1: I/O ports at bc00 [size=128]
        Region 2: Memory at ec001000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:08.0 IDE interface: nVidia Corporation: Unknown device 00e5 (rev a2) (prog-if 8a [Master SecP PriP])
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Region 4: I/O ports at f000 [size=16]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0a.0 IDE interface: nVidia Corporation: Unknown device 00e3 (rev a2) (prog-if 85 [Master SecO PriO])
        Subsystem: Unknown device 1695:100c
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at 09f0
        Region 1: I/O ports at 0bf0 [size=4]
        Region 2: I/O ports at 0970 [size=8]
        Region 3: I/O ports at 0b70 [size=4]
        Region 4: I/O ports at d800 [size=16]
        Region 5: I/O ports at dc00 [size=128]
        Capabilities: [44] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 PCI bridge: nVidia Corporation: Unknown device 00e2 (rev a2) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 16
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=10
        I/O behind bridge: 0000a000-0000afff
        Memory behind bridge: e8000000-e9ffffff
        Prefetchable memory behind bridge: c0000000-dfffffff
        Expansion ROM at 0000a000 [disabled] [size=4K]
        BridgeCtl: Parity- SERR+ NoISA+ VGA+ MAbort- >Reset- FastB2B-

00:0e.0 PCI bridge: nVidia Corporation: Unknown device 00ed (rev a2) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=128
        I/O behind bridge: 00008000-00009fff
        Memory behind bridge: ea000000-ebffffff
        Prefetchable memory behind bridge: fff00000-000fffff
        Expansion ROM at 00008000 [disabled] [size=8K]
        BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Capabilities: [80] #08 [2101]

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 NorthBridge
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

01:00.0 VGA compatible controller: ATI Technologies Inc RV350 AP [Radeon 9600] (prog-if 00 [VGA])
        Subsystem: PC Partner Limited: Unknown device 7c20
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (2000ns min), Cache Line Size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at c0000000 (32-bit, prefetchable)
        Region 1: I/O ports at a000 [size=256]
        Region 2: Memory at e9000000 (32-bit, non-prefetchable) [size=64K]
        Capabilities: [58] AGP version 3.0
                Status: RQ=256 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3+ Rate=x4,x8
                Command: RQ=1 ArqSz=0 Cal=0 SBA+ AGP- GART64- 64bit- FW- Rate=<none>
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:00.1 Display controller: ATI Technologies Inc RV350 AP [Radeon 9600] (Secondary)
        Subsystem: PC Partner Limited: Unknown device 7c21
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Region 0: Memory at d0000000 (32-bit, prefetchable) [disabled]
        Region 1: Memory at e9010000 (32-bit, non-prefetchable) [disabled] [size=64K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

02:06.0 Ethernet controller: Digital Equipment Corporation DECchip 21140 [FasterNet] (rev 22)
        Subsystem: D-Link System Inc DFE-500TX Fast Ethernet
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32 (5000ns min, 10000ns max), Cache Line Size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at 8000
        Region 1: Memory at eb000000 (32-bit, non-prefetchable) [size=128]

02:0c.0 RAID bus controller: Silicon Image, Inc. (formerly CMD Technology Inc) Silicon Image SiI 3114 SATARaid Controller (rev 02)
        Subsystem: Unknown device 1695:9018
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 32, Cache Line Size 08
        Interrupt: pin A routed to IRQ 5
        Region 0: I/O ports at 8400
        Region 1: I/O ports at 8800 [size=4]
        Region 2: I/O ports at 8c00 [size=8]
        Region 3: I/O ports at 9000 [size=4]
        Region 4: I/O ports at 9400 [size=16]
        Region 5: Memory at eb001000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [60] Power Management version 2
                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=2 PME-
Comment 1 Chris Osgood 2004-10-07 15:20:30 UTC
I can confirm this behavior.  I have the same hard drive on a dual Opteron DK8N
nForce3 Pro 250 chipset board and the errors are identical.

With a Seagate ST3200822AS hard drive on an nForce chipset, the driver does not
detect the drives.

This problem only occurs in the newer libata SCSI stuff (sata_nv); the old
deprecated IDE/SATA "AMD and nVidia IDE support" works fine.
Comment 2 Chris Osgood 2004-10-07 15:37:17 UTC
After more testing I can confirm this also exists in the very latest kernels. 
2.6.9-rc3-bk7 and 2.6.9-rc3-mm3 exhibit the same behavior.  Both 32-bit and
64-bit kernels have the same behavior as well.

Some additional output (it appears to detect the controller fine; I have two of
these drives connected):

ACPI: PCI interrupt 0000:00:0a.0[A] -> GSI 11 (level, low) -> IRQ 11
ata1: SATA max UDMA/133 cmd 0xEC00 ctl 0xE802 dmdma 0xDC00 irq 11
ata2: SATA max UDMA/133 cmd 0xE400 ctl 0xE002 dmdma 0xDC08 irq 11
nv_data: Primary device added
...

Then the "slow to respond" and eventually it times out.
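
For readers wondering where the two messages come from: they are consistent with a two-stage poll on the device's busy status, which warns after a first deadline and gives up after a second. Below is a minimal userspace sketch of that pattern, not the libata code itself. The names (poll_until_ready, busy_fn, the sample devices) and the 7-second warning threshold are assumptions made for illustration; only the 30-second limit and the message wording are taken from the logs above. Time is simulated, so nothing actually sleeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum poll_result { POLL_READY, POLL_READY_SLOW, POLL_TIMEOUT };

/* Hypothetical callback: true while the device still reports busy
 * at the given simulated time (milliseconds since the poll began). */
typedef bool (*busy_fn)(unsigned long now_ms);

/* Poll until the device stops being busy. Warn once after patience_ms,
 * give up after timeout_ms. step_ms must be non-zero. */
enum poll_result poll_until_ready(unsigned int port_id, busy_fn is_busy,
                                  unsigned long patience_ms,
                                  unsigned long timeout_ms,
                                  unsigned long step_ms)
{
    unsigned long now = 0;   /* simulated clock */
    bool warned = false;

    while (is_busy(now)) {
        if (now >= timeout_ms) {
            printf("ata%u failed to respond (%lu secs)\n",
                   port_id, timeout_ms / 1000);
            return POLL_TIMEOUT;
        }
        if (!warned && now >= patience_ms) {
            printf("ata%u is slow to respond, please be patient\n", port_id);
            warned = true;
        }
        now += step_ms;
    }
    return warned ? POLL_READY_SLOW : POLL_READY;
}

/* Sample simulated devices (assumptions for the demo). */
bool never_ready(unsigned long now_ms)     { (void)now_ms; return true; }
bool ready_after_2s(unsigned long now_ms)  { return now_ms < 2000; }
bool ready_after_10s(unsigned long now_ms) { return now_ms < 10000; }
```

Calling poll_until_ready(5, never_ready, 7000, 30000, 50) walks through both messages in the order seen in the report, which matches a device that never leaves the busy state after the reset.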
Comment 3 Andrew Chew 2004-10-07 15:54:27 UTC
Haven't had time lately, but at a certain point two others have contacted me 
regarding a bug that appears symptomatically identical to the one reported 
here.  They were both using Epox motherboards.

The DK8N is, what, an iWill board?

I'll look into this.

By the way, a workaround that worked for the aforementioned two people was to 
have sata_nv.c override the phy_reset() callback in the ata_port_operations 
table (by defining our own function, nv_phy_reset(), and setting phy_reset to 
nv_phy_reset).

In nv_phy_reset(), we're going to do what sata_phy_reset() does EXCEPT for the 
actual phy reset.  So we want to copy in the contents of sata_phy_reset(), but 
exclude the lines that read from

if (ap->flags & ATA_FLAG_SATA_RESET) {

 all the way down to 

} while (time_before(jiffies, timeout));

In hindsight, you might just have to comment out the ATA_FLAG_SATA_RESET from 
sata_nv.c's host_flags.  This may accomplish the same thing.

Not sure when I'll get to this, but if you guys get around to it, can you let 
me know how this works out for you?
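
To make the two workarounds above easier to compare, here is a small userspace model of the idea, not kernel code. Everything prefixed demo_ or DEMO_ is hypothetical, including the flag's bit value; the real structures live in libata and look different. The sketch only shows that overriding the phy_reset callback with a variant that skips the PHY reset has the same effect as leaving the generic callback in place and clearing the reset flag.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_FLAG_SATA_RESET (1u << 0)   /* hypothetical bit value */

struct demo_port;

/* Stand-in for a per-driver operations table. */
struct demo_port_ops {
    void (*phy_reset)(struct demo_port *ap);
};

struct demo_port {
    unsigned int flags;
    const struct demo_port_ops *ops;
    int phy_resets;   /* how many times the PHY was actually reset */
    bool probed;      /* whether device probing ran afterwards */
};

static void demo_probe(struct demo_port *ap)
{
    ap->probed = true;
}

/* Generic path: reset the PHY only if the flag asks for it, then probe. */
void demo_generic_phy_reset(struct demo_port *ap)
{
    if (ap->flags & DEMO_FLAG_SATA_RESET)
        ap->phy_resets++;
    demo_probe(ap);
}

/* Driver-specific override: same as the generic path minus the reset. */
void demo_nv_phy_reset(struct demo_port *ap)
{
    demo_probe(ap);
}

const struct demo_port_ops demo_generic_ops = {
    .phy_reset = demo_generic_phy_reset,
};
const struct demo_port_ops demo_nv_ops = {
    .phy_reset = demo_nv_phy_reset,
};
```

In this model a port using demo_nv_ops with the flag set, and a port using demo_generic_ops with the flag cleared, both end up probed with phy_resets still at 0. That is why dropping the flag is the smaller change if it behaves the same on real hardware.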
Comment 4 Chris Osgood 2004-10-07 20:15:49 UTC
Yes, the DK8N is an Iwill board.  Taking out the ATA_FLAG_SATA_RESET appears to
make it work.  The system came up fine with the sata_nv driver and I'm using it
right now.

Is this something specific to this hardware combination?  I was thinking it is a
Seagate SATA issue.  What happens with this code change, can the drive now not
be reset?

I made the following change to kernel 2.6.9-rc3-bk7 sata_nv.c and it seems to
work so far:

--- sata_nv.c_orig      2004-10-07 23:06:40.721827293 -0400
+++ sata_nv.c   2004-10-07 23:10:45.234309240 -0400
@@ -221,7 +221,7 @@
 static struct ata_port_info nv_port_info = {
        .sht            = &nv_sht,
        .host_flags     = ATA_FLAG_SATA |
-                         ATA_FLAG_SATA_RESET |
+                         /*ATA_FLAG_SATA_RESET |*/
                          ATA_FLAG_SRST |
                          ATA_FLAG_NO_LEGACY,
        .pio_mask       = NV_PIO_MASK,
Comment 5 Andrew Chew 2004-10-07 20:21:43 UTC
The BIOS typically does a reset of the SATA phy.  As long as you don't do any 
hotplug (which isn't supported by libata yet anyway), this shouldn't be a 
problem.  This still needs to be solved to prepare for hotplug, though.

You should be safe with this workaround until a proper fix is found.
Comment 6 Andriy Palamarchuk 2004-11-28 13:34:21 UTC
I wonder if it is the same problem, but I can't install Fedora Core 3 on the MSI
K7N2GM2 motherboard with nForce2 chipset. sata_nv is loaded, but when the time
comes to format the hard drive it complains "no valid devices were found". I had
a similar problem with Mandrake 10.1; however, I was able to install Mandrake 10.

Let me know if I can provide more information.
Andriy
Comment 7 Andrew Chew 2004-11-29 19:08:25 UTC
There's another workaround I'd like to try to fix the SATA phy reset issue.  
Will someone who's encountered this problem volunteer to spend some time with 
me to test the new workaround?
Comment 8 Andriy Palamarchuk 2004-11-29 19:46:55 UTC
I have a Seagate ST380013AS SATA drive (Serial#: 3JVC1Q6S), 80GB.
I will try to do what I can to help fix the issue.
What do I need to do?
Comment 9 Philip Long 2004-11-30 03:19:49 UTC
I have the same problem as everyone else here, but with the 2.6.9 kernel as
shipped with Fedora Core 3.

My hardware is:
Gigabyte GA-K8NS Motherboard (chipset nVidia nForce3 250) 
200G Maxtor DMax+10 SATA150/7200/8M Hard Drive

sata_nv module is loaded, but times out and fails to find the drive. The
deprecated IDE driver works fine, if I compile a kernel that reverts to it.
Comment 10 Pablo 2004-12-06 05:55:54 UTC
 I have sata_nv with 2 discs in raid 0 mode. Taking out the ATA_FLAG_SATA_RESET
sorts the timeout problem but both 2 disks are detected separately instead of 1
disc so it doesn
Comment 11 Andrew Chew 2004-12-07 12:29:24 UTC
It seems that the NVIDIA SATA controller needs more time to settle between the 
reset bit write and the reset bit clear.  Can I get you guys to do a little 
experiment for me?

In drivers/scsi/libata-core.c, look for a function called __sata_phy_reset().  
There should be a "udelay(400);", with the comment "/* FIXME: a guess */".  
Can you change that 400 to a 1000, rebuild, and see if the problem goes away 
for you?  (Put that ATA_FLAG_SATA_RESET flag back in, in sata_nv.c, of course.)

This fixes the problem for one user so far (thanks, Joseph!)  If this works 
for others as well, I will work on a patch for lkml (either increasing the 
delay in libata-core.c as per this experiment, or adding a custom SATA phy reset 
routine to sata_nv.c).
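
The hypothesis in this comment can be illustrated with a toy model: a controller that only brings the link up if the reset bit stays set for some minimum time before it is cleared. This is a simulation under stated assumptions, not the __sata_phy_reset() code. The demo_ names are hypothetical, the clock is simulated, and the 700 microsecond settle requirement used in the usage note is an invented number chosen only so that 400 fails and 1000 passes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy controller state. settle_us is the (assumed) minimum time the
 * reset bit must remain set for the link to come up afterwards. */
struct demo_phy {
    unsigned int settle_us;
    unsigned long now_us;        /* simulated clock */
    unsigned long reset_set_at;  /* when the reset bit was last set */
    bool reset_asserted;
    bool link_up;
};

/* Model of writing the reset bit: setting it drops the link; clearing
 * it brings the link up only if enough time has passed. */
void demo_write_reset(struct demo_phy *phy, bool set)
{
    if (set) {
        phy->reset_asserted = true;
        phy->reset_set_at = phy->now_us;
        phy->link_up = false;
    } else if (phy->reset_asserted) {
        phy->reset_asserted = false;
        phy->link_up = (phy->now_us - phy->reset_set_at) >= phy->settle_us;
    }
}

/* Simulated delay: just advances the fake clock. */
void demo_udelay(struct demo_phy *phy, unsigned int us)
{
    phy->now_us += us;
}

/* Reset sequence: set bit, wait delay_us, clear bit, report link state. */
bool demo_phy_reset(struct demo_phy *phy, unsigned int delay_us)
{
    demo_write_reset(phy, true);
    demo_udelay(phy, delay_us);
    demo_write_reset(phy, false);
    return phy->link_up;
}
```

With settle_us set to 700, demo_phy_reset(&phy, 400) returns false and demo_phy_reset(&phy, 1000) returns true. Whether the real hardware has such a threshold, and where it lies, is exactly what the experiment requested above is meant to find out.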
Comment 12 Ulrich Petri 2004-12-07 15:48:06 UTC
Hi,

I have had the same symptoms (slow to respond, timeout) with my nForce3 250Gb
based MSI K8N Neo with one Samsung SV1604N attached via a SATA-2-PATA converter.
I tried Andrew Chew's fix (increasing udelay to 1000 in libata-core.c) and can
confirm that it indeed does fix the problem!

Thanks.

Ciao Ulrich
Comment 13 Andrew Chew 2004-12-07 15:53:28 UTC
Excellent.  This is good news.

Make sure you undo the ATA_FLAG_SATA_RESET removal workaround, of course.  
Otherwise, the reset code isn't even getting entered!
Comment 14 Andrew Chew 2004-12-07 16:40:56 UTC
I'd also be interested in seeing if replacing that "udelay(400);" with
"msleep(1);" would work as well.  It's friendlier than having the CPU busy spin.
Can you guys also give this a try?  If that doesn't work, then we can use
"udelay(1000);".
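
The difference between the two kinds of delay can be shown with a userspace analogy, assuming a POSIX system. This is not kernel code: spin_wait_us() polls the clock in a tight loop (the CPU stays busy, roughly in the spirit of udelay), while sleep_wait_ms() hands the CPU back to the scheduler via nanosleep (roughly in the spirit of msleep). Both function names are made up for this sketch. Each returns the time that actually elapsed, in nanoseconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-spin for at least `us` microseconds; the CPU does no other work. */
uint64_t spin_wait_us(unsigned int us)
{
    uint64_t start = now_ns();
    uint64_t target = (uint64_t)us * 1000u;

    while (now_ns() - start < target)
        ;   /* spin */
    return now_ns() - start;
}

/* Sleep for at least `ms` milliseconds; the scheduler may run other
 * tasks meanwhile, and may wake us later than requested. */
uint64_t sleep_wait_ms(unsigned int ms)
{
    uint64_t start = now_ns();
    struct timespec req = {
        .tv_sec = ms / 1000,
        .tv_nsec = (long)(ms % 1000) * 1000000L,
    };

    /* Resume with the remaining time if a signal interrupts the sleep. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ;
    return now_ns() - start;
}
```

Both calls wait at least as long as asked, but a sleeping wait generally overshoots by a scheduler-dependent amount. So if "msleep(1);" helps where "udelay(400);" did not, part of the effect may simply be a longer real delay.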
Comment 15 leha 2004-12-08 12:26:41 UTC
I tried "msleep(1);", looks like it is working good with 2.6.9-gentoo-r9 (MSI
k8n neo platinum).
Comment 16 Eric Koldeweij 2004-12-08 16:10:22 UTC
I am using a MSI K8N Neo2 Platinum mainboard (NForce3-250Gb chipset). I have 3
Maxtor DiamondMax 10 300Gb disks (model 6B300S0) installed. I've experienced all
the problems described here with all 3 HDs installed, with 2 or less installed I
did not have problems booting the kernel (plenty of problems, but not sata_nv
related)

kernel is
Linux sirius 2.6.9-1.681_FC3 #1 Thu Nov 18 15:13:22 EST 2004 x86_64 x86_64
x86_64 GNU/Linux

Applying the patch described in "Additional Comment #11 From Andrew Chew 
2004-12-07 12:29", setting udelay(400) to udelay(1000) did NOT solve the problem
for me, I still got the "ata3 is slow to respond" messages - always sata3 no
matter which sata ports I used and no difference at all compared to the original
version.

Applying the patch described in "Additional Comment #4 From Chris Osgood 
2004-10-07 20:15" DID do the trick, for the first time my system is running with
all three disks happily spinning.

It seems that just changing the waiting period does not always solve the problem,
but it might be that even a delay of 1000 microseconds is not enough - maybe the more disks
installed the more time it needs? If you wish I can do some tests changing the
delay times but it's a bit cumbersome for me because my LVM volumes refuse to
come up every time after I add or remove a disk causing it to bail out
prematurely (FC3 initscripts problem, not kernel-related)

A final piece of possibly useful information, the BIOS of the board itself has
the same problems, with 3 disks attached it just hangs at the IDE/SATA detection
phase, with 2 disks and the latest official BIOS version (1.30) it works but
only after a powercycle. After applying the latest BIOS beta version (1.51) it
passes detection and allows the OS to boot.
Comment 17 Andrew Chew 2004-12-10 15:48:21 UTC
Eric, does increasing the delay work for you if you only have one SATA disk?
Comment 18 Tegia Orion 2004-12-12 01:11:54 UTC
If I want to install SUSE 9.2 Pro, where and what must I change so that the
installation can find my HDD?
Comment 19 Philip Long 2004-12-12 05:48:59 UTC
Tegia, If you need to ask that question then you're pretty much screwed. You
won't be able to install at all if the installation disk is based on the  2.6.8
or greater Kernel,  as the install disk won't even see your hard disk. Easy
solution if you have not bought the machine yet, avoid the nVidia nForce chipset!

Solutions (For those who can't / don't want to compile custom kernels)

1) Use an older Linux distribution, based on an older Kernel. Remember not to
update it when you have installed it, until this bug is fixed!

2) Use a PATA hard disk. 

3) Use a SATA controller not based on the nForce chipset, that is if your
motherboard has some

or 

4) Install Windows and laugh at all your Linux friends when they complain about
how buggy Windows is (oops I think I might have given the game away and shown
you how much this bug has ^W^W^W annoyed me!)
Comment 20 Anton Bakken 2004-12-12 18:45:16 UTC
Firstly, thanx guys for this bug post. I have a K7N2 Delta2 motherboard that
uses the nforce3 chipset. The udelay(1000) and also the msleep(1) options for
the libata-core.c worked fine for my board. I now have two 200G ST3200822AS
disks, that I can use. 

Secondly Tegia, sorry I don't have an exact answer for you, but you may want to
look into using diff to make patch files. I know some flavors of Linux have an
expert install mode where you can use a "patch" or "updatemodule" command to
install with either of those types of files from a floppy disk. As I said, sorry
I can't be of more help, because I have not used SUSE, but I am still sure that
it will have either of these options. I will leave it up to you to find
out how to get it to work. Google is your friend.
Comment 21 Philip Long 2004-12-13 12:00:34 UTC
OK, Just found the time to play with my Kernel. The udelay(1000) and msleep(1)
fixes work great for me. Hope this makes it into Kernel 2.6.10. 

(Gigabyte K8NS / Maxtor 200G D-Max 10 HDD.)
Comment 22 Daniel Drake 2004-12-14 02:47:27 UTC
Followup to comment #15:
The gentoo kernel has the patch in comment #4 applied, you may wish to confirm
that you reverted this before applying the msleep fix.
Comment 23 Eric Koldeweij 2004-12-19 05:27:09 UTC
Update on Comment #16:
First I have to apologise for my earlier post as it was not correct. After
patching the libata and sata_nv modules I forgot to include them in my initrd
image so they never got loaded :S

This is the status so far with all 3 HDs installed:
- removing the ATA_FLAG_SATA_RESET from sata_nv.c works. It always boots fine.
- setting the delay at 1000 works in almost all cases at a cold boot. However,
it does not work at a warm boot.
- setting the delay to even higher values (went as far as 5000) did not change
anything. Works at cold boot, does not work at warm boot.

Hope this helps a bit.
Comment 24 Brendan Miller 2004-12-19 22:19:58 UTC
Another data point from another nForce3 250Gb user...

Machine is a Biostar iDEQ 210P w/ Athlon 64 3400+...

FC3 uses kernel 2.6.9-1.667 which successfully loads sata_nv and can detect
single SATA drive to do the install, but boot from SATA fails when trying to
detect SATA after initrd loads.  Then no root filesystem causes problems.  I
thought it was weird that it worked every time I boot FC3 from the DVD (even in
rescue mode), but not from the hard drive.  Maybe this has something to do with
the warm boot/cold boot issue Eric reported.

The msleep(1) did not work.  I did not try the udelay(1000).  Patching to remove
the ATA_FLAG_SATA_RESET flag does work, and I booted the DVD into rescue mode,
downloaded the kernel source, patched, rebuilt the modules, and rebuilt the
initrd image to get my new FC3 install to boot.  Been running this way for a few
days now.  So it would be nice to figure out whether the reset is
necessary and, if so, what the right delay is.

Good luck, and thanks for your investigations.  Since this is a new machine on
which I do not yet rely 100%, I would be willing to test new ideas.

Brendan
Comment 25 Andrew Chew 2004-12-20 13:02:19 UTC
Well, that's unfortunate.  Sounds like this increased delay doesn't fix this 
specific problem.

Can you guys try removing the ATA_FLAG_SRST, rather than ATA_FLAG_SATA_RESET, 
and see if this changes any behavior?  In any case, my goal is to try to get 
the workaround in for the 2.6.10 kernel.
Comment 26 Anton Bakken 2004-12-21 07:13:55 UTC
Hey Andrew, I was about to comment that I spoke too soon. I have an MSI K7N2
Delta2 Platinum board, with two ST3200822AS HDs set up as RAID 1. I am using the
2.6.8.1-12mdk kernel that comes with Mandrake 10.1. The udelay(1000) and
msleep(1) patches for libata-core.c worked to get the kernel to recognize
that HDs were connected to the controller. I was able to set everything up, but
I too have a problem with autodetect, not just on warm boot; it varies. I notice
that there are a lot of timeout messages when the two drives are rebuilding. I
tried using the deprecated SCSI support without libata, which works a bit
better, but I still get DMA timeouts when the drives are rebuilding. I have some
time today so I will try taking out the ATA_FLAG_SRST flag, instead of
ATA_FLAG_SATA_RESET. I will also assume that you want libata-core.c left
patched. I will post my findings.
Comment 27 Anton Bakken 2004-12-21 08:17:09 UTC
I rebuilt the kernel with the ATA_FLAG_SRST commented out. I still have the same
problem. It's still picky about autodetection at startup. I also still have the
timeouts,

(ata1:command 0x35 timeout, stat 0x50 host_stat 0x4)
(ata2:command 0x35 timeout, stat 0x50 host_stat 0x4)

when the drives are rebuilding (syncing in general). I notice that I only get
the timeouts when I try to use the mount at the same time it is rebuilding.
Comment 28 Jeremy Stanley 2004-12-27 18:59:15 UTC
I have an ASUS K8N-E Deluxe, and I'm running FC3 x86_64 with the Red
Hat-supplied kernel 2.6.9-1.681_FC3.

I have two SATA drives plugged into the NVIDIA SATA ports:
1. Western Digital WDC-1600JD 160GB 
2. Seagate Barracuda 7200.7 160GB
 
Linux sees the WD drive just fine, but does not see the Seagate.  I get a
30-second boot delay before "ata2 failed to respond".

I'll see if I can muck around with the sata_nv module and apply the
workaround(s) listed here.  It's been awhile since I've done any kernel hacking
though.
Comment 29 Jeremy Stanley 2004-12-28 11:42:23 UTC
Follow-up to #28: Updating to the 2.6.10 kernel (which comments out the
ATA_FLAG_SATA_RESET) solved my problem; now both SATA drives are detected and
function properly.
Comment 30 Anton Bakken 2005-01-04 11:44:38 UTC
Just for some additional information, I tried a 120G Maxtor Drive on my MSI 
K7N2 Delta2 Platinum board. I only had the one drive so I can't try raid, that 
and it's got some windows information I would like to keep. However on install 
and startup, Linux is able to find the HD and mount it for use, without making 
any changes to libata-core or sata_nv. I'm assuming that the current problem is 
due to the size of the drives? Not sure if everyone was already aware of this? 
Comment 31 Anton Bakken 2005-01-05 13:56:10 UTC
Last thing.... Then I guess I'll give up on my comments, cause I've heard
nothing in a while. 

The two ST3200822AS SATA (200GB) HD's I have, I connected to an old MSI K7N2
Delta Board that uses the nForce2 chipset. 

It found the Drives fine on install. Config of RAID1 went fine, (no timeouts). 

However I have the problem that the drives are not found on warm or cold boot.
Comment 32 Flup 2005-09-07 10:55:18 UTC
I did some tests with the suggested fixes here.

Hardware environment:
MSI PT8 Neo-V motherboard
VIA PT800 northbridge
VIA VT8273 southbridge (integrated sata support)
Seagate ST3200822AS sata disk (200GB)
2 ide western digital drives
Gentoo 2005.1 kernel-2.6.12-gentoo-r10

I applied the following fixes. These fixes are always applied to the original
source code.
fix 1: sata_via.c line 237
       - ATA_FLAG_SATA_RESET |
       + /*ATA_FLAG_SATA_RESET*/
fix 2: sata_via.c line 237
       - ATA_FLAG_SATA_RESET |
       + ATA_FLAG_SRST
fix 3: libata-core.c line 1411
       - udelay(400)
       + udelay(10000)

I tested them with a cold boot (halt -> power off -> power on) and a warm boot
(reboot). These are the results:

               cold boot   warm boot
fix1           ok          failed
fix2           ok          failed
fix3           nt (*)      failed
original (**)  ok          failed

(*) not tested
(**) results with the original source code

output when there's a fail:
libata version 1.11 loaded.
sata_via version 1.1
sata_via(0000:00:0f.0): routed to hard irq line 11
ata1: SATA max UDMA/133 cmd 0xE800 ctl 0xE402 bmdma 0xD800 irq 11
ata2: SATA max UDMA/133 cmd 0xE000 ctl 0xDC02 bmdma 0xD808 irq 11
ata1 is slow to respond, please be patient
ata1 failed to respond (30 secs)
scsi0 : sata_via
ata2: no device found (phy stat 00000000)
scsi1 : sata_via

conclusion:
cold boot always works, warm boot never works...

request: a fix for the warm boot please.
I really don't understand why there's a difference between cold and warm boots.
A boot is a boot, whether it's cold or warm... no? Unless the Seagate drive
resets something automatically on 'halt -> power on' that does not get reset
on a warm reboot. So the logical question is: is it possible to do
that 'reset thing' for the drive in source code?
Windows XP always 'recognizes' the drive, whether it's a cold or a warm boot. But
it always takes much longer (approx. 4s) to 'recognize' the drive, compared with
the time it takes for Linux (< 1s). (I used quotation marks for 'recognize'
because I don't know if that is the right terminology).

I'm migrating from Windows to Linux, and I actually bought that Seagate drive to
install Linux on. I needed to repartition one of my IDE hard drives to install a
(temporary) Gentoo distribution on. So I hope you see how frustrating this bug
is for me.
Comment 33 Alan 2007-06-18 07:59:06 UTC
Is anyone still seeing this problem with 2.6.21+ ?
Comment 34 Tejun Heo 2007-08-08 21:04:03 UTC
This bug is way too cold.  I think it's better to close now.