Bug 3352
Summary: | (sata nv) module fails to find drives | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | John Stebbins (stebbins) |
Component: | Serial ATA | Assignee: | Andrew Chew (achew) |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | alan, apa3a, benny+bugzilla, bunk, eric, herbert, jgarzik, johann, kiall, martin, nagendra.cl, pklong, stanmuffin, u288 |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.8.1 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
John Stebbins
2004-09-06 14:42:13 UTC
I can confirm this behavior. I have the same hard drive on a dual Opteron DK8N nForce3 Pro 250 chipset board and the errors are identical. With a Seagate ST3200822AS hard drive on an nForce chipset, the drives are not detected. This problem only occurs in the newer libata SCSI stuff (sata_nv); the old deprecated IDE/SATA "AMD and nVidia IDE support" works fine.

After more testing I can confirm this also exists in the very latest kernels. 2.6.9-rc3-bk7 and 2.6.9-rc3-mm3 exhibit the same behavior. Both 32-bit and 64-bit kernels have the same behavior as well.

Some additional output (it appears to detect the controller fine; I have two of these drives connected):

    ACPI: PCI interrupt 0000:00:0a.0[A] -> GSI 11 (level, low) -> IRQ 11
    ata1: SATA max UDMA/133 cmd 0xEC00 ctl 0xE802 dmdma 0xDC00 irq 11
    ata2: SATA max UDMA/133 cmd 0xE400 ctl 0xE002 dmdma 0xDC08 irq 11
    nv_data: Primary device added
    ...

Then comes the "slow to respond" message, and eventually it times out.

Haven't had time lately, but at a certain point two others have contacted me regarding a bug that appears symptomatically identical to the one reported here. They were both using Epox motherboards. The DK8N is, what, an iWill board? I'll look into this.

By the way, a workaround that worked for the aforementioned two people was to have sata_nv.c override the phy_reset() callback in the ata_port_operations table (by defining our own function, nv_phy_reset(), and setting phy_reset to nv_phy_reset). In nv_phy_reset(), we're going to do what sata_phy_reset() does EXCEPT for the actual phy reset. So we want to copy in the contents of sata_phy_reset(), but exclude the lines that read from

    if (ap->flags & ATA_FLAG_SATA_RESET) {

all the way down to

    } while (time_before(jiffies, timeout));

In hindsight, you might just have to comment out the ATA_FLAG_SATA_RESET from sata_nv.c's host_flags. This may accomplish the same thing. Not sure when I'll get to this, but if you guys get around to it, can you let me know how this works out for you?
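To make the flag-based variant of the workaround concrete, here is a minimal userspace sketch of the idea, not kernel code. It assumes only what the comment above states: the phy reset step is guarded by a test of ATA_FLAG_SATA_RESET in the port flags. Every `demo_`/`DEMO_` name and every bit value below is hypothetical and chosen for illustration; the real definitions live in the kernel's libata sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only (assumption); not the kernel's actual values. */
enum {
    DEMO_FLAG_SATA       = 1 << 0,
    DEMO_FLAG_NO_LEGACY  = 1 << 1,
    DEMO_FLAG_SRST       = 1 << 2,
    DEMO_FLAG_SATA_RESET = 1 << 3
};

/* Hypothetical stand-in for the per-port state. */
struct demo_port {
    unsigned int flags;
    bool phy_reset_issued;
};

/* Models the structure described above: the phy reset is skipped
 * entirely when the SATA_RESET flag is absent from the port flags. */
void demo_phy_reset(struct demo_port *ap)
{
    assert(ap != NULL);
    ap->phy_reset_issued = false;

    if (ap->flags & DEMO_FLAG_SATA_RESET) {
        /* Stand-in for the reset write and the wait loop. */
        ap->phy_reset_issued = true;
    }
    /* Device probing would continue here in either case. */
}

/* Convenience wrapper: would a port with these flags get a phy reset? */
bool demo_reset_issued_for(unsigned int host_flags)
{
    struct demo_port ap = { .flags = host_flags, .phy_reset_issued = false };
    demo_phy_reset(&ap);
    return ap.phy_reset_issued;
}
```

Calling `demo_reset_issued_for()` with and without `DEMO_FLAG_SATA_RESET` in the mask shows why dropping the flag from host_flags should have the same effect as a custom reset routine that omits the guarded block.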

Yes, the DK8N is an Iwill board. Taking out the ATA_FLAG_SATA_RESET appears to make it work. The system came up fine with the sata_nv driver and I'm using it right now. Is this something specific to this hardware combination? I was thinking it is a Seagate SATA issue. What happens with this code change, can the drive now not be reset?

I made the following change to kernel 2.6.9-rc3-bk7 sata_nv.c and it seems to work so far:

    --- sata_nv.c_orig 2004-10-07 23:06:40.721827293 -0400
    +++ sata_nv.c 2004-10-07 23:10:45.234309240 -0400
    @@ -221,7 +221,7 @@
     static struct ata_port_info nv_port_info = {
         .sht = &nv_sht,
         .host_flags = ATA_FLAG_SATA |
    -        ATA_FLAG_SATA_RESET |
    +        /*ATA_FLAG_SATA_RESET |*/
             ATA_FLAG_SRST |
             ATA_FLAG_NO_LEGACY,
         .pio_mask = NV_PIO_MASK,

The BIOS typically does a reset of the SATA phy. As long as you don't do any hotplug (which isn't supported by libata yet anyway), this shouldn't be a problem. This still needs to be solved to prepare for hotplug, though. You should be safe with this workaround until a proper fix is found.

Wonder if it is the same problem, but I can't install Fedora Core 3 on the MSI K7N2GM2 motherboard with nForce2 chipset. sata_nv is loaded, but when the time comes to format the hard drive it complains "no valid devices were found". I had a similar problem with Mandrake 10.1; however, I was able to install Mandrake 10. Let me know if I can provide more information. Andriy

There's another workaround I'd like to try to fix the SATA phy reset issue. Will someone who's encountered this problem volunteer to spend some time with me to test the new workaround?

I have a Seagate ST380013AS SATA drive (Serial#: 3JVC1Q6S), 80GB. I will try to do what I can to fix the issue. What do I need to do?

I have the same problem as everyone else here, but with the 2.6.9 kernel as shipped with Fedora Core 3.
My hardware is:
- Gigabyte GA-K8NS Motherboard (chipset nVidia nForce3 250)
- 200G Maxtor DMax+10 SATA150/7200/8M Hard Drive

The sata_nv module is loaded, but times out and fails to find the drive. The deprecated IDE driver works fine if I compile a kernel that reverts to it.

I have sata_nv with 2 discs in raid 0 mode. Taking out the ATA_FLAG_SATA_RESET sorts the timeout problem, but the 2 disks are detected separately instead of as 1 disc so it doesn

It seems that the NVIDIA SATA controller needs more time to settle between the reset bit write and the reset bit clear. Can I get you guys to do a little experiment for me? In drivers/scsi/libata-core.c, look for a function called __sata_phy_reset(). There should be a "udelay(400);", with the comment "/* FIXME: a guess */". Can you change that 400 to a 1000, rebuild, and see if the problem goes away for you? (Put that ATA_FLAG_SATA_RESET flag back in, in sata_nv.c, of course.)

This fixes the problem for one user so far (thanks, Joseph!). If this works for others as well, I will work on a patch for lkml (either increasing the delay in libata-core.c as per this experiment, or adding a custom SATA phy reset routine to sata_nv.c).

Hi, I have had the same symptoms (slow to respond, timeout) with my nForce3 250Gb based MSI K8N Neo with one Samsung SV1604N attached via a SATA-2-PATA converter. I tried Andrew Chew's fix (increasing udelay to 1000 in libata-core.c) and can confirm that it indeed does fix the problem! Thanks. Ciao Ulrich

Excellent. This is good news. Make sure you undo the ATA_FLAG_SATA_RESET removal workaround, of course. Otherwise, the reset code isn't even getting entered!

I'd also be interested in seeing if replacing that "udelay(400);" with "msleep (1);" would work as well. It's friendlier than having the CPU busy spin. Can you guys also give this a try? If that doesn't work, then we can use "udelay (1000);".

I tried "msleep(1);" and it looks like it is working well with 2.6.9-gentoo-r9 (MSI k8n neo platinum).
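The settle-time hypothesis in the comments above can be illustrated with a toy model in plain userspace C. This is only a sketch under stated assumptions: the PHY is simulated, time is a virtual counter rather than a real delay, and the settle threshold passed in is a made-up parameter (the thread only reports that 400 failed and 1000 worked for some users). All `demo_` names are hypothetical and do not exist in libata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated PHY; every field here is an assumption for illustration. */
struct demo_phy {
    unsigned int settle_needed_us; /* assumed minimum time reset must stay asserted */
    unsigned int elapsed_us;       /* virtual clock, advanced by demo_udelay() */
    unsigned int reset_set_at_us;  /* virtual time at which reset was asserted */
    bool reset_asserted;
    bool link_up;
};

/* Advances the virtual clock instead of busy-waiting. */
void demo_udelay(struct demo_phy *phy, unsigned int us)
{
    assert(phy != NULL);
    phy->elapsed_us += us;
}

/* Setting the bit starts the reset; clearing it brings the link up
 * only if the reset was held for at least settle_needed_us. */
void demo_write_reset_bit(struct demo_phy *phy, bool on)
{
    assert(phy != NULL);
    if (on) {
        phy->reset_asserted = true;
        phy->reset_set_at_us = phy->elapsed_us;
        phy->link_up = false;
    } else if (phy->reset_asserted) {
        unsigned int held_us = phy->elapsed_us - phy->reset_set_at_us;
        phy->reset_asserted = false;
        phy->link_up = held_us >= phy->settle_needed_us;
    }
}

/* Runs the write / delay / clear sequence and reports the outcome. */
bool demo_reset_links_up(unsigned int settle_needed_us, unsigned int delay_us)
{
    struct demo_phy phy = { .settle_needed_us = settle_needed_us };

    demo_write_reset_bit(&phy, true);
    demo_udelay(&phy, delay_us);
    demo_write_reset_bit(&phy, false);
    return phy.link_up;
}
```

With an assumed threshold of 1000, `demo_reset_links_up(1000, 400)` models the failing case and `demo_reset_links_up(1000, 1000)` the experiment proposed above. Whether real nForce hardware behaves like this model is exactly what the experiment is meant to find out.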

I am using an MSI K8N Neo2 Platinum mainboard (NForce3-250Gb chipset). I have 3 Maxtor DiamondMax 10 300Gb disks (model 6B300S0) installed. I've experienced all the problems described here with all 3 HDs installed; with 2 or fewer installed I did not have problems booting the kernel (plenty of problems, but not sata_nv related). The kernel is:

    Linux sirius 2.6.9-1.681_FC3 #1 Thu Nov 18 15:13:22 EST 2004 x86_64 x86_64 x86_64 GNU/Linux

Applying the patch described in "Additional Comment #11 From Andrew Chew 2004-12-07 12:29", setting udelay(400) to udelay(1000), did NOT solve the problem for me. I still got the "ata3 is slow to respond" messages, always sata3 no matter which SATA ports I used, and no difference at all compared to the original version.

Applying the patch described in "Additional Comment #4 From Chris Osgood 2004-10-07 20:15" DID do the trick; for the first time my system is running with all three disks happily spinning.

It seems that just changing the waiting period does not always solve the problem, but it might be that even a delay of 1000 µs is not enough. Maybe the more disks installed, the more time it needs? If you wish I can do some tests changing the delay times, but it's a bit cumbersome for me because my LVM volumes refuse to come up every time after I add or remove a disk, causing it to bail out prematurely (FC3 initscripts problem, not kernel-related).

A final piece of possibly useful information: the BIOS of the board itself has the same problems. With 3 disks attached it just hangs at the IDE/SATA detection phase; with 2 disks and the latest official BIOS version (1.30) it works, but only after a powercycle. After applying the latest BIOS beta version (1.51) it passes detection and allows the OS to boot.

Eric, does increasing the delay work for you if you only have one SATA disk?

If I want to install SUSE 9.2 Pro, where and what must I change so that the installation can find my HDD?

Tegia, if you need to ask that question then you're pretty much screwed. You won't be able to install at all if the installation disk is based on the 2.6.8 or greater kernel, as the install disk won't even see your hard disk. Easy solution if you have not bought the machine yet: avoid the nVidia nForce chipset!

Solutions (for those who can't / don't want to compile custom kernels):
1) Use an older Linux distribution, based on an older kernel. Remember not to update it when you have installed it, until this bug is fixed!
2) Use a PATA hard disk.
3) Use a SATA controller not based on the nForce chipset, that is if your motherboard has one, or
4) Install Windows and laugh at all your Linux friends when they complain about how buggy Windows is (oops I think I might have given the game away and shown you how much this bug has ^W^W^W annoyed me!)

Firstly, thanks guys for this bug post. I have a K7N2 Delta2 motherboard that uses the nforce3 chipset. The udelay(1000) and also the msleep(1) options for libata-core.c worked fine for my board. I now have two 200G ST3200822AS disks that I can use.

Secondly, Tegia, sorry I don't have an exact answer for you, but you may want to look into using diff to make patch files. I know some flavors of Linux have an expert install mode where you can use a command such as "patch" or "updatemodule" to install either of those types of files from a floppy disk. As I said, sorry I can't be of more help, because I have not used SUSE, but I am still sure that it will have one of these options. Sorry, I will leave it up to you to find out how to get it to work. Google is your friend.

OK, just found the time to play with my kernel. The udelay(1000) and msleep(1) fixes work great for me. Hope this makes it into kernel 2.6.10. (Gigabyte K8NS / Maxtor 200G D-Max 10 HDD.)

Followup to comment #15: The gentoo kernel has the patch in comment #4 applied; you may wish to confirm that you reverted this before applying the msleep fix.

Update on Comment #16: First I have to apologise for my earlier post as it was not correct. After patching the libata and sata_nv modules I forgot to include them in my initrd image so they never got loaded :S

This is the status so far with all 3 HDs installed:
- removing the ATA_FLAG_SATA_RESET from sata_nv.c works. It always boots fine.
- setting the delay at 1000 works in almost all cases at a cold boot. However, it does not work at a warm boot.
- setting the delay to even higher values (went as far as 5000) did not change anything. Works at cold boot, does not work at warm boot.

Hope this helps a bit.

Another data point from another nForce3 250Gb user... Machine is a Biostar iDEQ 210P w/ Athlon 64 3400+... FC3 uses kernel 2.6.9-1.667, which successfully loads sata_nv and can detect a single SATA drive to do the install, but boot from SATA fails when trying to detect SATA after initrd loads. Then no root filesystem causes problems. I thought it was weird that it worked every time I boot FC3 from the DVD (even in rescue mode), but not from the hard drive. Maybe this has something to do with the warm boot/cold boot issue Eric reported.

The msleep(1) did not work. I did not try the udelay(1000). Patching to remove the ATA_FLAG_SATA_RESET flag does work, and I booted the DVD into rescue mode, downloaded the kernel source, patched, rebuilt the modules, and rebuilt the initrd image to get my new FC3 install to boot. Been running this way for a few days now.

So it would be nice to figure out whether or not the reset is necessary, and if so, what the right delay is. Good luck, and thanks for your investigations. Since this is a new machine on which I do not yet rely 100%, I would be willing to test new ideas. Brendan

Well, that's unfortunate. Sounds like this increased delay doesn't fix this specific problem. Can you guys try removing the ATA_FLAG_SRST, rather than ATA_FLAG_SATA_RESET, and see if this changes any behavior?
In any case, my goal is to try to get the workaround in for the 2.6.10 kernel.

Hey Andrew, I was about to comment that I spoke too soon. I have an MSI K7N2 Delta2 Platinum board, with two ST3200822AS HDs set up as RAID 1. I am using the 2.6.8.1-12mdk kernel that comes with Mandrake 10.1. The udelay(1000) and msleep(1) patches for libata-core.c worked to get the kernel to recognize that HDs were connected to the controller. I was able to set everything up, but I too have a problem with autodetect, not just on warm boot; it varies. I notice that there are a lot of timeout messages when the two drives are rebuilding. I tried using the deprecated SCSI support without libata, which works a bit better, but I still get DMA timeouts when the drives are rebuilding. I have some time today so I will try taking out the ATA_FLAG_SRST flag, instead of the ATA_FLAG_SATA_RESET. I will also assume that you want libata-core.c left patched. I will post my findings.

I rebuilt the kernel with the ATA_FLAG_SRST commented out. I still have the same problem. It's still picky about autodetect at startup. I also still have the timeouts

    (ata1:command 0x35 timeout, stat 0x50 host_stat 0x4)
    (ata2:command 0x35 timeout, stat 0x50 host_stat 0x4)

when the drives are rebuilding (syncing in general). I notice that I only get the timeouts when I try to use the mount at the same time it is rebuilding.

I have an ASUS K8N-E Deluxe, and I'm running FC3 x86_64 with the Red Hat-supplied kernel 2.6.9-1.681_FC3. I have two SATA drives plugged into the NVIDIA SATA ports:
1. Western Digital WDC-1600JD 160GB
2. Seagate Barracuda 7200.7 160GB

Linux sees the WD drive just fine, but does not see the Seagate. I get a 30-second boot delay before "ata2 failed to respond". I'll see if I can muck around with the sata_nv module and apply the workaround(s) listed here. It's been a while since I've done any kernel hacking though.

Follow-up to #28: Updating to the 2.6.10 kernel (which comments out the ATA_FLAG_SATA_RESET) solved my problem; now both SATA drives are detected and function properly.

Just for some additional information, I tried a 120G Maxtor drive on my MSI K7N2 Delta2 Platinum board. I only had the one drive so I can't try RAID, and it has some Windows information I would like to keep. However, on install and startup, Linux is able to find the HD and mount it for use, without making any changes to libata-core or sata_nv. I'm assuming that the current problem is due to the size of the drives? Not sure if everyone was already aware of this?

Last thing.... Then I guess I'll give up on my comments, cause I've heard nothing in a while. The two ST3200822AS SATA (200GB) HDs I have, I connected to an old MSI K7N2 Delta board that uses the nForce2 chipset. It found the drives fine on install. Config of RAID1 went fine (no timeouts). However, I have the problem that the drives are not found on warm or cold boot.

I did some tests with the suggested fixes here.

Hardware environment:
- MSI PT8 Neo-V motherboard
- VIA PT800 northbridge
- VIA VT8273 southbridge (integrated SATA support)
- Seagate ST3200822AS SATA disk (200GB)
- 2 IDE Western Digital drives
- Gentoo 2005.1, kernel-2.6.12-gentoo-r10

I applied the following fixes. These fixes are always applied to the original source code.

fix 1: sata_via.c line 237

    - ATA_FLAG_SATA_RESET |
    + /*ATA_FLAG_SATA_RESET*/

fix 2: sata_via.c line 237

    - ATA_FLAG_SATA_RESET |
    + ATA_FLAG_SRST

fix 3: libata-core.c line 1411

    - udelay(400)
    + udelay(10000)

I tested them with a cold boot (halt -> power off -> power on) and a warm boot (reboot). These are the results:

 | cold boot | warm boot |
---|---|---|
fix1 | ok | failed |
fix2 | ok | failed |
fix3 | nt (*) | failed |
original (**) | ok | failed |

(*) not tested
(**) results with the original source code

Output when there's a failure:

    libata version 1.11 loaded.
    sata_via version 1.1
    sata_via(0000:00:0f.0): routed to hard irq line 11
    ata1: SATA max UDMA/133 cmd 0xE800 ctl 0xE402 bmdma 0xD800 irq 11
    ata2: SATA max UDMA/133 cmd 0xE000 ctl 0xDC02 bmdma 0xD808 irq 11
    ata1 is slow to respond, please be patient
    ata1 failed to respond (30 secs)
    scsi0 : sata_via
    ata2: no device found (phy stat 00000000)
    scsi1 : sata_via

Conclusion: cold boot always works, warm boot never works...

Request: a fix for the warm boot please.

I really don't understand why there's a difference between cold and warm boots. A boot is a boot, whether it's cold or warm... no? Except perhaps that the Seagate drive resets something automatically on 'halt -> power on', which does not happen on a warm reboot. So the logical question is: is it possible to do that 'reset thing' for the drive in source code?

Windows XP always 'recognizes' the drive, whether it's a cold or a warm boot. But it always takes much longer (approx. 4s) to 'recognize' the drive, compared with the time it takes for Linux (< 1s). (I used quotation marks for 'recognize' because I don't know if that is the right terminology.)

I'm migrating from Windows to Linux, and I actually bought that Seagate drive to install Linux on. I needed to repartition one of my IDE hard drives to install a (temporary) Gentoo distribution on. So I hope you see how frustrating a bug this is for me.

Is anyone still seeing this problem with 2.6.21+?

This bug is way too cold. I think it's better to close now.