Bug 16606
Summary: | sata_sil no longer detects sata hard disk | ||
---|---|---|---|
Product: | IO/Storage | Reporter: | Dieter Plaetinck (dieter) |
Component: | Serial ATA | Assignee: | Jeff Garzik (jgarzik) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | emond.papegaaij, jbeulich, reiver, tj |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.35 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
screenshot1
screenshot2
screenshot3
reset-classify-dbg.patch
/var/log/dmesg.log 2.6.34
/var/log/kernel.log 2.6.34
boot patched 2.6.35.2, shot 1
boot patched 2.6.35.2, shot 2 (/proc/interrupts)
boot patched 2.6.35.2, shot 3 (kernel panic)
reset-classify-dbg-1.patch
boot patched 2.6.35.2, reset-classify-dbg-1.patch
boot patched 2.6.35.2, reset-classify-dbg-1.patch, manual probing
nodev-hint-dbg.patch
boot patched 2.6.35.2, reset-classify-dbg-1.patch, 74GB disk in port 2
boot patched 2.6.35.2, nodev-hint-dbg.patch, moprobe -r
boot patched 2.6.35.2, nodev-hint-dbg.patch, moprobe |
Description
Dieter Plaetinck
2010-08-16 13:19:00 UTC
fwiw, the a7n8x is a v2.0, and the sata controller is actually 3112A
Created attachment 27481 [details]
screenshot1
Copying attachments for easier / future reference.
Created attachment 27482 [details]
screenshot2
Created attachment 27483 [details]
screenshot3
Created attachment 27484 [details]
reset-classify-dbg.patch
Can you please apply this patch and post the screenshot? Also, please attach the output of "hdparm -I /dev/sda" and "lspci -nn", and a successful boot log with 2.6.34.
Thanks.
Linux dieter-ws-a7n8x-arch 2.6.34-ARCH #1 SMP PREEMPT Tue Aug 10 21:38:22 CEST 2010 i686 Unknown CPU Typ AuthenticAMD GNU/Linux

[root@dieter-ws-a7n8x-arch ~]# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
    Model Number:       WDC WD360GD-00FNA0
    Serial Number:      WD-WMAH91444247
    Firmware Revision:  35.06K35
Standards:
    Supported: 6 5 4
    Likely used: 6
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:   72303840
    LBA48  user addressable sectors:   72303840
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:       35304 MBytes
    device size with M = 1000*1000:       37019 MBytes (37 GB)
    cache/buffer size  = 8192 KBytes (type=DualPortCache)
Capabilities:
    LBA, IORDY(can be disabled)
    Standby timer values: spec'd by Standard, with device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Recommended acoustic management value: 128, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    DOWNLOAD_MICROCODE
            Power-Up In Standby feature set
            SET_MAX security extension
       *    Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    Gen1 signaling speed (1.5Gb/s)
       *    Host-initiated interface power management
       *    Device-initiated interface power management
Security:
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
    not     supported: enhanced erase
HW reset results:
    CBLID- above Vih
    Device num = 0 determined by the jumper
Checksum: correct

[root@dieter-ws-a7n8x-arch ~]# lspci -nn
00:00.0 Host bridge [0600]: nVidia Corporation nForce2 IGP2 [10de:01e0] (rev c1)
00:00.1 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 1 [10de:01eb] (rev c1)
00:00.2 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 4 [10de:01ee] (rev c1)
00:00.3 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 3 [10de:01ed] (rev c1)
00:00.4 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 2 [10de:01ec] (rev c1)
00:00.5 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 5 [10de:01ef] (rev c1)
00:01.0 ISA bridge [0601]: nVidia Corporation nForce2 ISA Bridge [10de:0060] (rev a4)
00:01.1 SMBus [0c05]: nVidia Corporation nForce2 SMBus (MCP) [10de:0064] (rev a2)
00:02.0 USB Controller [0c03]: nVidia Corporation nForce2 USB Controller [10de:0067] (rev a4)
00:02.1 USB Controller [0c03]: nVidia Corporation nForce2 USB Controller [10de:0067] (rev a4)
00:02.2 USB Controller [0c03]: nVidia Corporation nForce2 USB Controller [10de:0068] (rev a4)
00:04.0 Ethernet controller [0200]: nVidia Corporation nForce2 Ethernet Controller [10de:0066] (rev a1)
00:05.0 Multimedia audio controller [0401]: nVidia Corporation nForce Audio Processing Unit [10de:006b] (rev a2)
00:06.0 Multimedia audio controller [0401]: nVidia Corporation nForce2 AC97 Audio Controler (MCP) [10de:006a] (rev a1)
00:08.0 PCI bridge [0604]: nVidia Corporation nForce2 External PCI Bridge [10de:006c] (rev a3)
00:09.0 IDE interface [0101]: nVidia Corporation nForce2 IDE [10de:0065] (rev a2)
00:0c.0 PCI bridge [0604]: nVidia Corporation nForce2 PCI Bridge [10de:006d] (rev a3)
00:0d.0 FireWire (IEEE 1394) [0c00]: nVidia Corporation nForce2 FireWire (IEEE 1394) Controller [10de:006e] (rev a3)
00:1e.0 PCI bridge [0604]: nVidia Corporation nForce2 AGP [10de:01e8] (rev c1)
01:0a.0 Ethernet controller [0200]: Intel Corporation 82541PI Gigabit Ethernet Controller [8086:107c] (rev 05)
01:0b.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller [1095:3112] (rev 01)
02:01.0 Ethernet controller [0200]: 3Com Corporation 3C920B-EMB Integrated Fast Ethernet Controller [Tornado] [10b7:9201] (rev 40)
03:00.0 VGA compatible controller [0300]: nVidia Corporation NV30 [GeForce FX 5800] [10de:0302] (rev a2)

[root@dieter-ws-a7n8x-arch ~]# cat /var/log/boot*
cat: /var/log/boot*: No such file or directory

I will attach dmesg and kernel.log as attachments, I will also recompile 2.6.35.2 with your patch and try it
Created attachment 27491 [details]
/var/log/dmesg.log 2.6.34
Created attachment 27492 [details]
/var/log/kernel.log 2.6.34
Created attachment 27499 [details]
boot patched 2.6.35.2, shot 1
Created attachment 27500 [details]
boot patched 2.6.35.2, shot 2 (/proc/interrupts)
Created attachment 27501 [details]
boot patched 2.6.35.2, shot 3 (kernel panic)
this i haven't seen before. with an unpatched 2.6.35.2 I could just leave the shell from the initramfs and the system would reboot. now i get errors and a kernel panic. not our primary concern in this ticket but maybe interesting to know.
Created attachment 27503 [details]
reset-classify-dbg-1.patch
Strange, I thought the problem would be in the reset part but it went well and got the correct classification of the device. It seems we'll have to follow further into the probing sequence. Can you please apply this one instead and retry?
Thanks.
will do, do you still want all the pictures, or do you only need the output of modprobe/modprobe -r and dmesg | egrep 'sd|sata' ?

Oh, the first pic of kernel messages w/ debug messages would be enough.

reset-classify-dbg-1.patch does not apply to my 2.6.35.2 checkout (which i got from ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.35.2.tar.bz2):

$ patch -p1 -i reset-classify-dbg-1.patch
patching file drivers/ata/libata-core.c
Hunk #1 FAILED at 5873.
1 out of 1 hunk FAILED -- saving rejects to file drivers/ata/libata-core.c.rej
patching file drivers/ata/libata-eh.c
Hunk #1 succeeded at 2739 (offset -1 lines).
Hunk #2 succeeded at 2749 (offset -1 lines).
Hunk #3 succeeded at 2928 (offset -2 lines).
Hunk #4 succeeded at 2966 (offset -2 lines).
Hunk #5 succeeded at 3016 (offset -2 lines).
$ cat drivers/ata/libata-core.c.rej
--- drivers/ata/libata-core.c
+++ drivers/ata/libata-core.c
@@ -5873,7 +5873,7 @@
 ehi->probe_mask |= ATA_ALL_DEVICES;
 ehi->action |= ATA_EH_RESET;
-ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
+ehi->flags |= ATA_EHI_NO_AUTOPSY/* | ATA_EHI_QUIET*/;
 ap->pflags &= ~ATA_PFLAG_INITIALIZING;
 ap->pflags |= ATA_PFLAG_LOADING;

I also cloned the 2.6.35.y git repo (from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.35.y.git), and I can't find the 39325fd revision mentioned in your patch.
Dieter

Sorry, it was against the current devel branch. You can just comment out ATA_EHI_QUIET in async_port_probe().
Thanks.

Ok, the output is exactly the same as with the previous patch. The new messages are not triggered. Only the "online/offline after reset" ones, just like before.

Hmm... if you have commented out ATA_EHI_QUIET, there should at least be extra messages like "hardresetting link", are they missing?

I changed the line in async_port_probe() in drivers/ata/libata-core.c to be just:
    ehi->flags |= ATA_EHI_NO_AUTOPSY;
(line 6029 in my 2.6.35.2 code)
all the other changes have been applied by your patch.
To verify:

$ grep XXX drivers/ata/libata-eh.c
"XXX online after reset, class=%d\n",
"XXX offline after reset, class=%d\n",
ata_dev_printk(dev, KERN_INFO, "XXX reval&attach: %d %d\n",
ata_dev_printk(dev, KERN_INFO, "XXX ata_dev_read_id rc=%d\n", rc);
ata_dev_printk(dev, KERN_INFO, "XXX ata_dev_configure rc=%d\n", rc);

I'm pretty sure I compiled, installed and ran the kernel correctly. Yes, I only see the same messages as with the previous patch

If ATA_EHI_QUIET isn't there, hardresetting link message gotta be there before the link on/offline messages. Can you please double check the correct patched kernel and modules are being used. If unsure, adding a printk of a short string in the module init path is usually a pretty good way of verifying that the actual patched kernel and drivers are in use. e.g. You can add a simple printk in async_port_probe() and check it gets printed out.
Thanks.

Ok i will verify with a DPRINTK() message in async_port_probe(). You say there should be a hardresetting link message. did you mean 'reval&attach' or 'ata_dev_configure'? because your patch doesn't contain 'hardreset' or something like that

Please use printk("I'm here\n"); or something like that. DPRINTK() is compiled out unless debug macro is set. The message I'm talking about is

    if (verbose)
        ata_link_printk(link, KERN_INFO, "%s resetting link\n",
                        reset == softreset ? "soft" : "hard");

in drivers/ata/libata-eh.c::ata_eh_reset(). @verbose is set if ATA_EHI_QUIET is clear, so...
Thanks.

My mistake. I had forgotten to rebuild my initramfs. I have 2 images with the messages you're looking for. The first one is taken without me modprobing anything (I guess when the module gets loaded automatically). Unfortunately I could not get all output because the scrollback buffer didn't go up further. the 2nd image shows (different) output when i manually modprobe.
Created attachment 27511 [details]
boot patched 2.6.35.2, reset-classify-dbg-1.patch
Created attachment 27512 [details]
boot patched 2.6.35.2, reset-classify-dbg-1.patch, manual probing
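For readers following the ATA_EHI_QUIET discussion above, here is a minimal userspace sketch of the relationship described there: the "resetting link" message is only produced when the quiet flag is clear. This is not the kernel code. The flag values and both helper names (eh_reset_is_verbose, format_reset_msg) are made up for illustration; only the flag names and the message text come from the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-ins for ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET.
 * The numeric values are assumptions, not the kernel's definitions. */
enum {
    EHI_NO_AUTOPSY = 1 << 0,
    EHI_QUIET      = 1 << 1,
};

/* As described in the thread: verbose is set when the quiet flag is clear. */
static int eh_reset_is_verbose(unsigned int ehi_flags)
{
    return !(ehi_flags & EHI_QUIET);
}

/* Hypothetical helper: writes the reset message into buf when verbose.
 * Returns the message length, or 0 when the message is suppressed. */
static int format_reset_msg(unsigned int ehi_flags, int is_softreset,
                            char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (!eh_reset_is_verbose(ehi_flags))
        return 0;
    return snprintf(buf, len, "%s resetting link",
                    is_softreset ? "soft" : "hard");
}
```

With EHI_NO_AUTOPSY | EHI_QUIET (the unpatched flags) nothing is written; with EHI_NO_AUTOPSY alone the buffer holds "hard resetting link". That is why a missing message pointed to the patched code not actually running, which turned out to be the stale initramfs.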
Created attachment 27513 [details]
nodev-hint-dbg.patch
Okay, ata_dev_read_id() is failing with -ENOENT. That's the mechanism to skip ghost devices on controllers w/o link registers. This is the first time I see it triggering on sata_sil. Can you please boot with the attached patch applied and post the output? Let's see which one is triggering.
Thanks.
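A rough userspace model of the ghost-device skip mentioned above may help when reading the debug output. It assumes, based on the name of nodev-hint-dbg.patch, that a "no device" hint on the failed IDENTIFY is what gets turned into -ENOENT. The bit values, the enum, and both function names are hypothetical and simplified; this is not the libata implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical error-mask bits; values are illustrative only. */
#define ERR_HSM        (1u << 0)  /* host state machine violation */
#define ERR_NODEV_HINT (1u << 1)  /* hint that no device is really there */

enum probe_action { PROBE_ATTACH, PROBE_SKIP_DEVICE, PROBE_RETRY };

/* Model of the IDENTIFY step: 0 on success, -ENOENT when the failure
 * carries the no-device hint, -EIO for any other failure. */
static int model_read_id(unsigned int err_mask)
{
    if (err_mask == 0)
        return 0;
    if (err_mask & ERR_NODEV_HINT)
        return -ENOENT;
    return -EIO;
}

/* Model of the caller: -ENOENT means "ghost device, ignore it quietly",
 * other errors go through the usual retry path. */
static enum probe_action model_handle_probe(int rc)
{
    assert(rc <= 0);  /* kernel convention: 0 or a negative errno */
    if (rc == 0)
        return PROBE_ATTACH;
    if (rc == -ENOENT)
        return PROBE_SKIP_DEVICE;
    return PROBE_RETRY;
}
```

In this model a real disk that wrongly picks up the hint is skipped without further retries, which would match the symptom in this report: the link comes up and is classified correctly, yet no /dev/sda appears.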
Only this patch right, or in combination with the previous one? Fwiw i never had problems with this sata controller or the hard disks before, works with win xp as well.

Having probably the same issue with same motherboard on 2.6.35(.2) where the first SATA disk is not found but only second SATA disk is found and hence assigned /dev/sda instead of /dev/sdb. The first SATA disk has root filesystem and as equivalent partition on second SATA disk is swap I get a boot panic for no root filesystem found. Currently waiting on null modem to collect output as boot_delay not effective in helping capture console before scrolled.

I can confirm what Alan said: so my main disk is a 37GB sata disk. I now added a 74GB sata disk in the second sata port. Linux only makes a /dev/sda, which points to the 2nd disk. I will upload a screenshot of a 2.6.35.2 boot with reset-classify-dbg-1.patch applied and with the 2nd disk connected. Here is the hdparm info of the 2nd disk (now /dev/sdb as i'm running 2.6.34). I still need to do the experiment with the nodev-hint-dbg.patch patch, do you want me to do it with the 2nd disk attached or not?
# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
    Model Number:       WDC WD740GD-00FLA1
    Serial Number:      WD-WMAKE1663221
    Firmware Revision:  27.08D27
Standards:
    Supported: 6 5 4
    Likely used: 8
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  145226112
    LBA48  user addressable sectors:  145226112
    Logical/Physical Sector size:           512 bytes
    device size with M = 1024*1024:       70911 MBytes
    device size with M = 1000*1000:       74355 MBytes (74 GB)
    cache/buffer size  = 8192 KBytes (type=DualPortCache)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, with device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Recommended acoustic management value: 128, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Release interrupt
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
       *    READ/WRITE_DMA_QUEUED
            Power-Up In Standby feature set
       *    SET_FEATURES required to spinup after power up
            SET_MAX security extension
            Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    Gen1 signaling speed (1.5Gb/s)
       *    Host-initiated interface power management
       *    Device-initiated interface power management
       *    SMART Command Transport (SCT) feature set
       *    SCT Long Sector Access (AC1)
       *    SCT LBA Segment Access (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
Security:
            supported
    not     enabled
    not     locked
    not     frozen
    not     expired: security count
    not     supported: enhanced erase
Checksum: correct

Created attachment 27561 [details]
boot patched 2.6.35.2, reset-classify-dbg-1.patch, 74GB disk in port 2
Sorry, I was offline for the past few days. Yeah, applying the nodev-hint-dbg.patch alone should be enough. Can you please post screenshot w/ the patch?
Thanks.

okay, here we go again. a pic of modprobe -r, one of modprobe, with 2 disks attached.
Created attachment 27831 [details]
boot patched 2.6.35.2, nodev-hint-dbg.patch, moprobe -r
Created attachment 27841 [details]
boot patched 2.6.35.2, nodev-hint-dbg.patch, moprobe
So, where are we at? Did my last screenshots help? Can I help?

Root cause bisected and fix patch posted, http://thread.gmane.org/gmane.linux.ide/47506
Thanks.

First patch from linux-ide mailing list adding sff_check_status to libata-sff.c also resolves issue.

2.6.35.5 has this patched, and works for me. Thanks all