Bug 16606 - sata_sil no longer detects sata hard disk
Status: RESOLVED CODE_FIX
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 high
Assignee: Jeff Garzik
Reported: 2010-08-16 13:19 UTC by Dieter Plaetinck
Modified: 2010-09-21 20:07 UTC
Kernel Version: 2.6.35
Regression: Yes

Attachments
screenshot1 (77.87 KB, image/jpeg)
2010-08-17 10:24 UTC, Tejun Heo
screenshot2 (67.59 KB, image/jpeg)
2010-08-17 10:24 UTC, Tejun Heo
screenshot3 (55.56 KB, image/jpeg)
2010-08-17 10:25 UTC, Tejun Heo
reset-classify-dbg.patch (1023 bytes, patch)
2010-08-17 11:54 UTC, Tejun Heo
/var/log/dmesg.log 2.6.34 (31.82 KB, text/plain)
2010-08-17 17:33 UTC, Dieter Plaetinck
/var/log/kernel.log 2.6.34 (54.35 KB, text/plain)
2010-08-17 17:36 UTC, Dieter Plaetinck
boot patched 2.6.35.2, shot 1 (183.40 KB, image/jpeg)
2010-08-18 20:23 UTC, Dieter Plaetinck
boot patched 2.6.35.2, shot 2 (/proc/interrupts) (169.57 KB, image/jpeg)
2010-08-18 20:24 UTC, Dieter Plaetinck
boot patched 2.6.35.2, shot 3 (kernel panic) (171.33 KB, image/jpeg)
2010-08-18 20:26 UTC, Dieter Plaetinck
reset-classify-dbg-1.patch (2.61 KB, patch)
2010-08-19 08:54 UTC, Tejun Heo
boot patched 2.6.35.2, reset-classify-dbg-1.patch (662.90 KB, image/jpeg)
2010-08-19 15:53 UTC, Dieter Plaetinck
boot patched 2.6.35.2, reset-classify-dbg-1.patch, manual probing (184.63 KB, image/jpeg)
2010-08-19 15:55 UTC, Dieter Plaetinck
nodev-hint-dbg.patch (1.77 KB, patch)
2010-08-19 16:13 UTC, Tejun Heo
boot patched 2.6.35.2, reset-classify-dbg-1.patch, 74GB disk in port 2 (645.05 KB, image/jpeg)
2010-08-21 06:36 UTC, Dieter Plaetinck
boot patched 2.6.35.2, nodev-hint-dbg.patch, modprobe -r (202.51 KB, image/jpeg)
2010-08-24 19:56 UTC, Dieter Plaetinck
boot patched 2.6.35.2, nodev-hint-dbg.patch, modprobe (117.26 KB, image/jpeg)
2010-08-24 19:57 UTC, Dieter Plaetinck

Description Dieter Plaetinck 2010-08-16 13:19:00 UTC
Hi,
hardware:
- asus a7n8x motherboard
- silicon image sata controller 3112 (onboard)
- Western digital raptor 10k rpm sata hard disk.
(I can check the exact specifications and version numbers later today)

When booting 2.6.35.1 and 2.6.35.2 (Archlinux 32bit), I have a regression compared to version 2.6.34.3 which worked fine.
That is: sata_sil is loaded, sd_mod is not.  Even when I load sd_mod manually, it still does not find my hard disk (no /dev/sd* gets created)

I have made some pictures, which show `dmesg | egrep 'sd_mod|sata'`, rmmod / modprobe output, and /proc/interrupts: http://users.edpnet.be/dieter/kernel/
Comment 1 Dieter Plaetinck 2010-08-16 19:54:44 UTC
fwiw, the a7n8x is a v2.0, and the sata controller is actually 3112A
Comment 2 Tejun Heo 2010-08-17 10:24:32 UTC
Created attachment 27481
screenshot1

Copying attachments for easier / future reference.
Comment 3 Tejun Heo 2010-08-17 10:24:54 UTC
Created attachment 27482
screenshot2
Comment 4 Tejun Heo 2010-08-17 10:25:13 UTC
Created attachment 27483
screenshot3
Comment 5 Tejun Heo 2010-08-17 11:54:46 UTC
Created attachment 27484
reset-classify-dbg.patch

Can you please apply this patch and post the screenshot?  Also, please attach the output of "hdparm -I /dev/sda" and "lspci -nn", and a successful boot log with 2.6.34.

Thanks.
Comment 6 Dieter Plaetinck 2010-08-17 17:31:54 UTC
Linux dieter-ws-a7n8x-arch 2.6.34-ARCH #1 SMP PREEMPT Tue Aug 10 21:38:22 CEST 2010 i686 Unknown CPU Typ AuthenticAMD GNU/Linux


[root@dieter-ws-a7n8x-arch ~]# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       WDC WD360GD-00FNA0                      
	Serial Number:      WD-WMAH91444247
	Firmware Revision:  35.06K35
Standards:
	Supported: 6 5 4 
	Likely used: 6
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:   72303840
	LBA48  user addressable sectors:   72303840
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:       35304 MBytes
	device size with M = 1000*1000:       37019 MBytes (37 GB)
	cache/buffer size  = 8192 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(can be disabled)
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	DOWNLOAD_MICROCODE
	    	Power-Up In Standby feature set
	    	SET_MAX security extension
	   *	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Host-initiated interface power management
	   *	Device-initiated interface power management
Security: 
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
	not	supported: enhanced erase
HW reset results:
	CBLID- above Vih
	Device num = 0 determined by the jumper
Checksum: correct


[root@dieter-ws-a7n8x-arch ~]# lspci -nn
00:00.0 Host bridge [0600]: nVidia Corporation nForce2 IGP2 [10de:01e0] (rev c1)
00:00.1 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 1 [10de:01eb] (rev c1)
00:00.2 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 4 [10de:01ee] (rev c1)
00:00.3 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 3 [10de:01ed] (rev c1)
00:00.4 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 2 [10de:01ec] (rev c1)
00:00.5 RAM memory [0500]: nVidia Corporation nForce2 Memory Controller 5 [10de:01ef] (rev c1)
00:01.0 ISA bridge [0601]: nVidia Corporation nForce2 ISA Bridge [10de:0060] (rev a4)
00:01.1 SMBus [0c05]: nVidia Corporation nForce2 SMBus (MCP) [10de:0064] (rev a2)
00:02.0 USB Controller [0c03]: nVidia Corporation nForce2 USB Controller [10de:0067] (rev a4)
00:02.1 USB Controller [0c03]: nVidia Corporation nForce2 USB Controller [10de:0067] (rev a4)
00:02.2 USB Controller [0c03]: nVidia Corporation nForce2 USB Controller [10de:0068] (rev a4)
00:04.0 Ethernet controller [0200]: nVidia Corporation nForce2 Ethernet Controller [10de:0066] (rev a1)
00:05.0 Multimedia audio controller [0401]: nVidia Corporation nForce Audio Processing Unit [10de:006b] (rev a2)
00:06.0 Multimedia audio controller [0401]: nVidia Corporation nForce2 AC97 Audio Controler (MCP) [10de:006a] (rev a1)
00:08.0 PCI bridge [0604]: nVidia Corporation nForce2 External PCI Bridge [10de:006c] (rev a3)
00:09.0 IDE interface [0101]: nVidia Corporation nForce2 IDE [10de:0065] (rev a2)
00:0c.0 PCI bridge [0604]: nVidia Corporation nForce2 PCI Bridge [10de:006d] (rev a3)
00:0d.0 FireWire (IEEE 1394) [0c00]: nVidia Corporation nForce2 FireWire (IEEE 1394) Controller [10de:006e] (rev a3)
00:1e.0 PCI bridge [0604]: nVidia Corporation nForce2 AGP [10de:01e8] (rev c1)
01:0a.0 Ethernet controller [0200]: Intel Corporation 82541PI Gigabit Ethernet Controller [8086:107c] (rev 05)
01:0b.0 RAID bus controller [0104]: Silicon Image, Inc. SiI 3112 [SATALink/SATARaid] Serial ATA Controller [1095:3112] (rev 01)
02:01.0 Ethernet controller [0200]: 3Com Corporation 3C920B-EMB Integrated Fast Ethernet Controller [Tornado] [10b7:9201] (rev 40)
03:00.0 VGA compatible controller [0300]: nVidia Corporation NV30 [GeForce FX 5800] [10de:0302] (rev a2)


[root@dieter-ws-a7n8x-arch ~]# cat /var/log/boot*
cat: /var/log/boot*: No such file or directory

I will attach dmesg and kernel.log as attachments,
I will also recompile 2.6.35.2 with your patch and try it
Comment 7 Dieter Plaetinck 2010-08-17 17:33:30 UTC
Created attachment 27491
/var/log/dmesg.log 2.6.34
Comment 8 Dieter Plaetinck 2010-08-17 17:36:10 UTC
Created attachment 27492
/var/log/kernel.log 2.6.34
Comment 9 Dieter Plaetinck 2010-08-18 20:23:16 UTC
Created attachment 27499
boot patched 2.6.35.2, shot 1
Comment 10 Dieter Plaetinck 2010-08-18 20:24:02 UTC
Created attachment 27500
boot patched 2.6.35.2, shot 2 (/proc/interrupts)
Comment 11 Dieter Plaetinck 2010-08-18 20:26:22 UTC
Created attachment 27501
boot patched 2.6.35.2, shot 3 (kernel panic)

I haven't seen this before.  With an unpatched 2.6.35.2 I could just leave the initramfs shell and the system would reboot.  Now I get errors and a kernel panic.  Not our primary concern in this ticket, but maybe interesting to know.
Comment 12 Tejun Heo 2010-08-19 08:54:52 UTC
Created attachment 27503
reset-classify-dbg-1.patch

Strange, I thought the problem would be in the reset part but it went well and got the correct classification of the device.  It seems we'll have to follow further into probing sequence.  Can you please apply this one instead and retry?

Thanks.
Comment 13 Dieter Plaetinck 2010-08-19 09:01:10 UTC
will do, do you still want all the pictures, or do you only need the output of modprobe/modprobe -r and dmesg | egrep 'sd|sata' ?
Comment 14 Tejun Heo 2010-08-19 09:08:01 UTC
Oh, the first pic of kernel messages w/ debug messages would be enough.
Comment 15 Dieter Plaetinck 2010-08-19 11:25:06 UTC
reset-classify-dbg-1.patch does not apply to my 2.6.35.2 checkout (which i got from ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.35.2.tar.bz2):

$ patch -p1 -i reset-classify-dbg-1.patch
patching file drivers/ata/libata-core.c
Hunk #1 FAILED at 5873.
1 out of 1 hunk FAILED -- saving rejects to file drivers/ata/libata-core.c.rej
patching file drivers/ata/libata-eh.c
Hunk #1 succeeded at 2739 (offset -1 lines).
Hunk #2 succeeded at 2749 (offset -1 lines).
Hunk #3 succeeded at 2928 (offset -2 lines).
Hunk #4 succeeded at 2966 (offset -2 lines).
Hunk #5 succeeded at 3016 (offset -2 lines).
$ cat drivers/ata/libata-core.c.rej
--- drivers/ata/libata-core.c
+++ drivers/ata/libata-core.c
@@ -5873,7 +5873,7 @@
 
 		ehi->probe_mask |= ATA_ALL_DEVICES;
 		ehi->action |= ATA_EH_RESET;
-		ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
+		ehi->flags |= ATA_EHI_NO_AUTOPSY/* | ATA_EHI_QUIET*/;
 
 		ap->pflags &= ~ATA_PFLAG_INITIALIZING;
 		ap->pflags |= ATA_PFLAG_LOADING;

I also cloned the 2.6.35.y git repo (from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.35.y.git), and I can't find the 39325fd revision mentioned in your patch.

Dieter
Comment 16 Tejun Heo 2010-08-19 11:27:22 UTC
Sorry, it was against the current devel branch.  You can just comment out ATA_EHI_QUIET in async_port_probe().

Thanks.
Comment 17 Dieter Plaetinck 2010-08-19 13:20:10 UTC
Ok,
the output is exactly the same as with the previous patch.  The new messages are not triggered.  Only the "online/offline after reset" ones, just like before.
Comment 18 Tejun Heo 2010-08-19 13:22:55 UTC
Hmm... if you have commented out ATA_EHI_QUIET, there should at least be extra messages like "hardresetting link", are they missing?
Comment 19 Dieter Plaetinck 2010-08-19 13:32:08 UTC
I changed the line in async_port_probe() in drivers/ata/libata-core.c to be just:
ehi->flags |= ATA_EHI_NO_AUTOPSY; (line 6029 in my 2.6.35.2 code)

all the other changes have been applied by your patch.
To verify:
$ grep XXX drivers/ata/libata-eh.c
					"XXX online after reset, class=%d\n",
					"XXX offline after reset, class=%d\n",
		ata_dev_printk(dev, KERN_INFO, "XXX reval&attach: %d %d\n",
				ata_dev_printk(dev, KERN_INFO, "XXX ata_dev_read_id rc=%d\n", rc);
		ata_dev_printk(dev, KERN_INFO, "XXX ata_dev_configure rc=%d\n", rc);

I'm pretty sure I compiled, installed and ran the kernel correctly.
Yes, I only see the same messages as with the previous patch
Comment 20 Tejun Heo 2010-08-19 13:36:54 UTC
If ATA_EHI_QUIET isn't there, the hardresetting link message has to be there before the link on/offline messages.  Can you please double check that the correct patched kernel and modules are being used?  If unsure, adding a printk of a short string in the module init path is usually a pretty good way of verifying that the actual patched kernel and drivers are in use.  E.g. you can add a simple printk in async_port_probe() and check that it gets printed out.

Thanks.
Comment 21 Dieter Plaetinck 2010-08-19 13:46:55 UTC
Ok i will verify with a DPRINTK() message in async_port_probe().

You say there should be a hardresetting link message.  did you mean 'reval&attach' or 'ata_dev_configure'? because your patch doesn't contain 'hardreset' or something like that
Comment 22 Tejun Heo 2010-08-19 13:53:09 UTC
Please use printk("I'm here\n"); or something like that.  DPRINTK() is compiled out unless the debug macro is set.  The message I'm talking about is 

	if (verbose)
		ata_link_printk(link, KERN_INFO, "%s resetting link\n",
				reset == softreset ? "soft" : "hard");

in drivers/ata/libata-eh.c::ata_eh_reset().  @verbose is set if ATA_EHI_QUIET is clear, so...

Thanks.
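
For readers following along, the relationship described in this comment can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the libata source: the flag bit values are illustrative placeholders, eh_verbose() and format_reset_msg() are hypothetical helpers invented for this example, and the DPRINTK() stand-in merely imitates a macro that compiles to nothing unless a debug macro (called ATA_DEBUG here) is defined.

```c
#include <stdio.h>

/* Illustrative bit values only; the real ones are defined in libata's headers. */
#define ATA_EHI_NO_AUTOPSY (1u << 2)
#define ATA_EHI_QUIET      (1u << 3)

/* Stand-in for a debug macro that is compiled out unless ATA_DEBUG is defined. */
#ifdef ATA_DEBUG
#define DPRINTK(...) fprintf(stderr, __VA_ARGS__)
#else
#define DPRINTK(...) do { } while (0)
#endif

/* Hypothetical helper: reset messages are verbose only when QUIET is clear. */
int eh_verbose(unsigned int ehi_flags)
{
    return !(ehi_flags & ATA_EHI_QUIET);
}

/*
 * Hypothetical helper: format the reset message into buf.
 * Returns the message length, or 0 (and an empty string) when quiet.
 */
int format_reset_msg(unsigned int ehi_flags, int hard, char *buf, size_t len)
{
    /* Emits nothing in a normal build, like a compiled-out debug print. */
    DPRINTK("format_reset_msg: flags=%#x\n", ehi_flags);

    if (!eh_verbose(ehi_flags)) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "%s resetting link", hard ? "hard" : "soft");
}
```

Calling format_reset_msg() with ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET formats nothing, while dropping ATA_EHI_QUIET (the change suggested for async_port_probe()) produces the reset message. That is why its absence from the log pointed at an unpatched module being loaded.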
Comment 23 Dieter Plaetinck 2010-08-19 15:51:13 UTC
My mistake.  I had forgotten to rebuild my initramfs.
I have 2 images with the messages you're looking for.  The first one is taken without me modprobing anything (I guess when the module gets loaded automatically).  Unfortunately I could not get all output because the scrollback buffer didn't go up further.  The 2nd image shows (different) output when I manually modprobe.
Comment 24 Dieter Plaetinck 2010-08-19 15:53:17 UTC
Created attachment 27511
boot patched 2.6.35.2, reset-classify-dbg-1.patch
Comment 25 Dieter Plaetinck 2010-08-19 15:55:02 UTC
Created attachment 27512
boot patched 2.6.35.2, reset-classify-dbg-1.patch, manual probing
Comment 26 Tejun Heo 2010-08-19 16:13:15 UTC
Created attachment 27513
nodev-hint-dbg.patch

Okay, ata_dev_read_id() is failing with -ENOENT.  That's the mechanism to skip ghost devices on controllers w/o link registers.  This is the first time I see it triggering on sata_sil.  Can you please boot with the attached patch applied and post the output?  Let's see which one is triggering.

Thanks.
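
The effect of that -ENOENT result can be illustrated with a small userspace model. It is a sketch, not the libata implementation: enum probe_action, classify_read_id_result() and count_attached() are hypothetical names, and the assumption that every other error goes to a retry path is a simplification made for the example.

```c
#include <errno.h>

/* Hypothetical outcome of one IDENTIFY attempt during probing. */
enum probe_action {
    PROBE_ATTACH,  /* read-id succeeded: configure and attach the device */
    PROBE_SKIP,    /* -ENOENT: treat the slot as a ghost device, no retry */
    PROBE_RETRY    /* any other error: assumed to go through error handling */
};

/* Map a read-id return code to what the probe path does next. */
enum probe_action classify_read_id_result(int rc)
{
    if (rc == 0)
        return PROBE_ATTACH;
    if (rc == -ENOENT)
        return PROBE_SKIP;
    return PROBE_RETRY;
}

/* Count the devices that end up attached, given one result per port. */
int count_attached(const int *rc_per_port, int nr_ports)
{
    int attached = 0;

    for (int i = 0; i < nr_ports; i++) {
        if (classify_read_id_result(rc_per_port[i]) == PROBE_ATTACH)
            attached++;
    }
    return attached;
}
```

In this model a disk whose read-id wrongly returns -ENOENT is dropped without a retry, so a disk on the next port becomes the first attached device. That matches the symptom reported later in this thread, where only the second disk appears and takes /dev/sda.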
Comment 27 Dieter Plaetinck 2010-08-19 16:20:58 UTC
Only this patch right, or in combination with the previous one?

Fwiw i never had problems with this sata controller or the hard disks before, works with win xp as well.
Comment 28 Alan Swanson 2010-08-20 20:24:45 UTC
Having probably the same issue with same motherboard on 2.6.35(.2) where the first SATA disk is not found but only second SATA disk is found and hence assigned /dev/sda instead of /dev/sdb.

The first SATA disk has root filesystem and as equivalent partition on second SATA disk is swap I get a boot panic for no root filesystem found. Currently waiting on null modem to collect output as boot_delay not effective in helping capture console before scrolled.
Comment 29 Dieter Plaetinck 2010-08-21 06:34:48 UTC
I can confirm what Alan said:
so my main disk is a 37GB sata disk. I now added a 74GB sata disk in the second sata port.  Linux only makes a /dev/sda, which points to the 2nd disk.

I will upload a screenshot of a 2.6.35.2 boot with reset-classify-dbg-1.patch applied and with the 2nd disk connected.   Here is the hdparm info of the 2nd disk (now /dev/sdb as i'm running 2.6.34).  I still need to do the experiment with the nodev-hint-dbg.patch patch, do you want me to do it with the 2nd disk attached or not?


# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
	Model Number:       WDC WD740GD-00FLA1                      
	Serial Number:      WD-WMAKE1663221
	Firmware Revision:  27.08D27
Standards:
	Supported: 6 5 4 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  145226112
	LBA48  user addressable sectors:  145226112
	Logical/Physical Sector size:           512 bytes
	device size with M = 1024*1024:       70911 MBytes
	device size with M = 1000*1000:       74355 MBytes (74 GB)
	cache/buffer size  = 8192 KBytes (type=DualPortCache)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 16
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Release interrupt
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	   *	READ/WRITE_DMA_QUEUED
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	SET_MAX security extension
	    	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	Gen1 signaling speed (1.5Gb/s)
	   *	Host-initiated interface power management
	   *	Device-initiated interface power management
	   *	SMART Command Transport (SCT) feature set
	   *	SCT Long Sector Access (AC1)
	   *	SCT LBA Segment Access (AC2)
	   *	SCT Error Recovery Control (AC3)
	   *	SCT Features Control (AC4)
Security: 
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
	not	supported: enhanced erase
Checksum: correct
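
The two "device size" lines in the hdparm output follow directly from the LBA sector count and the 512-byte sector size. The sketch below reproduces that arithmetic; size_mib() and size_mb() are hypothetical helper names, and truncating integer division is an assumption that happens to match the figures printed for both disks in this thread.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u  /* "Logical/Physical Sector size: 512 bytes" */

/* Size with M = 1024*1024, truncated to whole units. */
uint64_t size_mib(uint64_t lba_sectors)
{
    return lba_sectors * SECTOR_BYTES / (1024u * 1024u);
}

/* Size with M = 1000*1000, truncated to whole units. */
uint64_t size_mb(uint64_t lba_sectors)
{
    return lba_sectors * SECTOR_BYTES / (1000u * 1000u);
}
```

For the WD360GD (72303840 sectors) this gives 35304 and 37019, and for the WD740GD (145226112 sectors) 70911 and 74355, the same values hdparm reports for the 37 GB and 74 GB disks.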
Comment 30 Dieter Plaetinck 2010-08-21 06:36:46 UTC
Created attachment 27561
boot patched 2.6.35.2, reset-classify-dbg-1.patch, 74GB disk in port 2
Comment 31 Tejun Heo 2010-08-23 10:29:39 UTC
Sorry, I was offline for the past few days.  Yeah, applying the nodev-hint-dbg.patch alone should be enough.  Can you please post screenshot w/ the patch?

Thanks.
Comment 32 Dieter Plaetinck 2010-08-24 19:54:28 UTC
okay, here we go again.
a pic of modprobe -r, one of modprobe, with 2 disks attached.
Comment 33 Dieter Plaetinck 2010-08-24 19:56:52 UTC
Created attachment 27831
boot patched 2.6.35.2, nodev-hint-dbg.patch, modprobe -r
Comment 34 Dieter Plaetinck 2010-08-24 19:57:48 UTC
Created attachment 27841
boot patched 2.6.35.2, nodev-hint-dbg.patch, modprobe
Comment 35 Dieter Plaetinck 2010-09-02 08:33:03 UTC
So, where are we at?  Did my last screenshots help?  Can I help?
Comment 36 Tejun Heo 2010-09-09 15:15:07 UTC
Root cause bisected and fix patch posted:

 http://thread.gmane.org/gmane.linux.ide/47506

Thanks.
Comment 37 Alan Swanson 2010-09-09 17:06:04 UTC
First patch from linux-ide mailing list adding sff_check_status to libata-sff.c also resolves issue.
Comment 38 Dieter Plaetinck 2010-09-21 20:07:16 UTC
2.6.35.5 has this patched, and works for me.  Thanks all