Bug 11445 - ata3: COMRESET failed (errno=-16)
Summary: ata3: COMRESET failed (errno=-16)
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Serial ATA
Hardware: All Linux
Importance: P1 blocking
Assignee: Tejun Heo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-28 08:29 UTC by François
Modified: 2012-05-22 14:14 UTC

See Also:
Kernel Version: 2.6.27-1.2
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
nv-nohrst.patch (1.41 KB, patch)
2008-08-29 02:49 UTC, Tejun Heo
hdparm -I and lspci -nn for a kernel w/o SATA problem (8.24 KB, application/octet-stream)
2010-01-12 15:38 UTC, John Scott
boot log w/ libata.force=nohrst (44.18 KB, text/plain)
2010-01-14 09:26 UTC, Tejun Heo
w/ kernel param "sata_nv.swncq=0" (43.98 KB, text/plain)
2010-01-14 09:26 UTC, Tejun Heo
w/ kernel param "sata_nv.swncq=0 libata.force=nohrst" (44.11 KB, text/plain)
2010-01-14 09:27 UTC, Tejun Heo
and without any param (47.34 KB, text/plain)
2010-01-14 09:28 UTC, Tejun Heo
sata_nv-oh-my-god.patch (1.37 KB, patch)
2010-01-22 07:09 UTC, Tejun Heo
SL112-x86_64-bko11445_dbg.iso boot.msg (47.28 KB, application/octet-stream)
2010-01-26 17:35 UTC, John Scott
kISO (2) boot.msg results (48.72 KB, application/octet-stream)
2010-02-17 23:43 UTC, John Scott
Install Screen (73.04 KB, image/jpeg)
2010-02-17 23:51 UTC, John Scott
Boot.msg file from 04/03/2010 (53.56 KB, application/octet-stream)
2010-04-04 15:18 UTC, John Scott
Boot log + error messages (189.28 KB, text/plain)
2010-08-31 07:57 UTC, Thomas Pilarski

Description François 2008-08-28 08:29:49 UTC
Latest working kernel version: 2.6.24
Earliest failing kernel version:2.6.26
Distribution:Ubuntu
Hardware Environment:Bug Filing FAQ is 404 not found I don't know what I have to type
Software Environment:Bug Filing FAQ is 404 not found I don't know what I have to type
Problem Description:At bootup I end up in busybox and I see the following message on the top of the screen "Gave up waiting for root device"
Actually I think that my hard drive "falls asleep" just after leaving grub. When I'm in the busybox I need to unplug my hard drive (serial ata) and to plug it again so that I can hear it restarting. After doing that I type exit in the busybox and the boot process continues normally.
Dmesg shows me that:

[ 9.672007] ata3: link is slow to respond, please be patient (ready=0)
[ 14.320007] ata3: COMRESET failed (errno=-16)
[ 19.680006] ata3: link is slow to respond, please be patient (ready=0)
[ 24.328007] ata3: COMRESET failed (errno=-16)
[ 29.688007] ata3: link is slow to respond, please be patient (ready=0)
[ 59.092004] ata3: COMRESET failed (errno=-16)
[ 59.092004] ata3: limiting SATA link speed to 1.5 Gbps
[ 59.688009] ata3: SATA link down (SStatus 0 SControl 310)
[ 60.164017] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 60.196367] ata4.00: HPA detected: current 160834367, native 160836480
[ 60.196371] ata4.00: ATA-6: HDS722580VLSA80, V32OA6MA, max UDMA/100
[ 60.196373] ata4.00: 160834367 sectors, multi 16: LBA48

The COMRESET failures continue until I unplug the hard drive and plug it in again.
I recently tried other distributions with the same kernel (Debian and the pmagic live CD) and got the same error, so I think this bug is in the kernel.
I should also mention that it's a SATA II hard drive (3 Gbps) on an nForce 3 SATA I controller (1.5 Gbps). It appears that the controller does not fully support the hard drive (or the drive's SATA I backward compatibility is malfunctioning, I don't know), but with older kernels it always worked without any problem.
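
For readers following the log above, the SStatus and SControl numbers can be split into their fields. This is a minimal sketch, assuming the values are printed in hexadecimal and follow the standard SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function names and the description tables are illustrative, not part of libata.

```python
# Assumed field layout (standard SATA SStatus/SControl registers):
# bits 0-3 DET, bits 4-7 SPD, bits 8-11 IPM. Log values are treated as hex.

DET_STATUS = {
    0x0: "no device detected, phy offline",
    0x1: "device detected, phy communication not established",
    0x3: "device detected, phy communication established",
    0x4: "phy offline (interface disabled or in loopback)",
}
SPEED = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}


def split_fields(value: int) -> dict:
    """Split an SStatus or SControl value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,
        "spd": (value >> 4) & 0xF,
        "ipm": (value >> 8) & 0xF,
    }


def describe_sstatus(text: str) -> str:
    """Describe an SStatus value given as the hex string found in the log."""
    f = split_fields(int(text, 16))
    det = DET_STATUS.get(f["det"], "reserved")
    spd = SPEED.get(f["spd"], "reserved")
    return f"DET={f['det']} ({det}), SPD={f['spd']} ({spd}), IPM={f['ipm']}"


if __name__ == "__main__":
    print(describe_sstatus("113"))        # ata4, link up
    print(describe_sstatus("0"))          # ata3, link down
    print(split_fields(int("310", 16)))   # ata3 SControl after the speed limit
    print(split_fields(int("300", 16)))   # ata4 SControl
```

Under these assumptions, the only difference between SControl 310 on ata3 and SControl 300 on ata4 is SPD=1, which is consistent with the "limiting SATA link speed to 1.5 Gbps" line just before it.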

I'm running Ubuntu intrepid ibex alpha up-to-date, kernel 2.6.27-1.2 (I recently updated from 2.6.26 to 2.6.27 but the problem is the same)

Thanks

I don't know how to attach files here so if you want the dmesg.log etc. files I reported the bug on launchpad where you'll find these files.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/256637
Comment 1 Andrew Morton 2008-08-28 12:31:36 UTC
Marked as a regression

Reassigned to ATA

It's sata_nv.  dmesg is here:  http://launchpadlibrarian.net/16703605/dmesg.log
Comment 2 Robert Hancock 2008-08-28 17:17:32 UTC
Don't think anything's changed at the sata_nv level that would cause this. Tejun, there were some reset changes in libata recently, weren't there?
Comment 3 Tejun Heo 2008-08-29 02:41:40 UTC
Yes, libata now defaults to hardreset and early nv controllers seem to have problems with it.  I sent the test patches a few times but haven't gotten enough response to commit them.  So, let's do one more round of testing.
Comment 4 Tejun Heo 2008-08-29 02:49:30 UTC
Created attachment 17525 [details]
nv-nohrst.patch

Can you test whether the attached patch fixes the problem?  Thanks.
Comment 5 François 2008-08-29 03:54:24 UTC
(In reply to comment #4)
> Created an attachment (id=17525) [details]
> nv-nohrst.patch
> 
> Can you test whether the attached patch fixes the problem?  Thanks.
> 

Could you explain the procedure to follow in order to apply the patch?
Comment 6 Tejun Heo 2008-08-29 03:59:51 UTC
First, build your own 2.6.26.3 and boot the system with it and check everything is as expected.  Then, cd to the source tree and apply the patch by executing "patch -p1 < nv-nohrst.patch" and build the kernel again (just running make again will do) and test the new kernel.
Comment 7 François 2008-08-29 23:59:47 UTC
The patch fixes the problem :D Thank you
Do you need some more info to ensure the patch worked as expected?
Will this patch be included in the latest kernel?
For info I upgraded to the 2.6.27 and the problem was the same. This is the one I patched.
Comment 8 Tejun Heo 2008-08-30 01:32:30 UTC
Yes, already pending for 2.6.27 and I'll send it to -stable once it gets accepted upstream.  Thanks.
Comment 9 François 2008-10-12 04:00:04 UTC
This bug is back after my recent update from 2.6.27-4 to 2.6.27-7.
I don't know if it's due to a change upstream or a change specific to Ubuntu.
2.6.27-4 worked fine but 2.6.27-6 and 2.6.27-7 suffer from the same bug.
Has the patch been removed? Were there changes in sata_nv?
Comment 10 Tejun Heo 2008-10-12 22:20:47 UTC
The problem is still not completely resolved.  For generic and ck804, it works but something is still wrong with nf2/3 and I'm still working on it.  Please wait a bit.  Thanks.
Comment 11 Anssi Saari 2008-10-21 12:41:58 UTC
I just wanted to point out that I have this exact problem on ck804 with kernels 2.6.26 and 2.6.27.1, so it's not exactly working there either. The patch above works for me too, except that I then got a whole bunch of syslog messages just saying ata1: EH complete and ata2: EH complete. Those ports are empty; my drives are on ata3 and ata4.
Comment 12 Tejun Heo 2008-10-21 19:43:14 UTC
I just bought a nf2/3 board and am waiting for it to arrive.  Please give me a few more days.  Thanks.
Comment 13 Jehan Bruggeman 2009-01-05 16:15:22 UTC
Hi, has anything new happened concerning this bug? If I can be of any help, I have a system on which this bug occurs.

I already sent my dmesg and lspci output to the launchpad.net bug report mentioned by François ( https://bugs.launchpad.net/debian/+source/linux/+bug/256637 ).
Comment 14 Robert Hancock 2009-01-05 16:23:19 UTC
Which attachments in that bug report are yours? It seems to be discussing what are likely several unrelated problems. This bug is dealing with sata_nv and nForce2/3 chipsets.
Comment 15 Jehan Bruggeman 2009-01-05 16:34:05 UTC
Indeed, I forgot to specify that ;-). My username is "sym_zo" and my comment the following : https://bugs.launchpad.net/ubuntu/+source/linux/+bug/256637/comments/36

uploaded files : http://launchpadlibrarian.net/20681738/lspci_and_dmidecode.zip
Comment 16 Robert Hancock 2009-01-05 17:00:49 UTC
So your machine is nForce4? That seems odd, I have that chipset and I've never run into that problem with any Fedora 2.6.27 kernel or with vanilla 2.6.28. Can you test with vanilla 2.6.27.10 or 2.6.28?
Comment 17 Jehan Bruggeman 2009-01-05 17:27:35 UTC
Ok, I'll try it out tomorrow (it's 02:30 AM in my timezone^^). I suppose there isn't an easier way than downloading from kernel.org and compiling?
Comment 18 Jehan Bruggeman 2009-01-09 03:31:00 UTC
Sorry for the delay. I tested with a vanilla 2.6.28: I still get the error (and a busybox prompt). The latest kernel installed on my machine that actually boots is still 2.6.24.
Comment 19 Tejun Heo 2009-01-13 21:39:36 UTC
Can you please attach the failing log here?
Comment 20 Anssi Saari 2009-01-21 00:51:20 UTC
(In reply to comment #16)
> So your machine is nForce4? That seems odd, I have that chipset and I've never
> run into that problem with any Fedora 2.6.27 kernel or with vanilla 2.6.28. Can
> you test with vanilla 2.6.27.10 or 2.6.28?

I also have nForce4 and am using sata_nv, but for me this problem no longer happens in 2.6.27.8 or 2.6.28. As I recall, the problem first showed up in 2.6.26 and was also present in some earlier 2.6.27 kernels, that is, earlier than 2.6.27.8.
Comment 21 Jehan Bruggeman 2009-01-22 17:41:44 UTC
Tejun Heo >> Sorry for the delay. I'll do that ASAP. 
When you say "failing log", I suppose it is /var/log/dmesg you want?
Comment 22 Tejun Heo 2009-01-22 18:12:29 UTC
Yeap, dmesg output after the failure.  Preferably w/ printk timestamp turned on.
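
Timestamps make it possible to measure how long each reset attempt waited before giving up. The following is a minimal sketch, assuming log lines in the format quoted in the description; the function names are illustrative and the sample text is copied from that excerpt. It also maps the errno value to its symbolic name using Python's standard `errno` table.

```python
import errno
import re

# Matches lines such as "[ 14.320007] ata3: COMRESET failed (errno=-16)".
# The line format is an assumption based on the excerpt in the description.
RESET_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<port>ata\d+): "
    r"COMRESET failed \(errno=(?P<err>-?\d+)\)"
)


def reset_failures(log_text: str) -> list:
    """Return (timestamp, port, errno name) for each COMRESET failure line."""
    found = []
    for line in log_text.splitlines():
        m = RESET_FAIL.search(line)
        if m:
            code = abs(int(m.group("err")))
            name = errno.errorcode.get(code, "unknown")
            found.append((float(m.group("ts")), m.group("port"), name))
    return found


def gaps(failures: list) -> list:
    """Seconds between consecutive failures, rounded to whole seconds."""
    times = [ts for ts, _port, _name in failures]
    return [round(b - a) for a, b in zip(times, times[1:])]


SAMPLE = """\
[ 9.672007] ata3: link is slow to respond, please be patient (ready=0)
[ 14.320007] ata3: COMRESET failed (errno=-16)
[ 19.680006] ata3: link is slow to respond, please be patient (ready=0)
[ 24.328007] ata3: COMRESET failed (errno=-16)
[ 29.688007] ata3: link is slow to respond, please be patient (ready=0)
[ 59.092004] ata3: COMRESET failed (errno=-16)
"""

if __name__ == "__main__":
    fails = reset_failures(SAMPLE)
    print(fails)
    print(gaps(fails))
```

On Linux, errno 16 is EBUSY, so each failure here means the link was still reporting busy when the reset attempt timed out.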
Comment 23 John Scott 2010-01-12 15:38:06 UTC
Created attachment 24521 [details]
hdparm -I and lspci -nn for a kernel w/o SATA problem

As you requested TJ, here's the info on the last kernel I've been able to successfully install. All later kernels are not able to recognize my sata drives.
Comment 24 Tejun Heo 2010-01-14 09:26:21 UTC
Created attachment 24553 [details]
boot log w/ libata.force=nohrst

Logs John sent me via email.  Attaching here for later reference.
Comment 25 Tejun Heo 2010-01-14 09:26:54 UTC
Created attachment 24554 [details]
w/ kernel param "sata_nv.swncq=0"
Comment 26 Tejun Heo 2010-01-14 09:27:20 UTC
Created attachment 24555 [details]
w/ kernel param "sata_nv.swncq=0 libata.force=nohrst"
Comment 27 Tejun Heo 2010-01-14 09:28:02 UTC
Created attachment 24556 [details]
and without any param
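
The four boot logs above differ only in the kernel command-line parameters used. When comparing such logs it helps to confirm which parameters a given boot actually received. This is a minimal sketch, assuming a simple space-separated command line without quoted values; the helper names and the example line are hypothetical, and only `sata_nv.swncq=0` and `libata.force=nohrst` come from this report.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Quoted values containing spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def libata_force(params: dict) -> list:
    """Return the comma-separated libata.force entries as a list."""
    value = params.get("libata.force")
    return value.split(",") if value else []


if __name__ == "__main__":
    # Hypothetical command line; on a running system the real one can be
    # read from /proc/cmdline.
    line = "root=/dev/sda1 ro quiet sata_nv.swncq=0 libata.force=nohrst"
    params = parse_cmdline(line)
    print(params["sata_nv.swncq"], libata_force(params))
```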
Comment 28 Tejun Heo 2010-01-14 09:33:14 UTC
The biggest related change since 2.6.24 would be the restructuring of reset operations, which happened between 2.6.24 and 2.6.25.  Our current sequence should be basically the same as before.  I currently have no idea what the difference could be.  What is the mainboard?  Can you please post the output of dmidecode?

Thanks.
Comment 29 John Scott 2010-01-20 03:43:22 UTC
TJ, this happens on both my Windows Vista 64 Bit Ultimate box and the Linux box.

Here's the dmidecode from my Linux Box


john@johnsubuntu:~$ sudo dmidecode
[sudo] password for john: 
# dmidecode 2.9
SMBIOS 2.4 present.
38 structures occupying 1145 bytes.
Table at 0x000F0000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
	Vendor: Phoenix Technologies, LTD
	Version: 6.00 PG
	Release Date: 07/11/2007
	Address: 0xE0000
	Runtime Size: 128 kB
	ROM Size: 512 kB
	Characteristics:
		ISA is supported
		PCI is supported
		PNP is supported
		APM is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		Boot from CD is supported
		Selectable boot is supported
		BIOS ROM is socketed
		EDD is supported
		5.25"/360 KB floppy services are supported (int 13h)
		5.25"/1.2 MB floppy services are supported (int 13h)
		3.5"/720 KB floppy services are supported (int 13h)
		3.5"/2.88 MB floppy services are supported (int 13h)
		Print screen service is supported (int 5h)
		8042 keyboard services are supported (int 9h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		CGA/mono video services are supported (int 10h)
		ACPI is supported
		USB legacy is supported
		LS-120 boot is supported
		ATAPI Zip drive boot is supported
		BIOS boot specification is supported
		Targeted content distribution is supported

Handle 0x0001, DMI type 1, 27 bytes
System Information
	Manufacturer: NVIDIA
	Product Name: NFORCE 680i LT SLI
	Version: 2
	Serial Number: 1
	UUID: 6A97600D-034B-0400-0000-000000000000
	Wake-up Type: Power Switch
	SKU Number:  
	Family:  

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
	Manufacturer: NVIDIA
	Product Name: NFORCE 680i LT SLI
	Version: 2
	Serial Number: 1

Handle 0x0003, DMI type 3, 17 bytes
Chassis Information
	Manufacturer: NVIDIA
	Type: Desktop
	Lock: Not Present
	Version: NFORCE 680i LT SLI
	Serial Number:  
	Asset Tag:  
	Boot-up State: Unknown
	Power Supply State: Unknown
	Thermal State: Unknown
	Security Status: Unknown
	OEM Information: 0x00000000

Handle 0x0004, DMI type 4, 35 bytes
Processor Information
	Socket Designation: Socket 775
	Type: Central Processor
	Family: Other
	Manufacturer: Intel
	ID: FB 06 00 00 FF FB EB BF
	Version: Intel(R) Core(TM)2 Quad CPU
	Voltage: 1.7 V
	External Clock: 336 MHz
	Max Speed: 200 MHz
	Current Speed: 3024 MHz
	Status: Populated, Enabled
	Upgrade: ZIF Socket
	L1 Cache Handle: 0x000A
	L2 Cache Handle: 0x000B
	L3 Cache Handle: Not Provided
	Serial Number:  
	Asset Tag:  
	Part Number:  

Handle 0x0005, DMI type 5, 24 bytes
Memory Controller Information
	Error Detecting Method: None
	Error Correcting Capabilities:
		None
	Supported Interleave: One-way Interleave
	Current Interleave: One-way Interleave
	Maximum Memory Module Size: 32 MB
	Maximum Total Memory Size: 128 MB
	Supported Speeds:
		70 ns
		60 ns
	Supported Memory Types:
		Standard
		EDO
	Memory Module Voltage: 5.0 V
	Associated Memory Slots: 4
		0x0006
		0x0007
		0x0008
		0x0009
	Enabled Error Correcting Capabilities: None

Handle 0x0006, DMI type 6, 12 bytes
Memory Module Information
	Socket Designation: A0
	Bank Connections: 0 1
	Current Speed: 10 ns
	Type: Other
	Installed Size: 1024 MB (Double-bank Connection)
	Enabled Size: 1024 MB (Double-bank Connection)
	Error Status: OK

Handle 0x0007, DMI type 6, 12 bytes
Memory Module Information
	Socket Designation: A1
	Bank Connections: 2 3
	Current Speed: 10 ns
	Type: Other
	Installed Size: 1024 MB (Double-bank Connection)
	Enabled Size: 1024 MB (Double-bank Connection)
	Error Status: OK

Handle 0x0008, DMI type 6, 12 bytes
Memory Module Information
	Socket Designation: A2
	Bank Connections: 4 5
	Current Speed: 10 ns
	Type: Other
	Installed Size: 1024 MB (Double-bank Connection)
	Enabled Size: 1024 MB (Double-bank Connection)
	Error Status: OK

Handle 0x0009, DMI type 6, 12 bytes
Memory Module Information
	Socket Designation: A3
	Bank Connections: 6 7
	Current Speed: 10 ns
	Type: Other
	Installed Size: 1024 MB (Double-bank Connection)
	Enabled Size: 1024 MB (Double-bank Connection)
	Error Status: OK

Handle 0x000A, DMI type 7, 19 bytes
Cache Information
	Socket Designation: Internal Cache
	Configuration: Enabled, Not Socketed, Level 1
	Operational Mode: Write Back
	Location: Internal
	Installed Size: 32 KB
	Maximum Size: 32 KB
	Supported SRAM Types:
		Synchronous
	Installed SRAM Type: Synchronous
	Speed: Unknown
	Error Correction Type: None
	System Type: Instruction
	Associativity: 8-way Set-associative

Handle 0x000B, DMI type 7, 19 bytes
Cache Information
	Socket Designation: External Cache
	Configuration: Enabled, Not Socketed, Level 2
	Operational Mode: Write Back
	Location: External
	Installed Size: 4096 KB
	Maximum Size: 4096 KB
	Supported SRAM Types:
		Synchronous
	Installed SRAM Type: Synchronous
	Speed: Unknown
	Error Correction Type: None
	System Type: Unified
	Associativity: 8-way Set-associative

Handle 0x000C, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: PRIMARY IDE
	Internal Connector Type: On Board IDE
	External Reference Designator: Not Specified
	External Connector Type: None
	Port Type: Other

Handle 0x000D, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: FDD
	Internal Connector Type: On Board Floppy
	External Reference Designator: Not Specified
	External Connector Type: None
	Port Type: 8251 FIFO Compatible

Handle 0x000E, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: COM1
	Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
	External Reference Designator:  
	External Connector Type: DB-9 male
	Port Type: Serial Port 16450 Compatible

Handle 0x000F, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Keyboard
	Internal Connector Type: PS/2
	External Reference Designator:  
	External Connector Type: PS/2
	Port Type: Keyboard Port

Handle 0x0010, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: PS/2 Mouse
	Internal Connector Type: PS/2
	External Reference Designator:  
	External Connector Type: PS/2
	Port Type: Mouse Port

Handle 0x0011, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Not Specified
	Internal Connector Type: None
	External Reference Designator: USB0
	External Connector Type: Other
	Port Type: USB

Handle 0x0012, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Not Specified
	Internal Connector Type: None
	External Reference Designator: USB1
	External Connector Type: Other
	Port Type: USB

Handle 0x0013, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Not Specified
	Internal Connector Type: None
	External Reference Designator: USB2
	External Connector Type: Other
	Port Type: USB

Handle 0x0014, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Not Specified
	Internal Connector Type: None
	External Reference Designator: USB3
	External Connector Type: Other
	Port Type: USB

Handle 0x0015, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Not Specified
	Internal Connector Type: None
	External Reference Designator: USB4
	External Connector Type: Other
	Port Type: USB

Handle 0x0016, DMI type 8, 9 bytes
Port Connector Information
	Internal Reference Designator: Not Specified
	Internal Connector Type: None
	External Reference Designator: USB5
	External Connector Type: Other
	Port Type: USB

Handle 0x0017, DMI type 9, 13 bytes
System Slot Information
	Designation: PCI0
	Type: 32-bit PCI
	Current Usage: Available
	Length: Long
	ID: 1
	Characteristics:
		5.0 V is provided
		PME signal is supported

Handle 0x0018, DMI type 9, 13 bytes
System Slot Information
	Designation: PCI1
	Type: 32-bit PCI
	Current Usage: Available
	Length: Long
	ID: 2
	Characteristics:
		5.0 V is provided
		PME signal is supported

Handle 0x0019, DMI type 13, 22 bytes
BIOS Language Information
	Installable Languages: 3
		n|US|iso8859-1
		n|US|iso8859-1
		r|CA|iso8859-1
	Currently Installed Language: n|US|iso8859-1

Handle 0x001A, DMI type 16, 15 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: None
	Maximum Capacity: 2 GB
	Error Information Handle: Not Provided
	Number Of Devices: 4

Handle 0x001B, DMI type 17, 27 bytes
Memory Device
	Array Handle: 0x001A
	Error Information Handle: Not Provided
	Total Width: 128 bits
	Data Width: 128 bits
	Size: 1024 MB
	Form Factor: DIMM
	Set: None
	Locator: A0
	Bank Locator: Bank0/1
	Type: DRAM
	Type Detail: None
	Speed: 798 MHz (1.3 ns)
	Manufacturer: None
	Serial Number: None
	Asset Tag: None
	Part Number: None

Handle 0x001C, DMI type 17, 27 bytes
Memory Device
	Array Handle: 0x001A
	Error Information Handle: Not Provided
	Total Width: 128 bits
	Data Width: 128 bits
	Size: 1024 MB
	Form Factor: DIMM
	Set: None
	Locator: A1
	Bank Locator: Bank2/3
	Type: DRAM
	Type Detail: None
	Speed: 798 MHz (1.3 ns)
	Manufacturer: None
	Serial Number: None
	Asset Tag: None
	Part Number: None

Handle 0x001D, DMI type 17, 27 bytes
Memory Device
	Array Handle: 0x001A
	Error Information Handle: Not Provided
	Total Width: 128 bits
	Data Width: 128 bits
	Size: 1024 MB
	Form Factor: DIMM
	Set: None
	Locator: A2
	Bank Locator: Bank4/5
	Type: DRAM
	Type Detail: None
	Speed: 798 MHz (1.3 ns)
	Manufacturer: None
	Serial Number: None
	Asset Tag: None
	Part Number: None

Handle 0x001E, DMI type 17, 27 bytes
Memory Device
	Array Handle: 0x001A
	Error Information Handle: Not Provided
	Total Width: 128 bits
	Data Width: 128 bits
	Size: 1024 MB
	Form Factor: DIMM
	Set: None
	Locator: A3
	Bank Locator: Bank6/7
	Type: DRAM
	Type Detail: None
	Speed: 798 MHz (1.3 ns)
	Manufacturer: None
	Serial Number: None
	Asset Tag: None
	Part Number: None

Handle 0x001F, DMI type 19, 15 bytes
Memory Array Mapped Address
	Starting Address: 0x00000000000
	Ending Address: 0x000FFFFFFFF
	Range Size: 4 GB
	Physical Array Handle: 0x001A
	Partition Width: 0

Handle 0x0020, DMI type 20, 19 bytes
Memory Device Mapped Address
	Starting Address: 0x00000000000
	Ending Address: 0x0003FFFFFFF
	Range Size: 1 GB
	Physical Device Handle: 0x001B
	Memory Array Mapped Address Handle: 0x001F
	Partition Row Position: 1

Handle 0x0021, DMI type 20, 19 bytes
Memory Device Mapped Address
	Starting Address: 0x00040000000
	Ending Address: 0x0007FFFFFFF
	Range Size: 1 GB
	Physical Device Handle: 0x001C
	Memory Array Mapped Address Handle: 0x001F
	Partition Row Position: 1

Handle 0x0022, DMI type 20, 19 bytes
Memory Device Mapped Address
	Starting Address: 0x00080000000
	Ending Address: 0x000BFFFFFFF
	Range Size: 1 GB
	Physical Device Handle: 0x001D
	Memory Array Mapped Address Handle: 0x001F
	Partition Row Position: 1

Handle 0x0023, DMI type 20, 19 bytes
Memory Device Mapped Address
	Starting Address: 0x000C0000000
	Ending Address: 0x000FFFFFFFF
	Range Size: 1 GB
	Physical Device Handle: 0x001E
	Memory Array Mapped Address Handle: 0x001F
	Partition Row Position: 1

Handle 0x0024, DMI type 32, 11 bytes
System Boot Information
	Status: No errors detected

Handle 0x0025, DMI type 127, 4 bytes
End Of Table

john@johnsubuntu:~$
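
Only a few fields of a dump this long matter for identifying the mainboard. As a minimal sketch, assuming dmidecode's usual text layout (a `Handle` line, a title line, then indented `Key: Value` lines), the board name can be pulled out like this. The function names are illustrative and the sample is a shortened copy of the output above.

```python
def parse_dmidecode(text: str) -> list:
    """Parse dmidecode text into a list of {"title": ..., "fields": {...}}."""
    records = []
    current = None
    expect_title = False
    for raw in text.splitlines():
        line = raw.strip()
        if raw.startswith("Handle "):
            # A new structure starts; the next non-empty line is its title.
            current = {"title": None, "fields": {}}
            records.append(current)
            expect_title = True
        elif current is None or not line:
            continue
        elif expect_title:
            current["title"] = line
            expect_title = False
        elif ": " in line:
            key, _, value = line.partition(": ")
            current["fields"][key] = value.strip()
    return records


def board_name(records: list):
    """Return the Product Name of the Base Board Information record, if any."""
    for rec in records:
        if rec["title"] == "Base Board Information":
            return rec["fields"].get("Product Name")
    return None


SAMPLE = """\
# dmidecode 2.9
SMBIOS 2.4 present.

Handle 0x0001, DMI type 1, 27 bytes
System Information
\tManufacturer: NVIDIA
\tProduct Name: NFORCE 680i LT SLI

Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
\tManufacturer: NVIDIA
\tProduct Name: NFORCE 680i LT SLI
\tVersion: 2
"""

if __name__ == "__main__":
    print(board_name(parse_dmidecode(SAMPLE)))
```

DMI strings are supplied by the BIOS vendor and may not name the actual board maker: here the manufacturer field reads NVIDIA, while comment 31 identifies the board as an XFX 680i LT SLI.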
Comment 30 John Scott 2010-01-20 03:46:00 UTC
My Windows box is a EVGA X58 3x SLI and I can't get any kernel after 2.6.24-24 to recognize SATA drives on that system either.

Before I forget, thanks so much for being willing to put in the time and effort to fix this!!!
Comment 31 John Scott 2010-01-20 04:06:39 UTC
I forgot to tell you, the board for the dmidecode above is an
XFX 680i LT SLI
Comment 32 Tejun Heo 2010-01-22 07:09:08 UTC
Created attachment 24672 [details]
sata_nv-oh-my-god.patch

Hmmmm.... I found a pretty similar board here (ASUSTek L1N64-SLI WS) which shows the same PCI ID for the SATA controller, and I also have a WD5000YS drive around.  Unfortunately, it works fine here.  I wonder what the difference could be.

Can you please apply the attached patch and see how it works?  Please attach the kernel log with the patch applied.  Thanks.
Comment 33 John Scott 2010-01-22 16:38:06 UTC
OK, I'm happy to do that but I think I'd need pretty detailed instructions to do it. I'm really a newbie.

I'm working with install CD's, do I simply add the patch as a kernel parameter during install? Or do I need to get kernel sources, apply the patch, compile, etc?

Is there a way for you to remotely access my machine?
Comment 34 John Scott 2010-01-25 16:38:34 UTC
How about this?

Can you set up your test kernel, then ftp it to me and I can try it out?
Comment 35 Tejun Heo 2010-01-26 01:16:58 UTC
Ah.... alright.  Let me prep something called kISO.  It's a bare minimum installation media which can contain a new kernel and should be used in combination with actual full installation media.
Comment 36 Tejun Heo 2010-01-26 05:01:00 UTC
Can you please try the following kISO?

  http://htj.dyndns.org/export/testing/sl112-x86_64-bko11445_dbg0/SL112-x86_64-bko11445_dbg0.iso

Instructions for using kISO:

  http://htj.dyndns.org/export/testing/SL103-kISO-doc.txt

Please try to acquire kernel boot log as you did with the installation media.  Thanks.
Comment 37 John Scott 2010-01-26 16:53:01 UTC
OK.....is the kISO disk all I need or do I also need the SUSE 11.2 CD? The instructions aren't clear.
Comment 38 John Scott 2010-01-26 17:35:26 UTC
Created attachment 24735 [details]
SL112-x86_64-bko11445_dbg.iso  boot.msg
Comment 39 John Scott 2010-01-26 17:36:24 UTC
Same result, TJ.

Don't give up!
Comment 40 John Scott 2010-01-30 18:59:29 UTC
Any news on this, TJ?
Comment 41 Tejun Heo 2010-02-02 03:27:15 UTC
Sorry about the lack of response.  I'm running out of ideas.  The last kISO skips all hardreset related things including simple link resume, even then the drive fails to respond with DRDY to SRST.  I think I'll have to compare 2.6.24 init path and try to find out what the difference is.  The problem is that the code has changed a lot since then.  Is it possible for you to set up an environment where you can test a patched kernel?

Thanks.
Comment 42 John Scott 2010-02-02 14:34:39 UTC
The kISO worked fine, if you can do that again. Or, since I'm pretty good at following directions, do you want to send me (ftp or whatever) a patched kernel and then phone me to talk me through it?
Comment 43 John Scott 2010-02-10 16:55:08 UTC
TJ, as an update, I just tried the Kubuntu 9.10 installer and had the exact same problem. Everything works fine until you get to the partitioning. The hard drives are not recognized. It's as though I unplugged them.


Any luck comparing the 2.6.24 init path?
Comment 44 Tejun Heo 2010-02-11 02:14:03 UTC
Sorry, I got caught up doing other stuff.  Will do it in a few days.  Thanks.
Comment 45 John Scott 2010-02-15 20:43:08 UTC
Don't give up, TJ!
Comment 46 Tejun Heo 2010-02-17 06:11:31 UTC
Alright, can you please test this one and report the kernel boot log?

http://htj.dyndns.org/export/testing/sl112-x86_64-bko11445_dbg1/SL112-x86_64-bko11445_dbg1.iso
Comment 47 John Scott 2010-02-17 23:43:48 UTC
Created attachment 25094 [details]
kISO (2) boot.msg results

Here's the boot.msg, TJ
Comment 48 John Scott 2010-02-17 23:51:41 UTC
Created attachment 25095 [details]
Install Screen

Thought you'd like to see this, too.
Comment 49 Tejun Heo 2010-02-18 00:56:59 UTC
Hmmm... the workaround didn't kick in.  Strange.  This is the machine you posted the dmidecode for, right?
Comment 50 Tejun Heo 2010-02-18 01:01:52 UTC
Oops, strike that.  I was looking at the boot log from the first kiso.  The workaround kicked in but the detection failed the same.  With the workaround applied, the behavior is very close to 2.6.24.  I'll look again.  :-(
Comment 51 John Scott 2010-02-23 22:03:17 UTC
What's the news, TJ??
Comment 52 Tejun Heo 2010-02-25 02:56:22 UTC
Ummm.... are you interested in sending the board to me?  I can buy it if it isn't too expensive.  If it is, I can pay for the round-trip cost.

Thanks.
Comment 53 John Scott 2010-02-25 04:07:18 UTC
Sure, I could send you the board. But I really do want it back. I'll be out of town for the next week, so I won't miss it too much.  What address?
Comment 54 John Scott 2010-03-08 16:05:19 UTC
TJ, are we at a dead end on this?
Comment 55 Tejun Heo 2010-03-09 02:08:38 UTC
Hmmm... looks that way for the moment.  I'll see if there's anything else I can do remotely.  Thanks.
Comment 56 John Scott 2010-03-12 14:55:45 UTC
How about changing some of the BIOS settings?
Comment 57 John Scott 2010-03-12 15:33:22 UTC
Maybe you could work through a Local Ubuntu Team?

https://wiki.ubuntu.com/LoCoTeamList

I don't know the guy but there's a "Nerdy Nick" here in Denver.

Cheers!
Comment 58 Tejun Heo 2010-04-02 03:35:02 UTC
Sorry about the long delay.  Can you please test this kiso?

http://htj.dyndns.org/export/testing/sl112-x86_64-bko11445_dbg2/SL112-x86_64-bko11445_dbg2.iso

Thanks.
Comment 59 John Scott 2010-04-02 21:36:30 UTC
OK, tell me if I'm doing this right. I download the kiso and burn it as an iso.

Then I boot the kiso on my Linux box and choose "Install" from the first menu. Right?

Next, it wants me to remove the kiso disk and put in the Suse install disk, right?
Comment 60 John Scott 2010-04-02 22:10:54 UTC
OK, tell me if I'm doing this right. I download the kiso and burn it as an iso.

Then I boot the kiso on my Linux box and choose "Install" from the first menu. Right?

Next, it wants me to remove the kiso disk and put in the Suse install disk, right?

Comment 61 Tejun Heo 2010-04-03 01:18:52 UTC
You don't need openSUSE installation media.  Just boot into rescue mode and fetch boot.msg from there.
Comment 62 John Scott 2010-04-04 15:18:16 UTC
Created attachment 25847 [details]
Boot.msg file from 04/03/2010

On screen Error message - failed to detect SATA HD's
Comment 63 Tejun Heo 2010-04-05 01:13:42 UTC
Okay, another miss.  I'm afraid I don't have much left to try remotely at this point.  :-(
Comment 64 John Scott 2010-04-05 02:56:02 UTC
I'm just a newbie, but would it do any good to compare the boot.msg from a kernel that boots OK with one that doesn't?

Would that help to isolate the problem?

Next question: what's a good Socket 775 SATA board that does work? Can you compare the boot.msg from that board to mine?
Comment 65 Tejun Heo 2010-04-05 06:03:02 UTC
boot.msg wouldn't show any new information at this point.  I'm quite lost as to where the difference is. :-( Short of testing things locally (and maybe try to hook it up w/ a bus tracer), I'm not sure what to do.

Most 775 boards work fine.  You're currently the only one reporting boot probing problems on sata_nv.

Thanks.
Comment 66 John Scott 2010-04-05 13:25:33 UTC
The reason I asked about a board that you're sure works is that I've got two different 775 boards and both have the same problem. Do you know of a reasonably priced 775 board that works? I'd buy it just to get this behind me.

Also, I beg to differ about my problem being unusual. I searched the net and the problem I'm having isn't unique. It's been going on for over a year. Every kernel since 2.6.24-24 has failed on my systems and on many others.


Check it out.  Google "errno=-16"

http://search.yahoo.com/search?p=errno%3D-16&ei=UTF-8&fr=moz35

Anyway, thanks for your efforts.
Comment 67 Tejun Heo 2010-04-06 01:54:11 UTC
Oh... sure, reset failures have been reported a lot.  The thing is that the failures have a lot of different causes (IRQ delivery problems are the most common one now) on a lot of different configurations, and you're currently the only one reporting a probe failure on sata_nv which hasn't been root-caused yet.  If you can point me to a reasonably priced 775 board which doesn't work, I'll be happy to get one and fix it.  What is the other board you're having problems with?

Thanks.
Comment 68 John Scott 2010-04-06 02:09:48 UTC
The other board is an EVGA X58 3x SLI running a Core i7 920 processor. Right now, I've got Windows 7 on that machine.



Same issue... cannot find SATA drives. SRST fails  errno=-16
Comment 69 Tejun Heo 2010-04-06 02:12:54 UTC
That's an intel ich10.  You're seeing probe failures on that board?  Can you please post boot.msg from that machine?

Thanks.
Comment 70 Thomas Pilarski 2010-08-29 15:45:03 UTC
This has happened to me twice too, with the 2.6.35.2 kernel after ~12h and ~50h of uptime, but only with my new SSD with a SandForce controller. It never happened with my old, slow disk drive.

The controller is an Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) in SATA mode.

[42004.832055] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[42004.832061] ata3.00: failed command: FLUSH CACHE
[42004.832070] ata3.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[42004.832071]          res 40/00:00:00:4f:c2/00:01:00:00:00/00 Emask 0x4 (timeout)
[42004.832076] ata3.00: status: { DRDY }
[42004.832081] ata3: hard resetting link
[42010.192060] ata3: link is slow to respond, please be patient (ready=0)
[42014.849045] ata3: COMRESET failed (errno=-16)
[42014.849058] ata3: hard resetting link
[42020.208053] ata3: link is slow to respond, please be patient (ready=0)
[42024.857047] ata3: COMRESET failed (errno=-16)
[42024.857060] ata3: hard resetting link
[42030.224024] ata3: link is slow to respond, please be patient (ready=0)
[42059.912068] ata3: COMRESET failed (errno=-16)
[42059.912082] ata3: limiting SATA link speed to 1.5 Gbps
[42059.912086] ata3: hard resetting link
[42064.940048] ata3: COMRESET failed (errno=-16)
[42064.940059] ata3: reset failed, giving up
[42064.940063] ata3.00: disabled
[42064.940069] ata3.00: device reported invalid CHS sector 0
[42064.940087] ata3: EH complete
[42064.940087] ata3: EH complete
[42064.940110] end_request: I/O error, dev sdb, sector 0
[42064.940172] Aborting journal on device dm-1-8.
[42064.940191] end_request: I/O error, dev sdb, sector 0
[42064.940281] Aborting journal on device dm-4-8.
[42064.940318] sd 2:0:0:0: [sdb] Unhandled error code
[42064.940321] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[42064.940325] sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 00 f1 84 e0 00 00 20 00
[42064.940334] end_request: I/O error, dev sdb, sector 15828192
[42064.940346] sd 2:0:0:0: [sdb] Unhandled error code
[42064.940348] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[42064.940351] sd 2:0:0:0: [sdb] CDB: Read(10): 28 00 01 e4 4b 08 00 00 20 00
[42064.940359] end_request: I/O error, dev sdb, sector 31738632
[42064.940369] sd 2:0:0:0: [sdb] Unhandled error code
[42064.940371] sd 2:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
...
Comment 71 Tejun Heo 2010-08-30 08:50:54 UTC
Thomas, can you please attach full kernel boot log (/var/log/boot.msg or the output of dmesg after boot)?  But it looks like the device has shut down.  More likely to be a device problem than anything else.

Thanks.
Comment 72 Thomas Pilarski 2010-08-31 07:57:41 UTC
Created attachment 28601 [details]
Boot log + error messages

It's the dmesg output from the day it happened. I was able to save it to another drive after the SSD stopped responding.
I have sent a request to the manufacturer of the SSD too, as that was my first assumption as well. But I have now been running my system with the 2.6.32 kernel for two days, and there has been no connection timeout so far. That's not proof, though, as the error is not deterministic.
Comment 73 Tejun Heo 2010-09-01 10:06:52 UTC
FLUSH_CACHE is a non-tagged nodata command, which means that no other command is in progress and all the host controller does is issue a single command packet to the device for the command.  There isn't much the host can screw up for this type of command.  For drives w/ rotating media, FLUSH_CACHE often spikes power consumption, and an inadequate power supply often shows up as FLUSH_CACHE timeouts.  For SSDs, problems like this have usually been remedied by firmware updates on the drive side.  In some rare cases, disabling NCQ seems to help too, for whatever reason.

Thanks.
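
To connect this explanation to the log in comment 70, the first byte of the `cmd` line is the ATA command opcode and the first byte of the `res` line is the status register. This is a minimal sketch, assuming that line format; the helper names and the small opcode table are illustrative, and the bit names are the conventional ATA ones.

```python
import re

# ATA status register bits (conventional names).
STATUS_BITS = [
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x08, "DRQ"),
    (0x01, "ERR"),
]

# A small, incomplete opcode table; 0xE7 is the command in the log.
COMMANDS = {
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}


def decode_status(status: int) -> list:
    """Return the names of the status bits that are set."""
    return [name for bit, name in STATUS_BITS if status & bit]


def decode_line(line: str):
    """Return (kind, first byte) for a 'cmd xx/...' or 'res xx/...' log line."""
    m = re.search(r"\b(cmd|res) ([0-9a-f]{2})/", line)
    if not m:
        return None
    return m.group(1), int(m.group(2), 16)


if __name__ == "__main__":
    cmd = "[42004.832070] ata3.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0"
    res = "[42004.832071]          res 40/00:00:00:4f:c2/00:01:00:00:00/00 Emask 0x4 (timeout)"
    _, opcode = decode_line(cmd)
    _, status = decode_line(res)
    print(COMMANDS.get(opcode, hex(opcode)), decode_status(status))
```

The status byte 0x40 decodes to DRDY alone, which matches the `status: { DRDY }` line in the log.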
Comment 74 Thomas Pilarski 2010-09-01 10:59:41 UTC
Thanks a lot for the help. I will try disabling NCQ for as long as no firmware update is available.
