Most recent kernel where this bug did not occur: not known
Distribution: Gentoo
Hardware Environment: x86_64, Seagate HD
Software Environment:
Problem Description: Seagate HD not detected

Steps to reproduce:
I recently added a Seagate HD (500 GB SATA 7200.10) to my PC. It already had a ST3120026AS and a WD1500ADFD-0, both running properly, and still running correctly... The drive is visible in the BIOS.

The new drive is not detected by the Linux kernel, whereas it is by the other OS (where it performs very well, ~48 MB/s writing). Detection fails with this in dmesg:

ata1: port is slow to respond, please be patient
ata1: port failed to respond (30 secs)

I tried some random tweaking of libata-core.c:__libata_phy_reset(), lengthening the timeouts and delays, because some googling revealed that this papers over similar reports of failure to detect Seagate drives... But it is not working; I now get this:

Jun 28 04:10:04 localhost ata1: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0x2 frozen
Jun 28 04:10:05 localhost ata1: soft resetting port
Jun 28 04:10:12 localhost ata1: port is slow to respond, please be patient
Jun 28 04:10:35 localhost ata1: port failed to respond (30 secs)
Jun 28 04:10:35 localhost ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 28 04:10:35 localhost ata1: EH pending after completion, repeating EH (cnt=4)
Jun 28 04:10:35 localhost ata1: EH complete

but it's about 8 minutes after the system is booted...

Ask for more info, and I'll provide it... I'm currently compiling -mm to see if it gives better results...
Here is the dmesg extract from 2.6.17-mm3. The drive is not detected, but the "port is slow to respond" messages are gone...

[...]
libata version 1.30 loaded.
sata_nv 0000:00:07.0: version 0.9
ACPI: PCI Interrupt Link [APSI] enabled at IRQ 22
GSI 18 sharing vector 0xE9 and IRQ 18
ACPI: PCI Interrupt 0000:00:07.0[A] -> Link [APSI] -> GSI 22 (level, low) -> IRQ 233
PCI: Setting latency timer of device 0000:00:07.0 to 64
ata1: SATA max UDMA/133 cmd 0x9F0 ctl 0xBF2 bmdma 0xD800 irq 233
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xB72 bmdma 0xD808 irq 233
scsi0 : sata_nv
ata1: SATA link down (SStatus 0 SControl 300)
scsi1 : sata_nv
ieee1394: Host added: ID:BUS[0-01:1023] GUID[00301bb9000047cc]
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: configured for UDMA/133
Vendor: ATA Model: WDC WD1500ADFD-0 Rev: 19.0
Type: Direct-Access ANSI SCSI revision: 05
ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 21
GSI 19 sharing vector 0x32 and IRQ 19
ACPI: PCI Interrupt 0000:00:08.0[A] -> Link [APSJ] -> GSI 21 (level, low) -> IRQ 50
PCI: Setting latency timer of device 0000:00:08.0 to 64
ata3: SATA max UDMA/133 cmd 0x9E0 ctl 0xBE2 bmdma 0xC400 irq 50
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xB62 bmdma 0xC408 irq 50
scsi2 : sata_nv
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: configured for UDMA/133
scsi3 : sata_nv
ata4: SATA link down (SStatus 0 SControl 300)
Vendor: ATA Model: ST3120026AS Rev: 3.05
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 293046768 512-byte hdwr sectors (150040 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 293046768 512-byte hdwr sectors (150040 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3 sda4 < sda5 sda6 >
sd 1:0:0:0: Attached scsi disk sda
SCSI device sdb: 234441648 512-byte hdwr sectors (120034 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 234441648 512-byte hdwr sectors (120034 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
sdb: sdb1 < sdb5 sdb6 > sdb2 sdb3 sdb4
sd 2:0:0:0: Attached scsi disk sdb
sd 1:0:0:0: Attached scsi generic sg0 type 0
sd 2:0:0:0: Attached scsi generic sg1 type 0
ieee1394: raw1394: /dev/raw1394 device initialized
[...]

Anything to test / do to help narrow this down?
Hello, Can you post the result of 'lspci -n'? Also, please try to connect the drive to other ports - ata4 or swap with ata3 - and see what happens.
Hello Tejun,

please ignore my noise, this is probably not a bug... I solved it in the BIOS by disabling the IDE controller:

00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)

which was probably interacting with the first SATA port, the one this new drive was plugged into. The problem somehow didn't show up on the 2 other SATA disks I previously had in there, which is a bit strange...

Would it be possible to detect this kind of misconfiguration and display a more informative message, something like "Try tweaking BIOS IDE-related settings", to help people having the same problem? Maybe it could be considered a bug that Linux cannot use the drive when the BIOS is badly configured; as I said, the other OS did work around the broken setup... Maybe they have better HW information than you have...

Here is the lspci -n output:

00:00.0 0580: 10de:005e (rev a3)
00:01.0 0601: 10de:0050 (rev a3)
00:01.1 0c05: 10de:0052 (rev a2)
00:02.0 0c03: 10de:005a (rev a2)
00:02.1 0c03: 10de:005b (rev a3)
00:06.0 0101: 10de:0053 (rev a2)
00:07.0 0101: 10de:0054 (rev a3)
00:08.0 0101: 10de:0055 (rev a3)
00:09.0 0604: 10de:005c (rev a2)
00:0a.0 0680: 10de:0057 (rev a3)
00:0b.0 0604: 10de:005d (rev a3)
00:0c.0 0604: 10de:005d (rev a3)
00:0d.0 0604: 10de:005d (rev a3)
00:0e.0 0604: 10de:005d (rev a3)
00:18.0 0600: 1022:1100
00:18.1 0600: 1022:1101
00:18.2 0600: 1022:1102
00:18.3 0600: 1022:1103
01:00.0 0300: 1002:554d
01:00.1 0380: 1002:556d
05:06.0 0401: 1412:1724 (rev 01)
05:07.0 0c00: 1106:3044 (rev 80)

I'll try to hook that disk to another port later, maybe Saturday, because I want to see the performance impact of separating it from the primary disk's (WDC) controller.

If you want to close the bug, I have no objection.
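For readers comparing their own machine against the listing above, here is a minimal Python sketch that picks out the devices reporting PCI class 0101 (mass storage, IDE interface) from `lspci -n` text. In this report both the legacy IDE controller (00:06.0) and the two sata_nv functions (00:07.0, 00:08.0) report that class. The function names and the sample string are illustrative only, and the regular expression assumes the short `slot class: vendor:device` layout shown in this thread.

```python
import re

# Matches lines such as "00:06.0 0101: 10de:0053 (rev a2)".
# Assumes the plain `lspci -n` layout seen in this thread.
LINE_RE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
)

def parse_lspci_n(text):
    """Return one dict per recognised line: slot, cls, vendor, device."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices

def ide_class_devices(devices):
    """Keep only devices whose PCI class code is 0101 (IDE interface)."""
    return [dev for dev in devices if dev["cls"] == "0101"]

# A few lines copied from the lspci -n output in this report.
SAMPLE = """\
00:06.0 0101: 10de:0053 (rev a2)
00:07.0 0101: 10de:0054 (rev a3)
00:08.0 0101: 10de:0055 (rev a3)
00:09.0 0604: 10de:005c (rev a2)
"""

for dev in ide_class_devices(parse_lspci_n(SAMPLE)):
    print(dev["slot"], dev["vendor"] + ":" + dev["device"])
```

On a live system the same functions could be fed the output of `lspci -n` captured by any means; the class code alone does not tell which driver is bound to each function.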
Yeah, probably the IDE and SATA controllers are sharing primary/secondary IDE interfaces, allocating two to IDE and two to SATA. Boards with Intel chipsets do similar stuff. The reason why the other OS (Windows, I presume) can use all four ports is probably that it uses the better interface (ADMA) to program the controller, which doesn't have the legacy IDE limitations. There's an alpha version of an ADMA sata_nv driver in the libata-dev tree, but currently it seems to lack momentum. Anyway, I'm closing the bug.
Maybe I jumped to conclusions too quickly... The first fdisk -l showed a wrong partition type, an "SFS" that I've never seen anywhere, but I mounted it nevertheless, knowing or thinking it was NTFS...

localhost ~ # fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       60801   488384001   42  SFS

localhost ~ # mount -t ntfs /dev/sda1 /mnt/cdrom
localhost ~ # cd /mnt/cdrom/
localhost cdrom # l
total 0
dr-x------ 1 root root 0 Jun 27 20:40 System Volume Information/

I left that disk alone, idle, with the NTFS partition on it mounted (rw, btw, d'oh...) and later wanted to "fdisk -l" it again. That command hung in D+ state, and another "ls -l /mount/point" did too. After a while, they exited with:

localhost ~ # ls /mnt/cdrom/
ls: reading directory /mnt/cdrom/: Input/output error

or nothing from fdisk, and the dmesg showed this:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180000 action 0x2 frozen
ata1.00: (BMDMA stat 0x21)
ata1.00: tag 0 cmd 0xc8 Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: port is slow to respond, please be patient
ata1: port failed to respond (30 secs)
ata1: soft resetting port
ata1: port is slow to respond, please be patient
ata1: port failed to respond (30 secs)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ATA: abnormal status 0xD0 on port 0x9F7
ATA: abnormal status 0xD0 on port 0x9F7
ATA: abnormal status 0xD0 on port 0x9F7
ATA: abnormal status 0xD0 on port 0x9F7
ATA: abnormal status 0xD0 on port 0x9F7
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link down (SStatus 100 SControl 300)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link down (SStatus 100 SControl 300)
ata1.00: disabled
ata1: EH pending after completion, repeating EH (cnt=4)
ata1: soft resetting port
ata1: SATA link down (SStatus 100 SControl 300)
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current: sense key=0xb ASC=0x0 ASCQ=0x0
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
Buffer I/O error on device sda, logical block 1
Buffer I/O error on device sda, logical block 2
Buffer I/O error on device sda, logical block 3
sd 0:0:0:0: rejecting I/O to offline device
ata1: EH complete
ata1.00: detaching (SCSI 0:0:0:0)
sd 0:0:0:0: SCSI error: return code = 0x00010000
end_request: I/O error, dev sda, sector 6291527
NTFS-fs error (device sda1): ntfs_end_buffer_async_read(): Buffer I/O error, logical block 0x600008.
NTFS-fs error (device sda1): ntfs_end_buffer_async_read(): Buffer I/O error, logical block 0x600009.
NTFS-fs error (device sda1): ntfs_end_buffer_async_read(): Buffer I/O error, logical block 0x60000a.
NTFS-fs error (device sda1): ntfs_end_buffer_async_read(): Buffer I/O error, logical block 0x60000b.
NTFS-fs error (device sda1): ntfs_end_buffer_async_read(): Buffer I/O error, logical block 0x60000c.
NTFS-fs error (device sda1): ntfs_end_buffer_async_read(): Buffer I/O error, logical block 0x60000d.
scsi 0:0:0:0: rejecting I/O to dead device
Reducing readahead size to 32K
scsi 0:0:0:0: rejecting I/O to dead device
ata1: soft resetting port
ata1: SATA link down (SStatus 100 SControl 300)
ata1: EH pending after completion, repeating EH (cnt=4)
ata1: EH complete

with the last 4 lines repeating over and over...
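The SStatus values in these logs (0, 113, 100) can be split into their bit fields to see what the link layer is reporting. The sketch below is a minimal Python decoder, assuming the value is printed in hexadecimal without a prefix and using the field layout of the SATA SStatus register (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The function names and the description tables are illustrative and cover only the commonly documented values.

```python
# Field descriptions for the SATA SStatus register.
# Values not listed here are reported as "reserved/unknown".
DET = {
    0: "no device detected, PHY communication not established",
    1: "device detected, PHY communication not established",
    3: "device detected, PHY communication established",
    4: "PHY offline",
}
SPD = {
    0: "no negotiated speed",
    1: "Gen1 (1.5 Gbps)",
    2: "Gen2 (3.0 Gbps)",
}
IPM = {
    0: "device not present or no communication",
    1: "active",
    2: "partial power state",
    6: "slumber power state",
}

def decode_sstatus(hex_text):
    """Split an SStatus value such as '113' into (det, spd, ipm)."""
    value = int(hex_text, 16)   # assumes the log prints bare hexadecimal
    det = value & 0xF           # bits 3:0, device detection
    spd = (value >> 4) & 0xF    # bits 7:4, negotiated speed
    ipm = (value >> 8) & 0xF    # bits 11:8, interface power management
    return det, spd, ipm

def describe_sstatus(hex_text):
    """Return a one-line human-readable summary of an SStatus value."""
    det, spd, ipm = decode_sstatus(hex_text)
    unknown = "reserved/unknown"
    return "DET=%d (%s); SPD=%d (%s); IPM=%d (%s)" % (
        det, DET.get(det, unknown),
        spd, SPD.get(spd, unknown),
        ipm, IPM.get(ipm, unknown),
    )

# The three values that appear in this report.
for raw in ("0", "113", "100"):
    print("SStatus", raw, "->", describe_sstatus(raw))
```

Read this way, "SStatus 113" is an established Gen1 link, while "SStatus 100" after the hard resets has the detection field back at zero, i.e. the port no longer sees a device.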
Is this looking like a failing drive or driver ? Should I test it more thoroughly with that other OS to rule out disk failure, or is it pointless anyway ? I'm sorry but I fear testing "adma sata_nv driver in libata-dev tree" because all of my other disks (system and data) are hooked to the same controller and so handled by the same driver... If it would be possible to use vanilla sata_nv driver for those disks and the adma one for the problematic one, I might try it...
And I have closed it too soon. I'll just leave it closed until things become clearer. I have the same request: move the problematic disk to another port, swap it with the working disks, and see what happens. Your last report looks like a hardware or cabling problem to me, and swapping disks is a good way to track down such problems. The ADMA driver is in an early alpha stage. I don't think using it is a good idea.
Argh, I think you were right. I've just plugged it into the cable of a working HD, and it worked great there. I've not yet tested the other disk on the failing cable; will do next.

Survived:
dd if=/dev/sda1 of=/dev/null bs=64k (^C after 2 minutes, 15 GB read from HD, 70+ MB/s)
mkfs.ext2 /dev/sda1
bonnie++ on ext2

1.93c,1.93c,localhost,1,1151688993,2G,,584,99,68588,17,30958,8,921,98,72852,11,226.8,5,16,,,,,4412,97,+++++,+++,+++++,+++,4836,98,+++++,+++,16319,97,14725us,229ms,188ms,47224us,35835us,546ms,22464us,8065us,78us,4410us,8021us,7025us

But I'm puzzled: how come Win2k3 worked around a HW problem? This does not sound right to me. I'll test more combinations...
This is starting to get really strange. I left the 7200.10 on the WDC cable and plugged the WDC into the cable where the 7200.10 failed, and now both seem to work, and the 3rd too: I'm currently pushing all 3 of them hard and everything goes right... I think the 2 that are on the same controller are stealing performance from each other; it's not benchmarked, but it's visible.
Argh, something is definitely wrong with this SATA port, as now it's the perfectly working WDC drive that gets the EHs, but strangely, it does not get the "port is slow" message at boot time... Is there something I could try, or some info I could send, to get help debugging this? Should the bug status change to "reopened"?
Just a comment to let others know this really was a HW/cabling problem, as now with a new cable all 3 drives are working properly... Thanks Tejun, and sorry to have made you lose time on this issue...
Great. Thanks for reporting the result. This one is *really* closed now.
Just a note to let others know this is looking more and more like a cabling problem: after a few months, I got new SATA problems, which disappeared as soon as I replaced the SATA cable built into the case with a probably better one I got with another motherboard, despite the new one being longer. The 2 cables that have now failed were both built into the Shuttle SN25p. Maybe I'll come back here when the third one (still there) fails... :-)