Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.27.10
Hardware Environment: x86 ASUS M2N-E
Problem Description: sata_nv hotplug doesn't work in 2.6.27.10

Steps to reproduce:

When I hotplug my SATA disk, dmesg says:

- kernel 2.6.27:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake }
ata2: hard resetting link
ata2: link is slow to respond, please be patient (ready=-19)
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3320620AS, 3.AAJ, max UDMA/133
ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata2: EH complete
scsi 1:0:0:0: Direct-Access ATA ST3320620AS 3.AA PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 sdb13 sdb14 sdb15 >
sd 1:0:0:0: [sdb] Attached SCSI disk

- kernel 2.6.27.10:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake Dispar }
ata2: link is slow to respond, please be patient (ready=0)
ata2: device not ready (errno=-16), forcing hardreset
ata2: soft resetting link
ata2: link is slow to respond, please be patient (ready=0)
ata2: SRST failed (errno=-16)
ata2: soft resetting link
ata2: link is slow to respond, please be patient (ready=0)
ata2: SRST failed (errno=-16)
ata2: soft resetting link
ata2: link is slow to respond, please be patient (ready=0)
ata2: SRST failed (errno=-16)
ata2: limiting SATA link speed to 1.5 Gbps
ata2: soft resetting link
ata2: SRST failed (errno=-16)
ata2: reset failed, giving up
ata2: EH complete

Thanks.
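For readers comparing the two logs, the SErr values differ in a single diagnostic bit. The following is a minimal Python sketch, assuming the names in the kernel's `SError: { ... }` line map onto bits 16 to 26 of the SATA SError register as listed below; the table and the function name `decode_serr_diag` are illustrative, not taken from the driver.

```python
# Flag names as printed in the kernel's "SError: { ... }" line, keyed by
# bit position in the DIAG half of the SError register (assumed mapping).
SERR_DIAG_BITS = {
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr_diag(serr):
    """Return the DIAG flag names set in an SErr value, lowest bit first."""
    return [
        name
        for bit, name in sorted(SERR_DIAG_BITS.items())
        if serr & (1 << bit)
    ]


if __name__ == "__main__":
    for value in (0x50000, 0x150000):
        print(hex(value), decode_serr_diag(value))
```

With this mapping, 0x50000 decodes to PHYRdyChg and CommWake, and 0x150000 adds Dispar, which matches the SError lines in the two logs.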
Reply-To: akpm@linux-foundation.org

(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Sat, 3 Jan 2009 13:12:09 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12351
>
> Summary: sata_nv hotplug not work in 2.6.27.10
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.27.10
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Serial ATA
> AssignedTo: jgarzik@pobox.com
> ReportedBy: gpanco@tiscali.it
>
> Latest working kernel version:2.6.27
> Earliest failing kernel version:2.6.27.10

A regression in -stable.
(CCing Tejun)

Andrew Morton wrote:
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sat, 3 Jan 2009 13:12:09 -0800 (PST)
> bugme-daemon@bugzilla.kernel.org wrote:
>
>> http://bugzilla.kernel.org/show_bug.cgi?id=12351
>>
>> Summary: sata_nv hotplug not work in 2.6.27.10
>> Product: IO/Storage
>> Version: 2.5
>> KernelVersion: 2.6.27.10
>> Platform: All
>> OS/Version: Linux
>> Tree: Mainline
>> Status: NEW
>> Severity: normal
>> Priority: P1
>> Component: Serial ATA
>> AssignedTo: jgarzik@pobox.com
>> ReportedBy: gpanco@tiscali.it
>>
>> Latest working kernel version:2.6.27
>> Earliest failing kernel version:2.6.27.10
>
> A regression in -stable.

Does reverting this patch help?

http://git.kernel.org/?p=linux/kernel/git/hpa/linux-2.6-allstable.git;a=commit;h=814eb57e1799337d9fbb68f5d838afa507dc014e

>> Hardware Environment:x86 ASUS M2N-E

OK, this is an MCP61 board. We're now using softreset instead of hardreset on hotplug and apparently that doesn't work. Thing is that:

http://bugzilla.kernel.org/show_bug.cgi?id=11195

reported that hardreset was borked on that controller. Seems kind of contradictory..

/if only NVidia could be consistent in its hardware bugs..
On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
>>> Hardware Environment:x86 ASUS M2N-E
>
> OK, this is an MCP61 board. We're now using softreset instead of hardreset
> on hotplug and apparently that doesn't work. Thing is that:

no, it is not MCP61, but MCP55:

dual ~ # lspci
00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)

dual ~ # dmidecode
# dmidecode 2.9
SMBIOS 2.4 present.
72 structures occupying 2069 bytes.
Table at 0x000F0000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: Phoenix Technologies, LTD
Version: ASUS M2N-E ACPI BIOS Revision 1601
Giovanni Pancotti wrote:
> On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
>
>>>> Hardware Environment:x86 ASUS M2N-E
>> OK, this is an MCP61 board. We're now using softreset instead of hardreset
>> on hotplug and apparently that doesn't work. Thing is that:
>
> no, it is not MCP61, but MCP55:

Ahh, ok, that is less contradictory then :-) Presumably we should still be using hardreset on that chipset.

>
> dual ~ # lspci
> 00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
> 00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
> 00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
> 00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
> 00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
> 00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
> 00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
>
> dual ~ # dmidecode
> # dmidecode 2.9
> SMBIOS 2.4 present.
> 72 structures occupying 2069 bytes.
> Table at 0x000F0000.
>
> Handle 0x0000, DMI type 0, 24 bytes
> BIOS Information
> Vendor: Phoenix Technologies, LTD
> Version: ASUS M2N-E ACPI BIOS Revision 1601
Ah... we had a report of broken hardreset on GENERIC, and MCP55 shares code paths with GENERIC other than the command issue path, so I assumed it would behave the same (in fact, w/ swncq disabled it shares all the code paths). Making MCP55 use hardreset isn't difficult but I'm afraid it might break boot probing on some machines. GENERIC probing failure didn't occur on all the machines. Argggh.... how many reset related bugs can this series of chips have?

Does anyone know where MCP55 is located in the chipset family tree? I wanna make sure there's a meaningful distinction between GENERICs and SWNCQs before making yet another switch. Also, I think we should wait till 2.6.29 rather than risking breaking boot probing on 2.6.28 yet again. :-(
Created attachment 19684
swncq-hardreset-debug

Can you please verify whether this patch fixes the problem?
verified, hotplug doesn't work :-(

dmesg at boot time:

sata_nv 0000:00:05.0: version 3.5
ACPI: PCI Interrupt Link [APSI] enabled at IRQ 23
sata_nv 0000:00:05.0: PCI INT A -> Link[APSI] -> GSI 23 (level, low) -> IRQ 23
sata_nv 0000:00:05.0: Using SWNCQ mode
sata_nv 0000:00:05.0: setting latency timer to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xdc00 irq 23
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xdc08 irq 23
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: Maxtor 6V250F0, VA111900, max UDMA/133
ata1.00: 490234752 sectors, multi 1: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA Maxtor 6V250F0 VA11 PQ: 0 ANSI: 5
ata1.00: Disabling SWNCQ mode (depth 1)
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 >
sd 0:0:0:0: [sda] Attached SCSI disk
ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 22
sata_nv 0000:00:05.1: PCI INT B -> Link[APSJ] -> GSI 22 (level, low) -> IRQ 22
sata_nv 0000:00:05.1: Using SWNCQ mode
sata_nv 0000:00:05.1: setting latency timer to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0x9e0 ctl 0xbe0 bmdma 0xc800 irq 22
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xb60 bmdma 0xc808 irq 22
ACPI: PCI Interrupt Link [ASA2] enabled at IRQ 21
sata_nv 0000:00:05.2: PCI INT C -> Link[ASA2] -> GSI 21 (level, low) -> IRQ 21
sata_nv 0000:00:05.2: Using SWNCQ mode
sata_nv 0000:00:05.2: setting latency timer to 64
scsi4 : sata_nv
scsi5 : sata_nv
ata5: SATA max UDMA/133 cmd 0xc400 ctl 0xc000 bmdma 0xb400 irq 21
ata6: SATA max UDMA/133 cmd 0xbc00 ctl 0xb800 bmdma 0xb408 irq 21

after hotplug:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake Dispar }
ata2: link is slow to respond, please be patient (ready=0)
ata2: device not ready (errno=-16), forcing hardreset
ata2: soft resetting link
ata2: link is slow to respond, please be patient (ready=0)
ata2: SRST failed (errno=-16)
ata2: soft resetting link
ata2: link is slow to respond, please be patient (ready=0)
ata2: SRST failed (errno=-16)
ata2: soft resetting link
ata2: link is slow to respond, please be patient (ready=0)
ata2: SRST failed (errno=-16)
ata2: limiting SATA link speed to 1.5 Gbps
ata2: soft resetting link
ata2: SRST failed (errno=-16)
ata2: reset failed, giving up
ata2: EH complete
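The `errno=-16` and `ready=-19` values that appear in the logs in this thread are negated error numbers. As a small aid for reading them, here is a sketch using Python's standard `errno` module; the numeric values are assumed to be those of Linux, and the helper name `errno_name` is made up for illustration.

```python
import errno


def errno_name(value):
    """Map a kernel-style negative errno (e.g. -16) to its symbolic name."""
    return errno.errorcode.get(abs(value), "UNKNOWN")


if __name__ == "__main__":
    for value in (-16, -19):
        print(value, errno_name(value))
```

On Linux, -16 corresponds to EBUSY (the device never reported ready) and -19 to ENODEV.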
Created attachment 19712
swncq-hardreset-debug-2

Sorry, I missed one line. Can you please test with both sata_nv.swncq=0 and sata_nv.swncq=1?
with swncq-hardreset-debug-2 and sata_nv.swncq=0, dmesg at boot:

sata_nv 0000:00:05.0: version 3.5
ACPI: PCI Interrupt Link [APSI] enabled at IRQ 23
sata_nv 0000:00:05.0: PCI INT A -> Link[APSI] -> GSI 23 (level, low) -> IRQ 23
sata_nv 0000:00:05.0: setting latency timer to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xdc00 irq 23
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xdc08 irq 23
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: Maxtor 6V250F0, VA111900, max UDMA/133
ata1.00: 490234752 sectors, multi 1: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA Maxtor 6V250F0 VA11 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 >
sd 0:0:0:0: [sda] Attached SCSI disk
ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 22
sata_nv 0000:00:05.1: PCI INT B -> Link[APSJ] -> GSI 22 (level, low) -> IRQ 22
sata_nv 0000:00:05.1: setting latency timer to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0x9e0 ctl 0xbe0 bmdma 0xc800 irq 22
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xb60 bmdma 0xc808 irq 22
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ACPI: PCI Interrupt Link [ASA2] enabled at IRQ 21
sata_nv 0000:00:05.2: PCI INT C -> Link[ASA2] -> GSI 21 (level, low) -> IRQ 21
sata_nv 0000:00:05.2: setting latency timer to 64
scsi4 : sata_nv
scsi5 : sata_nv
ata5: SATA max UDMA/133 cmd 0xc400 ctl 0xc000 bmdma 0xb400 irq 21
ata6: SATA max UDMA/133 cmd 0xbc00 ctl 0xb800 bmdma 0xb408 irq 21
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)

after hotplug hdisk: no result :-(

dual ~ # echo "- - -" > /sys/class/scsi_host/host1/scan
ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xf
ata2: SError: { PHYRdyChg CommWake Dispar }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: NODEV after polling detection
ata2: EH complete

--------------------------------------------------------------

with swncq-hardreset-debug-2 and sata_nv.swncq=1, dmesg at boot:

sata_nv 0000:00:05.0: version 3.5
ACPI: PCI Interrupt Link [APSI] enabled at IRQ 23
sata_nv 0000:00:05.0: PCI INT A -> Link[APSI] -> GSI 23 (level, low) -> IRQ 23
sata_nv 0000:00:05.0: Using SWNCQ mode
sata_nv 0000:00:05.0: setting latency timer to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0x9f0 ctl 0xbf0 bmdma 0xdc00 irq 23
ata2: SATA max UDMA/133 cmd 0x970 ctl 0xb70 bmdma 0xdc08 irq 23
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: Maxtor 6V250F0, VA111900, max UDMA/133
ata1.00: 490234752 sectors, multi 1: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
ata2: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA Maxtor 6V250F0 VA11 PQ: 0 ANSI: 5
ata1.00: Disabling SWNCQ mode (depth 1)
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 490234752 512-byte hardware sectors (251000 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 >
sd 0:0:0:0: [sda] Attached SCSI disk
ACPI: PCI Interrupt Link [APSJ] enabled at IRQ 22
sata_nv 0000:00:05.1: PCI INT B -> Link[APSJ] -> GSI 22 (level, low) -> IRQ 22
sata_nv 0000:00:05.1: Using SWNCQ mode
sata_nv 0000:00:05.1: setting latency timer to 64
scsi2 : sata_nv
scsi3 : sata_nv
ata3: SATA max UDMA/133 cmd 0x9e0 ctl 0xbe0 bmdma 0xc800 irq 22
ata4: SATA max UDMA/133 cmd 0x960 ctl 0xb60 bmdma 0xc808 irq 22
ata3: SATA link down (SStatus 0 SControl 300)
ata4: SATA link down (SStatus 0 SControl 300)
ACPI: PCI Interrupt Link [ASA2] enabled at IRQ 21
sata_nv 0000:00:05.2: PCI INT C -> Link[ASA2] -> GSI 21 (level, low) -> IRQ 21
sata_nv 0000:00:05.2: Using SWNCQ mode
sata_nv 0000:00:05.2: setting latency timer to 64
scsi4 : sata_nv
scsi5 : sata_nv
ata5: SATA max UDMA/133 cmd 0xc400 ctl 0xc000 bmdma 0xb400 irq 21
ata6: SATA max UDMA/133 cmd 0xbc00 ctl 0xb800 bmdma 0xb408 irq 21
ata5: SATA link down (SStatus 0 SControl 300)
ata6: SATA link down (SStatus 0 SControl 300)

after hotplug hdisk:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2: EH complete

dual ~ #echo "- - -" > /sys/class/scsi_host/host1/scan
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: ATA-7: ST3320620AS, 3.AAJ, max UDMA/133
ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: configured for UDMA/133
ata2: EH complete
scsi 1:0:0:0: Direct-Access ATA ST3320620AS 3.AA PQ: 0 ANSI: 5
sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2 < sdb5 sdb6 sdb7 sdb8 sdb9 sdb10 sdb11 sdb12 sdb13 sdb14 sdb15 >
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 1:0:0:0: Attached scsi generic sg1 type 0

better :-)
Hmm... I thought warmplug would work for swncq=0. :-( Can you please try hotplug several times without the explicit rescan request? Does it always fail to proceed after that?
On Tue, Jan 06, 2009 at 06:19:44PM -0600, Robert Hancock wrote:
> Giovanni Pancotti wrote:
> > On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
> >
> >>>> Hardware Environment:x86 ASUS M2N-E
> >> OK, this is an MCP61 board. We're now using softreset instead of hardreset
> >> on hotplug and apparently that doesn't work. Thing is that:
> >
> > no, it is not MCP61, but MCP55:
>
> Ahh, ok, that is less contradictory then :-) Presumably we should still
> be using hardreset on that chipset.

So, do we know how to solve this in the 2.6.27.y tree?

thanks,

greg k-h
No, not yet and I'd really like to delay this to the next release rather than risking breaking nv yet again as it only affects hotplug. Giovanni, can you please test hotplug w/o the rescan request? Thanks.
Greg KH wrote:
> On Tue, Jan 06, 2009 at 06:19:44PM -0600, Robert Hancock wrote:
>> Giovanni Pancotti wrote:
>>> On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
>>>
>>>>>> Hardware Environment:x86 ASUS M2N-E
>>>> OK, this is an MCP61 board. We're now using softreset instead of hardreset
>>>> on hotplug and apparently that doesn't work. Thing is that:
>>> no, it is not MCP61, but MCP55:
>> Ahh, ok, that is less contradictory then :-) Presumably we should still
>> be using hardreset on that chipset.
>
> So, do we know how to solve this in the 2.6.27.y tree?

Can you try reverting this patch?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c324283e6cdb79210cf7975c3e40d3ba3e672b2

This likely isn't a proper fix as it will probably re-break some other chipsets but it will confirm what the problem is in this case. It looks like this patch changed MCP55 to inherit from generic_ops instead of common_ops which caused it to use soft reset instead of hard reset. Not sure if that was intentional or not.. Tejun?

We really ought to fix up some of the naming in this driver to be less confusing. Especially the "generic" stuff should be renamed, it's not generic at all (it seems to only apply to MCP61 currently) yet it's used as a base operations for other chipset types.
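The inheritance effect described in this comment can be modeled outside the kernel. The sketch below is a Python toy, not the driver: it assumes an ops table may define a method, explicitly disable it, or defer to a parent through an `inherits` link, and the table contents are hypothetical stand-ins for `common_ops`, `generic_ops`, and the MCP55 (SWNCQ) table.

```python
# Marker meaning "this method is explicitly disabled in this table".
OP_NULL = object()


def resolve(ops, name):
    """Walk the 'inherits' chain and return the first definition of name.

    Returns None when the method is disabled or never defined.
    """
    while ops is not None:
        if name in ops:
            value = ops[name]
            return None if value is OP_NULL else value
        ops = ops.get("inherits")
    return None


# Hypothetical tables; the real driver's contents may differ.
common_ops = {"inherits": None, "softreset": "softreset", "hardreset": "hardreset"}
generic_ops = {"inherits": common_ops, "hardreset": OP_NULL}

swncq_before = {"inherits": common_ops}   # parent: common_ops
swncq_after = {"inherits": generic_ops}   # parent: generic_ops

if __name__ == "__main__":
    print("before:", resolve(swncq_before, "hardreset"))
    print("after: ", resolve(swncq_after, "hardreset"))
```

In this model, switching the parent from `common_ops` to `generic_ops` silently removes `hardreset` from the child while `softreset` is still inherited, which is the kind of behavior change the comment attributes to the patch.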
Tejun, with swncq=0 hotplug definitely doesn't work. I've tried several times. With swncq=1 w/o rescan it's the same :-(

first hotplug:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2: EH complete
ata2: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
ata2: SError: { PHYRdyChg LinkSeq TrStaTrns }
ata2: hard resetting link
ata2: SATA link down (SStatus 0 SControl 300)
ata2: EH complete

second:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake Dispar }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
hda-intel: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: NODEV after polling detection
ata2: EH complete

third:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
ata2: SError: { PHYRdyChg LinkSeq TrStaTrns }
ata2: hard resetting link
ata2: SATA link down (SStatus 0 SControl 300)
ata2: EH complete

last:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x50000 action 0xe frozen
ata2: SError: { PHYRdyChg CommWake }
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
ata2: hard resetting link
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.00: NODEV after polling detection
ata2: EH complete

Robert, I am trying the patch.
Robert, with the patch reverted, hotplug works.
Created attachment 19945
swncq-hardreset-debug-3

Can you please try this one?
it works! :-) if you need other info/test, let me know. thanks a lot.
So as far as reset protocol is concerned, swncq controllers are much closer to nf2 than generic. Arghh... at this point, I can't say I have a lot of positive feelings for this series of controllers with so many subtly different reset protocol breakages. :-)

Will forward the patch upstream. Just in case it might break other cases, I'll submit it for 2.6.29 but not 2.6.28-stable or 2.6.27-stable.

Thanks.
On Fri, Jan 16, 2009 at 08:38:18PM -0600, Robert Hancock wrote:
> Greg KH wrote:
> > On Tue, Jan 06, 2009 at 06:19:44PM -0600, Robert Hancock wrote:
> >> Giovanni Pancotti wrote:
> >>> On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
> >>>
> >>>>>> Hardware Environment:x86 ASUS M2N-E
> >>>> OK, this is an MCP61 board. We're now using softreset instead of hardreset
> >>>> on hotplug and apparently that doesn't work. Thing is that:
> >>> no, it is not MCP61, but MCP55:
> >> Ahh, ok, that is less contradictory then :-) Presumably we should still
> >> be using hardreset on that chipset.
> >
> > So, do we know how to solve this in the 2.6.27.y tree?
>
> Can you try reverting this patch?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c324283e6cdb79210cf7975c3e40d3ba3e672b2
>
> This likely isn't a proper fix as it will probably re-break some other
> chipsets but it will confirm what the problem is in this case. It looks
> like this patch changed MCP55 to inherit from generic_ops instead of
> common_ops which caused it to use soft reset instead of hard reset. Not
> sure if that was intentional or not.. Tejun?

I don't want to revert that, as I don't want to break anything else :)

thanks,

greg k-h
Greg KH wrote:
> On Fri, Jan 16, 2009 at 08:38:18PM -0600, Robert Hancock wrote:
>> Greg KH wrote:
>>> On Tue, Jan 06, 2009 at 06:19:44PM -0600, Robert Hancock wrote:
>>>> Giovanni Pancotti wrote:
>>>>> On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
>>>>>
>>>>>>>> Hardware Environment:x86 ASUS M2N-E
>>>>>> OK, this is an MCP61 board. We're now using softreset instead of
>>>>>> hardreset
>>>>>> on hotplug and apparently that doesn't work. Thing is that:
>>>>> no, it is not MCP61, but MCP55:
>>>> Ahh, ok, that is less contradictory then :-) Presumably we should still
>>>> be using hardreset on that chipset.
>>> So, do we know how to solve this in the 2.6.27.y tree?
>> Can you try reverting this patch?
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c324283e6cdb79210cf7975c3e40d3ba3e672b2
>>
>> This likely isn't a proper fix as it will probably re-break some other
>> chipsets but it will confirm what the problem is in this case. It looks
>> like this patch changed MCP55 to inherit from generic_ops instead of
>> common_ops which caused it to use soft reset instead of hard reset. Not
>> sure if that was intentional or not.. Tejun?
>
> I don't want to revert that, as I don't want to break anything else :)

That was directed at the reporter :-) However, hopefully this patch in current git will resolve the problem:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2d775708bc6613f1be47f1e720781343341ecc94
On Mon, Feb 02, 2009 at 08:04:42PM -0600, Robert Hancock wrote:
> Greg KH wrote:
>> On Fri, Jan 16, 2009 at 08:38:18PM -0600, Robert Hancock wrote:
>>> Greg KH wrote:
>>>> On Tue, Jan 06, 2009 at 06:19:44PM -0600, Robert Hancock wrote:
>>>>> Giovanni Pancotti wrote:
>>>>>> On Monday 05 January 2009, at 18:57, Robert Hancock wrote:
>>>>>>
>>>>>>>>> Hardware Environment:x86 ASUS M2N-E
>>>>>>> OK, this is an MCP61 board. We're now using softreset instead of
>>>>>>> hardreset on hotplug and apparently that doesn't work. Thing is that:
>>>>>> no, it is not MCP61, but MCP55:
>>>>> Ahh, ok, that is less contradictory then :-) Presumably we should still
>>>>> be using hardreset on that chipset.
>>>> So, do we know how to solve this in the 2.6.27.y tree?
>>> Can you try reverting this patch?
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=3c324283e6cdb79210cf7975c3e40d3ba3e672b2
>>>
>>> This likely isn't a proper fix as it will probably re-break some other
>>> chipsets but it will confirm what the problem is in this case. It looks
>>> like this patch changed MCP55 to inherit from generic_ops instead of
>>> common_ops which caused it to use soft reset instead of hard reset. Not
>>> sure if that was intentional or not.. Tejun?
>> I don't want to revert that, as I don't want to break anything else :)
>
> That was directed at the reporter :-) However, hopefully this patch in
> current git will resolve the problem:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2d775708bc6613f1be47f1e720781343341ecc94

Ah nice, I'll queue that one up as well :)

thanks,

greg k-h
I wasn't really sure whether to put this in -stable or not as the pros and cons balanced each other very well, i.e. it's a seemingly safe regression fix vs. it's only hotplug (boot is not broken) && any code change is dangerous. Anyways, getting it into -stable is probably the better choice and if this one is going into -stable, the following one should too.

http://article.gmane.org/gmane.linux.ide/38011

Thanks.
I don't know if any patch has been submitted to 2.6.29 but hotplug still doesn't work in the 2.6.29.3 version (MCP61 chipset). Works ok with 2.6.24.5 and "echo scsi add-single-device 1 0 0 0 > /proc/scsi/scsi".

Output with 2.6.24.5:
[ 73.875304] ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xb
[ 73.875311] ata2: SError: { PHYRdyChg CommWake Dispar }
[ 73.875320] ata2: hard resetting link
[ 75.595026] ata2: SRST failed (errno=-19)
[ 75.595032] ata2: reset failed (errno=-19), retrying in 9 secs
[ 83.857952] ata2: hard resetting link
[ 84.929820] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 84.949985] ata2.00: HPA detected: current 488395055, native 488397168
[ 84.949994] ata2.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
[ 84.950016] ata2.00: 488395055 sectors, multi 0: LBA48 NCQ (depth 0/32)
[ 84.989959] ata2.00: configured for UDMA/133
[ 84.989975] ata2: EH complete
[ 84.990031] scsi 1:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
[ 84.990114] sd 1:0:0:0: [sdd] 488395055 512-byte hardware sectors (250058 MB)
[ 84.990152] sd 1:0:0:0: [sdd] Write Protect is off
[ 84.990155] sd 1:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 84.990169] sd 1:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 84.990224] sd 1:0:0:0: [sdd] 488395055 512-byte hardware sectors (250058 MB)
[ 84.990231] sd 1:0:0:0: [sdd] Write Protect is off
[ 84.990233] sd 1:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 84.990282] sd 1:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 84.990287] sdd: sdd1 sdd2 sdd3
[ 85.013247] sd 1:0:0:0: [sdd] Attached SCSI disk

Output with 2.6.29.3:
[563189.384129] ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xf
[563189.384139] ata2: SError: { PHYRdyChg CommWake Dispar }
[563190.112529] ata2: soft resetting link
[563195.320030] ata2: link is slow to respond, please be patient (ready=0)
[563200.140026] ata2: SRST failed (errno=-16)
[563200.140037] ata2: soft resetting link
[563205.340025] ata2: link is slow to respond, please be patient (ready=0)
[563210.160021] ata2: SRST failed (errno=-16)
[563210.160032] ata2: soft resetting link
[563215.330026] ata2: link is slow to respond, please be patient (ready=0)
[563245.200038] ata2: SRST failed (errno=-16)
[563245.200049] ata2: limiting SATA link speed to 1.5 Gbps
[563245.200057] ata2: soft resetting link
[563250.270046] ata2: SRST failed (errno=-16)
[563250.270055] ata2: reset failed, giving up
[563250.270064] ata2: EH complete
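For anyone comparing such logs across kernel versions, the two outcomes above can be told apart mechanically. The following is only a rough sketch: the function name `hotplug_outcome` is made up for illustration, and it relies solely on the two message strings visible in the logs in this report ("SATA link up" and "reset failed, giving up"). Other kernels may word these messages differently.

```python
import re

# Message fragments taken from the logs quoted in this report.
# Assumption: these exact strings mark success and final failure.
LINK_UP = re.compile(r"ata(\d+): SATA link up")
GAVE_UP = re.compile(r"ata(\d+): reset failed, giving up")


def hotplug_outcome(log_text, port):
    """Return 'up', 'failed', or 'unknown' for the given ata port number.

    The last matching message for the port decides the result.
    """
    outcome = "unknown"
    for line in log_text.splitlines():
        m = LINK_UP.search(line)
        if m and int(m.group(1)) == port:
            outcome = "up"
            continue
        m = GAVE_UP.search(line)
        if m and int(m.group(1)) == port:
            outcome = "failed"
    return outcome
```

Fed the saved output of "dmesg", this gives a quick yes/no answer for a port without reading the whole error-handling sequence.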
Hello, Samo. Hardreset was removed from sata_nv because it didn't work reliably. Please read bko#11195 for details. Because the link sometimes fails to come online after hardreset, even if hardreset is enabled only when softreset fails, we run the risk of losing the device: hardreset can kill the link in cases where retrying softreset would have recovered it. Maybe it can be modified such that only hotplug events use hardreset. Ugh... this is getting uglier than I ever imagined. :-(
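The idea floated above, hardreset for hotplug only, can be written down as a tiny decision rule. This is a purely illustrative sketch and not the sata_nv code: the function name `choose_reset` and its boolean argument are hypothetical, and the real driver logic lives in C inside libata.

```python
def choose_reset(probing):
    """Hypothetical policy sketch: pick a reset method for one attempt.

    probing: True when error handling runs because of a hotplug event
             and no device is attached to the link yet.
    """
    if probing:
        # Nothing is attached yet, so a hardreset cannot lose a working device.
        return "hardreset"
    # A device is already attached: keep to softreset (and its retries)
    # rather than risk hardreset killing the link.
    return "softreset"
```

The point of splitting on that one condition is that the risky method is used only where there is no established link to lose.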
Created attachment 21645 [details] nv-hardreset-only-on-probing.patch Samo, can you please try this patch? Thanks.
Created attachment 21647 [details] nv-hardreset-only-on-probing.patch Slightly updated. Please try this one. Thanks.
Created attachment 21649 [details] nv-hardreset-only-on-probing.patch Oops, inverted condition on the update. Please test this one. Sorry about the fuss.
Vanilla 2.6.29.4:
[ 100.104368] ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xf
[ 100.104374] ata2: SError: { PHYRdyChg CommWake Dispar }
[ 100.832524] ata2: soft resetting link
[ 106.032575] ata2: link is slow to respond, please be patient (ready=0)
[ 110.892534] ata2: SRST failed (errno=-16)
[ 110.892541] ata2: soft resetting link
[ 116.091302] ata2: link is slow to respond, please be patient (ready=0)
[ 120.952514] ata2: SRST failed (errno=-16)
[ 120.952521] ata2: soft resetting link
[ 126.152518] ata2: link is slow to respond, please be patient (ready=0)
[ 155.970025] ata2: SRST failed (errno=-16)
[ 155.970035] ata2: limiting SATA link speed to 1.5 Gbps
[ 155.970042] ata2: soft resetting link
[ 160.992547] ata2: SRST failed (errno=-16)
[ 160.992555] ata2: reset failed, giving up
[ 160.992564] ata2: EH complete

Patching it:
patch -p1<../nv-sata.patch
patching file drivers/ata/libata-core.c
Hunk #1 succeeded at 5382 (offset -26 lines).
Hunk #3 succeeded at 5995 (offset -26 lines).
patching file drivers/ata/sata_nv.c
Hunk #2 succeeded at 415 (offset -1 lines).
Hunk #4 succeeded at 450 (offset -1 lines).
Hunk #6 succeeded at 1561 (offset -1 lines).
Patched 2.6.29.4:
[ 88.168815] ata2: exception Emask 0x10 SAct 0x0 SErr 0x150000 action 0xf
[ 88.168821] ata2: SError: { PHYRdyChg CommWake Dispar }
[ 88.168831] ata2: hard resetting link
[ 92.972538] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 93.032766] ata2.00: HPA detected: current 488395055, native 488397168
[ 93.032773] ata2.00: ATA-7: ST3250410AS, 3.AAC, max UDMA/133
[ 93.032775] ata2.00: 488395055 sectors, multi 0: LBA48 NCQ (depth 0/32)
[ 93.072780] ata2.00: configured for UDMA/133
[ 93.072787] ata2: EH complete
[ 93.072876] scsi 1:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
[ 93.073157] sd 1:0:0:0: [sdd] 488395055 512-byte hardware sectors: (250 GB/232 GiB)
[ 93.073172] sd 1:0:0:0: [sdd] Write Protect is off
[ 93.073174] sd 1:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 93.073192] sd 1:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 93.073255] sd 1:0:0:0: [sdd] 488395055 512-byte hardware sectors: (250 GB/232 GiB)
[ 93.073266] sd 1:0:0:0: [sdd] Write Protect is off
[ 93.073268] sd 1:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 93.073284] sd 1:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 93.073289] sdd: sdd1 sdd2 sdd3
[ 93.095861] sd 1:0:0:0: [sdd] Attached SCSI disk

Seems it works, thanks :) Samo
Great. Can you please attach full kernel log including boot messages and the hotplug messages? Also, please test booting with the port occupied, detaching the drive and then reattaching quickly. Thanks.
Created attachment 21670 [details] boot messages
I hope the kernel log helps. The hotplug detection process was started manually after inserting the disk with the "echo scsi add-single-device 1 0 0 0 > /proc/scsi/scsi" command. I'm sorry but I won't be able to help you with the boot detaching test - the box is a production server and I can't play with it that much. Samo
Seems like it's working as expected. Hmmm... the boot detaching test isn't invasive at all tho.

1. Boot with ata2.00 occupied (the drive you used for warmplug testing).
2. Remove the drive and do "echo - - - > /sys/class/scsi_host/host1/scan". ATA scan will kick in and detach the drive after retrying a few times.
3. Re-plug the drive and do "echo - - - > /sys/class/scsi_host/host1/scan". The drive should appear again.
4. Attach the output of "dmesg".

Thanks.
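The rescan in steps 2 and 3 could also be scripted instead of typed by hand. A minimal sketch, assuming the sysfs layout shown in the commands above; the helper name `rescan_scsi_host` and its `sysfs_root` parameter are invented here (the parameter only exists so the helper can be tried against a scratch directory). On a real system it needs root.

```python
import os


def rescan_scsi_host(host, sysfs_root="/sys"):
    """Write the wildcard scan string "- - -" to a SCSI host's scan file.

    Equivalent to: echo - - - > /sys/class/scsi_host/host<N>/scan
    Returns the path that was written.
    """
    path = os.path.join(sysfs_root, "class", "scsi_host", f"host{host}", "scan")
    with open(path, "w") as f:
        f.write("- - -\n")
    return path
```

For the test above that would be `rescan_scsi_host(1)` once after removing the drive and once more after re-plugging it, followed by collecting "dmesg".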
Will do during the weekend. Samo
Created attachment 21783 [details] The test This is the output from the requested test. I guess it works as it should, but you will know better :) Samo
Thanks for testing. Yes, it worked as expected. I'll polish up the patch and submit upstream.
Resolving as FIXED.