Bug 14922

Summary: 2.6.32 seemed to have broken nVidia MCP7A sata controller
Product: IO/Storage
Reporter: Rafael J. Wysocki (rjw)
Component: Serial ATA
Assignee: Jeff Garzik (jgarzik)
Status: CLOSED CODE_FIX
Severity: normal
CC: florian, hancockrwd, pchen, tobias, zbiggy
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.32
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 14230
Attachments: Proposed fix from Robert Hancock

Description Rafael J. Wysocki 2009-12-28 23:53:03 UTC
Subject    : 2.6.32 seemed to have broken nVidia MCP7A sata controller
Submitter  : Mike Cui <cuicui@gmail.com>
Date       : 2009-12-19 6:13
References : http://marc.info/?l=linux-ide&m=126120323407742&w=4
Handled-By : Jeff Garzik <jeff@garzik.org>
Handled-By : Robert Hancock <hancockrwd@gmail.com>

This entry is being used for tracking a regression from 2.6.31.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Rafael J. Wysocki 2009-12-29 21:45:49 UTC
On Tuesday 29 December 2009, Robert Hancock wrote:
> On Tue, Dec 29, 2009 at 9:28 AM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14922
> > Subject         : 2.6.32 seemed to have broken nVidia MCP7A sata controller
> > Submitter       : Mike Cui <cuicui@gmail.com>
> > Date            : 2009-12-19 6:13 (11 days old)
> > References      : http://marc.info/?l=linux-ide&m=126120323407742&w=4
> > Handled-By      : Jeff Garzik <jeff@garzik.org>
> >                  Robert Hancock <hancockrwd@gmail.com>
> 
> Yes, FPDMA auto-activate optimization was introduced for AHCI in
> 2.6.32 and it appears it doesn't work quite right with either the
> reporter's AHCI controller or their drive. I believe they were going
> to try the drive with an Intel controller to see if it worked there.
> 
> It would be useful if we could get other success/failure reports with
> either the particular drive, WDC WD800ADFS-75SLR2 (or at least other
> WD Raptor ADFS-series) on other AHCI controllers, or other drives
> which have AA support on the MCP7A chipset. One of the two needs
> blacklisting for AA support. I'm leaning towards the controller since
> other WD drives with AA support work fine on Intel AHCI.
Comment 2 Robert Hancock 2010-01-10 23:15:47 UTC
Adding Peer Chen to CC: are you aware of any problems with FPDMA auto-activate optimization on NVIDIA MCP7A AHCI or any other NVIDIA AHCI controllers?
Comment 3 Rafael J. Wysocki 2010-01-11 19:53:07 UTC
On Monday 11 January 2010, Robert Hancock wrote:
> On Sun, Jan 10, 2010 at 4:56 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14922
> > Subject         : 2.6.32 seemed to have broken nVidia MCP7A sata controller
> > Submitter       : Mike Cui <cuicui@gmail.com>
> > Date            : 2009-12-19 6:13 (23 days old)
> > References      : http://marc.info/?l=linux-ide&m=126120323407742&w=4
> > Handled-By      : Jeff Garzik <jeff@garzik.org>
> >                  Robert Hancock <hancockrwd@gmail.com>
> 
> Still outstanding. Waiting for testing results from the reporter with
> that drive on another AHCI controller, if possible. I CCed Peer Chen
> from NVIDIA on the Bugzilla report, maybe they know if there is some
> known problem with FPDMA AA on that controller.
Comment 4 Rafael J. Wysocki 2010-01-25 20:45:18 UTC
On Monday 25 January 2010, Robert Hancock wrote:
> On Sun, Jan 24, 2010 at 4:23 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.31 and 2.6.32.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.31 and 2.6.32.  Please verify if it still should
> > be listed and let me know (either way).
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=14922
> > Subject         : 2.6.32 seemed to have broken nVidia MCP7A sata controller
> > Submitter       : Mike Cui <cuicui@gmail.com>
> > Date            : 2009-12-19 6:13 (37 days old)
> > References      : http://marc.info/?l=linux-ide&m=126120323407742&w=4
> > Handled-By      : Jeff Garzik <jeff@garzik.org>
> >                  Robert Hancock <hancockrwd@gmail.com>
> 
> It's a confirmed regression. Waiting on some lspci -nn output from the
> reporter. However, for now, disabling auto-activate optimization on
> all NVIDIA AHCIs may be the easiest option.
Comment 5 Jeff Garzik 2010-02-08 00:51:23 UTC
Created attachment 24940
Proposed fix from Robert Hancock
Comment 6 Tobias Munter 2010-03-03 13:07:43 UTC
I'm sorry to say it's not only the MCP7A that this happens on. I tried Jeff's patch, and after a few hours my disks started to show the same errors (the patch targets the NVIDIA controller, but my disks are attached to a Marvell controller). I commented out the whole
		if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
+			pi.flags |= ATA_FLAG_FPDMA_AA;
section and have now been running stable for 6 hours. I will report back if it starts hanging again.


lspci -vv:

03:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042 PCI-e 4-port SATA-II (rev 02)
        Subsystem: Marvell Technology Group Ltd. Device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 19
        Region 0: Memory at feb00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at ec00 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <256ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <256ns, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: sata_mv
        Kernel modules: sata_mv
Comment 7 Robert Hancock 2010-03-03 14:30:00 UTC
If the controller's using sata_mv, that code won't have any effect; it applies to the AHCI driver only.
Comment 8 Tobias Munter 2010-03-04 21:25:03 UTC
Hi,

I don't know if I was unclear in my earlier post, or if it's just that I don't understand what AHCI does and whether it has anything to do with my Marvell controller, but after commenting out the setting of ATA_FLAG_FPDMA_AA my controller behaves normally again.
Comment 9 Robert Hancock 2010-03-05 00:52:31 UTC
Whether the FPDMA_AA flag in ahci is set will have no effect on sata_mv; they're entirely separate drivers. If the problem seems to have gone away after that change, then I'd say it's either coincidence or something else has actually changed.
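
One way to check which driver is bound to a controller is to look at the "Kernel driver in use" line in `lspci -vv` output, as in the listing in Comment 6. The sketch below is only an illustration: the helper names `kernel_driver_in_use` and `ahci_aa_change_applies` are made up here, and it assumes the text of a single device block is passed in.

```python
from typing import Optional


def kernel_driver_in_use(lspci_block: str) -> Optional[str]:
    """Return the driver named on the 'Kernel driver in use:' line, if any."""
    for line in lspci_block.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Kernel driver in use":
            return value.strip()
    return None


def ahci_aa_change_applies(lspci_block: str) -> bool:
    """A change to the ahci driver matters only if ahci drives the device."""
    return kernel_driver_in_use(lspci_block) == "ahci"


if __name__ == "__main__":
    # Shortened version of the device block posted in Comment 6.
    sample = (
        "03:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SX7042\n"
        "        Kernel driver in use: sata_mv\n"
        "        Kernel modules: sata_mv\n"
    )
    print(kernel_driver_in_use(sample))
    print(ahci_aa_change_applies(sample))
```

For the Marvell 88SX7042 block above, the driver is sata_mv, so a change to the ahci driver is not expected to affect it.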
Comment 10 Tobias Munter 2010-03-16 08:38:25 UTC
It happened again after a recompile and complete removal of the AA flag. Do you have any pointers to where I should start looking?

Br
Tobias
Comment 11 Zbigniew Luszpinski 2010-03-18 09:45:26 UTC
Please whitelist Nvidia nForce MCP78S. I run vanilla kernel 2.6.32.8, have only SATA drives and have no problems with SATA AHCI.

My config:
ASrock K10N78FullHD-hSLI 3.0
Seagate Barracuda 500GB 7200rpm 32MB SATA/300 NCQ
Samsung DVD+/-RW SH-S223C bulk black (Serial ATA) 

MCP78S works very well with 2.6.32 kernels, except for this damn OHCI USB, which hangs without noapic or acpi=noirq.
Comment 12 Robert Hancock 2010-03-18 14:45:09 UTC
Does the kernel report it's using AA on the drive?
Comment 13 Zbigniew Luszpinski 2010-03-18 15:12:19 UTC
How can I check in dmesg if ahci uses AA? If you tell me how to enable it manually I can test it.
Comment 14 Robert Hancock 2010-03-18 23:12:20 UTC
The line where the drive gets detected should show AA if the kernel is using it:

ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA

Of course if your kernel already has the patch where it was disabled on NVIDIA, it won't show it.

You can tell if your drive supports it at all by doing "hdparm -I /dev/sdX" and looking for a line listing "DMA Setup Auto-Activate optimization".
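
Both checks above can be scripted. The following is a minimal sketch that works on text already captured from dmesg and from `hdparm -I`. The function names are hypothetical, and it assumes the AA marker appears at the end of the drive detection line after a comma, as in the example line above.

```python
import re

# Text that hdparm -I prints for drives supporting the feature (from the comment above).
HDPARM_AA_TEXT = "DMA Setup Auto-Activate optimization"


def drive_line_reports_aa(dmesg_line: str) -> bool:
    """True if a drive detection line ends with ', AA'."""
    return re.search(r",\s*AA\s*$", dmesg_line.strip()) is not None


def hdparm_lists_aa(hdparm_output: str) -> bool:
    """True if any line of 'hdparm -I' output mentions the AA feature."""
    return any(HDPARM_AA_TEXT in line for line in hdparm_output.splitlines())


if __name__ == "__main__":
    with_aa = "ata1.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA"
    print(drive_line_reports_aa(with_aa))
```

A detection line without the trailing AA means the kernel is not using the optimization on that drive, either because the drive lacks support or because the kernel has it disabled for that controller.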
Comment 15 Zbigniew Luszpinski 2010-03-21 19:19:30 UTC
ata2.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32)
hdparm -I /dev/sda | grep DMA
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
           *    {READ,WRITE}_DMA_EXT_GPL commands

It seems my hdd does not support AA. That is why I did not encounter the bug.
My hdd: Seagate ST3500320AS
hdparm v9.28
Forget about my comments. I'm unable to test this bug on my mainboard.
Comment 16 Florian Mickler 2011-02-02 09:49:48 UTC
The fix for this issue got merged in v2.6.34-rc1:

commit 453d3131ec7aab82eaaa8401a50522a337092aa8
Author: Robert Hancock <hancockrwd@gmail.com>
Date:   Tue Jan 26 22:33:23 2010 -0600

    ahci: disable FPDMA auto-activate optimization on NVIDIA AHCI
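
For readers who do not follow the driver code, the vendor check quoted in Comment 6 can be modelled in a few lines. This is only a sketch of that quoted fragment, not of the merged commit, which may implement the same behaviour differently. The bit value used for `ATA_FLAG_FPDMA_AA` is a placeholder chosen for illustration, and `port_flags` is a hypothetical helper name.

```python
PCI_VENDOR_ID_NVIDIA = 0x10DE  # NVIDIA's PCI vendor ID
PCI_VENDOR_ID_INTEL = 0x8086   # Intel's PCI vendor ID, used only for the demo

# Placeholder bit for illustration; not claimed to match the kernel's value.
ATA_FLAG_FPDMA_AA = 1 << 14


def port_flags(vendor_id: int, base_flags: int = 0) -> int:
    """Enable the AA flag unless the controller's vendor is NVIDIA."""
    flags = base_flags
    if vendor_id != PCI_VENDOR_ID_NVIDIA:
        flags |= ATA_FLAG_FPDMA_AA
    return flags


if __name__ == "__main__":
    for name, vendor in (("NVIDIA", PCI_VENDOR_ID_NVIDIA), ("Intel", PCI_VENDOR_ID_INTEL)):
        enabled = bool(port_flags(vendor) & ATA_FLAG_FPDMA_AA)
        print(f"{name}: AA enabled = {enabled}")
```

With this check, drives on NVIDIA AHCI controllers no longer show AA on their detection line, while other AHCI controllers keep the optimization.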