Bug 7538

Summary: WARNING at drivers/ata/sata_nv.c:762 nv_adma_bmdma_setup()
Product: IO/Storage Reporter: Nicolas Mailhot (Nicolas.Mailhot)
Component: Serial ATAAssignee: Robert Hancock (hancockrwd)
Status: RESOLVED CODE_FIX    
Severity: normal CC: akpm, bunk, hancockrwd, htejun
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.19-rc5-mm2 - 2.6.19-rc6-mm1 Subsystem:
Regression: --- Bisected commit-id:
Attachments: Successful 2.6.19-rc5-git6 dmesg
Problem 2.6.19-rc5-mm2 dmesg
Kernel config
lspci
Patch to fix silly mistake in nv_adma_bmdma functions
dmesg with attachment #9540
Another patch to fix BMDMA setup
Another patch to fix BMDMA setup - with notifier clear fix
9547: dmesg with attachment #9553
Patch with more debugging
dmesg with attachment #9563
Patch for ATAPI support in ADMA mode
dmesg with attachment #9625

Description Nicolas Mailhot 2006-11-16 14:18:43 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.19-rc6

Distribution: Fedora Core Devel

Hardware Environment: CK804 mainboard with DVD + 2 PATA on CK804 and core system
on secondary sata chipset (sil)

Problem Description:

Lots of traces, dvd drive dead

Steps to reproduce:

boot
Comment 1 Nicolas Mailhot 2006-11-16 14:20:33 UTC
Created attachment 9535 [details]
Successful 2.6.19-rc5-git6 dmesg
Comment 2 Nicolas Mailhot 2006-11-16 14:21:48 UTC
Created attachment 9536 [details]
Problem 2.6.19-rc5-mm2 dmesg
Comment 3 Nicolas Mailhot 2006-11-16 14:23:24 UTC
Created attachment 9537 [details]
Kernel config
Comment 4 Nicolas Mailhot 2006-11-16 14:24:44 UTC
Created attachment 9538 [details]
lspci
Comment 5 Andrew Morton 2006-11-16 14:34:27 UTC
looks like sata_nv shat itself quite seriously here.
Comment 6 Robert Hancock 2006-11-16 15:07:55 UTC
That would be a problem in the sata_nv ADMA patch. I'll look into this, it seems
like a simple bug. This code path probably isn't used other than in the case of
an ATAPI device connected to the SATA controller which you may well have been
the first person to have tested this patch with.

In the meantime you can try disabling ADMA by passing the parameter adma=0 to
the sata_nv module on loading, if you can, or change this line:

static int adma_enabled = 1;

to

static int adma_enabled = 0;

in sata_nv.c.
Comment 7 Nicolas Mailhot 2006-11-16 15:19:56 UTC
> In the meantime you can try disabling ADMA by passing the parameter adma=0 to
> the sata_nv module on loading, if you can,

is that sata_nv.adma=0 on the kernel command line or something else?
Comment 8 Robert Hancock 2006-11-16 15:43:43 UTC
Created attachment 9540 [details]
Patch to fix silly mistake in nv_adma_bmdma functions

That would be if your sata_nv.c was not built modular, I believe.

But first, can you test this patch? It should at least get rid of those stack
dumps, hopefully the drive will actually work as well :-) I don't have any
means of testing it myself (not even for non-ATAPI at the moment,
unfortunately).
Comment 9 Nicolas Mailhot 2006-11-17 03:27:53 UTC
With attachment #9540 [details] the traces are gone but the dvd drive is still dead
Comment 10 Robert Hancock 2006-11-17 05:01:12 UTC
Can you post the full dmesg again with the patch?
Comment 11 Nicolas Mailhot 2006-11-17 06:11:50 UTC
Created attachment 9547 [details]
dmesg with attachment #9540 [details]

sorry about forgetting this
Comment 12 Robert Hancock 2006-11-17 15:35:28 UTC
Created attachment 9552 [details]
Another patch to fix BMDMA setup

OK, I think I may see the error of my ways now. These controllers allow using
all of the registers in MMIO mode except for the legacy BMDMA registers (or at
least I know of no way to use them in MMIO mode). I was just overriding the
MMIO flag around the call to the bmdma functions, but those seem to call other
functions which may depend on the MMIO flag as well and access other registers
which should in fact be used as MMIO. I've added the appropriate code in now to
do the bmdma operations directly without messing with the MMIO flag.

This is against plain 2.6.19-rc5-mm2 again (so you'll have to revert my last
patch and put this on). It's once again compile tested only. Let me know how it
works. If it doesn't work, post the dmesg again.

By the way I noticed you have some drives attached to the Silicon Image
controller on your board. Those will perform faster on the NVIDIA controller
since that now supports NCQ and the Silicon Image controller does not.
Comment 13 Nicolas Mailhot 2006-11-17 15:50:22 UTC
when I built the box sata_sil was stable and sata_nv not
Once this bit is fixed I'll probably move one disk to sata_nv, it's better for
redundancy anyway (the two sata_sil drives are mirrored)

rebuilding with your patch now
Comment 14 Robert Hancock 2006-11-17 16:02:21 UTC
Created attachment 9553 [details]
Another patch to fix BMDMA setup - with notifier clear fix

I just noticed another potential issue in the driver. It might not affect you
but it could hit if there was an ATAPI device and a non-ATAPI device on two
ports on the same controller. Yet another patch version with that fix..
Comment 15 Nicolas Mailhot 2006-11-17 16:42:14 UTC
Well with the patch in attachment #9553 [details], the boot got real slow, but I didn't
get the dvd drive back :(
Comment 16 Nicolas Mailhot 2006-11-17 16:43:41 UTC
Created attachment 9554 [details]
9547: dmesg with attachment #9553 [details]
Comment 17 Nicolas Mailhot 2006-11-17 16:48:55 UTC
BTW in case it wasn't clear : the dvd drive on sata_nv is a sata drive (at least
externally)
Comment 18 Robert Hancock 2006-11-17 21:14:08 UTC
I think it might be a bit closer to functioning now, actually, but something
funky is obviously still going on. I'll try and look into it some more tomorrow.
Comment 19 Robert Hancock 2006-11-18 15:50:37 UTC
Created attachment 9563 [details]
Patch with more debugging

Here's another patch for you to try. I doubt it'll fix the problem but it
should spew out some more debugging information in order to help track it down.


I'm suspecting that we need to disable ADMA mode entirely on the port with an
ATAPI device in order for the normal BMDMA engine to work properly..
Comment 20 Nicolas Mailhot 2006-11-19 05:31:21 UTC
Created attachment 9565 [details]
dmesg with attachment #9563 [details]
Comment 21 Robert Hancock 2006-11-19 07:59:18 UTC
OK, from the output it appears that the normal BMDMA engine is essentially
stalling whenever we try and use it. My guess is that this is because we still
have ADMA mode enabled on the port (the
NV_MCP_SATA_CFG_20_PORT0_EN/NV_MCP_SATA_CFG_20_PORT0_PWB_EN or
NV_MCP_SATA_CFG_20_PORT1_EN/NV_MCP_SATA_CFG_20_PORT1_PWB_EN flags in the
configuration register.) I'll try and get something together which disables
those bits when we are in port-register mode and see if that helps, however I'll
likely need to test that at home to make sure it doesn't somehow screw up ADMA
mode as well. (This is where actually having specs from NVIDIA on the controller
would be useful..)

Aside from that, I noticed another issue: The way it's intended to work with DMA
on ATAPI on this controller is that when an ATAPI device is configured we change
the DMA parameters to be the ones for standard PCI IDE BMDMA in the slave_config
function. However, it seems like from your original stack dumps that the libata
code is issuing DMA PACKET commands before the slave_config function has been
called. In this case the DMA parameters will still be set to the values for ADMA
mode and things may go haywire. I think I need to override the check_atapi_dma
function and disallow ATAPI DMA until the slave_config function has been called
to set up the parameters properly. (Jeff/Tejun, if you're listening, does that
sound reasonable?)
Comment 22 Nicolas Mailhot 2006-11-25 04:33:10 UTC
Still in 2.6.19-rc6-mm1
Comment 23 Nicolas Mailhot 2006-11-25 05:41:23 UTC
2.6.19-rc6-mm1 dmesg in attachment #9623 [details]
Comment 24 Robert Hancock 2006-11-25 19:12:48 UTC
Created attachment 9625 [details]
Patch for ATAPI support in ADMA mode

OK, here's another patch. I haven't gotten an SATA-to-PATA adapter to try ATAPI
mode myself, but I've kind of simulated it by testing a hacked version which
uses "ATAPI mode" on everything, and that seemed to work OK. Can you test this
out and see if it works? This is against 2.6.19-rc6-mm1.

This version turns off the PCI configuration space bits which control whether
ADMA is enabled, so we disable it entirely when an ATAPI device is connected.
Otherwise it seems like the legacy DMA engine doesn't work properly.
Comment 25 Nicolas Mailhot 2006-11-26 03:14:32 UTC
Created attachment 9627 [details]
dmesg with attachment #9625 [details]

This patch works at last
Comment 26 Robert Hancock 2006-11-26 12:19:25 UTC
Excellent. I just submitted the patch to Andrew and Jeff.
Comment 27 Robert Hancock 2006-11-26 12:21:45 UTC
Marking as fix available.
Comment 28 Nicolas Mailhot 2006-11-26 12:34:56 UTC
Thank you for looking at this!