Most recent kernel where this bug did *NOT* occur: 2.6.19-rc6 Distribution: Fedora Core Devel Hardware Environment: CK804 mainboard with DVD + 2 PATA on CK804 and core system on secondary sata chipset (sil) Problem Description: Lots of traces, dvd drive dead Steps to reproduce: boot
Created attachment 9535 [details] Successful 2.6.19-rc5-git6 dmesg
Created attachment 9536 [details] Problem 2.6.19-rc5-mm2 dmesg
Created attachment 9537 [details] Kernel config
Created attachment 9538 [details] lspci
looks like sata_nv shat itself quite seriously here.
That would be a problem in the sata_nv ADMA patch. I'll look into this, it seems like a simple bug. This code path probably isn't used other than in the case of an ATAPI device connected to the SATA controller which you may well have been the first person to have tested this patch with. In the meantime you can try disabling ADMA by passing the parameter adma=0 to the sata_nv module on loading, if you can, or change this line: static int adma_enabled = 1; to static int adma_enabled = 0; in sata_nv.c.
> In the meantime you can try disabling ADMA by passing the parameter adma=0 to > the sata_nv module on loading, if you can, is that sata_nv.adma=0 on the kernel command line or something else?
Created attachment 9540 [details] Patch to fix silly mistake in nv_adma_bmdma functions That would be if your sata_nv.c was not built modular, I believe. But first, can you test this patch? It should at least get rid of those stack dumps, hopefully the drive will actually work as well :-) I don't have any means of testing it myself (not even for non-ATAPI at the moment, unfortunately).
With attachment #9540 [details] the traces are gone but the dvd drive is still dead
Can you post the full dmesg again with the patch?
Created attachment 9547 [details] dmesg with attachment #9540 [details] sorry about forgetting this
Created attachment 9552 [details] Another patch to fix BMDMA setup OK, I think I may see the error of my ways now. These controllers allow using all of the registers in MMIO mode except for the legacy BMDMA registers (or at least I know of no way to use them in MMIO mode). I was just overriding the MMIO flag around the call to the bmdma functions, but those seem to call other functions which may depend on the MMIO flag as well and access other registers which should in fact be used as MMIO. I've added the appropriate code in now to do the bmdma operations directly without messing with the MMIO flag. This is against plain 2.6.19-rc5-mm2 again (so you'll have to revert my last patch and put this on). It's once again compile tested only. Let me know how it works. If it doesn't work, post the dmesg again. By the way I noticed you have some drives attached to the Silicon Image controller on your board. Those will perform faster on the NVIDIA controller since that now supports NCQ and the Silicon Image controller does not.
when I built the box sata_sil was stable and sata_nv not Once this bit is fixed I'll probably move one disk to sata_nv, it's better for redundancy anyway (the two sata_sil drives are mirrored) rebuilding with your patch now
Created attachment 9553 [details] Another patch to fix BMDMA setup - with notifier clear fix I just noticed another potential issue in the driver. It might not affect you but it could hit if there was an ATAPI device and a non-ATAPI device on two ports on the same controller. Yet another patch version with that fix..
Well with the patch in attachment #9553 [details], the boot got real slow, but I didn't get the dvd drive back :(
Created attachment 9554 [details] 9547: dmesg with attachment #9553 [details]
BTW in case it wasn't clear : the dvd drive on sata_nv is a sata drive (at least externally)
I think it might be a bit closer to functioning now, actually, but something funky is obviously still going on. I'll try and look into it some more tomorrow.
Created attachment 9563 [details] Patch with more debugging Here's another patch for you to try. I doubt it'll fix the problem but it should spew out some more debugging information in order to help track it down. I'm suspecting that we need to disable ADMA mode entirely on the port with an ATAPI device in order for the normal BMDMA engine to work properly..
Created attachment 9565 [details] dmesg with attachment #9563 [details]
OK, from the output it appears that the normal BMDMA engine is essentially stalling whenever we try and use it. My guess is that this is because we still have ADMA mode enabled on the port (the NV_MCP_SATA_CFG_20_PORT0_EN/NV_MCP_SATA_CFG_20_PORT0_PWB_EN or NV_MCP_SATA_CFG_20_PORT1_EN/NV_MCP_SATA_CFG_20_PORT1_PWB_EN flags in the configuration register.) I'll try and get something together which disables those bits when we are in port-register mode and see if that helps, however I'll likely need to test that at home to make sure it doesn't somehow screw up ADMA mode as well. (This is where actually having specs from NVIDIA on the controller would be useful..) Aside from that, I noticed another issue: The way it's intended to work with DMA on ATAPI on this controller is that when an ATAPI device is configured we change the DMA parameters to be the ones for standard PCI IDE BMDMA in the slave_config function. However, it seems like from your original stack dumps that the libata code is issuing DMA PACKET commands before the slave_config function has been called. In this case the DMA parameters will still be set to the values for ADMA mode and things may go haywire. I think I need to override the check_atapi_dma function and disallow ATAPI DMA until the slave_config function has been called to set up the parameters properly. (Jeff/Tejun, if you're listening, does that sound reasonable?)
Still in 2.6.19-rc6-mm1
2.6.19-rc6-mm1 dmesg in attachment #9623 [details]
Created attachment 9625 [details] Patch for ATAPI support in ADMA mode OK, here's another patch. I haven't gotten an SATA-to-PATA adapter to try ATAPI mode myself, but I've kind of simulated it by testing a hacked version which uses "ATAPI mode" on everything, and that seemed to work OK. Can you test this out and see if it works? This is against 2.6.19-rc6-mm1. This version turns off the PCI configuration space bits which control whether ADMA is enabled, so we disable it entirely when an ATAPI device is connected. Otherwise it seems like the legacy DMA engine doesn't work properly.
Created attachment 9627 [details] dmesg with attachment #9625 [details] This patch works at last
Excellent. I just submitted the patch to Andrew and Jeff.
Marking as fix available.
Thank you for looking at this!