Bug 9408

Summary: Regression in 2.6.24-rc3 - 64-bit DMA fails for BCM94311MCG rev 02
Product: Platform Specific/Hardware
Component: x86-64
Reporter: Rafael J. Wysocki (rjwysocki)
Assignee: Larry Finger (Larry.Finger)
Status: CLOSED PATCH_ALREADY_AVAILABLE
Severity: normal
Priority: P1
CC: hpa, Larry.Finger
Hardware: All
OS: Linux
Kernel Version: 2.6.24-rc3
Subsystem:
Regression: ---
Bisected commit-id:

Description Rafael J. Wysocki 2007-11-19 10:50:45 UTC
Subject         : Regression in 2.6.24-rc3 - 64-bit DMA fails for BCM94311MCG rev 02
Submitter       : Larry Finger <Larry.Finger@lwfinger.net>
References      : http://lkml.org/lkml/2007/11/18/123
Handled-By      : Arjan van de Ven <arjan@infradead.org>
Comment 1 Larry Finger 2007-11-19 20:39:02 UTC
I am not sure whether this is a regression, a driver error, a hardware error, or a firmware error. What I do know for certain is that the commit whose reversion "fixes" the problem is not the cause; reverting it merely masks the real problem.

DMA for this device is accomplished by creating a descriptor ring buffer that can be accessed by the firmware, and creating a number of descriptors to populate the ring. Each descriptor consists of two control words and two 32-bit words that together hold the address of the DMA buffer. This scheme is similar to the 30- and 32-bit DMA used by bcm43xx and b43 for all previous cards.
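
The descriptor layout described above can be sketched in userspace C as follows. This is only an illustration: the names (`dma64_desc`, `ctl0`, `ctl1`, `addr_lo`, `addr_hi`) and the field order are assumptions, not the driver's actual definitions, and a real driver would also have to deal with byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: two control words followed by the low and
 * high halves of the DMA buffer address. Names and field order are
 * illustrative assumptions, not taken from the driver. */
struct dma64_desc {
    uint32_t ctl0;     /* first control word */
    uint32_t ctl1;     /* second control word */
    uint32_t addr_lo;  /* bits 0..31 of the buffer address */
    uint32_t addr_hi;  /* bits 32..63 of the buffer address */
};

/* Split a 64-bit address into the two 32-bit descriptor fields. */
static void desc_set_addr(struct dma64_desc *d, uint64_t addr)
{
    assert(d != NULL);
    d->addr_lo = (uint32_t)(addr & 0xFFFFFFFFu);
    d->addr_hi = (uint32_t)(addr >> 32);
}

/* Reassemble the 64-bit address from the two descriptor fields. */
static uint64_t desc_get_addr(const struct dma64_desc *d)
{
    assert(d != NULL);
    return ((uint64_t)d->addr_hi << 32) | d->addr_lo;
}
```

Both ring addresses reported in this bug fit in 32 bits, so in this sketch the high word would be zero in either case.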

When commit 44048d70 is included, the ring buffer is located at a physical address equal to 0x5877E000 (~1.48 GB - this machine has 1.5 GB RAM). It fails with an RX error code indicating a descriptor read error. I tried booting with mem=1024M and mem=512M options. The error persisted.

After reverting the above commit, the ring buffer was located at a physical address of 0x0279A000 (~41 MB), which worked.
Comment 2 Ingo Molnar 2007-11-19 22:21:39 UTC
* bugme-daemon@bugzilla.kernel.org <bugme-daemon@bugzilla.kernel.org> wrote:

> When commit 44048d70 is included, the ring buffer is located at a 
> physical address equal to 0x5877E000 (~1.48 GB - this machine has 1.5 
> GB RAM). It fails with an RX error code indicating a descriptor read 
> error. I tried booting with mem=1024M and mem=512M options. The error 
> persisted.
> 
> After reverting the above commit, the ring buffer was located at a 
> physical address of 0x0279A000 (~41 MB), which worked.

that's interesting. What happens if you try a stupid hack that uses 
GFP_DMA to allocate the ringbuffer to below 16 MB - does that result in 
a working driver too?

	Ingo
Comment 3 Larry Finger 2007-11-20 09:30:40 UTC
With your suggested hack, the ring buffer is located at 0xA08000 and the system works. The data buffers are still at high addresses (~1.5 GB) - only the descriptor ring is at the low address.
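
For reference, GFP_DMA on x86 allocates from the zone below 16 MB (the ISA DMA limit), which matches the 0xA08000 address. A small userspace sketch of that check follows; the constant name, the helper name, and the ring size used in the usage note are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ISA DMA limit on x86: 16 MiB. */
#define ISA_DMA_LIMIT (16ULL * 1024 * 1024)

/* Return true if a ring of `size` bytes starting at physical address
 * `phys` lies entirely below the ISA DMA limit. */
static bool ring_fits_below_limit(uint64_t phys, uint64_t size)
{
    assert(size > 0);
    /* Written this way so that phys + size cannot overflow. */
    return size <= ISA_DMA_LIMIT && phys <= ISA_DMA_LIMIT - size;
}
```

Assuming a 4096-byte ring, `ring_fits_below_limit(0xA08000, 4096)` holds, while the same check fails for both 0x0279A000 and 0x5877E000.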

This looks more and more like a firmware error.
Comment 4 Rafael J. Wysocki 2007-11-24 12:18:14 UTC
I'm removing this issue from the list of recent regressions.

Please let me know if this bug should be closed.
Comment 5 Larry Finger 2007-11-26 10:25:26 UTC
The problem is that the BCM card with 64-bit DMA requires the descriptor ring to be in low-order memory. We have not delineated the boundary between good and bad behavior; however, memory locations as low as 48 MB work. It may be that 64 MB is the "magic" boundary. For these chips, these buffers are allocated with the GFP_DMA flag, as though this device were an ISA interface. This has worked to date.
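
The data points reported in this bug bracket the boundary only loosely. The sketch below collects them and computes the highest address known to work and the lowest known to fail. The struct and function names are made up for illustration, and the "48M" report is taken to mean 48 MiB, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct observation {
    uint64_t phys;  /* physical address of the descriptor ring */
    bool works;     /* whether DMA worked with the ring there */
};

/* Data points from this bug; the 48 MiB entry is an approximation
 * of the "48M" report (assumption). */
static const struct observation observed[] = {
    { 0x00A08000ULL, true  },  /* GFP_DMA allocation */
    { 0x0279A000ULL, true  },  /* with commit 44048d70 reverted */
    { 48ULL << 20,   true  },  /* reported as "48M" */
    { 0x5877E000ULL, false },  /* with commit 44048d70 applied */
};

/* Find the highest working address and the lowest failing address. */
static void bracket_boundary(const struct observation *obs, size_t n,
                             uint64_t *max_good, uint64_t *min_bad)
{
    assert(obs != NULL && max_good != NULL && min_bad != NULL);
    *max_good = 0;
    *min_bad = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        if (obs[i].works && obs[i].phys > *max_good)
            *max_good = obs[i].phys;
        if (!obs[i].works && obs[i].phys < *min_bad)
            *min_bad = obs[i].phys;
    }
}
```

With these points the boundary lies somewhere between 48 MiB and 0x5877E000, so a 64 MB boundary is consistent with the observations but not established by them.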

The real problem is likely an error in the chip itself.