Bug 38102 - BUG kmalloc-2048: Poison overwritten
BUG kmalloc-2048: Poison overwritten
Status: RESOLVED OBSOLETE
Product: Drivers
Classification: Unclassified
Component: Network
All Linux
: P1 high
Assigned To: drivers_network@kernel-bugs.osdl.org
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-06-23 17:33 UTC by Alexey Zaytsev
Modified: 2012-08-24 14:55 UTC (History)
4 users (show)

See Also:
Kernel Version: 3.0.0-rc4
Tree: Mainline
Regression: No


Attachments
dmesg (123.93 KB, text/plain)
2011-06-23 17:33 UTC, Alexey Zaytsev
Details
.config (72.73 KB, text/plain)
2011-06-23 17:34 UTC, Alexey Zaytsev
Details
an other dmesg (816.31 KB, text/plain)
2011-07-05 05:19 UTC, Alexey Zaytsev
Details
a few lines to trace over the code (2.25 KB, patch)
2011-07-05 05:21 UTC, Alexey Zaytsev
Details | Diff

Description Alexey Zaytsev 2011-06-23 17:33:51 UTC
Created attachment 63342 [details]
dmesg

Hi.

Getting a bunch of kmalloc-2048 use-after-free. First noticed in 2.6.39, but also present in 3.0.0-rc4.

I guess it might be driver-related, as the problem went unnoticed for a whole release. The device is

02:0e.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev 02)
	Subsystem: Hewlett-Packard Company NX7300 laptop
	Flags: bus master, fast devsel, latency 64, IRQ 16
	Memory at f4108000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 2
	Kernel driver in use: b44


Config and dmesg attached. I'm noticing the problem when ssh'ing to an illumos box, as ssh would disconnect when the illumos build process spews the cached build-logs. But the problem might be unrelated, as one corruption contains http data. It also seems is hard to trigger any other traffic, even from the same box.
Comment 1 Alexey Zaytsev 2011-06-23 17:34:23 UTC
Created attachment 63352 [details]
.config
Comment 2 Andrew Morton 2011-06-29 21:52:10 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 23 Jun 2011 17:33:54 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=38102
> 
>            Summary: BUG kmalloc-2048: Poison overwritten
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 3.0.0-rc4

Looks like a 2.6.38->2.6.39 regression, perhaps a memory scribble in b44.

>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: Network
>         AssignedTo: drivers_network@kernel-bugs.osdl.org
>         ReportedBy: alexey.zaytsev@gmail.com
>         Regression: Yes
> 
> 
> Created an attachment (id=63342)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=63342)
> dmesg
> 
> Hi.
> 
> Getting a bunch of kmalloc-2048 use-after-free. First noticed in 2.6.39, but
> also present in 3.0.0-rc4.
> 
> I guess it might be driver-related, as the problem went unnoticed for a whole
> release. The device is
> 
> 02:0e.0 Ethernet controller: Broadcom Corporation BCM4401-B0 100Base-TX (rev
> 02)
>     Subsystem: Hewlett-Packard Company NX7300 laptop
>     Flags: bus master, fast devsel, latency 64, IRQ 16
>     Memory at f4108000 (32-bit, non-prefetchable) [size=8K]
>     Capabilities: [40] Power Management version 2
>     Kernel driver in use: b44
> 
> 
> Config and dmesg attached. I'm noticing the problem when ssh'ing to an illumos
> box, as ssh would disconnect when the illumos build process spews the cached
> build-logs. But the problem might be unrelated, as one corruption contains http
> data. It also seems is hard to trigger any other traffic, even from the same
> box.
>
Comment 3 Alexey Zaytsev 2011-07-01 06:01:11 UTC
On Thu, Jun 30, 2011 at 01:51, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Thu, 23 Jun 2011 17:33:54 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=38102
>>
>>            Summary: BUG kmalloc-2048: Poison overwritten
>>            Product: Drivers
>>            Version: 2.5
>>     Kernel Version: 3.0.0-rc4
>
> Looks like a 2.6.38->2.6.39 regression, perhaps a memory scribble in b44.

Actually, not sure about the version. 39 was the first one I've been
using in the scenario. Checking older versions now.
And git-log does not show a lot of changes to the b44 driver, so it
might be something unrelated.
Comment 4 Alexey Zaytsev 2011-07-04 11:50:20 UTC
On Sun, Jul 3, 2011 at 19:46, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le dimanche 03 juillet 2011 à 01:25 +0400, Alexey Zaytsev a écrit :
>> On Fri, Jul 1, 2011 at 10:01, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> > On Thu, Jun 30, 2011 at 01:51, Andrew Morton <akpm@linux-foundation.org> wrote:
>> >>
>> >> (switched to email.  Please respond via emailed reply-to-all, not via the
>> >> bugzilla web interface).
>> >>
>> >> On Thu, 23 Jun 2011 17:33:54 GMT
>> >> bugzilla-daemon@bugzilla.kernel.org wrote:
>> >>
>> >>> https://bugzilla.kernel.org/show_bug.cgi?id=38102
>> >>>
>> >>>            Summary: BUG kmalloc-2048: Poison overwritten
>> >>>            Product: Drivers
>> >>>            Version: 2.5
>> >>>     Kernel Version: 3.0.0-rc4
>> >>
>> >> Looks like a 2.6.38->2.6.39 regression, perhaps a memory scribble in b44.
>> >
>> > Actually, not sure about the version. 39 was the first one I've been
>> > using in the scenario. Checking older versions now.
>> > And git-log does not show a lot of changes to the b44 driver, so it
>> > might be something unrelated.
>> >
>>
>> I've checked back as far as 2.6.27, and the problem is still there.
>> I've also looked through the allocation-related code, and it seemed
>> sane. I'm not sure I understand the 1GB dma workaround, but this path
>> is never hit in my case. So adding the driver authors to CC. This
>> could be something different, but I've been unable to reproduce using
>> an other machine with an rtl8139 nic.
>
> Hmm, looking at b44 code, I believe there is a race there.
>
> Could you try following patch ?
>

This might fix a potential problem, but unfortunately did not help here.

There is an other place that looks suspicious to me:

 812                         struct sk_buff *copy_skb;
 813
 814                         b44_recycle_rx(bp, cons, bp->rx_prod);
 815                         copy_skb = netdev_alloc_skb(bp->dev, len + 2);
 816                         if (copy_skb == NULL)
 817                                 goto drop_it_no_recycle;
 818
 819                         skb_reserve(copy_skb, 2);
 820                         skb_put(copy_skb, len);
 821                         /* DMA sync done above, copy just the
actual packet */
 822                         skb_copy_from_linear_data_offset(skb,
RX_PKT_OFFSET,
 823
copy_skb->data, len);
 824                         skb = copy_skb;


The skb is reinserted into the ring before its data is copied, it
seems. But this can't be the cause of my problem, as it would lead to
data corruption at most, not a write-after-free.

And an other question. Why so we have the logic to work-around the 1Gb
DMA limit instead of just setting the dma mask?
Comment 5 Anonymous Emailer 2011-07-04 14:41:36 UTC
Reply-To: m@bues.ch

On Mon, 04 Jul 2011 15:57:02 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I dont have the b44 specs, but :

It uses the 30-bit version of the Broadcom HND engine, for which complete
specs are available here:
http://bcm-v4.sipsolutions.net/802.11/DMA

> For sure, addr should be set before ctl, just in case ctl allows chip to
> start a dma transfert (to previous packet), because a OWN bit is unset
> for example...

Certainly not.
The device does not know about the buffer until this line:
 bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
which advances the DMA descriptor pointer.
However, I do think we probably need a wmb() right before that bw32() line, to make sure
memory is committed before we tell the device it's OK to access it.
We do this in b43, which has exactly the same DMA engine.
Comment 6 Eric Dumazet 2011-07-04 14:45:23 UTC
Le lundi 04 juillet 2011 à 16:31 +0200, Michael Büsch a écrit :
> On Mon, 04 Jul 2011 16:00:49 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > And an other question. Why so we have the logic to work-around the 1Gb
> > > DMA limit instead of just setting the dma mask?
> > 
> > Your problem is in RX side : NIC actually writes to a buffer that is
> > supposedly not its property.
> 
> The problem is on both sides, because some Linux architectures simply
> do not support any DMA mask less than 32. This applied to i386 (IA32) last
> time I looked.
> The b44 DMA engine can only address 30-bits.


Michael, traces provided by Alexey are in the RX path.

NIC does a DMA  (Receives an UDP frame) into a 2048 bytes buffers that
was freed.
Comment 7 Anonymous Emailer 2011-07-04 14:51:52 UTC
Reply-To: m@bues.ch

On Mon, 04 Jul 2011 16:45:05 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le lundi 04 juillet 2011 à 16:31 +0200, Michael Büsch a écrit :
> > On Mon, 04 Jul 2011 16:00:49 +0200
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > > And an other question. Why so we have the logic to work-around the 1Gb
> > > > DMA limit instead of just setting the dma mask?
> > > 
> > > Your problem is in RX side : NIC actually writes to a buffer that is
> > > supposedly not its property.
> > 
> > The problem is on both sides, because some Linux architectures simply
> > do not support any DMA mask less than 32. This applied to i386 (IA32) last
> > time I looked.
> > The b44 DMA engine can only address 30-bits.
> 
> 
> Michael, traces provided by Alexey are in the RX path.
> 
> NIC does a DMA  (Receives an UDP frame) into a 2048 bytes buffers that
> was freed.

Yeah sure. That's obvious from the logs.
By "the problem" I meant "the 30bit limitation".
Comment 8 Anonymous Emailer 2011-07-04 14:59:51 UTC
Reply-To: m@bues.ch

On Mon, 4 Jul 2011 16:27:26 +0200
Michael Büsch <m@bues.ch> wrote:
> We do this in b43, which has exactly the same DMA engine.

(Ok, it turns out we don't do this in b43 (We only do it on the TX side).
 But that's a bug. We should do a wmb() on the RX side before advancing the
 descriptor ring pointer.)
Comment 9 Eric Dumazet 2011-07-04 15:17:39 UTC
Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
> On Mon, 4 Jul 2011 16:27:26 +0200
> Michael Büsch <m@bues.ch> wrote:
> > We do this in b43, which has exactly the same DMA engine.
> 
> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>  But that's a bug. We should do a wmb() on the RX side before advancing the
>  descriptor ring pointer.)


Also it appears rx_ring (or tx_ring ) are not necessarily 4K aligned if
kzalloc() path is taken.

(SL?B DEBUG -> kzalloc(4096) might not be 4K aligned)
Comment 10 Anonymous Emailer 2011-07-04 15:34:48 UTC
Reply-To: m@bues.ch

On Mon, 04 Jul 2011 16:00:49 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > And an other question. Why so we have the logic to work-around the 1Gb
> > DMA limit instead of just setting the dma mask?
> 
> Your problem is in RX side : NIC actually writes to a buffer that is
> supposedly not its property.

The problem is on both sides, because some Linux architectures simply
do not support any DMA mask less than 32. This applied to i386 (IA32) last
time I looked.
The b44 DMA engine can only address 30-bits.
Comment 11 Eric Dumazet 2011-07-04 15:51:36 UTC
Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
> On Mon, 4 Jul 2011 16:27:26 +0200
> Michael Büsch <m@bues.ch> wrote:
> > We do this in b43, which has exactly the same DMA engine.
> 
> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>  But that's a bug. We should do a wmb() on the RX side before advancing the
>  descriptor ring pointer.)

I am wondering what happens if RX ring is set to 64, and we receive
exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?

Alexey, could you try this patch please ?

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..51072a3 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -829,6 +829,7 @@ static int b44_rx(struct b44 *bp, int budget)
 	}
 
 	bp->rx_cons = cons;
+	wmb();
 	bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
 
 	return received;
Comment 12 Alexey Zaytsev 2011-07-04 20:25:21 UTC
On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>> On Mon, 4 Jul 2011 16:27:26 +0200
>> Michael Büsch <m@bues.ch> wrote:
>> > We do this in b43, which has exactly the same DMA engine.
>>
>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>>  descriptor ring pointer.)
>
> I am wondering what happens if RX ring is set to 64, and we receive
> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>
> Alexey, could you try this patch please ?

Sorry, did not help.
Comment 13 Alexey Zaytsev 2011-07-04 22:29:28 UTC
On Tue, Jul 5, 2011 at 00:25, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Mon, Jul 4, 2011 at 19:12, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le lundi 04 juillet 2011 à 16:43 +0200, Michael Büsch a écrit :
>>> On Mon, 4 Jul 2011 16:27:26 +0200
>>> Michael Büsch <m@bues.ch> wrote:
>>> > We do this in b43, which has exactly the same DMA engine.
>>>
>>> (Ok, it turns out we don't do this in b43 (We only do it on the TX side).
>>>  But that's a bug. We should do a wmb() on the RX side before advancing the
>>>  descriptor ring pointer.)
>>
>> I am wondering what happens if RX ring is set to 64, and we receive
>> exactly 64 buffers in one round, B44_DMARX_PTR wont change at all ?
>>
>> Alexey, could you try this patch please ?
>
> Sorry, did not help.
>

Ran a few rounds of tcpdump. Seeing a significant number or duplicate
ACKs from the problematic machine. Not seeing them when testing
between this machine and an other linux box. Or the illumos machine
and the other linux box.

Dumps are available here:

http://zaytsev.su/tmp/caps/

dump1-3 - between the problematic machine an the illumos box,
collected on illumos side. All show dups.
dump5 - between an other linux box and the illumos machine, no dups.
Collcted on the illumos side.
dump-linux - between 2 linux machines, collected on the
non-problematic side. No dups, no corruptions.

192.168.0.33 - the problematic machine.
192.168.0.72 - the illumos machine.
192.168.0.122 - an other linux machine.
Comment 14 Eric Dumazet 2011-07-05 04:14:36 UTC
Le mardi 05 juillet 2011 à 06:11 +0200, Eric Dumazet a écrit :
> Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >
> > > I dont care about duplicate acks at this point.
> > >
> > > Thats a separate issue (TCP layer)
> > >
> > 
> > Maybe some tx packets are just sent out more then once? Or a single
> > packet is sent out instead of some other packets?
> > The delays between two dups is short, and they come in bursts, up to a
> > few hundreds of duplicate packets at a time.
> > 
> 
> Thats a completely different problem. SSH is very expensive for your
> receiver (your dump1 file has small packets (560 bytes)), and it cannot
> cope with the stress.
> 
> You're filling the b44 rx ring, and b44 driver has no choice to zap 200
> packets at once. This sure is a problem for tcp, as it stalls the thing.
> 
> You could avoid this by doing this at b44 machine (the receiver)
> 
> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
> 
> So that sender wont be able to push so many packets

You can also try using more packets in rx ring : (default is 200
packets, limit ~511)

ethtool -G eth0 rx 400
Comment 15 Alexey Zaytsev 2011-07-05 04:17:41 UTC
On Tue, Jul 5, 2011 at 08:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 05 juillet 2011 à 06:11 +0200, Eric Dumazet a écrit :
>> Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
>> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > >
>> > > I dont care about duplicate acks at this point.
>> > >
>> > > Thats a separate issue (TCP layer)
>> > >
>> >
>> > Maybe some tx packets are just sent out more then once? Or a single
>> > packet is sent out instead of some other packets?
>> > The delays between two dups is short, and they come in bursts, up to a
>> > few hundreds of duplicate packets at a time.
>> >
>>
>> Thats a completely different problem. SSH is very expensive for your
>> receiver (your dump1 file has small packets (560 bytes)), and it cannot
>> cope with the stress.
>>
>> You're filling the b44 rx ring, and b44 driver has no choice to zap 200
>> packets at once. This sure is a problem for tcp, as it stalls the thing.
>>
>> You could avoid this by doing this at b44 machine (the receiver)
>>
>> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
>>
>> So that sender wont be able to push so many packets
>
> You can also try using more packets in rx ring : (default is 200
> packets, limit ~511)
>
> ethtool -G eth0 rx 400
>
>
Check out starting at packet 302893. 383 _identical_ ACKs were sent
out by the b44 machine within 30 milliseconds.

>
>
Comment 16 Alexey Zaytsev 2011-07-05 04:18:37 UTC
On Tue, Jul 5, 2011 at 08:17, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 08:14, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le mardi 05 juillet 2011 à 06:11 +0200, Eric Dumazet a écrit :
>>> Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
>>> > On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > >
>>> > > I dont care about duplicate acks at this point.
>>> > >
>>> > > Thats a separate issue (TCP layer)
>>> > >
>>> >
>>> > Maybe some tx packets are just sent out more then once? Or a single
>>> > packet is sent out instead of some other packets?
>>> > The delays between two dups is short, and they come in bursts, up to a
>>> > few hundreds of duplicate packets at a time.
>>> >
>>>
>>> Thats a completely different problem. SSH is very expensive for your
>>> receiver (your dump1 file has small packets (560 bytes)), and it cannot
>>> cope with the stress.
>>>
>>> You're filling the b44 rx ring, and b44 driver has no choice to zap 200
>>> packets at once. This sure is a problem for tcp, as it stalls the thing.
>>>
>>> You could avoid this by doing this at b44 machine (the receiver)
>>>
>>> echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem
>>>
>>> So that sender wont be able to push so many packets
>>
>> You can also try using more packets in rx ring : (default is 200
>> packets, limit ~511)
>>
>> ethtool -G eth0 rx 400
>>
>>
> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> out by the b44 machine within 30 milliseconds.

In dump1.pcap, that is.
Comment 17 Eric Dumazet 2011-07-05 04:21:10 UTC
Le mardi 05 juillet 2011 à 05:44 +0200, Eric Dumazet a écrit :

> Maybe we should do instead a fast dequeue of packets (recycling them
> instead of pushing them to upper stack) in case too many packets are
> ready to be delivered, and always make sure NIC has a reserve of
> available buffers for DMA accesses, before it can assert ISTAT_RFO
> 
> 

Another way would be to add Explicit Congestion Notification when too
many packets are received in a burst, but unfortunately not enough TCP
flows are ECN ready :)
Comment 18 Eric Dumazet 2011-07-05 04:25:40 UTC
Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
> >
> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> out by the b44 machine within 30 milliseconds.


As I said, b44 driver lost at least 200 consecutive frames (source says
recovery takes about 20 ms)

TCP then do its normal job.
Comment 19 Alexey Zaytsev 2011-07-05 04:30:03 UTC
On Tue, Jul 5, 2011 at 08:25, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
>> >
>> Check out starting at packet 302893. 383 _identical_ ACKs were sent
>> out by the b44 machine within 30 milliseconds.
>
>
> As I said, b44 driver lost at least 200 consecutive frames (source says
> recovery takes about 20 ms)
>
> TCP then do its normal job.
>

From my understanding, after a frame is lost, TCP would be waiting for
a retransmit. Or at least, it would not be sending 400 duplicate ACKs
for the single last frame received, right? Let me run tcpdump on the
b44 side now. I'm quite sure I won't see any ACK dups leaving the
stack.
Comment 20 Eric Dumazet 2011-07-05 04:38:44 UTC
Le mardi 05 juillet 2011 à 08:29 +0400, Alexey Zaytsev a écrit :
> On Tue, Jul 5, 2011 at 08:25, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le mardi 05 juillet 2011 à 08:17 +0400, Alexey Zaytsev a écrit :
> >> >
> >> Check out starting at packet 302893. 383 _identical_ ACKs were sent
> >> out by the b44 machine within 30 milliseconds.
> >
> >
> > As I said, b44 driver lost at least 200 consecutive frames (source says
> > recovery takes about 20 ms)
> >
> > TCP then do its normal job.
> >
> 
> From my understanding, after a frame is lost, TCP would be waiting for
> a retransmit. Or at least, it would not be sending 400 duplicate ACKs
> for the single last frame received, right? Let me run tcpdump on the
> b44 side now. I'm quite sure I won't see any ACK dups leaving the
> stack.

Wow, I believe you are on a wrong track. Honestly.

Try to unpplug the wire for 100ms, and watch your "duplicate acks
disease".

Thats exactly what is happening with b44 driver doing a "fast recovery"
right now.

Thats a moot point. Running tcpdump on your b44 machine will kill your
performance even more, it wont solve the b44 bug.

If you prefer to 'fix tcp', please open another thread.
Comment 21 Eric Dumazet 2011-07-05 04:42:42 UTC
Le mardi 05 juillet 2011 à 07:56 +0400, Alexey Zaytsev a écrit :
> On Tue, Jul 5, 2011 at 07:44, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I dont care about duplicate acks at this point.
> >
> > Thats a separate issue (TCP layer)
> >
> 
> Maybe some tx packets are just sent out more then once? Or a single
> packet is sent out instead of some other packets?
> The delays between two dups is short, and they come in bursts, up to a
> few hundreds of duplicate packets at a time.
> 

Thats a completely different problem. SSH is very expensive for your
receiver (your dump1 file has small packets (560 bytes)), and it cannot
cope with the stress.

You're filling the b44 rx ring, and b44 driver has no choice to zap 200
packets at once. This sure is a problem for tcp, as it stalls the thing.

You could avoid this by doing this at b44 machine (the receiver)

echo "4096 32768 87380" >/proc/sys/net/ipv4/tcp_rmem

So that sender wont be able to push so many packets

> > Do you still have memory scribbles ?
> Yes.

OK

> 
> >
> > I wonder if the problem is not coming from the "fast recovery" added in
> > commit 32737e934a952c (PATCH: b44 Handle RX FIFO overflow better
> > (simplified))
> >
> 
> I've tested back to 2.6.27. I did not test all releases of course, so
> maybe this was fixed, and then broken again.
Comment 22 Eric Dumazet 2011-07-05 05:10:56 UTC
Le mardi 05 juillet 2011 à 08:57 +0400, Alexey Zaytsev a écrit :

> Ran tcpdump. You are right, I was wrong. Sorry for the noise.

Thanks for testing ;)

It would be nice to know if the memory scribbles start after or before
one RFO triggers.

I can see this calls b44_init_rings() without really stopping the device
before. This seems very suspect to me.



diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..b22dd4c 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -829,6 +829,7 @@ static int b44_rx(struct b44 *bp, int budget)
 	}
 
 	bp->rx_cons = cons;
+	wmb();
 	bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
 
 	return received;
@@ -848,6 +849,7 @@ static int b44_poll(struct napi_struct *napi, int budget)
 		/* spin_unlock(&bp->tx_lock); */
 	}
 	if (bp->istat & ISTAT_RFO) {	/* fast recovery, in ~20msec */
+		pr_err("b44: ISTAT_RFO !\n");
 		bp->istat &= ~ISTAT_RFO;
 		b44_disable_ints(bp);
 		ssb_device_enable(bp->sdev, 0); /* resets ISTAT_RFO */
Comment 23 Alexey Zaytsev 2011-07-05 05:19:38 UTC
Created attachment 64652 [details]
an other dmesg
Comment 24 Alexey Zaytsev 2011-07-05 05:21:07 UTC
Created attachment 64662 [details]
a few lines to trace over the code
Comment 25 Eric Dumazet 2011-07-05 05:59:51 UTC
Le mardi 05 juillet 2011 à 07:33 +0200, Eric Dumazet a écrit :
> Le mardi 05 juillet 2011 à 09:18 +0400, Alexey Zaytsev a écrit :
> 
> > Actually, I've added a trace to show b44_init_rings and b44_free_rings
> > calls, and they are only called once, right after the driver is
> > loaded. So it can't be related to START_RFO. Will attach the diff and
> > dmesg.
> 
> Thanks
> 
> I was wondering if DMA could be faster if providing word aligned
> addresses, could you try :
> 
> -#define RX_PKT_OFFSET          (RX_HEADER_LEN + 2)
> +#define RX_PKT_OFFSET          (RX_HEADER_LEN + NET_IP_ALIGN)
> 
> (On x86, we now have NET_IP_ALIGN = 0 since commit ea812ca1)
> 

I suspect a hardware bug.

You could force copybreak, so that b44 only touch kind of private
memory.

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index a69331e..62a0599 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -75,7 +75,7 @@
 	  (BP)->tx_cons - (BP)->tx_prod - TX_RING_GAP(BP))
 #define NEXT_TX(N)		(((N) + 1) & (B44_TX_RING_SIZE - 1))
 
-#define RX_PKT_OFFSET		(RX_HEADER_LEN + 2)
+#define RX_PKT_OFFSET		(RX_HEADER_LEN + NET_IP_ALIGN)
 #define RX_PKT_BUF_SZ		(1536 + RX_PKT_OFFSET)
 
 /* minimum number of free TX descriptors required to wake up TX process */
@@ -829,6 +829,7 @@ static int b44_rx(struct b44 *bp, int budget)
 	}
 
 	bp->rx_cons = cons;
+	wmb();
 	bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));
 
 	return received;
@@ -848,6 +849,7 @@ static int b44_poll(struct napi_struct *napi, int budget)
 		/* spin_unlock(&bp->tx_lock); */
 	}
 	if (bp->istat & ISTAT_RFO) {	/* fast recovery, in ~20msec */
+		pr_err("b44: ISTAT_RFO !\n");
 		bp->istat &= ~ISTAT_RFO;
 		b44_disable_ints(bp);
 		ssb_device_enable(bp->sdev, 0); /* resets ISTAT_RFO */
@@ -2155,7 +2157,7 @@ static int __devinit b44_init_one(struct ssb_device *sdev,
 	bp = netdev_priv(dev);
 	bp->sdev = sdev;
 	bp->dev = dev;
-	bp->force_copybreak = 0;
+	bp->force_copybreak = 1;
 
 	bp->msg_enable = netif_msg_init(b44_debug, B44_DEF_MSG_ENABLE);
Comment 26 Anonymous Emailer 2011-07-05 16:27:45 UTC
Reply-To: m@bues.ch

On Tue, 05 Jul 2011 18:12:32 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Hmm... We are in a NAPI handler... There wont be a new interrupt.
> 
> Plus, we do at start of b44_rx() :
> 
> prod  = br32(bp, B44_DMARX_STAT) & DMARX_STAT_CDMASK;
> 
> So all descriptors before prod are guaranteed to be ready for host
> consume... Fact that a dma access is running on 'next descriptor' should
> be irrelevant.
> 
> IMHO Peeking B44_DMARX_STAT for each packet would be a waste of time.

Yeah I think so, too. We don't need to wait for the _whole_ DMA engine
to go idle, before we can access a buffer in the ring. We just need to make sure
that the device is finished with that buffer. And we do this by reading the current
descriptor pointer.
Comment 27 Neil Horman 2011-07-05 16:41:57 UTC
On Tue, Jul 05, 2011 at 07:59:33AM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 07:33 +0200, Eric Dumazet a écrit :
> > Le mardi 05 juillet 2011 à 09:18 +0400, Alexey Zaytsev a écrit :
> > 
> > > Actually, I've added a trace to show b44_init_rings and b44_free_rings
> > > calls, and they are only called once, right after the driver is
> > > loaded. So it can't be related to START_RFO. Will attach the diff and
> > > dmesg.
> > 
> > Thanks
> > 
> > I was wondering if DMA could be faster if providing word aligned
> > addresses, could you try :
> > 
> > -#define RX_PKT_OFFSET          (RX_HEADER_LEN + 2)
> > +#define RX_PKT_OFFSET          (RX_HEADER_LEN + NET_IP_ALIGN)
> > 
> > (On x86, we now have NET_IP_ALIGN = 0 since commit ea812ca1)
> > 
> 
> I suspect a hardware bug.
> 
I'm not sure if this helps, but I've been reading over this bug, and it seems
that the rx path never checks the status of a buffers rx header prior to
unmapping it or otherwise modifying it in hardware.  If we were to start munging
pointers in the rx channel while a dma was active in it still, it sems like the
observed corruption might be the result.  The docs aren't super clear on this,
but I think a descriptor needs to be in the idle wait or stopped state prior to
being acessed.  This patch might help out there (although I don't have hardware
to test)
Neil

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 3d247f3..48540ad 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -769,7 +769,19 @@ static int b44_rx(struct b44 *bp, int budget)
 		dma_addr_t map = rp->mapping;
 		struct rx_header *rh;
 		u16 len;
-
+		u32 state = br32(bp, B44_DMARX_STAT) & DMARX_STAT_SMASK;
+		state >>= 12;
+
+		/*
+ 		 * I _think_ descriptors need to be in the idle or stopped state
+ 		 * before its safe to access them.  If the current buffer
+ 		 * pointed to by the dma channel is in state 1 or lower (active
+ 		 * or disabled), then we should just stop receving until the
+ 		 * next interrupt kicks us again (I think)
+ 		 */
+		if (state < 2)
+			return;
+ 
 		dma_sync_single_for_cpu(bp->sdev->dev, map,
 					    RX_PKT_BUF_SZ,
 					    DMA_FROM_DEVICE);
Comment 28 Neil Horman 2011-07-05 16:42:23 UTC
On Tue, Jul 05, 2011 at 06:12:32PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 12:05 -0400, Neil Horman a écrit :
> > On Tue, Jul 05, 2011 at 07:59:33AM +0200, Eric Dumazet wrote:
> > > Le mardi 05 juillet 2011 à 07:33 +0200, Eric Dumazet a écrit :
> > > > Le mardi 05 juillet 2011 à 09:18 +0400, Alexey Zaytsev a écrit :
> > > > 
> > > > > Actually, I've added a trace to show b44_init_rings and b44_free_rings
> > > > > calls, and they are only called once, right after the driver is
> > > > > loaded. So it can't be related to START_RFO. Will attach the diff and
> > > > > dmesg.
> > > > 
> > > > Thanks
> > > > 
> > > > I was wondering if DMA could be faster if providing word aligned
> > > > addresses, could you try :
> > > > 
> > > > -#define RX_PKT_OFFSET          (RX_HEADER_LEN + 2)
> > > > +#define RX_PKT_OFFSET          (RX_HEADER_LEN + NET_IP_ALIGN)
> > > > 
> > > > (On x86, we now have NET_IP_ALIGN = 0 since commit ea812ca1)
> > > > 
> > > 
> > > I suspect a hardware bug.
> > > 
> > I'm not sure if this helps, but I've been reading over this bug, and it seems
> > that the rx path never checks the status of a buffers rx header prior to
> > unmapping it or otherwise modifying it in hardware.  If we were to start munging
> > pointers in the rx channel while a dma was active in it still, it sems like the
> > observed corruption might be the result.  The docs aren't super clear on this,
> > but I think a descriptor needs to be in the idle wait or stopped state prior to
> > being acessed.  This patch might help out there (although I don't have hardware
> > to test)
> > Neil
> > 
> > diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> > index 3d247f3..48540ad 100644
> > --- a/drivers/net/b44.c
> > +++ b/drivers/net/b44.c
> > @@ -769,7 +769,19 @@ static int b44_rx(struct b44 *bp, int budget)
> >  		dma_addr_t map = rp->mapping;
> >  		struct rx_header *rh;
> >  		u16 len;
> > -
> > +		u32 state = br32(bp, B44_DMARX_STAT) & DMARX_STAT_SMASK;
> > +		state >>= 12;
> > +
> > +		/*
> > + 		 * I _think_ descriptors need to be in the idle or stopped state
> > + 		 * before its safe to access them.  If the current buffer
> > + 		 * pointed to by the dma channel is in state 1 or lower (active
> > + 		 * or disabled), then we should just stop receving until the
> > + 		 * next interrupt kicks us again (I think)
> > + 		 */
> > +		if (state < 2)
> > +			return;
> > + 
> >  		dma_sync_single_for_cpu(bp->sdev->dev, map,
> >  					    RX_PKT_BUF_SZ,
> >  					    DMA_FROM_DEVICE);
> 
> Hmm... We are in a NAPI handler... There wont be a new interrupt.
> 
Not until we're done with the napi handler, no.  But if we encounter a dma
descriptor that isn't idle, then we know that either we're clearing out the ring
(ie. for a shutdown), or we'll get another interrupt when the descriptor we
failed on completes.

> Plus, we do at start of b44_rx() :
> 
> prod  = br32(bp, B44_DMARX_STAT) & DMARX_STAT_CDMASK;
> 
Yes, that just tells us which is the current dma index.  After that we loop
through subsequent dma descriptor incrementing the index each time.

> So all descriptors before prod are guaranteed to be ready for host
> consume... Fact that a dma access is running on 'next descriptor' should
> be irrelevant.
> 
But we handle more than one descriptor per b44_rx call - theres a while loop in
there where we do advance to the next descriptor.

> IMHO Peeking B44_DMARX_STAT for each packet would be a waste of time.
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Comment 29 Neil Horman 2011-07-05 19:54:32 UTC
On Tue, Jul 05, 2011 at 08:45:16PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 20:32 +0200, Eric Dumazet a écrit :
> 
> > Then, maybe the driver model is completely wrong, and should really
> > setup 512 buffers, or use less descs but set EOT on last one.
> > 
> > Currently it uses a 200 sliding window out of the 512 descs.
> > 
> > 
> 
> One thing we could do would be to allocate a special guard buffer and
> set all 'out of window' descriptors to point to this guard buffer, and
> periodically check if buffer is dirtied by the card.
> 
> (first word would be enough)
> 
> (instead of setting desc->addr to NULL, set to
> dma_map_single(guard_buffer))
> 
I think this is a goo idea, at least for testing.  It seems odd to me that we
have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
descriptor to be processed next (the documentation isnt' very verbose on the
subject), along with the EOT bit on a descriptor.  It seems like both the
register and the bit are capable of conveying the same (or at least overlapping)
information.

I think what I'm having the most trouble with is understanding when the hw looks
at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
register?  Of does it stall until such time as the DMARX_PTR register is rotated
around?  What if it doesn't see the EOT bit set?  Does it just keep going with
the next descriptor?  

Also, there seems to be some inconsistency in the settnig of the B44_DMARX_PTR
register.  In bnx2_init_hw its set to the value of bp->rx_pending, which is
defined as being 200.  But in b44_rx its advanced by sizeof(struct dma_desc) for
every iteration.  So in b44_init_hw we write the value 200 to it, ostensibly
indicating a limit of 200 descriptors, but in b44_rx we iteratively write the
values 0, 8, 16, 24...4*n to the register to indicate which descriptor we're
indexing?  Something really doesn't sit right with me there.  In the former case
we treat the register as holding  number of entries, and in the latter we treat
it as holding a byte offset into an array.  Or am I missing something?

Regards
Neil

> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Comment 30 Eric Dumazet 2011-07-05 20:02:46 UTC
Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> I think this is a goo idea, at least for testing.  It seems odd to me that we
> have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> descriptor to be processed next (the documentation isnt' very verbose on the
> subject), along with the EOT bit on a descriptor.  It seems like both the
> register and the bit are capable of conveying the same (or at least overlapping)
> information.
> 
> I think what I'm having the most trouble with is understanding when the hw looks
> at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> register?  Of does it stall until such time as the DMARX_PTR register is rotated
> around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> the next descriptor?  
> 
> Also, there seems to be some inconsistency in the settnig of the B44_DMARX_PTR
> register.  In bnx2_init_hw its set to the value of bp->rx_pending, which is
> defined as being 200.  But in b44_rx its advanced by sizeof(struct dma_desc) for
> every iteration.  So in b44_init_hw we write the value 200 to it, ostensibly
> indicating a limit of 200 descriptors, but in b44_rx we iteratively write the
> values 0, 8, 16, 24...4*n to the register to indicate which descriptor we're
> indexing?  Something really doesn't sit right with me there.  In the former case
> we treat the register as holding  number of entries, and in the latter we treat
> it as holding a byte offset into an array.  Or am I missing something?
> 

Yes, definitely this needs some clarification.

More over, when we hit the last entry (currently at slot 511), the EOT
instructs hardware to go back to slot 0, while our real window is
511-200 -> 511 . Slot 0 contains garbage (well, an old value)

Its late here so I dont plan to send a patch before 8/10 hours.
Comment 31 Eric Dumazet 2011-07-05 20:15:58 UTC
Le mardi 05 juillet 2011 à 22:02 +0200, Eric Dumazet a écrit :
> Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> > I think this is a goo idea, at least for testing.  It seems odd to me that we
> > have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> > descriptor to be processed next (the documentation isnt' very verbose on the
> > subject), along with the EOT bit on a descriptor.  It seems like both the
> > register and the bit are capable of conveying the same (or at least overlapping)
> > information.
> > 
> > I think what I'm having the most trouble with is understanding when the hw looks
> > at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> > set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> > register?  Of does it stall until such time as the DMARX_PTR register is rotated
> > around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> > the next descriptor?  

Since there is no OWN bit (at least not on the online doc I got : it
says the rx_ring is read only by the NIC), I would say we really need to
advance DMARX_PTR to signal NIC a new entry is available for following
incoming frames.

This is the reason rx_pending max value is B44_RX_RING_SIZE - 1, or else
chip could loop on a circular rx_ring.
Comment 32 Neil Horman 2011-07-05 22:07:16 UTC
On Tue, Jul 05, 2011 at 10:15:40PM +0200, Eric Dumazet wrote:
> Le mardi 05 juillet 2011 à 22:02 +0200, Eric Dumazet a écrit :
> > Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> > > I think this is a goo idea, at least for testing.  It seems odd to me that we
> > > have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> > > descriptor to be processed next (the documentation isnt' very verbose on the
> > > subject), along with the EOT bit on a descriptor.  It seems like both the
> > > register and the bit are capable of conveying the same (or at least overlapping)
> > > information.
> > > 
> > > I think what I'm having the most trouble with is understanding when the hw looks
> > > at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> > > set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> > > register?  Of does it stall until such time as the DMARX_PTR register is rotated
> > > around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> > > the next descriptor?  
> 
> Since there is no OWN bit (at least not on the online doc I got : it
> says the rx_ring is read only by the NIC), I would say we really need to
> advance DMARX_PTR to signal NIC a new entry is available for following
> incoming frames.
> 
> This is the reason rx_pending max value is B44_RX_RING_SIZE - 1, or else
> chip could loop on a circular rx_ring.
> 
Agree, although that still leaves open the question of what exactly should get
written into the DMARX_PTR.  Is it an index of the descriptor number, or a byte
offset.

Regardless, I think we ned to fix up the looping so as to prevent an EOT reset
jumping outside of our valid ring window.  Alexey, theres better ways to do
this, but if in the interim, you could please try this patch, it makes the valid
receive window for b44 cover the entire ring, so as to avoid this problem.  It
will at least help support or refute this theory.  Note its not exactly the same
as my previous patch.  If you set the default ring pending to 512, the math in
the b44_alloc_rx_skb path is wrong, we skip the EOT bit as well as the first
entry in the ring.  At 511 it should work out properly.

Thanks
Neil


diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 3d247f3..b7f5ed1 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -57,7 +57,7 @@
 #define B44_MAX_MTU			1500
 
 #define B44_RX_RING_SIZE		512
-#define B44_DEF_RX_RING_PENDING		200
+#define B44_DEF_RX_RING_PENDING		511
 #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
 				 B44_RX_RING_SIZE)
 #define B44_TX_RING_SIZE		512
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Comment 33 Anonymous Emailer 2011-07-06 15:33:16 UTC
Reply-To: m@bues.ch

On Tue, 5 Jul 2011 18:06:44 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> On Tue, Jul 05, 2011 at 10:15:40PM +0200, Eric Dumazet wrote:
> > Le mardi 05 juillet 2011 à 22:02 +0200, Eric Dumazet a écrit :
> > > Le mardi 05 juillet 2011 à 15:53 -0400, Neil Horman a écrit :
> > > > I think this is a goo idea, at least for testing.  It seems odd to me that we
> > > > have the B44_DMARX_PTR value which indicates (ostensibly) the pointer to the
> > > > descriptor to be processed next (the documentation isnt' very verbose on the
> > > > subject), along with the EOT bit on a descriptor.  It seems like both the
> > > > register and the bit are capable of conveying the same (or at least overlapping)
> > > > information.
> > > > 
> > > > I think what I'm having the most trouble with is understanding when the hw looks
> > > > at the EOT bit in the descriptor.  If it completes a DMA and sees the EOT bit
> > > > set, does the next DMA occur to the descriptor pointed to by the DMARX_ADDR
> > > > register?  Of does it stall until such time as the DMARX_PTR register is rotated
> > > > around?  What if it doesn't see the EOT bit set?  Does it just keep going with
> > > > the next descriptor?  
> > 
> > Since there is no OWN bit (at least not on the online doc I got : it
> > says the rx_ring is read only by the NIC), I would say we really need to
> > advance DMARX_PTR to signal NIC a new entry is available for following
> > incoming frames.
> > 
> > This is the reason rx_pending max value is B44_RX_RING_SIZE - 1, or else
> > chip could loop on a circular rx_ring.
> > 
> Agree, although that still leaves open the question of what exactly should get
> written into the DMARX_PTR.  Is it an index of the descriptor number, or a byte
> offset.
> 
> Regardless, I think we ned to fix up the looping so as to prevent an EOT reset
> jumping outside of our valid ring window.  Alexey, theres better ways to do
> this, but if in the interim, you could please try this patch, it makes the valid
> receive window for b44 cover the entire ring, so as to avoid this problem.  It
> will at least help support or refute this theory.  Note its not exactly the same
> as my previous patch.  If you set the default ring pending to 512, the math in
> the b44_alloc_rx_skb path is wrong, we skip the EOT bit as well as the first
> entry in the ring.  At 511 it should work out properly.
> 
> Thanks
> Neil
> 
> 
> diff --git a/drivers/net/b44.c b/drivers/net/b44.c
> index 3d247f3..b7f5ed1 100644
> --- a/drivers/net/b44.c
> +++ b/drivers/net/b44.c
> @@ -57,7 +57,7 @@
>  #define B44_MAX_MTU			1500
>  
>  #define B44_RX_RING_SIZE		512
> -#define B44_DEF_RX_RING_PENDING		200
> +#define B44_DEF_RX_RING_PENDING		511
>  #define B44_RX_RING_BYTES	(sizeof(struct dma_desc) * \
>  				 B44_RX_RING_SIZE)
>  #define B44_TX_RING_SIZE		512

You guys are mixing up quite a bit of stuff here...

The EOT bit has _nothing_ to do with the descriptor pointers.
It simply marks the last descriptor in the (linear) descriptor
page, so that it becomes an actual ring:

   DDDDDDDDDDDDDDDDDDDDDDDDDDDE
   |                          O
   |                          T
   ^--------------------------|

It doesn't say anything about the read and write pointers
to the ring.

The B44_DMARX_PTR is the write-end pointer. It points one entry
beyond the end of the write area. Then there's the software pointer
where we keep track of the read position.

   -rx_cons     DMARX_PTR-
   v                     v
   DDDDDDDDDDDDDDDDDDDDDDDDDDE
   ^                    ^    O
   |                    |    T
   Device might write from
   here to here.

On an RX interrupt (or poll), we read the _actual_ device write
pointer. (B44_DMARX_STAT & DMARX_STAT_CDMASK). If that is equal
to our stored rx_cons, the device didn't write anything.
So we read buffers until we hit the _actual_ device write pointer.
So rx_cons is equal to (B44_DMARX_STAT & DMARX_STAT_CDMASK), except
that it lags behind by one IRQ/poll.
If we read are done, we set the DMARX_PTR write pointer to one desc
beyond the buffer that we just ate. So the device is free to continue
writing the ring _up to_ the position we left.

I don't know why b44 sets the DMARX_PTR to 200 initially (which is 40
descriptors, as this is a byte pointer). This seems kind of arbitrary.
In b43 we set it to (NR_OF_DESCRIPTORS - 1) * sizeof(descriptor).
But I don't think it really matters. It only limits the usable DMA
area before the first interrupt (or poll) occurs. After the final
B44_DMARX_PTR write in b44_rx(), the full descriptor range (well, minus one)
will be usable.

Summary: I don't see where the DMA engine code is broken (except for
the minor missing wmb(), which doesn't trigger this memory corruption, though)

I hope that helps to clear up stuff...
Comment 34 Eric Dumazet 2011-07-06 16:01:27 UTC
Le mercredi 06 juillet 2011 à 17:32 +0200, Michael Büsch a écrit :

> You guys are mixing up quite a bit of stuff here...
> 
> The EOT bit has _nothing_ to do with the descriptor pointers.
> It simply marks the last descriptor in the (linear) descriptor
> page, so that it becomes an actual ring:
> 
>    DDDDDDDDDDDDDDDDDDDDDDDDDDDE
>    |                          O
>    |                          T
>    ^--------------------------|
> 
> It doesn't say anything about the read and write pointers
> to the ring.
> 
> The B44_DMARX_PTR is the write-end pointer. It points one entry
> beyond the end of the write area. Then there's the software pointer
> where we keep track of the read position.
> 
>    -rx_cons     DMARX_PTR-
>    v                     v
>    DDDDDDDDDDDDDDDDDDDDDDDDDDE
>    ^                    ^    O
>    |                    |    T
>    Device might write from
>    here to here.
> 
> On an RX interrupt (or poll), we read the _actual_ device write
> pointer. (B44_DMARX_STAT & DMARX_STAT_CDMASK). If that is equal
> to our stored rx_cons, the device didn't write anything.
> So we read buffers until we hit the _actual_ device write pointer.
> So rx_cons is equal to (B44_DMARX_STAT & DMARX_STAT_CDMASK), except
> that it lags behind by one IRQ/poll.
> If we read are done, we set the DMARX_PTR write pointer to one desc
> beyond the buffer that we just ate. So the device is free to continue
> writing the ring _up to_ the position we left.

Not exactly :

If we read one skb at descriptor 0, we prepare a new buffer on slot 200,
and advance DMARX_PTR to 201*sizeof(descriptor).

> 
> I don't know why b44 sets the DMARX_PTR to 200 initially (which is 40
> descriptors, as this is a byte pointer). This seems kind of arbitrary.
> In b43 we set it to (NR_OF_DESCRIPTORS - 1) * sizeof(descriptor).
> But I don't think it really matters. It only limits the usable DMA
> area before the first interrupt (or poll) occurs. After the final
> B44_DMARX_PTR write in b44_rx(), the full descriptor range (well, minus one)
> will be usable.

Yes, this is probably a small bug, we should fix it for correctness.

> 
> Summary: I don't see where the DMA engine code is broken (except for
> the minor missing wmb(), which doesn't trigger this memory corruption, though)
> 
> I hope that helps to clear up stuff...

Well, you describe (nicely, thanks !) your understanding of how work the
driver and chip.

Problem is we suspect a wrong statement or wrong hardware ;)

Another problem is Alexey doesnt answer anymore, and I dont have this
(old) hardware...

Other point : Do you know why b44_get_ringparam() doesnt  set
ering->tx_max_pending and ering->tx_pending

The comment seems wrong :
/* XXX ethtool lacks a tx_max_pending, oops... */
Comment 35 Anonymous Emailer 2011-07-06 16:13:27 UTC
Reply-To: m@bues.ch

On Wed, 06 Jul 2011 18:00:19 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Not exactly :
> 
> If we read one skb at descriptor 0, we prepare a new buffer on slot 200,
> and advance DMARX_PTR to 201*sizeof(descriptor).

I don't think so. Why do you think this is the case?
We allocate a new descriptor buffer for the consumed buffer at exactly
the same place (which is "cons").
(Alternatively, we leave the buffer in place, and just copy the data to a new buffer).
And DMARX_PTR is updated to the last "cons", which is one beyond
the last buffer that we consumed (and pushed up the net stack).

(Note that "beyond" always means "beyond" in the sense of a DMA _ring_,
which honors the EOT bit. This is done by masking cons with RX_RING_SIZE-1,
which is thus assumed to be a power of two).

> > I don't know why b44 sets the DMARX_PTR to 200 initially (which is 40
> > descriptors, as this is a byte pointer). This seems kind of arbitrary.
> > In b43 we set it to (NR_OF_DESCRIPTORS - 1) * sizeof(descriptor).
> > But I don't think it really matters. It only limits the usable DMA
> > area before the first interrupt (or poll) occurs. After the final
> > B44_DMARX_PTR write in b44_rx(), the full descriptor range (well, minus one)
> > will be usable.
> 
> Yes, this is probably a small bug, we should fix it for correctness.

I wouldn't call this a bug.

> > Summary: I don't see where the DMA engine code is broken (except for
> > the minor missing wmb(), which doesn't trigger this memory corruption, though)
> > 
> > I hope that helps to clear up stuff...
> 
> Well, you describe (nicely, thanks !) your understanding of how work the
> driver and chip.
> 
> Problem is we suspect a wrong statement or wrong hardware ;)
>
> Another problem is Alexey doesnt answer anymore, and I dont have this
> (old) hardware...

I do really think that either
1) His device is broken. Chips break. I already saw several devices
with HND DMA engine that broke down after some time of use.
or
2) The bug is at another place, but not in the lowlevel DMA handling.

Could this possibly be some kind of context issue? threaded IRQs?
The net subsys was rather picky about context, last time I looked
into it. see the .._ni() style functions.

> Other point : Do you know why b44_get_ringparam() doesnt  set
> ering->tx_max_pending and ering->tx_pending
> 
> The comment seems wrong :
> /* XXX ethtool lacks a tx_max_pending, oops... */

Well, I _guess_ that ering->tx_max_pending was added later? (I didn't
even check if it's there _now_, though.)
Comment 36 Eric Dumazet 2011-07-06 16:35:50 UTC
Le mercredi 06 juillet 2011 à 18:12 +0200, Michael Büsch a écrit :
> On Wed, 06 Jul 2011 18:00:19 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Not exactly :
> > 
> > If we read one skb at descriptor 0, we prepare a new buffer on slot 200,
> > and advance DMARX_PTR to 201*sizeof(descriptor).
> 
> I don't think so. Why do you think this is the case?
> We allocate a new descriptor buffer for the consumed buffer at exactly
> the same place (which is "cons").
> (Alternatively, we leave the buffer in place, and just copy the data to a new buffer).
> And DMARX_PTR is updated to the last "cons", which is one beyond
> the last buffer that we consumed (and pushed up the net stack).

Not at all.

If it was true, b44_recycle_rx() would not even exist as is.

When we allocate a new buffer, we put it at rx_prod index, which is the
slot _after_ window, not the first slot.

First time we dequeue a packet from NIC, rx_prod is something like 200


Oh, it seems we do the following in b44_init_hw()

bp->rx_prod = bp->rx_pending;

But this seems completely wrong, if b44_init_rings() was not able to
allocate rx_pending buffers (b44_alloc_rx_skb() can return NULL)
Comment 37 Eric Dumazet 2011-07-06 16:56:47 UTC
Le mercredi 06 juillet 2011 à 17:32 +0200, Michael Büsch a écrit :

> You guys are mixing up quite a bit of stuff here...

Well

> 
> The EOT bit has _nothing_ to do with the descriptor pointers.
> It simply marks the last descriptor in the (linear) descriptor
> page, so that it becomes an actual ring:
> 
>    DDDDDDDDDDDDDDDDDDDDDDDDDDDE
>    |                          O
>    |                          T
>    ^--------------------------|
> 
> It doesn't say anything about the read and write pointers
> to the ring.
> 
> The B44_DMARX_PTR is the write-end pointer. It points one entry
> beyond the end of the write area. Then there's the software pointer
> where we keep track of the read position.
> 


Thats not how b44_rx() works :

It writes on DMARX_PTR the last slot that driver _dequeued_ in its NAPI
run. Its not the end of the window that device is allowed to use.

bw32(bp, B44_DMARX_PTR, cons * sizeof(struct dma_desc));

The end of the 'allocated buffers' is in rx_prod. Problem is NIC have no
idea of where is the end of window. We never give rx_prod to NIC.

So NIC actually read old descriptors value. We need to clear them to
avoid memory corruption.


diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 6c4ef96..ec9773b 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -725,6 +725,7 @@ static void b44_recycle_rx(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 			                 DMA_BIDIRECTIONAL);
 
 	ctrl = src_desc->ctrl;
+	src_desc->ctrl = ctrl & cpu_to_le32(DESC_CTRL_EOT);
 	if (dest_idx == (B44_RX_RING_SIZE - 1))
 		ctrl |= cpu_to_le32(DESC_CTRL_EOT);
 	else
@@ -732,6 +733,7 @@ static void b44_recycle_rx(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 
 	dest_desc->ctrl = ctrl;
 	dest_desc->addr = src_desc->addr;
+	src_desc->addr = 0;
 
 	src_map->skb = NULL;
 
@@ -1118,6 +1120,7 @@ static void b44_init_rings(struct b44 *bp)
 		if (b44_alloc_rx_skb(bp, -1, i) < 0)
 			break;
 	}
+	bp->rx_prod = i;
 }
 
 /*
@@ -1406,7 +1409,6 @@ static void b44_init_hw(struct b44 *bp, int reset_kind)
 		bw32(bp, B44_DMARX_ADDR, bp->rx_ring_dma + bp->dma_offset);
 
 		bw32(bp, B44_DMARX_PTR, bp->rx_pending);
-		bp->rx_prod = bp->rx_pending;
 
 		bw32(bp, B44_MIB_CTRL, MIB_CTRL_CLR_ON_READ);
 	}
Comment 38 Alexey Zaytsev 2011-07-07 06:32:58 UTC
Sorry, been busy for the last couple days. Any patches I should test?
Comment 39 Eric Dumazet 2011-07-07 06:48:27 UTC
Le jeudi 07 juillet 2011 à 10:32 +0400, Alexey Zaytsev a écrit :
> Sorry, been busy for the last couple days. Any patches I should test?

Please try :

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 6c4ef96..860b236 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -688,8 +688,8 @@ static int b44_alloc_rx_skb(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 		ctrl |= DESC_CTRL_EOT;
 
 	dp = &bp->rx_ring[dest_idx];
-	dp->ctrl = cpu_to_le32(ctrl);
 	dp->addr = cpu_to_le32((u32) mapping + bp->dma_offset);
+	dp->ctrl = cpu_to_le32(ctrl);
 
 	if (bp->flags & B44_FLAG_RX_RING_HACK)
 		b44_sync_dma_desc_for_device(bp->sdev, bp->rx_ring_dma,
@@ -725,13 +725,15 @@ static void b44_recycle_rx(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 			                 DMA_BIDIRECTIONAL);
 
 	ctrl = src_desc->ctrl;
+	src_desc->ctrl = (ctrl & cpu_to_le32(DESC_CTRL_EOT));
 	if (dest_idx == (B44_RX_RING_SIZE - 1))
 		ctrl |= cpu_to_le32(DESC_CTRL_EOT);
 	else
 		ctrl &= cpu_to_le32(~DESC_CTRL_EOT);
 
-	dest_desc->ctrl = ctrl;
 	dest_desc->addr = src_desc->addr;
+	dest_desc->ctrl = ctrl;
+	src_desc->addr = 0;
 
 	src_map->skb = NULL;
 
@@ -1118,6 +1120,7 @@ static void b44_init_rings(struct b44 *bp)
 		if (b44_alloc_rx_skb(bp, -1, i) < 0)
 			break;
 	}
+	bp->rx_prod = i;
 }
 
 /*
@@ -1405,8 +1408,7 @@ static void b44_init_hw(struct b44 *bp, int reset_kind)
 				      (RX_PKT_OFFSET << DMARX_CTRL_ROSHIFT)));
 		bw32(bp, B44_DMARX_ADDR, bp->rx_ring_dma + bp->dma_offset);
 
-		bw32(bp, B44_DMARX_PTR, bp->rx_pending);
-		bp->rx_prod = bp->rx_pending;
+		bw32(bp, B44_DMARX_PTR, 0);
 
 		bw32(bp, B44_MIB_CTRL, MIB_CTRL_CLR_ON_READ);
 	}
Comment 40 Alexey Zaytsev 2011-07-07 07:46:06 UTC
On Thu, Jul 7, 2011 at 10:48, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 07 juillet 2011 à 10:32 +0400, Alexey Zaytsev a écrit :
>> Sorry, been busy for the last couple days. Any patches I should test?
>
> Please try :
>

Thanks. Seems to fail to initialize, getting this in dmesg:

[  103.421577] b44 0000:02:0e.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  103.440139] ssb: Core 0 found: Fast Ethernet (cc 0x806, rev 0x07,
vendor 0x4243)
[  103.440159] ssb: Core 1 found: V90 (cc 0x807, rev 0x03, vendor 0x4243)
[  103.440177] ssb: Core 2 found: PCI (cc 0x804, rev 0x0A, vendor 0x4243)
[  103.481128] ssb: Sonics Silicon Backplane found on PCI device 0000:02:0e.0
[  103.481532] b44: b44.c:v2.0
[  103.502185] b44 ssb1:0: eth0: Broadcom 44xx/47xx 10/100BaseT
Ethernet 00:17:a4:dd:4e:93
[  109.405071] b44 ssb1:0: eth0: powering down PHY
[  112.816456] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
[  112.816470] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
[  112.952073] b44 ssb1:0: eth0: powering down PHY
[  113.816148] b44 ssb1:0: eth0: Link is down
[  114.953717] b44 ssb1:0: eth0: powering down PHY
[  117.816246] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
[  117.816260] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
[  117.963238] b44 ssb1:0: eth0: powering down PHY
[  118.816128] b44 ssb1:0: eth0: Link is down
[  119.962817] b44 ssb1:0: eth0: powering down PHY
Comment 41 Alexey Zaytsev 2011-07-07 09:44:02 UTC
On Thu, Jul 7, 2011 at 13:37, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 13:34, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
>> On Thu, Jul 7, 2011 at 13:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Le jeudi 07 juillet 2011 à 11:45 +0400, Alexey Zaytsev a écrit :
>>>> On Thu, Jul 7, 2011 at 10:48, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> > Le jeudi 07 juillet 2011 à 10:32 +0400, Alexey Zaytsev a écrit :
>>>> >> Sorry, been busy for the last couple days. Any patches I should test?
>>>> >
>>>> > Please try :
>>>> >
>>>>
>>>> Thanks. Seems to fail to initialize, getting this in dmesg:
>>>>
>>>> [  103.421577] b44 0000:02:0e.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>>>> [  103.440139] ssb: Core 0 found: Fast Ethernet (cc 0x806, rev 0x07,
>>>> vendor 0x4243)
>>>> [  103.440159] ssb: Core 1 found: V90 (cc 0x807, rev 0x03, vendor 0x4243)
>>>> [  103.440177] ssb: Core 2 found: PCI (cc 0x804, rev 0x0A, vendor 0x4243)
>>>> [  103.481128] ssb: Sonics Silicon Backplane found on PCI device 0000:02:0e.0
>>>> [  103.481532] b44: b44.c:v2.0
>>>> [  103.502185] b44 ssb1:0: eth0: Broadcom 44xx/47xx 10/100BaseT
>>>> Ethernet 00:17:a4:dd:4e:93
>>>> [  109.405071] b44 ssb1:0: eth0: powering down PHY
>>>> [  112.816456] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
>>>> [  112.816470] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
>>>> [  112.952073] b44 ssb1:0: eth0: powering down PHY
>>>> [  113.816148] b44 ssb1:0: eth0: Link is down
>>>> [  114.953717] b44 ssb1:0: eth0: powering down PHY
>>>> [  117.816246] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
>>>> [  117.816260] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
>>>> [  117.963238] b44 ssb1:0: eth0: powering down PHY
>>>> [  118.816128] b44 ssb1:0: eth0: Link is down
>>>> [  119.962817] b44 ssb1:0: eth0: powering down PHY
>>>
>>> Maybe this is the b44_init_hw() change :
>>>
>>> bw32(bp, B44_DMARX_PTR, 0);
>>>
>>> So please change this part
>>>
>>> Updated patch :
>>
>> Initializes fine now, but the transfer would stall every few seconds.
>>
> Err, wait a sec, patches the wrong version.
>
Ok, I've wrongly fixed this part in the clean b44.c, and got stalls.
Then fixed the patched one after applying the stash, and still got the
stalls.
Comment 42 Eric Dumazet 2011-07-07 09:52:36 UTC
Le jeudi 07 juillet 2011 à 13:43 +0400, Alexey Zaytsev a écrit :

> Ok, I've wrongly fixed this part in the clean b44.c, and got stalls.
> Then fixed the patched one after applying the stash, and still got the
> stalls.

So, filling NULL to 'should not be used slots' definitely removes the
memory corruption, but also stalls the NIC.

Driver model is completely wrong. Without full hardware specs, this will
be hard to guess appropriate fixes.
Comment 43 Alexey Zaytsev 2011-07-07 10:42:02 UTC
On Thu, Jul 7, 2011 at 13:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le jeudi 07 juillet 2011 à 11:45 +0400, Alexey Zaytsev a écrit :
>> On Thu, Jul 7, 2011 at 10:48, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Le jeudi 07 juillet 2011 à 10:32 +0400, Alexey Zaytsev a écrit :
>> >> Sorry, been busy for the last couple days. Any patches I should test?
>> >
>> > Please try :
>> >
>>
>> Thanks. Seems to fail to initialize, getting this in dmesg:
>>
>> [  103.421577] b44 0000:02:0e.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>> [  103.440139] ssb: Core 0 found: Fast Ethernet (cc 0x806, rev 0x07,
>> vendor 0x4243)
>> [  103.440159] ssb: Core 1 found: V90 (cc 0x807, rev 0x03, vendor 0x4243)
>> [  103.440177] ssb: Core 2 found: PCI (cc 0x804, rev 0x0A, vendor 0x4243)
>> [  103.481128] ssb: Sonics Silicon Backplane found on PCI device 0000:02:0e.0
>> [  103.481532] b44: b44.c:v2.0
>> [  103.502185] b44 ssb1:0: eth0: Broadcom 44xx/47xx 10/100BaseT
>> Ethernet 00:17:a4:dd:4e:93
>> [  109.405071] b44 ssb1:0: eth0: powering down PHY
>> [  112.816456] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
>> [  112.816470] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
>> [  112.952073] b44 ssb1:0: eth0: powering down PHY
>> [  113.816148] b44 ssb1:0: eth0: Link is down
>> [  114.953717] b44 ssb1:0: eth0: powering down PHY
>> [  117.816246] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
>> [  117.816260] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
>> [  117.963238] b44 ssb1:0: eth0: powering down PHY
>> [  118.816128] b44 ssb1:0: eth0: Link is down
>> [  119.962817] b44 ssb1:0: eth0: powering down PHY
>
> Maybe this is the b44_init_hw() change :
>
> bw32(bp, B44_DMARX_PTR, 0);
>
> So please change this part
>
> Updated patch :

Initializes fine now, but the transfer would stall every few seconds.
Comment 44 Alexey Zaytsev 2011-07-07 10:42:10 UTC
On Thu, Jul 7, 2011 at 13:34, Alexey Zaytsev <alexey.zaytsev@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 13:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le jeudi 07 juillet 2011 à 11:45 +0400, Alexey Zaytsev a écrit :
>>> On Thu, Jul 7, 2011 at 10:48, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > Le jeudi 07 juillet 2011 à 10:32 +0400, Alexey Zaytsev a écrit :
>>> >> Sorry, been busy for the last couple days. Any patches I should test?
>>> >
>>> > Please try :
>>> >
>>>
>>> Thanks. Seems to fail to initialize, getting this in dmesg:
>>>
>>> [  103.421577] b44 0000:02:0e.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
>>> [  103.440139] ssb: Core 0 found: Fast Ethernet (cc 0x806, rev 0x07,
>>> vendor 0x4243)
>>> [  103.440159] ssb: Core 1 found: V90 (cc 0x807, rev 0x03, vendor 0x4243)
>>> [  103.440177] ssb: Core 2 found: PCI (cc 0x804, rev 0x0A, vendor 0x4243)
>>> [  103.481128] ssb: Sonics Silicon Backplane found on PCI device 0000:02:0e.0
>>> [  103.481532] b44: b44.c:v2.0
>>> [  103.502185] b44 ssb1:0: eth0: Broadcom 44xx/47xx 10/100BaseT
>>> Ethernet 00:17:a4:dd:4e:93
>>> [  109.405071] b44 ssb1:0: eth0: powering down PHY
>>> [  112.816456] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
>>> [  112.816470] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
>>> [  112.952073] b44 ssb1:0: eth0: powering down PHY
>>> [  113.816148] b44 ssb1:0: eth0: Link is down
>>> [  114.953717] b44 ssb1:0: eth0: powering down PHY
>>> [  117.816246] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
>>> [  117.816260] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
>>> [  117.963238] b44 ssb1:0: eth0: powering down PHY
>>> [  118.816128] b44 ssb1:0: eth0: Link is down
>>> [  119.962817] b44 ssb1:0: eth0: powering down PHY
>>
>> Maybe this is the b44_init_hw() change :
>>
>> bw32(bp, B44_DMARX_PTR, 0);
>>
>> So please change this part
>>
>> Updated patch :
>
> Initializes fine now, but the transfer would stall every few seconds.
>
Err, wait a sec, patches the wrong version.
Comment 45 Eric Dumazet 2011-07-07 10:42:18 UTC
Le jeudi 07 juillet 2011 à 11:45 +0400, Alexey Zaytsev a écrit :
> On Thu, Jul 7, 2011 at 10:48, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le jeudi 07 juillet 2011 à 10:32 +0400, Alexey Zaytsev a écrit :
> >> Sorry, been busy for the last couple days. Any patches I should test?
> >
> > Please try :
> >
> 
> Thanks. Seems to fail to initialize, getting this in dmesg:
> 
> [  103.421577] b44 0000:02:0e.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> [  103.440139] ssb: Core 0 found: Fast Ethernet (cc 0x806, rev 0x07,
> vendor 0x4243)
> [  103.440159] ssb: Core 1 found: V90 (cc 0x807, rev 0x03, vendor 0x4243)
> [  103.440177] ssb: Core 2 found: PCI (cc 0x804, rev 0x0A, vendor 0x4243)
> [  103.481128] ssb: Sonics Silicon Backplane found on PCI device 0000:02:0e.0
> [  103.481532] b44: b44.c:v2.0
> [  103.502185] b44 ssb1:0: eth0: Broadcom 44xx/47xx 10/100BaseT
> Ethernet 00:17:a4:dd:4e:93
> [  109.405071] b44 ssb1:0: eth0: powering down PHY
> [  112.816456] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
> [  112.816470] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
> [  112.952073] b44 ssb1:0: eth0: powering down PHY
> [  113.816148] b44 ssb1:0: eth0: Link is down
> [  114.953717] b44 ssb1:0: eth0: powering down PHY
> [  117.816246] b44 ssb1:0: eth0: Link is up at 100 Mbps, full duplex
> [  117.816260] b44 ssb1:0: eth0: Flow control is off for TX and off for RX
> [  117.963238] b44 ssb1:0: eth0: powering down PHY
> [  118.816128] b44 ssb1:0: eth0: Link is down
> [  119.962817] b44 ssb1:0: eth0: powering down PHY

Maybe this is the b44_init_hw() change :

bw32(bp, B44_DMARX_PTR, 0);

So please change this part 

Updated patch :

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index 6c4ef96..555a8ce 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -688,8 +688,8 @@ static int b44_alloc_rx_skb(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 		ctrl |= DESC_CTRL_EOT;
 
 	dp = &bp->rx_ring[dest_idx];
-	dp->ctrl = cpu_to_le32(ctrl);
 	dp->addr = cpu_to_le32((u32) mapping + bp->dma_offset);
+	dp->ctrl = cpu_to_le32(ctrl);
 
 	if (bp->flags & B44_FLAG_RX_RING_HACK)
 		b44_sync_dma_desc_for_device(bp->sdev, bp->rx_ring_dma,
@@ -725,13 +725,15 @@ static void b44_recycle_rx(struct b44 *bp, int src_idx, u32 dest_idx_unmasked)
 			                 DMA_BIDIRECTIONAL);
 
 	ctrl = src_desc->ctrl;
+	src_desc->ctrl = (ctrl & cpu_to_le32(DESC_CTRL_EOT));
 	if (dest_idx == (B44_RX_RING_SIZE - 1))
 		ctrl |= cpu_to_le32(DESC_CTRL_EOT);
 	else
 		ctrl &= cpu_to_le32(~DESC_CTRL_EOT);
 
-	dest_desc->ctrl = ctrl;
 	dest_desc->addr = src_desc->addr;
+	dest_desc->ctrl = ctrl;
+	src_desc->addr = 0;
 
 	src_map->skb = NULL;
 
@@ -1118,6 +1120,7 @@ static void b44_init_rings(struct b44 *bp)
 		if (b44_alloc_rx_skb(bp, -1, i) < 0)
 			break;
 	}
+	bp->rx_prod = i;
 }
 
 /*
@@ -1406,7 +1409,6 @@ static void b44_init_hw(struct b44 *bp, int reset_kind)
 		bw32(bp, B44_DMARX_ADDR, bp->rx_ring_dma + bp->dma_offset);
 
 		bw32(bp, B44_DMARX_PTR, bp->rx_pending);
-		bp->rx_prod = bp->rx_pending;
 
 		bw32(bp, B44_MIB_CTRL, MIB_CTRL_CLR_ON_READ);
 	}
Comment 46 Rafael J. Wysocki 2011-07-10 12:31:19 UTC
On Sunday, July 10, 2011, Alexey Zaytsev wrote:
> On Sun, Jul 10, 2011 at 14:58, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of regressions introduced between 2.6.38 and 2.6.39.
> >
> > The following bug entry is on the current list of known regressions
> > introduced between 2.6.38 and 2.6.39.  Please verify if it still should
> > be listed and let the tracking team know (either way).
> 
> Hey.
> 
> This is not a regression, it's been this way since at least since
> 2.6.27 (checked), and probably from the beginning.
> 
> >
> >
> > Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=38102
> > Subject         : BUG kmalloc-2048: Poison overwritten
> > Submitter       : Alexey Zaytsev <alexey.zaytsev@gmail.com>
> > Date            : 2011-06-23 17:33 (18 days old)
> >
> >
> >
> 
>
Comment 47 Rafael J. Wysocki 2011-07-10 12:35:06 UTC
Dropping from the list of recent regressions.

Note You need to log in before you can comment on or make changes to this bug.