Bug 16626 - Machine hangs with EIP at skb_copy_and_csum_dev
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 blocking
Assignee: drivers_pci@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 16444
Reported: 2010-08-19 09:57 UTC by Plamen Petrov
Modified: 2010-09-12 17:29 UTC
CC List: 4 users

See Also:
Kernel Version: 2.6.36-rc1-00127-g763008c
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
config-v2.6.33.7-FS (51.10 KB, application/octet-stream)
2010-08-19 10:01 UTC, Plamen Petrov
the output of dmesg with 2.6.33.7 (48.94 KB, text/plain)
2010-08-19 10:03 UTC, Plamen Petrov
config of v2.6.36-rc1-127-g763008c (53.07 KB, application/octet-stream)
2010-08-19 10:08 UTC, Plamen Petrov

Description Plamen Petrov 2010-08-19 09:57:22 UTC
After an upgrade from 2.6.33.7 to 2.6.35.2 the server hung twice, so I
continued on 2.6.33.7.

Today I decided to try the latest Linus tree, with no luck.

The first time I booted 2.6.36-rc1-00127-g763008c it ran for a few
minutes, then went dead with this on the screen:
[picture 1]
http://picpaste.com/9cfb03116d41f27568e1bb2a67b7f4dc.jpg

Then I power-cycled the machine, only to get this:
[picture 2]
http://picpaste.com/6d70f453e462d1aed038781ad4bdb741.jpg

And because [picture 2] is hard to read in the lower half of the screen,
here is
[picture 3]
http://picpaste.com/0a51ae079ace2e4abd9e9d29226069f7.jpg
Comment 1 Plamen Petrov 2010-08-19 09:58:55 UTC
Currently running on 2.6.33.7, whose dmesg and .config will follow as attachments.
Comment 2 Plamen Petrov 2010-08-19 10:01:08 UTC
Created attachment 27505
config-v2.6.33.7-FS

This is the config of the 2.6.33.7 kernel, which runs fine.
Comment 3 Plamen Petrov 2010-08-19 10:03:24 UTC
Created attachment 27507
the output of dmesg with 2.6.33.7

the output of dmesg with v2.6.33.7-FS
Comment 4 Plamen Petrov 2010-08-19 10:08:22 UTC
Created attachment 27508
config of v2.6.36-rc1-127-g763008c

This is the config I used with v2.6.36-rc1-127-g763008c
Comment 5 Andrew Morton 2010-08-19 22:22:43 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 19 Aug 2010 09:57:25 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=16626
> 
>            Summary: Machine hangs with EIP at skb_copy_and_csum_dev
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 2.6.36-rc1-00127-g763008c
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: blocking
>           Priority: P1
>          Component: PCI
>         AssignedTo: drivers_pci@kernel-bugs.osdl.org
>         ReportedBy: pvp-lsts@fs.uni-ruse.bg
>         Regression: Yes

A post-2.6.35 regression.

> 
> After an upgrade from 2.6.33.7 to 2.6.35.2 the server hung twice, so I
> continued on 2.6.33.7.
> 
> Today I decided to try the latest Linus tree, with no luck.
> 
> The first time I booted 2.6.36-rc1-00127-g763008c it ran for a few
> minutes, then went dead with this on the screen:
> [picture 1]
> http://picpaste.com/9cfb03116d41f27568e1bb2a67b7f4dc.jpg
> 
> Then I power-cycled the machine, only to get this:
> [picture 2]
> http://picpaste.com/6d70f453e462d1aed038781ad4bdb741.jpg
> 
> And because [picture 2] is hard to read in the lower half of the screen,
> here is
> [picture 3]
> http://picpaste.com/0a51ae079ace2e4abd9e9d29226069f7.jpg

Might have triggered the BUG_ON() in skb_copy_and_csum_dev().  Might be
a tg3 thing.  Hard to tell.

It'd be really nice to get that first screenful.  Sigh.  How long have
we had this oops-scrolls-off problem??  Perhaps you could set
/proc/sys/kernel/printk_delay to 100 (it's in milliseconds) so that the
oops scrolls past nice and slowly?
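
The printk_delay setting can also be applied from a small program instead of a shell redirect. The following is a minimal sketch, not anything posted in this thread: the helper name set_printk_delay is hypothetical, and the 0 to 10000 range check reflects the limit given in the kernel's sysctl documentation. The path is passed in so that the helper can be tried against an ordinary file first.

```c
#include <stdio.h>

/*
 * Write a delay in milliseconds to a printk_delay-style sysctl file.
 * Hypothetical helper for illustration only.
 * Returns 0 on success, -1 if the value is out of range or the
 * file cannot be written (for example, when not running as root).
 */
int set_printk_delay(const char *path, int delay_ms)
{
    FILE *f;
    int ok;

    /* The sysctl documentation gives 0..10000 as the allowed range. */
    if (delay_ms < 0 || delay_ms > 10000)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%d\n", delay_ms) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling set_printk_delay("/proc/sys/kernel/printk_delay", 100) as root would have the same effect as the suggestion above; writing 0 turns the delay off again.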
Comment 6 Andrew Morton 2010-08-20 05:09:31 UTC
On Fri, 20 Aug 2010 08:03:21 +0300 Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> wrote:

> (responding via emailed reply-to-all)
> 
> So you need the beginning of the oops screen - I will try to get that
> with the proposed printk_delay setting.

Thanks.

> But which kernel should I use? Linus' latest tree or 2.6.35.2? They
> both fail the same way here, as far as I can tell.

Current mainline would be best, because we'd fix the bug there first
then backport the fix into -stable.  But it doesn't matter a lot in
this case - whatever's most convenient for you, I'd say.
Comment 7 Plamen Petrov 2010-08-20 05:46:35 UTC
(responding via emailed reply-to-all)

So you need the beginning of the oops screen - I will try to get that
with the proposed printk_delay setting.
But which kernel should I use? Linus' latest tree or 2.6.35.2? They
both fail the same way here, as far as I can tell.
Comment 8 Plamen Petrov 2010-08-20 06:12:51 UTC
With the "echo 100 > /proc/sys/kernel/printk_delay" command run by
/etc/rc.d/rc.local, while still on 2.6.36-rc1-00127-g763008c, I got
these:

[picture 4]
http://picpaste.com/aa3e373e894179e8ba19587ed63d8104.jpg

[picture 5]
http://picpaste.com/9bc4bdc04f5a84fdaf49d6e1db23ede8.jpg

[picture 6]
http://picpaste.com/da3ccd69a0a1221bb55f48b39c4ad950.jpg

Hope the above helps.

And by the way, I think you are correct that this is a
post-2.6.35 thing, because 2.6.35.2 was the first to give
me this kind of problem, and I can confirm that 2.6.34
does not have it: the system ran 2.6.34.4 for
the last 12 hours without problems, then just a moment ago
crashed on 2.6.36-rc1-00127-g763008c, and is now back on
2.6.34.4.

P.S. Shouldn't "echo 100 > /proc/sys/kernel/printk_delay" be
mentioned somewhere in a "How to debug a crashing kernel"
guide?

Thanks!
Comment 9 Andrew Morton 2010-08-20 06:18:39 UTC
On Fri, 20 Aug 2010 09:12:10 +0300 Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> wrote:

> With the "echo 100 > /proc/sys/kernel/printk_delay" command run by
> /etc/rc.d/rc.local, while still on 2.6.36-rc1-00127-g763008c, I got
> these:
> 
> [picture 4]
> http://picpaste.com/aa3e373e894179e8ba19587ed63d8104.jpg

bewdy, thanks.

        BUG_ON(csstart > skb_headlen(skb));

Hopefully that's enough for the net guys to work with.
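
For readers unfamiliar with the check, here is a userspace sketch of what that condition tests. It is an illustration under assumptions, not kernel code: struct mock_skb and both helper names are hypothetical stand-ins, and only the two fields needed to compute the header length are modelled (in the kernel, skb_headlen() is the total length minus the length held in fragments).

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the sk_buff fields that matter here. */
struct mock_skb {
    unsigned int len;       /* total bytes: linear area plus fragments */
    unsigned int data_len;  /* bytes held in fragments only */
};

/* Bytes in the linear (header) area, mirroring what skb_headlen() reports. */
static unsigned int mock_headlen(const struct mock_skb *skb)
{
    return skb->len - skb->data_len;
}

/*
 * True when the quoted BUG_ON() condition would fire: the checksum
 * start offset lies beyond the end of the linear area.
 */
static bool csstart_would_bug(const struct mock_skb *skb, long csstart)
{
    return csstart > (long)mock_headlen(skb);
}
```

With a 1500-byte packet of which 1400 bytes sit in fragments, the linear area is 100 bytes, so a checksum start of 34 passes the check while a checksum start of 120 would trip it.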

> [picture 5]
> http://picpaste.com/9bc4bdc04f5a84fdaf49d6e1db23ede8.jpg
> 
> [picture 6]
> http://picpaste.com/da3ccd69a0a1221bb55f48b39c4ad950.jpg
> 
> Hope the above helps.
> 
> And by the way, I think you are correct that this is a
> post-2.6.35 thing, because 2.6.35.2 was the first to give
> me this kind of problem, and I can confirm that 2.6.34
> does not have it: the system ran 2.6.34.4 for
> the last 12 hours without problems, then just a moment ago
> crashed on 2.6.36-rc1-00127-g763008c, and is now back on
> 2.6.34.4.

OK.

> P.S. Shouldn't "echo 100 > /proc/sys/kernel/printk_delay" be
> mentioned somewhere in a "How to debug a crashing kernel"
> guide?

I only just thought of it ;)

I'm thinking perhaps that we should have a lines_after_oops boot
parameter which makes the console shut up N lines after someone called
oops_enter().
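
The lines_after_oops idea above is only a proposal in this thread. A rough userspace model of the intended behaviour might look like the sketch below; struct console_gate and both function names are hypothetical, and nothing here is taken from kernel source.

```c
#include <stdbool.h>

/* Hypothetical model of a console that goes quiet N lines after an oops. */
struct console_gate {
    int lines_after_oops;    /* N; a negative value means "no limit" */
    bool oops_seen;          /* set once the first oops has started */
    int printed_since_oops;  /* lines let through since that oops */
};

/* Model of the point where oops_enter() would be called. */
static void gate_oops_enter(struct console_gate *g)
{
    if (!g->oops_seen) {
        g->oops_seen = true;
        g->printed_since_oops = 0;
    }
}

/* Returns true if the next console line should still be printed. */
static bool gate_allow_line(struct console_gate *g)
{
    if (!g->oops_seen || g->lines_after_oops < 0)
        return true;
    if (g->printed_since_oops >= g->lines_after_oops)
        return false;
    g->printed_since_oops++;
    return true;
}
```

With N chosen to match the screen height, the first screenful of the oops would stay visible instead of scrolling off, which is the problem that printk_delay only works around.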
Comment 10 Plamen Petrov 2010-08-20 06:27:39 UTC
Posting this only in the hope that it will be helpful to the
"net guys"...

root@fs:~# uname -a; lspci -vvv; ethtool -i eth1
> Linux fs 2.6.34.4-FS #1 SMP Thu Aug 19 18:09:58 UTC 2010 i686 Intel(R)
> Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux
> 00:00.0 Host bridge: Intel Corporation E7230/3000/3010 Memory Controller Hub
>         Subsystem: Dell Unknown device 01df
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort+ >SERR- <PERR-
>         Latency: 0
>         Capabilities: [e0] Vendor Specific Information
>
> 00:01.0 PCI bridge: Intel Corporation E7230/3000/3010 PCI Express Root Port
> (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 0000f000-00000fff
>         Memory behind bridge: efe00000-efefffff
>         Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>         Capabilities: [88] Subsystem: Intel Corporation Unknown device 0000
>         Capabilities: [80] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>                 PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [90] Message Signalled Interrupts: Mask- 64bit-
>         Queue=0/0 Enable+
>                 Address: fee0300c  Data: 4149
>         Capabilities: [a0] Express Root Port (Slot+) IRQ 0
>                 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
>                 Device: Latency L0s <64ns, L1 <1us
>                 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                 Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 2
>                 Link: Latency L0s <256ns, L1 <4us
>                 Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
>                 Link: Speed 2.5Gb/s, Width x0
>                 Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug- Surpise-
>                 Slot: Number 1, PowerLimit 25.000000
>                 Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
>                 Slot: AttnInd Off, PwrInd On, Power-
>                 Root: Correctable- Non-Fatal- Fatal- PME-
>         Capabilities: [100] Virtual Channel
>         Capabilities: [140] Unknown (5)
>
> 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1
> (rev 01) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
>         I/O behind bridge: 00001000-00001fff
>         Memory behind bridge: efd00000-efdfffff
>         Prefetchable memory behind bridge: 0000000020000000-00000000201fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>         Capabilities: [40] Express Root Port (Slot+) IRQ 0
>                 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
>                 Device: Latency L0s unlimited, L1 unlimited
>                 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                 Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 Link: Supported Speed 2.5Gb/s, Width x4, ASPM L0s, Port 1
>                 Link: Latency L0s <256ns, L1 <4us
>                 Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
>                 Link: Speed 2.5Gb/s, Width x0
>                 Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
>                 Slot: Number 4, PowerLimit 10.000000
>                 Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
>                 Slot: AttnInd Unknown, PwrInd Unknown, Power-
>                 Root: Correctable- Non-Fatal- Fatal- PME-
>         Capabilities: [80] Message Signalled Interrupts: Mask- 64bit-
>         Queue=0/0 Enable+
>                 Address: fee0300c  Data: 4151
>         Capabilities: [90] Subsystem: Gammagraphx, Inc. Unknown device 0000
>         Capabilities: [a0] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>                 PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [100] Virtual Channel
>         Capabilities: [180] Unknown (5)
>
> 00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI
> Express Port 5 (rev 01) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
>         I/O behind bridge: 00002000-00002fff
>         Memory behind bridge: 20200000-203fffff
>         Prefetchable memory behind bridge: 0000000020400000-00000000205fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>         Capabilities: [40] Express Root Port (Slot+) IRQ 0
>                 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
>                 Device: Latency L0s unlimited, L1 unlimited
>                 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                 Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 5
>                 Link: Latency L0s <256ns, L1 <4us
>                 Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
>                 Link: Speed 2.5Gb/s, Width x0
>                 Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
>                 Slot: Number 1, PowerLimit 10.000000
>                 Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
>                 Slot: AttnInd Unknown, PwrInd Unknown, Power-
>                 Root: Correctable- Non-Fatal- Fatal- PME-
>         Capabilities: [80] Message Signalled Interrupts: Mask- 64bit-
>         Queue=0/0 Enable+
>                 Address: fee0300c  Data: 4159
>         Capabilities: [90] Subsystem: Gammagraphx, Inc. Unknown device 0000
>         Capabilities: [a0] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>                 PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [100] Virtual Channel
>         Capabilities: [180] Unknown (5)
>
> 00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI
> Express Port 6 (rev 01) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0, Cache Line Size: 64 bytes
>         Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
>         I/O behind bridge: 00003000-00003fff
>         Memory behind bridge: efc00000-efcfffff
>         Prefetchable memory behind bridge: 0000000020600000-00000000207fffff
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
>         Capabilities: [40] Express Root Port (Slot+) IRQ 0
>                 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
>                 Device: Latency L0s unlimited, L1 unlimited
>                 Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                 Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                 Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 6
>                 Link: Latency L0s <256ns, L1 <4us
>                 Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
>                 Link: Speed 2.5Gb/s, Width x1
>                 Slot: AtnBtn- PwrCtrl- MRL- AtnInd- PwrInd- HotPlug+ Surpise+
>                 Slot: Number 0, PowerLimit 10.000000
>                 Slot: Enabled AtnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq-
>                 Slot: AttnInd Unknown, PwrInd Unknown, Power-
>                 Root: Correctable- Non-Fatal- Fatal- PME-
>         Capabilities: [80] Message Signalled Interrupts: Mask- 64bit-
>         Queue=0/0 Enable+
>                 Address: fee0300c  Data: 4161
>         Capabilities: [90] Subsystem: Gammagraphx, Inc. Unknown device 0000
>         Capabilities: [a0] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>                 PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [100] Virtual Channel
>         Capabilities: [180] Unknown (5)
>
> 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> Controller #1 (rev 01) (prog-if 00 [UHCI])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR- FastB2B-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 21
>         Region 4: I/O ports at ff80 [size=32]
>
> 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> Controller #2 (rev 01) (prog-if 00 [UHCI])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR- FastB2B-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Interrupt: pin B routed to IRQ 22
>         Region 4: I/O ports at ff60 [size=32]
>
> 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> Controller #3 (rev 01) (prog-if 00 [UHCI])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR- FastB2B-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Interrupt: pin C routed to IRQ 18
>         Region 4: I/O ports at ff40 [size=32]
>
> 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
> Controller (rev 01) (prog-if 20 [EHCI])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 21
>         Region 0: Memory at ff980800 (32-bit, non-prefetchable) [size=1K]
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA
>                 PME(D0+,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [58] Debug port
>
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01
> [Subtractive decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Bus: primary=00, secondary=05, subordinate=05, sec-latency=32
>         I/O behind bridge: 0000d000-0000dfff
>         Memory behind bridge: efa00000-efbfffff
>         Prefetchable memory behind bridge: 00000000e0000000-00000000e7ffffff
>         Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort+ <SERR- <PERR-
>         BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
>         Capabilities: [50] Subsystem: Gammagraphx, Inc. Unknown device 0000
>
> 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface
> Bridge (rev 01)
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Capabilities: [e0] Vendor Specific Information
>
> 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller
> (rev 01) (prog-if 8a [Master SecP PriP])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR- FastB2B-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 16
>         Region 0: I/O ports at 01f0 [size=8]
>         Region 1: I/O ports at 03f4 [size=1]
>         Region 2: I/O ports at 0170 [size=8]
>         Region 3: I/O ports at 0374 [size=1]
>         Region 4: I/O ports at ffa0 [size=16]
>
> 00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE
> Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR- FastB2B-
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0
>         Interrupt: pin C routed to IRQ 20
>         Region 0: I/O ports at fe00 [size=8]
>         Region 1: I/O ports at fe10 [size=4]
>         Region 2: I/O ports at fe20 [size=8]
>         Region 3: I/O ports at fe30 [size=4]
>         Region 4: I/O ports at fec0 [size=16]
>         Region 5: Memory at effffc00 (32-bit, non-prefetchable) [size=1K]
>         Capabilities: [70] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>                 PME(D0-,D1-,D2-,D3hot+,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>
> 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev
> 01)
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Interrupt: pin B routed to IRQ 17
>         Region 4: I/O ports at ece0 [size=32]
>
> 04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754 Gigabit
> Ethernet PCI Express (rev 02)
>         Subsystem: Dell Unknown device 01df
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR- FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 28
>         Region 0: Memory at efcf0000 (64-bit, non-prefetchable) [size=64K]
>         Expansion ROM at <ignored> [disabled]
>         Capabilities: [48] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>                 PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] Vital Product Data
>         Capabilities: [58] Vendor Specific Information
>         Capabilities: [e8] Message Signalled Interrupts: Mask- 64bit+
>         Queue=0/0 Enable+
>                 Address: 00000000fee0300c  Data: 4189
>         Capabilities: [d0] Express Endpoint IRQ 0
>                 Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag+
>                 Device: Latency L0s <4us, L1 unlimited
>                 Device: AtnBtn- AtnInd- PwrInd-
>                 Device: Errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
>                 Device: RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
>                 Device: MaxPayload 128 bytes, MaxReadReq 4096 bytes
>                 Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 0
>                 Link: Latency L0s <4us, L1 <64us
>                 Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
>                 Link: Speed 2.5Gb/s, Width x1
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [13c] Virtual Channel
>         Capabilities: [160] Device Serial Number 1b-8a-38-fe-ff-a0-1a-00
>         Capabilities: [16c] Power Budgeting
>
> 05:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL-8139/8139C/8139C+ (rev 10)
>         Subsystem: Realtek Semiconductor Co., Ltd. RT8139
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 64 (8000ns min, 16000ns max)
>         Interrupt: pin A routed to IRQ 18
>         Region 0: I/O ports at d400 [size=256]
>         Region 1: Memory at efaefe00 (32-bit, non-prefetchable) [size=256]
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>                 PME(D0-,D1+,D2+,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>
> 05:04.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL-8139/8139C/8139C+ (rev 10)
>         Subsystem: Realtek Semiconductor Co., Ltd. RT8139
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping- SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Latency: 64 (8000ns min, 16000ns max)
>         Interrupt: pin A routed to IRQ 16
>         Region 0: I/O ports at d800 [size=256]
>         Region 1: Memory at efaeff00 (32-bit, non-prefetchable) [size=256]
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>                 PME(D0-,D1+,D2+,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>
> 05:07.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
> (prog-if 00 [VGA])
>         Subsystem: Dell Unknown device 01df
>         Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
>         Stepping+ SERR+ FastB2B-
>         Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
>         <TAbort- <MAbort- >SERR- <PERR-
>         Interrupt: pin A routed to IRQ 5
>         Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
>         Region 1: I/O ports at dc00 [size=256]
>         Region 2: Memory at efaf0000 (32-bit, non-prefetchable) [size=64K]
>         Expansion ROM at efb00000 [disabled] [size=128K]
>         Capabilities: [50] Power Management version 2
>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
>                 PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 PME-Enable- DSel=0 DScale=0 PME-
>
> driver: tg3
> version: 3.108
> firmware-version: 5754-v3.12
> bus-info: 0000:04:00.0
> root@fs:~#
Comment 11 Eric Dumazet 2010-08-20 06:40:44 UTC
On Friday, 20 August 2010 at 09:26 +0300, Plamen Petrov wrote:
> Posting this only in the hope that it will be helpful to the
> "net guys"...

It's a forwarding setup.

Please post

ifconfig -a
iptables -nvL
iptables -t nat -nvL
iptables -t mangle -nvL
ip route
ethtool -k eth0   (& eth1 ...)

Try disabling GRO?
ethtool -K eth0 gro off
ethtool -K eth1 gro off
Comment 12 Plamen Petrov 2010-08-20 06:58:02 UTC
On 20.8.2010 09:34, Eric Dumazet wrote:
> On Friday, 20 August 2010 at 09:26 +0300, Plamen Petrov wrote:
>> Posting this only in the hope that it will be helpful to the
>> "net guys"...
>
> It's a forwarding setup.
>
> Please post
>
> ifconfig -a
> iptables -nvL
> iptables -t nat -nvL
> iptables -t mangle -nvL
> ip route
> ethtool -k eth0   (&  eth1 ...)
>
> Try disabling GRO?
> ethtool -K eth0 gro off
> ethtool -K eth1 gro off
>
>
Here goes the output of the above:
root@fs:~# ifconfig -a
> eth0      Link encap:Ethernet  HWaddr 00:0E:2E:5C:27:EF
>           inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
>           inet6 addr: fe80::20e:2eff:fe5c:27ef/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:40986 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:32825 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:36081064 (34.4 MiB)  TX bytes:6457150 (6.1 MiB)
>           Interrupt:18 Base address:0xe00
>
> eth1      Link encap:Ethernet  HWaddr 00:1A:A0:38:8A:1B
>           inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0
>           inet6 addr: fe80::21a:a0ff:fe38:8a1b/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1220443 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:568452 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1477959893 (1.3 GiB)  TX bytes:144840881 (138.1 MiB)
>           Interrupt:17
>
> eth2      Link encap:Ethernet  HWaddr 00:0E:2E:5C:27:E6
>           inet addr:192.168.199.1  Bcast:192.168.199.255  Mask:255.255.255.0
>           inet6 addr: fe80::20e:2eff:fe5c:27e6/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1119 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1149 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:210401 (205.4 KiB)  TX bytes:505063 (493.2 KiB)
>           Interrupt:16 Base address:0x4f00
>
> gre0      Link encap:UNSPEC  HWaddr
> 00-00-00-00-FF-00-73-69-00-00-00-00-00-00-00-00
>           NOARP  MTU:1476  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> ifb0      Link encap:Ethernet  HWaddr 96:9F:C4:43:78:01
>           BROADCAST NOARP  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:32
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> ifb1      Link encap:Ethernet  HWaddr EA:B9:BD:02:C1:77
>           BROADCAST NOARP  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:32
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> ip6tnl0   Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           NOARP  MTU:1460  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:13546 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:13546 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:1051287 (1.0 MiB)  TX bytes:1051287 (1.0 MiB)
>
> sit0      Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-73-69-00-00-00-00-00-00-00-00
>           NOARP  MTU:1480  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> sixxs_t   Link encap:UNSPEC  HWaddr
> C0-A8-01-02-00-00-73-69-00-00-00-00-00-00-00-00
>           inet6 addr: 2001:15c0:65ff:64::2/64 Scope:Global
>           inet6 addr: fe80::c0a8:102/128 Scope:Link
>           UP POINTOPOINT RUNNING NOARP  MTU:1280  Metric:1
>           RX packets:145 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:145 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:15080 (14.7 KiB)  TX bytes:15080 (14.7 KiB)
>
> teql0     Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           NOARP  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> tun0      Link encap:UNSPEC  HWaddr
> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>           inet addr:192.168.11.1  P-t-P:192.168.11.2  Mask:255.255.255.255
>           UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
> tunl0     Link encap:IPIP Tunnel  HWaddr
>           NOARP  MTU:1480  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
>
root@fs:~# iptables -nvL
> Chain INPUT (policy ACCEPT 1207K packets, 1455M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:53
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:53
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:135
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:137
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:138
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:139
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:445
>     0     0 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               tcp dpt:993
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:135
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:137
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:138
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:139
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:445
>     0     0 DROP       udp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>               udp dpt:993
>
> Chain FORWARD (policy DROP 0 packets, 0 bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>     0     0 ACCEPT     all  --  tun0   *       0.0.0.0/0            0.0.0.0/0
>     0     0 ACCEPT     all  --  *      tun0    0.0.0.0/0            0.0.0.0/0
>     0     0 ACCEPT     all  --  eth2   eth1    192.168.199.15      
>     192.168.10.0/24
>     0     0 REJECT     all  --  eth2   *       0.0.0.0/0           
>     192.168.1.1         reject-with icmp-net-unreachable
>     0     0 REJECT     all  --  eth1   *       0.0.0.0/0           
>     192.168.1.1         reject-with icmp-net-unreachable
> 29650 5450K ACCEPT     all  --  eth1   eth0    192.168.10.0/24    
> !192.168.1.1
>   821  165K ACCEPT     all  --  eth2   eth0    192.168.199.0/24   
>   !192.168.1.1
>     0     0 ACCEPT     all  --  eth2   eth1    192.168.199.0/24    
>     192.168.10.5
>   624  438K ACCEPT     tcp  --  *      eth2    0.0.0.0/0           
>   192.168.199.0/24    tcp flags:!0x17/0x02
> 33228   34M ACCEPT     tcp  --  *      eth1    0.0.0.0/0           
> 192.168.10.0/24     tcp flags:!0x17/0x02
>   199 14395 ACCEPT    !tcp  --  *      eth2    0.0.0.0/0           
>   192.168.199.0/24
>  2891  597K ACCEPT    !tcp  --  *      eth1    0.0.0.0/0           
>  192.168.10.0/24
>
> Chain OUTPUT (policy ACCEPT 546K packets, 100M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
root@fs:~# iptables -t nat -nvL
> Chain PREROUTING (policy DROP 701 packets, 94310 bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>     0     0 LOG        tcp  --  eth1   *       192.168.10.0/24    
>     !192.168.10.1        tcp dpt:25 LOG flags 0 level 4
>     0     0 DROP       tcp  --  eth1   *       192.168.10.0/24    
>     !192.168.10.1        tcp dpt:25
>     0     0 LOG        tcp  --  eth2   *       192.168.199.0/24   
>     !192.168.199.1       tcp dpt:25 LOG flags 0 level 4
>     0     0 DROP       tcp  --  eth2   *       192.168.199.0/24   
>     !192.168.199.1       tcp dpt:25
>     0     0 LOG        tcp  --  tun0   *       192.168.11.0/24    
>     !192.168.11.1        tcp dpt:25 LOG flags 0 level 4
>     0     0 DROP       tcp  --  tun0   *       192.168.11.0/24    
>     !192.168.11.1        tcp dpt:25
>     0     0 ACCEPT     all  --  tun0   *       0.0.0.0/0            0.0.0.0/0
>     0     0 ACCEPT     all  --  *      *       0.0.0.0/0           
>     192.168.11.0/24
>     0     0 ACCEPT     all  --  *      *       192.168.11.0/24      0.0.0.0/0
>     0     0 DROP       tcp  --  *      *       87.249.45.135        0.0.0.0/0
>               tcp dpts:20:22
>     0     0 ACCEPT     all  --  eth0   *       212.18.63.73        
>     192.168.1.2
>     0     0 ACCEPT     all  --  eth2   *       192.168.199.15      
>     192.168.10.0/24
>     0     0 DROP       all  --  eth2   *       192.168.199.0/24    
>     192.168.10.0/24
>     0     0 DROP       all  --  eth1   *       192.168.10.0/24     
>     192.168.199.0/24
>  5629  597K ACCEPT     all  --  eth1   *       192.168.10.0/24      0.0.0.0/0
>   262 26231 ACCEPT     all  --  eth2   *       192.168.199.0/24     0.0.0.0/0
>     0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpt:21 state NEW,ESTABLISHED
>     0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp spts:1024:65535 dpts:1024:65535 state
>     RELATED,ESTABLISHED
>     0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpts:20:21 flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  *      *       77.0.0.0/8          
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  *      *       87.0.0.0/8          
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  *      *       90.0.0.0/8          
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  *      *       95.0.0.0/8          
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  *      *       212.45.77.11        
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02
>     0     0 LOG        tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02 LOG flags 0 level 4
>     0     0 DROP       tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpt:22 flags:0x17/0x02
>    11   628 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>    192.168.1.2         tcp dpt:25 flags:0x17/0x02
>     2    96 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpt:80 flags:0x17/0x02
>     2   100 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpt:443 flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0           
>     192.168.1.2         tcp dpt:41414 flags:0x17/0x02
>  1197 58544 DROP       tcp  --  eth0   *       0.0.0.0/0            0.0.0.0/0
>            tcp flags:0x17/0x02
>     0     0 ACCEPT     tcp  --  eth0   *       0.0.0.0/0           
>     192.168.1.2         tcp dpts:1023:65500
>     6  2006 ACCEPT     udp  --  eth1   *       0.0.0.0             
>     255.255.255.255     udp spts:67:68 dpts:67:68
>     1   328 ACCEPT     udp  --  eth2   *       0.0.0.0             
>     255.255.255.255     udp spts:67:68 dpts:67:68
>
> Chain OUTPUT (policy ACCEPT 2685 packets, 196K bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>
> Chain POSTROUTING (policy ACCEPT 2656 packets, 186K bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>     1   124 ACCEPT     all  --  *      eth0    192.168.1.2         
>     212.18.63.73
>     0     0 ACCEPT     tcp  --  *      eth0    192.168.1.2          0.0.0.0/0
>               tcp spts:1024:65535 dpts:1024:65535 state RELATED,ESTABLISHED
>     0     0 ACCEPT     tcp  --  *      eth0    192.168.1.2          0.0.0.0/0
>               tcp spt:20 state RELATED,ESTABLISHED
>     0     0 SNAT       all  --  *      eth0    192.168.10.16/29     0.0.0.0/0
>               to:192.168.1.2
>     4   192 SNAT       all  --  *      eth0    192.168.10.24/29     0.0.0.0/0
>               to:192.168.1.2
>     0     0 SNAT       all  --  *      eth0    192.168.10.32/29     0.0.0.0/0
>               to:192.168.1.2
>     0     0 SNAT       all  --  *      eth0    192.168.10.40/29     0.0.0.0/0
>               to:192.168.1.2
>     0     0 SNAT       all  --  *      eth0    192.168.10.48/29     0.0.0.0/0
>               to:192.168.1.2
>     0     0 SNAT       all  --  *      eth0    192.168.10.56/29     0.0.0.0/0
>               to:192.168.1.2
>     0     0 SNAT       all  --  *      eth0    192.168.10.64/28     0.0.0.0/0
>               to:192.168.1.2
>  4893  516K SNAT       all  --  *      eth0    192.168.10.0/24      0.0.0.0/0
>            to:192.168.1.2
>   211 16721 SNAT       all  --  *      eth0    192.168.199.0/24     0.0.0.0/0
>             to:192.168.1.2
>    24  8328 ACCEPT     udp  --  *      eth1    192.168.10.1        
>    255.255.255.255     udp spts:67:68 dpts:67:68
>     4  1388 ACCEPT     udp  --  *      eth2    192.168.199.1       
>     255.255.255.255     udp spts:67:68 dpts:67:68
>     0     0 DROP       all  --  *      *       0.0.0.0              0.0.0.0/0
root@fs:~# iptables -t mangle -nvL
> Chain PREROUTING (policy ACCEPT 1287K packets, 1508M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>
> Chain INPUT (policy ACCEPT 1212K packets, 1461M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>
> Chain FORWARD (policy ACCEPT 73581 packets, 47M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>
> Chain OUTPUT (policy ACCEPT 548K packets, 100M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
>
> Chain POSTROUTING (policy ACCEPT 622K packets, 147M bytes)
>  pkts bytes target     prot opt in     out     source              
>  destination
root@fs:~# ip route
> 192.168.11.2 dev tun0  proto kernel  scope link  src 192.168.11.1
> 192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.2
> 192.168.199.0/24 dev eth2  proto kernel  scope link  src 192.168.199.1
> 192.168.11.0/24 via 192.168.11.2 dev tun0
> 192.168.10.0/24 dev eth1  proto kernel  scope link  src 192.168.10.1
> 127.0.0.0/8 dev lo  scope link
> default via 192.168.1.1 dev eth0  metric 1
root@fs:~# ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: on
root@fs:~# ethtool -k eth1
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: on
> udp fragmentation offload: off
> generic segmentation offload: on
root@fs:~# ethtool -k eth2
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: on
root@fs:~# uname -a
> Linux fs 2.6.34.4-FS #1 SMP Thu Aug 19 18:09:58 UTC 2010 i686 Intel(R)
> Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux
Comment 13 Plamen Petrov 2010-08-20 07:38:52 UTC
On 20.8.2010 09:34, Eric Dumazet wrote:
> On Friday, 20 August 2010 at 09:26 +0300, Plamen Petrov wrote:
>> Posting this only in the hope that it will be helpful to the
>> "net guys"...
>
> It's a forwarding setup.
>
> Please post
>
> ifconfig -a
> iptables -nvL
> iptables -t nat -nvL
> iptables -t mangle -nvL
> ip route
> ethtool -k eth0   (&  eth1 ...)
>
> Try disabling GRO?
> ethtool -K eth0 gro off
> ethtool -K eth1 gro off
>
>
Sorry about this, but when I dug into it a bit, I found that
I had a pretty old ethtool:
> root@fs:~# ethtool --help
> ethtool version 5
> Usage:
> ethtool DEVNAME Display standard information about device
> ...

So my ethtool did not know anything about gro, because by
looking at
http://git.kernel.org/?p=network/ethtool/ethtool.git;a=shortlog
I saw I had this one:
2006-09-01	Jeff Garzik	Release version 5. origin v5
and GRO reading/setting was added "a bit later"...
2009-03-06	Jeff Garzik	Get/set GRO settings.

So, I went on and compiled myself a new ethtool:
> root@fs:~# ethtool -h
> ethtool version 2.6.35
> Usage:
> ethtool DEVNAME Display standard information about device
> ...
And ran it on my 2.6.34.4 kernel, which produced this:
>
root@fs:~# ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: off
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off
root@fs:~# ethtool -k eth1
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off
root@fs:~# ethtool -k eth2
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: off
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
> ntuple-filters: off
> receive-hashing: off
root@fs:~#
>
But again, this is on 2.6.34.4. Could it be that
"generic-receive-offload" is ON by default on my
only gigabit capable tg3 card on 2.6.35.x and later?
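
A minimal sketch of how that could be checked on each kernel, assuming the newer ethtool output format shown above (ethtool version 5 prints no such line). The function name gro_state is made up for illustration:

```shell
# Read `ethtool -k DEV` output on stdin and print the GRO state.
# Prints "on" or "off", or "unknown" if the line is missing
# (as with an ethtool too old to know about GRO).
gro_state() {
    awk -F': *' '
        $1 == "generic-receive-offload" { split($2, w, " "); print w[1]; found = 1 }
        END { if (!found) print "unknown" }
    '
}

# Possible use on a live system (interface names taken from this machine):
# for dev in eth0 eth1 eth2; do
#     printf '%s: %s\n' "$dev" "$(ethtool -k "$dev" | gro_state)"
# done
```

Running the commented loop once on 2.6.34.4 and once on 2.6.35.x would show whether the default changed for the tg3 card.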

I will try to determine this myself later, but the
machine is a gateway on the local area network, and
is needed by the users here...

Thanks!
Comment 14 Plamen Petrov 2010-08-20 08:31:59 UTC
On 20.8.2010 10:38, Plamen Petrov wrote:
> On 20.8.2010 09:34, Eric Dumazet wrote:
>> On Friday, 20 August 2010 at 09:26 +0300, Plamen Petrov wrote:
>>> Posting this only in the hope that it will be helpful to the
>>> "net guys"...
>>
...
>>
> But again, this is on 2.6.34.4. Could it be that
> "generic-receive-offload" is ON by default on my
> only gigabit capable tg3 card on 2.6.35.x and later?
>

Well, as it turns out - the commit enabling GRO for
tg3 cards was from the 2.6.35 merge window:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=cb903bf4ee2d6e53210e2174d363e10698112042

> tg3: Enable GRO by default.
> author        David S. Miller <davem@davemloft.net>
>               Wed, 21 Apr 2010 01:49:45 +0000 (18:49 -0700)
> committer     David S. Miller <davem@davemloft.net>
>               Wed, 21 Apr 2010 01:49:45 +0000 (18:49 -0700)
>
> This was merely an oversight when I added the *_gro_receive()
> calls.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

The next thing I will be trying is disabling
GRO for the tg3 network card in my network
startup scripts via my shiny new ethtool... ;)
Comment 15 Plamen Petrov 2010-08-20 09:19:46 UTC
On 20.8.2010 11:31, Plamen Petrov wrote:
...
> The next thing I will be trying is disabling
> GRO for the tg3 network card in my network
> startup scripts via my shiny new ethtool... ;)

So, I have now been running 2.6.36-rc1-FS-00127-g763008c for nearly half an hour:

>
root@fs:~# w; uname -a
>  12:09:48 up 26 min,  1 user,  load average: 2.43, 1.93, 1.26
> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> root     pts/0    192.168.10.159   12:09    0.00s  0.02s  0.00s w
> Linux fs 2.6.36-rc1-FS-00127-g763008c #1 SMP Thu Aug 19 07:10:57 UTC 2010
> i686 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux

And that's with the usual load, which can be found here:
http://pvp.now.im/s/drraw.cgi?Mode=view;Dashboard=1173713287.13177

The only addition I made was running
	ethtool -K ethX gro off
for each network card.
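
For a network startup script, this could be wrapped roughly as follows. This is only a sketch: the function name disable_gro and the ETHTOOL override variable are hypothetical, and only the `ethtool -K DEV gro off` call itself comes from this thread.

```shell
# ETHTOOL can be overridden for a dry run, e.g. ETHTOOL="echo ethtool".
# It is expanded unquoted on purpose so that such an override splits into words.
ETHTOOL=${ETHTOOL:-ethtool}

# Turn GRO off on every interface named on the command line,
# warning (but continuing) if a device rejects the setting.
disable_gro() {
    for dev in "$@"; do
        $ETHTOOL -K "$dev" gro off ||
            echo "warning: could not disable GRO on $dev" >&2
    done
}

# Possible use at boot (interface names taken from this machine):
# disable_gro eth0 eth1 eth2
```

Setting ETHTOOL="echo ethtool" before calling the function prints the commands instead of running them, which is handy for checking the script on a production gateway.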

Which means that I have a Broadcom Tigon3 (tg3) which doesn't
like "generic-receive-offload" at all!

With "generic-receive-offload" off, I will stay on
2.6.36-rc1-FS-00127-g763008c for as long as I can, just to
confirm that the "default on" setting for GRO in the
tg3 driver is in fact the culprit for my troubles...

Where to next?
Comment 16 Eric Dumazet 2010-08-20 10:26:55 UTC
On Friday, 20 August 2010 at 12:19 +0300, Plamen Petrov wrote:
> On 20.8.2010 11:31, Plamen Petrov wrote:
> ...
> > The next thing I will be trying is disabling
> > GRO for the tg3 network card in my network
> > startup scripts via my shiny new ethtool... ;)
> 
> So, for nearly half an hour now running 2.6.36-rc1-FS-00127-g763008c:
> 
> >
> root@fs:~# w; uname -a
> >  12:09:48 up 26 min,  1 user,  load average: 2.43, 1.93, 1.26
> > USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> > root     pts/0    192.168.10.159   12:09    0.00s  0.02s  0.00s w
> > Linux fs 2.6.36-rc1-FS-00127-g763008c #1 SMP Thu Aug 19 07:10:57 UTC 2010
> i686 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux
> 
> And that's with the usual load, which can be found here:
> http://pvp.now.im/s/drraw.cgi?Mode=view;Dashboard=1173713287.13177
> 
> The only addition I made was running
>       ethtool -K ethX gro off
> for each network card.
> 
> Which means that I have a Broadcom Tigon3 (tg3) which doesn't
> like "generic-receive-offload" at all!
> 
> With "generic-receive-offload" off, I will stay on
> 2.6.36-rc1-FS-00127-g763008c for as long as I can, just to
> confirm that in fact the "default on" setting for gro in the
> tg3 driver is the culprit for my troubles...
> 
> Where to next?


First, thanks a lot for finding this so fast.

Hmmm, please check that your net/core/dev.c file, at line 3146, has
the fix added in

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e5093aec2e6b60c3df2420057ffab9ed4a6d2792

commit e5093aec2e6b60c3df2420057ffab9ed4a6d2792
Author: Jarek Poplawski <jarkao2@gmail.com>
Date:   Wed Aug 11 02:02:10 2010 +0000

    net: Fix a memmove bug in dev_gro_receive()
    


	--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));



If yes, I think David & Herbert will probably have to dig into the GRO
code again :(
Comment 17 Plamen Petrov 2010-08-20 10:54:06 UTC
On 20.8.2010 13:26, Eric Dumazet wrote:
> On Friday, 20 August 2010 at 12:19 +0300, Plamen Petrov wrote:
>> On 20.8.2010 11:31, Plamen Petrov wrote:
>> ...
...
>>
>> Where to next?
>
>
> First, thanks a lot for finding this so fast.
You're welcome.
>
> Hmmm, please check that on your net/core/dev.c file, line 3146 you have
> the fix added in
>
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e5093aec2e6b60c3df2420057ffab9ed4a6d2792
>
> commit e5093aec2e6b60c3df2420057ffab9ed4a6d2792
> Author: Jarek Poplawski <jarkao2@gmail.com>
> Date:   Wed Aug 11 02:02:10 2010 +0000
>
>      net: Fix a memmove bug in dev_gro_receive()
>
>
>
>       --skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
>

Yes, I confirm I have the above code in place.

>
> If yes, I think David & Herbert will probably have to dig again into gro
> code :(
>

So, I guess it's David and Herbert's turn?...

Thanks!
Plamen
Comment 18 Jarek Poplawski 2010-08-20 19:39:19 UTC
Plamen Petrov wrote, On 20.08.2010 12:53:
> So, I guess it's David and Herbert's turn?...

If you're bored in the meantime, I'd suggest checking the Realtek
driver, e.g.:
- for locking, with the patch below,
- by turning off its tx-checksumming and/or scatter-gather with ethtool,
or, if possible, try to reproduce this with another NIC.

Thanks,
Jarek P.
---

diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
index f5166dc..aaaccc5 100644
--- a/drivers/net/8139too.c
+++ b/drivers/net/8139too.c
@@ -1692,6 +1692,8 @@ static netdev_tx_t rtl8139_start_xmit (struct sk_buff *skb,
 	unsigned int len = skb->len;
 	unsigned long flags;
 
+	spin_lock_irqsave(&tp->lock, flags);
+
 	/* Calculate the next Tx descriptor entry. */
 	entry = tp->cur_tx % NUM_TX_DESC;
 
@@ -1700,14 +1702,14 @@ static netdev_tx_t rtl8139_start_xmit (struct sk_buff *skb,
 		if (len < ETH_ZLEN)
 			memset(tp->tx_buf[entry], 0, ETH_ZLEN);
 		skb_copy_and_csum_dev(skb, tp->tx_buf[entry]);
-		dev_kfree_skb(skb);
+		dev_kfree_skb_irq(skb);
 	} else {
+		spin_unlock_irqrestore(&tp->lock, flags);
 		dev_kfree_skb(skb);
 		dev->stats.tx_dropped++;
 		return NETDEV_TX_OK;
 	}
 
-	spin_lock_irqsave(&tp->lock, flags);
 	/*
 	 * Writing to TxStatus triggers a DMA transfer of the data
 	 * copied to tp->tx_buf[entry] above. Use a memory barrier
Comment 19 Jarek Poplawski 2010-08-21 07:48:29 UTC
On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> Plamen Petrov wrote, On 20.08.2010 12:53:
> > So, I guess its David and Herbert's turn?...
> 
> If you're bored in the meantime I'd suggest to do check the realtek
> driver eg:
> - for locking with the patch below,
> - to turn off with ethtool its tx-checksumming and/or scatter-gather,

After rethinking, it's almost impossible this patch could change
anything here, so don't bother, but consider mainly the second
proposal.

Jarek P.
Comment 20 Eric Dumazet 2010-08-21 07:57:25 UTC
On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> > Plamen Petrov wrote, On 20.08.2010 12:53:
> > > So, I guess its David and Herbert's turn?...
> > 
> > If you're bored in the meantime I'd suggest to do check the realtek
> > driver eg:
> > - for locking with the patch below,
> > - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> 
> After rethinking, it's almost impossible this patch could change
> anything here, so don't bother, but consider mainly the second
> proposal.
> 
> Jarek P.

Indeed ;)

It's true that not many NICs use the skb_copy_and_csum_dev() helper;
maybe this one must be updated somehow?
Comment 21 Jarek Poplawski 2010-08-21 08:08:30 UTC
On Sat, Aug 21, 2010 at 09:50:58AM +0200, Eric Dumazet wrote:
> On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> > On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> > > Plamen Petrov wrote, On 20.08.2010 12:53:
> > > > So, I guess its David and Herbert's turn?...
> > > 
> > > If you're bored in the meantime I'd suggest to do check the realtek
> > > driver eg:
> > > - for locking with the patch below,
> > > - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> > 
> > After rethinking, it's almost impossible this patch could change
> > anything here, so don't bother, but consider mainly the second
> > proposal.
> > 
> > Jarek P.
> 
> Indeed ;)
> 
> Its true that not many nics use the skb_copy_and_csum_dev() helper,
> maybe this one must be updated somehow ?
> 
Yes, it seems it should be possible at least to handle the bug with
a warning and error return, considering Plamen's problems with getting
the trace.

Jarek P.
Comment 22 Plamen Petrov 2010-08-23 11:48:03 UTC
On 21.8.2010 11:07, Jarek Poplawski wrote:
> On Sat, Aug 21, 2010 at 09:50:58AM +0200, Eric Dumazet wrote:
>> On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
>>> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
>>>> Plamen Petrov wrote, On 20.08.2010 12:53:
>>>>> So, I guess its David and Herbert's turn?...
>>>>
>>>> If you're bored in the meantime I'd suggest to do check the realtek
>>>> driver eg:
>>>> - for locking with the patch below,
>>>> - to turn off with ethtool its tx-checksumming and/or scatter-gather,
>>>
>>> After rethinking, it's almost impossible this patch could change
>>> anything here, so don't bother, but consider mainly the second
>>> proposal.
>>>
>>> Jarek P.
>>
>> Indeed ;)
>>
>> Its true that not many nics use the skb_copy_and_csum_dev() helper,
>> maybe this one must be updated somehow ?
>>
> Yes, it seems it should be possible at least to handle the bug with
> a warning and error return, considering Plamen's problems with getting
> the trace.
>
> Jarek P.

Well, here is the current status:

Last time I promised to stay on 2.6.36-rc1-git for as long as possible,
so here is what I achieved:

> root@fs:/boot# w; uname -a
>  12:08:18 up 3 days, 24 min,  1 user,  load average: 1.21, 1.29, 1.17
> USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> root     pts/0    192.168.10.159   12:04    0.00s  0.02s  0.00s w
> Linux fs 2.6.36-rc1-FS-00127-g763008c #1 SMP Thu Aug 19 07:10:57 UTC 2010
> i686 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux

Yeah, 3 days and counting, right until I decided to try the freshly
announced 2.6.36-rc2.

So I upgraded the kernel, but left in place the scripts that turn GRO
off for the tg3 card at system startup. This way the system ran for
two and a half hours, until I decided it was time to try turning GRO on.

I first tried to turn GRO on for the tg3 NIC, and the system oopsed
immediately (if the panic screen is needed, please ask for it).

After the system came back, I tried turning GRO on for the two RealTek
8139 NICs, too, but ethtool only accepted turning GRO off.

Unfortunately, I can't test whether other NICs fail the same way as
my motherboard-integrated tg3 does, so for now this is only a
"tg3 + GRO on" problem; I don't have any other hardware available to
test with.

Thanks,
Plamen
Comment 23 Eric Dumazet 2010-08-23 12:36:48 UTC
On Monday, 23 August 2010 at 14:47 +0300, Plamen Petrov wrote:

> Well, here is the current status:
> 
> Last I promised I will stay on 2.6.36-rc1-git for as long as possible,
> so here is what I achieved:
> 
> >
> root@fs:/boot# w; uname -a
> >  12:08:18 up 3 days, 24 min,  1 user,  load average: 1.21, 1.29, 1.17
> > USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
> > root     pts/0    192.168.10.159   12:04    0.00s  0.02s  0.00s w
> > Linux fs 2.6.36-rc1-FS-00127-g763008c #1 SMP Thu Aug 19 07:10:57 UTC 2010
> i686 Intel(R) Pentium(R) D CPU 3.00GHz GenuineIntel GNU/Linux
> 
> Yeah, 3 days and counting, right until I decided to try the freshly
> announced 2.6.36-rc2.
> 
> So I upgraded the kernel, but left the scripts that turn GRO off for
> the tg3 card still run at system startup. This way the system ran for
> 2 and a half hours, when I decided its time to try turning GRO on.
> 
> I first tried to turn GRO on for the tg3 nic, and the system oopsed
> immediately (if the panic screen is necessary - please, ask for it).
> 
> After the system came back, I tried turning GRO on for the 2 RealTek
> 8139 nics, too, but ethtool only accepted turning GRO off.
> 
> And unfortunately, I can't test if other nics will fail the same way
> as the motherboard integrated tg3 I have does, so for now, this is
> only a tg3 + GRO on problem; I don't have any other hardware to test
> with available.

There was no change in this area in the latest kernel.

If you had only tg3 cards, I guess there would be no bug.

The bug is probably a combination of:

1) tg3 + GRO, or any card enabling GRO
2) some network code (netfilter?)
3) an 8139too, or any card calling skb_copy_and_csum_dev()
Comment 24 Jarek Poplawski 2010-08-23 12:48:20 UTC
On Mon, Aug 23, 2010 at 02:47:23PM +0300, Plamen Petrov wrote:
> On 21.8.2010 at 11:07, Jarek Poplawski wrote:
>> On Sat, Aug 21, 2010 at 09:50:58AM +0200, Eric Dumazet wrote:
>>> On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
>>>> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
>>>>> Plamen Petrov wrote, On 20.08.2010 12:53:
>>>>>> So, I guess its David and Herbert's turn?...
>>>>>
>>>>> If you're bored in the meantime I'd suggest to do check the realtek
>>>>> driver eg:
>>>>> - for locking with the patch below,
>>>>> - to turn off with ethtool its tx-checksumming and/or scatter-gather,
...
> Yeah, 3 days and counting, right until I decided to try the freshly
> announced 2.6.36-rc2.
>
> So I upgraded the kernel, but left the scripts that turn GRO off for
> the tg3 card still run at system startup. This way the system ran for
> 2 and a half hours, when I decided its time to try turning GRO on.
>
> I first tried to turn GRO on for the tg3 nic, and the system oopsed
> immediately (if the panic screen is necessary - please, ask for it).
>
> After the system came back, I tried turning GRO on for the 2 RealTek
> 8139 nics, too, but ethtool only accepted turning GRO off.
>
> And unfortunately, I can't test if other nics will fail the same way
> as the motherboard integrated tg3 I have does, so for now, this is
> only a tg3 + GRO on problem; I don't have any other hardware to test
> with available.

A little misunderstanding: I was interested in turning off some
features on the Realteks, to change the packet path from tg3 with GRO
to Realtek without GRO and without tx-checksumming etc.

But maybe you could try the patch below instead (so the patched
kernel, tg3 with gro on, and realteks without any change).

Thanks,
Jarek P.

--- (for debugging only)

diff --git a/net/core/dev.c b/net/core/dev.c
index 3721fbb..51823cd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1935,6 +1935,23 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
 					      illegal_highdma(dev, skb))));
 }
 
+static int skb_csum_start_bug(struct sk_buff *skb)
+{
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		long csstart;
+
+		csstart = skb->csum_start - skb_headroom(skb);
+		if (WARN_ON(csstart > skb_headlen(skb))) {
+			pr_warning("csum_start %d, headroom %d, headlen %d\n",
+				   skb->csum_start, skb_headroom(skb),
+				   skb_headlen(skb));
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1955,11 +1972,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		skb_orphan_try(skb);
 
 		if (netif_needs_gso(dev, skb)) {
+			skb_csum_start_bug(skb);
 			if (unlikely(dev_gso_segment(skb)))
 				goto out_kfree_skb;
 			if (skb->next)
 				goto gso;
 		} else {
+			skb_csum_start_bug(skb);
 			if (skb_needs_linearize(skb, dev) &&
 			    __skb_linearize(skb))
 				goto out_kfree_skb;
@@ -1997,7 +2016,12 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
-		rc = ops->ndo_start_xmit(nskb, dev);
+		if (skb_csum_start_bug(skb)) {
+			kfree_skb(skb);
+			rc = NETDEV_TX_OK;
+		} else
+			rc = ops->ndo_start_xmit(nskb, dev);
+
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
Comment 25 Eric Dumazet 2010-08-23 13:02:04 UTC
On Monday, 23 August 2010 at 12:47 +0000, Jarek Poplawski wrote:
> On Mon, Aug 23, 2010 at 02:47:23PM +0300, Plamen Petrov wrote:
> > On 21.8.2010 at 11:07, Jarek Poplawski wrote:
> >> On Sat, Aug 21, 2010 at 09:50:58AM +0200, Eric Dumazet wrote:
> >>> On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> >>>> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> >>>>> Plamen Petrov wrote, On 20.08.2010 12:53:
> >>>>>> So, I guess its David and Herbert's turn?...
> >>>>>
> >>>>> If you're bored in the meantime I'd suggest to do check the realtek
> >>>>> driver eg:
> >>>>> - for locking with the patch below,
> >>>>> - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> ...
> > Yeah, 3 days and counting, right until I decided to try the freshly
> > announced 2.6.36-rc2.
> >
> > So I upgraded the kernel, but left the scripts that turn GRO off for
> > the tg3 card still run at system startup. This way the system ran for
> > 2 and a half hours, when I decided its time to try turning GRO on.
> >
> > I first tried to turn GRO on for the tg3 nic, and the system oopsed
> > immediately (if the panic screen is necessary - please, ask for it).
> >
> > After the system came back, I tried turning GRO on for the 2 RealTek
> > 8139 nics, too, but ethtool only accepted turning GRO off.
> >
> > And unfortunately, I can't test if other nics will fail the same way
> > as the motherboard integrated tg3 I have does, so for now, this is
> > only a tg3 + GRO on problem; I don't have any other hardware to test
> > with available.
> 
> A little misunderstanding: I was interested in turning off some
> features on the Realteks, to change the packet path from tg3 with GRO
> to Realtek without GRO and without tx-checksumming etc.
> 
> But maybe you could try the patch below instead (so the patched
> kernel, tg3 with gro on, and realteks without any change).
> 
> Thanks,
> Jarek P.
> 
> --- (for debugging only)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3721fbb..51823cd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1935,6 +1935,23 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
>                                             illegal_highdma(dev, skb))));
>  }
>  
> +static int skb_csum_start_bug(struct sk_buff *skb)
> +{
> +
> +     if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +             long csstart;
> +
> +             csstart = skb->csum_start - skb_headroom(skb);
> +             if (WARN_ON(csstart > skb_headlen(skb))) {
> +                     pr_warning("csum_start %d, headroom %d, headlen %d\n",
> +                                skb->csum_start, skb_headroom(skb),
> +                                skb_headlen(skb));

I was about to suggest a similar patch ;)

Also print skb->csum_offset and skb->len if possible:

pr_err("csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
       skb->csum_start,
       skb->csum_offset,
       skb_headroom(skb),
       skb_headlen(skb),
       skb->len);


> +                     return 1;
> +             }
> +     }
> +     return 0;
> +}
> +
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>                       struct netdev_queue *txq)
>  {
> @@ -1955,11 +1972,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>               skb_orphan_try(skb);
>  
>               if (netif_needs_gso(dev, skb)) {
> +                     skb_csum_start_bug(skb);
>                       if (unlikely(dev_gso_segment(skb)))
>                               goto out_kfree_skb;
>                       if (skb->next)
>                               goto gso;
>               } else {
> +                     skb_csum_start_bug(skb);
>                       if (skb_needs_linearize(skb, dev) &&
>                           __skb_linearize(skb))
>                               goto out_kfree_skb;
> @@ -1997,7 +2016,12 @@ gso:
>               if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>                       skb_dst_drop(nskb);
>  
> -             rc = ops->ndo_start_xmit(nskb, dev);
> +             if (skb_csum_start_bug(skb)) {
> +                     kfree_skb(skb);
> +                     rc = NETDEV_TX_OK;
> +             } else
> +                     rc = ops->ndo_start_xmit(nskb, dev);
> +
>               if (unlikely(rc != NETDEV_TX_OK)) {
>                       if (rc & ~NETDEV_TX_MASK)
>                               goto out_kfree_gso_skb;
Comment 26 Jarek Poplawski 2010-08-23 13:11:55 UTC
On Mon, Aug 23, 2010 at 03:00:43PM +0200, Eric Dumazet wrote:
...
> I was about to suggest a similar patch ;)
> 
> Also prints skb->csum_offset and skb->len if possible

Feel free to send it: I'm in a bit of a hurry now...

Jarek P.
Comment 27 Plamen Petrov 2010-08-23 13:44:13 UTC
On 23.8.2010 at 16:10, Jarek Poplawski wrote:
> On Mon, Aug 23, 2010 at 03:00:43PM +0200, Eric Dumazet wrote:
> ...
>> I was about to suggest a similar patch ;)
>>
>> Also prints skb->csum_offset and skb->len if possible
>
> Feel free to send it: I'm a bit in hurry now...
>
> Jarek P.

Currently compiling the kernel with the attached patch.

And, by the way, if there are any patches to follow, would you
please send them as attachments? I think my mail client is
line-wrapping the patches...

The results will be sent soon, too.

Plamen
Comment 28 Plamen Petrov 2010-08-23 14:06:13 UTC
On 23.8.2010 at 16:43, Plamen Petrov wrote:
> On 23.8.2010 at 16:10, Jarek Poplawski wrote:
>> On Mon, Aug 23, 2010 at 03:00:43PM +0200, Eric Dumazet wrote:
>> ...
>>> I was about to suggest a similar patch ;)
>>>
>>> Also prints skb->csum_offset and skb->len if possible
>>
>> Feel free to send it: I'm a bit in hurry now...
>>
>> Jarek P.
>
> Currently compiling the kernel with the attached patch.
>
> And, by the way, if there are any patches to follow, would you
> please, send them as attachments? I think my mail client is
> line-wrapping the patches...
>
> The results will be sent soon, too.

Well, so far, so good. No oopses.

I think I'm hitting some compiler-related issue here...

The kernel with the patch I sent applied is working fine, even
with "generic-receive-offload" ON for my tg3 NIC...

What is the big difference there?

Or do we need to dig into the object files now? Because that is
too scary for me...

Plamen

P.S. The compiler I am using is :
root@crux:~# gcc -v
> Using built-in specs.
> Target: i686-pc-linux-gnu
> Configured with: ../gcc-4.4.4/configure --prefix=/usr --libexecdir=/usr/lib
> --enable-languages=c,c++,objc --enable-threads=posix --enable-__cxa_atexit
> --enable-clocale=gnu --enable-shared --disable-nls --with-x=no
> Thread model: posix
> gcc version 4.4.4 (CRUX) (GCC)
Anything obviously special about it?
Comment 29 Jarek Poplawski 2010-08-23 14:15:21 UTC
On Mon, Aug 23, 2010 at 05:05:33PM +0300, Plamen Petrov wrote:
> Well, so far - so good. No oopses.
> 
> I think I'm hitting some compiler related issue here...
> 
> The kernel with the patch I sent applied is working pretty fine
> even with "generic-receive-offload" ON for my tg3 nic...
> 
> What is the big difference there?

Oopses should be replaced with warnings, so please check the
syslog etc from time to time.

Jarek P.
Comment 31 Plamen Petrov 2010-08-24 07:07:51 UTC
On 23.8.2010 at 17:14, Jarek Poplawski wrote:
> On Mon, Aug 23, 2010 at 05:05:33PM +0300, Plamen Petrov wrote:
>> Well, so far - so good. No oopses.
>>
>> I think I'm hitting some compiler related issue here...
>>
>> The kernel with the patch I sent applied is working pretty fine
>> even with "generic-receive-offload" ON for my tg3 nic...
>>
>> What is the big difference there?
>
> Oopses should be replaced with warnings, so please check the
> syslog etc from time to time.
>
> Jarek P.

So far - not so good. :(

After I left the machine, it rebooted once. Then I logged in
via ssh and turned GRO back ON for the tg3 NIC, and within the
next 30 minutes there was another reboot.

Today I will try to capture a picture of the oops/warning
with my phone camera, because contrary to what you suggest,
Jarek, there are no messages in the system logs.

Plamen
Comment 33 Plamen Petrov 2010-08-24 07:08:17 UTC
On 24.8.2010 at 07:51, Plamen Petrov wrote:
> On 23.8.2010 at 17:14, Jarek Poplawski wrote:
>> On Mon, Aug 23, 2010 at 05:05:33PM +0300, Plamen Petrov wrote:
>>> Well, so far - so good. No oopses.
>>>

...

>> Oopses should be replaced with warnings, so please check the
>> syslog etc from time to time.
>>
>> Jarek P.
>
> So far - not so good. :(
>
> After I left the machine, i rebooted once. Then I logged in
> via ssh, and turned gro back ON for the tg3 nic - and in the
> next 30 minutes - another reboot.
>
> Today I will try to capture a picture of the oops/warning
> with my phone camera,

And here is what I've got:

[picture 7]
http://picpaste.com/31d6a54fec9e87de0d1550ee02d9c336.jpg

[picture 8]
http://picpaste.com/02db6ad8abec6281065328fb52d328cf.jpg

[picture 9]
http://picpaste.com/9fbaaa14c679f57c82e96884d3274090.jpg

Sorry for the really bad quality, even for a phone, but the problem
is that I don't know when it's going to happen, so... you see
the results.

> because contrary to what you suggest,
> Jarek, there are no messages in the system logs.
>
> Plamen

Ideas?

Thanks,
Plamen
Comment 35 Jarek Poplawski 2010-08-24 07:08:38 UTC
On Tue, Aug 24, 2010 at 08:19:37AM +0300, Plamen Petrov wrote:
...
> And here is what I've got:
>
> [picture 7]
> http://picpaste.com/31d6a54fec9e87de0d1550ee02d9c336.jpg
>
> [picture 8]
> http://picpaste.com/02db6ad8abec6281065328fb52d328cf.jpg
>
> [picture 9]
> http://picpaste.com/9fbaaa14c679f57c82e96884d3274090.jpg
>
> Sorry for the really bad quality, even for a phone, but the problem
> is that I don't know when its going to happen, so... you see
> the results.
>
>  because contrary to what you suggest,
>> Jarek, there are no messages in the system logs.

I'm extremely sorry: I missed 1 place. Anyway, it's very helpful too.

>>
>> Plamen
>
> Ideas?

Try Eric's patch, and later maybe this one.

Thanks,
Jarek P.
Comment 38 Plamen Petrov 2010-08-24 08:44:13 UTC
On 24.8.2010 at 08:01, Eric Dumazet wrote:
> On Tuesday, 24 August 2010 at 07:51 +0300, Plamen Petrov wrote:
>> So far - not so good. :(
...
>
> Hmm... I was thinking of adding a call to __skb_linearize(), just in case...
>
> diff --git a/drivers/net/8139too.c b/drivers/net/8139too.c
> index f5166dc..10928a2 100644
> --- a/drivers/net/8139too.c
> +++ b/drivers/net/8139too.c
> @@ -1696,7 +1696,7 @@ static netdev_tx_t rtl8139_start_xmit (struct sk_buff *skb,
>       entry = tp->cur_tx % NUM_TX_DESC;
>
>       /* Note: the chip doesn't have auto-pad! */
> -     if (likely(len < TX_BUF_SIZE)) {
> +     if (likely(len < TX_BUF_SIZE && !__skb_linearize(skb))) {
>               if (len < ETH_ZLEN)
>                       memset(tp->tx_buf[entry], 0, ETH_ZLEN);
>               skb_copy_and_csum_dev(skb, tp->tx_buf[entry]);
>
>
Here is what I've got while running the kernel with the patch
I attached earlier plus the above one, after turning GRO on for
my tg3 nic:

[picture 10]
http://picpaste.com/37e37f9ff9504e3a003f49092f9b1be6.jpg

[picture 11]
http://picpaste.com/5663ca7c5041c0ed7a1f3e6a9aa17a9e.jpg

[picture 12]
http://picpaste.com/17f2ecaa409a1ebbf835a2e0519d3099.jpg

Now moving on to the second proposed patch from Jarek.

Thanks,
Plamen
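
For readers unfamiliar with the __skb_linearize() call in the 8139too patch quoted above, here is a rough userspace model of what "linearizing" means: every byte held in fragments is copied into the head buffer, so that the head length equals the total length afterwards. The types and functions (struct model_pkt, model_linearize() and so on) are hypothetical and only model the data layout. They say nothing about whether csum_start is valid, which is the value the failing check depends on.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_MAX_FRAGS 4

/* One out-of-line piece of packet data (hypothetical model type). */
struct model_frag {
	const unsigned char *data;
	size_t size;
};

/* A packet split into a linear head plus fragments (hypothetical). */
struct model_pkt {
	unsigned char *head;   /* heap buffer holding the linear data */
	size_t headlen;        /* bytes currently stored in head */
	struct model_frag frags[MODEL_MAX_FRAGS];
	size_t nr_frags;
};

/* Total packet length: linear bytes plus all fragment bytes. */
static size_t model_pkt_len(const struct model_pkt *p)
{
	size_t len = p->headlen;
	size_t i;

	for (i = 0; i < p->nr_frags; i++)
		len += p->frags[i].size;
	return len;
}

/* Copies all fragment data behind the linear data.
 * Returns 0 on success, -1 on allocation failure (packet unchanged). */
static int model_linearize(struct model_pkt *p)
{
	size_t total = model_pkt_len(p);
	size_t off = p->headlen;
	unsigned char *grown;
	size_t i;

	if (total == p->headlen) {      /* nothing lives in fragments */
		p->nr_frags = 0;
		return 0;
	}

	grown = realloc(p->head, total);
	if (!grown)
		return -1;

	for (i = 0; i < p->nr_frags; i++) {
		if (p->frags[i].size == 0)
			continue;
		memcpy(grown + off, p->frags[i].data, p->frags[i].size);
		off += p->frags[i].size;
	}

	p->head = grown;
	p->headlen = total;
	p->nr_frags = 0;
	return 0;
}
```

In this model, a packet with a 4-byte head and fragments of 3 and 2 bytes ends up with a 9-byte head and no fragments.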
Comment 39 Plamen Petrov 2010-08-24 13:28:49 UTC
On 24.8.2010 at 11:43, Plamen Petrov wrote:
> On 24.8.2010 at 08:01, Eric Dumazet wrote:
>> On Tuesday, 24 August 2010 at 07:51 +0300, Plamen Petrov wrote:
>>> So far - not so good. :(
...
>>
> Here is what I've got while running the kernel with the patch
> I attached earlier plus the above one, after turning GRO on for
> my tg3 nic:
>
> [picture 10]
> http://picpaste.com/37e37f9ff9504e3a003f49092f9b1be6.jpg
>
> [picture 11]
> http://picpaste.com/5663ca7c5041c0ed7a1f3e6a9aa17a9e.jpg
>
> [picture 12]
> http://picpaste.com/17f2ecaa409a1ebbf835a2e0519d3099.jpg
>
> Now moving on to the second proposed patch from Jarek.
>
> Thanks,
> Plamen

The current status: if I enable GRO on the tg3, the kernel oopses.
It just takes a different amount of time to trigger: anywhere from
30 seconds to 30 minutes.

The oopses look the same, and here are the latest:

[picture 13]
http://picpaste.com/c8dbda8f5c15d9ce3e050dd7f245f5d0.jpg

[picture 14]
http://picpaste.com/646cca586b704c5b72d3cf9fa54c7344.jpg

I was wondering which debug options could help us track this down?

Thanks,
Plamen
Comment 40 Eric Dumazet 2010-08-24 15:09:35 UTC
On Tuesday, 24 August 2010 at 16:27 +0300, Plamen Petrov wrote:

> The current status: if I enable GRO on the tg3 - the kernel oopses.
> It just takes a different amount of time to trigger: somewhere from
> 30 seconds to 30 minutes.
> 
> The oopses looks the same, and here are the latest:
> 
> [picture 13]
> http://picpaste.com/c8dbda8f5c15d9ce3e050dd7f245f5d0.jpg
> 
> [picture 14]
> http://picpaste.com/646cca586b704c5b72d3cf9fa54c7344.jpg
> 
> I was wondering which debug options could help us track this down?
> 

Thanks, here is an updated patch (against linux-2.6)

diff --git a/net/core/dev.c b/net/core/dev.c
index 3721fbb..77c8eb7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1935,6 +1935,32 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
 					      illegal_highdma(dev, skb))));
 }
 
+int skb_csum_start_bug(const struct sk_buff *skb, int pos)
+{
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		long csstart;
+
+		csstart = skb->csum_start - skb_headroom(skb);
+		if (WARN_ON(csstart > skb_headlen(skb))) {
+			int i;
+
+			pr_err("%d: csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
+				   pos, skb->csum_start, skb->csum_offset, skb_headroom(skb),
+				   skb_headlen(skb), skb->len);
+			pr_err("nr_frags=%u gso_size=%u ",
+					skb_shinfo(skb)->nr_frags,
+					skb_shinfo(skb)->gso_size);
+			for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+				pr_err("frag_size=%u ", skb_shinfo(skb)->frags[i].size);
+			}
+			pr_err("\n");
+			return 1;
+		}
+	}
+	return 0;
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1959,11 +1985,15 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				goto out_kfree_skb;
 			if (skb->next)
 				goto gso;
+			if (skb_csum_start_bug(skb, 10))
+				goto out_kfree_skb;
 		} else {
 			if (skb_needs_linearize(skb, dev) &&
 			    __skb_linearize(skb))
 				goto out_kfree_skb;
 
+			if (skb_csum_start_bug(skb, 20))
+				goto out_kfree_skb;
 			/* If packet is not checksummed and device does not
 			 * support checksumming for this protocol, complete
 			 * checksumming here.
@@ -1974,10 +2004,16 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				if (!dev_can_checksum(dev, skb) &&
 				     skb_checksum_help(skb))
 					goto out_kfree_skb;
+				if (skb_csum_start_bug(skb, 30))
+					goto out_kfree_skb;
 			}
 		}
 
-		rc = ops->ndo_start_xmit(skb, dev);
+		if (skb_csum_start_bug(skb, 40)) {
+			kfree_skb(skb);
+			rc = NETDEV_TX_OK;
+		} else
+			rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
 		return rc;
@@ -1997,7 +2033,12 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
-		rc = ops->ndo_start_xmit(nskb, dev);
+		if (skb_csum_start_bug(skb, 50)) {
+			kfree_skb(skb);
+			rc = NETDEV_TX_OK;
+		} else
+			rc = ops->ndo_start_xmit(nskb, dev);
+
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
 				goto out_kfree_gso_skb;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3a2513f..3d54a1b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1824,13 +1824,15 @@ void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
 {
 	__wsum csum;
 	long csstart;
+	extern int skb_csum_start_bug(const struct sk_buff *skb, int pos);
 
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
 		csstart = skb->csum_start - skb_headroom(skb);
 	else
 		csstart = skb_headlen(skb);
 
-	BUG_ON(csstart > skb_headlen(skb));
+	if (skb_csum_start_bug(skb, 100))
+		return;
 
 	skb_copy_from_linear_data(skb, to, csstart);
Comment 41 Plamen Petrov 2010-08-24 18:07:46 UTC
Eric Dumazet wrote:

> On Tuesday, 24 August 2010 at 16:27 +0300, Plamen Petrov wrote:
> 
>> The current status: if I enable GRO on the tg3 - the kernel oopses.
>> It just takes a different amount of time to trigger: somewhere from
>> 30 seconds to 30 minutes. 
>> 
>> The oopses look the same, and here are the latest:
>> 
>> [picture 13]
>> http://picpaste.com/c8dbda8f5c15d9ce3e050dd7f245f5d0.jpg 
>> 
>> [picture 14]
>> http://picpaste.com/646cca586b704c5b72d3cf9fa54c7344.jpg 
>> 
>> I was wondering which debug options could help us track this down? 
>> 
> 
> Thanks, here is an updated patch (against linux-2.6) 
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3721fbb..77c8eb7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1935,6 +1935,32 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
>                                             illegal_highdma(dev, skb))));
>  }
>  
> +int skb_csum_start_bug(const struct sk_buff *skb, int pos)
> +{
> +
> +     if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +             long csstart;
> +
> +             csstart = skb->csum_start - skb_headroom(skb);
> +             if (WARN_ON(csstart > skb_headlen(skb))) {
> +                     int i;
> +
> +                     pr_err("%d: csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
> +                                pos, skb->csum_start, skb->csum_offset, skb_headroom(skb),
> +                                skb_headlen(skb), skb->len);
> +                     pr_err("nr_frags=%u gso_size=%u ",
> +                                     skb_shinfo(skb)->nr_frags,
> +                                     skb_shinfo(skb)->gso_size);
> +                     for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> +                             pr_err("frag_size=%u ", skb_shinfo(skb)->frags[i].size);
> +                     }
> +                     pr_err("\n");
> +                     return 1;
> +             }
> +     }
> +     return 0;
> +}
> +
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>                       struct netdev_queue *txq)
>  {
> @@ -1959,11 +1985,15 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>                               goto out_kfree_skb;
>                       if (skb->next)
>                               goto gso;
> +                     if (skb_csum_start_bug(skb, 10))
> +                             goto out_kfree_skb;
>               } else {
>                       if (skb_needs_linearize(skb, dev) &&
>                           __skb_linearize(skb))
>                               goto out_kfree_skb;
>  
> +                     if (skb_csum_start_bug(skb, 20))
> +                             goto out_kfree_skb;
>                       /* If packet is not checksummed and device does not
>                        * support checksumming for this protocol, complete
>                        * checksumming here.
> @@ -1974,10 +2004,16 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>                               if (!dev_can_checksum(dev, skb) &&
>                                    skb_checksum_help(skb))
>                                       goto out_kfree_skb;
> +                             if (skb_csum_start_bug(skb, 30))
> +                                     goto out_kfree_skb;
>                       }
>               }
>  
> -             rc = ops->ndo_start_xmit(skb, dev);
> +             if (skb_csum_start_bug(skb, 40)) {
> +                     kfree_skb(skb);
> +                     rc = NETDEV_TX_OK;
> +             } else
> +                     rc = ops->ndo_start_xmit(skb, dev);
>               if (rc == NETDEV_TX_OK)
>                       txq_trans_update(txq);
>               return rc;
> @@ -1997,7 +2033,12 @@ gso:
>               if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>                       skb_dst_drop(nskb);
>  
> -             rc = ops->ndo_start_xmit(nskb, dev);
> +             if (skb_csum_start_bug(skb, 50)) {
> +                     kfree_skb(skb);
> +                     rc = NETDEV_TX_OK;
> +             } else
> +                     rc = ops->ndo_start_xmit(nskb, dev);
> +
>               if (unlikely(rc != NETDEV_TX_OK)) {
>                       if (rc & ~NETDEV_TX_MASK)
>                               goto out_kfree_gso_skb;
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3a2513f..3d54a1b 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1824,13 +1824,15 @@ void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
>  {
>       __wsum csum;
>       long csstart;
> +     extern int skb_csum_start_bug(const struct sk_buff *skb, int pos);
>  
>       if (skb->ip_summed == CHECKSUM_PARTIAL)
>               csstart = skb->csum_start - skb_headroom(skb);
>       else
>               csstart = skb_headlen(skb);
>  
> -     BUG_ON(csstart > skb_headlen(skb));
> +     if (skb_csum_start_bug(skb, 100))
> +             return;
>  
>       skb_copy_from_linear_data(skb, to, csstart);
>   
> 
> 

Above patch applied, and happy to report the machine now spits data
in the logs instead of oopsing. Here is what we have now: 

[   10.721802] Ending clean XFS mount for filesystem: md12
[   11.669013] IPv4 FIB: Using LC-trie version 0.409
[   11.669101] eth2: link up, 100Mbps, full-duplex, lpa 0x45E1
[   11.746792] eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
[   11.757230] tg3 0000:04:00.0: irq 44 for MSI/MSI-X
[   11.810133] ADDRCONF(NETDEV_UP): eth1: link is not ready
[   11.957523] sixxs_t: Disabled Privacy Extensions
[   14.843711] tg3 0000:04:00.0: eth1: Link is up at 1000 Mbps, full duplex
[   14.843717] tg3 0000:04:00.0: eth1: Flow control is on for TX and on for RX
[   14.843753] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[   15.854861] tun0: Disabled Privacy Extensions
[  699.375620] ------------[ cut here ]------------
[  699.475648] WARNING: at net/core/dev.c:1945 skb_csum_start_bug+0x46/0xf2()
[  699.575667] Hardware name: PowerEdge SC440
[  699.675688] Pid: 2963, comm: FahCore_78.exe Not tainted 2.6.36-rc2-FS-00103-g2d6fa25 #1
[  699.775706] Call Trace:
[  699.975744]  [<c102d86c>] ? warn_slowpath_common+0x67/0x8c
[  700.175779]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[  700.375813]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[  700.575848]  [<c102d8ac>] ? warn_slowpath_null+0x1b/0x1f
[  700.775882]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[  700.975918]  [<c1024569>] ? __wake_up_sync_key+0x3c/0x52
[  701.175953]  [<c12a7bab>] ? skb_copy_and_csum_dev+0x2a/0xaf
[  701.375989]  [<c122483b>] ? rtl8139_start_xmit+0x4a/0x13a
[  701.576026]  [<c12ae29e>] ? dev_hard_start_xmit+0x220/0x4cc
[  701.776062]  [<c12bfbed>] ? sch_direct_xmit+0xac/0x174
[  701.976096]  [<c12c3f69>] ? nf_iterate+0x69/0x7c
[  702.176131]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[  702.376165]  [<c12b00eb>] ? dev_queue_xmit+0xc7/0x355
[  702.576198]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[  702.776232]  [<c12e8a92>] ? ip_finish_output+0x11c/0x2b6
[  702.976266]  [<c12e8f11>] ? ip_output+0xa4/0xc3
[  703.176299]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[  703.376332]  [<c12e4ff9>] ? ip_forward_finish+0x39/0x44
[  703.576365]  [<c12e3a38>] ? ip_rcv_finish+0xe8/0x39f
[  703.776398]  [<c12ad0fd>] ? __netif_receive_skb+0x237/0x2b3
[  703.976431]  [<c12ad70b>] ? netif_receive_skb+0x5f/0x64
[  704.176464]  [<c12ad75e>] ? napi_gro_complete+0x4e/0x94
[  704.376497]  [<c12ada9a>] ? dev_gro_receive+0x158/0x1f5
[  704.576530]  [<c12adc84>] ? napi_gro_receive+0x16/0x1f
[  704.776563]  [<c1217efb>] ? tg3_poll_work+0x5bc/0xbfb
[  704.976597]  [<c1006e50>] ? nommu_sync_single_for_device+0x0/0x1
[  705.176631]  [<c121ce68>] ? tg3_poll+0x43/0x194
[  705.376665]  [<c12ad8b3>] ? net_rx_action+0xcc/0x15b
[  705.576699]  [<c1031cad>] ? __do_softirq+0x7f/0xfa
[  705.776733]  [<c1053dc9>] ? handle_IRQ_event+0x48/0xa6
[  705.976767]  [<c105689b>] ? move_native_irq+0x9/0x3e
[  706.176799]  [<c1031d4f>] ? do_softirq+0x27/0x2a
[  706.376832]  [<c1031e9d>] ? irq_exit+0x63/0x68
[  706.576864]  [<c1003dda>] ? do_IRQ+0x44/0xa1
[  706.776897]  [<c1031e6b>] ? irq_exit+0x31/0x68
[  706.976930]  [<c101654e>] ? smp_apic_timer_interrupt+0x53/0x83
[  707.176963]  [<c1002d29>] ? common_interrupt+0x29/0x30
[  707.276981] ---[ end trace 75e4f8534893c910 ]---
[  707.376998] 100: csum_start 306, offset 16, headroom 390, headlen 70, len 70
[  707.477015] nr_frags=0 gso_size=0
[  707.577031]
[ 1012.931455] ------------[ cut here ]------------
[ 1013.031482] WARNING: at net/core/dev.c:1945 skb_csum_start_bug+0x46/0xf2()
[ 1013.131501] Hardware name: PowerEdge SC440
[ 1013.231521] Pid: 2963, comm: FahCore_78.exe Tainted: G        W   2.6.36-rc2-FS-00103-g2d6fa25 #1
[ 1013.331538] Call Trace:
[ 1013.531575]  [<c102d86c>] ? warn_slowpath_common+0x67/0x8c
[ 1013.731608]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[ 1013.931641]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[ 1014.131675]  [<c102d8ac>] ? warn_slowpath_null+0x1b/0x1f
[ 1014.331708]  [<c12abc76>] ? skb_csum_start_bug+0x46/0xf2
[ 1014.531742]  [<c1024569>] ? __wake_up_sync_key+0x3c/0x52
[ 1014.731775]  [<c12a7bab>] ? skb_copy_and_csum_dev+0x2a/0xaf
[ 1014.931809]  [<c122483b>] ? rtl8139_start_xmit+0x4a/0x13a
[ 1015.131841]  [<c12ae29e>] ? dev_hard_start_xmit+0x220/0x4cc
[ 1015.331875]  [<c12bfbed>] ? sch_direct_xmit+0xac/0x174
[ 1015.531908]  [<c12c3f69>] ? nf_iterate+0x69/0x7c
[ 1015.731941]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[ 1015.931973]  [<c12b00eb>] ? dev_queue_xmit+0xc7/0x355
[ 1016.132007]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[ 1016.332039]  [<c12e8a92>] ? ip_finish_output+0x11c/0x2b6
[ 1016.532071]  [<c12e8f11>] ? ip_output+0xa4/0xc3
[ 1016.732103]  [<c12e8976>] ? ip_finish_output+0x0/0x2b6
[ 1016.932135]  [<c12e4ff9>] ? ip_forward_finish+0x39/0x44
[ 1017.132166]  [<c12e3a38>] ? ip_rcv_finish+0xe8/0x39f
[ 1017.332198]  [<c12ad0fd>] ? __netif_receive_skb+0x237/0x2b3
[ 1017.532230]  [<c12ad70b>] ? netif_receive_skb+0x5f/0x64
[ 1017.732262]  [<c12ad75e>] ? napi_gro_complete+0x4e/0x94
[ 1017.932294]  [<c12ada9a>] ? dev_gro_receive+0x158/0x1f5
[ 1018.132326]  [<c12adc84>] ? napi_gro_receive+0x16/0x1f
[ 1018.332358]  [<c1217efb>] ? tg3_poll_work+0x5bc/0xbfb
[ 1018.532392]  [<c1006e50>] ? nommu_sync_single_for_device+0x0/0x1
[ 1018.732424]  [<c121ce68>] ? tg3_poll+0x43/0x194
[ 1018.932456]  [<c12ad8b3>] ? net_rx_action+0xcc/0x15b
[ 1019.132489]  [<c1031cad>] ? __do_softirq+0x7f/0xfa
[ 1019.332522]  [<c1053dc9>] ? handle_IRQ_event+0x48/0xa6
[ 1019.532554]  [<c105689b>] ? move_native_irq+0x9/0x3e
[ 1019.732586]  [<c1031d4f>] ? do_softirq+0x27/0x2a
[ 1019.932617]  [<c1031e9d>] ? irq_exit+0x63/0x68
[ 1020.132648]  [<c1003dda>] ? do_IRQ+0x44/0xa1
[ 1020.332680]  [<c1031e6b>] ? irq_exit+0x31/0x68
[ 1020.532713]  [<c101654e>] ? smp_apic_timer_interrupt+0x53/0x83
[ 1020.732745]  [<c1002d29>] ? common_interrupt+0x29/0x30
[ 1020.932777]  [<c1390000>] ? quirk_io_region+0x1c/0x91
[ 1021.032794] ---[ end trace 75e4f8534893c911 ]---
[ 1021.132812] 100: csum_start 306, offset 16, headroom 390, headlen 153, len 153
[ 1021.232828] nr_frags=0 gso_size=0
[ 1021.332844] 
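
A note on reading the two reports above: in both, csum_start (306) is
smaller than the headroom (390), so the difference that
skb_csum_start_bug() computes is negative. The sketch below redoes that
arithmetic in userspace with fixed-width types. It assumes a 32-bit
kernel (the c1xxxxxx addresses in the trace suggest i386), where long and
unsigned int have the same width and the comparison against
skb_headlen() is therefore carried out unsigned. The function names are
made up for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Signed view of "csum_start - skb_headroom(skb)". */
static int64_t csstart_signed(uint16_t csum_start, uint32_t headroom)
{
	return (int64_t)csum_start - (int64_t)headroom;
}

/*
 * Models "csstart > skb_headlen(skb)" as a 32-bit unsigned comparison
 * (assumption: 32-bit long, so a negative csstart wraps to a huge value).
 */
static int csum_check_fires(uint16_t csum_start, uint32_t headroom,
			    uint32_t headlen)
{
	uint32_t wrapped = (uint32_t)csstart_signed(csum_start, headroom);

	return wrapped > headlen;
}
```

With the logged values, csstart_signed(306, 390) is -84, and
csum_check_fires(306, 390, 70) returns 1, so the WARN_ON triggers even
though -84 is numerically smaller than the head length.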

Now what? 

Thanks a lot, Eric and Jarek! 

Plamen 

Comment 42 Jarek Poplawski 2010-08-24 18:23:18 UTC
On Tue, Aug 24, 2010 at 08:25:16PM +0300, Plamen Petrov wrote:
> Eric Dumazet wrote:
...
> 
> Now what?

Good question. IMHO it looks like skbs are being overwritten, so better to
turn off GRO until the next patch.

> 
> Thanks a lot, Eric and Jarek!

Thanks a lot, Plamen and Eric, too!

Jarek P.
Comment 43 Eric Dumazet 2010-08-24 19:20:41 UTC
On Tuesday, 24 August 2010 at 20:25 +0300, Plamen Petrov wrote:
> Above patch applied, and happy to report the machine now spits data
> in the logs instead of oopsing. Here is what we have now: 
> [  707.276981] ---[ end trace 75e4f8534893c910 ]---
> [  707.376998] 100: csum_start 306, offset 16, headroom 390, headlen 70, len 70
> [  707.477015] nr_frags=0 gso_size=0
> [  707.577031]
> [ 1021.032794] ---[ end trace 75e4f8534893c911 ]---
> [ 1021.132812] 100: csum_start 306, offset 16, headroom 390, headlen 153, len 153
> [ 1021.232828] nr_frags=0 gso_size=0
> [ 1021.332844]
> 

Thanks !

csum_offset = 16.

so it's offsetof(struct tcphdr, check)
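
The 16 can be double-checked in userspace. This is only a sketch: the
structs below are hypothetical stand-ins that mirror the fixed TCP and
UDP header layouts (field names follow the kernel's), not the kernel's
own struct tcphdr or struct udphdr. The UDP mirror is included only for
comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed 20-byte TCP header; all multi-byte fields are naturally aligned. */
struct tcp_hdr_mirror {
	uint16_t source;
	uint16_t dest;
	uint32_t seq;
	uint32_t ack_seq;
	uint16_t flags;		/* data offset, reserved bits, flags */
	uint16_t window;
	uint16_t check;
	uint16_t urg_ptr;
};

/* 8-byte UDP header. */
struct udp_hdr_mirror {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

static size_t tcp_check_offset(void)
{
	return offsetof(struct tcp_hdr_mirror, check);
}

static size_t udp_check_offset(void)
{
	return offsetof(struct udp_hdr_mirror, check);
}
```

tcp_check_offset() gives 16 and udp_check_offset() gives 6, so an offset
of 16 is what one would expect for a TCP packet.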

maybe a bug in net/ipv4/netfilter/nf_nat_helper.c ?

We should trace all spots where we set csum_start/csum_offset

Or/And trace the skb content.

Please add a :

print_hex_dump(KERN_ERR, "skb data:", DUMP_PREFIX_OFFSET, 
               16, 1, skb->head, skb_end_pointer(skb)-skb->head,true);


call in skb_csum_start_bug(), right after the pr_err("\n") and before
the "return 1;"


int skb_csum_start_bug(const struct sk_buff *skb, int pos)
{

        if (skb->ip_summed == CHECKSUM_PARTIAL) {
                long csstart;

                csstart = skb->csum_start - skb_headroom(skb);
                if (WARN_ON(csstart > skb_headlen(skb))) {
                        int i;

                        pr_err("%d: csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
                                   pos, skb->csum_start, skb->csum_offset, skb_headroom(skb),
                                   skb_headlen(skb), skb->len);
                        pr_err("nr_frags=%u gso_size=%u ",
                                        skb_shinfo(skb)->nr_frags,
                                        skb_shinfo(skb)->gso_size);
                        for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                                pr_err("frag_size=%u ", skb_shinfo(skb)->frags[i].size);
                        }
                        pr_err("\n");
                        print_hex_dump(KERN_ERR, "skb data:", DUMP_PREFIX_OFFSET,
                                16, 1, skb->head, skb_end_pointer(skb) - skb->head, true);
                        return 1;
                }
        }
        return 0;
}
Comment 44 Eric Dumazet 2010-08-24 21:52:33 UTC
On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> > Plamen Petrov wrote, On 20.08.2010 12:53:
> > > So, I guess its David and Herbert's turn?...
> > 
> > If you're bored in the meantime I'd suggest to do check the realtek
> > driver eg:
> > - for locking with the patch below,
> > - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> 
> After rethinking, it's almost impossible this patch could change
> anything here, so don't bother, but consider mainly the second
> proposal.
> 
> Jarek P.

Indeed ;)

It's true that not many NICs use the skb_copy_and_csum_dev() helper;
maybe this one must be updated somehow?
Comment 45 Eric Dumazet 2010-08-24 21:55:24 UTC
On Monday, 23 August 2010 at 12:47 +0000, Jarek Poplawski wrote:
> On Mon, Aug 23, 2010 at 02:47:23PM +0300, Plamen Petrov wrote:
> > On 21.8.2010 at 11:07, Jarek Poplawski wrote:
> >> On Sat, Aug 21, 2010 at 09:50:58AM +0200, Eric Dumazet wrote:
> >>> On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> >>>> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> >>>>> Plamen Petrov wrote, On 20.08.2010 12:53:
> >>>>>> So, I guess its David and Herbert's turn?...
> >>>>>
> >>>>> If you're bored in the meantime I'd suggest to do check the realtek
> >>>>> driver eg:
> >>>>> - for locking with the patch below,
> >>>>> - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> ...
> > Yeah, 3 days and counting, right until I decided to try the freshly
> > announced 2.6.36-rc2.
> >
> > So I upgraded the kernel, but left the scripts that turn GRO off for
> > the tg3 card still run at system startup. This way the system ran for
> > 2 and a half hours, when I decided its time to try turning GRO on.
> >
> > I first tried to turn GRO on for the tg3 nic, and the system oopsed
> > immediately (if the panic screen is necessary - please, ask for it).
> >
> > After the system came back, I tried turning GRO on for the 2 RealTek
> > 8139 nics, too, but ethtool only accepted turning GRO off.
> >
> > And unfortunately, I can't test if other nics will fail the same way
> > as the motherboard integrated tg3 I have does, so for now, this is
> > only a tg3 + GRO on problem; I don't have any other hardware to test
> > with available.
> 
> A little misunderstanding: I was interested in turning off some
> features on realteks to change the packet path from tg3 with gro
> to realtek without gro and without tx-checksumming etc.
> 
> But maybe you could try the patch below instead (so the patched
> kernel, tg3 with gro on, and realteks without any change).
> 
> Thanks,
> Jarek P.
> 
> --- (for debugging only)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3721fbb..51823cd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1935,6 +1935,23 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
>                                             illegal_highdma(dev, skb))));
>  }
>  
> +static int skb_csum_start_bug(struct sk_buff *skb)
> +{
> +
> +     if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +             long csstart;
> +
> +             csstart = skb->csum_start - skb_headroom(skb);
> +             if (WARN_ON(csstart > skb_headlen(skb))) {
> +                     pr_warning("csum_start %d, headroom %d, headlen %d\n",
> +                                skb->csum_start, skb_headroom(skb),
> +                                skb_headlen(skb));

I was about to suggest a similar patch ;)

Also print skb->csum_offset and skb->len if possible:

pr_err("csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
       skb->csum_start,
       skb->csum_offset,
       skb_headroom(skb),
       skb_headlen(skb),
       skb->len);


> +                     return 1;
> +             }
> +     }
> +     return 0;
> +}
> +
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>                       struct netdev_queue *txq)
>  {
> @@ -1955,11 +1972,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>               skb_orphan_try(skb);
>  
>               if (netif_needs_gso(dev, skb)) {
> +                     skb_csum_start_bug(skb);
>                       if (unlikely(dev_gso_segment(skb)))
>                               goto out_kfree_skb;
>                       if (skb->next)
>                               goto gso;
>               } else {
> +                     skb_csum_start_bug(skb);
>                       if (skb_needs_linearize(skb, dev) &&
>                           __skb_linearize(skb))
>                               goto out_kfree_skb;
> @@ -1997,7 +2016,12 @@ gso:
>               if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>                       skb_dst_drop(nskb);
>  
> -             rc = ops->ndo_start_xmit(nskb, dev);
> +             if (skb_csum_start_bug(skb)) {
> +                     kfree_skb(skb);
> +                     rc = NETDEV_TX_OK;
> +             } else
> +                     rc = ops->ndo_start_xmit(nskb, dev);
> +
>               if (unlikely(rc != NETDEV_TX_OK)) {
>                       if (rc & ~NETDEV_TX_MASK)
>                               goto out_kfree_gso_skb;
Comment 46 Plamen Petrov 2010-08-25 07:06:21 UTC
On 24.8.2010 at 22:19, Eric Dumazet wrote:
> On Tuesday, 24 August 2010 at 20:25 +0300, Plamen Petrov wrote:
>> Above patch applied, and happy to report the machine now spits data
>> in the logs instead of oopsing. Here is what we have now:
>> [  707.276981] ---[ end trace 75e4f8534893c910 ]---
>> [  707.376998] 100: csum_start 306, offset 16, headroom 390, headlen 70, len 70
>> [  707.477015] nr_frags=0 gso_size=0
>> [  707.577031]
>> [ 1021.032794] ---[ end trace 75e4f8534893c911 ]---
>> [ 1021.132812] 100: csum_start 306, offset 16, headroom 390, headlen 153, len 153
>> [ 1021.232828] nr_frags=0 gso_size=0
>> [ 1021.332844]
>>
>
> Thanks !
>
> csum_offset = 16.
>
> so its offsetof(struct tcphdr, check)
>
> maybe a bug in net/ipv4/netfilter/nf_nat_helper.c ?
>
> We should trace all spots where we set csum_start/csum_offset
>
> Or/And trace the skb content.
>
> Please add a :
>
> print_hex_dump(KERN_ERR, "skb data:", DUMP_PREFIX_OFFSET,
>                 16, 1, skb->head, skb_end_pointer(skb)-skb->head,true);
>

Done! See the results below.

>
> call in skb_csum_start_bug(), right after the pr_err("\n") and before
> the "return 1;"
>
>
> int skb_csum_start_bug(const struct sk_buff *skb, int pos)
> {
>
>          if (skb->ip_summed == CHECKSUM_PARTIAL) {
>                  long csstart;
>
>                  csstart = skb->csum_start - skb_headroom(skb);
>                  if (WARN_ON(csstart > skb_headlen(skb))) {
>                          int i;
>
>                          pr_err("%d: csum_start %u, offset %u, headroom %d, headlen %d, len %d\n",
>                                     pos, skb->csum_start, skb->csum_offset, skb_headroom(skb),
>                                     skb_headlen(skb), skb->len);
>                          pr_err("nr_frags=%u gso_size=%u ",
>                                          skb_shinfo(skb)->nr_frags,
>                                          skb_shinfo(skb)->gso_size);
>                          for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
>                                  pr_err("frag_size=%u ", skb_shinfo(skb)->frags[i].size);
>                          }
>                          pr_err("\n");
>                          print_hex_dump(KERN_ERR, "skb data:", DUMP_PREFIX_OFFSET,
>                                  16, 1, skb->head, skb_end_pointer(skb) - skb->head, true);
>                          return 1;
>                  }
>          }
>          return 0;
> }
>
>

I see you liked the previous one, here's some more. ;)

This one is based on Linus' latest tree,
hence the ID "2.6.36-rc2-FS-00210-geedff42".

> [   10.510191] XFS mounting filesystem md12
> [   10.693540] Ending clean XFS mount for filesystem: md12
> [   11.592737] IPv4 FIB: Using LC-trie version 0.409
> [   11.592827] eth2: link up, 100Mbps, full-duplex, lpa 0x45E1
> [   11.677311] eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
> [   11.687604] tg3 0000:04:00.0: irq 44 for MSI/MSI-X
> [   11.719166] ADDRCONF(NETDEV_UP): eth1: link is not ready
> [   11.845858] sixxs_t: Disabled Privacy Extensions
> [   14.815688] tg3 0000:04:00.0: eth1: Link is up at 1000 Mbps, full duplex
> [   14.815693] tg3 0000:04:00.0: eth1: Flow control is on for TX and on for RX
> [   14.815740] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> [   15.470040] tun0: Disabled Privacy Extensions
> [  310.470021] ------------[ cut here ]------------
> [  310.570041] WARNING: at net/core/dev.c:1945 skb_csum_start_bug+0x46/0x133()
> [  310.670050] Hardware name: PowerEdge SC440
> [  310.770060] Pid: 2960, comm: FahCore_78.exe Not tainted 2.6.36-rc2-FS-00210-geedff42 #1
> [  310.870069] Call Trace:
> [  311.070087]  [<c102d87c>] ? warn_slowpath_common+0x67/0x8c
> [  311.270103]  [<c12adca9>] ? skb_csum_start_bug+0x46/0x133
> [  311.470126]  [<c12adca9>] ? skb_csum_start_bug+0x46/0x133
> [  311.670144]  [<c102d8bc>] ? warn_slowpath_null+0x1b/0x1f
> [  311.870167]  [<c12adca9>] ? skb_csum_start_bug+0x46/0x133
> [  312.070191]  [<c102456f>] ? __wake_up_sync_key+0x3c/0x52
> [  312.270207]  [<c12a7bbb>] ? skb_copy_and_csum_dev+0x2a/0xaf
> [  312.470224]  [<c122484b>] ? rtl8139_start_xmit+0x4a/0x13a
> [  312.670238]  [<c12ae2ee>] ? dev_hard_start_xmit+0x220/0x4cc
> [  312.870253]  [<c12bfc3d>] ? sch_direct_xmit+0xac/0x174
> [  313.070268]  [<c12c3fb9>] ? nf_iterate+0x69/0x7c
> [  313.270283]  [<c12e89c6>] ? ip_finish_output+0x0/0x2b6
> [  313.470297]  [<c12b013c>] ? dev_queue_xmit+0xc7/0x354
> [  313.670312]  [<c12e89c6>] ? ip_finish_output+0x0/0x2b6
> [  313.870326]  [<c12e8ae2>] ? ip_finish_output+0x11c/0x2b6
> [  314.070341]  [<c12e8f61>] ? ip_output+0xa4/0xc3
> [  314.270355]  [<c12e89c6>] ? ip_finish_output+0x0/0x2b6
> [  314.470370]  [<c12e5049>] ? ip_forward_finish+0x39/0x44
> [  314.670385]  [<c12e3a88>] ? ip_rcv_finish+0xe8/0x39f
> [  314.870399]  [<c12ad01d>] ? __netif_receive_skb+0x237/0x2b3
> [  315.070413]  [<c12ad62b>] ? netif_receive_skb+0x5f/0x64
> [  315.270427]  [<c12ad67e>] ? napi_gro_complete+0x4e/0x94
> [  315.470440]  [<c12ad9ba>] ? dev_gro_receive+0x158/0x1f5
> [  315.670454]  [<c12adba4>] ? napi_gro_receive+0x16/0x1f
> [  315.870468]  [<c1217f0b>] ? tg3_poll_work+0x5bc/0xbfb
> [  316.070483]  [<c1006e50>] ? nommu_sync_single_for_device+0x0/0x1
> [  316.270498]  [<c121ce78>] ? tg3_poll+0x43/0x194
> [  316.470512]  [<c12ad7d3>] ? net_rx_action+0xcc/0x15b
> [  316.670526]  [<c1031cbd>] ? __do_softirq+0x7f/0xfa
> [  316.870541]  [<c1053dd9>] ? handle_IRQ_event+0x48/0xa6
> [  317.070555]  [<c10568ab>] ? move_native_irq+0x9/0x3e
> [  317.270569]  [<c1031d5f>] ? do_softirq+0x27/0x2a
> [  317.470582]  [<c1031ead>] ? irq_exit+0x63/0x68
> [  317.670596]  [<c1003dda>] ? do_IRQ+0x44/0xa1
> [  317.870610]  [<c10035c3>] ? do_device_not_available+0x0/0x49
> [  318.070624]  [<c1002d29>] ? common_interrupt+0x29/0x30
> [  318.270639]  [<c1390000>] ? quirk_ati_exploding_mce+0x46/0x7a
> [  318.370647] ---[ end trace df8deff2ad2a9760 ]---
> [  318.470656] 100: csum_start 306, offset 16, headroom 390, headlen 151, len 151
> [  318.570664] nr_frags=0 gso_size=0
> [  318.670671]
> [  318.770680] skb data:00000000: 00 a4 27 cc 17 5e ef ec 00 1a a0 38 8a 1b 08 00  ..'..^.....8....
> [  318.870688] skb data:00000010: 45 00 00 b7 00 00 40 00 40 11 a4 62 c0 a8 0a 01  E.....@.@..b....
> [  318.970697] skb data:00000020: c0 a8 0a 82 00 35 f2 69 00 a3 96 88 d3 4e 81 80  .....5.i.....N..
> [  319.070706] skb data:00000030: 00 01 00 04 00 00 00 00 06 61 6b 61 6d 61 69 0d  .........akamai.
> [  319.170714] skb data:00000040: 73 6d 61 72 74 61 64 73 65 72 76 65 72 03 63 6f  smartadserver.co
> [  319.270723] skb data:00000050: 6d 00 00 01 00 01 c0 0c 00 05 00 01 00 00 81 0a  m...............
> [  319.370731] skb data:00000060: 00 28 06 61 6b 61 6d 61 69 0d 73 6d 61 72 74 61  .(.akamai.smarta
> [  319.470739] skb data:00000070: 64 73 65 72 76 65 72 03 63 6f 6d 09 65 64 67 65  dserver.com.edge
> [  319.570747] skb data:00000080: 73 75 69 74 65 03 6e 65 74 00 c0 36 00 00 00 00  suite.net..6....
> [  319.670756] skb data:00000090: 00 00 00 00 00 00 00 00 00 00 08 00 45 00 00 34  ............E..4
> [  319.770764] skb data:000000a0: 0d 39 40 00 40 06 2f 89 7f 00 00 01 7f 00 00 01  .9@.@./.........
> [  319.870772] skb data:000000b0: be 75 19 4e 14 43 0f 38 14 5d 49 65 00 00 00 00  .u.N.C.8.]Ie....
> [  319.970780] skb data:000000c0: 00 00 00 00 00 00 00 00 00 00 08 00 45 00 00 57  ............E..W
> [  320.070789] skb data:000000d0: 4b 40 40 00 40 06 f1 5e 7f 00 00 01 7f 00 00 01  K@@.@..^........
> [  320.170797] skb data:000000e0: 19 4e be 75 14 5d 49 65 14 43 0f 38 80 18 04 00  .N.u.]Ie.C.8....
> [  320.270805] skb data:000000f0: fe 4b 00 00 01 01 08 0a 00 00 02 03 00 00 02 03  .K..............
> [  320.370813] skb data:00000100: 00 a8 27 cc 00 00 00 00 00 00 00 00 00 00 00 00  ..'.............
> [  320.470821] skb data:00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  320.570829] skb data:00000120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  320.670837] skb data:00000130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  320.770845] skb data:00000140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  320.870853] skb data:00000150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  320.970862] skb data:00000160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  321.070870] skb data:00000170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  321.170878] skb data:00000180: 00 0a e6 ac 07 db 00 0e 2e 5c 27 b2 00 0e 2e 5c  .........\'....\
> [  321.270887] skb data:00000190: 27 ef 08 00 45 00 00 89 09 27 40 00 7f 06 46 74  '...E....'@...Ft
> [  321.370895] skb data:000001a0: c0 a8 01 02 5b 67 8e c2 04 8f 00 50 98 49 d7 bc  ....[g.....P.I..
> [  321.470903] skb data:000001b0: ff 40 e9 4a 50 18 ff ff ac 4f 00 00 33 42 25 32  .@.JP....O..3B%2
> [  321.570911] skb data:000001c0: 34 73 68 25 33 44 33 25 33 42 25 32 34 73 77 25  4sh%3D3%3B%24sw%
> [  321.670919] skb data:000001d0: 33 44 33 3b 20 70 69 64 3d 35 30 32 31 37 34 33  3D3; pid=5021743
> [  321.770927] skb data:000001e0: 34 32 30 33 31 30 32 39 39 37 38 33 0d 0a 43 6f  420310299783..Co
> [  321.870936] skb data:000001f0: 6f 6b 69 65 32 3a 20 24 56 65 72 73 69 6f 6e 3d  okie2: $Version=
> [  321.970944] skb data:00000200: 31 0d 0a 43 6f 6e 6e 65 63 74 69 6f 6e 3a 20 4b  1..Connection: K
> [  322.070952] skb data:00000210: 65 65 70 2d 41 6c 69 76 65 0d 0a 0d 0a 00 00 00  eep-Alive.......
> [  322.170960] skb data:00000220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  322.270968] skb data:00000230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  322.370977] skb data:00000240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  322.470985] skb data:00000250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  322.570993] skb data:00000260: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [  322.671004] skb data:00000270: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

The rest is in the attached file, in case you need to run it through some
debugging app...
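
For anyone who wants to decode the dump by hand: the packet data should
begin at offset 390 (0x186) from skb->head, the reported headroom. The
sketch below copies the four dump rows from 0x180 to 0x1bf and walks the
Ethernet and IPv4 headers. It assumes csum_start is measured from
skb->head (which the subtraction of skb_headroom() in the debug patch
implies) and an untagged Ethernet frame; the helper names are made up for
illustration.

```c
#include <stdint.h>

#define DUMP_BASE 0x180u	/* offset of rows[0] from skb->head */
#define HEADROOM  390u		/* "headroom 390" from the log */
#define ETH_HLEN  14u		/* untagged Ethernet header (assumption) */

/* Rows 0x180..0x1bf, copied from the hex dump above. */
static const uint8_t rows[64] = {
	0x00, 0x0a, 0xe6, 0xac, 0x07, 0xdb, 0x00, 0x0e,
	0x2e, 0x5c, 0x27, 0xb2, 0x00, 0x0e, 0x2e, 0x5c,
	0x27, 0xef, 0x08, 0x00, 0x45, 0x00, 0x00, 0x89,
	0x09, 0x27, 0x40, 0x00, 0x7f, 0x06, 0x46, 0x74,
	0xc0, 0xa8, 0x01, 0x02, 0x5b, 0x67, 0x8e, 0xc2,
	0x04, 0x8f, 0x00, 0x50, 0x98, 0x49, 0xd7, 0xbc,
	0xff, 0x40, 0xe9, 0x4a, 0x50, 0x18, 0xff, 0xff,
	0xac, 0x4f, 0x00, 0x00, 0x33, 0x42, 0x25, 0x32,
};

/* Pointer to the byte at a given offset from skb->head. */
static const uint8_t *at(uint32_t off)
{
	return &rows[off - DUMP_BASE];
}

static uint32_t ip_offset(void)
{
	return HEADROOM + ETH_HLEN;
}

static unsigned int ethertype(void)
{
	const uint8_t *p = at(HEADROOM + 12);

	return ((unsigned int)p[0] << 8) | p[1];
}

static unsigned int ip_protocol(void)
{
	return at(ip_offset())[9];
}

static unsigned int ip_total_len(void)
{
	const uint8_t *p = at(ip_offset() + 2);

	return ((unsigned int)p[0] << 8) | p[1];
}

/* Offset of the transport header from skb->head (IHL is in 32-bit words). */
static uint32_t transport_offset(void)
{
	unsigned int ihl = at(ip_offset())[0] & 0x0f;

	return ip_offset() + ihl * 4;
}
```

Under those assumptions the frame is IPv4 (ethertype 0x0800) carrying TCP
(protocol 6), the TCP header sits at offset 424 from skb->head, and
14 + 137 = 151 matches the logged headlen. The logged csum_start of 306
would then point 84 bytes before the start of the packet data and 118
bytes before the TCP header.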

Thanks,
Plamen
Comment 47 Eric Dumazet 2010-08-25 21:10:27 UTC
On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> > Plamen Petrov wrote, On 20.08.2010 12:53:
> > > So, I guess its David and Herbert's turn?...
> > 
> > If you're bored in the meantime I'd suggest to do check the realtek
> > driver eg:
> > - for locking with the patch below,
> > - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> 
> After rethinking, it's almost impossible this patch could change
> anything here, so don't bother, but consider mainly the second
> proposal.
> 
> Jarek P.

Indeed ;)

It's true that not many NICs use the skb_copy_and_csum_dev() helper;
maybe this one must be updated somehow?
Comment 48 Eric Dumazet 2010-08-25 21:16:59 UTC
On Monday, 23 August 2010 at 12:47 +0000, Jarek Poplawski wrote:
> On Mon, Aug 23, 2010 at 02:47:23PM +0300, Plamen Petrov wrote:
> > On 21.8.2010 at 11:07, Jarek Poplawski wrote:
> >> On Sat, Aug 21, 2010 at 09:50:58AM +0200, Eric Dumazet wrote:
> >>> On Saturday, 21 August 2010 at 09:47 +0200, Jarek Poplawski wrote:
> >>>> On Fri, Aug 20, 2010 at 09:38:35PM +0200, Jarek Poplawski wrote:
> >>>>> Plamen Petrov wrote, On 20.08.2010 12:53:
> >>>>>> So, I guess it's David and Herbert's turn?...
> >>>>>
> >>>>> If you're bored in the meantime, I'd suggest checking the realtek
> >>>>> driver, e.g.:
> >>>>> - for locking with the patch below,
> >>>>> - to turn off with ethtool its tx-checksumming and/or scatter-gather,
> ...
> > Yeah, 3 days and counting, right until I decided to try the freshly
> > announced 2.6.36-rc2.
> >
> > So I upgraded the kernel, but left the scripts that turn GRO off for
> > the tg3 card still run at system startup. This way the system ran for
> > 2 and a half hours, when I decided it's time to try turning GRO on.
> >
> > I first tried to turn GRO on for the tg3 nic, and the system oopsed
> > immediately (if the panic screen is necessary - please, ask for it).
> >
> > After the system came back, I tried turning GRO on for the 2 RealTek
> > 8139 nics, too, but ethtool only accepted turning GRO off.
> >
> > And unfortunately, I can't test if other nics will fail the same way
> > as the motherboard integrated tg3 I have does, so for now, this is
> > only a tg3 + GRO on problem; I don't have any other hardware to test
> > with available.
> 
> A little misunderstanding: I was interested in turning off some
> features on realteks to change the packet path from tg3 with gro
> to realtek without gro and without tx-checksumming etc.
> 
> But maybe you could try the patch below instead (so the patched
> kernel, tg3 with gro on, and realteks without any change).
> 
> Thanks,
> Jarek P.
> 
> --- (for debugging only)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3721fbb..51823cd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1935,6 +1935,23 @@ static inline int skb_needs_linearize(struct sk_buff *skb,
>                                             illegal_highdma(dev, skb))));
>  }
>  
> +static int skb_csum_start_bug(struct sk_buff *skb)
> +{
> +
> +     if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +             long csstart;
> +
> +             csstart = skb->csum_start - skb_headroom(skb);
> +             if (WARN_ON(csstart > skb_headlen(skb))) {
> +                     pr_warning("csum_start %d, headroom %d, headlen %d\n",
> +                                skb->csum_start, skb_headroom(skb),
> +                                skb_headlen(skb));

I was about to suggest a similar patch ;)

Also print skb->csum_offset and skb->len if possible:

pr_err("csum_start %u, offset %u, headroom %u, headlen %u, len %u\n",
       skb->csum_start,
       skb->csum_offset,
       skb_headroom(skb),
       skb_headlen(skb),
       skb->len);


> +                     return 1;
> +             }
> +     }
> +     return 0;
> +}
> +
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>                       struct netdev_queue *txq)
>  {
> @@ -1955,11 +1972,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>               skb_orphan_try(skb);
>  
>               if (netif_needs_gso(dev, skb)) {
> +                     skb_csum_start_bug(skb);
>                       if (unlikely(dev_gso_segment(skb)))
>                               goto out_kfree_skb;
>                       if (skb->next)
>                               goto gso;
>               } else {
> +                     skb_csum_start_bug(skb);
>                       if (skb_needs_linearize(skb, dev) &&
>                           __skb_linearize(skb))
>                               goto out_kfree_skb;
> @@ -1997,7 +2016,12 @@ gso:
>               if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>                       skb_dst_drop(nskb);
>  
> -             rc = ops->ndo_start_xmit(nskb, dev);
> +             if (skb_csum_start_bug(skb)) {
> +                     kfree_skb(skb);
> +                     rc = NETDEV_TX_OK;
> +             } else
> +                     rc = ops->ndo_start_xmit(nskb, dev);
> +
>               if (unlikely(rc != NETDEV_TX_OK)) {
>                       if (rc & ~NETDEV_TX_MASK)
>                               goto out_kfree_gso_skb;
Comment 51 Plamen Petrov 2010-08-27 08:45:32 UTC
Current status refresh:

Just tried 2.6.35.4, but without luck.

See the oops at http://picpaste.com/8240d73e92bd2ba25a9b4019010fcabf.jpg

Eric Dumazet and I are continuing to try and find a solution for my
problem via private email.

Thanks,
Plamen
Comment 52 Rafael J. Wysocki 2010-08-29 22:09:47 UTC
Handled-By :  Eric Dumazet <eric.dumazet@gmail.com>
Comment 53 Florian Mickler 2010-09-02 05:18:18 UTC
Patch available: http://lkml.org/lkml/2010/9/1/107
Comment 54 Florian Mickler 2010-09-12 16:28:45 UTC
Fixed in mainline.

References: http://lkml.org/lkml/2010/9/12/16
Comment 55 Rafael J. Wysocki 2010-09-12 17:29:03 UTC
On Sunday, September 12, 2010, Plamen Petrov wrote:
> On 08.9.2010 at 23:21, Rafael J. Wysocki wrote:
> > On Wednesday, September 08, 2010, David Miller wrote:
> >> From: Jarek Poplawski<jarkao2@gmail.com>
> >> Date: Wed, 8 Sep 2010 06:20:04 +0000
> >>
> >>> We need both of them. I hope David could add this too:
> >>>
> >>> Tested-by: Plamen Petrov<pvp-lsts@fs.uni-ruse.bg>
> >>
> >> Done, and applied, thanks :-)
> >
> > Please kindly let me know when Linus gets them, so that I can close the bug.
> >
> > Rafael
> 
> Now that both commits that fix my problems are in Linus' tree, the
> bug can be closed, but these fixes should go in 2.6.35.y, too.
> So, CCing -stable.
> 
> Fix 1:
> > commit      3d3be4333fdf6faa080947b331a6a19bce1a4f57
> >
> > gro: fix different skb headrooms
> >
> > Packets entering GRO might have different headrooms, even for a given
> > flow (because of implementation details in drivers, like copybreak).
> > We can't force drivers to deliver packets with a fixed headroom.
> >
> > 1) fix skb_segment()
> >
> > skb_segment() makes the false assumption that the headrooms of fragments
> > are the same as the head's. When CHECKSUM_PARTIAL is used, this can give
> > csum_start errors, and crash later in skb_copy_and_csum_dev()
> >
> > 2) allocate a minimal skb for head of frag_list
> >
> > skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to
> > allocate a fresh skb. This adds NET_SKB_PAD to a padding already
> > provided by netdevice, depending on various things, like copybreak.
> >
> > Use alloc_skb() to allocate an exact padding, to reduce cache line
> > needs:
> > NET_SKB_PAD + NET_IP_ALIGN
> >
> > bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
> >
> > Many thanks to Plamen Petrov, testing many debugging patches !
> > With help of Jarek Poplawski.
> >
> > Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > CC: Jarek Poplawski <jarkao2@gmail.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Fix 2:
> > commit      64289c8e6851bca0e589e064c9a5c9fbd6ae5dd4
> >
> > gro: Re-fix different skb headrooms
> >
> > The patch: "gro: fix different skb headrooms" in its part:
> > "2) allocate a minimal skb for head of frag_list" is buggy. The copied
> > skb has p->data set at the ip header at the moment, and skb_gro_offset
> > is the length of ip + tcp headers. So, after the change the length of
> > mac header is skipped. Later skb_set_mac_header() sets it into the
> > NET_SKB_PAD area (if it's long enough) and ip header is misaligned at
> > NET_SKB_PAD + NET_IP_ALIGN offset. There is no reason to assume the
> > original skb was wrongly allocated, so let's copy it as it was.
> >
> > bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626
> > fixes commit: 3d3be4333fdf6faa080947b331a6a19bce1a4f57
> >
> > Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> > CC: Eric Dumazet <eric.dumazet@gmail.com>
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Tested-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
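
The first part of Fix 1 can be illustrated with a small model. csum_start is counted from the start of the buffer, so copying it between two skbs is only valid when their headrooms match; otherwise it has to be shifted by the headroom difference. This is a sketch of that idea under stated assumptions, not the actual patch: struct hr_skb and the hr_* functions are hypothetical, and the headroom values in the note below are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the two sk_buff fields that matter here. */
struct hr_skb {
	unsigned int headroom;  /* bytes between buffer start and packet data */
	uint16_t csum_start;    /* measured from buffer start, not from data */
};

/* Offset of the checksummed region relative to the packet data. */
static long hr_csstart(const struct hr_skb *skb)
{
	return (long)skb->csum_start - (long)skb->headroom;
}

/*
 * Copy checksum metadata from 'from' to 'to'. Copying csum_start verbatim
 * is only right when both buffers have the same headroom; with 'adjust'
 * set, the value is shifted by the headroom difference so that it points
 * at the same place relative to the packet data.
 */
static void hr_copy_csum_start(struct hr_skb *to, const struct hr_skb *from,
			       int adjust)
{
	to->csum_start = from->csum_start;
	if (adjust)
		to->csum_start = (uint16_t)(from->csum_start +
					    to->headroom - from->headroom);
}
```

For a head skb with headroom 64 and csum_start 98 (offset 34 into the data), copying csum_start unchanged to a fragment with headroom 32 yields offset 66 into the data, which can lie past the linear data. With the adjustment the offset is 34 again.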
