Bug 11752
| Summary: | Extremely low netperf UDP_RR throughput for nvidia MCP65 | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Arno J. Klaassen (arno) |
| Component: | Network | Assignee: | drivers_network (drivers_network) |
| Status: | CLOSED OBSOLETE | | |
| Severity: | normal | CC: | alan |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.27 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
Description
Arno J. Klaassen
2008-10-13 13:41:19 UTC
Latest working kernel version: -
Earliest failing kernel version:
Distribution: F10-Beta-x86_64
Hardware Environment: HP Pavillon dv6820ef
Software Environment:
Problem Description: when running at 1Gbps, netperf shows good performance for
the TCP_STREAM and UDP_STREAM tests, but extremely bad performance for the
UDP_RR test (less than 1 ping-pong a second, whereas at 100Mbps performance
easily reaches 10-20K a second).

A friend figured out that it looks like small packets at 1Gbps are dropped as
being falsely considered crc-errored.

Steps to reproduce:

Andrew Morton (akpm@linux-foundation.org):
(Switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface.)

On Mon, 13 Oct 2008 13:41:19 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11752
>
> Summary: Extremely low netperf UDP_RR throughput for nvidia MCP65
> Product: Drivers
> Version: 2.5
> KernelVersion: from F10-Beta-x86_64-Live-KDE.iso
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: Network
> AssignedTo: jgarzik@pobox.com
> ReportedBy: arno@heho.snv.jussieu.fr

Which driver is this using? forcedeth?

Rick Jones (rick.jones2@hp.com):
Given how netperf UDP_RR has _no_ recovery from lost datagrams, it makes sense
that performance on that test would be very low - at the first lost datagram
the transactions come to a screeching halt until the end-of-test timer
expires.

Are netstat stats showing retransmissions during a TCP_STREAM test? How about
a TCP_RR test? TCP_RR might be low too, it just wouldn't necessarily be as low
since TCP will get things started again after a loss. UDP_STREAM would just go
blasting along without a care in the world...

rick jones
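For reference, the retransmission question above can be checked by
snapshotting the TCP counters around a single run; a minimal sketch, where the
netserver address and run length are assumptions taken from output later in
this report:

REMOTE=172.16.1.7                       # assumed netserver host, as used below
netstat -s -t > /tmp/tcp.before
netperf -l 60 -H "$REMOTE" -t TCP_STREAM
netstat -s -t > /tmp/tcp.after
# show counters that changed; look for the "segments retransmited" line
diff /tmp/tcp.before /tmp/tcp.after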
Arno J. Klaassen:
Hello all, thanx for your help.

> Are netstat stats showing retransmissions during a TCP_STREAM test?

I will check that later tonight and/or this WE.

> How about a TCP_RR test? TCP_RR might be low too, it just wouldn't
> necessarily be as low since TCP will get things started again after a loss.
> UDP_STREAM would just go blasting along without a care in the world...

A summary of all tests is below (REFERENCE being a freebsd6 box on the same
LAN against the same server running netserver); it's pretty clear that the
*STREAM tests perform OK and the *RR tests perform poorly to very poorly:

Test        Metric                     REFERENCE   fc10-x64
TCP_STREAM  Throughput 10^6bits/sec       349.57     138.48
UDP_STREAM  Throughput 10^6bits/sec       388.45     365.10
TCP_RR      Trans. Rate per sec          9801.58      86.87
TCP_CRR     Trans. Rate per sec          4520.98       5.60
UDP_RR      Trans. Rate per sec          9473.20       0.80

Arno
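For reference, a comparison like the one summarized above can be scripted as a
simple loop; a minimal sketch, assuming netserver is already running on the
target host (the address is an assumption based on output later in this
report):

REMOTE=172.16.1.7                # assumption: the netserver host used below
for TEST in TCP_STREAM UDP_STREAM TCP_RR TCP_CRR UDP_RR; do
    echo "=== $TEST ==="
    netperf -l 60 -H "$REMOTE" -t "$TEST"
done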
Arno J. Klaassen:
Hello,

> Are netstat stats showing retransmissions during a TCP_STREAM test?
some more info :
[root@localhost mcp65]# uname -a
Linux localhost.localdomain 2.6.27-0.352.rc7.git1.fc10.x86_64 #1 SMP Tue Sep 23 21:13:29 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 00:1E:68:XXX
inet addr:172.16.1.31 Bcast:172.16.1.255 Mask:255.255.255.0
inet6 addr: XXX/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1731 errors:149 dropped:0 overruns:0 frame:149
TX packets:1628 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2549871 (2.4 MiB) TX bytes:125378 (122.4 KiB)
Interrupt:20 Base address:0x6000
After some fiddling (essentially installing a netperf-rpm) :
[root@localhost mcp65]# netstat -Ieth0
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 6347 508 0 0 5838 0 0 0 BMRU
[root@localhost mcp65]# ethtool eth0
Settings for eth0:
Supported ports: [ MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: external
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Link detected: yes
[root@localhost mcp65]# netperf -v -t TCP_STREAM -H 172.16.1.7
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.1.7 (172.16.1.7) port 0 AF_INET
132.64
[root@localhost mcp65]# netstat -Ieth0
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 84682 2933 0 0 127298 0 0 0 BMRU
I hope this is what you asked for ...
Arno
Rick Jones (rick.jones2@hp.com):
> [root@localhost mcp65]# netperf -v -t TCP_STREAM -H 172.16.1.7
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.1.7
> (172.16.1.7) port 0 AF_INET
> 132.64

Hmm, I'm surprised that the lack of a value following the -v was successful -
I'll have to go back and look at the code :) Still, I guess it gave you the
desired "-v 0" behaviour.

> I hope this is what you asked for ...

Close enough. I suspect that if you were to snap netstat -s -t before and
after the netperf you'd have seen retransmissions correlated to those "RX-ERR"
stats. My history is such that I don't think of netstat for link-level stats
and only think of it in the context of IP-layer and higher (eg tcp).

Clearly something is fubar with the rx side (well duh :). The next set of
stats I'd try to look at would be the ethtool stats for the interface, eg
ethtool -S eth0, and see if it shows something more specific than the "RX-ERR"
shown by netstat -I eth0.

rick jones
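For reference, those counters can also be polled while a test runs; a minimal
sketch, assuming the interface is eth0 as in the output above:

# poll the NIC error counters once per second during a netperf run (Ctrl-C to stop)
while sleep 1; do
    ethtool -S eth0 | grep -E 'rx_crc_errors|rx_errors_total'
done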
Arno J. Klaassen:
> Clearly something is fubar with the rx side (well duh :). The next set of
> stats I'd try to look at would be the ethtool stats for the interface, eg
> ethtool -S eth0, and see if it shows something more specific than the
> "RX-ERR" shown by netstat -I eth0.

OK, here it is (rx_errors_total: 10049, rx_crc_errors: 10049):

(NB, let me know how I could eventually test patches/binary modules on a live
CD; I have only limited linux kernel skills.)

[root@localhost ~]# netstat -s -t
Tcp:
    16 active connections openings
    0 passive connection openings
    10 failed connection attempts
    0 connection resets received
    2 connections established
    683 segments received
    693 segments send out
    4 segments retransmited
    0 bad segments received.
    14 resets sent
UdpLite:
TcpExt:
    7 delayed acks sent
    Quick ack mode was activated 3 times
    4 packets directly queued to recvmsg prequeue.
    2 packets directly received from prequeue
    230 packets header predicted
    9 acknowledgments not containing data received
    47 predicted acknowledgments
    2 congestion windows recovered after partial ack
    0 TCP data loss events
    4 other TCP timeouts
    3 DSACKs sent for old packets
    4 DSACKs sent for out of order packets
    1 connections reset due to unexpected data
IpExt:
    InMcastPkts: 14
    OutMcastPkts: 16
    InBcastPkts: 22

[root@localhost ~]# ethtool -S eth0
NIC statistics:
     tx_bytes: 86812
     tx_zero_rexmt: 1057
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 323
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 1021
     rx_multicast: 0
     rx_broadcast: 29
     rx_packets: 1050
     rx_errors_total: 323
     tx_errors_total: 0
     tx_deferral: 0
     tx_packets: 1057
     rx_bytes: 1421807
     tx_pause: 0
     rx_pause: 0
     rx_drop_frame: 0

[root@localhost ~]# netperf -v 1 -l 60 -H 172.16.1.7 -t TCP_STREAM
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.1.7 (172.16.1.7) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

3217968  16384  16384    60.00     149.06

[root@localhost ~]# netstat -s -t
Tcp:
    18 active connections openings
    0 passive connection openings
    10 failed connection attempts
    0 connection resets received
    2 connections established
    523674 segments received
    820982 segments send out
    4 segments retransmited
    0 bad segments received.
    14 resets sent
UdpLite:
TcpExt:
    1 TCP sockets finished time wait in fast timer
    7 delayed acks sent
    Quick ack mode was activated 15 times
    5 packets directly queued to recvmsg prequeue.
    2 packets directly received from prequeue
    666 packets header predicted
    498190 acknowledgments not containing data received
    23392 predicted acknowledgments
    2 congestion windows recovered after partial ack
    0 TCP data loss events
    4 other TCP timeouts
    15 DSACKs sent for old packets
    16 DSACKs sent for out of order packets
    1 connections reset due to unexpected data
IpExt:
    InMcastPkts: 14
    OutMcastPkts: 16
    InBcastPkts: 33

[root@localhost ~]# ethtool -S eth0
NIC statistics:
     tx_bytes: 1175487542
     tx_zero_rexmt: 821009
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 9301
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 523675
     rx_multicast: 0
     rx_broadcast: 41
     rx_packets: 523716
     rx_errors_total: 9301
     tx_errors_total: 0
     tx_deferral: 0
     tx_packets: 821009
     rx_bytes: 39624285
     tx_pause: 0
     rx_pause: 0
     rx_drop_frame: 0

[root@localhost ~]# netperf -v 1 -l 60 -H 172.16.1.7 -t TCP_RR
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.1.7 (172.16.1.7) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       60.00     350.52
3217968 3217968

[root@localhost ~]# netstat -s -t
Tcp:
    20 active connections openings
    0 passive connection openings
    10 failed connection attempts
    1 connection resets received
    2 connections established
    545934 segments received
    843030 segments send out
    226 segments retransmited
    0 bad segments received.
    15 resets sent
UdpLite:
TcpExt:
    3 TCP sockets finished time wait in fast timer
    9 delayed acks sent
    Quick ack mode was activated 22 times
    20941 packets directly queued to recvmsg prequeue.
    190 packets directly received from backlog
    20725 packets directly received from prequeue
    1046 packets header predicted
    20913 packets header predicted and directly queued to user
    498407 acknowledgments not containing data received
    44212 predicted acknowledgments
    3 congestion windows recovered after partial ack
    0 TCP data loss events
    7 timeouts after SACK recovery
    219 other TCP timeouts
    22 DSACKs sent for old packets
    23 DSACKs sent for out of order packets
    1 connections reset due to unexpected data
    1 connections reset due to early user close
IpExt:
    InMcastPkts: 14
    OutMcastPkts: 16
    InBcastPkts: 40

[root@localhost ~]# ethtool -S eth0
NIC statistics:
     tx_bytes: 1177080452
     tx_zero_rexmt: 843341
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 9750
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 545997
     rx_multicast: 0
     rx_broadcast: 50
     rx_packets: 546047
     rx_errors_total: 9750
     tx_errors_total: 0
     tx_deferral: 0
     tx_packets: 843341
     rx_bytes: 42743567
     tx_pause: 0
     rx_pause: 0
     rx_drop_frame: 0

[root@localhost ~]# netperf -v 1 -l 60 -H 172.16.1.7 -t UDP_RR
UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.1.7 (172.16.1.7) port 0 AF_INET
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

124928 124928 1        1       60.00       5.55
57344  524288

[root@localhost ~]# netstat -s -t
Tcp:
    21 active connections openings
    0 passive connection openings
    10 failed connection attempts
    1 connection resets received
    2 connections established
    546873 segments received
    843968 segments send out
    226 segments retransmited
    0 bad segments received.
    15 resets sent
UdpLite:
TcpExt:
    4 TCP sockets finished time wait in fast timer
    10 delayed acks sent
    Quick ack mode was activated 27 times
    20941 packets directly queued to recvmsg prequeue.
    190 packets directly received from backlog
    20725 packets directly received from prequeue
    1323 packets header predicted
    20913 packets header predicted and directly queued to user
    498409 acknowledgments not containing data received
    44213 predicted acknowledgments
    3 congestion windows recovered after partial ack
    0 TCP data loss events
    7 timeouts after SACK recovery
    219 other TCP timeouts
    27 DSACKs sent for old packets
    29 DSACKs sent for out of order packets
    1 connections reset due to unexpected data
    1 connections reset due to early user close
IpExt:
    InMcastPkts: 14
    OutMcastPkts: 16
    InBcastPkts: 47

[root@localhost ~]# ethtool -S eth0
NIC statistics:
     tx_bytes: 1177171368
     tx_zero_rexmt: 844569
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 10049
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 547225
     rx_multicast: 0
     rx_broadcast: 56
     rx_packets: 547281
     rx_errors_total: 10049
     tx_errors_total: 0
     tx_deferral: 0
     tx_packets: 844569
     rx_bytes: 44106473
     tx_pause: 0
     rx_pause: 0
     rx_drop_frame: 0

[root@localhost ~]#

Arno
Rick Jones (rick.jones2@hp.com):
> OK, here it is (rx_errors_total: 10049, rx_crc_errors: 10049):

Well, that seems to confirm that it is CRC errors. Did your friend say why he
thought they were false CRC errors?

If indeed it is only small packets/frames getting the CRC errors, in _theory_
a TCP_RR test with a larger request/response size should show "good"
performance because there should be few if any standalone ACKs. You could
start with something like:

netperf -H <remote> -t TCP_RR -- -r 1448

and work your way down, checking for those CRC errors as you go. I don't think
folks need all the output, just some idea of whether you still see CRC errors
with a full-size TCP_RR and, if you don't, at what size you start to see them.
I picked 1448 to have the request and response both result in a "full" TCP
segment, assuming an MTU of 1500 bytes and timestamps being enabled
(net.ipv4.tcp_timestamps).

Has the usual litany of cable swapping and such been done already? A cable
*known* to be good at 1G swapped in and such? If this is via a switch, just
for completeness try other switch ports, etc.

While I'd not expect it at 1Gig and autoneg, CRC errors can sometimes be a
sign of a duplex mismatch, but I have a difficult time seeing that happening -
unless there happens to be other traffic on the link, a plain netperf TCP_RR
or UDP_RR test should "never" have both sides trying to talk at the same time
and so shouldn't trip over a duplex mismatch like a TCP_STREAM test would.

> (NB, let me know how I could eventually test patches/binary modules on a
> live CD; I have only limited linux kernel skills.)

I'm going to have to defer to others on that score. Meanwhile, some additional
information gathering: for grins and bugzilla posterity, ethtool -i
<interface> would be goodness. What was the last "known good" configuration?
What is running "on the other side?" etc etc. Does some other or earlier
distro (Fedora, Ubuntu, whatnot) Live CD not exhibit this problem? If not,
what are the kernel and ethtool -i information from that?

rick jones
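For reference, the size sweep suggested above could be scripted; a minimal
sketch, where the netserver address, interface name, and run length are
assumptions taken from earlier output in this report:

REMOTE=172.16.1.7      # assumed netserver host
IF=eth0                # assumed interface
for SIZE in 1448 1024 512 256 128 64 32 16 8 4 1; do
    BEFORE=$(ethtool -S "$IF" | awk '/rx_crc_errors/ {print $2}')
    netperf -l 30 -H "$REMOTE" -t TCP_RR -- -r "$SIZE" > "/tmp/tcp_rr.$SIZE"
    AFTER=$(ethtool -S "$IF" | awk '/rx_crc_errors/ {print $2}')
    echo "req/resp size $SIZE: $((AFTER - BEFORE)) new rx_crc_errors"
done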
Arno J. Klaassen:
Hello, sorry for the late response.

> Well, that seems to confirm that it is CRC errors. Did your friend say why
> he thought they were false CRC errors?

Nope, but he gave me a patch for the freebsd driver which [would] "try to pass
CRC errored frames to upper stack", and it improved the *RR tests by an order
of magnitude (still leaving them another order of magnitude below the
reference values).

> and work your way down, checking for those CRC errors as you go.

Packet/frame size does not seem to be of any influence; I tried a bunch of
combinations and all give more or less the same performance.

> Has the usual litany of cable swapping and such been done already?

Yop; I tested with 3 different switches and a couple of different cables,
including the famous *known* good one as well as a brand new one; no
difference.

> For grins and bugzilla posterity, ethtool -i <interface> would be goodness.

[root@localhost mcp65]# ethtool -i eth0
driver: forcedeth
version: 0.61
firmware-version:
bus-info: 0000:00:06.0

> What was the last "known good" configuration? What is running "on the other
> side?" etc etc.

I have had this problem since I bought this notebook a month ago. I tried
freebsd7 (nfe driver), opensolaris5.10 (nfo driver) and fc10, all with the
same result. It also runs vista, but I cannot find a netperf.exe for 2.4.4 ...
if someone has a pointer (I found an earlier version but it makes netserver
core dump when starting the test).

thanx for your help

Arno

Arno J. Klaassen:
Hello,

[ .. stuff deleted .. ]

Well, just a quick 'knock knock' in case any of you has something new on this.
I can live with 100Mbps, but would be glad to help further to get it working
at 1Gbps (pavillon dv6700). Looks like a silicon bug to me, but alas; please
beep me if I can test something.

Best regards, Arno

Closing as obsolete