Bug 18822
Summary: | TCP Communications gets blocked, then resetted | ||
---|---|---|---|
Product: | Networking | Reporter: | Fred Baumgarten (dc6iq) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | RESOLVED DOCUMENTED | ||
Severity: | normal | CC: | alan, eric.dumazet |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.32-24-generic #43-Ubuntu | Subsystem: | |
Regression: | No | Bisected commit-id: | |

Attachments:
- both machine dumps
- tar file holding the last 300 lines of the broken tcp communication.
- sack=0 on host bacula
- both machines with sack = 0
- netstat -s before and after problem on both hosts
Could you try to get a tcpdump without filter drops? You could record to a pcap file, then replay it and use the tail command:

    tcpdump -i eth0 port 9103 -w file.pcap
    ...
    tcpdump -r file.pcap | tail -n 300

You could also try to disable SACK and see what happens:

    echo 0 >/proc/sys/net/ipv4/tcp_sack

Hi Eric! Thanks for investigating this bug. I am sorry not to reply earlier, but I am a bit busy right now. I did the recording (appended in the new attachment .tgz file). Right now I started another job with tcp_sack = 0. I'll post the result later... The last few times, the connection broke down at 20 GB and 2 GB.

Created attachment 31232
tar file holding the last 300 lines of the broken tcp communication.
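The `echo 0 >/proc/sys/net/ipv4/tcp_sack` suggestion works because every file under `/proc/sys` is a sysctl; the same knob is reachable as `sysctl -w net.ipv4.tcp_sack=0`. Below is a minimal Python sketch of that name-to-path mapping, useful when scripting several such toggles between test runs. The helper names and the `root` parameter (added only so the code can be exercised against a scratch directory) are hypothetical, writing requires root on a real system, and the sketch ignores the corner case of interface names that contain dots.

```python
import os
from typing import Union


def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Map a dotted sysctl name such as 'net.ipv4.tcp_sack' to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


def write_sysctl(name: str, value: Union[int, str], root: str = "/proc/sys") -> None:
    """Write a new value, like `echo value > /proc/sys/...` (needs root on a real system)."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(f"{value}\n")


if __name__ == "__main__":
    # Only print the current SACK setting (1 = enabled); never modify it here.
    if os.path.exists(sysctl_path("net.ipv4.tcp_sack")):
        print("net.ipv4.tcp_sack =", read_sysctl("net.ipv4.tcp_sack"))
```

Since SACK support is negotiated in the SYN exchange, a changed value should only affect connections opened after the change, which is one reason to restart the transfer after toggling it.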
Created attachment 31262
sack=0 on host bacula
Created attachment 31272
both machines with sack = 0
I tried some more configurations:

    # echo 0 >/proc/sys/net/ipv4/tcp_sack

on machine bacula (32 bit), then restarted all bacula servers -> https://bugzilla.kernel.org/attachment.cgi?id=31262

With tcp_sack set to 0 on both machines, it still fails. -> https://bugzilla.kernel.org/attachment.cgi?id=31272

Hope this helps...

Machine disks has:

    processor : 1
    vendor_id : GenuineIntel
    cpu family : 6
    model : 23
    model name : Pentium(R) Dual-Core CPU E5200 @ 2.50GHz
    stepping : 6
    cpu MHz : 2500.000
    cache size : 2048 KB
    [...]

Machine bacula has:

    processor : 3
    vendor_id : GenuineIntel
    cpu family : 6
    model : 28
    model name : Intel(R) Atom(TM) CPU 330 @ 1.60GHz
    stepping : 2
    cpu MHz : 1600.047
    cache size : 512 KB
    [...]

Maybe I'll try to set up a 64-bit environment for bacula as well...

OK, could you report the "netstat -s" changes?

    netstat -s >before
    <transfer>
    netstat -s >after
    diff after before

Created attachment 31442
netstat -s before and after problem on both hosts
Hi Eric! It's funny for me to see my netstat(8) improved like that :-) The German manual page still doesn't mention that feature; it looks like no one continued my work there... Thanks again for your work.
Created attachment 30682
both machine dumps

Having a freshly installed bacula server, I could never get a backup from my working machine done to the bacula server. The connection transmits approximately 3 GiB, then locks up and resets the connection.

Transmission gets stuck at a certain point, after which the bacula server does not reply to retransmitted packets on the IPv4 stack. The retry count on the disks machine (my working machine) rises to 13, then the connection is gone. The bacula server tries to send a push (PSH) packet (after the keepalive timer runs out) and gets the final RST packet from disks, because the connection is gone.

In the attachment you will find the tcpdumps from both machines; note that the sending machine dropped some packets in the dump.

It might be a possible help: disks is running a 64-bit kernel, whereas bacula is running a 32-bit one. I haven't looked into the option bits very closely, but it looks like there is a hidden problem.

Last ACK that is OK:

    09:32:56.142876 IP bacula.elkenet.bacula-sd > disks.elkenet.50766: Flags [.], ack 2005754588, win 9582, options [nop,nop,TS val 21825875 ecr 4530083], length 0

Next ACK packet:

    09:32:56.144763 IP bacula.elkenet.bacula-sd > disks.elkenet.50766: Flags [.], ack 2005773412, win 9308, options [nop,nop,TS val 21825876 ecr 4530083,nop,nop,sack 1 {2005774860:2005776308}], length 0

Kernels on both machines:

    root@disks:~# uname -a
    Linux disks 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:58:24 UTC 2010 x86_64 GNU/Linux
    root@bacula:~# uname -a
    Linux bacula 2.6.32-24-generic-pae #43-Ubuntu SMP Thu Sep 16 15:30:27 UTC 2010 i686 GNU/Linux

Doing a 20 GB backup on a Debian server works fine:

    server:~# uname -a
    Linux server 2.6.32-5-486 #1 Sat Sep 18 01:43:00 UTC 2010 i686 GNU/Linux

Doing a 26 GB backup from a 32-bit Ubuntu machine works fine as well. Maybe it's a 64-bit issue...

    root@elke:~# uname -a
    Linux elke 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:17:33 UTC 2010 i686 GNU/Linux

If any further input is required, just let me know...
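The two ACKs quoted in the report can be read with some sequence-number arithmetic. The second ACK advances the cumulative acknowledgement from 2005754588 to 2005773412, i.e. 18824 bytes (13 × 1448), and its SACK block 2005774860:2005776308 covers another 1448 bytes, leaving exactly 1448 bytes (2005773412 to 2005774860) unacknowledged in between. 1448 bytes is the usual TCP payload for a 1500-byte MTU with the timestamp option enabled; the MTU is not stated in the report, so that part is an assumption. Below is a minimal Python sketch of this hole computation, using modulo-2^32 comparison as TCP does; `seq_diff` and `sack_holes` are hypothetical helper names.

```python
from typing import Iterable, List, Tuple

MOD = 1 << 32  # TCP sequence numbers wrap around at 2**32


def seq_diff(a: int, b: int) -> int:
    """Signed distance b - a in 32-bit TCP sequence space (handles wraparound)."""
    d = (b - a) % MOD
    return d - MOD if d >= MOD // 2 else d


def sack_holes(ack: int, blocks: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return half-open ranges [start, end) above the cumulative ACK
    that no SACK block covers, i.e. data the receiver is still missing."""
    holes: List[Tuple[int, int]] = []
    edge = ack  # highest sequence number known to be received so far
    # Process blocks in sequence order relative to the cumulative ACK.
    for left, right in sorted(blocks, key=lambda blk: seq_diff(ack, blk[0])):
        if seq_diff(edge, left) > 0:
            holes.append((edge, left))
        if seq_diff(edge, right) > 0:
            edge = right
    return holes


if __name__ == "__main__":
    # Values taken from the second ACK quoted in the report.
    for start, end in sack_holes(2005773412, [(2005774860, 2005776308)]):
        print(f"missing {start}:{end} ({seq_diff(start, end)} bytes)")
```

For the quoted ACK this prints a single 1448-byte hole, which is the segment the sender would be expected to retransmit.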