Bug 218937
Summary: | TCP connection frozen on sender and receiver. No retries beyond 1. | ||
---|---|---|---|
Product: | Networking | Reporter: | joyson (joysonanuit) |
Component: | Other | Assignee: | Stephen Hemminger (stephen) |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: | sender pcap screenshot, receiver pcap screenshot, attachment-22439-0.html | ||
Created attachment 306414 [details]
receiver pcap screenshot
Is this reproducible in 6.9.3 or 6.6.32? It's highly unlikely anyone will help unless you run something more supported/modern.

Thanks, Artem. This is for a commercial product of a company, so the kernel is fixed at 5.4 and we cannot try a newer kernel. That is also why I attached screenshots of the pcap: we cannot share the full pcap for confidentiality reasons. We would mostly like to understand the possible causes and how to fix them, and whether any optimisations/changes between 2.6 and 5.4 are playing a role.

Are both the sender and receiver Linux? Are there any middle boxes or firewalls in the way?

It looks like there might be an MTU mismatch or non-functional TSO in the NIC. The packet that gets stuck is larger than 1500.

Created attachment 306435 [details]
attachment-22439-0.html

Sender and receiver are both Linux. MTU on both sides is 7020. In the same setup, if we run our app that's based on 2.6.38, we do not see this issue. Only when we run the same app on 5.4.254 do we see it.

On Wed, Jun 5, 2024 at 9:36 PM <bugzilla-daemon@kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=218937
>
> --- Comment #4 from Stephen Hemminger (stephen@networkplumber.org) ---
> Are both the sender and receiver Linux? Are there any middle boxes or
> firewalls in the way?
>
> It looks like there might be an MTU mismatch or non-functional TSO in the
> NIC. The packet that gets stuck is larger than 1500.

Some update on it. We tried a few things, and the last two showed good results. The code was like this:

sender: send all data and call close(fd)
receiver: read all, and when read returns 0, close(fd)

If close() is commented out at the sender, there is no socket freeze or data loss.
If close() is replaced with shutdown(), there is no socket freeze or data loss.

Looks like close() on the sender was the problem.
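The close() versus shutdown() observation above can be sketched as follows. This is only an illustration of the pattern that worked for the reporter, not the product's code: Python is used for brevity (the calls map onto `shutdown(fd, SHUT_WR)` and `close(fd)` in C), and the function names `send_then_close_gracefully`, `receive_until_eof`, and `demo` are hypothetical. The thread does not establish why close() misbehaved. One known Linux behaviour that may be relevant is that close() on a socket with unread data in its receive queue sends RST instead of FIN, but whether that applies here is not confirmed.

```python
import socket
import threading


def send_then_close_gracefully(sock, payload, drain_timeout=5.0):
    """Send payload, half-close the write side, wait for peer EOF, then close."""
    sock.sendall(payload)
    # Half-close: a FIN follows the queued data, but the socket stays open
    # so ACKs and any data from the peer can still be received.
    sock.shutdown(socket.SHUT_WR)
    sock.settimeout(drain_timeout)
    try:
        # Discard whatever the peer still sends; b"" means the peer closed.
        while sock.recv(4096):
            pass
    except socket.timeout:
        pass  # Peer did not close in time; close anyway.
    finally:
        sock.close()


def receive_until_eof(conn):
    """Receiver side from the thread: read until recv() returns 0 bytes, then close."""
    chunks = []
    while True:
        data = conn.recv(65536)
        if not data:
            break
        chunks.append(data)
    conn.close()
    return b"".join(chunks)


def demo(payload):
    """Run sender and receiver over loopback and return what the receiver got."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    result = {}

    def serve():
        conn, _ = listener.accept()
        result["data"] = receive_until_eof(conn)

    worker = threading.Thread(target=serve)
    worker.start()
    sender = socket.create_connection(listener.getsockname())
    send_then_close_gracefully(sender, payload)
    worker.join()
    listener.close()
    return result["data"]
```

The point of the design is that the sender does not call close() until it has seen the receiver's end-of-stream, so the final close() never happens while data is still in flight or unread.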
Created attachment 306413 [details]
sender pcap screenshot

Hi, I am facing an issue in TCP. At a random point in the packet transfer, the sender stops retrying and the receiver stops ACKing. Our previous kernel was 2.6 and our current kernel is 5.4. The sequence of events is as below.

sender: sends a few packets of data. Misses a few ACKs. Retries again. Does not get an ACK. Stops.

receiver: receives the packets. Sends ACKs for only a few of them. Does not retry the ACK for the remaining packets.

There is a timeout at the receiver end which forces the socket to be closed. The stuck socket reaches the end of this timeout and sends a FIN with an ACK covering all the data it has received (including the packets it did not ACK earlier and that the sender was waiting for). In response to this FIN, the sender sends RST.
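Since the symptom is a sender that stops retransmitting early, one thing that could be ruled out on both kernels is a lowered retransmission limit. The sketch below, assuming a Linux host, reads the `tcp_retries1`, `tcp_retries2`, and `tcp_orphan_retries` sysctls from `/proc/sys/net/ipv4`. The last one applies to sockets the application has already closed, which may matter given the close() finding in this thread. The helper name `read_tcp_retry_limits` is hypothetical, and nothing in the thread says these values were changed.

```python
from pathlib import Path

# Sysctls that bound how often the kernel retransmits unacknowledged data.
RETRY_SYSCTLS = ("tcp_retries1", "tcp_retries2", "tcp_orphan_retries")


def read_tcp_retry_limits(base="/proc/sys/net/ipv4"):
    """Return {sysctl name: int value}, or None where the file cannot be read."""
    limits = {}
    for name in RETRY_SYSCTLS:
        path = Path(base) / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Not Linux, no permission, or unexpected file contents.
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_tcp_retry_limits().items():
        print(f"{name} = {value}")
```

Comparing the output on the 2.6.38 and 5.4.254 systems would show whether the two setups differ in these limits. The documented defaults are 3 for `tcp_retries1` and 15 for `tcp_retries2`.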