Bug 219221

Summary: TCP connection/socket gets stuck and the handshaking is delayed
Product: Networking
Reporter: Zoltan Balogh (zbal1977)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: NEW
Severity: normal
Priority: P3
Hardware: All
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: not working TCP connection PCAP file
working TCP connection PCAP file

Description Zoltan Balogh 2024-09-02 14:25:13 UTC
Created attachment 306806
not working TCP connection PCAP file

Hi,

We have a client/server application which was developed a long time ago. It has been running in production for more than 10 years. The client is a Windows application written in C++, and the server-side component is written in Java 8.

This client/server software has been working fine for a long time on Linux servers. Currently, we use AlmaLinux 9. It was working on AlmaLinux 9 until we updated the kernel.

So, when we update the Linux kernel from "5.14.0-362.13.1.el9_3.x86_64" to "kernel-5.14.0-427.31.1.el9_4.x86_64", the application becomes unstable: the client drops the connection because it does not receive messages in time. We see delays during which the client just waits for the response from the server. The issue is always reproducible with the new kernel, and if we go back to the old kernel, the problem is gone. We ran the test for hours in both cases.

I attach the PCAP file created on the working system running with the old kernel (5.14.0-362). Additionally, I attach another PCAP file created on the non-working system. The difference is the kernel (5.14.0-427). All other components are the same.
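
For anyone comparing the two captures, a rough way to locate the stalls is to look for unusually long gaps between consecutive packets. The sketch below is illustrative only: it assumes the packet timestamps have already been exported from each PCAP as a list of seconds, and the function name `find_stalls`, the 0.2 s threshold, and the sample values are all hypothetical, not taken from the attached captures.

```python
from typing import List, Tuple


def find_stalls(timestamps: List[float], threshold: float = 0.2) -> List[Tuple[int, float]]:
    """Return (index, gap) for each packet that arrived more than
    `threshold` seconds after the previous packet.

    `threshold` is an arbitrary illustrative value, not a measured one.
    """
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > threshold:
            stalls.append((i, gap))
    return stalls


# Made-up timestamps (seconds since capture start), for illustration only.
old_kernel = [0.000, 0.010, 0.020, 0.030]
new_kernel = [0.000, 0.010, 1.050, 1.060]

print("old kernel stalls:", find_stalls(old_kernel))
print("new kernel stalls:", find_stalls(new_kernel))
```

Running the same check on both captures would show whether long gaps appear only with the newer kernel, and at which packets.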

Please analyze it and let us know how to fix it, and whether it is a general issue in the Linux kernel or one that affects only the AlmaLinux distro.

Thanks a lot!

Regards,
Zoltan Balogh
Comment 1 Zoltan Balogh 2024-09-02 14:26:08 UTC
Created attachment 306807
working TCP connection PCAP file
Comment 2 Stephen Hemminger 2024-09-09 17:43:38 UTC
Linux kernel networking does not use Bugzilla; instead, problems and discussion take place on the netdev@vger.kernel.org mailing list.
I do forward reports like this to the list.

This report has gotten no feedback.

An obvious first step is to build your own kernel and use git bisect to identify the place where the regression happened.
 
Suggest contacting Eric Dumazet <edumazet@google.com> and Neal Cardwell <ncardwell@google.com>