Bug 72561

Summary: missing some icmp redirects
Product: Networking
Reporter: Per Jessen (per)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: NEW
Severity: normal
CC: alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 3.11.6-4-default
Subsystem:
Regression: No
Bisected commit-id:

Description Per Jessen 2014-03-20 14:45:21 UTC
Background:  I have recently upgraded our firewall to openSUSE 13.1. It used to
be 11.x I think. We have long had a transparent webcache setup, but after
upgrading I disabled this as there was something odd with it, and I didn't have
time to investigate initially. 

Setup:  the webcache works by redirecting port 80 and 443 traffic to a system
running squid.  

Problem:  complex websites, e.g. an on-line newspaper, typically pull in
content from several sources. When a page is loaded, I often see e.g. 8-10
different sources (IP-addresses) of content.  With the webcaching enabled,
loading e.g. http://www.tagesanzeiger.ch leads to a 2-3 minute wait, usually
for one of those IPs/resources.  I have traced the ICMP redirects and noticed
that usually only one of the IP-addresses is not redirected. This is the one
the browser waits for. After about 2 minutes, the connection-attempt times out
and is retried, and then properly redirected.  This is reproducible with a
variety of websites. 
Relatively "simple" web pages using content from only one webserver, e.g. a
wikipedia page or the google start page show no problems.  

Could this be some sort of race condition in the IP stack?

Reproducible: Always

Steps to Reproduce:
in a transparent webcache setup:
1. load http://www.tagesanzeiger.ch/ 
2. observe how the browser will end up waiting for one source.
3. check the address of this resource
4. ip route get <ipaddr> will show it was not redirected
5. after 2 minutes, the page is shown.
6. ip route get <ipaddr> will show it was now redirected
Comment 1 Alan 2014-04-08 10:47:50 UTC
This isn't a support forum; it is just used for bug tracking.

Your best bet after capturing traces is netdev@vger.kernel.org

ICMP error reporting is rate-limited, however, so it seems a dumb way to implement a web cache.
Comment 2 Per Jessen 2014-06-10 18:07:04 UTC
(In reply to Alan from comment #1)
> This isn't a support forum; it is just used for bug tracking.
> 
> Your best bet after capturing traces is netdev@vger.kernel.org
>
> ICMP error reporting is rate-limited, however, so it seems a dumb way to
> implement a web cache.

It used to work fine until very recently. 

I have devised a way to reproduce the problem. I have a test setup of three machines, “client”, “firewall” and “server”, all on the same network.

Client:
Set up default route via “firewall”.

Server:
Assign 10.232.1.1-2-3-4-...-15/24 to an interface.
Run a tcp echo service (port 7).
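
A minimal sketch of the address assignment on "server". The function name server_addr_cmds and the default interface eth0 are assumptions for illustration, not part of the original setup. It only prints the commands, so they can be reviewed before being run as root.

```shell
# Print the commands that would assign 10.232.1.1 .. 10.232.1.15 (/24)
# to one interface. Assumption: the interface name defaults to eth0.
server_addr_cmds() {
    dev=${1:-eth0}
    i=1
    while [ "$i" -le 15 ]
    do
        echo "ip addr add 10.232.1.$i/24 dev $dev"
        i=$((i + 1))
    done
}
```

To apply them, pipe the output into a root shell, e.g. "server_addr_cmds eth0 | sh".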

Firewall:
Create routing table “test99” in /etc/iproute2/rt_tables.
ip route add default via <server> dev eth0 table test99
ip rule add fwmark 5 table test99
iptables -A PREROUTING -t mangle -i eth0 -p tcp --dport 7 -j MARK --set-mark 5

(this setup is what will produce the ICMP redirects).
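
Creating the "test99" table amounts to adding an "<id> <name>" line to /etc/iproute2/rt_tables. A sketch of doing that idempotently follows. The helper name add_rt_table and the table id 99 are assumptions, since the report does not say which id was used.

```shell
# Append "<id> <name>" to an rt_tables file unless the name is already listed.
# Assumptions: the id (e.g. 99) is free; the file defaults to the system one.
add_rt_table() {
    id=$1
    name=$2
    file=${3:-/etc/iproute2/rt_tables}
    grep -qE "^[0-9]+[[:space:]]+$name\$" "$file" 2>/dev/null ||
        echo "$id $name" >> "$file"
}
```

For example, run "add_rt_table 99 test99" as root before the "ip route add ... table test99" command above.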

On “firewall”, run tcpdump to document (missing) redirects:
tcpdump -n -i eth0 proto \\icmp 

On “client”, create some test input:

cat <<XXX >test.input
klop
alpha
nothing
tagi
line1
line2
line3
line4
XXX

Create a script:
cat <<'XXX' >doit
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
    telnet 10.232.1.$i 7 <test.input &
done
XXX

When you run “sh doit”, all of the telnet requests to 10.232.1.x should be redirected, but the tcpdump running on "firewall" will only show some of them.
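
To make the missing redirects easier to spot, the tcpdump output can be saved to a file and checked per address. This is a sketch only. The function name missing_redirects is hypothetical, and it assumes tcpdump prints redirect lines of the form "ICMP redirect 10.232.1.3 to host <server>, length ...".

```shell
# List the 10.232.1.1-15 targets for which a saved tcpdump log shows no redirect.
# Assumption: the log holds tcpdump text lines containing
#   "redirect <target> to host <gateway>"
missing_redirects() {
    log=$1
    for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    do
        # Fixed-string match; the trailing space stops .1 from matching .10
        grep -qF "redirect 10.232.1.$i " "$log" || echo "10.232.1.$i"
    done
}
```

On "firewall", capture with "tcpdump -n -l -i eth0 proto \\icmp >redirects.log" while "sh doit" runs on "client", then run "missing_redirects redirects.log".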