Bug 10556
Summary: | IPVS sync_backup oops | ||
---|---|---|---|
Product: | Networking | Reporter: | L0op8ack (l0op8ack) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | bunk |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24.x | Subsystem: | |
Regression: | Yes | Bisected commit-id: |
Description
L0op8ack
2008-04-25 21:27:17 UTC
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000014 printing eip: c030659e *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: xt_tcpudp iptable_mangle xt_MARK xt_multiport ip_tables x_tables ip_vs_wrr ip_vs_wlc ip_vs_sh ip_vs_sed ip_vs_rr ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh pcnet32 crc32 e1000 e100 mii Pid: 3960, comm: ipvs_syncbackup Not tainted (2.6.24.4 #3) EIP: 0060:[<c030659e>] EFLAGS: 00010246 CPU: 0 EIP is at sync_thread+0x919/0xa3c EAX: 00000000 EBX: f6fe6800 ECX: e3ba2c00 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: f7310034 ESP: f699bf54 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process ipvs_syncbackup (pid: 3960, ti=f699a000 task=f6995030 task.ti=f699a000) Stack: 01000000 00000000 23035e0a 00000000 00001183 00000000 f699bfc0 f6d04860 00000002 00000034 f7310000 f78f1800 00000001 f731001c c0000000 00000000 00000010 f699bfc0 00000001 00000000 00000000 00000000 00000000 f6995030 Call Trace: [<c011c53a>] default_wake_function+0x0/0x8 [<c0305c85>] sync_thread+0x0/0xa3c [<c0104aa7>] kernel_thread_helper+0x7/0x10 ======================= Code: a1 20 64 3e c0 89 43 54 8b 54 24 34 0f b7 42 16 66 c1 c0 08 0f b7 c0 66 89 43 42 0f b6 42 01 e8 90 06 00 00 0f b7 53 42 0f b7 d2 <8b> 40 14 8b 04 90 89 43 38 89 d8 e8 8a 8e ff ff 8b 44 24 28 03 EIP: [<c030659e>] sync_thread+0x919/0xa3c SS:ESP 0068:f699bf54 Kernel panic - not syncing: Fatal exception in interrupt Reply-To: akpm@linux-foundation.org On Fri, 25 Apr 2008 21:27:18 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10556 > > Summary: IPVS sync_backup oops > Product: Networking > Version: 2.5 > KernelVersion: 2.6.24.x > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: l0op8ack@hotmail.com > > > Latest working kernel version: 2.6.23.17 > Earliest failing kernel version: 2.6.24 It's a regression. > Distribution: kernel.org > Hardware Environment: i386,P4 > Software Environment: ipvsadm-1.24 > Problem Description: > MATSER ipvs box will push connection table to BACKUP one using multicast, > while the BACKUP ipvs box received the multicast message, > the BACKUP ipvs box will report BUG() and crash > > Steps to reproduce: > +-------------+ > |192.168.1.123| > +------+------+ > | > --------------+-------+------------+-------- > | VIP:192.168.1.219 | > MASTER +-----+-----+ +-----+-----+ BACKUP > eth1 |192.168.1.1| eth1 |192.168.1.2| > eth0 |192.168.0.1| eth0 |192.168.0.2| > +-----+-----+ +-----+-----+ > |-> multicast sync ->| > --------------+-------+------------+-------- > | > real-server +------------+ www service open > |192.168.0.80| gateway 192.168.0.1 > +------+-----+ > > 1) > MASTER ipvs box: > ipvsadm --start-daemon master --mcast-interface eth0 --syncid 80 > > BACKUP ipvs box: > ipvsadm --start-daemon backup --mcast-interface eth0 --syncid 80 > > 2) > Both MASTER and BACKUP box: > 2.1) define service > ipvsadm -A -t 192.168.1.219:80 -s rr > 2.2) add real-server > ipvsadm -a -t 192.168.1.219:80 -r 192.168.0.80:80 -m > # set real-server gateway = 192.168.0.1 > > 3) > client(192.168.1.123) access http://192.168.1.219, > MATSER ipvs box will push connection table to BACKUP one using multicast, > while the BACKUP ipvs box received the multicast message, > the BACKUP ipvs box report BUG() and crash. > Also, BUG: unable to handle kernel NULL pointer dereference at virtual address 00000014 printing eip: c030659e *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: xt_tcpudp iptable_mangle xt_MARK xt_multiport ip_tables x_tables ip_vs_wrr ip_vs_wlc ip_vs_sh ip_vs_sed ip_vs_rr ip_vs_nq ip_vs_lc ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh pcnet32 crc32 e1000 e100 mii Pid: 3960, comm: ipvs_syncbackup Not tainted (2.6.24.4 #3) EIP: 0060:[<c030659e>] EFLAGS: 00010246 CPU: 0 EIP is at sync_thread+0x919/0xa3c EAX: 00000000 EBX: f6fe6800 ECX: e3ba2c00 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: f7310034 ESP: f699bf54 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process ipvs_syncbackup (pid: 3960, ti=f699a000 task=f6995030 task.ti=f699a000) Stack: 01000000 00000000 23035e0a 00000000 00001183 00000000 f699bfc0 f6d04860 00000002 00000034 f7310000 f78f1800 00000001 f731001c c0000000 00000000 00000010 f699bfc0 00000001 00000000 00000000 00000000 00000000 f6995030 Call Trace: [<c011c53a>] default_wake_function+0x0/0x8 [<c0305c85>] sync_thread+0x0/0xa3c [<c0104aa7>] kernel_thread_helper+0x7/0x10 ======================= Code: a1 20 64 3e c0 89 43 54 8b 54 24 34 0f b7 42 16 66 c1 c0 08 0f b7 c0 66 89 43 42 0f b6 42 01 e8 90 06 00 00 0f b7 53 42 0f b7 d2 <8b> 40 14 8b 04 90 89 43 38 89 d8 e8 8a 8e ff ff 8b 44 24 28 03 EIP: [<c030659e>] sync_thread+0x919/0xa3c SS:ESP 0068:f699bf54 Kernel panic - not syncing: Fatal exception in interrupt Which will be a bit hard to track down because it seems that pretty much the whole world got inlined into sync_thread(). Hi. On Sat, Apr 26, 2008 at 04:04:52AM -0700, Andrew Morton (akpm@linux-foundation.org) wrote: > [<c011c53a>] default_wake_function+0x0/0x8 > [<c0305c85>] sync_thread+0x0/0xa3c > [<c0104aa7>] kernel_thread_helper+0x7/0x10 > ======================= > Code: a1 20 64 3e c0 89 43 54 8b 54 24 34 0f b7 42 16 66 c1 c0 08 0f b7 c0 66 > 89 43 42 0f b6 42 01 e8 90 06 00 00 0f b7 53 42 0f b7 d2 <8b> 40 14 8b 04 90 > 89 > 43 38 89 d8 e8 8a 8e ff ff 8b 44 24 28 03 > EIP: [<c030659e>] sync_thread+0x919/0xa3c SS:ESP 0068:f699bf54 > Kernel panic - not syncing: Fatal exception in interrupt Any chance to run gdb ./vmlinux $ l *(sync_thread+0x919) for that kernel, and if it was not compiled with debug info try to do that and get bug with it? > Which will be a bit hard to track down because it seems that pretty much > the whole world got inlined into sync_thread(). I'm not ipvs guru, but that will give us at least some hint... Reply-To: ja@ssi.bg Hello, On Sat, 26 Apr 2008, Andrew Morton wrote: > On Fri, 25 Apr 2008 21:27:18 -0700 (PDT) bugme-daemon@bugzilla.kernel.org > wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=10556 > > > > ReportedBy: l0op8ack@hotmail.com > > > > Latest working kernel version: 2.6.23.17 > > Earliest failing kernel version: 2.6.24 > > Also, > > BUG: unable to handle kernel NULL pointer dereference at virtual address > 00000014 > printing eip: c030659e *pde = 00000000 > Oops: 0000 [#1] SMP > Modules linked in: xt_tcpudp iptable_mangle xt_MARK xt_multiport ip_tables > x_tables ip_vs_wrr ip_vs_wlc ip_vs_sh ip_vs_sed ip_vs_rr ip_vs_nq ip_vs_lc > ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh pcnet32 crc32 e1000 e100 mii > > Pid: 3960, comm: ipvs_syncbackup Not tainted (2.6.24.4 #3) > EIP: 0060:[<c030659e>] EFLAGS: 00010246 CPU: 0 > EIP is at sync_thread+0x919/0xa3c > EAX: 00000000 EBX: f6fe6800 ECX: e3ba2c00 EDX: 00000000 > ESI: 00000000 EDI: 00000000 EBP: f7310034 ESP: f699bf54 > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > Process ipvs_syncbackup (pid: 3960, ti=f699a000 task=f6995030 > task.ti=f699a000) > Stack: 01000000 00000000 23035e0a 00000000 00001183 00000000 f699bfc0 > f6d04860 > 00000002 00000034 f7310000 f78f1800 00000001 f731001c c0000000 > 00000000 > 00000010 f699bfc0 00000001 00000000 00000000 00000000 00000000 > f6995030 > Call Trace: > [<c011c53a>] default_wake_function+0x0/0x8 > [<c0305c85>] sync_thread+0x0/0xa3c > [<c0104aa7>] kernel_thread_helper+0x7/0x10 > ======================= > Code: a1 20 64 3e c0 89 43 54 8b 54 24 34 0f b7 42 16 66 c1 c0 08 0f b7 c0 66 > 89 43 42 0f b6 42 01 e8 90 06 00 00 0f b7 53 42 0f b7 d2 <8b> 40 14 8b 04 90 > 89 > 43 38 89 d8 e8 8a 8e ff ff 8b 44 24 28 03 > EIP: [<c030659e>] sync_thread+0x919/0xa3c SS:ESP 0068:f699bf54 > Kernel panic - not syncing: Fatal exception in interrupt > > > Which will be a bit hard to track down because it seems that pretty much > the whole world got inlined into sync_thread(). I can not fully understand the above oops but hope following fix can help (not tested). It is for 2.6.25. I can provide patch for 2.6.24 if needed (there are rejects): Result from ip_vs_proto_get() should be checked because protocol value can be invalid or unsupported in backup. Also, add checks to validate message limits and connection state. Signed-off-by: Julian Anastasov <ja@ssi.bg> --- diff -urp v2.6.25/linux/include/net/ip_vs.h linux/include/net/ip_vs.h --- v2.6.25/linux/include/net/ip_vs.h 2008-04-17 09:58:08.000000000 +0300 +++ linux/include/net/ip_vs.h 2008-04-26 20:10:46.000000000 +0300 @@ -405,7 +405,8 @@ struct sk_buff; struct ip_vs_protocol { struct ip_vs_protocol *next; char *name; - __u16 protocol; + u16 protocol; + u16 num_states; int dont_defrag; atomic_t appcnt; /* counter of proto app incs */ int *timeout_table; /* protocol timeout table */ diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_ah.c linux/net/ipv4/ipvs/ip_vs_proto_ah.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_ah.c 2007-07-10 09:18:43.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto_ah.c 2008-04-26 20:14:17.000000000 +0300 @@ -160,6 +160,7 @@ static void ah_exit(struct ip_vs_protoco struct ip_vs_protocol ip_vs_protocol_ah = { .name = "AH", .protocol = IPPROTO_AH, + .num_states = 1, .dont_defrag = 1, .init = ah_init, .exit = ah_exit, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_esp.c linux/net/ipv4/ipvs/ip_vs_proto_esp.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_esp.c 2008-04-17 09:58:09.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto_esp.c 2008-04-26 20:14:39.000000000 +0300 @@ -159,6 +159,7 @@ static void esp_exit(struct ip_vs_protoc struct ip_vs_protocol ip_vs_protocol_esp = { .name = "ESP", .protocol = IPPROTO_ESP, + .num_states = 1, .dont_defrag = 1, .init = esp_init, .exit = esp_exit, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_tcp.c linux/net/ipv4/ipvs/ip_vs_proto_tcp.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_tcp.c 2008-01-25 10:45:06.000000000 +0200 +++ linux/net/ipv4/ipvs/ip_vs_proto_tcp.c 2008-04-26 20:15:45.000000000 +0300 @@ -594,6 +594,7 @@ static void ip_vs_tcp_exit(struct ip_vs_ struct ip_vs_protocol ip_vs_protocol_tcp = { .name = "TCP", .protocol = IPPROTO_TCP, + .num_states = IP_VS_TCP_S_LAST, .dont_defrag = 0, .appcnt = ATOMIC_INIT(0), .init = ip_vs_tcp_init, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_udp.c linux/net/ipv4/ipvs/ip_vs_proto_udp.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_udp.c 2008-01-25 10:45:06.000000000 +0200 +++ linux/net/ipv4/ipvs/ip_vs_proto_udp.c 2008-04-26 20:15:07.000000000 +0300 @@ -409,6 +409,7 @@ static void udp_exit(struct ip_vs_protoc struct ip_vs_protocol ip_vs_protocol_udp = { .name = "UDP", .protocol = IPPROTO_UDP, + .num_states = IP_VS_UDP_S_LAST, .dont_defrag = 0, .init = udp_init, .exit = udp_exit, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_sync.c linux/net/ipv4/ipvs/ip_vs_sync.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_sync.c 2008-04-17 09:58:09.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_sync.c 2008-04-26 20:25:51.000000000 +0300 @@ -288,11 +288,16 @@ static void ip_vs_process_message(const char *p; int i; + if (buflen < sizeof(struct ip_vs_sync_mesg)) { + IP_VS_ERR_RL("message header too short\n"); + return; + } + /* Convert size back to host byte order */ m->size = ntohs(m->size); if (buflen != m->size) { - IP_VS_ERR("bogus message\n"); + IP_VS_ERR_RL("bogus message size\n"); return; } @@ -307,9 +312,37 @@ static void ip_vs_process_message(const for (i=0; i<m->nr_conns; i++) { unsigned flags, state; - s = (struct ip_vs_sync_conn *)p; + if (p + SIMPLE_CONN_SIZE > buffer+buflen) { + IP_VS_ERR_RL("bogus message\n"); + return; + } + s = (struct ip_vs_sync_conn *) p; flags = ntohs(s->flags) | IP_VS_CONN_F_SYNC; + if (flags & IP_VS_CONN_F_SEQ_MASK) { + opt = (struct ip_vs_sync_conn_options *)&s[1]; + p += FULL_CONN_SIZE; + if (p > buffer+buflen) { + IP_VS_ERR_RL("bogus message\n"); + return; + } + } else { + opt = NULL; + p += SIMPLE_CONN_SIZE; + } + + pp = ip_vs_proto_get(s->protocol); + if (!pp) { + IP_VS_ERR_RL("Unsupported protocol %u in sync msg\n", + s->protocol); + continue; + } state = ntohs(s->state); + if (state >= pp->num_states) { + IP_VS_DBG(2, "Invalid %s state %u in sync msg\n", + pp->name, state); + continue; + } + if (!(flags & IP_VS_CONN_F_TEMPLATE)) cp = ip_vs_conn_in_get(s->protocol, s->caddr, s->cport, @@ -345,7 +378,6 @@ static void ip_vs_process_message(const IP_VS_ERR("ip_vs_conn_new failed\n"); return; } - cp->state = state; } else if (!cp->dest) { dest = ip_vs_try_bind_dest(cp); if (!dest) { @@ -371,23 +403,13 @@ static void ip_vs_process_message(const } } - if (flags & IP_VS_CONN_F_SEQ_MASK) { - opt = (struct ip_vs_sync_conn_options *)&s[1]; + if (opt) memcpy(&cp->in_seq, opt, sizeof(*opt)); - p += FULL_CONN_SIZE; - } else - p += SIMPLE_CONN_SIZE; - atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]); cp->state = state; - pp = ip_vs_proto_get(s->protocol); - cp->timeout = pp->timeout_table[cp->state]; + cp->timeout = pp->timeout_table ? + pp->timeout_table[state] : (3*60*HZ); ip_vs_conn_put(cp); - - if (p > buffer+buflen) { - IP_VS_ERR("bogus message\n"); - return; - } } } I apply theses patches to 2.6.24.4,ipvs connection sync works. When the BACKUP received connection update, it reportes IPVS: Unsupported protocol 0 in sync msg and, the client table expire timer in BACKUP is incorrect. MASTER # ipvsadm -Lnc IPVS connection entries pro expire state source virtual destination TCP 14:38 NONE 192.168.1.123:1400 192.168.1.219:80 192.168.0.80:80 TCP 14:38 NONE 192.168.1.123:1401 192.168.1.219:80 192.168.0.80:80 TCP 14:38 NONE 192.168.1.123:1402 192.168.1.219:80 192.168.0.80:80 BACKUP # ipvsadm -Lnc IPVS connection entries pro expire state source virtual destination TCP 715806:12 NONE 192.168.1.123:1400 192.168.1.219:80 192.168.0.80:80 TCP 715806:12 NONE 192.168.1.123:1401 192.168.1.219:80 192.168.0.80:80 TCP 715806:12 NONE 192.168.1.123:1402 192.168.1.219:80 192.168.0.80:80 Reply-To: ja@ssi.bg Hello, On Sat, 26 Apr 2008, Andrew Morton wrote: > > ReportedBy: l0op8ack@hotmail.com > Also, > > BUG: unable to handle kernel NULL pointer dereference at virtual address > 00000014 > printing eip: c030659e *pde = 00000000 > Oops: 0000 [#1] SMP > Modules linked in: xt_tcpudp iptable_mangle xt_MARK xt_multiport ip_tables > x_tables ip_vs_wrr ip_vs_wlc ip_vs_sh ip_vs_sed ip_vs_rr ip_vs_nq ip_vs_lc > ip_vs_lblcr ip_vs_lblc ip_vs_ftp ip_vs_dh pcnet32 crc32 e1000 e100 mii > > Pid: 3960, comm: ipvs_syncbackup Not tainted (2.6.24.4 #3) > EIP: 0060:[<c030659e>] EFLAGS: 00010246 CPU: 0 > EIP is at sync_thread+0x919/0xa3c > EAX: 00000000 EBX: f6fe6800 ECX: e3ba2c00 EDX: 00000000 > ESI: 00000000 EDI: 00000000 EBP: f7310034 ESP: f699bf54 > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > Process ipvs_syncbackup (pid: 3960, ti=f699a000 task=f6995030 > task.ti=f699a000) > Stack: 01000000 00000000 23035e0a 00000000 00001183 00000000 f699bfc0 > f6d04860 > 00000002 00000034 f7310000 f78f1800 00000001 f731001c c0000000 > 00000000 > 00000010 f699bfc0 00000001 00000000 00000000 00000000 00000000 > f6995030 > Call Trace: > [<c011c53a>] default_wake_function+0x0/0x8 > [<c0305c85>] sync_thread+0x0/0xa3c > [<c0104aa7>] kernel_thread_helper+0x7/0x10 > ======================= > Code: a1 20 64 3e c0 89 43 54 8b 54 24 34 0f b7 42 16 66 c1 c0 08 0f b7 c0 66 > 89 43 42 0f b6 42 01 e8 90 06 00 00 0f b7 53 42 0f b7 d2 <8b> 40 14 8b 04 90 > 89 > 43 38 89 d8 e8 8a 8e ff ff 8b 44 24 28 03 > EIP: [<c030659e>] sync_thread+0x919/0xa3c SS:ESP 0068:f699bf54 > Kernel panic - not syncing: Fatal exception in interrupt Below is 2nd version that handles templates properly, which should be the actual problem. Large timeouts in backup look like a problem in incorrect ipvsadm binary, timeouts are not converted from jiffies. Patch follows: Result from ip_vs_proto_get() should be checked because protocol value can be invalid or unsupported in backup. But for valid message we should not fail for templates which use IPPROTO_IP. Also, add checks to validate message limits and connection state. Show NONE for templates. Signed-off-by: Julian Anastasov <ja@ssi.bg> --- diff -urp v2.6.25/linux/include/net/ip_vs.h linux/include/net/ip_vs.h --- v2.6.25/linux/include/net/ip_vs.h 2008-04-17 09:58:08.000000000 +0300 +++ linux/include/net/ip_vs.h 2008-04-26 20:10:46.000000000 +0300 @@ -405,7 +405,8 @@ struct sk_buff; struct ip_vs_protocol { struct ip_vs_protocol *next; char *name; - __u16 protocol; + u16 protocol; + u16 num_states; int dont_defrag; atomic_t appcnt; /* counter of proto app incs */ int *timeout_table; /* protocol timeout table */ diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto.c linux/net/ipv4/ipvs/ip_vs_proto.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto.c 2008-04-17 09:58:09.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto.c 2008-04-27 10:31:24.000000000 +0300 @@ -148,7 +148,7 @@ const char * ip_vs_state_name(__u16 prot struct ip_vs_protocol *pp = ip_vs_proto_get(proto); if (pp == NULL || pp->state_name == NULL) - return "ERR!"; + return (IPPROTO_IP == proto) ? "NONE" : "ERR!"; return pp->state_name(state); } diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_ah.c linux/net/ipv4/ipvs/ip_vs_proto_ah.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_ah.c 2007-07-10 09:18:43.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto_ah.c 2008-04-26 20:14:17.000000000 +0300 @@ -160,6 +160,7 @@ static void ah_exit(struct ip_vs_protoco struct ip_vs_protocol ip_vs_protocol_ah = { .name = "AH", .protocol = IPPROTO_AH, + .num_states = 1, .dont_defrag = 1, .init = ah_init, .exit = ah_exit, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_esp.c linux/net/ipv4/ipvs/ip_vs_proto_esp.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_esp.c 2008-04-17 09:58:09.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto_esp.c 2008-04-26 20:14:39.000000000 +0300 @@ -159,6 +159,7 @@ static void esp_exit(struct ip_vs_protoc struct ip_vs_protocol ip_vs_protocol_esp = { .name = "ESP", .protocol = IPPROTO_ESP, + .num_states = 1, .dont_defrag = 1, .init = esp_init, .exit = esp_exit, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_tcp.c linux/net/ipv4/ipvs/ip_vs_proto_tcp.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_tcp.c 2008-01-25 10:45:06.000000000 +0200 +++ linux/net/ipv4/ipvs/ip_vs_proto_tcp.c 2008-04-26 20:15:45.000000000 +0300 @@ -594,6 +594,7 @@ static void ip_vs_tcp_exit(struct ip_vs_ struct ip_vs_protocol ip_vs_protocol_tcp = { .name = "TCP", .protocol = IPPROTO_TCP, + .num_states = IP_VS_TCP_S_LAST, .dont_defrag = 0, .appcnt = ATOMIC_INIT(0), .init = ip_vs_tcp_init, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_udp.c linux/net/ipv4/ipvs/ip_vs_proto_udp.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_proto_udp.c 2008-01-25 10:45:06.000000000 +0200 +++ linux/net/ipv4/ipvs/ip_vs_proto_udp.c 2008-04-26 20:15:07.000000000 +0300 @@ -409,6 +409,7 @@ static void udp_exit(struct ip_vs_protoc struct ip_vs_protocol ip_vs_protocol_udp = { .name = "UDP", .protocol = IPPROTO_UDP, + .num_states = IP_VS_UDP_S_LAST, .dont_defrag = 0, .init = udp_init, .exit = udp_exit, diff -urp v2.6.25/linux/net/ipv4/ipvs/ip_vs_sync.c linux/net/ipv4/ipvs/ip_vs_sync.c --- v2.6.25/linux/net/ipv4/ipvs/ip_vs_sync.c 2008-04-17 09:58:09.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_sync.c 2008-04-27 11:05:48.000000000 +0300 @@ -288,11 +288,16 @@ static void ip_vs_process_message(const char *p; int i; + if (buflen < sizeof(struct ip_vs_sync_mesg)) { + IP_VS_ERR_RL("sync message header too short\n"); + return; + } + /* Convert size back to host byte order */ m->size = ntohs(m->size); if (buflen != m->size) { - IP_VS_ERR("bogus message\n"); + IP_VS_ERR_RL("bogus sync message size\n"); return; } @@ -307,9 +312,47 @@ static void ip_vs_process_message(const for (i=0; i<m->nr_conns; i++) { unsigned flags, state; - s = (struct ip_vs_sync_conn *)p; + if (p + SIMPLE_CONN_SIZE > buffer+buflen) { + IP_VS_ERR_RL("bogus conn in sync message\n"); + return; + } + s = (struct ip_vs_sync_conn *) p; flags = ntohs(s->flags) | IP_VS_CONN_F_SYNC; + if (flags & IP_VS_CONN_F_SEQ_MASK) { + opt = (struct ip_vs_sync_conn_options *)&s[1]; + p += FULL_CONN_SIZE; + if (p > buffer+buflen) { + IP_VS_ERR_RL("bogus conn options in sync message\n"); + return; + } + } else { + opt = NULL; + p += SIMPLE_CONN_SIZE; + } + state = ntohs(s->state); + if (!(flags & IP_VS_CONN_F_TEMPLATE)) { + pp = ip_vs_proto_get(s->protocol); + if (!pp) { + IP_VS_ERR_RL("Unsupported protocol %u in sync msg\n", + s->protocol); + continue; + } + if (state >= pp->num_states) { + IP_VS_DBG(2, "Invalid %s state %u in sync msg\n", + pp->name, state); + continue; + } + } else { + /* protocol in templates is not used for state/timeout */ + pp = NULL; + if (state > 0) { + IP_VS_DBG(2, "Invalid template state %u in sync msg\n", + state); + state = 0; + } + } + if (!(flags & IP_VS_CONN_F_TEMPLATE)) cp = ip_vs_conn_in_get(s->protocol, s->caddr, s->cport, @@ -345,7 +388,6 @@ static void ip_vs_process_message(const IP_VS_ERR("ip_vs_conn_new failed\n"); return; } - cp->state = state; } else if (!cp->dest) { dest = ip_vs_try_bind_dest(cp); if (!dest) { @@ -371,23 +413,21 @@ static void ip_vs_process_message(const } } - if (flags & IP_VS_CONN_F_SEQ_MASK) { - opt = (struct ip_vs_sync_conn_options *)&s[1]; + if (opt) memcpy(&cp->in_seq, opt, sizeof(*opt)); - p += FULL_CONN_SIZE; - } else - p += SIMPLE_CONN_SIZE; - atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]); cp->state = state; - pp = ip_vs_proto_get(s->protocol); - cp->timeout = pp->timeout_table[cp->state]; + /* + * We can not recover the right timeout for templates + * in all cases, we can not find the right fwmark + * virtual service. If needed, we can do it for + * non-fwmark persistent services. + */ + if (!(flags & IP_VS_CONN_F_TEMPLATE) && pp->timeout_table) + cp->timeout = pp->timeout_table[state]; + else + cp->timeout = (3*60*HZ); ip_vs_conn_put(cp); - - if (p > buffer+buflen) { - IP_VS_ERR("bogus message\n"); - return; - } } } 2nd version patches applied to 2.6.24.4
Connection table in MASTER sync to BACKUP ok.
The BACKUP connection table 'state' always be 'NONE'.
> Large timeouts in backup look like a problem in incorrect ipvsadm binary
It's not a problem in ipvsadm binary.
BACKUP box:
expired connections in BACKUP box will never be removed,
on the contrary, 'expire' changed to a large number.
connection timeouts received by BACKUP are not influenced by persist time of local defined service.
'struct ip_vs_sync_conn' defined in net/ipv4/ipvs/ip_vs_sync.c has no timeout options at all.
Local 'struct ip_vs_timeout_user' timeouts are the only determinant,
that's a bad idea.
MASTER / # ipvsadm -Lnc
IPVS connection entries
pro expire state source virtual destination
TCP 00:30 SYN_RECV 192.168.1.123:2140 192.168.1.219:80 192.168.0.80:80
TCP 14:32 ESTABLISHED 192.168.1.123:2141 192.168.1.219:80 192.168.0.80:80
TCP 01:30 TIME_WAIT 192.168.1.123:2137 192.168.1.219:80 192.168.0.80:80
TCP 14:32 ESTABLISHED 192.168.1.123:2142 192.168.1.219:80 192.168.0.80:80
TCP 01:30 TIME_WAIT 192.168.1.123:2136 192.168.1.219:80 192.168.0.80:80
IP 04:32 NONE 192.168.1.123:0 0.0.0.1:0 192.168.0.80:0
BACKUP / # ipvsadm -Lnc
IPVS connection entries
pro expire state source virtual destination
IP 00:01 NONE 192.168.1.123:0 0.0.0.1:0 192.168.0.80:0
TCP 00:01 NONE 192.168.1.123:2141 192.168.1.219:80 192.168.0.80:80
TCP 00:01 NONE 192.168.1.123:2137 192.168.1.219:80 192.168.0.80:80
TCP 00:01 NONE 192.168.1.123:2142 192.168.1.219:80 192.168.0.80:80
TCP 00:01 NONE 192.168.1.123:2136 192.168.1.219:80 192.168.0.80:80
BACKUP / # ipvsadm -Lnc
IPVS connection entries
pro expire state source virtual destination
IP 00:00 NONE 192.168.1.123:0 0.0.0.1:0 192.168.0.80:0
TCP 00:00 NONE 192.168.1.123:2141 192.168.1.219:80 192.168.0.80:80
TCP 00:00 NONE 192.168.1.123:2137 192.168.1.219:80 192.168.0.80:80
TCP 00:00 NONE 192.168.1.123:2142 192.168.1.219:80 192.168.0.80:80
TCP 00:00 NONE 192.168.1.123:2136 192.168.1.219:80 192.168.0.80:80
# these connections expired, and should be removed.
# but they are not removed. 'expire' changed to a large number(timer reset?)
BACKUP / # ipvsadm -Lnc
IPVS connection entries
pro expire state source virtual destination
IP 715827:52 NONE 192.168.1.123:0 0.0.0.1:0 192.168.0.80:0
TCP 715827:52 NONE 192.168.1.123:2141 192.168.1.219:80 192.168.0.80:80
TCP 715827:52 NONE 192.168.1.123:2137 192.168.1.219:80 192.168.0.80:80
TCP 715827:52 NONE 192.168.1.123:2142 192.168.1.219:80 192.168.0.80:80
TCP 715827:52 NONE 192.168.1.123:2136 192.168.1.219:80 192.168.0.80:80
Reply-To: ja@ssi.bg Hello, > > On Fri, 25 Apr 2008 21:27:18 -0700 (PDT) bugme-daemon@bugzilla.kernel.org > wrote: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=10556 > > > > > > ReportedBy: l0op8ack@hotmail.com > I can not fully understand the above oops but hope following > fix can help (not tested). It is for 2.6.25. I can provide patch for > 2.6.24 if needed (there are rejects): As I said, there should be rejects if you apply patch against 2.6.24.4. I recommend to use patch against 2.6.25 because 2.6.25 has some fixes in ip_vs_sync. I suspect some hunks were not applied correctly, can you clarify what is the kernel version used now? In case you are still using 2.6.24.4 I'm appending new patch, this time against 2.6.24. Let me know if you see some rejects when applying patch (Hunk FAILED). Also, if TCP conns in backup are again NONE and conns are not removed I don't have any explanation, may be you can enable IPVS debugging? As for timeout, I have never used IPVS sync, it does not send timeout, I suspect it would be compatibility issue if timeout and nfmark are added to sync structure. 715827:52 should be jiffies 0xFFFFFFFF (time to expire), may be you can see them with cat /proc/net/ip_vs_conn Result from ip_vs_proto_get() should be checked because protocol value can be invalid or unsupported in backup. But for valid message we should not fail for templates which use IPPROTO_IP. Also, add checks to validate message limits and connection state. Show NONE for templates. Signed-off-by: Julian Anastasov <ja@ssi.bg> --- diff -urp v2.6.24/linux/include/net/ip_vs.h linux/include/net/ip_vs.h --- v2.6.24/linux/include/net/ip_vs.h 2008-04-27 17:49:40.000000000 +0300 +++ linux/include/net/ip_vs.h 2008-04-27 17:50:45.000000000 +0300 @@ -403,7 +403,8 @@ struct sk_buff; struct ip_vs_protocol { struct ip_vs_protocol *next; char *name; - __u16 protocol; + u16 protocol; + u16 num_states; int dont_defrag; atomic_t appcnt; /* counter of proto app incs */ int *timeout_table; /* protocol timeout table */ diff -urp v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto.c linux/net/ipv4/ipvs/ip_vs_proto.c --- v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto.c 2008-04-27 17:49:41.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto.c 2008-04-27 17:50:45.000000000 +0300 @@ -148,7 +148,7 @@ const char * ip_vs_state_name(__u16 prot struct ip_vs_protocol *pp = ip_vs_proto_get(proto); if (pp == NULL || pp->state_name == NULL) - return "ERR!"; + return (IPPROTO_IP == proto) ? "NONE" : "ERR!"; return pp->state_name(state); } diff -urp v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_ah.c linux/net/ipv4/ipvs/ip_vs_proto_ah.c --- v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_ah.c 2007-07-10 09:18:43.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto_ah.c 2008-04-27 17:50:45.000000000 +0300 @@ -160,6 +160,7 @@ static void ah_exit(struct ip_vs_protoco struct ip_vs_protocol ip_vs_protocol_ah = { .name = "AH", .protocol = IPPROTO_AH, + .num_states = 1, .dont_defrag = 1, .init = ah_init, .exit = ah_exit, diff -urp v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_esp.c linux/net/ipv4/ipvs/ip_vs_proto_esp.c --- v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_esp.c 2008-04-27 17:49:41.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_proto_esp.c 2008-04-27 17:50:45.000000000 +0300 @@ -159,6 +159,7 @@ static void esp_exit(struct ip_vs_protoc struct ip_vs_protocol ip_vs_protocol_esp = { .name = "ESP", .protocol = IPPROTO_ESP, + .num_states = 1, .dont_defrag = 1, .init = esp_init, .exit = esp_exit, diff -urp v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_tcp.c linux/net/ipv4/ipvs/ip_vs_proto_tcp.c --- v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_tcp.c 2008-01-25 10:45:06.000000000 +0200 +++ linux/net/ipv4/ipvs/ip_vs_proto_tcp.c 2008-04-27 17:50:45.000000000 +0300 @@ -594,6 +594,7 @@ static void ip_vs_tcp_exit(struct ip_vs_ struct ip_vs_protocol ip_vs_protocol_tcp = { .name = "TCP", .protocol = IPPROTO_TCP, + .num_states = IP_VS_TCP_S_LAST, .dont_defrag = 0, .appcnt = ATOMIC_INIT(0), .init = ip_vs_tcp_init, diff -urp v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_udp.c linux/net/ipv4/ipvs/ip_vs_proto_udp.c --- v2.6.24/linux/net/ipv4/ipvs/ip_vs_proto_udp.c 2008-01-25 10:45:06.000000000 +0200 +++ linux/net/ipv4/ipvs/ip_vs_proto_udp.c 2008-04-27 17:50:45.000000000 +0300 @@ -409,6 +409,7 @@ static void udp_exit(struct ip_vs_protoc struct ip_vs_protocol ip_vs_protocol_udp = { .name = "UDP", .protocol = IPPROTO_UDP, + .num_states = IP_VS_UDP_S_LAST, .dont_defrag = 0, .init = udp_init, .exit = udp_exit, diff -urp v2.6.24/linux/net/ipv4/ipvs/ip_vs_sync.c linux/net/ipv4/ipvs/ip_vs_sync.c --- v2.6.24/linux/net/ipv4/ipvs/ip_vs_sync.c 2008-04-27 17:49:41.000000000 +0300 +++ linux/net/ipv4/ipvs/ip_vs_sync.c 2008-04-27 18:01:45.000000000 +0300 @@ -288,11 +288,16 @@ static void ip_vs_process_message(const char *p; int i; + if (buflen < sizeof(struct ip_vs_sync_mesg)) { + IP_VS_ERR_RL("sync message header too short\n"); + return; + } + /* Convert size back to host byte order */ m->size = ntohs(m->size); if (buflen != m->size) { - IP_VS_ERR("bogus message\n"); + IP_VS_ERR_RL("bogus sync message size\n"); return; } @@ -305,10 +310,50 @@ static void ip_vs_process_message(const p = (char *)buffer + sizeof(struct ip_vs_sync_mesg); for (i=0; i<m->nr_conns; i++) { - unsigned flags; + unsigned flags, state; - s = (struct ip_vs_sync_conn *)p; + if (p + SIMPLE_CONN_SIZE > buffer+buflen) { + IP_VS_ERR_RL("bogus conn in sync message\n"); + return; + } + s = (struct ip_vs_sync_conn *) p; flags = ntohs(s->flags); + flags &= ~IP_VS_CONN_F_HASHED; + if (flags & IP_VS_CONN_F_SEQ_MASK) { + opt = (struct ip_vs_sync_conn_options *)&s[1]; + p += FULL_CONN_SIZE; + if (p > buffer+buflen) { + IP_VS_ERR_RL("bogus conn options in sync message\n"); + return; + } + } else { + opt = NULL; + p += SIMPLE_CONN_SIZE; + } + + state = ntohs(s->state); + if (!(flags & IP_VS_CONN_F_TEMPLATE)) { + pp = ip_vs_proto_get(s->protocol); + if (!pp) { + IP_VS_ERR_RL("Unsupported protocol %u in sync msg\n", + s->protocol); + continue; + } + if (state >= pp->num_states) { + IP_VS_DBG(2, "Invalid %s state %u in sync msg\n", + pp->name, state); + continue; + } + } else { + /* protocol in templates is not used for state/timeout */ + pp = NULL; + if (state > 0) { + IP_VS_DBG(2, "Invalid template state %u in sync msg\n", + state); + state = 0; + } + } + if (!(flags & IP_VS_CONN_F_TEMPLATE)) cp = ip_vs_conn_in_get(s->protocol, s->caddr, s->cport, @@ -337,35 +382,31 @@ static void ip_vs_process_message(const IP_VS_ERR("ip_vs_conn_new failed\n"); return; } - cp->state = ntohs(s->state); } else if (!cp->dest) { dest = ip_vs_try_bind_dest(cp); - if (!dest) { - /* it is an unbound entry created by - * synchronization */ - cp->flags = flags | IP_VS_CONN_F_HASHED; - } else + if (dest) atomic_dec(&dest->refcnt); } /* Note that we don't touch its state and flags if it is a normal entry. */ - if (flags & IP_VS_CONN_F_SEQ_MASK) { - opt = (struct ip_vs_sync_conn_options *)&s[1]; + if (opt) memcpy(&cp->in_seq, opt, sizeof(*opt)); - p += FULL_CONN_SIZE; - } else - p += SIMPLE_CONN_SIZE; atomic_set(&cp->in_pkts, sysctl_ip_vs_sync_threshold[0]); - cp->state = ntohs(s->state); - pp = ip_vs_proto_get(s->protocol); - cp->timeout = pp->timeout_table[cp->state]; + cp->state = state; + cp->old_state = cp->state; + /* + * We can not recover the right timeout for templates + * in all cases, we can not find the right fwmark + * virtual service. If needed, we can do it for + * non-fwmark persistent services. + */ + if (!(flags & IP_VS_CONN_F_TEMPLATE) && pp->timeout_table) + cp->timeout = pp->timeout_table[state]; + else + cp->timeout = (3*60*HZ); ip_vs_conn_put(cp); - if (p > buffer+buflen) { - IP_VS_ERR("bogus message\n"); - return; - } } } Yes, this patch works fine in 2.6.24.4, thanks. From: Julian Anastasov <ja@ssi.bg> Date: Sat, 26 Apr 2008 20:48:46 +0300 (EEST) > Result from ip_vs_proto_get() should be checked because > protocol value can be invalid or unsupported in backup. Also, add > checks to validate message limits and connection state. > > Signed-off-by: Julian Anastasov <ja@ssi.bg> This patch looks great, and fixes real bugs regardless of the effect upon the bug report in question. I would really like to see this fix tested, because if it fixes the specific bug, all the better, and we should indicate this in the changelog message. Reply-To: akpm@linux-foundation.org On Sun, 27 Apr 2008 16:40:16 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > From: Julian Anastasov <ja@ssi.bg> > Date: Sat, 26 Apr 2008 20:48:46 +0300 (EEST) > > > Result from ip_vs_proto_get() should be checked because > > protocol value can be invalid or unsupported in backup. Also, add > > checks to validate message limits and connection state. > > > > Signed-off-by: Julian Anastasov <ja@ssi.bg> > > This patch looks great, and fixes real bugs regardless of the > effect upon the bug report in question. > > I would really like to see this fix tested, because if it fixes > the specific bug, all the better, and we should indicate this in > the changelog message. The reporter (L0op8ack <l0op8ack@hotmail.com>) has used the bugzilla web interface to confirm that this patch fixes 2.6.24.4. From: Andrew Morton <akpm@linux-foundation.org> Date: Tue, 6 May 2008 13:10:54 -0700 > On Sun, 27 Apr 2008 16:40:16 -0700 (PDT) > David Miller <davem@davemloft.net> wrote: > > > From: Julian Anastasov <ja@ssi.bg> > > Date: Sat, 26 Apr 2008 20:48:46 +0300 (EEST) > > > > > Result from ip_vs_proto_get() should be checked because > > > protocol value can be invalid or unsupported in backup. Also, add > > > checks to validate message limits and connection state. > > > > > > Signed-off-by: Julian Anastasov <ja@ssi.bg> > > > > This patch looks great, and fixes real bugs regardless of the > > effect upon the bug report in question. > > > > I would really like to see this fix tested, because if it fixes > > the specific bug, all the better, and we should indicate this in > > the changelog message. > > The reporter (L0op8ack <l0op8ack@hotmail.com>) has used the bugzilla web > interface to confirm that this patch fixes 2.6.24.4. Great, thanks for the update. |