Bug 35672

Summary: System blocking at start VPS (lxc-start)
Product: Networking
Reporter: Giuseppe Tofoni (gt0057)
Component: IPV4
Assignee: Stephen Hemminger (stephen)
Status: RESOLVED CODE_FIX
Severity: blocking
CC: eric.dumazet, florian
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.39
Subsystem:
Regression: No
Bisected commit-id:

Description Giuseppe Tofoni 2011-05-23 09:18:09 UTC
File /var/log/syslog (summary)

May 23 02:04:24 lxc kernel: [99498.329036] BUG: unable to handle kernel NULL pointer dereference at 00000004
May 23 02:04:24 lxc kernel: [99498.330017] IP: [<c143d6bf>] dst_mtu+0xb/0x1c
May 23 02:04:24 lxc kernel: [99498.330017] *pdpt = 000000001fb55001 *pde = 0000000000000000
May 23 02:04:24 lxc kernel: [99498.330017] Oops: 0000 [#1] SMP
May 23 02:04:24 lxc kernel: [99498.330017] last sysfs file: /sys/devices/virtual/vc/vcsa8/uevent
May 23 02:04:24 lxc kernel: [99498.330017] Modules linked in: lp ppdev parport_pc parport fuse firewire_ohci firewire_core crc_itu_t intel_agp intel_gtt
May 23 02:04:24 lxc kernel: [99498.330017]
May 23 02:04:24 lxc kernel: [99498.330017] Pid: 0, comm: swapper Not tainted 2.6.39-lxc #2 .   .  /IP35 Pro XE(Intel P35-ICH9R)
May 23 02:04:24 lxc kernel: [99498.330017] EIP: 0060:[<c143d6bf>] EFLAGS: 00010246 CPU: 0
May 23 02:04:24 lxc kernel: [99498.330017] EIP is at dst_mtu+0xb/0x1c
May 23 02:04:24 lxc kernel: [99498.330017] EAX: 00000000 EBX: e90b6b40 ECX: effc981c EDX: effc9000
May 23 02:04:24 lxc kernel: [99498.330017] ESI: c1a0d84e EDI: dda6331e EBP: f080bb44 ESP: f080bb44
May 23 02:04:24 lxc kernel: [99498.330017]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
May 23 02:04:24 lxc kernel: [99498.330017] Process swapper (pid: 0, ti=f080a000 task=c172b7e0 task.ti=c1724000)
May 23 02:04:24 lxc kernel: [99498.330017] Stack:
May 23 02:04:24 lxc kernel: [99498.330017]  f080bb8c c143e20d 00000004 f080bb88 c141aab2 c14b46db effc9000 00000014
May 23 02:04:24 lxc kernel: [99498.330017]  c14b8a44 effc9000 e90b6b40 00000014 effc981c e90b6b58 cd472800 e90b6b40
May 23 02:04:24 lxc kernel: [99498.330017]  c14b8a44 dda6331e f080bb98 c14b8aa0 e90b6b40 f080bba8 c14b881a e90b6b40
May 23 02:04:24 lxc kernel: [99498.330017] Call Trace:
May 23 02:04:24 lxc kernel: [99498.330017]  [<c143e20d>] ip_fragment+0xb5/0x66c
May 23 02:04:24 lxc kernel: [99498.330017]  [<c141aab2>] ? nf_hook_slow+0x43/0xd1
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b46db>] ? br_flood+0x83/0x83
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8a44>] ? br_parse_ip_options+0x1b0/0x1b0
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8a44>] ? br_parse_ip_options+0x1b0/0x1b0
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8aa0>] br_nf_dev_queue_xmit+0x5c/0x68
Comment 1 Stephen Hemminger 2011-05-23 16:47:10 UTC
What kernel version?

I think this is probably fixed by the several commits in 2.6.39 related to the bridge netfilter and IP options code.
Comment 2 Giuseppe Tofoni 2011-05-24 10:37:22 UTC
The kernel version is 2.6.39; the OS is Slackware 13.37.
Comment 3 Giuseppe Tofoni 2011-05-24 11:24:53 UTC
I have the same problem with kernel 2.6.38.6 (OS: Slackware 13.1).
File /var/log/syslog (summary)
May 23 12:36:56 lxc kernel: [   33.448046] WARNING: at kernel/timer.c:983 del_timer_sync+0x25/0x37()
May 23 12:36:56 lxc kernel: [   33.448048] Hardware name: .
May 23 12:36:56 lxc kernel: [   33.448050] Modules linked in: floppy
May 24 11:49:02 lxc kernel: [   35.591786] Pid: 0, comm: kworker/0:0 Not tainted 2.6.38.6-lxc #1
May 24 11:49:02 lxc kernel: [   35.591788] Call Trace:
May 24 11:49:02 lxc kernel: [   35.591793]  [<c103aa34>] ? warn_slowpath_common+0x65/0x7a
May 24 11:49:02 lxc kernel: [   35.591797]  [<c1045790>] ? del_timer_sync+0x25/0x37
May 24 11:49:02 lxc kernel: [   35.591800]  [<c103aa58>] ? warn_slowpath_null+0xf/0x13
May 24 11:49:02 lxc kernel: [   35.591804]  [<c1045790>] ? del_timer_sync+0x25/0x37
May 24 11:49:02 lxc kernel: [   35.591808]  [<c145e68b>] ? linkwatch_schedule_work+0x67/0x82
May 24 11:49:02 lxc kernel: [   35.591812]  [<c145e74e>] ? linkwatch_fire_event+0xa8/0xad
May 24 11:49:02 lxc kernel: [   35.591816]  [<c146ccbb>] ? netif_carrier_on+0x23/0x34
May 24 11:49:02 lxc kernel: [   35.591820]  [<c13d309e>] ? __rtl8169_check_link_status+0x33/0x75
May 24 11:49:02 lxc kernel: [   35.591824]  [<c13d38ec>] ? rtl8169_interrupt+0x1e3/0x282
May 24 11:49:02 lxc kernel: [   35.591829]  [<c107aa6c>] ? handle_IRQ_event+0x49/0xf6
May 24 11:49:02 lxc kernel: [   35.591832]  [<c107c3e3>] ? handle_fasteoi_irq+0x7b/0xb6
May 24 11:49:02 lxc kernel: [   35.591835]  [<c107c368>] ? handle_fasteoi_irq+0x0/0xb6
May 24 11:49:02 lxc kernel: [   35.591837]  <IRQ>  [<c100a3a1>] ? do_IRQ+0x37/0x90
May 24 11:49:02 lxc kernel: [   35.591843]  [<c10098a9>] ? common_interrupt+0x29/0x30
May 24 11:49:02 lxc kernel: [   35.591847]  [<c100f234>] ? mwait_idle+0x7d/0xaa
May 24 11:49:02 lxc kernel: [   35.591850]  [<c1008672>] ? cpu_idle+0x49/0x63
May 24 11:49:02 lxc kernel: [   35.591854]  [<c158c9dd>] ? start_secondary+0x16c/0x171
May 24 11:49:02 lxc kernel: [   35.591857] ---[ end trace 224f2f2281917e3a ]---
May 24 11:49:02 lxc kernel: [   35.628141] ------------[ cut here ]------------
May 24 11:49:02 lxc kernel: [   35.628147] WARNING: at kernel/timer.c:983 del_timer_sync+0x25/0x37()
May 24 11:49:02 lxc kernel: [   35.628149] Hardware name: .
May 24 11:49:02 lxc kernel: [   35.628151] Modules linked in: floppy
May 24 11:49:02 lxc kernel: [   35.628155] Pid: 2160, comm: rc.S Tainted: G        W   2.6.38.6-lxc #1
May 24 11:49:02 lxc kernel: [   35.628157] Call Trace:
May 24 11:49:02 lxc kernel: [   35.628161]  [<c103aa34>] ? warn_slowpath_common+0x65/0x7a
May 24 11:49:02 lxc kernel: [   35.628165]  [<c1045790>] ? del_timer_sync+0x25/0x37
May 24 11:49:02 lxc kernel: [   35.628168]  [<c103aa58>] ? warn_slowpath_null+0xf/0x13
May 24 11:49:02 lxc kernel: [   35.628172]  [<c1045790>] ? del_timer_sync+0x25/0x37
May 24 11:49:02 lxc kernel: [   35.628175]  [<c145e68b>] ? linkwatch_schedule_work+0x67/0x82
May 24 11:49:02 lxc kernel: [   35.628179]  [<c145e74e>] ? linkwatch_fire_event+0xa8/0xad
May 24 11:49:02 lxc kernel: [   35.628182]  [<c146ccbb>] ? netif_carrier_on+0x23/0x34
May 24 11:49:02 lxc kernel: [   35.628186]  [<c13d309e>] ? __rtl8169_check_link_status+0x33/0x75
May 24 11:49:02 lxc kernel: [   35.628190]  [<c13d38ec>] ? rtl8169_interrupt+0x1e3/0x282
May 24 11:49:02 lxc kernel: [   35.628194]  [<c107aa6c>] ? handle_IRQ_event+0x49/0xf6
May 24 11:49:02 lxc kernel: [   35.628197]  [<c107c3e3>] ? handle_fasteoi_irq+0x7b/0xb6
May 24 11:49:02 lxc kernel: [   35.628200]  [<c107c368>] ? handle_fasteoi_irq+0x0/0xb6
May 24 11:49:02 lxc kernel: [   35.628202]  <IRQ>  [<c100a3a1>] ? do_IRQ+0x37/0x90
May 24 11:49:02 lxc kernel: [   35.628208]  [<c10098a9>] ? common_interrupt+0x29/0x30
May 24 11:49:02 lxc kernel: [   35.628212]  [<c159136c>] ? system_call+0x0/0x3e
May 24 11:49:02 lxc kernel: [   35.628214] ---[ end trace 224f2f2281917e3b ]---
Comment 4 Eric Dumazet 2011-05-24 16:54:00 UTC
Please try the following patch:

http://patchwork.ozlabs.org/patch/97181/

Thanks for the report.
Comment 5 Giuseppe Tofoni 2011-05-25 08:29:26 UTC
After applying the patch, the system still reports a problem, but with a different error:

May 25 09:51:35 lxc kernel: [   36.269368] WARNING: at kernel/timer.c:1012 del_timer_sync+0x25/0x37()
May 25 09:51:35 lxc kernel: [   36.269370] Hardware name: .
May 25 09:51:35 lxc kernel: [   36.269372] Modules linked in: lp ppdev parport_pc parport fuse firewire_ohci firewire_core crc_itu_t intel_agp intel_gtt
May 25 09:51:35 lxc kernel: [   36.269383] Pid: 0, comm: kworker/0:0 Not tainted 2.6.39-lxc #2
May 25 09:51:35 lxc kernel: [   36.269385] Call Trace:
May 25 09:51:35 lxc kernel: [   36.269391]  [<c1030c91>] warn_slowpath_common+0x65/0x7a
May 25 09:51:35 lxc kernel: [   36.269394]  [<c103b839>] ? del_timer_sync+0x25/0x37
May 25 09:51:35 lxc kernel: [   36.269398]  [<c1030cb5>] warn_slowpath_null+0xf/0x13
May 25 09:51:35 lxc kernel: [   36.269401]  [<c103b839>] del_timer_sync+0x25/0x37
May 25 09:51:35 lxc kernel: [   36.269406]  [<c1430f93>] linkwatch_schedule_work+0x67/0x82
May 25 09:51:35 lxc kernel: [   36.269409]  [<c1431056>] linkwatch_fire_event+0xa8/0xad
May 25 09:51:35 lxc kernel: [   36.269412]  [<c1438728>] netif_carrier_on+0x23/0x34
May 25 09:51:35 lxc kernel: [   36.269417]  [<c134e7ca>] __rtl8169_check_link_status+0x33/0x75
May 25 09:51:35 lxc kernel: [   36.269420]  [<c134f05f>] rtl8169_interrupt+0x1e3/0x282
May 25 09:51:35 lxc kernel: [   36.269424]  [<c1072960>] ? handle_edge_irq+0xa4/0xa4
May 25 09:51:35 lxc kernel: [   36.269428]  [<c1070ea5>] handle_irq_event_percpu+0x4e/0x15e
May 25 09:51:35 lxc kernel: [   36.269431]  [<c1072960>] ? handle_edge_irq+0xa4/0xa4
May 25 09:51:35 lxc kernel: [   36.269434]  [<c1070fd9>] handle_irq_event+0x24/0x3b
May 25 09:51:35 lxc kernel: [   36.269437]  [<c1072960>] ? handle_edge_irq+0xa4/0xa4
May 25 09:51:35 lxc kernel: [   36.269440]  [<c10729c9>] handle_fasteoi_irq+0x69/0x82
May 25 09:51:35 lxc kernel: [   36.269442]  <IRQ>  [<c100313c>] ? do_IRQ+0x37/0x90
May 25 09:51:35 lxc kernel: [   36.269449]  [<c154eaa9>] ? common_interrupt+0x29/0x30
May 25 09:51:35 lxc kernel: [   36.269453]  [<c1007c63>] ? mwait_idle+0x7d/0xa0
May 25 09:51:35 lxc kernel: [   36.269456]  [<c1001b1b>] ? cpu_idle+0x44/0x61
May 25 09:51:35 lxc kernel: [   36.269460]  [<c1544d80>] ? start_secondary+0x162/0x167
May 25 09:51:35 lxc kernel: [   36.269463] ---[ end trace 9089c2d412bfbd9d ]---

Thanks for the previous patch.
Comment 6 Eric Dumazet 2011-05-25 08:37:40 UTC
Well, that's a totally different problem ;)

It's a bug in rtl8169 this time.
Comment 7 Giuseppe Tofoni 2011-05-26 19:10:20 UTC
One more question: is the patch at http://patchwork.ozlabs.org/patch/97181/ valid for kernel 2.6.38.7?

Is the rtl8169 bug also present in kernel 2.6.38.7? I see the same warning there:

WARNING: at kernel/timer.c:983 del_timer_sync+0x25/0x37()
May 26 13:05:39 lxc kernel: [   33.258294] Hardware name: .
May 26 13:05:39 lxc kernel: [   33.258296] Modules linked in: floppy
May 26 13:05:39 lxc kernel: [   33.258301] Pid: 0, comm: swapper Not tainted 2.6.38.7-lxc #1
May 26 13:05:39 lxc kernel: [   33.258302] Call Trace:
May 26 13:05:39 lxc kernel: [   33.258308]  [<c103aac0>] ? warn_slowpath_common+0x65/0x7a
May 26 13:05:39 lxc kernel: [   33.258311]  [<c1045864>] ? del_timer_sync+0x25/0x37
May 26 13:05:39 lxc kernel: [   33.258315]  [<c103aae4>] ? warn_slowpath_null+0xf/0x13
May 26 13:05:39 lxc kernel: [   33.258318]  [<c1045864>] ? del_timer_sync+0x25/0x37
May 26 13:05:39 lxc kernel: [   33.258323]  [<c145e713>] ? linkwatch_schedule_work+0x67/0x82
May 26 13:05:39 lxc kernel: [   33.258327]  [<c145e7d6>] ? linkwatch_fire_event+0xa8/0xad
May 26 13:05:39 lxc kernel: [   33.258331]  [<c146cd43>] ? netif_carrier_on+0x23/0x34
May 26 13:05:39 lxc kernel: [   33.258336]  [<c13d3122>] ? __rtl8169_check_link_status+0x33/0x75
May 26 13:05:39 lxc kernel: [   33.258340]  [<c13d3970>] ? rtl8169_interrupt+0x1e3/0x282
May 26 13:05:39 lxc kernel: [   33.258344]  [<c107ab4c>] ? handle_IRQ_event+0x49/0xf6
May 26 13:05:39 lxc kernel: [   33.258348]  [<c107c4c3>] ? handle_fasteoi_irq+0x7b/0xb6
May 26 13:05:39 lxc kernel: [   33.258351]  [<c107c448>] ? handle_fasteoi_irq+0x0/0xb6
May 26 13:05:39 lxc kernel: [   33.258353]  <IRQ>  [<c100a3a1>] ? do_IRQ+0x37/0x90
May 26 13:05:39 lxc kernel: [   33.258359]  [<c10098a9>] ? common_interrupt+0x29/0x30
May 26 13:05:39 lxc kernel: [   33.258362]  [<c100f234>] ? mwait_idle+0x7d/0xaa
May 26 13:05:39 lxc kernel: [   33.258366]  [<c1008672>] ? cpu_idle+0x49/0x63
May 26 13:05:39 lxc kernel: [   33.258369]  [<c1572544>] ? rest_init+0x58/0x5a
May 26 13:05:39 lxc kernel: [   33.258373]  [<c182b8d3>] ? start_kernel+0x2f7/0x2fc
May 26 13:05:39 lxc kernel: [   33.258377]  [<c182b0e3>] ? i386_start_kernel+0xd2/0xd9
May 26 13:05:39 lxc kernel: [   33.258379] ---[ end trace 4863c1e34725b8db ]---


Thanks for your help.
Comment 8 Eric Dumazet 2011-05-26 19:25:54 UTC
The patch is for 2.6.39 only.

For the rtl8169 bug, I have no idea. You should open another bugzilla entry so that other people can take a look.
Comment 9 Florian Mickler 2011-05-30 07:59:37 UTC
A patch referencing this bug report has been merged in v3.0-rc1:

commit 33eb9873a283a2076f2b5628813d5365ca420ea9
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Tue May 24 13:32:18 2011 -0400

    bridge: initialize fake_rtable metrics
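
Note: the fault address in the original oops (00000004, with EAX: 00000000) is consistent with a metric lookup through a NULL metrics pointer. The userspace sketch below models that arithmetic only. It assumes the MTU metric is read at index RTAX_MTU - 1 of an array of 32-bit values, with RTAX_MTU equal to 2 as in the kernel's rtnetlink header. The names model_dst, metric_offset and model_dst_mtu are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors the kernel's RTAX_MTU value (2). */
enum { MODEL_RTAX_MTU = 2 };

/* Hypothetical stand-in for a route entry: only the metrics pointer. */
struct model_dst {
    uint32_t *metrics;
};

/* Byte offset read by a lookup of `metric`, relative to the metrics base.
 * With a NULL base, this offset is also the faulting address. */
size_t metric_offset(int metric)
{
    return (size_t)(metric - 1) * sizeof(uint32_t);
}

/* Guarded lookup for illustration: returns -1 instead of dereferencing
 * a NULL metrics pointer, 0 on success with the MTU stored in *mtu. */
int model_dst_mtu(const struct model_dst *dst, uint32_t *mtu)
{
    assert(dst != NULL && mtu != NULL);
    if (dst->metrics == NULL)
        return -1;
    *mtu = dst->metrics[MODEL_RTAX_MTU - 1];
    return 0;
}
```

Under these assumptions, metric_offset(MODEL_RTAX_MTU) is 4, which matches the reported address. The commit title suggests the fix gives the bridge's fake rtable a valid metrics array, so the lookup no longer starts from a NULL base.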
Comment 10 Giuseppe Tofoni 2011-06-03 15:52:19 UTC
The original problem no longer occurs with kernel 2.6.39-git12, but I still have the problem with rtl8169. I am now using netconsole for better debugging and will open another bugzilla entry.
Thanks again for your help.