Bug 11718 - Can't classify problem. Maybe hrtimer data structures got wrecked
Summary: Can't classify problem. Maybe hrtimer data structures got wrecked
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: IPV4
Hardware: All Linux
Importance: P1 high
Assignee: Arnaldo Carvalho de Melo
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-08 02:12 UTC by Badalian Slava
Modified: 2009-01-29 23:38 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.27
Tree: Mainline
Regression: ---


Description Badalian Slava 2008-10-08 02:12:11 UTC
We have 10 identical servers that do nothing but traffic shaping (tc with htb, u32 and sfq, plus
iptables). A few of them lock up about once a week.
Timer settings in the kernel config:
HZ=300
NO_HZ=n
HIGH_RES_TIMERS=n
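
To compare the timer settings across several machines, something like the following could be used. It is only a sketch: it assumes the options appear in a kernel .config file under their usual CONFIG_ prefix (CONFIG_HZ, CONFIG_NO_HZ, CONFIG_HIGH_RES_TIMERS), and the names timer_settings and SAMPLE_CONFIG are made up for this example. The sample text is illustrative, not a copy of the real config from these servers.

```python
import re

# Timer options named in the report, written with the CONFIG_ prefix that
# .config files use (the prefix is an assumption about the file being read).
TIMER_OPTIONS = ("CONFIG_HZ", "CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS")

# Illustrative sample only; not the actual config of the affected servers.
SAMPLE_CONFIG = """\
CONFIG_HZ_300=y
CONFIG_HZ=300
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
"""


def timer_settings(config_text):
    """Return {option: value} for the timer options found in config_text.

    Options written as '# CONFIG_FOO is not set' are reported as 'n'.
    Options that do not appear at all are left out of the result.
    """
    settings = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        enabled = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if enabled and enabled.group(1) in TIMER_OPTIONS:
            settings[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if disabled and disabled.group(1) in TIMER_OPTIONS:
            settings[disabled.group(1)] = "n"
    return settings


if __name__ == "__main__":
    for option, value in sorted(timer_settings(SAMPLE_CONFIG).items()):
        print(f"{option}={value}")
```

Running the same function over the .config of each server would show quickly whether any machine differs from the settings listed above.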

Server 1:

[321478.840858] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01fafc7, registers:
[321478.840858] Modules linked in: netconsole e1000e i2c_i801 e1000 i2c_core
[321478.840858]
[321478.840858] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[321478.840858] EIP: 0060:[<c01fafc7>] EFLAGS: 00000082 CPU: 3
[321478.840858] EIP is at rb_insert_color+0x17/0xc0
[321478.840858] EAX: f10294a4 EBX: f10294a4 ECX: 00000000 EDX: f10294a4
[321478.840858] ESI: f10294a4 EDI: f10294a4 EBP: c202d0d4 ESP: f7c5fcac
[321478.840858]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[321478.840858] Process swapper (pid: 0, ti=f7c5e000 task=f7c32940 task.ti=f7c5e000)
[321478.840858] Stack: f10294a4 00000000 c202d0cc c202d0d4 c013a8ff f10294a4 c202d0cc c20230cc
[321478.840858]        c04450a0 c013adea 00000000 f7c5fcfc 406ce400 00012461 00000001 00000286
[321478.840858]        f1029000 ffffffff 00000000 00000000 c02d15fe 00000000 f1029000 c02d6da6
[321478.840858] Call Trace:
[321478.840858]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[321478.840858]  [<c013adea>] hrtimer_start+0xaa/0x130
[321478.840858]  [<c02d15fe>] qdisc_watchdog_schedule+0x1e/0x30
[321478.840858]  [<c02d6da6>] htb_dequeue+0x6a6/0x810
[321478.840858]  [<c02d77f7>] sfq_drop+0x1a7/0x260
[321478.840858]  [<c02d77f7>] sfq_drop+0x1a7/0x260
[321478.840858]  [<c02d14f2>] tc_classify+0x42/0x90
[321478.840858]  [<c02d71c0>] htb_enqueue+0x0/0x1e0
[321478.840858]  [<c02d047c>] __qdisc_run+0x19c/0x1d0
[321478.840858]  [<c02d71c0>] htb_enqueue+0x0/0x1e0
[321478.840858]  [<c02c4cb7>] dev_queue_xmit+0x267/0x380
[321478.840858]  [<c02e6040>] ip_forward_finish+0x0/0x40
[321478.840858]  [<c02e8bef>] ip_finish_output+0x11f/0x280
[321478.840858]  [<c02e630f>] ip_forward+0x28f/0x2d0
[321478.840858]  [<c02e6065>] ip_forward_finish+0x25/0x40
[321478.840858]  [<c02e4ba2>] ip_rcv_finish+0x122/0x360
[321478.840858]  [<c016d7e9>] add_partial+0x19/0x60
[321478.840858]  [<c016e8d9>] __slab_free+0x169/0x290
[321478.840858]  [<c016e8d9>] __slab_free+0x169/0x290
[321478.840858]  [<c02e5020>] ip_rcv+0x0/0x290
[321478.840858]  [<c02c1b4b>] netif_receive_skb+0x26b/0x470
[321478.840858]  [<f886b74d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[321478.840858]  [<f886e9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[321478.840858]  [<f886af59>] e1000_clean+0x49/0x1f0 [e1000e]
[321478.840858]  [<c02c3f58>] net_rx_action+0xf8/0x1b0
[321478.840858]  [<c012a062>] __do_softirq+0x82/0x100
[321478.840858]  [<c012a117>] do_softirq+0x37/0x40
[321478.840858]  [<c0107120>] do_IRQ+0x40/0x80
[321478.840858]  [<c01055a3>] common_interrupt+0x23/0x28
[321478.840858]  [<c010a5e2>] mwait_idle+0x32/0x40
[321478.840858]  [<c010a5b0>] mwait_idle+0x0/0x40
[321478.840858]  [<c01036e8>] cpu_idle+0x48/0xc0
[321478.840858]  =======================
[321478.840858] Code: 24 83 c4 0c c3 89 56 04 eb e3 8d 76 00 8d bc 27 00 00 00 00 55 89 d5 57 89 c7 56 53 90 8d b4 26 00 00 00 00 8b 1f 83 e3 fc 74 32 <8b> 03 89 d9 a8 01 75 2a 89 c6 83 e6 fc 8b 56 08 39 d3 74 45 85
[ 2251.728719] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01fafd4, registers:
[ 2251.728719] Modules linked in: netconsole i2c_i801 i2c_core e1000e e1000
[ 2251.728719]
[ 2251.728719] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[ 2251.728719] EIP: 0060:[<c01fafd4>] EFLAGS: 00000082 CPU: 3
[ 2251.728719] EIP is at rb_insert_color+0x24/0xc0
[ 2251.728719] EAX: f6c134a4 EBX: f6c134a4 ECX: f6c134a4 EDX: f6c134a4
[ 2251.728719] ESI: f6c134a4 EDI: f6c134a4 EBP: c202d0d4 ESP: f7c5fcac
[ 2251.728719]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 2251.728719] Process swapper (pid: 0, ti=f7c5e000 task=f7c32940 task.ti=f7c5e000)
[ 2251.728719] Stack: f6c134a4 00000000 c202d0cc c202d0d4 c013a8ff f6c134a4 c202d0cc c20230cc
[ 2251.728719]        c04450a0 c013adea 00000000 f7c5fcfc 392e7c00 0000020c 00000001 00000286
[ 2251.728719]        f6c13000 ffffffff 00000000 00000000 c02d15fe 00000000 f6c13000 c02d6da6
[ 2251.728719] Call Trace:
[ 2251.728719]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[ 2251.728719]  [<c013adea>] hrtimer_start+0xaa/0x130
[ 2251.728719]  [<c02d15fe>] qdisc_watchdog_schedule+0x1e/0x30
[ 2251.728719]  [<c02d6da6>] htb_dequeue+0x6a6/0x810
[ 2251.728719]  [<c02d047c>] __qdisc_run+0x19c/0x1d0
[ 2251.728719]  [<c02d71c0>] htb_enqueue+0x0/0x1e0
[ 2251.728719]  [<c02c4cb7>] dev_queue_xmit+0x267/0x380
[ 2251.728719]  [<c02e6040>] ip_forward_finish+0x0/0x40
[ 2251.728719]  [<c02e8bef>] ip_finish_output+0x11f/0x280
[ 2251.728719]  [<c02e630f>] ip_forward+0x28f/0x2d0
[ 2251.728719]  [<c02e6065>] ip_forward_finish+0x25/0x40
[ 2251.728719]  [<c02e4ba2>] ip_rcv_finish+0x122/0x360
[ 2251.728719]  [<c016d7e9>] add_partial+0x19/0x60
[ 2251.728719]  [<c016e8d9>] __slab_free+0x169/0x290
[ 2251.728719]  [<c02e5020>] ip_rcv+0x0/0x290
[ 2251.728719]  [<c02c1b4b>] netif_receive_skb+0x26b/0x470
[ 2251.728719]  [<f886c74d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[ 2251.728719]  [<f886f9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[ 2251.728719]  [<f886bf59>] e1000_clean+0x49/0x1f0 [e1000e]
[ 2251.728719]  [<c02c3f58>] net_rx_action+0xf8/0x1b0
[ 2251.728719]  [<c012a062>] __do_softirq+0x82/0x100
[ 2251.728719]  [<c012a117>] do_softirq+0x37/0x40
[ 2251.728719]  [<c0107120>] do_IRQ+0x40/0x80
[ 2251.728719]  [<c01055a3>] common_interrupt+0x23/0x28
[ 2251.728719]  [<c010a5e2>] mwait_idle+0x32/0x40
[ 2251.728719]  [<c010a5b0>] mwait_idle+0x0/0x40
[ 2251.728719]  [<c01036e8>] cpu_idle+0x48/0xc0
[ 2251.728719]  =======================
[ 2251.728719] Code: 8d bc 27 00 00 00 00 55 89 d5 57 89 c7 56 53 90 8d b4 26 00 00 00 00 8b 1f 83 e3 fc 74 32 8b 03 89 d9 a8 01 75 2a 89 c6 83 e6 fc <8b> 56 08 39 d3 74 45 85 d2 74 25 8b 02 a8 01 75 1f 83 c8 01 89
[196496.545559] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01faf4c, registers:
[196496.545559] Modules linked in: netconsole i2c_i801 e1000e e1000 i2c_core
[196496.545559]
[196496.545559] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[196496.545559] EIP: 0060:[<c01faf4c>] EFLAGS: 00000096 CPU: 3
[196496.545559] EIP is at __rb_rotate_right+0xc/0x70
[196496.545559] EAX: f741f4a4 EBX: f741f4a4 ECX: f741f4a4 EDX: c202d0d4
[196496.545559] ESI: f741f4a4 EDI: f741f4a4 EBP: c202d0d4 ESP: f7c5fc9c
[196496.545559]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[196496.545559] Process swapper (pid: 0, ti=f7c5e000 task=f7c32940 task.ti=f7c5e000)
[196496.545559] Stack: f741f4a4 f741f4a4 f741f4a4 c01fb041 f741f4a4 00000000 c202d0cc c202d0d4
[196496.545559]        c013a8ff f741f4a4 c202d0cc c20230cc c04450a0 c013adea 00000000 f7c5fcfc
[196496.545559]        8af6d400 0000b2b5 00000001 00000286 f741f000 ffffffff 00000000 00000000
[196496.545559] Call Trace:
[196496.545559]  [<c01fb041>] rb_insert_color+0x91/0xc0
[196496.545559]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[196496.545559]  [<c013adea>] hrtimer_start+0xaa/0x130
[196496.545559]  [<c02d15fe>] qdisc_watchdog_schedule+0x1e/0x30
[196496.545559]  [<c02d6da6>] htb_dequeue+0x6a6/0x810
[196496.545559]  [<c02d047c>] __qdisc_run+0x19c/0x1d0
[196496.545559]  [<c02d71c0>] htb_enqueue+0x0/0x1e0
[196496.545559]  [<c02c4cb7>] dev_queue_xmit+0x267/0x380
[196496.545559]  [<c02e6040>] ip_forward_finish+0x0/0x40
[196496.545559]  [<c02e8bef>] ip_finish_output+0x11f/0x280
[196496.545559]  [<c02e630f>] ip_forward+0x28f/0x2d0
[196496.545559]  [<c02e6065>] ip_forward_finish+0x25/0x40
[196496.545559]  [<c02e4ba2>] ip_rcv_finish+0x122/0x360
[196496.545559]  [<c02be072>] __netdev_alloc_skb+0x22/0x50
[196496.545559]  [<c033091c>] notifier_call_chain+0x3c/0x80
[196496.545559]  [<c02e5020>] ip_rcv+0x0/0x290
[196496.545559]  [<c02c1b4b>] netif_receive_skb+0x26b/0x470
[196496.545559]  [<c02be072>] __netdev_alloc_skb+0x22/0x50
[196496.545559]  [<f886774d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[196496.545559]  [<f886a9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[196496.545559]  [<f8866f59>] e1000_clean+0x49/0x1f0 [e1000e]
[196496.545559]  [<c02c3f58>] net_rx_action+0xf8/0x1b0
[196496.545559]  [<c012a062>] __do_softirq+0x82/0x100
[196496.545559]  [<c012a117>] do_softirq+0x37/0x40
[196496.545559]  [<c0107120>] do_IRQ+0x40/0x80
[196496.545559]  [<c0114077>] smp_apic_timer_interrupt+0x57/0x90
[196496.545559]  [<c01055a3>] common_interrupt+0x23/0x28
[196496.545559]  [<c010a5e2>] mwait_idle+0x32/0x40
[196496.545559]  [<c010a5b0>] mwait_idle+0x0/0x40
[196496.545559]  [<c01036e8>] cpu_idle+0x48/0xc0
[196496.545559]  =======================
[196496.545559] Code: 24 08 83 e0 03 09 d0 89 03 8b 1c 24 83 c4 0c c3 89 56 08 eb e3 8d 76 00 8d bc 27 00 00 00 00 83 ec 0c 89 1c 24 89 c3 89 7c 24 08 <89> d7 89 74 24 04 8b 50 08 8b 30 8b 4a 04 83 e6 fc 85 c9 89 48
[23749.920305] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01faed0, registers:
[23749.920305] Modules linked in: netconsole e1000e i2c_i801 e1000 i2c_core
[23749.920305]
[23749.920305] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[23749.920305] EIP: 0060:[<c01faed0>] EFLAGS: 00000082 CPU: 3
[23749.920305] EIP is at __rb_rotate_left+0x0/0x70
[23749.920305] EAX: f72914a4 EBX: f72914a4 ECX: f72914a4 EDX: c202d0d4
[23749.920305] ESI: f72914a4 EDI: f72914a4 EBP: c202d0d4 ESP: f7c5fca8
[23749.920305]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[23749.920305] Process swapper (pid: 0, ti=f7c5e000 task=f7c32940 task.ti=f7c5e000)
[23749.920305] Stack: c01fb018 f72914a4 00000000 c202d0cc c202d0d4 c013a8ff f72914a4 c202d0cc
[23749.920305]        c20190cc c04450a0 c013adea 00000000 f7c5fcfc d5dfdc00 00001598 00000001
[23749.920305]        00000286 f7291000 ffffffff 00000000 00000000 c02d15fe 00000000 f7291000
[23749.920305] Call Trace:
[23749.920305]  [<c01fb018>] rb_insert_color+0x68/0xc0
[23749.920305]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[23749.920305]  [<c013adea>] hrtimer_start+0xaa/0x130
[23749.920305]  [<c02d15fe>] qdisc_watchdog_schedule+0x1e/0x30
[23749.920305]  [<c02d6da6>] htb_dequeue+0x6a6/0x810
[23749.920305]  [<c02d047c>] __qdisc_run+0x19c/0x1d0
[23749.920305]  [<c02d71c0>] htb_enqueue+0x0/0x1e0
[23749.920305]  [<c02c4cb7>] dev_queue_xmit+0x267/0x380
[23749.920305]  [<c02e6040>] ip_forward_finish+0x0/0x40
[23749.920305]  [<c02e8bef>] ip_finish_output+0x11f/0x280
[23749.920305]  [<c02e630f>] ip_forward+0x28f/0x2d0
[23749.920305]  [<c02e6065>] ip_forward_finish+0x25/0x40
[23749.920305]  [<c02e4ba2>] ip_rcv_finish+0x122/0x360
[23749.920305]  [<c02bd007>] __alloc_skb+0x57/0x120
[23749.920305]  [<c02e5020>] ip_rcv+0x0/0x290
[23749.920305]  [<c02c1b4b>] netif_receive_skb+0x26b/0x470
[23749.920305]  [<c02be072>] __netdev_alloc_skb+0x22/0x50
[23749.920305]  [<f886774d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[23749.920305]  [<f886a9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[23749.920305]  [<f8866f59>] e1000_clean+0x49/0x1f0 [e1000e]
[23749.920305]  [<c02c3f58>] net_rx_action+0xf8/0x1b0
[23749.920305]  [<c012a062>] __do_softirq+0x82/0x100
[23749.920305]  [<c012a117>] do_softirq+0x37/0x40
[23749.920305]  [<c0107120>] do_IRQ+0x40/0x80
[23749.920305]  [<c01055a3>] common_interrupt+0x23/0x28
[23749.920305]  [<c010a5e2>] mwait_idle+0x32/0x40
[23749.920305]  [<c010a5b0>] mwait_idle+0x0/0x40
[23749.920305]  [<c01036e8>] cpu_idle+0x48/0xc0
[23749.920305]  =======================
[23749.920305] Code: 35 3a c0 e8 c3 b0 f2 ff b8 01 00 00 00 8b 5c 24 0c 8b 74 24 10 8b 7c 24 14 83 c4 18 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <83> ec 0c 89 1c 24 89 c3 89 7c 24 08 89 d7 89 74 24 04 8b 50 04


Server 2:

[17053.718192] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01fb02d, registers:
[17053.718192] Modules linked in: netconsole e1000e i2c_i801 e1000 i2c_core
[17053.718192]
[17053.718192] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[17053.718192] EIP: 0060:[<c01fb02d>] EFLAGS: 00000046 CPU: 1
[17053.718192] EIP is at rb_insert_color+0x7d/0xc0
[17053.718192] EAX: f3412ca4 EBX: f3412ca4 ECX: f3412ca4 EDX: 00000000
[17053.718192] ESI: f3412ca4 EDI: f3412ca4 EBP: c20190d4 ESP: f7c4dcac
[17053.718192]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[17053.718192] Process swapper (pid: 0, ti=f7c4c000 task=f7c314a0 task.ti=f7c4c000)
[17053.718192] Stack: f3412ca4 00000000 c20190cc c20190d4 c013a8ff f3412ca4 c20190cc c200f0cc
[17053.718192]        c044b0a0 c013adea 00000000 f7c4dcfc c121d800 00000f81 00000001 00000286
[17053.718192]        f3412800 ffffffff 00000000 00000000 c02d527e 00000000 f3412800 c02daa26
[17053.718192] Call Trace:
[17053.718192]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[17053.718192]  [<c013adea>] hrtimer_start+0xaa/0x130
[17053.718192]  [<c02d527e>] qdisc_watchdog_schedule+0x1e/0x30
[17053.718192]  [<c02daa26>] htb_dequeue+0x6a6/0x810
[17053.718192]  [<c02d40fc>] __qdisc_run+0x19c/0x1d0
[17053.718192]  [<c02dae40>] htb_enqueue+0x0/0x1e0
[17053.718192]  [<c02c86b7>] dev_queue_xmit+0x267/0x380
[17053.718192]  [<c02e9cc0>] ip_forward_finish+0x0/0x40
[17053.718192]  [<c02ec86f>] ip_finish_output+0x11f/0x280
[17053.718192]  [<c02e9f8f>] ip_forward+0x28f/0x2d0
[17053.718192]  [<c02e9ce5>] ip_forward_finish+0x25/0x40
[17053.718192]  [<c02e8822>] ip_rcv_finish+0x122/0x360
[17053.718192]  [<c02c0777>] __alloc_skb+0x57/0x120
[17053.718192]  [<c02bfe68>] __kfree_skb+0x8/0x80
[17053.718192]  [<f883565b>] e1000_unmap_and_free_tx_resource+0x5b/0x80 [e1000]
[17053.718192]  [<c02e8ca0>] ip_rcv+0x0/0x290
[17053.718192]  [<c02c54fb>] netif_receive_skb+0x26b/0x470
[17053.718192]  [<f886b74d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[17053.718192]  [<f886e9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[17053.718192]  [<f886af59>] e1000_clean+0x49/0x1f0 [e1000e]
[17053.718192]  [<c02c790b>] net_rx_action+0xfb/0x200
[17053.718192]  [<c012a062>] __do_softirq+0x82/0x100
[17053.718192]  [<c012a117>] do_softirq+0x37/0x40
[17053.718192]  [<c0107120>] do_IRQ+0x40/0x80
[17053.718192]  [<c01055a3>] common_interrupt+0x23/0x28
[17053.718192]  [<c010a5e2>] mwait_idle+0x32/0x40
[17053.718192]  [<c010a5b0>] mwait_idle+0x0/0x40
[17053.718192]  [<c01036e8>] cpu_idle+0x48/0xc0
[17053.718192]  =======================
[17053.718192] Code: 5d c3 3b 7b 08 74 3d 83 09 01 89 ea 89 f0 83 26 fe e8 b8 fe ff ff eb a6 8d b6 00 00 00 00 8b 56 04 85 d2 74 06 8b 02 a8 01 74 b8 <3b> 7b 04 74 23 83 09 01 89 ea 89 f0 83 26 fe e8 ff fe ff ff e9
[75131.217107] BUG: NMI Watchdog detected LOCKUP on CPU0, ip c01fb037, registers:
[75131.217107] Modules linked in: netconsole i2c_i801 e1000e i2c_core e1000
[75131.217107]
[75131.217107] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[75131.217107] EIP: 0060:[<c01fb037>] EFLAGS: 00000086 CPU: 0
[75131.217107] EIP is at rb_insert_color+0x87/0xc0
[75131.217107] EAX: f56bb4a4 EBX: f56bb4a4 ECX: f56bb4a4 EDX: c200f0d4
[75131.217107] ESI: f56bb4a4 EDI: f56bb4a4 EBP: c200f0d4 ESP: c0411cdc
[75131.217107]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[75131.217107] Process swapper (pid: 0, ti=c0410000 task=c03df340 task.ti=c0410000)
[75131.217107] Stack: f56bb4a4 00000000 c200f0cc c200f0d4 c013a8ff f56bb4a4 c200f0cc c200f0cc
[75131.217107]        c044b0a0 c013adea c013d06d c0411d2c d940e400 00004454 00000000 00000286
[75131.217107]        f56bb000 ffffffff 00000000 00000000 c02d527e 00000000 f56bb000 c02daa26
[75131.217107] Call Trace:
[75131.217107]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[75131.217107]  [<c013adea>] hrtimer_start+0xaa/0x130
[75131.217107]  [<c013d06d>] getnstimeofday+0x3d/0xe0
[75131.217107]  [<c02d527e>] qdisc_watchdog_schedule+0x1e/0x30
[75131.217107]  [<c02daa26>] htb_dequeue+0x6a6/0x810
[75131.217107]  [<c02d40fc>] __qdisc_run+0x19c/0x1d0
[75131.217107]  [<c02dae40>] htb_enqueue+0x0/0x1e0
[75131.217107]  [<c02c86b7>] dev_queue_xmit+0x267/0x380
[75131.217107]  [<c02e9cc0>] ip_forward_finish+0x0/0x40
[75131.217107]  [<c02ec86f>] ip_finish_output+0x11f/0x280
[75131.217107]  [<c02e9f8f>] ip_forward+0x28f/0x2d0
[75131.217107]  [<c02e9ce5>] ip_forward_finish+0x25/0x40
[75131.217107]  [<c02e8822>] ip_rcv_finish+0x122/0x360
[75131.217107]  [<c02c17e2>] __netdev_alloc_skb+0x22/0x50
[75131.217107]  [<c02e8ca0>] ip_rcv+0x0/0x290
[75131.217107]  [<c02c54fb>] netif_receive_skb+0x26b/0x470
[75131.217107]  [<c02c17e2>] __netdev_alloc_skb+0x22/0x50
[75131.217107]  [<f886c74d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[75131.217107]  [<f886f9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[75131.217107]  [<f886bf59>] e1000_clean+0x49/0x1f0 [e1000e]
[75131.217107]  [<c02c790b>] net_rx_action+0xfb/0x200
[75131.217107]  [<c012a062>] __do_softirq+0x82/0x100
[75131.217107]  [<c012a117>] do_softirq+0x37/0x40
[75131.217107]  [<c0107120>] do_IRQ+0x40/0x80
[75131.217107]  [<c01055a3>] common_interrupt+0x23/0x28
[75131.217107]  [<c010a5e2>] mwait_idle+0x32/0x40
[75131.217107]  [<c010a5b0>] mwait_idle+0x0/0x40
[75131.217107]  [<c01036e8>] cpu_idle+0x48/0xc0
[75131.217107]  =======================
[75131.217107] Code: 89 ea 89 f0 83 26 fe e8 b8 fe ff ff eb a6 8d b6 00 00 00 00 8b 56 04 85 d2 74 06 8b 02 a8 01 74 b8 3b 7b 04 74 23 83 09 01 89 ea <89> f0 83 26 fe e8 ff fe ff ff e9 7a ff ff ff 89 ea 89 d8 e8 f1
[176617.218140] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01faee7, registers:
[176617.218140] Modules linked in: netconsole e1000e e1000 i2c_i801 i2c_core
[176617.218140]
[176617.218140] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[176617.218140] EIP: 0060:[<c01faee7>] EFLAGS: 00000096 CPU: 1
[176617.218140] EIP is at __rb_rotate_left+0x17/0x70
[176617.218140] EAX: f6c104a4 EBX: f6c104a4 ECX: f6c104a4 EDX: f6c104a4
[176617.218140] ESI: f6c104a4 EDI: c20190d4 EBP: c20190d4 ESP: f7c4dc9c
[176617.218140]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[176617.218140] Process swapper (pid: 0, ti=f7c4c000 task=f7c314a0 task.ti=f7c4c000)
[176617.218140] Stack: f6c104a4 f6c104a4 f6c104a4 c01fb018 f6c104a4 00000000 c20190cc c20190d4
[176617.218140]        c013a8ff f6c104a4 c20190cc c200f0cc c044b0a0 c013adea 00000000 f7c4dcfc
[176617.218140]        06da7c00 0000a0a1 00000001 00000286 f6c10000 ffffffff 00000000 00000000
[176617.218140] Call Trace:
[176617.218140]  [<c01fb018>] rb_insert_color+0x68/0xc0
[176617.218140]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[176617.218140]  [<c013adea>] hrtimer_start+0xaa/0x130
[176617.218140]  [<c02d527e>] qdisc_watchdog_schedule+0x1e/0x30
[176617.218140]  [<c02daa26>] htb_dequeue+0x6a6/0x810
[176617.218140]  [<c02d40fc>] __qdisc_run+0x19c/0x1d0
[176617.218140]  [<c02dae40>] htb_enqueue+0x0/0x1e0
[176617.218140]  [<c02c86b7>] dev_queue_xmit+0x267/0x380
[176617.218140]  [<c02e9cc0>] ip_forward_finish+0x0/0x40
[176617.218140]  [<c02ec86f>] ip_finish_output+0x11f/0x280
[176617.218140]  [<c02e9f8f>] ip_forward+0x28f/0x2d0
[176617.218140]  [<c02e9ce5>] ip_forward_finish+0x25/0x40
[176617.218140]  [<c02e8822>] ip_rcv_finish+0x122/0x360
[176617.218140]  [<c016007b>] split_vma+0xb/0x130
[176617.218140]  [<c016d7e9>] add_partial+0x19/0x60
[176617.218140]  [<c016e8d9>] __slab_free+0x169/0x290
[176617.218140]  [<c02e8ca0>] ip_rcv+0x0/0x290
[176617.218140]  [<c02c54fb>] netif_receive_skb+0x26b/0x470
[176617.218140]  [<c02c17e2>] __netdev_alloc_skb+0x22/0x50
[176617.218140]  [<f886774d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[176617.218140]  [<f886a9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[176617.218140]  [<f8866f59>] e1000_clean+0x49/0x1f0 [e1000e]
[176617.218140]  [<c02c790b>] net_rx_action+0xfb/0x200
[176617.218140]  [<c012a062>] __do_softirq+0x82/0x100
[176617.218140]  [<c012a117>] do_softirq+0x37/0x40
[176617.218140]  [<c0107120>] do_IRQ+0x40/0x80
[176617.218140]  [<c01055a3>] common_interrupt+0x23/0x28
[176617.218140]  [<c010a5e2>] mwait_idle+0x32/0x40
[176617.218140]  [<c010a5b0>] mwait_idle+0x0/0x40
[176617.218140]  [<c01036e8>] cpu_idle+0x48/0xc0
[176617.218140]  =======================
[176617.218140] Code: 24 14 83 c4 18 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 90 83 ec 0c 89 1c 24 89 c3 89 7c 24 08 89 d7 89 74 24 04 8b 50 04 8b 30 <8b> 4a 08 83 e6 fc 85 c9 89 48 04 74 09 8b 01 83 e0 03 09 d8 89
[121302.565021] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01faf10, registers:
[121302.565021] Modules linked in: netconsole e1000e i2c_i801 e1000 i2c_core
[121302.565021]
[121302.565021] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[121302.565021] EIP: 0060:[<c01faf10>] EFLAGS: 00000046 CPU: 1
[121302.565021] EIP is at __rb_rotate_left+0x40/0x70
[121302.565021] EAX: f7d5c4a4 EBX: f7d5c4a4 ECX: 00000000 EDX: f7d5c4a4
[121302.565021] ESI: f7d5c4a4 EDI: c20190d4 EBP: c20190d4 ESP: f7c4dc9c
[121302.565021]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[121302.565021] Process swapper (pid: 0, ti=f7c4c000 task=f7c314a0 task.ti=f7c4c000)
[121302.565021] Stack: f7d5c4a4 f7d5c4a4 f7d5c4a4 c01fb018 f7d5c4a4 00000000 c20190cc c20190d4
[121302.565021]        c013a8ff f7d5c4a4 c20190cc c20230cc c044b0a0 c013adea 00000000 f7c4dcfc
[121302.565021]        1467b400 00006e52 00000001 00000286 f7d5c000 ffffffff 00000000 00000000
[121302.565021] Call Trace:
[121302.565021]  [<c01fb018>] rb_insert_color+0x68/0xc0
[121302.565021]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[121302.565021]  [<c013adea>] hrtimer_start+0xaa/0x130
[121302.565021]  [<c02d527e>] qdisc_watchdog_schedule+0x1e/0x30
[121302.565021]  [<c02daa26>] htb_dequeue+0x6a6/0x810
[121302.565021]  [<c02d40fc>] __qdisc_run+0x19c/0x1d0
[121302.565021]  [<c02dae40>] htb_enqueue+0x0/0x1e0
[121302.565021]  [<c02c86b7>] dev_queue_xmit+0x267/0x380
[121302.565021]  [<c02e9cc0>] ip_forward_finish+0x0/0x40
[121302.565021]  [<c02ec86f>] ip_finish_output+0x11f/0x280
[121302.565021]  [<c02e9f8f>] ip_forward+0x28f/0x2d0
[121302.565021]  [<c02e9ce5>] ip_forward_finish+0x25/0x40
[121302.565021]  [<c02e8822>] ip_rcv_finish+0x122/0x360
[121302.565021]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[121302.565021]  [<c02e8ca0>] ip_rcv+0x0/0x290
[121302.565021]  [<c02c54fb>] netif_receive_skb+0x26b/0x470
[121302.565021]  [<f886774d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[121302.565021]  [<f886a9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[121302.565021]  [<f8866f59>] e1000_clean+0x49/0x1f0 [e1000e]
[121302.565021]  [<c02c790b>] net_rx_action+0xfb/0x200
[121302.565021]  [<c012a062>] __do_softirq+0x82/0x100
[121302.565021]  [<c012a117>] do_softirq+0x37/0x40
[121302.565021]  [<c0107120>] do_IRQ+0x40/0x80
[121302.565021]  [<c01055a3>] common_interrupt+0x23/0x28
[121302.565021]  [<c010a5e2>] mwait_idle+0x32/0x40
[121302.565021]  [<c010a5b0>] mwait_idle+0x0/0x40
[121302.565021]  [<c01036e8>] cpu_idle+0x48/0xc0
[121302.565021]  =======================
[121302.565021] Code: 8b 30 8b 4a 08 83 e6 fc 85 c9 89 48 04 74 09 8b 01 83 e0 03 09 d8 89 01 8b 02 89 5a 08 83 e0 03 09 f0 85 f6 89 02 74 0a 3b 5e 08 <74> 1f 89 56 04 eb 02 89 17 8b 03 8b 74 24 04 8b 7c 24 08 83 e0
[96112.953448] BUG: NMI Watchdog detected LOCKUP on CPU1, ip c01faf93, registers:
[96112.953448] Modules linked in: netconsole e1000e i2c_i801 e1000 i2c_core
[96112.953448]
[96112.953448] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[96112.953448] EIP: 0060:[<c01faf93>] EFLAGS: 00000046 CPU: 1
[96112.953448] EIP is at __rb_rotate_right+0x53/0x70
[96112.953448] EAX: f3287ca4 EBX: f3287ca4 ECX: 00000000 EDX: f3287ca4
[96112.953448] ESI: f3287ca4 EDI: f3287ca4 EBP: c20190d4 ESP: f7c4dc9c
[96112.953448]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[96112.953448] Process swapper (pid: 0, ti=f7c4c000 task=f7c314a0 task.ti=f7c4c000)
[96112.953448] Stack: f3287ca4 f3287ca4 f3287ca4 c01fb041 f3287ca4 00000000 c20190cc c20190d4
[96112.953448]        c013a8ff f3287ca4 c20190cc c200f0cc c044b0a0 c013adea 00000000 f7c4dcfc
[96112.953448]        2ac32800 00005769 00000001 00000286 f3287800 ffffffff 00000000 00000000
[96112.953448] Call Trace:
[96112.953448]  [<c01fb041>] rb_insert_color+0x91/0xc0
[96112.953448]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[96112.953448]  [<c013adea>] hrtimer_start+0xaa/0x130
[96112.953448]  [<c02d527e>] qdisc_watchdog_schedule+0x1e/0x30
[96112.953448]  [<c02daa26>] htb_dequeue+0x6a6/0x810
[96112.953448]  [<c02d40fc>] __qdisc_run+0x19c/0x1d0
[96112.953448]  [<c02dae40>] htb_enqueue+0x0/0x1e0
[96112.953448]  [<c02c86b7>] dev_queue_xmit+0x267/0x380
[96112.953448]  [<c02e9cc0>] ip_forward_finish+0x0/0x40
[96112.953448]  [<c02ec86f>] ip_finish_output+0x11f/0x280
[96112.953448]  [<c02e9f8f>] ip_forward+0x28f/0x2d0
[96112.953448]  [<c02e9ce5>] ip_forward_finish+0x25/0x40
[96112.953448]  [<c02e8822>] ip_rcv_finish+0x122/0x360
[96112.953448]  [<c016d7e9>] add_partial+0x19/0x60
[96112.953448]  [<c016d7e9>] add_partial+0x19/0x60
[96112.953448]  [<c016e8d9>] __slab_free+0x169/0x290
[96112.953448]  [<c011c700>] find_busiest_group+0x180/0x740
[96112.953448]  [<c02e8ca0>] ip_rcv+0x0/0x290
[96112.953448]  [<c02c54fb>] netif_receive_skb+0x26b/0x470
[96112.953448]  [<f886774d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[96112.953448]  [<f886a9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[96112.953448]  [<f8866f59>] e1000_clean+0x49/0x1f0 [e1000e]
[96112.953448]  [<c02c790b>] net_rx_action+0xfb/0x200
[96112.953448]  [<c012a062>] __do_softirq+0x82/0x100
[96112.953448]  [<c012a117>] do_softirq+0x37/0x40
[96112.953448]  [<c0107120>] do_IRQ+0x40/0x80
[96112.953448]  [<c01055a3>] common_interrupt+0x23/0x28
[96112.953448]  [<c010a5e2>] mwait_idle+0x32/0x40
[96112.953448]  [<c010a5b0>] mwait_idle+0x0/0x40
[96112.953448]  [<c01036e8>] cpu_idle+0x48/0xc0
[96112.953448]  =======================
[96112.953448] Code: 03 09 d8 89 01 8b 02 89 5a 04 83 e0 03 09 f0 85 f6 89 02 74 0a 3b 5e 04 74 1f 89 56 08 eb 02 89 17 8b 03 8b 74 24 04 8b 7c 24 08 <83> e0 03 09 d0 89 03 8b 1c 24 83 c4 0c c3 89 56 04 eb e3 8d 76


Server 3:

[ 8518.194288] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01faf57, registers:
[ 8518.194288] Modules linked in: netconsole i2c_i801 e1000e i2c_core e1000
[ 8518.194288]
[ 8518.194288] Pid: 0, comm: swapper Not tainted (2.6.26.5-fw #1)
[ 8518.194288] EIP: 0060:[<c01faf57>] EFLAGS: 00000092 CPU: 3
[ 8518.194288] EIP is at __rb_rotate_right+0x17/0x70
[ 8518.194288] EAX: f52b14a4 EBX: f52b14a4 ECX: f52b14a4 EDX: f52b14a4
[ 8518.194288] ESI: f52b14a4 EDI: c202d0d4 EBP: c202d0d4 ESP: f7c5fc68
[ 8518.194288]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 8518.194288] Process swapper (pid: 0, ti=f7c5e000 task=f7c32940 task.ti=f7c5e000)
[ 8518.194288] Stack: f52b14a4 f52b14a4 f52b14a4 c01fb041 f52b14a4 00000000 c202d0cc c202d0d4
[ 8518.194288]        c013a8ff f52b14a4 c202d0cc c200f0cc c044b0a0 c013adea 00000000 f7c5fcc8
[ 8518.194288]        4bca9000 000007bf 00000001 00000286 f52b1000 ffffffff 00000000 00000000
[ 8518.194288] Call Trace:
[ 8518.194288]  [<c01fb041>] rb_insert_color+0x91/0xc0
[ 8518.194288]  [<c013a8ff>] enqueue_hrtimer+0x5f/0x80
[ 8518.194288]  [<c013adea>] hrtimer_start+0xaa/0x130
[ 8518.194288]  [<c02d585e>] qdisc_watchdog_schedule+0x1e/0x30
[ 8518.194288]  [<c02db006>] htb_dequeue+0x6a6/0x810
[ 8518.194288]  [<c02d46dc>] __qdisc_run+0x19c/0x1d0
[ 8518.194288]  [<c02db420>] htb_enqueue+0x0/0x1e0
[ 8518.194288]  [<c02c8c97>] dev_queue_xmit+0x267/0x380
[ 8518.194288]  [<c02d3c9b>] eth_header+0x2b/0xc0
[ 8518.194288]  [<c02ce0ab>] neigh_resolve_output+0xdb/0x280
[ 8518.194288]  [<c02ea2a0>] ip_forward_finish+0x0/0x40
[ 8518.194288]  [<c02ece4f>] ip_finish_output+0x11f/0x280
[ 8518.194288]  [<c02ea56f>] ip_forward+0x28f/0x2d0
[ 8518.194288]  [<c02ea2c5>] ip_forward_finish+0x25/0x40
[ 8518.194288]  [<c02e8e02>] ip_rcv_finish+0x122/0x360
[ 8518.194288]  [<c02c0d57>] __alloc_skb+0x57/0x120
[ 8518.194288]  [<c0109c6a>] nommu_map_single+0x2a/0x60
[ 8518.194288]  [<c02e9280>] ip_rcv+0x0/0x290
[ 8518.194288]  [<c02c5adb>] netif_receive_skb+0x26b/0x470
[ 8518.194288]  [<f886c74d>] e1000_receive_skb+0x4d/0x1b0 [e1000e]
[ 8518.194288]  [<f886f9bc>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[ 8518.194288]  [<f886bf59>] e1000_clean+0x49/0x1f0 [e1000e]
[ 8518.194288]  [<c02c7eeb>] net_rx_action+0xfb/0x200
[ 8518.194288]  [<c012a062>] __do_softirq+0x82/0x100
[ 8518.194288]  [<c012a117>] do_softirq+0x37/0x40
[ 8518.194288]  [<c0107120>] do_IRQ+0x40/0x80
[ 8518.194288]  [<c01055a3>] common_interrupt+0x23/0x28
[ 8518.194288]  [<c010a5e2>] mwait_idle+0x32/0x40
[ 8518.194288]  [<c010a5b0>] mwait_idle+0x0/0x40
[ 8518.194288]  [<c01036e8>] cpu_idle+0x48/0xc0
[ 8518.194288]  =======================
[ 8518.194288] Code: 24 83 c4 0c c3 89 56 08 eb e3 8d 76 00 8d bc 27 00 00 00 00 83 ec 0c 89 1c 24 89 c3 89 7c 24 08 89 d7 89 74 24 04 8b 50 08 8b 30 <8b> 4a 04 83 e6 fc 85 c9 89 48 08 74 09 8b 01 83 e0 03 09 d8 89
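
To see at a glance where the CPUs were stuck, the "EIP is at" lines can be tallied with a short script. This is a minimal sketch, not an existing tool: the names faulting_symbols and SAMPLE_LOG are invented for the example, and the sample lines are copied from the Server 1 traces above.

```python
import re
from collections import Counter

# Matches lines such as "EIP is at rb_insert_color+0x17/0xc0" and captures
# the symbol name that precedes the +offset/size part.
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# A few lines taken from the Server 1 traces in this report.
SAMPLE_LOG = """\
[321478.840858] EIP is at rb_insert_color+0x17/0xc0
[ 2251.728719] EIP is at rb_insert_color+0x24/0xc0
[196496.545559] EIP is at __rb_rotate_right+0xc/0x70
[23749.920305] EIP is at __rb_rotate_left+0x0/0x70
"""


def faulting_symbols(log_text):
    """Count how often each symbol appears in 'EIP is at' lines."""
    return Counter(match.group(1) for match in EIP_RE.finditer(log_text))


if __name__ == "__main__":
    for symbol, count in faulting_symbols(SAMPLE_LOG).most_common():
        print(f"{count:3d}  {symbol}")
```

Counting by hand, the ten traces above give rb_insert_color four times and __rb_rotate_left and __rb_rotate_right three times each, always reached through enqueue_hrtimer.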
Comment 1 Badalian Slava 2008-10-09 22:05:33 UTC
The bug is still present on 2.6.26.6.

[ 5280.696710] BUG: NMI Watchdog detected LOCKUP<3>e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 5280.696710]   Tx Queue             <0>
[ 5280.696710]   TDH                  <18>
[ 5280.696710]   TDT                  <18>
[ 5280.696710]   next_to_use          <18>
[ 5280.696710]   next_to_clean        <6d>
[ 5280.696710] buffer_info[next_to_clean]
[ 5280.696710]   time_stamp           <4bf406>
[ 5280.696710]   next_to_watch        <6d>
[ 5280.696710]   jiffies              <4c02a9>
[ 5280.696710]   next_to_watch.status <1>
[ 5280.696710]  on CPU3, ip c01fafb0, registers:
[ 5280.696710] Modules linked in: netconsole i2c_i801 e1000e e1000 i2c_core
[ 5280.696710]
[ 5280.696710] Pid: 0, comm: swapper Not tainted (2.6.26.6-fw #1)
[ 5280.696710] EIP: 0060:[<c01fafb0>] EFLAGS: 00000096 CPU: 3
[ 5280.696710] EIP is at rb_insert_color+0x10/0xc0
[ 5280.696710] EAX: f55554a4 EBX: f55554a4 ECX: 00000000 EDX: f55554a4
[ 5280.696710] ESI: f55554a4 EDI: f55554a4 EBP: c202d0d4 ESP: f7c5fe04
[ 5280.696710]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 5280.696710] Process swapper (pid: 0, ti=f7c5e000 task=f7c32940 task.ti=f7c5e000)
[ 5280.696710] Stack: f55554a4 00000000 c202d0cc c202d0d4 c013aa4f f55554a4 c202d0cc c202d0cc
[ 5280.696710]        c044b0a0 c013af3a c013d1bd f7c5fe54 d948bc00 000004cc 00000000 00000286
[ 5280.696710]        f5555000 ffffffff 00000000 00000000 c02d521e 00000000 f5555000 c02da9d6
[ 5280.696710] Call Trace:
[ 5280.696710]  [<c013aa4f>] enqueue_hrtimer+0x5f/0x80
[ 5280.696710]  [<c013af3a>] hrtimer_start+0xaa/0x130
[ 5280.696710]  [<c013d1bd>] getnstimeofday+0x3d/0xe0
[ 5280.696710]  [<c02d521e>] qdisc_watchdog_schedule+0x1e/0x30
[ 5280.696710]  [<c02da9d6>] htb_dequeue+0x6a6/0x810
[ 5280.696710]  [<c02d409c>] __qdisc_run+0x19c/0x1d0
[ 5280.696710]  [<c013b19d>] hrtimer_run_pending+0x1d/0x90
[ 5280.696710]  [<c02c7a6e>] net_tx_action+0xbe/0xf0
[ 5280.696710]  [<c012a1c2>] __do_softirq+0x82/0x100
[ 5280.696710]  [<c012a277>] do_softirq+0x37/0x40
[ 5280.696710]  [<c0107120>] do_IRQ+0x40/0x80
[ 5280.696710]  [<c01055a3>] common_interrupt+0x23/0x28
[ 5280.696710]  [<c010a602>] mwait_idle+0x32/0x40
[ 5280.696710]  [<c010a5d0>] mwait_idle+0x0/0x40
[ 5280.696710]  [<c01036e8>] cpu_idle+0x48/0xc0
[ 5280.696710]  =======================
[ 5280.696710] Code: 03 09 d0 89 03 8b 1c 24 83 c4 0c c3 89 56 04 eb e3 8d 76 00 8d bc 27 00 00 00 00 55 89 d5 57 89 c7 56 53 90 8d b4 26 00 00 00 00 <8b> 1f 83 e3 fc 74 32 8b 03 89 d9 a8 01 75 2a 89 c6 83 e6 fc 8b
Comment 2 Badalian Slava 2008-10-10 01:47:10 UTC
[ 6951.841662] BUG: NMI Watchdog detected LOCKUP on CPU3, ip c01fde4c, registers:
[ 6951.841662] Modules linked in: sch_sfq sch_htb netconsole e1000 i2c_i801 e1000e i2c_core
[ 6951.841662]
[ 6951.841662] Pid: 0, comm: swapper Not tainted (2.6.27-fw #1)
[ 6951.841662] EIP: 0060:[<c01fde4c>] EFLAGS: 00000092 CPU: 3
[ 6951.841662] EIP is at __rb_rotate_right+0xc/0x70
[ 6951.841662] EAX: f70c3c68 EBX: f70c3c68 ECX: f70c3c68 EDX: c202c134
[ 6951.841662] ESI: f70c3c68 EDI: f70c3c68 EBP: c202c134 ESP: f785fc2c
[ 6951.841662]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 6951.841662] Process swapper (pid: 0, ti=f785e000 task=f7832940 task.ti=f785e000)
[ 6951.841662] Stack: f70c3c68 f70c3c68 f70c3c68 c01fdf41 f70c3c68 00000000 c202c12c c202c134
[ 6951.841662]        c013a91f f70c3c68 c202c12c c202212c c045b100 c013ae0a 00000000 c013d63d
[ 6951.841662]        9a011800 00000652 00000001 00000282 00000652 f70c3c68 00000000 00000000
[ 6951.841662] Call Trace:
[ 6951.841662]  [<c01fdf41>] rb_insert_color+0x91/0xc0
[ 6951.841662]  [<c013a91f>] enqueue_hrtimer+0x5f/0x80
[ 6951.841662]  [<c013ae0a>] hrtimer_start+0xaa/0x130
[ 6951.841662]  [<c013d63d>] getnstimeofday+0x3d/0xe0
[ 6951.841662]  [<c02de83d>] qdisc_watchdog_schedule+0x3d/0x50
[ 6951.841662]  [<f88ac343>] htb_dequeue+0x683/0x7b0 [sch_htb]
[ 6951.841662]  [<c02ce692>] dev_hard_start_xmit+0x1d2/0x2c0
[ 6951.841662]  [<c02dc87a>] __qdisc_run+0x13a/0x1d0
[ 6951.841662]  [<c02d0ed7>] dev_queue_xmit+0x227/0x4f0
[ 6951.841662]  [<c02f29ff>] ip_finish_output+0x11f/0x280
[ 6951.841662]  [<c02f00e0>] ip_forward+0x290/0x310
[ 6951.841662]  [<c02efe35>] ip_forward_finish+0x25/0x40
[ 6951.841662]  [<c02ee9a2>] ip_rcv_finish+0x122/0x360
[ 6951.841662]  [<c02c8cc6>] __alloc_skb+0x36/0x120
[ 6951.841662]  [<c02c9d02>] __netdev_alloc_skb+0x22/0x50
[ 6951.841662]  [<c02eee20>] ip_rcv+0x0/0x290
[ 6951.841662]  [<c02ce064>] netif_receive_skb+0x274/0x4d0
[ 6951.841662]  [<c0108b1a>] nommu_map_single+0x2a/0x60
[ 6951.841662]  [<f883be39>] e1000_receive_skb+0x49/0x80 [e1000e]
[ 6951.841662]  [<f883e84c>] e1000_clean_rx_irq+0x23c/0x300 [e1000e]
[ 6951.841662]  [<f883b3ad>] e1000_clean+0x1bd/0x570 [e1000e]
[ 6951.841662]  [<c02d03bc>] net_rx_action+0x13c/0x200
[ 6951.841662]  [<c0129b72>] __do_softirq+0x82/0x100
[ 6951.841662]  [<c0129c27>] do_softirq+0x37/0x40
[ 6951.841662]  [<c0106060>] do_IRQ+0x40/0x80
[ 6951.841662]  [<c01134c7>] smp_apic_timer_interrupt+0x57/0x90
[ 6951.841662]  [<c010457f>] common_interrupt+0x23/0x28
[ 6951.841662]  [<c0109aa2>] mwait_idle+0x32/0x40
[ 6951.841662]  [<c01026c8>] cpu_idle+0x48/0xe0
[ 6951.841662]  =======================
[ 6951.841662] Code: 24 08 83 e0 03 09 d0 89 03 8b 1c 24 83 c4 0c c3 89 56 08 eb e3 8d 76 00 8d bc 27 00 00 00 00 83 ec 0c 89 1c 24 89 c3 89 7c 24 08 <89> d7 89 74 24 04 8b 50 08 8b 30 8b 4a 04 83 e6 fc 85 c9 89 48
Comment 3 Badalian Slava 2008-10-10 01:49:02 UTC
The same lockup now happens on 2.6.27 as well (see the trace above).
Comment 4 Jarek Poplawski 2008-10-10 02:15:52 UTC
INFO:
This bug is tracked on netdev with Subject: deadlocks if use htb.
Comment 5 Badalian Slava 2008-12-18 03:42:52 UTC
Summary of tests.

Jarek's answer:

> Here is my current opinion on this bug:
>
> 1) I'm almost sure it's not a htb, but hrtimers bug (some race),
>
> 2) the htb patches you've tested are not "the proper" way of fixing
>    it; I see substantial changes in hrtimers code in the "-tip" tree
>    (probably for 2.6.29), which, probably, you'll be advised by
>    hrtimers maintainers to try, and I guess, it's not easy on a
>    production system,
>
> So, it's up to you:
>
> 1) since these patches work for you, you can stop with testing and
>    wait with these patched kernels until 2.6.29 (I can propose this
>    #2 patch as a temporary fix then),
>
> 2) for curiosity you could try this patch #4 alone on one box first
>    (after reverting at least patch #2), but again: if it works, it
>    could be only treated as a temporary hack, and alternative of #2.
>
> Thanks,
> Jarek P.

The problem is temporarily fixed for me (the system has not crashed for 1 week), and I can wait a long time for new kernels, but I can test hrtimer fixes if anyone is interested.
Comment 6 Jarek Poplawski 2008-12-18 04:46:21 UTC
On Thu, Dec 18, 2008 at 03:42:52AM -0800, bugme-daemon@bugzilla.kernel.org wrote:
...
> The problem is temporarily fixed for me (the system has not crashed for 1 week),
> and I can wait a long time for new kernels, but I can test hrtimer fixes if
> anyone is interested.

Sure we are. Here is a link to the patches in the -tip tree:
http://git.kernel.org/?p=linux/kernel/git/mingo/linux-2.6-sched-devel.git;a=history;f=kernel/hrtimer.c;h=b741f850426e5ba8841feca4c730f3da1c65f7b8;hb=HEAD

I mean the top three of Peter Zijlstra's "hrtimer: removing all ur callback
modes" patches. They should apply to the current -linus or -net tree,
but I didn't try to compile them.

Jarek P.
Comment 7 Chris Caputo 2009-01-13 18:47:26 UTC
Per Jarek's suggestion, I ran 2.6.28 plus Peter Zijlstra's "hrtimer: removing all ur callback modes" patches dated 2008-11-25, 2008-12-04 and 2008-12-08.  Uptime was 2 days 22 hours before I hit what appears to be an unrelated bug in the IPv6 FIB.  (Reported on dev lists with subject 'panic with 2.6.28 while doing "ip -6 route"'.)

Will continue testing with Zijlstra's patches...
Comment 8 Chris Caputo 2009-01-13 18:51:17 UTC
I should add that with 2.6.28, without the Zijlstra patches, the system would hang after about an hour.
Comment 9 Jarek Poplawski 2009-01-27 02:17:02 UTC
For the record: this bug is expected to be fixed now:
1) in the 2.6.29 tree by Peter Zijlstra's above-mentioned changes to hrtimers,
2) in 2.6.28.2 and 2.6.27.13 by a temporary patch to sch_htb:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.28.y.git;a=commit;h=e46032840eae03a502638049468edc1167345c9c
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=9befaf375925471a49159d775b38d42c04e218a1
so this bug report could be closed.
Jarek P.
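
For anyone checking a fleet of boxes against the version list above, here is a rough sketch (not from the patches themselves) of how the comparison could be scripted. The function names `parse_release` and `expected_to_have_fix` are made up for illustration. It assumes the release string looks like the output of `uname -r`, that every release after the 2.6 series carries the hrtimer changes, and it does not distinguish -rc builds or distribution backports, so treat the result as a hint only.

```python
import re

def parse_release(release):
    """Split a release string such as '2.6.27.13-fw' into four integers.

    A missing stable number ('2.6.27-fw') is treated as 0.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch, stable = m.groups()
    return int(major), int(minor), int(patch), int(stable or 0)

def expected_to_have_fix(release):
    """Return True if the release is at or past a version named in this report.

    Thresholds taken from the comment above: 2.6.29 (hrtimer changes),
    2.6.28.2 and 2.6.27.13 (temporary sch_htb patch).
    """
    major, minor, patch, stable = parse_release(release)
    if (major, minor) != (2, 6):
        # Assumption: anything newer than the 2.6 series includes the fix.
        return (major, minor) > (2, 6)
    if patch >= 29:
        return True
    if patch == 28:
        return stable >= 2
    if patch == 27:
        return stable >= 13
    return False

if __name__ == "__main__":
    for rel in ("2.6.26.5-fw", "2.6.27-fw", "2.6.27.13", "2.6.28", "2.6.28.2", "2.6.29"):
        print(rel, expected_to_have_fix(rel))
```

Locally patched kernels (such as the "-fw" builds in this report) may carry the sch_htb patch without the version number showing it, so a False result only means the fix is not implied by the version string.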
