Bug 6682
Summary:            BUG: soft lockup detected on CPU#0! / ksoftirqd takes 100% CPU
Product:            Networking
Component:          Other
Status:             CLOSED CODE_FIX
Severity:           normal
Priority:           P2
Hardware:           i386
OS:                 Linux
Kernel Version:     2.6.15.6
Subsystem:
Regression:         ---
Bisected commit-id:
Reporter:           Michal Margula (alchemyx)
Assignee:           Arnaldo Carvalho de Melo (acme)
CC:                 akpm, herbert, protasnb
Description
Michal Margula
2006-06-12 16:07:20 UTC
2.6.16.16 with FIB_TRIE did not help. Changing the interface from e1000 to
e100 did not help either. It looks like a flood, though there are no other
signs of one (no increased pps rate or bandwidth). Maybe some malformed
packets are causing such high usage?

bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6682
>
> Summary: BUG: soft lockup detected on CPU#0! / ksoftirqd takes 100% CPU
> Kernel Version: 2.6.15.6
> Status: NEW
> Severity: normal
> Owner: acme@conectiva.com.br
> Submitter: alchemyx@uznam.net.pl
>
> Most recent kernel where this bug did not occur: (unknown)
> Distribution: Gentoo
> Hardware Environment: 2x Xeon 2.66, 1 GB RAM, NICs: 2 x e1000 and one
> dual-port e100. Based on Intel E7501 architecture (2U rack Intel chassis).
> Software Environment: quagga 0.98.6
> Problem Description: ksoftirqd/0 takes 100% of CPU. Further investigation
> shows no sign of a network flood or similar (and 2 of 3 NICs are e1000
> with NAPI). Occasionally there are "BUG: soft lockup detected on CPU#0!"
> messages.
>
> Steps to reproduce:
>
> There is no simple way to reproduce. I think everything started when we
> attached a second provider with BGP support. We are using quagga, which
> injects about 186,000 routes into the kernel. After running for a while
> (at least a few hours, sometimes a day) we get 100% usage on ksoftirqd/0
> and the following messages in the logs:
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 6506, comm: zebra
> EIP: 0060:[<c027f6fd>] CPU: 0
> EIP is at _spin_lock+0x7/0xf
> EFLAGS: 00000286 Not tainted (2.6.15.6)
> EAX: f6203180 EBX: e6fbf000 ECX: 00000000 EDX: f6bec000
> ESI: f6203000 EDI: eddb4b80 EBP: fffffff4 DS: 007b ES: 007b
> CR0: 8005003b CR2: aca6dff0 CR3: 361ad000 CR4: 000006d0
> [<c02396f9>] dev_queue_xmit+0xe0/0x203
> [<c0250de8>] ip_output+0x1e1/0x237
> [<c024f3f5>] ip_forward+0x181/0x1df
> [<c024e21a>] ip_rcv+0x40c/0x485
> [<c0239bd0>] netif_receive_skb+0x12f/0x165
> [<f885aa4c>] e1000_clean_rx_irq+0x389/0x410 [e1000]
> [<f885a1ca>] e1000_clean+0x94/0x12f [e1000]
> [<c0239d5a>] net_rx_action+0x69/0xf0
> [<c011a305>] __do_softirq+0x55/0xbd
> [<c011a39a>] do_softirq+0x2d/0x31
> [<c011a3f8>] local_bh_enable+0x5a/0x65
> [<c024a0a1>] rt_run_flush+0x5f/0x80
> [<c027623f>] fn_hash_insert+0x352/0x39f
> [<c027364c>] inet_rtm_newroute+0x57/0x62
> [<c02413ed>] rtnetlink_rcv_msg+0x1a8/0x1cb
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c0247c1e>] netlink_rcv_skb+0x3a/0x8b
> [<c0247cb1>] netlink_run_queue+0x42/0xc3
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c0241227>] rtnetlink_rcv+0x22/0x40
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c024764c>] netlink_data_ready+0x17/0x54
> [<c0246a99>] netlink_sendskb+0x1f/0x39
> [<c0247449>] netlink_sendmsg+0x27b/0x28c
> [<c0231467>] sock_sendmsg+0xce/0xe9
> [<c0112b36>] __wake_up+0x27/0x3b
> [<c01a6216>] copy_to_user+0x38/0x42
> [<c01a625a>] copy_from_user+0x3a/0x60
> [<c01a625a>] copy_from_user+0x3a/0x60
> [<c0126be2>] autoremove_wake_function+0x0/0x3a
> [<c0236bcd>] verify_iovec+0x49/0x7f
> [<c02327f2>] sys_sendmsg+0x152/0x1a8
> [<c0147a62>] do_sync_read+0xb8/0xeb
> [<c01a6216>] copy_to_user+0x38/0x42
> [<c0126be2>] autoremove_wake_function+0x0/0x3a
> [<c0122b7a>] getrusage+0x34/0x43
> [<c0168504>] inotify_dentry_parent_queue_event+0x29/0x7c
> [<c01a625a>] copy_from_user+0x3a/0x60
> [<c0232b6b>] sys_socketcall+0x167/0x180
> [<c0102433>] sysenter_past_esp+0x54/0x75
>
> BUG: soft lockup detected on CPU#0!
>
> Pid: 6506, comm: zebra
> EIP: 0060:[<f8952052>] CPU: 0
> EIP is at u32_classify+0x52/0x170 [cls_u32]
> EFLAGS: 00000206 Not tainted (2.6.15.6)
> EAX: e2fbd020 EBX: f48649c0 ECX: 00000010 EDX: 29b09d5a
> ESI: f48649ec EDI: 00000001 EBP: e2fbd020 DS: 007b ES: 007b
> CR0: 8005003b CR2: 08154004 CR3: 361ad000 CR4: 000006d0
> [<f88462fa>] ipt_do_table+0x2de/0x2fd [ip_tables]
> [<f883b523>] ip_nat_fn+0x177/0x185 [iptable_nat]
> [<f88e159f>] ip_refrag+0x23/0x5f [ip_conntrack]
> [<c0244d82>] tc_classify+0x2c/0x3f
> [<f895514b>] htb_classify+0x14b/0x1dd [sch_htb]
> [<f8955638>] htb_enqueue+0x1d/0x13a [sch_htb]
> [<c02396fd>] dev_queue_xmit+0xe4/0x203
> [<c0250de8>] ip_output+0x1e1/0x237
> [<c024f3f5>] ip_forward+0x181/0x1df
> [<c024e21a>] ip_rcv+0x40c/0x485
> [<c0239bd0>] netif_receive_skb+0x12f/0x165
> [<f885aa4c>] e1000_clean_rx_irq+0x389/0x410 [e1000]
> [<f885a1ca>] e1000_clean+0x94/0x12f [e1000]
> [<c0239d5a>] net_rx_action+0x69/0xf0
> [<c011a305>] __do_softirq+0x55/0xbd
> [<c011a39a>] do_softirq+0x2d/0x31
> [<c011a3f8>] local_bh_enable+0x5a/0x65
> [<c024a0a1>] rt_run_flush+0x5f/0x80
> [<c027623f>] fn_hash_insert+0x352/0x39f
> [<c027364c>] inet_rtm_newroute+0x57/0x62
> [<c02413ed>] rtnetlink_rcv_msg+0x1a8/0x1cb
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c0247c1e>] netlink_rcv_skb+0x3a/0x8b
> [<c0247cb1>] netlink_run_queue+0x42/0xc3
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c0241227>] rtnetlink_rcv+0x22/0x40
> [<c0241245>] rtnetlink_rcv_msg+0x0/0x1cb
> [<c024764c>] netlink_data_ready+0x17/0x54
> [<c0246a99>] netlink_sendskb+0x1f/0x39
> [<c0247449>] netlink_sendmsg+0x27b/0x28c
> [<c0231467>] sock_sendmsg+0xce/0xe9
> [<c0112b36>] __wake_up+0x27/0x3b
> [<c01a625a>] copy_from_user+0x3a/0x60
> [<c01a625a>] copy_from_user+0x3a/0x60
> [<c0126be2>] autoremove_wake_function+0x0/0x3a
> [<c0236bcd>] verify_iovec+0x49/0x7f
> [<c02327f2>] sys_sendmsg+0x152/0x1a8
> [<c0147a62>] do_sync_read+0xb8/0xeb
> [<c0122b7a>] getrusage+0x34/0x43
> [<c011d72f>] update_wall_time+0xa/0x32
> [<c011d9e5>] do_timer+0x33/0xa9
> [<c01a625a>] copy_from_user+0x3a/0x60
> [<c0232b6b>] sys_socketcall+0x167/0x180
> [<c0102433>] sysenter_past_esp+0x54/0x75
>
> It happens on 2.6.15.6. Tonight I will try 2.6.16.16 with FIB_TRIE instead
> of FIB_HASH. I am submitting this bug under networking, but I am not
> completely sure it belongs here.

This also happens on 2.6.16.6.

Reply-To: paulmck@us.ibm.com

On Mon, Jun 19, 2006 at 03:20:10PM -0700, Andrew Morton wrote:
> bugme-daemon@bugzilla.kernel.org wrote:
> > http://bugzilla.kernel.org/show_bug.cgi?id=6682
> > [original report and stack traces quoted above trimmed]

Is it possible that there is a routing loop, either in the overall
configuration or at some intermediate point in the route injection?
Both CPUs seem to be receiving ethernet packets at the time of the oops.
Thanx, Paul

> > [duplicate of the stack traces quoted above trimmed]

On Mon, Jun 19, 2006 at 10:20:10PM +0000, Andrew Morton wrote:
>
> > [<c02396f9>] dev_queue_xmit+0xe0/0x203
> > [<c0250de8>] ip_output+0x1e1/0x237
> > [<c024f3f5>] ip_forward+0x181/0x1df
> > [<c024e21a>] ip_rcv+0x40c/0x485
> > [<c0239bd0>] netif_receive_skb+0x12f/0x165
> > [<f885aa4c>] e1000_clean_rx_irq+0x389/0x410 [e1000]
> > [<f885a1ca>] e1000_clean+0x94/0x12f [e1000]
> > [<c0239d5a>] net_rx_action+0x69/0xf0
> > [<c011a305>] __do_softirq+0x55/0xbd
> > [<c011a39a>] do_softirq+0x2d/0x31
> > [<c011a3f8>] local_bh_enable+0x5a/0x65
> > [<c024a0a1>] rt_run_flush+0x5f/0x80
Could you tell us the frequency of route updates on this machine?
Route updates are pretty expensive especially when a large number
of flows hits your machine right afterwards.
You can monitor this by running ip mon. You might just be getting
bogus route updates causing unnecessary flushes to the routing cache.
Cheers,
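The check Herbert suggests can be scripted. The sketch below assumes
iproute2's `ip monitor` and coreutils' `timeout`; the rate arithmetic uses
the figures reported later in this thread (953 events in 300 seconds) as
sample numbers:

```shell
# Sketch of the monitoring suggested above (assumes iproute2 + coreutils).
# On the router itself you would capture live route events with:
#   timeout 300 ip monitor route > /tmp/ROUTE
#   events=$(wc -l < /tmp/ROUTE)
# Sample numbers taken from the follow-up in this thread:
events=953
seconds=300
# Integer division, rounded to the nearest whole update per second.
echo "$(( (events + seconds / 2) / seconds )) route updates/sec"
```

Since every update flushes the routing cache on these kernels, even a
few updates per second can be significant under heavy forwarding load.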
First of all, at the moment I have 2.6.16.19 and it has been running OK for
about 6 days now, but I am still testing. I ran ip mon for a period of 300
seconds and got:

sauron alchemyx # time ip mon > /tmp/ROUTE
Killed

real    5m1.234s
user    0m0.020s
sys     0m0.056s
sauron alchemyx # wc -l /tmp/ROUTE
953 /tmp/ROUTE

So it is about 3 changes per second. Is that a lot? A funny thing I noticed:

212.52.159.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.158.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.158.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.157.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.157.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.156.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.156.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.155.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.155.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.154.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.154.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.153.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.153.0/24 via 217.153.19.17 dev eth0 proto zebra equalize

So quagga is doing some unnecessary work, which also puts some load on the
routing table. Am I right?

Reply-To: Robert.Olsson@data.slu.se

Hello!

> > sometimes a day) we get 100% usage on ksoftirqd/0 and following
> > messages in logs:

Yes, it seems the system is very loaded for some reason, as all softirqs
are run via ksoftirqd. That is still OK, but why doesn't the watchdog get
any CPU share at all? A mismatch in priorities?
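The delete/re-add pattern in the ip mon output above can be flagged
mechanically. Below is a sketch: the sample data is taken from the output
quoted above, and the awk script reports any prefix that is deleted and
immediately re-added with identical attributes — a redundant update that
still costs a routing-cache flush:

```shell
# Detect redundant delete/re-add route updates in captured
# `ip monitor route` output. Sample data from the output above; on a live
# router you would capture it with e.g. `timeout 300 ip monitor route`.
cat > /tmp/ROUTE <<'EOF'
Deleted 212.52.158.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.158.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.157.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.157.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
Deleted 212.52.156.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
212.52.156.0/24 via 217.153.19.17 dev eth0 proto zebra equalize
EOF
# Remember the attributes of each deleted prefix; if the very same route
# comes straight back, the update pair was a no-op for forwarding.
awk '/^Deleted / { del[$2] = substr($0, 9); next }
     del[$1] == $0 { print "redundant update:", $1 }' /tmp/ROUTE
```

Every prefix flagged this way is churn the routing daemon could in
principle have suppressed.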
Herbert Xu writes:
> On Mon, Jun 19, 2006 at 10:20:10PM +0000, Andrew Morton wrote:
> > > [<c02396f9>] dev_queue_xmit+0xe0/0x203
> > > [...stack trace quoted above trimmed...]
> > > [<c024a0a1>] rt_run_flush+0x5f/0x80

Normal for a router...

> Could you tell us the frequency of route updates on this machine?
> Route updates are pretty expensive, especially when a large number
> of flows hits your machine right afterwards.

Yes, the flush is costly and unfortunately hard to avoid. We discussed
this a bit before...

> You can monitor this by running ip mon. You might just be getting
> bogus route updates causing unnecessary flushes to the routing cache.

I just sampled 10 minutes on one of our routers with two full BGP feeds.
Remember that Zebra/Quagga keeps just one route set in the kernel. Anyway,
during the 10 minutes I looked, I got 4 insertions/deletions per second on
average.

Cheers.
--ro

Any updates on this problem? Thanks.

At this point we are not using Linux for BGP and routing to upstream
providers. Since 2.6.16.19 everything was quite OK: there was no "BUG: soft
lockup detected on CPU#0!", but we were still getting 100% load on
ksoftirqd. We never discovered what caused that situation.

We are currently using two Linux boxes for QoS with HTB. ksoftirqd load
there depends mostly on the number of queues and the amount of traffic.
From time to time those boxes hang (when deleting the root qdisc), usually
when there is high traffic. The combination of a watchdog and dynamic
routing helps to minimize the damage, but it still happens.
You might want to check what the other CPU is doing by running top or
taking a sysrq-t trace, to see what other processes are on the system and
whether there is any lock or resource contention; also check whether memory
is running too low. Running a profiler is also worth trying, to see which
user and kernel functions are the biggest contributors.

An update from Herbert:

I think there are two problems here:

1) A busy router that gets frequent route updates spends too much time
   flushing route caches that it can't use.

2) We have some kind of scheduling problem, so that when this happens it
   triggers the soft-lockup watchdog.

I'm not sure if anything has been done to solve 2), but 1) certainly hasn't
changed. Having said that, 1) is not an easy problem to solve and AFAIK
nobody is working on it at the moment.

Oh, and of course there is

3) A routing daemon that generates unnecessary route changes. You should
   look at eliminating those updates where possible.

Michal, any updates on this? Was the problem resolved for you? Thanks.

Natalie, I just checked the routing code and nothing has changed with
respect to flushing, so the problem still stands.

The boxes are now quite stable: 138 days of uptime on both servers with HTB
(it would be longer, but we had to rewire our racks to make some room). The
kernel currently in use is 2.6.20.7, so I think it is OK now. We sometimes
see the following messages in dmesg, maybe because of the big load. They
show up on every HTB box, but the boxes have identical hardware
configurations:

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <98>
  TDT                  <98>
  next_to_use          <98>
  next_to_clean        <ed>
buffer_info[next_to_clean]
  time_stamp           <b1a58d51>
  next_to_watch        <ed>
  jiffies              <b1a58e5a>
  next_to_watch.status <1>
e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
  Tx Queue             <0>
  TDH                  <50>
  TDT                  <50>
  next_to_use          <50>
  next_to_clean        <a5>
buffer_info[next_to_clean]
  time_stamp           <b2135604>
  next_to_watch        <a5>
  jiffies              <b2135730>
  next_to_watch.status <1>

Michal, do you still run BGP on it?
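A side note on the Tx Unit Hang reports above: the driver prints the stuck
descriptor's time_stamp and the current jiffies in hex, so the length of
the stall can be computed directly. A quick sketch (the HZ values mentioned
are common 2.6-era kernel configurations, not something stated in the log):

```shell
# How long did the Tx descriptors sit uncompleted before the e1000 driver
# declared a hang? Subtract time_stamp from jiffies (hex values taken from
# the dmesg output above).
echo "eth0 stall: $(( 0xb1a58e5a - 0xb1a58d51 )) jiffies"
echo "eth1 stall: $(( 0xb2135730 - 0xb2135604 )) jiffies"
# At HZ=250 (a common 2.6.20 setting) ~265 jiffies is roughly one second;
# at HZ=1000 it would be about a quarter of a second.
```

Either way, the transmit ring stalled long enough for the driver's
watchdog to notice, which is consistent with heavy sustained load.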
If not, then that is what cured the original problem, rather than any code
fixes that may have gone in.

I do run BGP on it, but for other purposes than before: it carries only
about 20 routes or so, and those change rarely. So you may be right that it
is because we no longer have huge routing tables.

Michal, any updates on this problem from your side? If you can't reproduce
the problem anymore, maybe we should close the bug now, unless Herbert
needs to investigate and work more on this?

No updates, unfortunately. I think closing is OK for now.

You know what, Eric Dumazet did actually fix this bug back in January, so I
agree with the closing too. Thanks!

Great that it was fixed; closing it then.