Bug 10318
Summary: | WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() | ||
---|---|---|---|
Product: | Networking | Reporter: | Pawel Staszewski (pstaszewski) |
Component: | IPV4 | Assignee: | Stephen Hemminger (stephen) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | bunk, jarkao2 |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.25-rc6-git7 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 9832 |
Description
Pawel Staszewski
2008-03-25 02:50:54 UTC
Reply-To: akpm@linux-foundation.org On Tue, 25 Mar 2008 02:50:54 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=10318 > > Summary: WARNING: at arch/x86/mm/highmem_32.c:43 > kmap_atomic_prot+0x87/0x184() > Product: Networking > Version: 2.5 > KernelVersion: 2.6.25-rc6-git7 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: IPV4 > AssignedTo: shemminger@linux-foundation.org > ReportedBy: pstaszewski@artcom.pl > > > Latest working kernel version: 2.6.24 This is a post-2.6.24 regression. > Software Environment: bgp/quagga The app does a lot of route management stuff. > Problem Description: > Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 > [<c021a0bf>] warn_on_slowpath+0x40/0x4f > [<c043c372>] fn_trie_lookup+0xe3/0x288 > [<c043d61a>] fib4_rule_action+0x3d/0x4d > [<c03f6417>] fib_rules_lookup+0x71/0xb6 > [<c043d652>] fib_lookup+0x28/0x36 > [<c023f126>] __rmqueue_smallest+0x83/0xe1 > [<c023f197>] __rmqueue+0x13/0x172 > [<c0211806>] kmap_atomic_prot+0x87/0x184 > [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 > [<c023ff92>] __alloc_pages+0x71/0x2cf > [<c0240229>] __get_free_pages+0x39/0x47 > [<c03f10d0>] neigh_create+0x2d8/0x40e > [<c045b85b>] _read_unlock_bh+0x5/0xd > [<c03f0539>] neigh_lookup+0x92/0x9b > [<c03f1241>] neigh_event_ns+0x3b/0x70 > [<c0432523>] arp_process+0x1e5/0x534 > [<c03edd5a>] dev_queue_xmit+0x279/0x29f > [<c0419415>] ip_finish_output+0x1c6/0x1fc > [<c03f81d5>] tc_classify+0x14/0x6b > [<c03eb578>] netif_receive_skb+0x29f/0x30e > [<c0357e63>] e1000_receive_skb+0x132/0x14c > [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c > [<c0356f82>] e1000_clean+0x29f/0x427 > [<c03ed3ee>] net_rx_action+0x5c/0x14a > [<c021e25e>] __do_softirq+0x5d/0xc1 > [<c021e2f4>] do_softirq+0x32/0x36 > [<c021e585>] irq_exit+0x35/0x67 > [<c0204f79>] do_IRQ+0x73/0x82 > [<c020343b>] common_interrupt+0x23/0x28 > [<c0201377>] mwait_idle_with_hints+0x36/0x39 > [<c020137a>] mwait_idle+0x0/0xa > [<c0201817>] cpu_idle+0xa8/0xc8 > ======================= > ---[ end trace 6a93a9703f6a626e ]--- > ------------[ cut here ]------------ This backtrace is a mess. > WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() > Modules linked in: > Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 > [<c021a0bf>] warn_on_slowpath+0x40/0x4f > [<c043c372>] fn_trie_lookup+0xe3/0x288 > [<c043d61a>] fib4_rule_action+0x3d/0x4d > [<c03f6417>] fib_rules_lookup+0x71/0xb6 > [<c043d652>] fib_lookup+0x28/0x36 > [<c023f126>] __rmqueue_smallest+0x83/0xe1 > [<c023f197>] __rmqueue+0x13/0x172 > [<c0211806>] kmap_atomic_prot+0x87/0x184 > [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 > [<c023ff92>] __alloc_pages+0x71/0x2cf > [<c0240229>] __get_free_pages+0x39/0x47 > [<c03f10d0>] neigh_create+0x2d8/0x40e > [<c045b85b>] _read_unlock_bh+0x5/0xd > [<c03f0539>] neigh_lookup+0x92/0x9b > [<c03f1241>] neigh_event_ns+0x3b/0x70 > [<c0432523>] arp_process+0x1e5/0x534 > [<c03edd5a>] dev_queue_xmit+0x279/0x29f > [<c0419415>] ip_finish_output+0x1c6/0x1fc > [<c03f81d5>] tc_classify+0x14/0x6b > [<c03eb578>] netif_receive_skb+0x29f/0x30e > [<c0357e63>] e1000_receive_skb+0x132/0x14c > [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c > [<c0356f82>] e1000_clean+0x29f/0x427 > [<c03ed3ee>] net_rx_action+0x5c/0x14a > [<c021e25e>] __do_softirq+0x5d/0xc1 > [<c021e2f4>] do_softirq+0x32/0x36 > [<c021e585>] irq_exit+0x35/0x67 > [<c0204f79>] do_IRQ+0x73/0x82 > [<c020343b>] common_interrupt+0x23/0x28 > [<c0201377>] mwait_idle_with_hints+0x36/0x39 > [<c020137a>] mwait_idle+0x0/0xa > [<c0201817>] cpu_idle+0xa8/0xc8 > ======================= They all are. afacit what's happened is that someone is running __alloc_pages(..., __GFP_ZERO) from softirq context. But the __GFP_ZERO implementation uses KM_USER0 which cannot be used from softirq context because non-interrupt code on this CPU might be using the same kmap slot. Can anyone thing of anything which recently changed in either networking core or e1000e which would have triggered this? I think the core MM code is being doubly dumb here. a) We should be able to use __GFP_ZERO from all copntexts. b) it's not a highmem page anyway, so we won't be using that kmap slot. Pawel, can you please confirm that this: --- a/arch/x86/mm/highmem_32.c~a +++ a/arch/x86/mm/highmem_32.c @@ -73,15 +73,15 @@ void *kmap_atomic_prot(struct page *page { enum fixed_addresses idx; unsigned long vaddr; - /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */ - - debug_kmap_atomic_prot(type); + /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */ pagefault_disable(); if (!PageHighMem(page)) return page_address(page); + debug_kmap_atomic_prot(type); + idx = type + KM_TYPE_NR*smp_processor_id(); vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx); BUG_ON(!pte_none(*(kmap_pte-idx))); _ fixes it? Thanks. Reply-To: dada1@cosmosbay.com Andrew Morton a On Tue, 25 Mar 2008, Eric Dumazet wrote:
> If kzalloc() or __get_free_pages(__GFP_ZERO) is not allowed from softirq then
> many usages are illegal.
There is clearly no problem for GFP_KERNEL. The only issue could arise for
GFP_HIGHMEM|__GFP_ZERO. I thought we had dealt with these false positives
a while back?
* Andrew Morton <akpm@linux-foundation.org> wrote: > afacit what's happened is that someone is running __alloc_pages(..., > __GFP_ZERO) from softirq context. But the __GFP_ZERO implementation > uses KM_USER0 which cannot be used from softirq context because > non-interrupt code on this CPU might be using the same kmap slot. > > Can anyone thing of anything which recently changed in either > networking core or e1000e which would have triggered this? > > I think the core MM code is being doubly dumb here. > > a) We should be able to use __GFP_ZERO from all copntexts. > > b) it's not a highmem page anyway, so we won't be using that kmap > slot. i think this came up before (with kzalloc()) and the MM code should have been fixed to not even attempt a kmap_atomic(), instead of working it around in the callsite or in the kmap_atomic() code. Ingo and more: ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 [<c021a0bf>] warn_on_slowpath+0x40/0x4f [<c043c372>] fn_trie_lookup+0xe3/0x288 [<c043d61a>] fib4_rule_action+0x3d/0x4d [<c03f6417>] fib_rules_lookup+0x71/0xb6 [<c043d652>] fib_lookup+0x28/0x36 [<c023f126>] __rmqueue_smallest+0x83/0xe1 [<c023f197>] __rmqueue+0x13/0x172 [<c0211806>] kmap_atomic_prot+0x87/0x184 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 [<c023ff92>] __alloc_pages+0x71/0x2cf [<c0240229>] __get_free_pages+0x39/0x47 [<c03f10d0>] neigh_create+0x2d8/0x40e [<c045b85b>] _read_unlock_bh+0x5/0xd [<c03f0539>] neigh_lookup+0x92/0x9b [<c03f1241>] neigh_event_ns+0x3b/0x70 [<c0432523>] arp_process+0x1e5/0x534 [<c03edd5a>] dev_queue_xmit+0x279/0x29f [<c0419415>] ip_finish_output+0x1c6/0x1fc [<c03f81d5>] tc_classify+0x14/0x6b [<c03eb578>] netif_receive_skb+0x29f/0x30e [<c0357e63>] e1000_receive_skb+0x132/0x14c [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] e1000_clean+0x29f/0x427 [<c03ed3ee>] net_rx_action+0x5c/0x14a [<c021e25e>] __do_softirq+0x5d/0xc1 [<c021e2f4>] do_softirq+0x32/0x36 [<c021e585>] irq_exit+0x35/0x67 [<c0204f79>] do_IRQ+0x73/0x82 [<c020343b>] common_interrupt+0x23/0x28 [<c0201377>] mwait_idle_with_hints+0x36/0x39 [<c020137a>] mwait_idle+0x0/0xa [<c0201817>] cpu_idle+0xa8/0xc8 ======================= ---[ end trace 6a93a9703f6a626e ]--- ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 [<c021a0bf>] warn_on_slowpath+0x40/0x4f [<c043c372>] fn_trie_lookup+0xe3/0x288 [<c043d61a>] fib4_rule_action+0x3d/0x4d [<c03f6417>] fib_rules_lookup+0x71/0xb6 [<c043d652>] fib_lookup+0x28/0x36 [<c023f126>] __rmqueue_smallest+0x83/0xe1 [<c023f197>] __rmqueue+0x13/0x172 [<c0211806>] kmap_atomic_prot+0x87/0x184 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 [<c023ff92>] __alloc_pages+0x71/0x2cf [<c0240229>] __get_free_pages+0x39/0x47 [<c03f10d0>] neigh_create+0x2d8/0x40e [<c045b85b>] _read_unlock_bh+0x5/0xd [<c03f0539>] neigh_lookup+0x92/0x9b [<c03f1241>] neigh_event_ns+0x3b/0x70 [<c0432523>] arp_process+0x1e5/0x534 [<c03edd5a>] dev_queue_xmit+0x279/0x29f [<c0419415>] ip_finish_output+0x1c6/0x1fc [<c03f81d5>] tc_classify+0x14/0x6b [<c03eb578>] netif_receive_skb+0x29f/0x30e [<c0357e63>] e1000_receive_skb+0x132/0x14c [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] e1000_clean+0x29f/0x427 [<c03ed3ee>] net_rx_action+0x5c/0x14a [<c021e25e>] __do_softirq+0x5d/0xc1 [<c021e2f4>] do_softirq+0x32/0x36 [<c021e585>] irq_exit+0x35/0x67 [<c0204f79>] do_IRQ+0x73/0x82 [<c020343b>] common_interrupt+0x23/0x28 [<c0201377>] mwait_idle_with_hints+0x36/0x39 [<c020137a>] mwait_idle+0x0/0xa [<c0201817>] cpu_idle+0xa8/0xc8 ======================= ---[ end trace 6a93a9703f6a626e ]--- ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 [<c021a0bf>] warn_on_slowpath+0x40/0x4f [<c022ff00>] clockevents_program_event+0x2e/0xd2 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022df23>] update_wall_time+0x552/0x6c7 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022d8b8>] getnstimeofday+0x2b/0xb2 [<c022ff96>] clockevents_program_event+0xc4/0xd2 [<c023f126>] __rmqueue_smallest+0x83/0xe1 [<c023f197>] __rmqueue+0x13/0x172 [<c0211806>] kmap_atomic_prot+0x87/0x184 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 [<c023ff92>] __alloc_pages+0x71/0x2cf [<c043d61a>] fib4_rule_action+0x3d/0x4d [<c0240229>] __get_free_pages+0x39/0x47 [<c03f10d0>] neigh_create+0x2d8/0x40e [<c045b85b>] _read_unlock_bh+0x5/0xd [<c03f0539>] neigh_lookup+0x92/0x9b [<c04319f3>] arp_bind_neighbour+0x53/0x64 [<c0413343>] rt_intern_hash+0x257/0x32d [<c0414406>] ip_route_input+0x812/0xbc6 [<c0415c16>] ip_rcv_finish+0x2a/0x293 [<c03eb578>] netif_receive_skb+0x29f/0x30e [<c0357e63>] e1000_receive_skb+0x132/0x14c [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] e1000_clean+0x29f/0x427 [<c022c395>] ktime_get_ts+0x11/0x3a [<c03ed3ee>] net_rx_action+0x5c/0x14a [<c021e25e>] __do_softirq+0x5d/0xc1 [<c021e2f4>] do_softirq+0x32/0x36 [<c021e585>] irq_exit+0x35/0x67 [<c0204f79>] do_IRQ+0x73/0x82 [<c020343b>] common_interrupt+0x23/0x28 [<c0201377>] mwait_idle_with_hints+0x36/0x39 [<c020137a>] mwait_idle+0x0/0xa [<c0201817>] cpu_idle+0xa8/0xc8 ======================= ---[ end trace 6a93a9703f6a626e ]--- ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 [<c021a0bf>] warn_on_slowpath+0x40/0x4f [<c022ff00>] clockevents_program_event+0x2e/0xd2 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022df23>] update_wall_time+0x552/0x6c7 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022d8b8>] getnstimeofday+0x2b/0xb2 [<c022ff96>] clockevents_program_event+0xc4/0xd2 [<c023f126>] __rmqueue_smallest+0x83/0xe1 [<c023f197>] __rmqueue+0x13/0x172 [<c0211806>] kmap_atomic_prot+0x87/0x184 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 [<c023ff92>] __alloc_pages+0x71/0x2cf [<c043d61a>] fib4_rule_action+0x3d/0x4d [<c0240229>] __get_free_pages+0x39/0x47 [<c03f10d0>] neigh_create+0x2d8/0x40e [<c045b85b>] _read_unlock_bh+0x5/0xd [<c03f0539>] neigh_lookup+0x92/0x9b [<c04319f3>] arp_bind_neighbour+0x53/0x64 [<c0413343>] rt_intern_hash+0x257/0x32d [<c0414406>] ip_route_input+0x812/0xbc6 [<c0415c16>] ip_rcv_finish+0x2a/0x293 [<c03eb578>] netif_receive_skb+0x29f/0x30e [<c0357e63>] e1000_receive_skb+0x132/0x14c [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] e1000_clean+0x29f/0x427 [<c022c395>] ktime_get_ts+0x11/0x3a [<c03ed3ee>] net_rx_action+0x5c/0x14a [<c021e25e>] __do_softirq+0x5d/0xc1 [<c021e2f4>] do_softirq+0x32/0x36 [<c021e585>] irq_exit+0x35/0x67 [<c0204f79>] do_IRQ+0x73/0x82 [<c020343b>] common_interrupt+0x23/0x28 [<c0201377>] mwait_idle_with_hints+0x36/0x39 [<c020137a>] mwait_idle+0x0/0xa [<c0201817>] cpu_idle+0xa8/0xc8 ======================= ---[ end trace 6a93a9703f6a626e ]--- ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 [<c021a0bf>] warn_on_slowpath+0x40/0x4f [<c022ff00>] clockevents_program_event+0x2e/0xd2 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022df23>] update_wall_time+0x552/0x6c7 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022d8b8>] getnstimeofday+0x2b/0xb2 [<c022ff96>] clockevents_program_event+0xc4/0xd2 [<c023f126>] __rmqueue_smallest+0x83/0xe1 [<c023f197>] __rmqueue+0x13/0x172 [<c0211806>] kmap_atomic_prot+0x87/0x184 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 [<c023ff92>] __alloc_pages+0x71/0x2cf [<c043d61a>] fib4_rule_action+0x3d/0x4d [<c0240229>] __get_free_pages+0x39/0x47 [<c03f10d0>] neigh_create+0x2d8/0x40e [<c045b85b>] _read_unlock_bh+0x5/0xd [<c03f0539>] neigh_lookup+0x92/0x9b [<c04319f3>] arp_bind_neighbour+0x53/0x64 [<c0413343>] rt_intern_hash+0x257/0x32d [<c0414406>] ip_route_input+0x812/0xbc6 [<c0415c16>] ip_rcv_finish+0x2a/0x293 [<c03eb578>] netif_receive_skb+0x29f/0x30e [<c0357e63>] e1000_receive_skb+0x132/0x14c [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] e1000_clean+0x29f/0x427 [<c022c395>] ktime_get_ts+0x11/0x3a [<c03ed3ee>] net_rx_action+0x5c/0x14a [<c021e25e>] __do_softirq+0x5d/0xc1 [<c021e2f4>] do_softirq+0x32/0x36 [<c021e585>] irq_exit+0x35/0x67 [<c0204f79>] do_IRQ+0x73/0x82 [<c020343b>] common_interrupt+0x23/0x28 [<c0201377>] mwait_idle_with_hints+0x36/0x39 [<c020137a>] mwait_idle+0x0/0xa [<c0201817>] cpu_idle+0xa8/0xc8 ======================= ---[ end trace 6a93a9703f6a626e ]--- ------------[ cut here ]------------ WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184() Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1 [<c021a0bf>] warn_on_slowpath+0x40/0x4f [<c022ff00>] clockevents_program_event+0x2e/0xd2 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022df23>] update_wall_time+0x552/0x6c7 [<c022ee6d>] clocksource_get_next+0x39/0x3f [<c022df23>] update_wall_time+0x552/0x6c7 [<c022d8b8>] getnstimeofday+0x2b/0xb2 [<c022ff96>] clockevents_program_event+0xc4/0xd2 [<c023f126>] __rmqueue_smallest+0x83/0xe1 [<c023f197>] __rmqueue+0x13/0x172 [<c0211806>] kmap_atomic_prot+0x87/0x184 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358 [<c023ff92>] __alloc_pages+0x71/0x2cf [<c043d61a>] fib4_rule_action+0x3d/0x4d [<c0240229>] __get_free_pages+0x39/0x47 [<c03f10d0>] neigh_create+0x2d8/0x40e [<c045b85b>] _read_unlock_bh+0x5/0xd [<c03f0539>] neigh_lookup+0x92/0x9b [<c04319f3>] arp_bind_neighbour+0x53/0x64 [<c0413343>] rt_intern_hash+0x257/0x32d [<c0414406>] ip_route_input+0x812/0xbc6 [<c0415c16>] ip_rcv_finish+0x2a/0x293 [<c03eb578>] netif_receive_skb+0x29f/0x30e [<c0357e63>] e1000_receive_skb+0x132/0x14c [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] e1000_clean+0x29f/0x427 [<c022c395>] ktime_get_ts+0x11/0x3a [<c03ed3ee>] net_rx_action+0x5c/0x14a [<c021e25e>] __do_softirq+0x5d/0xc1 [<c021e2f4>] do_softirq+0x32/0x36 [<c021e585>] irq_exit+0x35/0x67 [<c0204f79>] do_IRQ+0x73/0x82 [<c020343b>] common_interrupt+0x23/0x28 [<c0201377>] mwait_idle_with_hints+0x36/0x39 [<c020137a>] mwait_idle+0x0/0xa [<c0201817>] cpu_idle+0xa8/0xc8 ======================= ---[ end trace 6a93a9703f6a626e ]--- eth1: TSO is Disabled eth0: TSO is Disabled eth0: Detected Tx Unit Hang: TDH <76> TDT <76> next_to_use <76> next_to_clean <cb> buffer_info[next_to_clean]: time_stamp <1a432ea> next_to_watch <cb> jiffies <1a43554> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <b0> TDT <b0> next_to_use <b0> next_to_clean <5> buffer_info[next_to_clean]: time_stamp <1a4e551> next_to_watch <5> jiffies <1a4e81e> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <93> TDT <93> next_to_use <93> next_to_clean <e8> buffer_info[next_to_clean]: time_stamp <1a5ea83> next_to_watch <e8> jiffies <1a5ece0> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <1d> TDT <1d> next_to_use <1d> next_to_clean <72> buffer_info[next_to_clean]: time_stamp <1a5f2c4> next_to_watch <72> jiffies <1a5f522> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <31> TDT <31> next_to_use <31> next_to_clean <86> buffer_info[next_to_clean]: time_stamp <1a60384> next_to_watch <86> jiffies <1a605ec> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <1d> TDT <1d> next_to_use <1d> next_to_clean <72> buffer_info[next_to_clean]: time_stamp <1a688a2> next_to_watch <72> jiffies <1a68b0e> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <fb> TDT <fb> next_to_use <fb> next_to_clean <50> buffer_info[next_to_clean]: time_stamp <1a72364> next_to_watch <50> jiffies <1a725dd> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <2a> TDT <2a> next_to_use <2a> next_to_clean <7f> buffer_info[next_to_clean]: time_stamp <1a880c7> next_to_watch <7f> jiffies <1a88363> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <8b> TDT <8b> next_to_use <8b> next_to_clean <e0> buffer_info[next_to_clean]: time_stamp <1a8dee1> next_to_watch <e0> jiffies <1a8e16c> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <47> TDT <47> next_to_use <47> next_to_clean <9c> buffer_info[next_to_clean]: time_stamp <1a92c54> next_to_watch <9c> jiffies <1a92f58> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <5b> TDT <5b> next_to_use <5b> next_to_clean <b0> buffer_info[next_to_clean]: time_stamp <1a93f7d> next_to_watch <b0> jiffies <1a9425a> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <e5> TDT <e5> next_to_use <e5> next_to_clean <3a> buffer_info[next_to_clean]: time_stamp <1a94950> next_to_watch <3a> jiffies <1a94c22> next_to_watch.status <1> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] Pid: 9, comm: events/0 Not tainted (2.6.25-rc6-git7 #1) EIP: 0060:[<c02ec183>] EFLAGS: 00000202 CPU: 0 EIP is at _raw_spin_lock+0x30/0xed EAX: f7c6a070 EBX: f7d43130 ECX: 00000000 EDX: f7d80000 ESI: 00003313 EDI: 00020000 EBP: 00003130 ESP: f7c77c90 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: b7dee031 CR3: 36084000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0412665>] ? rt_garbage_collect+0x131/0x27e [<c03ef171>] ? dst_alloc+0x16/0x5f [<c041429e>] ? ip_route_input+0x6aa/0xbc6 [<c0415c16>] ? ip_rcv_finish+0x2a/0x293 [<c03eb578>] ? netif_receive_skb+0x29f/0x30e [<c0357e63>] ? e1000_receive_skb+0x132/0x14c [<c0359ecf>] ? e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] ? e1000_clean+0x29f/0x427 [<c03f2a49>] ? neigh_timer_handler+0x248/0x27f [<c03ed3ee>] ? net_rx_action+0x5c/0x14a [<c021e25e>] ? __do_softirq+0x5d/0xc1 [<c021e2f4>] ? do_softirq+0x32/0x36 [<c021e585>] ? irq_exit+0x35/0x67 [<c020d084>] ? smp_apic_timer_interrupt+0x6e/0x7a [<c02034f8>] ? apic_timer_interrupt+0x28/0x30 [<c03ef022>] ? dst_gc_task+0x2f/0x168 [<c02e75d0>] ? rb_insert_color+0x4c/0xad [<c022bd00>] ? enqueue_hrtimer+0xe0/0xeb [<c045b917>] ? _spin_lock_irqsave+0x9/0xd [<c0221895>] ? lock_timer_base+0x19/0x35 [<c02219ac>] ? __mod_timer+0x9d/0xa7 [<c022745f>] ? queue_delayed_work_on+0x8b/0x97 [<c0244bbd>] ? vmstat_update+0x0/0x24 [<c03eeff3>] ? dst_gc_task+0x0/0x168 [<c0226dbd>] ? run_workqueue+0x6f/0xe7 [<c02274d4>] ? worker_thread+0x0/0x83 [<c022754d>] ? worker_thread+0x79/0x83 [<c0229754>] ? autoremove_wake_function+0x0/0x2d [<c02296a1>] ? kthread+0x38/0x5f [<c0229669>] ? kthread+0x0/0x5f [<c020363b>] ? kernel_thread_helper+0x7/0x10 ======================= eth0: Detected Tx Unit Hang: TDH <83> TDT <83> next_to_use <83> next_to_clean <d8> buffer_info[next_to_clean]: time_stamp <1a96565> next_to_watch <d8> jiffies <1a9682f> next_to_watch.status <1> eth1: Detected Tx Unit Hang: TDH <ba> TDT <ba> next_to_use <ba> next_to_clean <ae> buffer_info[next_to_clean]: time_stamp <1a96762> next_to_watch <ae> jiffies <1a968c5> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <21> TDT <21> next_to_use <21> next_to_clean <76> buffer_info[next_to_clean]: time_stamp <1a980d7> next_to_watch <76> jiffies <1a983bc> next_to_watch.status <1> eth1: Detected Tx Unit Hang: TDH <d0> TDT <d0> next_to_use <d0> next_to_clean <25> buffer_info[next_to_clean]: time_stamp <1a98a14> next_to_watch <25> jiffies <1a98d80> next_to_watch.status <1> eth0: Detected Tx Unit Hang: TDH <bf> TDT <bf> next_to_use <bf> next_to_clean <14> buffer_info[next_to_clean]: time_stamp <1a99c6d> next_to_watch <14> jiffies <1a99f2e> next_to_watch.status <1> eth1: Detected Tx Unit Hang: TDH <e0> TDT <e0> next_to_use <e0> next_to_clean <35> buffer_info[next_to_clean]: time_stamp <1a99c54> next_to_watch <35> jiffies <1a99f93> next_to_watch.status <1> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9] Pid: 9, comm: events/0 Not tainted (2.6.25-rc6-git7 #1) EIP: 0060:[<c02ec160>] EFLAGS: 00000246 CPU: 0 EIP is at _raw_spin_lock+0xd/0xed EAX: f7d4f8e0 EBX: f7d4f8e0 ECX: 00000000 EDX: f7d80000 ESI: 0000af8e EDI: 00020000 EBP: 0000f8e0 ESP: f7c77c90 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: b7dee031 CR3: 36084000 CR4: 000006d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c0412665>] ? rt_garbage_collect+0x131/0x27e [<c03ef171>] ? dst_alloc+0x16/0x5f [<c041429e>] ? ip_route_input+0x6aa/0xbc6 [<c0415c16>] ? ip_rcv_finish+0x2a/0x293 [<c03eb578>] ? netif_receive_skb+0x29f/0x30e [<c0357e63>] ? e1000_receive_skb+0x132/0x14c [<c0359ecf>] ? e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] ? e1000_clean+0x29f/0x427 [<c03ed3ee>] ? net_rx_action+0x5c/0x14a [<c021e25e>] ? __do_softirq+0x5d/0xc1 [<c021e2f4>] ? do_softirq+0x32/0x36 [<c021e585>] ? irq_exit+0x35/0x67 [<c020d084>] ? smp_apic_timer_interrupt+0x6e/0x7a [<c02034f8>] ? apic_timer_interrupt+0x28/0x30 [<c03ef022>] ? dst_gc_task+0x2f/0x168 [<c02e75d0>] ? rb_insert_color+0x4c/0xad [<c022bd00>] ? enqueue_hrtimer+0xe0/0xeb [<c045b917>] ? _spin_lock_irqsave+0x9/0xd [<c0221895>] ? lock_timer_base+0x19/0x35 [<c02219ac>] ? __mod_timer+0x9d/0xa7 [<c022745f>] ? queue_delayed_work_on+0x8b/0x97 [<c0244bbd>] ? vmstat_update+0x0/0x24 [<c03eeff3>] ? dst_gc_task+0x0/0x168 [<c0226dbd>] ? run_workqueue+0x6f/0xe7 [<c02274d4>] ? worker_thread+0x0/0x83 [<c022754d>] ? worker_thread+0x79/0x83 [<c0229754>] ? autoremove_wake_function+0x0/0x2d [<c02296a1>] ? kthread+0x38/0x5f [<c0229669>] ? kthread+0x0/0x5f [<c020363b>] ? kernel_thread_helper+0x7/0x10 ======================= BUG: soft lockup - CPU#1 stuck for 61s! [snmpd:5408] Pid: 5408, comm: snmpd Not tainted (2.6.25-rc6-git7 #1) EIP: 0060:[<c0412665>] EFLAGS: 00200202 CPU: 1 EIP is at rt_garbage_collect+0x131/0x27e EAX: f767edf0 EBX: 00000000 ECX: 00000000 EDX: f7d86d01 ESI: 0000ef84 EDI: 00020000 EBP: 0000f840 ESP: f6d09a1c DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: b7fd8000 CR3: 370b5000 CR4: 00000690 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 [<c03ef171>] ? dst_alloc+0x16/0x5f [<c041429e>] ? ip_route_input+0x6aa/0xbc6 [<c0415c16>] ? ip_rcv_finish+0x2a/0x293 [<c03eb578>] ? netif_receive_skb+0x29f/0x30e [<c0357e63>] ? e1000_receive_skb+0x132/0x14c [<c0359ecf>] ? e1000_clean_rx_irq+0x1fa/0x29c [<c0356f82>] ? e1000_clean+0x29f/0x427 [<c03f2a55>] ? neigh_timer_handler+0x254/0x27f [<c03ed3ee>] ? net_rx_action+0x5c/0x14a [<c021e25e>] ? __do_softirq+0x5d/0xc1 [<c021e2f4>] ? do_softirq+0x32/0x36 [<c021e545>] ? local_bh_enable_ip+0x35/0x40 [<c04126c7>] ? rt_garbage_collect+0x193/0x27e [<c03ef171>] ? dst_alloc+0x16/0x5f [<c04138b7>] ? __ip_route_output_key+0x49e/0x6db [<c02916aa>] ? inode2sd+0x123/0x138 [<c0413aff>] ? ip_route_output_flow+0xb/0x3e [<c0430c22>] ? udp_sendmsg+0x296/0x538 [<c0435af6>] ? inet_sendmsg+0x35/0x3f [<c03e24a4>] ? sock_sendmsg+0xb8/0xd1 [<c0229754>] ? autoremove_wake_function+0x0/0x2d [<c02987da>] ? reiserfs_dirty_inode+0x70/0x78 [<c02e97a1>] ? copy_from_user+0x2d/0x59 [<c02e97a1>] ? copy_from_user+0x2d/0x59 [<c03e264c>] ? sys_sendmsg+0x18f/0x1f2 [<c023cfe5>] ? generic_file_aio_write+0x5e/0xac [<c0257a84>] ? do_sync_write+0xbf/0x100 [<c0229754>] ? autoremove_wake_function+0x0/0x2d [<c03e3458>] ? sys_socketcall+0x168/0x181 [<c0202a2a>] ? sysenter_past_esp+0x5f/0x85 ======================= This server is running BGP/quagga with ~5000 /29 net prefixes ip a sh | grep inet | wc -l 5342 ip l sh |grep vlan | wc -l 121 average connected clients: ip neigh show | wc -l 2355 ethtool -S eth0 NIC statistics: rx_packets: 1819803357 tx_packets: 1714843474 rx_bytes: 1502903186279 tx_bytes: 1382777787478 rx_broadcast: 1730081 tx_broadcast: 1028199 rx_multicast: 4006 tx_multicast: 0 rx_errors: 0 tx_errors: 0 tx_dropped: 0 multicast: 4006 collisions: 0 rx_length_errors: 0 rx_over_errors: 0 rx_crc_errors: 0 rx_frame_errors: 0 rx_no_buffer_count: 72797 rx_missed_errors: 5745439 tx_aborted_errors: 0 tx_carrier_errors: 0 tx_fifo_errors: 0 tx_heartbeat_errors: 0 tx_window_errors: 0 tx_abort_late_coll: 0 tx_deferred_ok: 0 tx_single_coll_ok: 0 tx_multi_coll_ok: 0 tx_timeout_count: 0 tx_restart_queue: 2698 rx_long_length_errors: 0 rx_short_length_errors: 0 rx_align_errors: 0 tx_tcp_seg_good: 0 tx_tcp_seg_failed: 0 rx_flow_control_xon: 0 rx_flow_control_xoff: 0 tx_flow_control_xon: 0 tx_flow_control_xoff: 0 rx_long_byte_count: 1502903186279 rx_csum_offload_good: 1808325083 rx_csum_offload_errors: 3263 rx_header_split: 0 alloc_rx_buff_failed: 0 tx_smbus: 0 rx_smbus: 0 dropped_smbus: 0 rx_dma_failed: 0 tx_dma_failed: 0 ethtool -S eth1 NIC statistics: rx_packets: 1717858068 tx_packets: 1815955165 rx_bytes: 1387719125983 tx_bytes: 1502643097276 rx_broadcast: 4560 tx_broadcast: 114 rx_multicast: 362 tx_multicast: 0 rx_errors: 0 tx_errors: 0 tx_dropped: 0 multicast: 362 collisions: 0 rx_length_errors: 0 rx_over_errors: 0 rx_crc_errors: 0 rx_frame_errors: 0 rx_no_buffer_count: 111097 rx_missed_errors: 3987953 tx_aborted_errors: 0 tx_carrier_errors: 0 tx_fifo_errors: 0 tx_heartbeat_errors: 0 tx_window_errors: 0 tx_abort_late_coll: 0 tx_deferred_ok: 0 tx_single_coll_ok: 0 tx_multi_coll_ok: 0 tx_timeout_count: 0 tx_restart_queue: 3078 rx_long_length_errors: 0 rx_short_length_errors: 0 rx_align_errors: 0 tx_tcp_seg_good: 0 tx_tcp_seg_failed: 0 rx_flow_control_xon: 0 rx_flow_control_xoff: 0 tx_flow_control_xon: 0 tx_flow_control_xoff: 0 rx_long_byte_count: 1387719125983 rx_csum_offload_good: 1711660724 rx_csum_offload_errors: 436 rx_header_split: 0 alloc_rx_buff_failed: 0 tx_smbus: 0 rx_smbus: 0 dropped_smbus: 0 rx_dma_failed: 0 tx_dma_failed: 0 cat /proc/interrupts CPU0 CPU1 0: 175 76 IO-APIC-edge timer 1: 1 1 IO-APIC-edge i8042 6: 1 1 IO-APIC-edge floppy 8: 11 15 IO-APIC-edge rtc 9: 0 0 IO-APIC-fasteoi acpi 12: 2 2 IO-APIC-edge i8042 18: 19 15 IO-APIC-fasteoi 218: 771508779 182 PCI-MSI-edge eth1 219: 12251 748704785 PCI-MSI-edge eth0 220: 65164 62870 PCI-MSI-edge ahci NMI: 0 0 Non-maskable interrupts LOC: 9761747 8884458 Local timer interrupts RES: 3077 4847 Rescheduling interrupts CAL: 713 375 function call interrupts TLB: 2896 4796 TLB shootdowns TRM: 0 0 Thermal event interrupts SPU: 0 0 Spurious interrupts ERR: 0 MIS: 0 > 218: 771508779 182 PCI-MSI-edge eth1
> 219: 12251 748704785 PCI-MSI-edge eth0
From curiosity: did you try with affinity set for these irqs?
Regards,
Jarek P.
hmm. yes i use affinity CPU0 eth0 / CPU1 eth1 Because one CPU can't handle gigabit traffic... always 100% CPU (with traffic management use "iproute2/hfsc") those irq's that are on cpu0:219: 12251 /cpu1:218 182 are from mistake because i try to change cpu's irq bind for test. |