Bug 10318

Summary: WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Product: Networking Reporter: Pawel Staszewski (pstaszewski)
Component: IPV4Assignee: Stephen Hemminger (stephen)
Status: CLOSED CODE_FIX    
Severity: normal CC: bunk, jarkao2
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.25-rc6-git7 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 9832    

Description Pawel Staszewski 2008-03-25 02:50:54 UTC
Latest working kernel version: 2.6.24

Distribution: gentoo
Hardware Environment:
lspci
00:00.0 Host bridge: Intel Corporation Server DRAM Controller
00:01.0 PCI bridge: Intel Corporation Server Host-Primary PCI Express Bridge
00:19.0 Ethernet controller: Intel Corporation 82566DM-2 Gigabit Network Connection (rev 02)
00:1a.0 USB Controller: Intel Corporation USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation USB2 EHCI Controller #2 (rev 02)
00:1c.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation PCI Express Port 5 (rev 02)
00:1d.0 USB Controller: Intel Corporation USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation SMBus Controller (rev 02)
01:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
03:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
04:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU            3075  @ 2.66GHz
stepping        : 11
cpu MHz         : 2660.095
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 5324.11
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU            3075  @ 2.66GHz
stepping        : 11
cpu MHz         : 2660.095
cache size      : 4096 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr lahf_lm
bogomips        : 5320.37
clflush size    : 64
cat /proc/meminfo
MemTotal:      2068148 kB
MemFree:       1432804 kB
Buffers:        203020 kB
Cached:         253120 kB
SwapCached:          0 kB
Active:         204576 kB
Inactive:       263872 kB
HighTotal:     1177008 kB
HighFree:       908920 kB
LowTotal:       891140 kB
LowFree:        523884 kB
SwapTotal:      995896 kB
SwapFree:       995896 kB
Dirty:            1724 kB
Writeback:           0 kB
AnonPages:       12308 kB
Mapped:           5072 kB
Slab:           158532 kB
SReclaimable:    74692 kB
SUnreclaim:      83840 kB
PageTables:        260 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:   2029968 kB
Committed_AS:    20436 kB
VmallocTotal:   110584 kB
VmallocUsed:      1876 kB
VmallocChunk:   108612 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
HugePages_Surp:      0
Hugepagesize:     4096 kB

Software Environment: bgp/quagga
Problem Description:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c043c372>] fn_trie_lookup+0xe3/0x288
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c03f6417>] fib_rules_lookup+0x71/0xb6
 [<c043d652>] fib_lookup+0x28/0x36
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c03f1241>] neigh_event_ns+0x3b/0x70
 [<c0432523>] arp_process+0x1e5/0x534
 [<c03edd5a>] dev_queue_xmit+0x279/0x29f
 [<c0419415>] ip_finish_output+0x1c6/0x1fc
 [<c03f81d5>] tc_classify+0x14/0x6b
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c043c372>] fn_trie_lookup+0xe3/0x288
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c03f6417>] fib_rules_lookup+0x71/0xb6
 [<c043d652>] fib_lookup+0x28/0x36
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c03f1241>] neigh_event_ns+0x3b/0x70
 [<c0432523>] arp_process+0x1e5/0x534
 [<c03edd5a>] dev_queue_xmit+0x279/0x29f
 [<c0419415>] ip_finish_output+0x1c6/0x1fc
 [<c03f81d5>] tc_classify+0x14/0x6b
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
Comment 1 Anonymous Emailer 2008-03-25 10:58:30 UTC
Reply-To: akpm@linux-foundation.org

On Tue, 25 Mar 2008 02:50:54 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10318
> 
>            Summary: WARNING: at arch/x86/mm/highmem_32.c:43
>                     kmap_atomic_prot+0x87/0x184()
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.25-rc6-git7
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@linux-foundation.org
>         ReportedBy: pstaszewski@artcom.pl
> 
> 
> Latest working kernel version: 2.6.24

This is a post-2.6.24 regression.

> Software Environment: bgp/quagga

The app does a lot of route management stuff.

> Problem Description:
> Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
>  [<c021a0bf>] warn_on_slowpath+0x40/0x4f
>  [<c043c372>] fn_trie_lookup+0xe3/0x288
>  [<c043d61a>] fib4_rule_action+0x3d/0x4d
>  [<c03f6417>] fib_rules_lookup+0x71/0xb6
>  [<c043d652>] fib_lookup+0x28/0x36
>  [<c023f126>] __rmqueue_smallest+0x83/0xe1
>  [<c023f197>] __rmqueue+0x13/0x172
>  [<c0211806>] kmap_atomic_prot+0x87/0x184
>  [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
>  [<c023ff92>] __alloc_pages+0x71/0x2cf
>  [<c0240229>] __get_free_pages+0x39/0x47
>  [<c03f10d0>] neigh_create+0x2d8/0x40e
>  [<c045b85b>] _read_unlock_bh+0x5/0xd
>  [<c03f0539>] neigh_lookup+0x92/0x9b
>  [<c03f1241>] neigh_event_ns+0x3b/0x70
>  [<c0432523>] arp_process+0x1e5/0x534
>  [<c03edd5a>] dev_queue_xmit+0x279/0x29f
>  [<c0419415>] ip_finish_output+0x1c6/0x1fc
>  [<c03f81d5>] tc_classify+0x14/0x6b
>  [<c03eb578>] netif_receive_skb+0x29f/0x30e
>  [<c0357e63>] e1000_receive_skb+0x132/0x14c
>  [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
>  [<c0356f82>] e1000_clean+0x29f/0x427
>  [<c03ed3ee>] net_rx_action+0x5c/0x14a
>  [<c021e25e>] __do_softirq+0x5d/0xc1
>  [<c021e2f4>] do_softirq+0x32/0x36
>  [<c021e585>] irq_exit+0x35/0x67
>  [<c0204f79>] do_IRQ+0x73/0x82
>  [<c020343b>] common_interrupt+0x23/0x28
>  [<c0201377>] mwait_idle_with_hints+0x36/0x39
>  [<c020137a>] mwait_idle+0x0/0xa
>  [<c0201817>] cpu_idle+0xa8/0xc8
>  =======================
> ---[ end trace 6a93a9703f6a626e ]---
> ------------[ cut here ]------------

This backtrace is a mess.

> WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
> Modules linked in:
> Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
>  [<c021a0bf>] warn_on_slowpath+0x40/0x4f
>  [<c043c372>] fn_trie_lookup+0xe3/0x288
>  [<c043d61a>] fib4_rule_action+0x3d/0x4d
>  [<c03f6417>] fib_rules_lookup+0x71/0xb6
>  [<c043d652>] fib_lookup+0x28/0x36
>  [<c023f126>] __rmqueue_smallest+0x83/0xe1
>  [<c023f197>] __rmqueue+0x13/0x172
>  [<c0211806>] kmap_atomic_prot+0x87/0x184
>  [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
>  [<c023ff92>] __alloc_pages+0x71/0x2cf
>  [<c0240229>] __get_free_pages+0x39/0x47
>  [<c03f10d0>] neigh_create+0x2d8/0x40e
>  [<c045b85b>] _read_unlock_bh+0x5/0xd
>  [<c03f0539>] neigh_lookup+0x92/0x9b
>  [<c03f1241>] neigh_event_ns+0x3b/0x70
>  [<c0432523>] arp_process+0x1e5/0x534
>  [<c03edd5a>] dev_queue_xmit+0x279/0x29f
>  [<c0419415>] ip_finish_output+0x1c6/0x1fc
>  [<c03f81d5>] tc_classify+0x14/0x6b
>  [<c03eb578>] netif_receive_skb+0x29f/0x30e
>  [<c0357e63>] e1000_receive_skb+0x132/0x14c
>  [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
>  [<c0356f82>] e1000_clean+0x29f/0x427
>  [<c03ed3ee>] net_rx_action+0x5c/0x14a
>  [<c021e25e>] __do_softirq+0x5d/0xc1
>  [<c021e2f4>] do_softirq+0x32/0x36
>  [<c021e585>] irq_exit+0x35/0x67
>  [<c0204f79>] do_IRQ+0x73/0x82
>  [<c020343b>] common_interrupt+0x23/0x28
>  [<c0201377>] mwait_idle_with_hints+0x36/0x39
>  [<c020137a>] mwait_idle+0x0/0xa
>  [<c0201817>] cpu_idle+0xa8/0xc8
>  =======================

They all are.

afacit what's happened is that someone is running __alloc_pages(...,
__GFP_ZERO) from softirq context.  But the __GFP_ZERO implementation uses
KM_USER0 which cannot be used from softirq context because non-interrupt
code on this CPU might be using the same kmap slot.

Can anyone thing of anything which recently changed in either networking
core or e1000e which would have triggered this?

I think the core MM code is being doubly dumb here.

a) We should be able to use __GFP_ZERO from all copntexts.

b) it's not a highmem page anyway, so we won't be using that kmap slot.

Pawel, can you please confirm that this:

--- a/arch/x86/mm/highmem_32.c~a
+++ a/arch/x86/mm/highmem_32.c
@@ -73,15 +73,15 @@ void *kmap_atomic_prot(struct page *page
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
-	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
-
-	debug_kmap_atomic_prot(type);
 
+	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 
 	if (!PageHighMem(page))
 		return page_address(page);
 
+	debug_kmap_atomic_prot(type);
+
 	idx = type + KM_TYPE_NR*smp_processor_id();
 	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
 	BUG_ON(!pte_none(*(kmap_pte-idx)));
_

fixes it?

Thanks.
Comment 2 Anonymous Emailer 2008-03-25 11:45:47 UTC
Reply-To: dada1@cosmosbay.com

Andrew Morton a 
Comment 3 Christoph Lameter 2008-03-25 12:03:34 UTC
On Tue, 25 Mar 2008, Eric Dumazet wrote:

> If kzalloc() or __get_free_pages(__GFP_ZERO) is not allowed from softirq then
> many usages are illegal.

There is clearly no problem for GFP_KERNEL. The only issue could arise for 
GFP_HIGHMEM|__GFP_ZERO. I thought we had dealt with these false positives 
a while back?
Comment 4 Ingo Molnar 2008-03-25 13:24:40 UTC
* Andrew Morton <akpm@linux-foundation.org> wrote:

> afacit what's happened is that someone is running __alloc_pages(..., 
> __GFP_ZERO) from softirq context.  But the __GFP_ZERO implementation 
> uses KM_USER0 which cannot be used from softirq context because 
> non-interrupt code on this CPU might be using the same kmap slot.
> 
> Can anyone thing of anything which recently changed in either 
> networking core or e1000e which would have triggered this?
> 
> I think the core MM code is being doubly dumb here.
> 
> a) We should be able to use __GFP_ZERO from all copntexts.
> 
> b) it's not a highmem page anyway, so we won't be using that kmap 
> slot.

i think this came up before (with kzalloc()) and the MM code should have 
been fixed to not even attempt a kmap_atomic(), instead of working it 
around in the callsite or in the kmap_atomic() code.

	Ingo
Comment 5 Pawel Staszewski 2008-03-25 18:18:25 UTC
and more:
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c043c372>] fn_trie_lookup+0xe3/0x288
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c03f6417>] fib_rules_lookup+0x71/0xb6
 [<c043d652>] fib_lookup+0x28/0x36
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c03f1241>] neigh_event_ns+0x3b/0x70
 [<c0432523>] arp_process+0x1e5/0x534
 [<c03edd5a>] dev_queue_xmit+0x279/0x29f
 [<c0419415>] ip_finish_output+0x1c6/0x1fc
 [<c03f81d5>] tc_classify+0x14/0x6b
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c043c372>] fn_trie_lookup+0xe3/0x288
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c03f6417>] fib_rules_lookup+0x71/0xb6
 [<c043d652>] fib_lookup+0x28/0x36
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c03f1241>] neigh_event_ns+0x3b/0x70
 [<c0432523>] arp_process+0x1e5/0x534
 [<c03edd5a>] dev_queue_xmit+0x279/0x29f
 [<c0419415>] ip_finish_output+0x1c6/0x1fc
 [<c03f81d5>] tc_classify+0x14/0x6b
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
------------[ cut here ]------------
WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git7 #1
 [<c021a0bf>] warn_on_slowpath+0x40/0x4f
 [<c022ff00>] clockevents_program_event+0x2e/0xd2
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022ee6d>] clocksource_get_next+0x39/0x3f
 [<c022df23>] update_wall_time+0x552/0x6c7
 [<c022d8b8>] getnstimeofday+0x2b/0xb2
 [<c022ff96>] clockevents_program_event+0xc4/0xd2
 [<c023f126>] __rmqueue_smallest+0x83/0xe1
 [<c023f197>] __rmqueue+0x13/0x172
 [<c0211806>] kmap_atomic_prot+0x87/0x184
 [<c023fe7c>] get_page_from_freelist+0x2c5/0x358
 [<c023ff92>] __alloc_pages+0x71/0x2cf
 [<c043d61a>] fib4_rule_action+0x3d/0x4d
 [<c0240229>] __get_free_pages+0x39/0x47
 [<c03f10d0>] neigh_create+0x2d8/0x40e
 [<c045b85b>] _read_unlock_bh+0x5/0xd
 [<c03f0539>] neigh_lookup+0x92/0x9b
 [<c04319f3>] arp_bind_neighbour+0x53/0x64
 [<c0413343>] rt_intern_hash+0x257/0x32d
 [<c0414406>] ip_route_input+0x812/0xbc6
 [<c0415c16>] ip_rcv_finish+0x2a/0x293
 [<c03eb578>] netif_receive_skb+0x29f/0x30e
 [<c0357e63>] e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] e1000_clean+0x29f/0x427
 [<c022c395>] ktime_get_ts+0x11/0x3a
 [<c03ed3ee>] net_rx_action+0x5c/0x14a
 [<c021e25e>] __do_softirq+0x5d/0xc1
 [<c021e2f4>] do_softirq+0x32/0x36
 [<c021e585>] irq_exit+0x35/0x67
 [<c0204f79>] do_IRQ+0x73/0x82
 [<c020343b>] common_interrupt+0x23/0x28
 [<c0201377>] mwait_idle_with_hints+0x36/0x39
 [<c020137a>] mwait_idle+0x0/0xa
 [<c0201817>] cpu_idle+0xa8/0xc8
 =======================
---[ end trace 6a93a9703f6a626e ]---
eth1: TSO is Disabled
eth0: TSO is Disabled
eth0: Detected Tx Unit Hang:
  TDH                  <76>
  TDT                  <76>
  next_to_use          <76>
  next_to_clean        <cb>
buffer_info[next_to_clean]:
  time_stamp           <1a432ea>
  next_to_watch        <cb>
  jiffies              <1a43554>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <b0>
  TDT                  <b0>
  next_to_use          <b0>
  next_to_clean        <5>
buffer_info[next_to_clean]:
  time_stamp           <1a4e551>
  next_to_watch        <5>
  jiffies              <1a4e81e>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <93>
  TDT                  <93>
  next_to_use          <93>
  next_to_clean        <e8>
buffer_info[next_to_clean]:
  time_stamp           <1a5ea83>
  next_to_watch        <e8>
  jiffies              <1a5ece0>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <1d>
  TDT                  <1d>
  next_to_use          <1d>
  next_to_clean        <72>
buffer_info[next_to_clean]:
  time_stamp           <1a5f2c4>
  next_to_watch        <72>
  jiffies              <1a5f522>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <31>
  TDT                  <31>
  next_to_use          <31>
  next_to_clean        <86>
buffer_info[next_to_clean]:
  time_stamp           <1a60384>
  next_to_watch        <86>
  jiffies              <1a605ec>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <1d>
  TDT                  <1d>
  next_to_use          <1d>
  next_to_clean        <72>
buffer_info[next_to_clean]:
  time_stamp           <1a688a2>
  next_to_watch        <72>
  jiffies              <1a68b0e>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <fb>
  TDT                  <fb>
  next_to_use          <fb>
  next_to_clean        <50>
buffer_info[next_to_clean]:
  time_stamp           <1a72364>
  next_to_watch        <50>
  jiffies              <1a725dd>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <2a>
  TDT                  <2a>
  next_to_use          <2a>
  next_to_clean        <7f>
buffer_info[next_to_clean]:
  time_stamp           <1a880c7>
  next_to_watch        <7f>
  jiffies              <1a88363>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <8b>
  TDT                  <8b>
  next_to_use          <8b>
  next_to_clean        <e0>
buffer_info[next_to_clean]:
  time_stamp           <1a8dee1>
  next_to_watch        <e0>
  jiffies              <1a8e16c>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <47>
  TDT                  <47>
  next_to_use          <47>
  next_to_clean        <9c>
buffer_info[next_to_clean]:
  time_stamp           <1a92c54>
  next_to_watch        <9c>
  jiffies              <1a92f58>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <5b>
  TDT                  <5b>
  next_to_use          <5b>
  next_to_clean        <b0>
buffer_info[next_to_clean]:
  time_stamp           <1a93f7d>
  next_to_watch        <b0>
  jiffies              <1a9425a>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <e5>
  TDT                  <e5>
  next_to_use          <e5>
  next_to_clean        <3a>
buffer_info[next_to_clean]:
  time_stamp           <1a94950>
  next_to_watch        <3a>
  jiffies              <1a94c22>
  next_to_watch.status <1>
BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]

Pid: 9, comm: events/0 Not tainted (2.6.25-rc6-git7 #1)
EIP: 0060:[<c02ec183>] EFLAGS: 00000202 CPU: 0
EIP is at _raw_spin_lock+0x30/0xed
EAX: f7c6a070 EBX: f7d43130 ECX: 00000000 EDX: f7d80000
ESI: 00003313 EDI: 00020000 EBP: 00003130 ESP: f7c77c90
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7dee031 CR3: 36084000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0412665>] ? rt_garbage_collect+0x131/0x27e
 [<c03ef171>] ? dst_alloc+0x16/0x5f
 [<c041429e>] ? ip_route_input+0x6aa/0xbc6
 [<c0415c16>] ? ip_rcv_finish+0x2a/0x293
 [<c03eb578>] ? netif_receive_skb+0x29f/0x30e
 [<c0357e63>] ? e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] ? e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] ? e1000_clean+0x29f/0x427
 [<c03f2a49>] ? neigh_timer_handler+0x248/0x27f
 [<c03ed3ee>] ? net_rx_action+0x5c/0x14a
 [<c021e25e>] ? __do_softirq+0x5d/0xc1
 [<c021e2f4>] ? do_softirq+0x32/0x36
 [<c021e585>] ? irq_exit+0x35/0x67
 [<c020d084>] ? smp_apic_timer_interrupt+0x6e/0x7a
 [<c02034f8>] ? apic_timer_interrupt+0x28/0x30
 [<c03ef022>] ? dst_gc_task+0x2f/0x168
 [<c02e75d0>] ? rb_insert_color+0x4c/0xad
 [<c022bd00>] ? enqueue_hrtimer+0xe0/0xeb
 [<c045b917>] ? _spin_lock_irqsave+0x9/0xd
 [<c0221895>] ? lock_timer_base+0x19/0x35
 [<c02219ac>] ? __mod_timer+0x9d/0xa7
 [<c022745f>] ? queue_delayed_work_on+0x8b/0x97
 [<c0244bbd>] ? vmstat_update+0x0/0x24
 [<c03eeff3>] ? dst_gc_task+0x0/0x168
 [<c0226dbd>] ? run_workqueue+0x6f/0xe7
 [<c02274d4>] ? worker_thread+0x0/0x83
 [<c022754d>] ? worker_thread+0x79/0x83
 [<c0229754>] ? autoremove_wake_function+0x0/0x2d
 [<c02296a1>] ? kthread+0x38/0x5f
 [<c0229669>] ? kthread+0x0/0x5f
 [<c020363b>] ? kernel_thread_helper+0x7/0x10
 =======================
eth0: Detected Tx Unit Hang:
  TDH                  <83>
  TDT                  <83>
  next_to_use          <83>
  next_to_clean        <d8>
buffer_info[next_to_clean]:
  time_stamp           <1a96565>
  next_to_watch        <d8>
  jiffies              <1a9682f>
  next_to_watch.status <1>
eth1: Detected Tx Unit Hang:
  TDH                  <ba>
  TDT                  <ba>
  next_to_use          <ba>
  next_to_clean        <ae>
buffer_info[next_to_clean]:
  time_stamp           <1a96762>
  next_to_watch        <ae>
  jiffies              <1a968c5>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <21>
  TDT                  <21>
  next_to_use          <21>
  next_to_clean        <76>
buffer_info[next_to_clean]:
  time_stamp           <1a980d7>
  next_to_watch        <76>
  jiffies              <1a983bc>
  next_to_watch.status <1>
eth1: Detected Tx Unit Hang:
  TDH                  <d0>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <25>
buffer_info[next_to_clean]:
  time_stamp           <1a98a14>
  next_to_watch        <25>
  jiffies              <1a98d80>
  next_to_watch.status <1>
eth0: Detected Tx Unit Hang:
  TDH                  <bf>
  TDT                  <bf>
  next_to_use          <bf>
  next_to_clean        <14>
buffer_info[next_to_clean]:
  time_stamp           <1a99c6d>
  next_to_watch        <14>
  jiffies              <1a99f2e>
  next_to_watch.status <1>
eth1: Detected Tx Unit Hang:
  TDH                  <e0>
  TDT                  <e0>
  next_to_use          <e0>
  next_to_clean        <35>
buffer_info[next_to_clean]:
  time_stamp           <1a99c54>
  next_to_watch        <35>
  jiffies              <1a99f93>
  next_to_watch.status <1>
BUG: soft lockup - CPU#0 stuck for 61s! [events/0:9]

Pid: 9, comm: events/0 Not tainted (2.6.25-rc6-git7 #1)
EIP: 0060:[<c02ec160>] EFLAGS: 00000246 CPU: 0
EIP is at _raw_spin_lock+0xd/0xed
EAX: f7d4f8e0 EBX: f7d4f8e0 ECX: 00000000 EDX: f7d80000
ESI: 0000af8e EDI: 00020000 EBP: 0000f8e0 ESP: f7c77c90
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7dee031 CR3: 36084000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c0412665>] ? rt_garbage_collect+0x131/0x27e
 [<c03ef171>] ? dst_alloc+0x16/0x5f
 [<c041429e>] ? ip_route_input+0x6aa/0xbc6
 [<c0415c16>] ? ip_rcv_finish+0x2a/0x293
 [<c03eb578>] ? netif_receive_skb+0x29f/0x30e
 [<c0357e63>] ? e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] ? e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] ? e1000_clean+0x29f/0x427
 [<c03ed3ee>] ? net_rx_action+0x5c/0x14a
 [<c021e25e>] ? __do_softirq+0x5d/0xc1
 [<c021e2f4>] ? do_softirq+0x32/0x36
 [<c021e585>] ? irq_exit+0x35/0x67
 [<c020d084>] ? smp_apic_timer_interrupt+0x6e/0x7a
 [<c02034f8>] ? apic_timer_interrupt+0x28/0x30
 [<c03ef022>] ? dst_gc_task+0x2f/0x168
 [<c02e75d0>] ? rb_insert_color+0x4c/0xad
 [<c022bd00>] ? enqueue_hrtimer+0xe0/0xeb
 [<c045b917>] ? _spin_lock_irqsave+0x9/0xd
 [<c0221895>] ? lock_timer_base+0x19/0x35
 [<c02219ac>] ? __mod_timer+0x9d/0xa7
 [<c022745f>] ? queue_delayed_work_on+0x8b/0x97
 [<c0244bbd>] ? vmstat_update+0x0/0x24
 [<c03eeff3>] ? dst_gc_task+0x0/0x168
 [<c0226dbd>] ? run_workqueue+0x6f/0xe7
 [<c02274d4>] ? worker_thread+0x0/0x83
 [<c022754d>] ? worker_thread+0x79/0x83
 [<c0229754>] ? autoremove_wake_function+0x0/0x2d
 [<c02296a1>] ? kthread+0x38/0x5f
 [<c0229669>] ? kthread+0x0/0x5f
 [<c020363b>] ? kernel_thread_helper+0x7/0x10
 =======================
BUG: soft lockup - CPU#1 stuck for 61s! [snmpd:5408]

Pid: 5408, comm: snmpd Not tainted (2.6.25-rc6-git7 #1)
EIP: 0060:[<c0412665>] EFLAGS: 00200202 CPU: 1
EIP is at rt_garbage_collect+0x131/0x27e
EAX: f767edf0 EBX: 00000000 ECX: 00000000 EDX: f7d86d01
ESI: 0000ef84 EDI: 00020000 EBP: 0000f840 ESP: f6d09a1c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: b7fd8000 CR3: 370b5000 CR4: 00000690
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 [<c03ef171>] ? dst_alloc+0x16/0x5f
 [<c041429e>] ? ip_route_input+0x6aa/0xbc6
 [<c0415c16>] ? ip_rcv_finish+0x2a/0x293
 [<c03eb578>] ? netif_receive_skb+0x29f/0x30e
 [<c0357e63>] ? e1000_receive_skb+0x132/0x14c
 [<c0359ecf>] ? e1000_clean_rx_irq+0x1fa/0x29c
 [<c0356f82>] ? e1000_clean+0x29f/0x427
 [<c03f2a55>] ? neigh_timer_handler+0x254/0x27f
 [<c03ed3ee>] ? net_rx_action+0x5c/0x14a
 [<c021e25e>] ? __do_softirq+0x5d/0xc1
 [<c021e2f4>] ? do_softirq+0x32/0x36
 [<c021e545>] ? local_bh_enable_ip+0x35/0x40
 [<c04126c7>] ? rt_garbage_collect+0x193/0x27e
 [<c03ef171>] ? dst_alloc+0x16/0x5f
 [<c04138b7>] ? __ip_route_output_key+0x49e/0x6db
 [<c02916aa>] ? inode2sd+0x123/0x138
 [<c0413aff>] ? ip_route_output_flow+0xb/0x3e
 [<c0430c22>] ? udp_sendmsg+0x296/0x538
 [<c0435af6>] ? inet_sendmsg+0x35/0x3f
 [<c03e24a4>] ? sock_sendmsg+0xb8/0xd1
 [<c0229754>] ? autoremove_wake_function+0x0/0x2d
 [<c02987da>] ? reiserfs_dirty_inode+0x70/0x78
 [<c02e97a1>] ? copy_from_user+0x2d/0x59
 [<c02e97a1>] ? copy_from_user+0x2d/0x59
 [<c03e264c>] ? sys_sendmsg+0x18f/0x1f2
 [<c023cfe5>] ? generic_file_aio_write+0x5e/0xac
 [<c0257a84>] ? do_sync_write+0xbf/0x100
 [<c0229754>] ? autoremove_wake_function+0x0/0x2d
 [<c03e3458>] ? sys_socketcall+0x168/0x181
 [<c0202a2a>] ? sysenter_past_esp+0x5f/0x85
 =======================
Comment 6 Pawel Staszewski 2008-03-26 08:34:38 UTC
This server is running BGP/quagga with ~5000 /29 net prefixes

ip a sh | grep inet | wc -l
5342
ip l sh |grep vlan | wc -l
121
average connected clients:
ip neigh show | wc -l
2355
 ethtool -S eth0
NIC statistics:
     rx_packets: 1819803357
     tx_packets: 1714843474
     rx_bytes: 1502903186279
     tx_bytes: 1382777787478
     rx_broadcast: 1730081
     tx_broadcast: 1028199
     rx_multicast: 4006
     tx_multicast: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 4006
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 72797
     rx_missed_errors: 5745439
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 2698
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 1502903186279
     rx_csum_offload_good: 1808325083
     rx_csum_offload_errors: 3263
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0

 ethtool -S eth1
NIC statistics:
     rx_packets: 1717858068
     tx_packets: 1815955165
     rx_bytes: 1387719125983
     tx_bytes: 1502643097276
     rx_broadcast: 4560
     tx_broadcast: 114
     rx_multicast: 362
     tx_multicast: 0
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 362
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 111097
     rx_missed_errors: 3987953
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 3078
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 1387719125983
     rx_csum_offload_good: 1711660724
     rx_csum_offload_errors: 436
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0

cat /proc/interrupts
           CPU0       CPU1
  0:        175         76   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  6:          1          1   IO-APIC-edge      floppy
  8:         11         15   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          2          2   IO-APIC-edge      i8042
 18:         19         15   IO-APIC-fasteoi
218:  771508779        182   PCI-MSI-edge      eth1
219:      12251  748704785   PCI-MSI-edge      eth0
220:      65164      62870   PCI-MSI-edge      ahci
NMI:          0          0   Non-maskable interrupts
LOC:    9761747    8884458   Local timer interrupts
RES:       3077       4847   Rescheduling interrupts
CAL:        713        375   function call interrupts
TLB:       2896       4796   TLB shootdowns
TRM:          0          0   Thermal event interrupts
SPU:          0          0   Spurious interrupts
ERR:          0
MIS:          0
Comment 7 Jarek Poplawski 2008-03-28 13:50:29 UTC
> 218:  771508779        182   PCI-MSI-edge      eth1
> 219:      12251  748704785   PCI-MSI-edge      eth0

From curiosity: did you try with affinity set for these irqs?

Regards,
Jarek P.
Comment 8 Pawel Staszewski 2008-03-28 14:20:45 UTC
hmm.
yes i use affinity CPU0 eth0 / CPU1 eth1
Because one CPU can't handle gigabit traffic... always 100% CPU (with traffic management use "iproute2/hfsc")
those irq's that are on cpu0:219:      12251  /cpu1:218     182 are from mistake because i try to change cpu's irq bind for test.