Most recent kernel where this bug did not occur: don't know Distribution: Gentoo Hardware Environment: Intel(R) Pentium(R) 4 CPU 3.00GHz cpu family:15 model: 4 stepping: 3 x86_64 2x intel e1000 1x acenic Tigon2 Software Environment: Quagga, Zebra BGP Problem Description: After 15 min of system running : cat /proc/meminfo MemTotal: 502684 kB MemFree: 259560 kB Buffers: 3048 kB Cached: 5684 kB SwapCached: 17788 kB Active: 139980 kB Inactive: 20452 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 502684 kB LowFree: 259560 kB SwapTotal: 996020 kB SwapFree: 974940 kB Dirty: 4 kB Writeback: 0 kB Mapped: 152736 kB Slab: 72716 kB CommitLimit: 1247360 kB Committed_AS: 163188 kB PageTables: 1004 kB VmallocTotal: 34359738367 kB VmallocUsed: 2356 kB VmallocChunk: 34359735819 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB Call Trace: <IRQ> <ffffffff8024abf7>{__alloc_pages+623} <ffffffff802145d4>{smp_local_timer_interrupt+35} <ffffffff80214639>{smp_apic_timer_interrupt+65} <ffffffff802612f2>{cache_alloc_refill+695} <ffffffff802616d7>{__kmalloc+114} <ffffffff8038d148>{__alloc_skb+85} <ffffffff8034123e>{e1000_clean_rx_irq+740} <ffffffff80340e94>{e1000_clean+239} <ffffffff80392be2>{net_rx_action+171} <ffffffff8022d553>{__do_softirq+103} <ffffffff8020aaa6>{call_softirq+30} <ffffffff8020c678>{do_softirq+61} <ffffffff8020c631>{do_IRQ+62} <ffffffff80207e94>{mwait_idle+0} <ffffffff80209e04>{ret_from_intr+0} <EOI> <ffffffff80207ede>{mwait_idle+74} <ffffffff80207e75>{cpu_idle+98} <ffffffff805da79b>{start_secondary+1127} Mem-info: DMA per-cpu: cpu 0 hot: high 0, batch 1 used:0 cpu 0 cold: high 0, batch 1 used:0 cpu 1 hot: high 0, batch 1 used:0 cpu 1 cold: high 0, batch 1 used:0 DMA32 per-cpu: cpu 0 hot: high 186, batch 31 used:72 cpu 0 cold: high 62, batch 15 used:13 cpu 1 hot: high 186, batch 31 used:30 cpu 1 cold: high 62, batch 15 used:50 Normal per-cpu: empty HighMem per-cpu: empty Free pages: 4920kB (0kB HighMem) Active:48439 inactive:4651 dirty:0 writeback:0 unstable:0 free:1230 slab:68845 mapped:39985 pagetables:255 DMA free:1992kB min:172kB low:212kB high:256kB active:0kB inactive:0kB present:10604kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 481 481 481 DMA32 free:2928kB min:8016kB low:10020kB high:12024kB active:193756kB inactive:18604kB present:492776kB pages_scanned:126 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1992kB DMA32: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2928kB Normal: empty HighMem: empty Swap cache: add 0, delete 0, find 0/0, race 0+0 Free swap = 996020kB Total swap = 996020kB Free swap: 996020kB 128997 pages of RAM 3326 reserved pages 13499 pages shared 0 pages swap cached swapper: page allocation failure. order:0, mode:0x20 Call Trace: <IRQ> <ffffffff8024abf7>{__alloc_pages+623} <ffffffff802612f2>{cache_alloc_refill+695} <ffffffff802616d7>{__kmalloc+114} <ffffffff8038d148>{__alloc_skb+85} <ffffffff8034123e>{e1000_clean_rx_irq+740} <ffffffff80340e94>{e1000_clean+239} <ffffffff80392be2>{net_rx_action+171} <ffffffff8022d553>{__do_softirq+103} <ffffffff8020aaa6>{call_softirq+30} <ffffffff8020c678>{do_softirq+61} <ffffffff8020c631>{do_IRQ+62} <ffffffff80207e94>{mwait_idle+0} <ffffffff80209e04>{ret_from_intr+0} <EOI> <ffffffff80207ede>{mwait_idle+74} <ffffffff80207e75>{cpu_idle+98} <ffffffff805da79b>{start_secondary+1127} Mem-info: DMA per-cpu: cpu 0 hot: high 0, batch 1 used:0 cpu 0 cold: high 0, batch 1 used:0 cpu 1 hot: high 0, batch 1 used:0 cpu 1 cold: high 0, batch 1 used:0 DMA32 per-cpu: cpu 0 hot: high 186, batch 31 used:72 cpu 0 cold: high 62, batch 15 used:13 cpu 1 hot: high 186, batch 31 used:30 cpu 1 cold: high 62, batch 15 used:50 Normal per-cpu: empty HighMem per-cpu: empty Free pages: 4920kB (0kB HighMem) Active:48439 inactive:4651 dirty:0 writeback:0 unstable:0 free:1230 slab:68845 mapped:39985 pagetables:255 DMA free:1992kB min:172kB low:212kB high:256kB active:0kB inactive:0kB present:10604kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 481 481 481 DMA32 free:2928kB min:8016kB low:10020kB high:12024kB active:193756kB inactive:18604kB present:492776kB pages_scanned:126 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1992kB DMA32: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2928kB Normal: empty HighMem: empty Swap cache: add 0, delete 0, find 0/0, race 0+0 Free swap = 996020kB Total swap = 996020kB Free swap: 996020kB 128997 pages of RAM 3326 reserved pages 13499 pages shared 0 pages swap cached printk: 21353 messages suppressed. swapper: page allocation failure. order:0, mode:0x20 Call Trace: <IRQ> <ffffffff8024abf7>{__alloc_pages+623} <ffffffff802612f2>{cache_alloc_refill+695} <ffffffff802616d7>{__kmalloc+114} <ffffffff8038d148>{__alloc_skb+85} <ffffffff80341c62>{e1000_alloc_rx_buffers+119} <ffffffff803414ef>{e1000_clean_rx_irq+1429} <ffffffff80340e94>{e1000_clean+239} <ffffffff80392be2>{net_rx_action+171} <ffffffff803166dd>{add_timer_randomness+284} <ffffffff8022d553>{__do_softirq+103} <ffffffff8020aaa6>{call_softirq+30} <ffffffff8020c678>{do_softirq+61} <ffffffff8020c631>{do_IRQ+62} <ffffffff80207e94>{mwait_idle+0} <ffffffff80209e04>{ret_from_intr+0} <EOI> <ffffffff80207ede>{mwait_idle+74} <ffffffff80207e75>{cpu_idle+98} <ffffffff805da79b>{start_secondary+1127} Mem-info: DMA per-cpu: cpu 0 hot: high 0, batch 1 used:0 cpu 0 cold: high 0, batch 1 used:0 cpu 1 hot: high 0, batch 1 used:0 cpu 1 cold: high 0, batch 1 used:0 DMA32 per-cpu: cpu 0 hot: high 186, batch 31 used:129 cpu 0 cold: high 62, batch 15 used:57 cpu 1 hot: high 186, batch 31 used:30 cpu 1 cold: high 62, batch 15 used:60 Normal per-cpu: empty HighMem per-cpu: empty Free pages: 4976kB (0kB HighMem) Active:40234 inactive:1675 dirty:8 writeback:0 unstable:0 free:1244 slab:79910 mapped:39985 pagetables:255 DMA free:1992kB min:172kB low:212kB high:256kB active:0kB inactive:0kB present:10604kB pages_scanned:0 all_unreclaimable? yes lowmem_reserve[]: 0 481 481 481 DMA32 free:2984kB min:8016kB low:10020kB high:12024kB active:160936kB inactive:6700kB present:492776kB pages_scanned:565 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1992kB DMA32: 0*4kB 1*8kB 2*16kB 0*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2984kB Normal: empty HighMem: empty Swap cache: add 38, delete 28, find 0/0, race 0+0 Free swap = 995868kB Total swap = 996020kB Free swap: 995868kB 128997 pages of RAM 3326 reserved pages 5576 pages shared 10 pages swap cached
This bug needs to be refreshed. Pawel can you please retest with newest kernel?
Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits BGP paol # w 00:10:39 up 1 day, 23:02, 1 user, load average: 0.01, 0.02, 0.00 USER TTY LOGIN@ IDLE JCPU PCPU WHAT paol pts/0 00:10 0.00s 0.00s 0.02s sshd: paol [priv] BGP paol # uname -a Linux BGP 2.6.25-rc6-git6 #1 SMP Sun Mar 23 01:06:42 CET 2008 x86_64 Intel(R) With the newest kernel i have no problem. only this: Fix inflate_threshold_root. Now=15 size=11 bits
Thanks for the update. Hmm, this comes from resize() in TCP code. CC-int to Stephen, maybe this bug needs to be re-assigned now.
Are you using Jumbo frames? I suspect the problem is that e1000 driver has to round up allocations, and when using Jumbo frames this can mean a 9K frame requires 16K contiguous allocation. On a busy system that will fail. Newer kernels generate less fragmented space, but it is really a hardware problem.
The message 'inflate_threshold_root' comes from the fib_trie routing code. How many routes do you have and what is the distribution? It would be useful to see the output of /proc/net/fib_triestat and /proc/net/fib_trie
Created attachment 15540 [details] cat /proc/net/fib_trie
this is bgp router with installed quagga as bgp daemon. cat /proc/net/fib_triestat Basic info: size of leaf: 40 bytes, size of tnode: 40 bytes. Local: Aver depth: 5.12 Max depth: 7 Leaves: 31 Internal nodes: 28 1: 26 2: 2 Pointers: 60 Null ptrs: 2 Total size: 3 kB Main: Aver depth: 2.26 Max depth: 6 Leaves: 235924 Internal nodes: 57854 1: 31632 2: 11422 3: 8475 4: 3755 5: 1676 6: 893 18: 1 Pointers: 609760 Null ptrs: 315983 Total size: 16240 kB
and MTU is 1500 no jumbo frames.
Is this still a problem ?
Kernel 2.6.29.3 Server #1 Sometimes i have the same info like before: e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None e1000: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits On three different machines. cat /proc/net/fib_triestat Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes. Main: Aver depth: 2.27 Max depth: 7 Leaves: 268917 Prefixes: 282352 Internal nodes: 65708 1: 36354 2: 12657 3: 9215 4: 4399 5: 2011 6: 1071 18: 1 Pointers: 662480 Null ptrs: 327856 Total size: 17871 kB Counters: --------- gets = 2195484652 backtracks = 636049593 semantic match passed = 2195493159 semantic match miss = 212348885 null node hit= 1353421394 skipped node resize = 0 Local: Aver depth: 3.58 Max depth: 4 Leaves: 12 Prefixes: 13 Internal nodes: 9 1: 7 2: 2 Pointers: 22 Null ptrs: 2 Total size: 1 kB Counters: --------- gets = 2195935298 backtracks = 1041035352 semantic match passed = 418139 semantic match miss = 0 null node hit= 290544 skipped node resize = 0 Server #2 Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits Fix inflate_threshold_root. Now=15 size=11 bits cat /proc/net/fib_triestat Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes. Local: Aver depth: 3.75 Max depth: 5 Leaves: 12 Prefixes: 13 Internal nodes: 10 1: 9 2: 1 Pointers: 22 Null ptrs: 1 Total size: 2 kB Counters: --------- gets = 4070990500 backtracks = 993990300 semantic match passed = 311572 semantic match miss = 0 null node hit= 1556177 skipped node resize = 0 Main: Aver depth: 2.26 Max depth: 7 Leaves: 268918 Prefixes: 282353 Internal nodes: 65142 1: 34900 2: 12879 3: 9559 4: 4591 5: 2106 6: 1106 18: 1 Pointers: 671564 Null ptrs: 337505 Total size: 17887 kB Counters: --------- gets = 4070765628 backtracks = 41807032 semantic match passed = 4068421081 semantic match miss = 114232 null node hit= 2025898189 skipped node resize = 0 On this servers is working bgpd (quagga 0.99.12)
These problems were tracked on the netdev@ list under the subject: "rib_trie / Fix inflate_threshold_root" (25 Jun 2009). Fixes in the mainline (2.6.31 tree) start with: "ipv4: Fix fib_trie rebalancing" http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=e0f7cb8c8cc6cccce28d2ce39ad8c60d23c3799f plus next parts.