Distribution: RHEL4 with 2.6.12.1 Hardware Environment: IBM Power5 (IBM,9111-520) Software Environment: Problem Description: System complains "page allocation failure. order:0, mode:0x20" when running test case. Steps to reproduce: 1. Build kernel 2.6.12.1 2. Run test cases (networktest + iozone + memtest) 3. After running for about 3 hours, system reports"swapper: page allocation failure. order:0, mode:0x20". Additional information: 1. System is available and the test cases are running. 2. Get following call trace in /var/log/messages: ========================================================= Jul 22 19:51:06 csdlsf2 kernel: swapper: page allocation failure. order:0, mode:0x20 Jul 22 19:51:06 csdlsf2 kernel: Call Trace: Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef480] [c0000000000952e8] .__alloc_pages+0x550/0x5a4 (unreliable) Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef560] [c0000000000b4650] .alloc_pages_current+0x74/0xe0 Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef5f0] [c000000000099398] .cache_grow+0x130/0x4d4 Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef6b0] [c000000000099a2c] .cache_alloc_refill+0x2f0/0x324 Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef760] [c000000000099264] .kmem_cache_alloc+0xbc/0xc0 Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef7f0] [c00000000027dea8] .dst_alloc+0x50/0x108 Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffef880] [c00000000029d4e0] .ip_route_input+0x560/0xe34 Jul 22 19:51:06 csdlsf2 kernel: [c00000000ffefa60] [c0000000002a1564] .ip_rcv+0x59c/0x6cc Jul 22 19:51:09 csdlsf2 kernel: [c00000000ffefb00] [c000000000278b74] .netif_receive_skb+0x3e8/0x4a0 Jul 22 19:51:19 csdlsf2 kernel: [c00000000ffefbb0] [d0000000002a374c] .e1000_clean_rx_irq+0xf4/0x678 [e1000] Jul 22 19:51:28 csdlsf2 kernel: [c00000000ffefcd0] [d0000000002a30a8] .e1000_clean+0x590/0xb40 [e1000] Jul 22 19:51:36 csdlsf2 kernel: [c00000000ffefe10] [c000000000278fa4] .net_rx_action+0xfc/0x2bc Jul 22 19:51:43 csdlsf2 kernel: [c00000000ffefed0] [c0000000000644c0] .__do_softirq+0xcc/0x1a0 Jul 22 19:51:47 csdlsf2 kernel: [c00000000ffeff90] [c0000000000143f0] .call_do_softirq+0x14/0x24 Jul 22 19:51:49 csdlsf2 kernel: [c00000000778b920] [c00000000000fd70] .do_softirq+0x90/0xa4 Jul 22 19:51:50 csdlsf2 kernel: [c00000000778b9b0] [c0000000000646a0] .irq_exit+0x6c/0x84 Jul 22 19:51:54 csdlsf2 kernel: [c00000000778ba30] [c00000000000f99c] .do_IRQ+0x88/0xe4 Jul 22 19:51:56 csdlsf2 kernel: [c00000000778bab0] [c00000000000ae34] hardware_interrupt_entry+0x8/0x54 Jul 22 19:51:57 csdlsf2 kernel: --- Exception: 501 at .dedicated_idle+0x114/0x270 Jul 22 19:51:58 csdlsf2 kernel: LR = .dedicated_idle+0x1cc/0x270 Jul 22 19:51:59 csdlsf2 kernel: [c00000000778be90] [c000000000010348] .cpu_idle+0x38/0x4c Jul 22 19:52:01 csdlsf2 kernel: [c00000000778bf00] [c00000000003b254] .start_secondary+0x108/0x144 Jul 22 19:52:02 csdlsf2 kernel: [c00000000778bf90] [c00000000000bd84] .enable_64b_mode+0x0/0x28 ========================================================= 3. memory usage is high. kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad 07:20:08 PM 460556 1410788 75.39 1636 281868 1975948 72028 3.52 25064 07:30:10 PM 8404 1862940 99.55 1732 737132 1975944 72032 3.52 23784 07:40:09 PM 7776 1863568 99.58 1392 737408 1975944 72032 3.52 23848 07:50:02 PM 7768 1863576 99.58 2496 728824 1975944 72032 3.52 24104 08:00:06 PM 5720 1865624 99.69 1784 718660 1975944 72032 3.52 25176 08:10:13 PM 7984 1863360 99.57 832 736900 1975980 71996 3.52 24400 08:20:12 PM 8372 1862972 99.55 1416 721036 1975984 71992 3.52 23684 08:31:05 PM 7148 1864196 99.62 1408 735204 1975984 71992 3.52 25004 08:40:06 PM 8200 1863144 99.56 756 711608 1975984 71992 3.52 33256 08:50:02 PM 7944 1863400 99.58 2780 709236 1992332 55644 2.72 17092 09:00:02 PM 512524 1358820 72.61 37672 183940 2008812 39164 1.91 784 09:10:01 PM 7652 1863692 99.59 32356 688548 2008812 39164 1.91 2012
Apart from the warning, are they any other errors? Under extreme memory pressure we may see order 0 atomic allocations fail but we should be able to recover from that. What is /proc/sys/vm/min_free_kbytes set to? By increasing it you should have less of a chance at hitting this warning.
Even this is really a warning not a bug, Id like to reassign it to the memory management guys so they know we are hitting it.
Anton, fair enough, I'll take this one ... looks like a real-enough bug.
Alloc appears to ge GFP_ATOMIC - suspect we're just emptying the emergency pools. If it's sporadic, might help to make them bigger. If not, will probably eat them all over time anyway ... Under the "nopage:" section of mm/page_alloc.c:__alloc_pages(), please add a call to show_mem() just after dump_stack (inside the if statement), and something like this too, then reproduce: diff -purN -X /home/mbligh/.diff.exclude 2.6.11/arch/i386/mm/pgtable.c 2.6.11-show_mem_slab/arch/i386/mm/pgtable.c --- 2.6.11/arch/i386/mm/pgtable.c 2005-03-02 02:59:09.000000000 -0800 +++ 2.6.11-show_mem_slab/arch/i386/mm/pgtable.c 2005-05-31 15:58:36.000000000 -0700 @@ -30,6 +30,7 @@ void show_mem(void) struct page *page; pg_data_t *pgdat; unsigned long i; + struct page_state ps; printk("Mem-info:\n"); show_free_areas(); @@ -53,6 +54,13 @@ void show_mem(void) printk("%d reserved pages\n",reserved); printk("%d pages shared\n",shared); printk("%d pages swap cached\n",cached); + + get_page_state(&ps); + printk("%d pages dirty\n", ps.nr_dirty); + printk("%d pages writeback\n", ps.nr_writeback); + printk("%d pages mapped\n", ps.nr_mapped); + printk("%d pages slab\n", ps.nr_slab); + printk("%d pages pagetables\n", ps.nr_page_table_pages); }
In reply to comment #1: 1. There aren't any other errors except this problem, and the warning only appeared one time during this loop. but I have ever met the similar in previous testing. ========= warning info it met last time ============================= swapper: page allocation failure. order:0, mode:0x20 Call Trace: [c00000000fff7710] [c0000000000952e8] .__alloc_pages+0x550/0x5a4 (unreliable) [c00000000fff77f0] [c0000000000b4650] .alloc_pages_current+0x74/0xe0 [c00000000fff7880] [c000000000099398] .cache_grow+0x130/0x4d4 [c00000000fff7940] [c000000000099a2c] .cache_alloc_refill+0x2f0/0x324 [c00000000fff79f0] [c000000000099264] .kmem_cache_alloc+0xbc/0xc0 [c00000000fff7a80] [c00000000006c094] .__sigqueue_alloc+0x68/0xf4 [c00000000fff7b00] [c00000000006cf18] .send_signal+0x118/0x1cc [c00000000fff7ba0] [c00000000006e104] .__group_send_sig_info+0xa4/0x130 [c00000000fff7c40] [c00000000006e240] .group_send_sig_info+0xb0/0xf0 [c00000000fff7ce0] [c00000000006e2c8] .send_group_sig_info+0x48/0x7c [c00000000fff7d80] [c000000000062e5c] .it_real_fn+0x2c/0x9c [c00000000fff7e00] [c00000000006a8f8] .run_timer_softirq+0x13c/0x260 [c00000000fff7ed0] [c0000000000644c0] .__do_softirq+0xcc/0x1a0 [c00000000fff7f90] [c0000000000143f0] .call_do_softirq+0x14/0x24 [c00000000ffa7880] [c00000000000fd70] .do_softirq+0x90/0xa4 [c00000000ffa7910] [c0000000000646a0] .irq_exit+0x6c/0x84 [c00000000ffa7990] [c0000000000118e8] .timer_interrupt+0x390/0x470 [c00000000ffa7ab0] [c00000000000a2b4] decrementer_common+0xb4/0x100 --- Exception: 901 at .dedicated_idle+0x10c/0x270 LR = .dedicated_idle+0x1cc/0x270 [c00000000ffa7e90] [c000000000010348] .cpu_idle+0x38/0x4c [c00000000ffa7f00] [c00000000003b254] .start_secondary+0x108/0x144 [c00000000ffa7f90] [c00000000000bd84] .enable_64b_mode+0x0/0x28 ==================================================================== 2. The default value of min_free_kbytes is 5538, but in another machine the value is different with this. Does the default value have any relation to the total memory? [root@csdlsf2 home]# cat /proc/sys/vm/min_free_kbytes 5538 [root@csdlsf2 home]# cat /proc/meminfo | grep MemTotal MemTotal: 1871344 kB
We meet this issue on ppc64, so I update function show_mem() as the following and will try to reproduce this bug. Please correct me if I make any mistake. thanks. [root@hv41-lp2 mm]# diff -purN /home/init.c.old /usr/src/fwq-linux-2.6.12.1/arch/ppc64/mm/init.c --- /home/init.c.old 2005-07-26 16:31:31.000000000 +0800 +++ /usr/src/fwq-linux-2.6.12.1/arch/ppc64/mm/init.c 2005-07-26 16:36:18.000000000 +0800 @@ -95,6 +95,7 @@ void show_mem(void) struct page *page; pg_data_t *pgdat; unsigned long i; + struct page_state ps; printk("Mem-info:\n"); show_free_areas(); @@ -115,6 +116,14 @@ void show_mem(void) printk("%ld reserved pages\n", reserved); printk("%ld pages shared\n", shared); printk("%ld pages swap cached\n", cached); + + + get_page_state(&ps); + printk("%ld pages dirty\n", ps.nr_dirty); + printk("%ld pages writeback\n", ps.nr_writeback); + printk("%ld pages mapped\n", ps.nr_mapped); + printk("%ld pages slab\n", ps.nr_slab); + printk("%ld pages pagetables\n", ps.nr_page_table_pages); } #ifdef CONFIG_PPC_ISERIES call show_mem() when failing to allocate pages: [root@hv41-lp2 mm]# diff -purN /home/page_alloc.c.old /usr/src/fwq-linux-2.6.12.1/mm/page_alloc.c --- /home/page_alloc.c.old 2005-07-26 16:55:12.000000000 +0800 +++ /usr/src/fwq-linux-2.6.12.1/mm/page_alloc.c 2005-07-26 16:26:17.000000000 +0800 @@ -905,6 +905,7 @@ nopage: " order:%d, mode:0x%x\n", p->comm, order, gfp_mask); dump_stack(); + show_mem(); } return NULL; got_pg:
Yup, min_free_kbytes is calculated from the total memory (for that zone), but then capped at a certain amount (ppc64 only has one zone, for non-NUMA boxes). Your patch for ppc64 looks correct to me, from a brief inspection.
Bug is reproduced, here is the related dmesg: swapper: page allocation failure. order:0, mode:0x20 Call Trace: [c00000000ffef710] [c000000000095344] .__alloc_pages+0x550/0x5ac (unreliable) [c00000000ffef7f0] [c0000000000b46b4] .alloc_pages_current+0x74/0xe0 [c00000000ffef880] [c0000000000993fc] .cache_grow+0x130/0x4d4 [c00000000ffef940] [c000000000099a90] .cache_alloc_refill+0x2f0/0x324 [c00000000ffef9f0] [c0000000000992c8] .kmem_cache_alloc+0xbc/0xc0 [c00000000ffefa80] [c00000000006c0f0] .__sigqueue_alloc+0x68/0xf4 [c00000000ffefb00] [c00000000006cf74] .send_signal+0x118/0x1cc [c00000000ffefba0] [c00000000006e160] .__group_send_sig_info+0xa4/0x130 [c00000000ffefc40] [c00000000006e29c] .group_send_sig_info+0xb0/0xf0 [c00000000ffefce0] [c00000000006e324] .send_group_sig_info+0x48/0x7c [c00000000ffefd80] [c000000000062eb8] .it_real_fn+0x2c/0x9c [c00000000ffefe00] [c00000000006a954] .run_timer_softirq+0x13c/0x260 [c00000000ffefed0] [c00000000006451c] .__do_softirq+0xcc/0x1a0 [c00000000ffeff90] [c0000000000143f0] .call_do_softirq+0x14/0x24 [c00000000778b880] [c00000000000fd70] .do_softirq+0x90/0xa4 [c00000000778b910] [c0000000000646fc] .irq_exit+0x6c/0x84 [c00000000778b990] [c0000000000118e8] .timer_interrupt+0x390/0x470 [c00000000778bab0] [c00000000000a2b4] decrementer_common+0xb4/0x100 --- Exception: 901 at .dedicated_idle+0xf4/0x270 LR = .dedicated_idle+0x1cc/0x270 [c00000000778be90] [c000000000010348] .cpu_idle+0x38/0x4c [c00000000778bf00] [c00000000003b254] .start_secondary+0x108/0x144 [c00000000778bf90] [c00000000000bd84] .enable_64b_mode+0x0/0x28 Mem-info: Node 0 DMA per-cpu: cpu 0 hot: low 62, high 186, batch 31 cpu 0 cold: low 0, high 62, batch 31 cpu 1 hot: low 62, high 186, batch 31 cpu 1 cold: low 0, high 62, batch 31 cpu 2 hot: low 62, high 186, batch 31 cpu 2 cold: low 0, high 62, batch 31 cpu 3 hot: low 62, high 186, batch 31 cpu 3 cold: low 0, high 62, batch 31 Node 0 Normal per-cpu: empty Node 0 HighMem per-cpu: empty Free pages: 2012kB (0kB HighMem) Active:272784 inactive:181262 dirty:80745 writeback:8661 unstable:0 free:503 slab:8029 mapped:271938 pagetables:1654 Node 0 DMA free:2012kB min:5536kB low:6920kB high:8304kB active:1091136kB inactive:725048kB present:1916928kB pages_scanned:7812 all_unreclaimable? no lowmem_reserve[]: 0 0 0 Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 Node 0 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 2012kB Node 0 Normal: empty Node 0 HighMem: empty Swap cache: add 98226, delete 92372, find 42448/51689, race 0+888 Free swap = 1984296kB Total swap = 2047976kB Free swap: 1984296kB 479232 pages of RAM 11394 reserved pages 267122 pages shared 5861 pages swap cached 80745 pages dirty 8661 pages writeback 271938 pages mapped 8029 pages slab 1654 pages pagetables
Thanks, is most useful. There's a couple more bits of info about the watermarks missing that should be in the bootlog ... if you have access to that (from /var/log/messages or dmesg or something) for the last boot, could you attatch that too? Sorry ;-)
Created attachment 5467 [details] /var/log/messages Sorry, I'm not sure what's the info exactly, I'm attaching the whole message file. It's only 92k after compression. You can find the exception infomation from line 10845 to line 10906. 10845 Jul 30 11:01:49 csdlsf2 kernel: swapper: page allocation failure. order:0, mode:0x20 ..... 10906 Jul 30 11:02:30 csdlsf2 kernel: 1654 pages pagetables
Created attachment 5468 [details] /var/log/boot.log.1 The last boot information(Jul 29).
We met the same problem while testing kernel 2.6.12.3
What is the status of this problem? Is it still seen with new kernels? Thanks.
The same problem found in 2.6.16.53, 2.6.16.57. It's look, that bugs 6254, 6648 are duplicates to this one.
This is still pretty old, we need data on the current kernel (2.6.24+). There were multiple patches since then on memory anti-fragmentations, reclamation, etc. that would help with page allocations including atomic ones. There is practically no reports on such failures lately, so I would recommend testing with current kernel where the problem might have been fixed.
It's seems to be fixed in 2.6.24-rc4.