Bug 13481 - swapper: page allocation failure. order:0, mode:0x4020
Summary: swapper: page allocation failure. order:0, mode:0x4020
Status: CLOSED OBSOLETE
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-08 01:16 UTC by starlight
Modified: 2012-06-08 11:43 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.29.4 with two hugepage patches
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg output (979.69 KB, text/plain)
2009-06-08 01:16 UTC, starlight
config (33.10 KB, text/plain)
2009-06-08 01:17 UTC, starlight
hugetlb patch 1 (705 bytes, patch)
2009-06-08 01:17 UTC, starlight
hugetlb patch 2 (4.28 KB, patch)
2009-06-08 01:18 UTC, starlight
dmesg output (995.10 KB, text/plain)
2009-06-08 15:41 UTC, starlight
config (33.13 KB, text/plain)
2009-06-08 15:42 UTC, starlight

Description starlight 2009-06-08 01:16:43 UTC
Created attachment 21797 [details]
dmesg output

Built a minimalist kernel with anonymous page swapping
disabled.  Under a network stress test, the above message
and many more like it appeared in the kernel message log.

Turning anonymous page swapping back on fixed it.

It's not important to me that this work correctly, so if
it's not important to anyone else the issue can simply be
closed.  Just reporting it since I tripped over it.
Comment 1 starlight 2009-06-08 01:17:11 UTC
Created attachment 21798 [details]
config
Comment 2 starlight 2009-06-08 01:17:57 UTC
Created attachment 21799 [details]
hugetlb patch 1
Comment 3 starlight 2009-06-08 01:18:17 UTC
Created attachment 21800 [details]
hugetlb patch 2
Comment 4 starlight 2009-06-08 01:22:58 UTC
Forgot to add that external (out-of-tree) Intel network
drivers were loaded when this happened; 'igb' was the
driver in use at the time.

ixgbe-1.3.56.17
igb-1.3.19.3


modprobe igb IntMode=1,1,1,1
modprobe ixgbe InterruptType=2,2 MQ=1,1 RSS=2,2 InterruptThrottleRate=1,1
Comment 5 starlight 2009-06-08 15:40:53 UTC
Spoke too soon in saying that turning swap support on fixed 
it.  The bug came back even with swap and several other features 
re-enabled.  Left NUMA off, however, as the target server is a 
Xeon.  The problem results from high-stress network load: a 
multicast application and a huge 'rcp' running simultaneously
at full 1G link speed.
Comment 6 starlight 2009-06-08 15:41:46 UTC
Created attachment 21809 [details]
dmesg output
Comment 7 starlight 2009-06-08 15:42:28 UTC
Created attachment 21810 [details]
config
Comment 8 starlight 2009-06-08 15:47:27 UTC
Looking at the trace, I suppose this could be a bug in the 'igb' 
driver.  Giving up on 'igb' and 'ixgbe', as both perform poorly 
compared with 'e1000e', so we're staying with that.
Comment 9 Nicolas Bigaouette 2010-01-30 04:39:11 UTC
I'm having a similar issue on the head node of a ~80-node diskless cluster with gentoo-sources 2.6.27-gentoo-r8. The nodes are connected with InfiniBand, but that link is not used yet. A separate gigabit connection is used for management, mounting /home over NFS, and transferring a kernel/initrd to the compute nodes at boot (since the compute nodes are diskless).

What kind of error is this? Should the connection be considered dead, or should it come back without needing a reboot?

dmesg part:
[511466.766844] swapper: page allocation failure. order:0, mode:0x4020
[511466.766855] Pid: 0, comm: swapper Not tainted 2.6.27-gentoo-r8 #1
[511466.766857]
[511466.766858] Call Trace:
[511466.766860]  <IRQ>  [<ffffffff81079efe>] __alloc_pages_internal+0x412/0x434
[511466.766870]  [<ffffffff8109b148>] new_slab+0x55/0x1b0
[511466.766873]  [<ffffffff8109b55c>] __slab_alloc+0x252/0x3d2
[511466.766878]  [<ffffffff813c7e7c>] __netdev_alloc_skb+0x29/0x45
[511466.766881]  [<ffffffff813c7e7c>] __netdev_alloc_skb+0x29/0x45
[511466.766885]  [<ffffffff8109c474>] __kmalloc_node_track_caller+0x75/0xaa
[511466.766889]  [<ffffffff813c74ca>] __alloc_skb+0x6b/0x12f
[511466.766892]  [<ffffffff813c7e7c>] __netdev_alloc_skb+0x29/0x45
[511466.766897]  [<ffffffff812c44a8>] igb_alloc_rx_buffers_adv+0xdc/0x1b7
[511466.766901]  [<ffffffff812c4916>] igb_clean_rx_irq_adv+0x393/0x3d5
[511466.766904]  [<ffffffff812c4acf>] igb_clean_rx_ring_msix+0x51/0x14a
[511466.766908]  [<ffffffff81048bd3>] hrtimer_reprogram+0x74/0x8f
[511466.766913]  [<ffffffff813ca9eb>] net_rx_action+0xb7/0x1a7
[511466.766917]  [<ffffffff8103994a>] __do_softirq+0x63/0xcc
[511466.766921]  [<ffffffff8100d32c>] call_softirq+0x1c/0x28
[511466.766925]  [<ffffffff8100e3c7>] do_softirq+0x2c/0x68
[511466.766927]  [<ffffffff81039709>] irq_exit+0x3f/0x91
[511466.766930]  [<ffffffff8100e5fc>] do_IRQ+0xb5/0xd2
[511466.766934]  [<ffffffff8100c5f1>] ret_from_intr+0x0/0xa
[511466.766936]  <EOI>  [<ffffffff812796c1>] acpi_idle_enter_bm+0x251/0x294
[511466.766944]  [<ffffffff812796b7>] acpi_idle_enter_bm+0x247/0x294
[511466.766949]  [<ffffffff8137441d>] cpuidle_idle_call+0x8d/0xca
[511466.766952]  [<ffffffff8100b193>] cpu_idle+0x88/0xdc
[511466.766955] Mem-Info:
[511466.766957] Node 0 DMA per-cpu:
[511466.766960] CPU    0: hi:    0, btch:   1 usd:   0
[511466.766962] CPU    1: hi:    0, btch:   1 usd:   0
[511466.766964] CPU    2: hi:    0, btch:   1 usd:   0
[511466.766966] CPU    3: hi:    0, btch:   1 usd:   0
[511466.766968] CPU    4: hi:    0, btch:   1 usd:   0
[511466.766970] CPU    5: hi:    0, btch:   1 usd:   0
[511466.766972] CPU    6: hi:    0, btch:   1 usd:   0
[511466.766974] CPU    7: hi:    0, btch:   1 usd:   0
[511466.766976] CPU    8: hi:    0, btch:   1 usd:   0
[511466.766978] CPU    9: hi:    0, btch:   1 usd:   0
[511466.766980] CPU   10: hi:    0, btch:   1 usd:   0
[511466.766982] CPU   11: hi:    0, btch:   1 usd:   0
[511466.766984] CPU   12: hi:    0, btch:   1 usd:   0
[511466.766986] CPU   13: hi:    0, btch:   1 usd:   0
[511466.766988] CPU   14: hi:    0, btch:   1 usd:   0
[511466.766990] CPU   15: hi:    0, btch:   1 usd:   0
[511466.766992] Node 0 DMA32 per-cpu:
[511466.766995] CPU    0: hi:  186, btch:  31 usd: 134
[511466.766997] CPU    1: hi:  186, btch:  31 usd: 126
[511466.766999] CPU    2: hi:  186, btch:  31 usd: 135
[511466.767001] CPU    3: hi:  186, btch:  31 usd: 130
[511466.767003] CPU    4: hi:  186, btch:  31 usd: 137
[511466.767005] CPU    5: hi:  186, btch:  31 usd: 137
[511466.767007] CPU    6: hi:  186, btch:  31 usd: 164
[511466.767009] CPU    7: hi:  186, btch:  31 usd: 139
[511466.767011] CPU    8: hi:  186, btch:  31 usd: 150
[511466.767013] CPU    9: hi:  186, btch:  31 usd: 136
[511466.767015] CPU   10: hi:  186, btch:  31 usd:  91
[511466.767017] CPU   11: hi:  186, btch:  31 usd: 140
[511466.767019] CPU   12: hi:  186, btch:  31 usd:  63
[511466.767021] CPU   13: hi:  186, btch:  31 usd:  57
[511466.767023] CPU   14: hi:  186, btch:  31 usd: 119
[511466.767025] CPU   15: hi:  186, btch:  31 usd:  88
[511466.767027] Node 0 Normal per-cpu:
[511466.767030] CPU    0: hi:  186, btch:  31 usd: 101
[511466.767032] CPU    1: hi:  186, btch:  31 usd: 135
[511466.767034] CPU    2: hi:  186, btch:  31 usd: 183
[511466.767036] CPU    3: hi:  186, btch:  31 usd: 177
[511466.767038] CPU    4: hi:  186, btch:  31 usd: 167
[511466.767040] CPU    5: hi:  186, btch:  31 usd: 148
[511466.767042] CPU    6: hi:  186, btch:  31 usd: 180
[511466.767044] CPU    7: hi:  186, btch:  31 usd: 130
[511466.767046] CPU    8: hi:  186, btch:  31 usd: 123
[511466.767048] CPU    9: hi:  186, btch:  31 usd: 165
[511466.767050] CPU   10: hi:  186, btch:  31 usd: 179
[511466.767052] CPU   11: hi:  186, btch:  31 usd: 179
[511466.767054] CPU   12: hi:  186, btch:  31 usd:  74
[511466.767056] CPU   13: hi:  186, btch:  31 usd: 167
[511466.767058] CPU   14: hi:  186, btch:  31 usd: 178
[511466.767060] CPU   15: hi:  186, btch:  31 usd: 169
[511466.767062] Node 1 Normal per-cpu:
[511466.767064] CPU    0: hi:  186, btch:  31 usd: 125
[511466.767067] CPU    1: hi:  186, btch:  31 usd: 181
[511466.767069] CPU    2: hi:  186, btch:  31 usd: 154
[511466.767071] CPU    3: hi:  186, btch:  31 usd: 157
[511466.767073] CPU    4: hi:  186, btch:  31 usd: 158
[511466.767075] CPU    5: hi:  186, btch:  31 usd: 153
[511466.767077] CPU    6: hi:  186, btch:  31 usd: 139
[511466.767079] CPU    7: hi:  186, btch:  31 usd: 153
[511466.767081] CPU    8: hi:  186, btch:  31 usd:  89
[511466.767083] CPU    9: hi:  186, btch:  31 usd: 168
[511466.767085] CPU   10: hi:  186, btch:  31 usd: 157
[511466.767087] CPU   11: hi:  186, btch:  31 usd: 156
[511466.767089] CPU   12: hi:  186, btch:  31 usd:  50
[511466.767091] CPU   13: hi:  186, btch:  31 usd: 158
[511466.767093] CPU   14: hi:  186, btch:  31 usd: 158
[511466.767095] CPU   15: hi:  186, btch:  31 usd: 121
[511466.767099] Active:4223822 inactive:1831524 dirty:2490 writeback:1276 unstable:0
[511466.767100]  free:12767 slab:35363 mapped:9918 pagetables:15796 bounce:0
[511466.767102] Node 0 DMA free:7772kB min:4kB low:4kB high:4kB active:0kB inactive:0kB present:6708kB pages_scanned:0 all_unreclaimable? yes
[511466.767107] lowmem_reserve[]: 0 2991 12081 12081
[511466.767111] Node 0 DMA32 free:37172kB min:2460kB low:3072kB high:3688kB active:1793752kB inactive:1005228kB present:3063392kB pages_scanned:0 all_unreclaimable? no
[511466.767117] lowmem_reserve[]: 0 0 9090 9090
[511466.767121] Node 0 Normal free:2648kB min:7476kB low:9344kB high:11212kB active:6071344kB inactive:3083336kB present:9308160kB pages_scanned:128 all_unreclaimable? no
[511466.767126] lowmem_reserve[]: 0 0 0 0
[511466.767130] Node 1 Normal free:3476kB min:9968kB low:12460kB high:14952kB active:9030192kB inactive:3237532kB present:12410880kB pages_scanned:0 all_unreclaimable? no
[511466.767135] lowmem_reserve[]: 0 0 0 0
[511466.767139] Node 0 DMA: 3*4kB 2*8kB 2*16kB 3*32kB 1*64kB 1*128kB 3*256kB 1*512kB 2*1024kB 0*2048kB 1*4096kB = 7772kB
[511466.767150] Node 0 DMA32: 8333*4kB 212*8kB 33*16kB 1*32kB 7*64kB 3*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 36676kB
[511466.767161] Node 0 Normal: 513*4kB 5*8kB 30*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3372kB
[511466.767172] Node 1 Normal: 468*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 3752kB
[511466.767183] 57479 total pagecache pages
[511466.767184] 31209 pages in swap cache
[511466.767186] Swap cache stats: add 215086, delete 183877, find 511/1459
[511466.767189] Free swap  = 24343440kB
[511466.767190] Total swap = 25173812kB
[511466.767703] 6291440 pages RAM
[511466.767703] 107182 pages reserved
[511466.767703] 117792 pages shared
[511466.767703] 6100859 pages non-shared





$ lsmod
Module                  Size  Used by
nfsd                  244648  21
exportfs                8256  1 nfsd
rdma_ucm               13952  0
rdma_cm                26932  1 rdma_ucm
iw_cm                  11144  1 rdma_cm
ib_addr                 8648  1 rdma_cm
ib_ipoib               58364  0
ib_cm                  31256  2 rdma_cm,ib_ipoib
ib_sa                  33976  3 rdma_cm,ib_ipoib,ib_cm
ipv6                  246136  79 ib_ipoib
ib_uverbs              37104  1 rdma_ucm
ib_umad                15320  4
mlx4_ib                53408  0
ib_mad                 33256  4 ib_cm,ib_sa,ib_umad,mlx4_ib
ib_core                44548  10 rdma_ucm,rdma_cm,iw_cm,ib_ipoib,ib_cm,ib_sa,ib_uverbs,ib_umad,mlx4_ib,ib_mad
mlx4_core              76772  1 mlx4_ib
