Bug 73831

Summary: xfs failed to allocate memory due to memory fragmentation
Product: File System Reporter: yangzhe1990
Component: XFSAssignee: XFS Guru (xfs-masters)
Status: RESOLVED OBSOLETE    
Severity: normal CC: sandeen, sandeen, skmuttlu, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.13.5 Subsystem:
Regression: No Bisected commit-id:

Description yangzhe1990 2014-04-11 10:39:20 UTC
Hi,

We always experience web server hanging problem. xfs keeps printing 

XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)

We finally find out that it's the memory fragmentation problem. We are experiencing this problem even if we have 50% free memory (+ buffer/cache)
. echo 1 > /proc/sys/vm/drop_caches solve the problem instantly.

I was wondering if you could enhance the message, for example, print the amount of pages being requested, and print messages like  

Node 0 Normal: 12544*4kB 68*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 1*4096kB = 54816kB

to help debugging, and try to allocate smaller number of pages.

Thanks.
Comment 1 Eric Sandeen 2014-04-11 15:39:18 UTC
I agree that it'd be good to print the size being requested.

These large allocations are often the result of extent maps for very badly fragmented files; you might consider looking to see if that is the problem in your case.

"allocating a smaller number of pages" isn't necessarily possible, though.

-Eric
Comment 2 yangzhe1990 2014-04-21 04:40:18 UTC
Hi Eric,

I was able to find that on my system it's in most case an allocation of 128KB contiguous memory space.

The only things I could do without change in any code are changing the vm.swappiness and vm.vfs_cache_pressure and hope I could have more free slabs of larger space. But I still see those memory allocation related warnings from XFS.

Would you like to consider use vmalloc() as a fallback after a few failed tries to kmalloc()?
Comment 3 yangzhe1990 2014-04-21 13:44:39 UTC
I just saw a related discussion and I realized that I was using 64K dir block and it seems that commit b3f03bac8132207a20286d5602eda64500c19724 fixed the problem. I'll try if it solves my problem.
Comment 4 Eric Sandeen 2016-09-22 16:24:39 UTC
I'm batch-closing all xfs bugs which are more than 1 year old, sorry about that.

If you still have this issue on a current kernel, please retest and re-open with this information.

Thanks,
-Eric
Comment 5 Sunil 2016-10-11 20:09:44 UTC
how to reopen this bug? I could repro this issue very successfully ( application: fio ) on SLES 12.
Comment 6 Sunil 2016-10-11 20:25:35 UTC
Hi,
 
Let me start with little bit background, however just a brief. Intermittently, however more frequently some of the random write test cases with block size split were failing due to fio hung issue. I have been observing this since last few days, but couldn’t repro it frequently.  FIO hung issue was bit annoying to me where test would fail due to fio hung state.
why fio was hung (in deadlock state)?  FIO would look for contiguous memory allocation, however I don’t have logs anywhere to say that what size of pages have been requested.  This could be the potential reason being extreme memory fragmentation and/or XFS memory pressure issue led FIO into deadlock state. Some specific test cases (FIO workloads)  induces extremely fragmented filesystem by involving tailored FIO workloads, and read/write on the same would induce high memory pressure as well. One of these reason might have cause memory extreme fragmentation and failed SLAB memory allocation.  

My tests would run the test with defined runtime (runtime=1200) and kill the if jobs exceeding 1200s runtime.  I have been observing frequently  since couple of months on xfs filesystem, where FIO was not honoring  defined runtime b’css of hung/deadlock state, and/or sometime  killing (pkill -9) dishonored FIO procs in this state  resulting defunct  process (/corrupted state). 
 
Here is the substantial proof for the same, where memory allocation was hung...
<snip>
[Mon Oct 10 21:08:53 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                          
[Mon Oct 10 21:08:54 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                          
[Mon Oct 10 21:08:55 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                         
[Mon Oct 10 21:08:55 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                         
[Mon Oct 10 21:08:55 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                          
[Mon Oct 10 21:08:56 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                          
[Mon Oct 10 21:08:56 2016] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)                                                                                                         
 
ATF sends ‘pkill -9’ to dishonored FIO procs, in this case deadlock procs which led proc into defunct state…
<snip>
r3a4-28:~ # ps -aef | grep 9163                                                                                                                                                                        
root      9163     1  0 15:57 ?        00:00:05 /usr/local/staf/bin/STAFProc                                                                                                                           
root     18111  9163 47 20:48 ?        00:21:17 [fio] <defunct>                                                                                                                                         
root     18125  9163 47 20:48 ?        00:21:12 [fio] <defunct>                                                                                                                                         
root     18171  9163 47 20:48 ?        00:21:15 [fio] <defunct>                                                                                                                                        
root     18186  9163 47 20:48 ?        00:21:11 [fio] <defunct>                                                                                                                                        
                                    
FIO process traces.
<snip>
2016-10-10T20:14:39.134180-07:00 r3a4-28 kernel: [15561.432108] fio: page allocation failure: order:1, mode:0x200020                                                                                    
2016-10-10T20:14:39.134202-07:00 r3a4-28 kernel: [15561.432114] CPU: 2 PID: 16735 Comm: fio Tainted: G              X 3.12.28-4-default #1                                                             
2016-10-10T20:14:39.134203-07:00 r3a4-28 kernel: [15561.432115] Hardware name: Supermicro X9SRW-F/X9SRW-F, BIOS 3.2 01/15/2015                                                                         
2016-10-10T20:14:39.134203-07:00 r3a4-28 kernel: [15561.432117]  ffff88047fc438c0 ffffffff8150b1db 0000000000200020 ffffffff8113af90                                                                   
2016-10-10T20:14:39.134204-07:00 r3a4-28 kernel: [15561.432121]  ffffffff8107793e 0000000000000000 ffff88047ff99b00 000000027ff98000                                                                    
2016-10-10T20:14:39.134205-07:00 r3a4-28 kernel: [15561.432123]  ffff88047ffbdf68 0000000000000046 0000000000000000 0000000000000000                                                                   
2016-10-10T20:14:39.164051-07:00 r3a4-28 kernel: [15561.432126] Call Trace:                                                                                                                            
2016-10-10T20:14:39.164066-07:00 r3a4-28 kernel: [15561.432139]  [<ffffffff8100467d>] dump_trace+0x7d/0x2d0                                                                                            
2016-10-10T20:14:39.164068-07:00 r3a4-28 kernel: [15561.432143]  [<ffffffff81004964>] show_stack_log_lvl+0x94/0x170                                                                                    
2016-10-10T20:14:39.164069-07:00 r3a4-28 kernel: [15561.432147]  [<ffffffff81005d91>] show_stack+0x21/0x50                                                                                              
2016-10-10T20:14:39.164070-07:00 r3a4-28 kernel: [15561.432152]  [<ffffffff8150b1db>] dump_stack+0x41/0x51                                                                                             
2016-10-10T20:14:39.164071-07:00 r3a4-28 kernel: [15561.432158]  [<ffffffff8113af90>] warn_alloc_failed+0xf0/0x160                                                                                     
2016-10-10T20:14:39.164072-07:00 r3a4-28 kernel: [15561.432163]  [<ffffffff8150763a>] __alloc_pages_slowpath+0x72f/0x796                                                                               
2016-10-10T20:14:39.164197-07:00 r3a4-28 kernel: [15561.432166]  [<ffffffff8113ee7a>] __alloc_pages_nodemask+0x1ea/0x210                                                                                
2016-10-10T20:14:39.164202-07:00 r3a4-28 kernel: [15561.432172]  [<ffffffff81183f48>] kmem_getpages+0x58/0x1a0                                                                                          
2016-10-10T20:14:39.164203-07:00 r3a4-28 kernel: [15561.432176]  [<ffffffff811859df>] fallback_alloc+0x20f/0x250                                                                                       
2016-10-10T20:14:39.164204-07:00 r3a4-28 kernel: [15561.432180]  [<ffffffff8118641c>] kmem_cache_alloc+0x11c/0x440                                                                                     
2016-10-10T20:14:39.164205-07:00 r3a4-28 kernel: [15561.432185]  [<ffffffff814120f4>] sk_prot_alloc+0x34/0x180                                                                                         
2016-10-10T20:14:39.164206-07:00 r3a4-28 kernel: [15561.432189]  [<ffffffff814126db>] sk_clone_lock+0x1b/0x2e0                                                                                          
2016-10-10T20:14:39.164207-07:00 r3a4-28 kernel: [15561.432194]  [<ffffffff81465561>] inet_csk_clone_lock+0x11/0x140                                                                                   
2016-10-10T20:14:39.164344-07:00 r3a4-28 kernel: [15561.432199]  [<ffffffff8147d45e>] tcp_create_openreq_child+0x1e/0x430                                                                              
2016-10-10T20:14:39.164348-07:00 r3a4-28 kernel: [15561.432202]  [<ffffffff81479708>] tcp_v4_syn_recv_sock+0x38/0x250                                                                                  
2016-10-10T20:14:39.164350-07:00 r3a4-28 kernel: [15561.432206]  [<ffffffff8147e0c9>] tcp_check_req+0x3b9/0x530                                                                                        
2016-10-10T20:14:39.164351-07:00 r3a4-28 kernel: [15561.432212]  [<ffffffff8147b21c>] tcp_v4_do_rcv+0x1dc/0x300                                                                                        
2016-10-10T20:14:39.164352-07:00 r3a4-28 kernel: [15561.432217]  [<ffffffff8147d3ef>] tcp_v4_rcv+0x75f/0x780                                                                                            
2016-10-10T20:14:39.164353-07:00 r3a4-28 kernel: [15561.432221]  [<ffffffff8145a05e>] ip_local_deliver_finish+0x9e/0x200                                                                               
2016-10-10T20:14:39.164356-07:00 r3a4-28 kernel: [15561.432226]  [<ffffffff81425d07>] __netif_receive_skb_core+0x627/0x800                                                                             
2016-10-10T20:14:39.164357-07:00 r3a4-28 kernel: [15561.432230]  [<ffffffff81425f5f>] netif_receive_skb+0x1f/0x90                                                                                      
2016-10-10T20:14:39.164358-07:00 r3a4-28 kernel: [15561.432234]  [<ffffffff814268e8>] napi_gro_receive+0x68/0x90                                                                                        
2016-10-10T20:14:39.164359-07:00 r3a4-28 kernel: [15561.432244]  [<ffffffffa049e4be>] igb_clean_rx_irq+0x3be/0x8a0 [igb]                                                                                
2016-10-10T20:14:39.164361-07:00 r3a4-28 kernel: [15561.432261]  [<ffffffffa049ec82>] igb_poll+0x2e2/0x760 [igb]                                                                                       
2016-10-10T20:14:39.164361-07:00 r3a4-28 kernel: [15561.432270]  [<ffffffff814262a0>] net_rx_action+0x140/0x240                                                                                        
2016-10-10T20:14:39.164362-07:00 r3a4-28 kernel: [15561.432275]  [<ffffffff8105a215>] __do_softirq+0xe5/0x230                                                                                          
2016-10-10T20:14:39.164495-07:00 r3a4-28 kernel: [15561.432280]  [<ffffffff8151ac9c>] call_softirq+0x1c/0x30                                                                                            
2016-10-10T20:14:39.164500-07:00 r3a4-28 kernel: [15561.432283]  [<ffffffff810045c5>] do_softirq+0x55/0x90                                                                                             
2016-10-10T20:14:39.164501-07:00 r3a4-28 kernel: [15561.432286]  [<ffffffff8105a4b5>] irq_exit+0x95/0xa0                                                                                               
2016-10-10T20:14:39.164502-07:00 r3a4-28 kernel: [15561.432290]  [<ffffffff8151b47e>] do_IRQ+0x4e/0xb0                                                                                                 
2016-10-10T20:14:39.164503-07:00 r3a4-28 kernel: [15561.432295]  [<ffffffff8151162d>] common_interrupt+0x6d/0x6d                                                                                       
2016-10-10T20:14:39.164504-07:00 r3a4-28 kernel: [15561.432308]  [<0000000000431b38>] 0x431b37                                                                                                          
2016-10-10T20:14:39.164505-07:00 r3a4-28 kernel: [15561.432309] SLAB: Unable to allocate memory on node 0 (gfp=0x20)                                                                                   
2016-10-10T20:14:39.164509-07:00 r3a4-28 kernel: [15561.432310]   cache: TCP, object size: 1728, order: 1                                                                                              
2016-10-10T20:14:39.164510-07:00 r3a4-28 kernel: [15561.432313]   node 0: slabs: 5/5, objs: 20/20, free: 0                                                                                             
2016-10-10T20:14:39.164511-07:00 r3a4-28 kernel: [15561.432624] fio: page allocation failure: order:1, mode:0x200020                                                                                    
2016-10-10T20:14:39.164512-07:00 r3a4-28 kernel: [15561.432626] CPU: 2 PID: 16735 Comm: fio Tainted: G              X 3.12.28-4-default #1                                                              
2016-10-10T20:14:39.164513-07:00 r3a4-28 kernel: [15561.432627] Hardware name: Supermicro X9SRW-F/X9SRW-F, BIOS 3.2 01/15/2015                                                                         
2016-10-10T20:14:39.164514-07:00 r3a4-28 kernel: [15561.432628]  ffff88047fc438c0 ffffffff8150b1db 0000000000200020 ffffffff8113af90  
2016-10-10T20:14:39.164505-07:00 r3a4-28 kernel: [15561.432309] SLAB: Unable to allocate memory on node 0 (gfp=0x20)                                                                                    
2016-10-10T20:14:39.164509-07:00 r3a4-28 kernel: [15561.432310]   cache: TCP, object size: 1728, order: 1                                                                                              
2016-10-10T20:14:39.164510-07:00 r3a4-28 kernel: [15561.432313]   node 0: slabs: 5/5, objs: 20/20, free: 0                                                                                             
2016-10-10T20:14:39.164511-07:00 r3a4-28 kernel: [15561.432624] fio: page allocation failure: order:1, mode:0x200020                                                                                   
2016-10-10T20:14:39.164512-07:00 r3a4-28 kernel: [15561.432626] CPU: 2 PID: 16735 Comm: fio Tainted: G              X 3.12.28-4-default #1                                                             
2016-10-10T20:14:39.164513-07:00 r3a4-28 kernel: [15561.432627] Hardware name: Supermicro X9SRW-F/X9SRW-F, BIOS 3.2 01/15/2015                                                                         
2016-10-10T20:14:39.164514-07:00 r3a4-28 kernel: [15561.432628]  ffff88047fc438c0 ffffffff8150b1db 0000000000200020 ffffffff8113af90                                                                   
2016-10-10T20:14:39.164669-07:00 r3a4-28 kernel: [15561.432631]  0000000000000000 0000000000000000 ffff88047ff99b00 0000000200000000                                                                    
2016-10-10T20:14:39.164673-07:00 r3a4-28 kernel: [15561.432633]  0000000000000002 0000000000000000 00ff880400000000 0000000000000000     
 
I was thinking that following procedure  would resolve the issue, however it did not help to prevent the issue:
    I observed this with more than 60% free memory as well(+ buffer/cache)
  1.  Clearing aches: Drop the caches "echo 3 > /proc/sys/vm/drop_caches"  or “echo 1 > /proc/sys/vm/drop_caches”, executing this for every 2 minutes 
    2. ”echo 1 > /proc/sys/vm/compact_memory", (with CONFIG_COMPACTION=y)
    3. Reduce huge page size to 128K

Aforementioned procedure did not help to mitigate/prevent the issue 
 Here is <snip>
Before reboot (even after compact)
r3a4-28:~ #  cat /proc/buddyinfo                                                                                                                                                                                             │··
Node 0, zone      DMA      1      0      0      1      0      1      1      0      1      1      3                                                                                                     │··
Node 0, zone    DMA32  26713  23984  16125   7850   2725    916    526    244     93     78      4                                                                                                     │··
Node 0, zone   Normal 326417 256905 167300  76595  21201   4638   1814    778    207    371      3         
 
after reboot:
r3a4-28:~ # cat /proc/buddyinfo                                                                                                                                                                        │··
Node 0, zone      DMA      1      0      0      1      0      1      1      0      1      1      3                                                                                                     │··
Node 0, zone    DMA32    197    117     39      9      1      2      2      4      2      2    426                                                                                                     │··
Node 0, zone   Normal     30    487    133     51      8      3     12     11      6      4   3347
Comment 7 Eric Sandeen 2016-10-11 20:50:45 UTC
If you have a SLES12 problem, please report that to SuSE.

This is the upstream bugtracker, but you're running a distro kernel based on 3.12...
Comment 8 Sunil 2016-10-11 20:58:28 UTC
oh! ok.
BTW, this is not SLES 12 issue. This is XFS issue. The issue persist on RHEL 7.2 ( kernel 3.10.0-327.10.1.el7.x86_64), OL, etc.
Comment 9 Eric Sandeen 2016-10-11 20:59:24 UTC
If you have a RHEL bug, please report it to Red Hat.
If you have an Oracle bug, please report it to Oracle.

If you can reproduce it on an upstream kernel, please report it here.
Comment 10 Sunil 2016-10-11 21:03:03 UTC
Sure. let me set it ip and rerun the tests, and will keep you posted.