Bug 121181 - Why oom-killer invoked here? (we have here Free swap = 35117452kB)
Summary: Why oom-killer invoked here? (we have here Free swap = 35117452kB)
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-29 20:40 UTC by Mikhail
Modified: 2016-09-09 11:12 UTC
CC List: 1 user

See Also:
Kernel Version: 4.6.3-300.fc24.x86_64+debug
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (243.17 KB, text/x-log)
2016-06-29 20:40 UTC, Mikhail
dmesg after fresh boot (71.15 KB, text/plain)
2016-09-09 11:11 UTC, Michal Vaner
oom output (9.48 KB, text/plain)
2016-09-09 11:12 UTC, Michal Vaner

Description Mikhail 2016-06-29 20:40:49 UTC
Created attachment 221491 [details]
dmesg

[88734.915557] Purging GPU memory, 17874944 bytes freed, 17383424 bytes still pinned.
[88734.920157] totem invoked oom-killer: gfp_mask=0x0(), order=0, oom_score_adj=0
[88734.920284] totem cpuset=/ mems_allowed=0
[88734.920290] CPU: 7 PID: 20272 Comm: totem Not tainted 4.6.3-300.fc24.x86_64+debug #1
[88734.920291] Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
[88734.920293]  0000000000000286 00000000e3ddd6c5 ffff880004343d10 ffffffff81458b55
[88734.920297]  ffff880004343e58 ffff8803b8634000 ffff880004343d88 ffffffff81294469
[88734.920301]  0000000000000206 ffffffff81a23ff0 ffffffff8111149d ffff880004343d58
[88734.920305] Call Trace:
[88734.920311]  [<ffffffff81458b55>] dump_stack+0x86/0xc1
[88734.920314]  [<ffffffff81294469>] dump_header+0x60/0x22c
[88734.920318]  [<ffffffff8111149d>] ? trace_hardirqs_on+0xd/0x10
[88734.920322]  [<ffffffff811fa220>] oom_kill_process+0x200/0x470
[88734.920325]  [<ffffffff811faaa8>] out_of_memory+0x5c8/0x5e0
[88734.920326]  [<ffffffff811fa806>] ? out_of_memory+0x326/0x5e0
[88734.920329]  [<ffffffff811fab32>] pagefault_out_of_memory+0x72/0xc0
[88734.920331]  [<ffffffff81073ab4>] mm_fault_error+0x94/0x190
[88734.920333]  [<ffffffff8107407e>] __do_page_fault+0x4ce/0x520
[88734.920335]  [<ffffffff81074100>] do_page_fault+0x30/0x80
[88734.920339]  [<ffffffff818ceff8>] page_fault+0x28/0x30
[88734.920340] Mem-Info:
[88734.920345] active_anon:6985821 inactive_anon:516706 isolated_anon:0
                active_file:24794 inactive_file:22988 isolated_file:0
                unevictable:287 dirty:1005 writeback:0 unstable:0
                slab_reclaimable:68377 slab_unreclaimable:74637
                mapped:185087 shmem:399052 pagetables:91173 bounce:0
                free:65367 free_pcp:1869 free_cma:0
[88734.920348] Node 0 DMA free:14744kB min:32kB low:44kB high:56kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[88734.920354] lowmem_reserve[]: 0 2347 31008 31008 31008
[88734.920359] Node 0 DMA32 free:128256kB min:5112kB low:7512kB high:9912kB active_anon:1706824kB inactive_anon:426972kB active_file:20804kB inactive_file:19460kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2556500kB managed:2460452kB mlocked:0kB dirty:24kB writeback:0kB mapped:71212kB shmem:163908kB slab_reclaimable:36360kB slab_unreclaimable:32412kB kernel_stack:3872kB pagetables:33728kB unstable:0kB bounce:0kB free_pcp:4164kB local_pcp:724kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[88734.920365] lowmem_reserve[]: 0 0 28661 28661 28661
[88734.920369] Node 0 Normal free:118468kB min:62436kB low:91784kB high:121132kB active_anon:26236460kB inactive_anon:1639852kB active_file:78372kB inactive_file:72492kB unevictable:1148kB isolated(anon):0kB isolated(file):0kB present:29874176kB managed:29349192kB mlocked:1148kB dirty:3996kB writeback:0kB mapped:669136kB shmem:1432300kB slab_reclaimable:237148kB slab_unreclaimable:266136kB kernel_stack:35616kB pagetables:330964kB unstable:0kB bounce:0kB free_pcp:3312kB local_pcp:664kB free_cma:0kB writeback_tmp:0kB pages_scanned:192 all_unreclaimable? no
[88734.920374] lowmem_reserve[]: 0 0 0 0 0
[88734.920378] Node 0 DMA: 2*4kB (U) 2*8kB (U) 2*16kB (U) 1*32kB (U) 3*64kB (U) 1*128kB (U) 0*256kB 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14744kB
[88734.920392] Node 0 DMA32: 3806*4kB (UME) 3035*8kB (UME) 594*16kB (UME) 283*32kB (UME) 273*64kB (UME) 107*128kB (UME) 28*256kB (UME) 18*512kB (UME) 12*1024kB (ME) 5*2048kB (M) 0*4096kB = 128144kB
[88734.920407] Node 0 Normal: 22730*4kB (UME) 3358*8kB (UME) 23*16kB (UME) 5*32kB (M) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 118312kB
[88734.920420] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[88734.920422] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[88734.920423] 1603525 total pagecache pages
[88734.920425] 1156095 pages in swap cache
[88734.920427] Swap cache stats: add 51129785, delete 49973690, find 21403325/31931856
[88734.920428] Free swap  = 35117452kB
[88734.920429] Total swap = 62494716kB
[88734.920431] 8111666 pages RAM
[88734.920432] 0 pages HighMem/MovableOnly
[88734.920433] 155281 pages reserved
[88734.920434] 0 pages cma reserved
[88734.920436] 0 pages hwpoisoned
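
A note on the trace above: gfp_mask=0x0(), order=0 means the OOM killer was not invoked by a failed allocation but from the page-fault path. __do_page_fault() got VM_FAULT_OOM back and called pagefault_out_of_memory(), which invokes out_of_memory() directly with no allocation context; plenty of free swap does not stop this path. A simplified sketch of the 4.6-era entry point (abridged from mm/oom_kill.c, not the exact source):

/* Abridged sketch of the 4.6-era mm/oom_kill.c entry point. The report's
 * gfp_mask=0x0(), order=0 comes from the zeroed oom_control here: on this
 * path there is no failing allocation to describe. */
void pagefault_out_of_memory(void)
{
	struct oom_control oc = {
		.zonelist = NULL,
		.nodemask = NULL,
		.gfp_mask = 0,		/* no allocation context */
		.order = 0,
	};

	if (mem_cgroup_oom_synchronize(true))
		return;			/* handled as a memcg OOM instead */

	if (!mutex_trylock(&oom_lock))
		return;			/* another OOM kill is already in flight */

	out_of_memory(&oc);		/* select and kill a victim */
	mutex_unlock(&oom_lock);
}

So the question reduces to why the fault handler reported VM_FAULT_OOM in the first place despite the free swap.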
Comment 1 Michal Vaner 2016-09-09 11:10:46 UTC
I think the same problem happens to me (though only very rarely; I don't know how to reproduce it), so I'd like to add some more info.

Usually this happens after some thorough exercise of a bunch of disks (e.g. btrfs balance). I have the disks connected to two SATA controllers, one onboard, one in a PCIe slot. Once it starts happening, almost any larger IO to the disks triggers the OOM killer. The killing continues until almost the whole userspace has been murdered.

I did notice that there are no free DMA pages (if I understand the output correctly), but I find it quite hard to believe that killing firefox or gcc would free any.

So I believe there are two problems. One is that something has consumed the DMA pages, which are very limited, and hasn't returned them. The other is that the OOM killer kills some random process that doesn't hold any of these pages, which doesn't help, so the killer is invoked again the next time a DMA page is needed.
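
If it helps, the DMA-zone exhaustion can be watched as it develops rather than only in the OOM report, by polling the per-order free counts in /proc/buddyinfo. A minimal sketch (the " DMA " filter and the 5-second interval are arbitrary illustration choices, not anything from this report):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Minimal watcher: print the DMA zone's buddy free lists every 5 seconds.
 * /proc/buddyinfo shows, per zone, the number of free blocks of each order. */
int main(void)
{
	char line[512];

	for (;;) {
		FILE *f = fopen("/proc/buddyinfo", "r");

		if (!f) {
			perror("/proc/buddyinfo");
			return 1;
		}
		while (fgets(line, sizeof(line), f)) {
			/* " DMA " matches the DMA zone but not DMA32 */
			if (strstr(line, " DMA "))
				fputs(line, stdout);
		}
		fclose(f);
		sleep(5);
	}
}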

I'm going to attach the OOM killer's dmesg output and the start of dmesg after a fresh boot (so it shows what DMA mappings there may be and what hardware is present). It happened with a quite recent kernel (4.7.0), but it wasn't the first time, so the problem is older. It hasn't happened again for some weeks now, but as I said, it occurs quite rarely.

Is there any more info I could provide?
Comment 2 Michal Vaner 2016-09-09 11:11:44 UTC
Created attachment 232831 [details]
dmesg after fresh boot
Comment 3 Michal Vaner 2016-09-09 11:12:32 UTC
Created attachment 232841 [details]
oom output
