Bug 9214

Summary: panic in show_mem when out_of_memory
Product: Memory Management
Reporter: Bernd Pfrommer (berndp)
Component: Other
Assignee: Andrew Morton (akpm)
Status: CLOSED CODE_FIX
Severity: normal
CC: randy.dunlap
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.21.5
Subsystem:
Regression: ---
Bisected commit-id:

Description Bernd Pfrommer 2007-10-23 06:10:08 UTC
Most recent kernel where this bug did not occur: unknown
Distribution: Fedora Core 6 x86_64
Hardware Environment: 2 x Quad Xeon Supermicro with 4GB of memory
Software Environment: 64bit
Problem Description:
When running an application that reads a large file (8GB) into memory on a machine with 4GB of RAM, a kernel panic occurs. No output occurred on the serial port, so the following info was copied manually from the screen.

The process that caused the panic was "init".

RIP: show_mem + 0x8d/0x140
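
For readers unfamiliar with the notation: in kernel traces, `symbol + 0xOFFSET/0xSIZE` gives the byte offset of the instruction within the function and the total size of the function. The sketch below decodes such entries; the helper name `parse_frame` is hypothetical and the regular expression only covers the forms that appear in this report.

```python
import re

# Matches "symbol + 0xOFFSET" with an optional "/0xSIZE" suffix and an
# optional leading "RIP:" label, as hand-copied in this report.
_FRAME_RE = re.compile(
    r"^\s*(?:RIP:\s*)?(\w+)\s*\+\s*0x([0-9a-fA-F]+)(?:/0x([0-9a-fA-F]+))?\s*$"
)


def parse_frame(text):
    """Return (symbol, offset, size); size is None when it was not recorded."""
    m = _FRAME_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised frame: {text!r}")
    symbol, offset, size = m.groups()
    return symbol, int(offset, 16), (int(size, 16) if size is not None else None)
```

Applied to the RIP line above, this places the faulting instruction 0x8d (141) bytes into show_mem, a function 0x140 (320) bytes long.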


Stack:
out_of_memory + 0x75
__alloc_pages
__do_page_cache_readahead
mntput_no_expire
link_path_walk
filemap_nopage
__handle_mm_fault
do_page_fault
error_exit

This is happening on an 8-way SMP (2 quad xeon processors), supermicro 6015t-tv
Some kernel config info:

X86_64_ACPI_NUMA is switched on
no forced preemption
NUMA support is on
page migration is on
CC_STACKPROTECTOR is on
processor family is MCORE2


Steps to reproduce:

run an application that reads a large file into memory
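
The application itself is not attached to this report. A minimal sketch of the access pattern described, assuming the program simply reads the whole file and keeps all of it referenced (the function name `read_into_memory` and the chunk size are hypothetical, and the original may well have used mmap or another language):

```python
def read_into_memory(path, chunk_size=64 * 1024 * 1024):
    """Read the whole file and keep every chunk referenced so nothing is freed.

    Returns (chunks, total_bytes). Memory use grows with the file size.
    """
    chunks = []
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            chunks.append(chunk)  # holding the reference prevents release
            total += len(chunk)
    return chunks, total
```

Pointed at a file larger than physical memory on a machine with no active swap, a loop like this would be expected to drive the system into the out_of_memory path seen in the trace.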
Comment 1 Randy Dunlap 2007-10-23 13:45:21 UTC
How much swap space do you have?
Comment 2 Bernd Pfrommer 2007-10-23 16:06:19 UTC
I had configured 16GB of swap, but when I checked to make sure, swapon -s showed no swap at all. Turns out that somehow the partition labelling by the FC6 installer must not have worked properly, because after fixing /etc/fstab, and running mkswap, the swap now shows up in /proc/swaps.
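
The same check can be scripted. The sketch below parses the text of /proc/swaps, which has one header line followed by one row per active swap area, with sizes in KiB. The function names and the sample row are hypothetical; the device name and size are made up for illustration.

```python
def parse_swaps(text):
    """Parse the contents of /proc/swaps into a list of dicts (sizes in KiB)."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Take the four numeric/type columns from the right so that a file
        # name containing whitespace does not shift them.
        entries.append({
            "filename": " ".join(fields[:-4]),
            "type": fields[-4],
            "size_kib": int(fields[-3]),
            "used_kib": int(fields[-2]),
            "priority": int(fields[-1]),
        })
    return entries


def total_swap_kib(text):
    """Sum the size of all active swap areas; 0 means no swap is in use."""
    return sum(e["size_kib"] for e in parse_swaps(text))


# Illustrative input only, not taken from the reporter's machine.
SAMPLE = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/dev/sda2 partition\t16777208\t0\t-1\n"
)
```

On a live system the input would come from `open("/proc/swaps").read()`; a result of 0 from `total_swap_kib` corresponds to the situation described here, where no swap was active.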

In summary, NO swap was configured. I suspect that after the process reached the 4GB physical memory limit, the kernel must have tried to kill the process, at which point show_mem() must have been called.

I have since found that there was a bug in show_mem() in the 2.6.21.5 kernel, which may have been patched in 2.6.21.6. See http://lkml.org/lkml/2007/6/27/195.

I'm compiling a 2.6.23 kernel now, and will do some testing with it later in the night.
 
Comment 3 Bernd Pfrommer 2007-10-24 04:20:43 UTC
Ran the application again with the latest kernel, 2.6.23. The OOM killer kicked in and killed the offending process, with no kernel panic or the like. Consider this problem resolved, presumably with 2.6.21.6, but definitely with 2.6.23.