Bug 7760
Summary: | page allocation failure on ixp4xx (nslu2) with 128MB RAM | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Stephan (sinclair73) |
Component: | ARM | Assignee: | Russell King (rmk) |
Status: | CLOSED OBSOLETE | ||
Severity: | normal | CC: | alan, cw, rlockhar, rod, tch |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.18-3 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Description
Stephan
2007-01-02 06:46:37 UTC
The VM decided you were out of 16K pages in the DMA zone and refused to allocate you the page. This is one of the hazards of using the DMA bounce code: if you make large allocations, the VM can choose to refuse to allocate you memory, especially when you ask for it in atomic contexts. You're probably seeing this because 128MB could be large enough that you're starting to use the bounce buffers; since I don't know the NSLU2 hardware (or even the IXP hardware), I can't say for certain, though.

Please see http://bugzilla.kernel.org/show_bug.cgi?id=7760 regarding >64MB, "page allocation failure" and DMA_BOUNCE failure. Since USB is the only device on the PCI bus, any PCI DMA problem will corrupt USB devices (and thus the rootfs, if it is located on the USB bus).

I have an IXP420 with 256MiB SDRAM and USB EHCI on the PCI bus. Using kernels 2.6.18 and 2.6.20, I have a serial console and ssh login. The root fs is mounted from a USB device, /dev/sda1. The USB driver (EHCI on the PCI bus) gets a page allocation error; the messages below appear when the memory was requested by the USB driver (i.e., usb_storage):

    oom-killer: gfp_mask=0x200d2, order=0
    memtest: page allocation failure. order:0, mode:0x200d2
    ehci_hcd 0000:00:01.2: alloc_safe_buffer: could not alloc dma memory (size=8192)
    ehci_hcd 0000:00:01.2: map_single: unable to map unsafe buffer cff04000!

When running "memtest 256m 1", I get a dump of the messages above on the (serial) console, plus similar messages with different buffer addresses. I have also been able to get similar messages by running any program that attempts to use nearly all the available memory (e.g., rsync with a sufficiently large database). Note that this is not a soldering or hardware problem, as I have been able to successfully test >95% of the memory.
I have pasted the complete process here, including all the debugging I could think of:

    http://pastebin.ca/440753 -> debian 2.6.18 w/apex 1.4.18 (from debian armle)
    http://pastebin.ca/442296 -> slugosle 2.6.20 w/apex 1.4.18 (generic kernel from sources)

Please let me know what I can do to help debug this problem. I am willing to try the 2.6.21 kernel.

I have been able to reliably get the system into this state with the following process:

1) Run "free"; hopefully not many buffers are in use. If many are, you may have to do this more than once.
2) Convert the free figure to MB by dividing by 1024 and rounding down; call the result X. Then run:
   memtester X (if slugosle generic kernel)
   memtest X 1 (if debian)
3) If you run the same test with X-1 instead of X, the problem never occurs (which also verifies hardware stability).
4) If the test passes, the cache may have been flushed in the meantime, so go back to 1), confirm the free space, and re-run 2) as appropriate.

Here is /proc/iomem (for the addresses of devices):

    NSLU2:/proc# cat iomem
    00000000-0fffffff : System RAM
      0001f000-0020636f : Kernel text
      00208000-002842a7 : Kernel data
    48000000-4bffffff : PCI Memory Space
      48000000-48000fff : 0000:00:01.0
        48000000-48000fff : ohci_hcd
      48001000-48001fff : 0000:00:01.1
        48001000-48001fff : ohci_hcd
      48002000-480020ff : 0000:00:01.2
        48002000-480020ff : ehci_hcd
    50000000-50ffffff : IXP4XX-Flash.0
      50000000-50ffffff : IXP4XXFlash
    60000000-60003fff : ixp4xx_qmgr.0
      60000000-60003fff : ixp_qmgr
    c8000000-c8000fff : serial8250.0
      c8000000-c800001f : serial
    c8001000-c8001fff : serial8250.0
      c8001000-c800101f : serial
    c8007000-c8007fff : ixp4xx_npe.1
      c8007000-c8007fff : NPE-B
    c8008000-c8008fff : ixp4xx_npe.2
      c8008000-c8008fff : NPE-C
    c8009000-c80091ff : ixp4xx_mac.0
      c8009000-c80091ff : ixp4xx_mac

Here is the complete dump of the error message (from the serial console):

    NSLU2:/proc# oom-killer: gfp_mask=0x201d2, order=0
    Mem-info:
    DMA per-cpu:
    cpu 0 hot: high 18, batch 3 used:0
    cpu 0 cold: high 6, batch 1 used:5
    DMA32 per-cpu:
    cpu 0 hot: high 90, batch 15 used:0
    cpu 0 cold: high 30, batch 7 used:6
    Normal per-cpu: empty
    HighMem per-cpu: empty
    Free pages: 1008kB (0kB HighMem)
    Active:33222 inactive:28814 dirty:0 writeback:0 unstable:0 free:252 slab:857 mapped:7 pagetables:216
    DMA free:1008kB min:512kB low:640kB high:768kB active:29984kB inactive:31572kB present:65536kB pages_scanned:137512 all_unreclaimable? yes
    lowmem_reserve[]: 0 192 192 192
    DMA32 free:0kB min:1536kB low:1920kB high:2304kB active:102904kB inactive:83684kB present:196608kB pages_scanned:271001 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0 0
    Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    lowmem_reserve[]: 0 0 0 0
    HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    lowmem_reserve[]: 0 0 0 0
    DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB
    DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
    Normal: empty
    HighMem: empty
    Swap cache: add 187052, delete 125513, find 300/490, race 0+0
    Free swap = 5336kB
    Total swap = 257024kB
    Free swap: 5336kB
    65536 pages of RAM
    406 free pages
    1236 reserved pages
    857 slab pages
    60 pages shared
    61539 pages swap cached

Other related links:

    http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2003-June/015758.html
    http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2004-March/020737.html
    http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2005-January/026346.html
    http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2006-June/034900.html
    http://lists.arm.linux.org.uk/pipermail/linux-arm-kernel/2007-January/037844.html
    http://thread.gmane.org/gmane.comp.misc.nslu2.devel/1486/focus=1488

Given that Rob's attempts to get some reaction on the ARM mailing lists were unfruitful, I'm not sure where we go from here. The DMA bounce code is a _hack_ around the problem: the API was never designed to have bounces happening at this point. It was designed to map the passed buffer to an address visible on the bus, be that via an IOMMU or via a straight translation. The question "can this buffer be visible to the device for DMA?" is supposed to be handled by whoever allocates the buffer, and that means having the right DMA mask set for the device. I'm not sure how to deal with that when the restriction is a property of the upstream buses rather than of the device itself; Linux's DMA masks are based around the capabilities of the device.
And, as I've said in comment #2, I'm not certain of the issues with PCI on IXP4xx, so having a bug assigned to me in this bugzilla is utterly useless (and I don't think there's anyone else from the ARM kernel community on this bugzilla I could assign it to). What we need are people who remain responsible for their platforms, rather than people who dump code into the patch system and then run away from responsibility for it. Unfortunately, people with that attribute are few and far between in the ARM kernel community. Sorry.

The original poster (Stephan) suggested "... something seems wrong with memory-management or DMA". It seems, at least to me, that memory is being requested when none is available, since the oom_killer is being called. Could the DMA error (and the PCI error) be an artifact of the VM subsystem's allocation failure?

Hideo Aoki has a patch for OVERCOMMIT_GUESS, "Patch: mm: An enhancement of OVERCOMMIT_GUESS":

    http://lwn.net/Articles/178850/
    http://marc.info/?l=linux-kernel&m=112993489022427&w=2

The difference I see concerns linux-2.6.21-rc6-arm/mm/page_alloc.c, which has

    zone->pages_high = zone->pages_min + (tmp >> 1);

whereas the patches above seem to use a different parameter for this. I don't claim to understand the nuances of these implementations, but I suggest it because it might be the culprit. Can someone comment on whether this seems plausible? I am by no means an expert in the Linux VM subsystem.

Here is my theory; please tell me if I'm way off base. In alloc_safe_buffer() in dmabounce.c, the requested size is examined. If it is <= the small size (2048 in the ixp4xx case), the pre-allocated small pool is used. If it is <= the large size (4096 for ixp4xx), the pre-allocated large pool is used. If it is bigger than 4096 (which is the case in the failing situations reported), dma_alloc_coherent() is called directly, which does not use pre-allocated memory.
So in the case where we use memtester to trigger this bug, memtester would already have requested all available memory from the kernel, and then something wants to read from or write to USB. That read or write triggers an allocation of >4K, the request cannot be satisfied, the OOM killer is called, and it kills the USB and therefore the rootfs, which is mounted from the USB disk. Is that plausible?

In my opinion, it is something like this: while allocating memory, free memory turns out to be too low, so the system decides to use disk swap; but swapping to disk invokes a USB transfer, and USB requests memory for DMA... so it enters a kind of loop. Is that the problem? If so, possible solutions are:
- increasing the minimum free memory?
- switching USB to non-DMA transfers when memory is too low?
Can someone more expert than me give an opinion?

I think that a workaround might have been discovered, by Stephen Miller. It is in the testing stages right now, but so far I haven't been able to recreate the aforementioned problem. For the record, I'm using an NSLU2 with 256MB in two banks of 2 @ x16 512Mb memory, running Debian BE:

    Linux NSLU2 2.6.18-5-ixp4xx #1 Sun Dec 23 05:17:39 UTC 2007 armv5tel

I am using Apex v1.4.7 as the 2nd-stage boot loader, and specify this:

    setenv startup sdram-init; memscan -u 0+256m; copy -s fis://kernel 0x00008000; copy -s fis://ramdisk 0x01000000; wait 20 Type ^C key to cancel autoboot.; boot

I then put this at the beginning of /etc/rcS.d/S01glibc.sh (just below the first line):

    echo 100 >/proc/sys/vm/swappiness
    echo 8192 >/proc/sys/vm/min_free_kbytes

This is not the greatest idea, but it is just a test. The settings in /etc/sysctl.conf seem to have some influence on these values, but I think they are applied too late in the boot process: if booting uses too much memory before the values are changed, the kernel will hang, because the values haven't been changed yet when it runs out of memory.
File /etc/sysctl.conf:

    vm.overcommit_memory=2
    vm.overcommit_ratio=80
    vm.min_free_kbytes=10240

Note that this file seems to be applied from within S30procps.sh, which runs AFTER S30checkfs.sh, so it's too late. I have posted an updated log here: http://pastebin.ca/876984

Note that before I moved the commands, the FatSlug would hang during the 180-day file system check. Now it does not, even when the available memory has been exceeded (forcing it to use swap). PLEASE NOTE that I have effectively commented out the first two of the three parameters above in /etc/sysctl.conf, and the values on my NSLU2 are currently:

    /proc/sys/vm/overcommit_memory -> 0
    /proc/sys/vm/overcommit_ratio -> 50
    /proc/sys/vm/min_free_kbytes -> 10240

Comments?

I hereby revoke my optimistic statement regarding a possible workaround. The lockup still occurs when transferring large amounts of data, in which case the DMA (PCI->USB->IDE) process seems to hang. Going back to 64MiB seems to alleviate the condition.

I agree; I also made similar modifications. Here is the script:

    #! /bin/sh
    echo '#! /bin/sh' > /etc/init.d/lowmem
    echo '# jean jacques.goessens - created automatically' >> /etc/init.d/lowmem
    echo 'echo 4096 > /proc/sys/vm/min_free_kbytes' >> /etc/init.d/lowmem
    chmod +x /etc/init.d/lowmem
    ln -s /etc/init.d/lowmem /etc/rc1.d/S10lowmem
    ln -s /etc/init.d/lowmem /etc/rc2.d/S10lowmem
    ln -s /etc/init.d/lowmem /etc/rc3.d/S10lowmem
    ln -s /etc/init.d/lowmem /etc/rc4.d/S10lowmem
    ln -s /etc/init.d/lowmem /etc/rc5.d/S10lowmem

This script creates this other script:

    #! /bin/sh
    # jean jacques.goessens - created automatically
    echo 4096 > /proc/sys/vm/min_free_kbytes

and links it into the runlevel directories (rc1.d to rc5.d). Now my 128M slug works fine, but I haven't tried an initial fsck yet. I can confirm that my slug used to hang every time it performed fsck at startup; since making the modification, I have not checked. Anyway, maybe linking it as S01lowmem would make it run before S30checkfs, but I have not checked that either.
The best fix would be to modify dma_bounce and recompile the kernel? But that is beyond me for now. What is the purpose of "swappiness"?? JJ

I use SlugOS 4.8 beta (kernel 2.6.21.7), and changing /proc/sys/vm/min_free_kbytes doesn't help: when formatting a 100GB HDD with ext3, my slug with 256MB of RAM hangs. I made the following slug-specific workaround, which solves the problem: I increased the DMA pool block sizes, so there is no need to call dma_alloc_coherent(), because pre-allocated DMA buffers always exist. I changed the line

    dmabounce_register_dev(dev, 2048, 4096);

to

    dmabounce_register_dev(dev, 16384, 131072);

in /linux-2.6.21/arch/arm/mach-ixp4xx/common-pci.c, in the function static int ixp4xx_pci_platform_notify(struct device *dev). My slug has been running with this patch for a month without any problem. Levente

I created SlugOS-BE 4.10-alpha for the NSLU2 with a 256MB slug, including Apex 1.5.14 (for 256MB memory detection support). The kernel version here is 2.6.24.7; I couldn't change the field above. I added the patches mentioned in Comment #12 to my custom build, which was built per these instructions: http://www.nslu2-linux.org/wiki/HowTo/CrossCompileWithCentOS

Then I ran "ipkg install memtester" and then "memtester 128m". That performed just fine. However, running "memtester 256m", I got the dreaded OOM bug (console frozen). Output is below. I humbly suggest that "memtester" (if testing in a SlugOS environment) or "memtest" (if testing in a Debian environment) be used to verify a kernel fix.

Note that the aforementioned settings (those in /proc/sys/vm/) were not set but were left at their defaults, which were:

    /proc/sys/vm/overcommit_memory = 0
    /proc/sys/vm/overcommit_ratio = 50
    /proc/sys/vm/min_free_kbytes = 2039
    /proc/sys/vm/swappiness = 60

The first test below was run with the defaults above.

    root@NSLU2:~# memtester 64m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 64MB (67108864 bytes)
    got 64MB (67108864 bytes), trying mlock ...locked.
    Loop 1:
      Stuck Address : testing 6
    root@NSLU2:~# free
                     total       used       free     shared    buffers
    Mem:        257416      95328     162088          0       4868
    Swap:       145144          0     145144
    Total:      402560      95328     307232
    root@NSLU2:~# memtester 180m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 180MB (188743680 bytes)
    got 180MB (188743680 bytes), trying mlock ...locked.
    Loop 1:
      Stuck Address : testing 1
    root@NSLU2:~# free
                     total       used       free     shared    buffers
    Mem:        257416      68956     188460          0       4664
    Swap:       145144          0     145144
    Total:      402560      68956     333604
    root@NSLU2:~# memtester 250m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 250MB (262144000 bytes)
    got 250MB (262144000 bytes), trying mlock ...
    memtester invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
    Function entered at [<c0023f08>] from [<c0060c34>]
    Function entered at [<c0060bdc>] from [<c0061098>]
     r7:00000fb3 r6:cfc5f3e0 r5:c025af44 r4:cfc5f3e0
    Function entered at [<c0060f0c>] from [<c0063258>]
    Function entered at [<c0062fe8>] from [<c006b5c8>]
    Function entered at [<c006b470>] from [<c006bb38>]
    Function entered at [<c006ba18>] from [<c006d168>]
    Function entered at [<c006d0dc>] from [<c006db50>]
     r8:cfddcce0 r7:40129000 r6:cff65f6c r5:cfec9e48 r4:ffff05ff
    Function entered at [<c006da44>] from [<c006dcb8>]
    Function entered at [<c006dc04>] from [<c006dda4>]
     r6:40129000 r5:fffffff4 r4:0fa01000
    Function entered at [<c006dd04>] from [<c001fde0>]
     r6:0fa00000 r5:0fa00000 r4:0fa00008
    Mem-info:
    DMA per-cpu:
    CPU 0: Hot: hi: 18, btch: 3 usd: 2 Cold: hi: 6, btch: 1 usd: 0
    Normal per-cpu:
    CPU 0: Hot: hi: 90, btch: 15 usd: 76 Cold: hi: 30, btch: 7 usd: 28
    Active:31216 inactive:30842 dirty:0 writeback:0 unstable:0 free:686 slab:485 mapped:0 pagetables:175 bounce:0
    DMA free:1268kB min:508kB low:632kB high:760kB active:29756kB inactive:29864kB present:65024kB pages_scanned:99747 all_unreclaimable? yes
    lowmem_reserve[]: 0 190 190
    Normal free:1476kB min:1524kB low:1904kB high:2284kB active:95108kB inactive:93504kB present:195072kB pages_scanned:310638 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0
    DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1268kB
    Normal: 3*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1476kB
    Swap cache: add 36286, delete 182, find 0/0, race 0+0
    Free swap = 0kB
    Total swap = 145144kB
    Free swap: 0kB
    65536 pages of RAM
    873 free pages
    1694 reserved pages
    485 slab pages
    96 pages shared
    36104 pages swap cached
    Out of memory: kill process 3980 (memtester) score 4019 or a child
    Killed process 3980 (memtester)
    root@NSLU2:~# memtester 256m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 256MB (268435456 bytes)
    got 256MB (268435456 bytes), trying mlock ...
    memtester invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
    Function entered at [<c0023f08>] from [<c0060c34>]
    Function entered at [<c0060bdc>] from [<c0061098>]
     r7:00001013 r6:cfc1e060 r5:c025af44 r4:cfc1e060
    Function entered at [<c0060f0c>] from [<c0063258>]
    Function entered at [<c0062fe8>] from [<c006b5c8>]
    Function entered at [<c006b470>] from [<c006bb38>]
    Function entered at [<c006ba18>] from [<c006d168>]
    Function entered at [<c006d0dc>] from [<c006db50>]
     r8:cfe60a40 r7:40129000 r6:cfdbdf6c r5:cfed6da0 r4:fffeffff
    Function entered at [<c006da44>] from [<c006dcb8>]
    Function entered at [<c006dc04>] from [<c006dda4>]
     r6:40129000 r5:fffffff4 r4:10001000
    Function entered at [<c006dd04>] from [<c001fde0>]
     r6:10000000 r5:10000000 r4:10000008
    Mem-info:
    DMA per-cpu:
    CPU 0: Hot: hi: 18, btch: 3 usd: 2 Cold: hi: 6, btch: 1 usd: 0
    Normal per-cpu:
    CPU 0: Hot: hi: 90, btch: 15 usd: 23 Cold: hi: 30, btch: 7 usd: 9
    Active:31348 inactive:30830 dirty:0 writeback:0 unstable:0 free:693 slab:469 mapped:0 pagetables:175 bounce:0
    DMA free:1268kB min:508kB low:632kB high:760kB active:29888kB inactive:29732kB present:65024kB pages_scanned:99010 all_unreclaimable? yes
    lowmem_reserve[]: 0 190 190
    Normal free:1504kB min:1524kB low:1904kB high:2284kB active:95504kB inactive:93588kB present:195072kB pages_scanned:320579 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0
    DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1268kB
    Normal: 8*4kB 2*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1504kB
    Swap cache: add 72435, delete 36771, find 15/23, race 0+0
    Free swap = 0kB
    Total swap = 145144kB
    Free swap: 0kB
    65536 pages of RAM
    808 free pages
    1694 reserved pages
    469 slab pages
    11 pages shared
    35664 pages swap cached
    Out of memory: kill process 3981 (memtester) score 4115 or a child
    Killed process 3981 (memtester)
    Killed
    root@NSLU2:~# memtester 250m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 250MB (262144000 bytes)
    got 250MB (262144000 bytes), trying mlock ...
    ntpd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
    Function entered at [<c0023f08>] from [<c0060c34>]
    Function entered at [<c0060bdc>] from [<c0061098>]
     r7:00000fb3 r6:cff2d480 r5:c025af44 r4:cff2d480
    Function entered at [<c0060f0c>] from [<c0063258>]
    Function entered at [<c0062fe8>] from [<c0065234>]
    Function entered at [<c0065168>] from [<c00657d0>]
    Function entered at [<c0065764>] from [<c0060074>]
     r7:cff59540 r6:00000000 r5:00000fff r4:00000000
    Function entered at [<c005ff0c>] from [<c006a78c>]
    Function entered at [<c006a720>] from [<c006b708>]
    Function entered at [<c006b470>] from [<c0025cd0>]
    Function entered at [<c0025bdc>] from [<c0025eb4>]
    Function entered at [<c0025e94>] from [<c001f1ac>]
     r5:00000000 r4:ffffffff
    Function entered at [<c001f194>] from [<c001fd60>]
    Exception stack(0xcfd9bfb0 to 0xcfd9bff8)
    bfa0: ffffffff 00000004 00000010 00000000
    bfc0: ffffffff 00000000 be99acf8 be99ad78 00000000 00000000 401cb000 00000000
    bfe0: 000520fc be99acf4 00010ce4 400ddfe0 60000010 ffffffff
    Mem-info:
    DMA per-cpu:
    CPU 0: Hot: hi: 18, btch: 3 usd: 4 Cold: hi: 6, btch: 1 usd: 0
    Normal per-cpu:
    CPU 0: Hot: hi: 90, btch: 15 usd: 33 Cold: hi: 30, btch: 7 usd: 6
    Active:30780 inactive:31366 dirty:0 writeback:0 unstable:0 free:695 slab:465 mapped:0 pagetables:175 bounce:0
    DMA free:1268kB min:508kB low:632kB high:760kB active:29264kB inactive:30344kB present:65024kB pages_scanned:99352 all_unreclaimable? yes
    lowmem_reserve[]: 0 190 190
    Normal free:1512kB min:1524kB low:1904kB high:2284kB active:93856kB inactive:95120kB present:195072kB pages_scanned:303405 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0
    DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1268kB
    Normal: 8*4kB 3*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1512kB
    Swap cache: add 108389, delete 72731, find 92/141, race 0+0
    Free swap = 0kB
    Total swap = 145144kB
    Free swap: 0kB
    65536 pages of RAM
    819 free pages
    1694 reserved pages
    465 slab pages
    12 pages shared
    35658 pages swap cached
    Out of memory: kill process 3982 (memtester) score 4019 or a child
    Killed process 3982 (memtester)
    Killed
    root@NSLU2:~# ehci_hcd 0000:00:01.2: alloc_safe_buffer: could not alloc dma memory (size=32768)
    ehci_hcd 0000:00:01.2: map_single: unable to map unsafe buffer cf448000!

Next, I changed the defaults in /proc/sys/vm as follows:

    root@NSLU2:~# echo 100 >/proc/sys/vm/swappiness
    root@NSLU2:~# echo 10240 >/proc/sys/vm/min_free_kbytes
    root@NSLU2:~# free
                     total       used       free     shared    buffers
    Mem:        257416      94112     163304          0       4648
    Swap:       145144          0     145144
    Total:      402560      94112     308448
    root@NSLU2:~# memtester 250m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 250MB (262144000 bytes)
    got 250MB (262144000 bytes), trying mlock ...
    ntpd invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
    Function entered at [<c0023f08>] from [<c0060c34>]
    Function entered at [<c0060bdc>] from [<c0061098>]
     r7:00000fb3 r6:cfc5f960 r5:c025af44 r4:cfc5f960
    Function entered at [<c0060f0c>] from [<c0063258>]
    Function entered at [<c0062fe8>] from [<c0065234>]
    Function entered at [<c0065168>] from [<c00657d0>]
    Function entered at [<c0065764>] from [<c0060074>]
     r7:cfd22f60 r6:00000000 r5:00000fff r4:00000000
    Function entered at [<c005ff0c>] from [<c006a78c>]
    Function entered at [<c006a720>] from [<c006b708>]
    Function entered at [<c006b470>] from [<c0025cd0>]
    Function entered at [<c0025bdc>] from [<c0025eb4>]
    Function entered at [<c0025e94>] from [<c001f1ac>]
     r5:40112598 r4:ffffffff
    Function entered at [<c001f194>] from [<c001fd60>]
    Exception stack(0xcfe11fb0 to 0xcfe11ff8)
    1fa0: 40125720 00000000 00000000 40125720
    1fc0: 00000000 40112598 80000000 00000001 00000000 00000000 40125000 bef6eb14
    1fe0: 0001d188 bef6eaf8 0000adc4 4007023c 60000010 ffffffff
    Mem-info:
    DMA per-cpu:
    CPU 0: Hot: hi: 18, btch: 3 usd: 3 Cold: hi: 6, btch: 1 usd: 0
    Normal per-cpu:
    CPU 0: Hot: hi: 90, btch: 15 usd: 84 Cold: hi: 30, btch: 7 usd: 20
    Active:30086 inactive:29913 dirty:0 writeback:0 unstable:0 free:2744 slab:476 mapped:0 pagetables:167 bounce:0
    DMA free:3320kB min:2560kB low:3200kB high:3840kB active:28912kB inactive:28656kB present:65024kB pages_scanned:86516 all_unreclaimable? yes
    lowmem_reserve[]: 0 190 190
    Normal free:7656kB min:7680kB low:9600kB high:11520kB active:91432kB inactive:90996kB present:195072kB pages_scanned:294421 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0
    DMA: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3320kB
    Normal: 8*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7656kB
    Swap cache: add 36366, delete 650, find 10/22, race 0+0
    Free swap = 0kB
    Total swap = 145144kB
    Free swap: 0kB
    65536 pages of RAM
    2928 free pages
    1694 reserved pages
    476 slab pages
    26 pages shared
    35716 pages swap cached
    Out of memory: kill process 3844 (memtester) score 4019 or a child
    Killed process 3844 (memtester)
    Killed
    root@NSLU2:~# memtester 256m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 256MB (268435456 bytes)
    got 256MB (268435456 bytes), trying mlock ...
    memtester invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
    Function entered at [<c0023f08>] from [<c0060c34>]
    Function entered at [<c0060bdc>] from [<c0061098>]
     r7:00001013 r6:cfe96680 r5:c025af44 r4:cfe96680
    Function entered at [<c0060f0c>] from [<c0063258>]
    Function entered at [<c0062fe8>] from [<c006b5c8>]
    Function entered at [<c006b470>] from [<c006bb38>]
    Function entered at [<c006ba18>] from [<c006d168>]
    Function entered at [<c006d0dc>] from [<c006db50>]
     r8:cfdf8a20 r7:40129000 r6:cff91f6c r5:cfe212cc r4:fffeffff
    Function entered at [<c006da44>] from [<c006dcb8>]
    Function entered at [<c006dc04>] from [<c006dda4>]
     r6:40129000 r5:fffffff4 r4:10001000
    Function entered at [<c006dd04>] from [<c001fde0>]
     r6:10000000 r5:10000000 r4:10000008
    Mem-info:
    DMA per-cpu:
    CPU 0: Hot: hi: 18, btch: 3 usd: 2 Cold: hi: 6, btch: 1 usd: 5
    Normal per-cpu:
    CPU 0: Hot: hi: 90, btch: 15 usd: 21 Cold: hi: 30, btch: 7 usd: 12
    Active:30143 inactive:29971 dirty:0 writeback:0 unstable:0 free:2745 slab:456 mapped:0 pagetables:168 bounce:0
    DMA free:3316kB min:2560kB low:3200kB high:3840kB active:28988kB inactive:28568kB present:65024kB pages_scanned:94915 all_unreclaimable? yes
    lowmem_reserve[]: 0 190 190
    Normal free:7664kB min:7680kB low:9600kB high:11520kB active:91584kB inactive:91316kB present:195072kB pages_scanned:283698 all_unreclaimable? yes
    lowmem_reserve[]: 0 0 0
    DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3316kB
    Normal: 2*4kB 3*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7664kB
    Swap cache: add 87014, delete 51312, find 73/121, race 0+0
    Free swap = 0kB
    Total swap = 145144kB
    Free swap: 0kB
    65536 pages of RAM
    2862 free pages
    1694 reserved pages
    456 slab pages
    44 pages shared
    35702 pages swap cached
    Out of memory: kill process 3845 (memtester) score 4115 or a child
    Killed process 3845 (memtester)
    Killed
    root@NSLU2:~# memtester 256m
    memtester version 4.0.6 (32-bit)
    Copyright (C) 2006 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    pagesize is 4096
    pagesizemask is 0xfffff000
    want 256MB (268435456 bytes)
    got 256MB (268435456 bytes), trying mlock ...
    init invoked oom-killer: gfp_mask=0x1200d2, order=0, oomkilladj=0
    Function entered at [<c0023f08>] from [<c0060c34>]
    Function entered at [<c0060bdc>] from [<c0061098>]
     r7:00001013 r6:cfc5e8e0 r5:c025af44 r4:cfc5e8e0
    Function entered at [<c0060f0c>] from [<c0063258>]
    Function entered at [<c0062fe8>] from [<c0073fb4>]
    Function entered at [<c0073f74>] from [<c006b0d8>]
     r7:cfd339b0 r6:0000098d r5:00000008 r4:00000006
    Function entered at [<c006b098>] from [<c006b744>]
     r8:be94a000 r7:cfd339b0 r6:00002fa0 r5:00000000 r4:00131a00
    Function entered at [<c006b470>] from [<c0025cd0>]
    Function entered at [<c0025bdc>] from [<c001f1ec>]
    Function entered at [<c001f1b0>] from [<c001fa00>]
    Exception stack(0xcfc21de8 to 0xcfc21e30)
    1de0: be94abe0 cfc21e64 ffffffe4 00000000 cfc21e54 00000004
    1e00: be94abe0 00000000 00000000 00000000 be94ad10 cfc21fa4 0000001c cfc21e30
    1e20: 00000000 c0105a34 00000013 ffffffff
     r8:00000000 r7:00000000 r6:be94abe0 r5:cfc21e1c r4:ffffffff
Function entered at [<c008b284>] from [<c001fde0>]
Mem-info:
DMA per-cpu:
CPU 0: Hot: hi: 18, btch: 3 usd: 2 Cold: hi: 6, btch: 1 usd: 5
Normal per-cpu:
CPU 0: Hot: hi: 90, btch: 15 usd: 38 Cold: hi: 30, btch: 7 usd: 28
Active:30013 inactive:30097 dirty:0 writeback:0 unstable:0
free:2746 slab:453 mapped:0 pagetables:168 bounce:0
DMA free:3320kB min:2560kB low:3200kB high:3840kB active:28540kB inactive:29012kB present:65024kB pages_scanned:98257 all_unreclaimable? yes
lowmem_reserve[]: 0 190 190
Normal free:7664kB min:7680kB low:9600kB high:11520kB active:91512kB inactive:91376kB present:195072kB pages_scanned:288632 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
DMA: 2*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3320kB
Normal: 0*4kB 2*8kB 2*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7664kB
Swap cache: add 132337, delete 96640, find 127/217, race 0+0
Free swap = 0kB
Total swap = 145144kB
Free swap: 0kB
65536 pages of RAM
2896 free pages
1694 reserved pages
453 slab pages
44 pages shared
35697 pages swap cached
Out of memory: kill process 3846 (memtester) score 4115 or a child
Killed process 3846 (memtester)
Killed
ehci_hcd 0000:00:01.2: alloc_safe_buffer: could not alloc dma memory (size=28672)
ehci_hcd 0000:00:01.2: map_single: unable to map unsafe buffer cb010000!

(In reply to comment #13)
> Then I "ipkg install memtester", then "memtester 128m". That performed just
> fine. However, doing "memtester 256m", and I got the dreaded OOM bug
> (console frozen). Output is below.
>
> [...]
> got 250MB (262144000 bytes), trying mlock ...

You cannot expect to be able to mlock 250 MB of memory on a machine with 256 MB of RAM, no matter how much swap you have. This has nothing to do with the bug described in this bugzilla.

Following comment #14, which says that adding swap is not related to this bug, I turned swap off and re-tested from a clean state (i.e., re-flashed completely and restarted the test).
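The third log above ends with a dmabounce failure: "alloc_safe_buffer: could not alloc dma memory (size=28672)". This is a request for physically contiguous memory in the DMA zone. Suppose the safe buffer is taken directly from the page allocator and rounded up to a power-of-two number of pages, as the kernel's get_order() does (an assumption, not confirmed in this thread). Then a 28672-byte buffer needs an order-3 (32kB) block, and the 8192-byte failure in the original report needs an order-1 block. The rough Python sketch below models get_order() and reads one buddy free-list line from the Mem-info dump. It then counts the free blocks large enough for a given order. The helper names parse_buddy_line and free_blocks_of_order are hypothetical and do not come from the kernel.

```python
import re

PAGE_SIZE = 4096            # "pagesize is 4096" in the memtester output
PAGE_KB = PAGE_SIZE // 1024

def get_order(size_bytes):
    """Model of the kernel's get_order(): smallest n with PAGE_SIZE << n >= size_bytes."""
    order = 0
    while (PAGE_SIZE << order) < size_bytes:
        order += 1
    return order

def parse_buddy_line(line):
    """Parse a Mem-info line such as 'DMA: 2*4kB 0*8kB ... = 3320kB'.

    Returns ({block_size_kB: free_block_count}, printed_total_kB).
    """
    counts = {int(size): int(n) for n, size in re.findall(r"(\d+)\*(\d+)kB", line)}
    total_kb = int(re.search(r"=\s*(\d+)kB", line).group(1))
    return counts, total_kb

def free_blocks_of_order(counts, order):
    """Free blocks big enough (possibly after buddy splitting) for one request of this order."""
    min_kb = PAGE_KB << order
    return sum(n for size_kb, n in counts.items() if size_kb >= min_kb)

# DMA free lists copied from the third Mem-info dump.
DMA_LINE = ("DMA: 2*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB "
            "1*1024kB 1*2048kB 0*4096kB = 3320kB")

if __name__ == "__main__":
    counts, total_kb = parse_buddy_line(DMA_LINE)
    # Sanity check: the per-size counts add up to the total the kernel printed.
    assert sum(size_kb * n for size_kb, n in counts.items()) == total_kb
    for size in (8192, 28672):
        order = get_order(size)
        print(f"size={size}: order {order}, "
              f"{free_blocks_of_order(counts, order)} free DMA block(s) large enough")
```

The count shows only which block sizes were free when the OOM dump was printed. That dump was printed before the ehci_hcd message, not at the moment of the failure. The count also ignores the zone watermarks that the allocator checks as well.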
This is for the NSLU2 with 256MB SDRAM, IXP420 @ 266MHz, SlugOSBE-4.10-alpha, Apex 1.5.14, and the kernel mentioned above.

root@NSLU2:~# echo 100 >/proc/sys/vm/swappiness
root@NSLU2:~# echo 10240 >/proc/sys/vm/min_free_kbytes
root@NSLU2:~# free
              total         used         free       shared      buffers
  Mem:       257416        94104       163312            0         4624
 Swap:            0            0            0
Total:       257416        94104       163312
root@NSLU2:~# memtester 250m
memtester version 4.0.6 (32-bit)
Copyright (C) 2006 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 250MB (262144000 bytes)
got 228MB (239128576 bytes), trying mlock ...locked.
Loop 1:
Stuck Address       : ok
Random Value        : ok
Compare XOR         : ok
Compare SUB         : ok
Compare MUL         : ok
Compare DIV         : ok
Compare OR          : ok
Compare AND         : ok
Sequential Increment: ok
Solid Bits          : ok
Block Sequential    : ok
Checkerboard        : ok
Bit Spread          : testing 17
root@NSLU2:~#
root@NSLU2:~#
root@NSLU2:~# man memtester
No manual entry for memtester
root@NSLU2:~# ipkg list | grep memtest
ehci_hcd 0000:00:01.2: alloc_safe_buffer: could not alloc dma memory (size=28672)
ehci_hcd 0000:00:01.2: map_single: unable to map unsafe buffer cddd0000!

As shown above, the NSLU2 hung at that point. I was trying to find the man page for memtester so that I could make it run more quickly. I took the remarks in comment #14 to mean that swap should not be turned on. With swap disabled, however, the OOM / DMA bug still appears to exist. Note also that the amount of memory memtester actually obtained (228MB on the "got" line) is still significantly less than the 250MB requested, which is what I would expect.
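The maintainer's point about mlocking 250 MB on a 256 MB machine can be checked roughly against the page counts in the Mem-info dumps. So can the 228MB that memtester actually got. The Python sketch below rests on one stated assumption: a single process can lock at most the RAM that the kernel has not reserved, minus the per-zone min watermarks. That is a loose upper bound, because slab, page tables, page cache and other processes are ignored. The DMA and Normal min values (2560kB + 7680kB) sum to 10240kB, which matches the min_free_kbytes value written in the retest. The function name lockable_ceiling_pages is hypothetical.

```python
PAGE_SIZE = 4096                 # "pagesize is 4096"
RAM_PAGES = 65536                # "65536 pages of RAM"
RESERVED_PAGES = 1694            # "1694 reserved pages"
MIN_WATERMARK_KB = 2560 + 7680   # DMA "min:2560kB" + Normal "min:7680kB"
REQUEST_BYTES = 262144000        # "want 250MB (262144000 bytes)"
OBTAINED_BYTES = 239128576       # "got 228MB (239128576 bytes)"

def lockable_ceiling_pages():
    """Loose upper bound on pages one process could mlock (assumption, see lead-in).

    RAM minus kernel-reserved pages minus the zones' min watermarks; slab,
    page tables, page cache and other processes would lower it further.
    """
    min_pages = MIN_WATERMARK_KB * 1024 // PAGE_SIZE
    return RAM_PAGES - RESERVED_PAGES - min_pages

if __name__ == "__main__":
    ceiling = lockable_ceiling_pages()
    requested = REQUEST_BYTES // PAGE_SIZE
    obtained = OBTAINED_BYTES // PAGE_SIZE
    print(f"requested {requested} pages, ceiling {ceiling} pages, obtained {obtained} pages")
    print("250MB request fits under the ceiling:", requested <= ceiling)
```

This generous bound comes to 61282 pages, or about 239 MiB. That is already short of the 64000 pages in a 250MB request, which fits the maintainer's reply. The 58381 pages memtester obtained sit below the bound.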