Bug 5229

Summary: error with 2.6.13-mm3
Product: Memory Management
Reporter: Danny ter Haar (osdl)
Component: Page Allocator
Assignee: Andrew Morton (akpm)
Status: RESOLVED CODE_FIX
Severity: high
CC: bunk
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.13-mm3
Subsystem:
Regression: ---
Bisected commit-id:

Description Danny ter Haar 2005-09-12 06:06:22 UTC
Most recent kernel where this bug did not occur: 2.6.13.1
Distribution: debian-amd64
Hardware Environment: Tyan Athlon SMP machine with 8x SCSI and 2x gig-E
Software Environment: usenet gateway
Problem Description:
Sep 12 14:45:05 newsgate kernel: swapper: page allocation failure. order:1,
mode:0x80000020
Sep 12 14:45:05 newsgate kernel:
Sep 12 14:45:05 newsgate kernel: Call Trace: <IRQ>
<ffffffff80155777>{__alloc_pages+1031} <ffffffff802fe7a8>{qdisc_restart+40}
Sep 12 14:45:05 newsgate kernel:       
<ffffffff80157d90>{cache_alloc_refill+656} <ffffffff80158054>{__kmalloc+100}
Sep 12 14:45:05 newsgate kernel:        <ffffffff802eb304>{__alloc_skb+116}
<ffffffff80318469>{tcp_collapse+313}
Sep 12 14:45:05 newsgate kernel:        <ffffffff8030f0a8>{ip_queue_xmit+1128}
<ffffffff80318910>{tcp_prune_queue+592}
Sep 12 14:45:05 newsgate kernel:        <ffffffff8031adcc>{tcp_data_queue+540}
<ffffffff8031d289>{tcp_rcv_established+1929}
Sep 12 14:45:05 newsgate kernel:        <ffffffff80323f40>{tcp_v4_do_rcv+48}
<ffffffff8032543a>{tcp_v4_rcv+2282}
Sep 12 14:45:05 newsgate kernel:        <ffffffff80309fc8>{ip_local_deliver+392}
<ffffffff8030a4de>{ip_rcv+1118}
Sep 12 14:45:05 newsgate kernel:        <ffffffff802f2672>{process_backlog+146}
<ffffffff802f10db>{net_rx_action+139}
Sep 12 14:45:05 newsgate kernel:        <ffffffff80133fef>{__do_softirq+79}
<ffffffff8010e853>{call_softirq+31}
Sep 12 14:45:05 newsgate kernel:        <ffffffff8010fe2c>{do_softirq+44}
<ffffffff8010fe74>{do_IRQ+52}
Sep 12 14:45:05 newsgate kernel:        <ffffffff8010deec>{ret_from_intr+0} 
<EOI> <ffffffff803130c0>{tcp_poll+0}
Sep 12 14:45:05 newsgate kernel:        <ffffffff8010c340>{default_idle+0}
<ffffffff8010c362>{default_idle+34}
Sep 12 14:45:05 newsgate kernel:        <ffffffff8010c3a1>{cpu_idle+49}
<ffffffff8046279f>{start_kernel+399}
Sep 12 14:45:05 newsgate kernel:        <ffffffff804621f4>{_sinittext+500}
Sep 12 14:45:05 newsgate kernel: Mem-info:
Sep 12 14:45:05 newsgate kernel: DMA per-cpu:
Sep 12 14:45:05 newsgate kernel: cpu 0 hot: low 0, high 12, batch 2 used:10
Sep 12 14:45:05 newsgate kernel: cpu 0 cold: low 0, high 4, batch 1 used:0
Sep 12 14:45:05 newsgate kernel: DMA32 per-cpu:
Sep 12 14:45:05 newsgate kernel: cpu 0 hot: low 0, high 384, batch 64 used:3
Sep 12 14:45:05 newsgate kernel: cpu 0 cold: low 0, high 128, batch 32 used:31
Sep 12 14:45:05 newsgate kernel: Normal per-cpu: empty
Sep 12 14:45:05 newsgate kernel: HighMem per-cpu: empty
Sep 12 14:45:05 newsgate kernel: Free pages:       30200kB (0kB HighMem)
Sep 12 14:45:05 newsgate kernel: Active:174282 inactive:711121 dirty:102251
writeback:0 unstable:0 free:7550 slab:119354 mapped:56923 pagetables:928
Sep 12 14:45:05 newsgate kernel: DMA free:12464kB min:24kB low:28kB high:36kB
active:0kB inactive:0kB present:12124kB pages_scanned:968 all_unreclaimable? yes
Sep 12 14:45:05 newsgate kernel: lowmem_reserve[]: 0 3961 3961 3961
Sep 12 14:45:05 newsgate kernel: DMA32 free:17736kB min:8036kB low:10044kB
high:12052kB active:697128kB inactive:2844484kB present:4056100kB
pages_scanned:0 all_unreclaimable? no
Sep 12 14:45:05 newsgate kernel: lowmem_reserve[]: 0 0 0 0
Sep 12 14:45:05 newsgate kernel: Normal free:0kB min:0kB low:0kB high:0kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Sep 12 14:45:05 newsgate kernel: lowmem_reserve[]: 0 0 0 0
Sep 12 14:45:05 newsgate kernel: DMA: 4*4kB 6*8kB 5*16kB 5*32kB 4*64kB 1*128kB
2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12464kB
Sep 12 14:45:05 newsgate kernel: DMA32: 4140*4kB 147*8kB 0*16kB 0*32kB 0*64kB
0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17736kB
Sep 12 14:45:05 newsgate kernel: Normal: empty
Sep 12 14:45:05 newsgate kernel: HighMem: empty
Sep 12 14:45:05 newsgate kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Sep 12 14:45:05 newsgate kernel: Free swap  = 0kB
Sep 12 14:45:05 newsgate kernel: Total swap = 0kB
Sep 12 14:45:05 newsgate kernel: Free swap:            0kB
Sep 12 14:45:05 newsgate kernel: 1032176 pages of RAM
Sep 12 14:45:05 newsgate kernel: 16670 reserved pages
Sep 12 14:45:05 newsgate kernel: 645043 pages shared
Sep 12 14:45:05 newsgate kernel: 0 pages swap cached
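
The per-zone free-list lines in the log above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the helper name `buddy_total_kb` is made up for illustration, and the two strings are copied from the DMA and DMA32 lines of the log. It sums the `count*sizekB` terms and reports the largest block size that still has free entries.

```python
import re

# Free-list terms copied from the "DMA:" and "DMA32:" lines of the log.
DMA = ("4*4kB 6*8kB 5*16kB 5*32kB 4*64kB 1*128kB "
       "2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB")
DMA32 = ("4140*4kB 147*8kB 0*16kB 0*32kB 0*64kB "
         "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB")

def parse_terms(line):
    """Return (count, block_size_kb) pairs from a free-list line."""
    return [(int(n), int(kb)) for n, kb in re.findall(r"(\d+)\*(\d+)kB", line)]

def buddy_total_kb(line):
    """Hypothetical helper: total free memory in kB described by the line."""
    return sum(n * kb for n, kb in parse_terms(line))

def largest_free_block_kb(line):
    """Largest block size (kB) with at least one free block, or 0 if none."""
    return max((kb for n, kb in parse_terms(line) if n > 0), default=0)

print(buddy_total_kb(DMA))           # matches the 12464kB total in the log
print(buddy_total_kb(DMA32))         # matches the 17736kB total in the log
print(largest_free_block_kb(DMA32))  # 8, i.e. nothing above order 1 is free
```

Both totals agree with the log. All free DMA32 memory sits in 4kB and 8kB blocks, which is consistent with a fragmented zone. Why the order:1 request still failed is not established by this check.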

Steps to reproduce:
Comment 1 Danny ter Haar 2005-09-12 06:08:08 UTC
config file & full kern.log available at:
http://newsgate.newsserver.nl/kernel/

Danny
Comment 2 Andrew Morton 2005-09-12 12:41:18 UTC
Can you please determine whether reverting

mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch
mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk-fix.patch

fixes this?
Comment 3 Anonymous Emailer 2005-09-12 13:49:37 UTC
Reply-To: dth@dth.net

Quoting bugme-daemon@kernel-bugs.osdl.org (bugme-daemon@kernel-bugs.osdl.org):
> Can you please determine whether reverting
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk.patch
> mm-try-to-allocate-higher-order-pages-in-rmqueue_bulk-fix.patch
> fixes this?

compiling...

Comment 4 Danny ter Haar 2005-09-12 14:22:56 UTC
I no longer see those errors if I revert those 2 patches.
While testing whether everything was working, I found that ping times to another
server connected by gig-E (copper) are not what I would expect:

newsgate:~# ping spool1.int.newsserver.nl
PING spool1.int.newsserver.nl (192.168.30.29) 56(84) bytes of data.
64 bytes from spool1.int.newsserver.nl (192.168.30.29): icmp_seq=1 ttl=64
time=4294967296000 ms
64 bytes from spool1.int.newsserver.nl (192.168.30.29): icmp_seq=2 ttl=64
time=4294967296000 ms
64 bytes from spool1.int.newsserver.nl (192.168.30.29): icmp_seq=3 ttl=64
time=4294967296000 ms
64 bytes from spool1.int.newsserver.nl (192.168.30.29): icmp_seq=4 ttl=64
time=4294967296000 ms

--- spool1.int.newsserver.nl ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev =
4294967296000.146/4294967296000.217/4294967296000.362/46340.950 ms

I honestly don't know which kernel didn't do this, as I've never tested for or
seen it before.

This machine IS SMP ,
 Processor #0 15:5 APIC version 16
 ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
 Processor #1 15:5 APIC version 16
 WARNING: NR_CPUS limit of 1 reached. Processor ignored.

But when I use 2 processors, the performance drops _below_ UP performance!?
I think the CPUs are fighting over the I/O IRQs (2 x SCSI controllers &
2 x gig-E Ethernet controllers).

Any ideas/suggestions about this?
Comment 5 Andrew Morton 2005-09-12 14:41:16 UTC
OK, thanks, I guess I'll drop those mm patches.

wrt the negative ping times: don't know. Does
2.6.13 do that as well?
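
A quick check of the reported round-trip time shows that it is not an arbitrary number. This is a minimal sketch, not part of the original thread: the function name `split_rtt` is made up for illustration, and the interpretation in the comments (a 32-bit wrap in a seconds field somewhere in the timestamp path) is an assumption that the thread does not confirm.

```python
REPORTED_RTT_MS = 4294967296000  # value printed by ping in comment 4

def split_rtt(rtt_ms):
    """Split a millisecond value into whole seconds and leftover milliseconds."""
    return divmod(rtt_ms, 1000)

seconds, leftover_ms = split_rtt(REPORTED_RTT_MS)

# The whole-second part is exactly 2**32, with no milliseconds left over.
# Assumption: that pattern hints at a 32-bit wraparound in a seconds value,
# but the thread does not identify the actual cause.
print(seconds == 2**32, leftover_ms)
```

The sub-millisecond parts of the statistics line (for example the .146 in the minimum) would then be the plausible real latency on top of the bogus offset, though that reading is also only a guess.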
Comment 6 Adrian Bunk 2006-02-13 13:19:40 UTC
Andrew, can we close this bug?