Bug 11315 - Memory errors and device failures in IPW2200
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless
Hardware: All Linux
Importance: P1 high
Assignee: Zhu Yi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-12 22:02 UTC by Andrej Podzimek
Modified: 2012-05-22 13:23 UTC

See Also:
Kernel Version: 2.6.25.15
Subsystem:
Regression: No
Bisected commit-id:


Description Andrej Podzimek 2008-08-12 22:02:59 UTC
Latest working kernel version:

Dunno.

Earliest failing kernel version:

2.6.25.15

Distribution:

Archlinux

Hardware Environment:

Asus M2400N, Pentium M 1.6 GHz, 768 MB RAM, Intel IPW2915

Software Environment:

X.org, KDE... Nothing special.

Problem Description:

ipw2200/0: page allocation failure. order:3, mode:0x4020
Pid: 7450, comm: ipw2200/0 Not tainted 2.6.25.15-AP #1
 [<c015f289>] __alloc_pages+0x2b9/0x380
 [<c017b61c>] __slab_alloc+0x2fc/0x610
 [<c017b72a>] __slab_alloc+0x40a/0x610
 [<f038d8fb>] ipw_rx_queue_replenish+0x5b/0x100 [ipw2200]
 [<c017c708>] __kmalloc_track_caller+0xb8/0x100
 [<f038d8fb>] ipw_rx_queue_replenish+0x5b/0x100 [ipw2200]
 [<c0313e85>] __alloc_skb+0x55/0x120
 [<f038d8fb>] ipw_rx_queue_replenish+0x5b/0x100 [ipw2200]
 [<f038f4d0>] ipw_bg_rx_queue_replenish+0x0/0x40 [ipw2200]
 [<f038f4f6>] ipw_bg_rx_queue_replenish+0x26/0x40 [ipw2200]
 [<c012ebbc>] run_workqueue+0x6c/0x150
 [<c012f08f>] worker_thread+0x7f/0xe0
 [<c01324e0>] autoremove_wake_function+0x0/0x50
 [<c012f010>] worker_thread+0x0/0xe0
 [<c0132097>] kthread+0x37/0x70
 [<c0132060>] kthread+0x0/0x70
 [<c0104d13>] kernel_thread_helper+0x7/0x14
 =======================
Mem-info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:   3
Active:132552 inactive:16871 dirty:9 writeback:1 unstable:0
 free:1692 slab:7344 mapped:13890 pagetables:956 bounce:0
DMA free:2992kB min:72kB low:88kB high:108kB active:1988kB inactive:4176kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 737 737
Normal free:3776kB min:3436kB low:4292kB high:5152kB active:528220kB inactive:63308kB present:755144kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 36*4kB 12*8kB 12*16kB 10*32kB 13*64kB 11*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2992kB
Normal: 778*4kB 31*8kB 4*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3776kB
54636 total pagecache pages
Swap cache: add 171402, delete 147872, find 34791/42583
Free swap  = 1630068kB
Total swap = 2000084kB
Free swap:       1630068kB
194368 pages of RAM
0 pages of HIGHMEM
2728 reserved pages
111884 pages shared
23530 pages swap cached
9 pages dirty
1 pages writeback
13890 pages mapped
7344 pages slab
956 pages pagetables
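
A note on reading the dump above: `order:3` means the request was for 2^3 physically contiguous pages, which is 32 kB with the 4 kB pages shown in the free lists. The following is only a rough sketch in Python (the helper names `parse_free_list` and `blocks_at_least` are made up for illustration and are not part of any kernel tooling). It parses the `Normal:` free-list line from the dump and counts how many free blocks are large enough for such a request:

```python
import re

PAGE_KB = 4  # page size implied by the smallest "4kB" bucket in the dump


def parse_free_list(line):
    """Return {block_size_kB: count} from a buddy free-list line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_at_least(free, order):
    """Count free blocks big enough for a request of the given order."""
    need_kb = PAGE_KB << order  # order 3 -> 32 kB
    return sum(n for size, n in free.items() if size >= need_kb)


# Copied from the "Normal:" line of the dump above.
normal = ("Normal: 778*4kB 31*8kB 4*16kB 1*32kB 1*64kB 0*128kB 1*256kB "
          "0*512kB 0*1024kB 0*2048kB 0*4096kB = 3776kB")

free = parse_free_list(normal)
total_kb = sum(size * n for size, n in free.items())

print(total_kb)                  # 3776, matches the total printed by the kernel
print(blocks_at_least(free, 3))  # 3
```

So of the 3776 kB free in the Normal zone, 3112 kB sits in single 4 kB pages and only three blocks are 32 kB or larger. Whether the allocator may actually hand those out also depends on the zone watermarks, which this sketch does not model.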

Steps to reproduce:

Attach the card to an AP and run *huge* data transfers in *both* directions. Hundreds of these long messages appear in dmesg. Some of them end like this:

ipw2200: Firmware error detected.  Restarting.
ipw2200: Unable to load firmware: -12
ipw2200: Unable to load firmware: -12
ipw2200: Failed to up device

A total network failure follows. Re-modprobing the module can fix it. However, the firmware is a binary blackbox, AFAIK... So the memory allocation failure should be more interesting than the strange firmware failure.
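
Assuming the driver follows the usual kernel convention of returning negative errno values, the `-12` in "Unable to load firmware: -12" can be looked up with a small Python sketch like the one below (this is an illustration only, not something the driver provides):

```python
import errno
import os

code = -12  # value printed by "ipw2200: Unable to load firmware: -12"

# errno.errorcode maps a positive errno number to its symbolic name.
name = errno.errorcode[-code]
print(name, "-", os.strerror(-code))
```

On Linux, errno 12 is `ENOMEM`, so under that assumption the firmware reload would be failing for lack of memory as well, rather than because of a fault inside the firmware itself.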

Some error messages (very rarely) reported allocation failures in other processes too, but I'd guess there's less than 1% of them. One of those processes was hald-addon-input.

(BTW, I had a *similar* problem on a dual-CPU server (IBM xSeries 330) with an Atheros WiFi card, with lots of strange page allocation failures in vital kernel processes. The dmesg messages suggested a reboot, which I did. But the server runs a tainted kernel, since it requires ath_hal and ath_pci to serve as an AP, so I didn't post those messages here.)
Comment 1 Andrej Podzimek 2008-08-12 23:59:23 UTC
Yet another backtrace.

ipw2200/0: page allocation failure. order:3, mode:0x4020
Pid: 7889, comm: ipw2200/0 Not tainted 2.6.25.15-AP #1
 [<c015f289>] __alloc_pages+0x2b9/0x380
 [<f0390272>] ipw_net_hard_start_xmit+0x402/0xd90 [ipw2200]
 [<c017b61c>] __slab_alloc+0x2fc/0x610
 [<c017b3bd>] __slab_alloc+0x9d/0x610
 [<f029552d>] ieee80211_alloc_txb+0x9d/0xf0 [ieee80211]
 [<c017c708>] __kmalloc_track_caller+0xb8/0x100
 [<f029552d>] ieee80211_alloc_txb+0x9d/0xf0 [ieee80211]
 [<c0313e85>] __alloc_skb+0x55/0x120
 [<f029552d>] ieee80211_alloc_txb+0x9d/0xf0 [ieee80211]
 [<f0295ce8>] ieee80211_xmit+0x4f8/0xdf0 [ieee80211]
 [<c0319a5a>] dev_hard_start_xmit+0x1fa/0x260
 [<c0327c69>] __qdisc_run+0x1d9/0x230
 [<c01278f5>] run_timer_softirq+0x1d5/0x210
 [<c0317eca>] net_tx_action+0xaa/0x100
 [<c012367a>] __do_softirq+0x4a/0x90
 [<c01236ed>] do_softirq+0x2d/0x40
 [<c0123875>] irq_exit+0x55/0x70
 [<c0111a6d>] smp_apic_timer_interrupt+0x3d/0x70
 [<c0104bc0>] apic_timer_interrupt+0x28/0x30
 [<c031007b>] sk_clone+0x8b/0x190
 [<f038d955>] ipw_rx_queue_replenish+0xb5/0x100 [ipw2200]
 [<f038f4d0>] ipw_bg_rx_queue_replenish+0x0/0x40 [ipw2200]
 [<f038f4f6>] ipw_bg_rx_queue_replenish+0x26/0x40 [ipw2200]
 [<c012ebbc>] run_workqueue+0x6c/0x150
 [<c012f08f>] worker_thread+0x7f/0xe0
 [<c01324e0>] autoremove_wake_function+0x0/0x50
 [<c012f010>] worker_thread+0x0/0xe0
 [<c0132097>] kthread+0x37/0x70
 [<c0132060>] kthread+0x0/0x70
 [<c0104d13>] kernel_thread_helper+0x7/0x14
 =======================
Mem-info:
DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd: 118
Active:130509 inactive:20398 dirty:9 writeback:1 unstable:0
 free:1462 slab:5918 mapped:14238 pagetables:972 bounce:0
DMA free:3004kB min:72kB low:88kB high:108kB active:5364kB inactive:1100kB present:16256kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 737 737
Normal free:2844kB min:3436kB low:4292kB high:5152kB active:516672kB inactive:80492kB present:755144kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 311*4kB 16*8kB 10*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 3004kB
Normal: 543*4kB 4*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2844kB
68881 total pagecache pages
Swap cache: add 203363, delete 178332, find 45971/56414
Free swap  = 1570508kB
Total swap = 2000084kB
Free swap:       1570508kB
194368 pages of RAM
0 pages of HIGHMEM
2728 reserved pages
125065 pages shared
25031 pages swap cached
9 pages dirty
1 pages writeback
14238 pages mapped
5918 pages slab
972 pages pagetables

The smp_apic stuff sounds weird on a uniprocessor...
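
For comparing the two dumps, here is a rough Python sketch that checks the Normal zone's free memory against its `min` and `low` watermarks. The input strings are shortened excerpts of the `Normal free:` lines from the two dumps, and `zone_stats` is a hypothetical helper written for this illustration:

```python
import re


def zone_stats(line):
    """Extract free/min/low/high (in kB) from a zone line of the Mem-info dump."""
    return {key: int(value)
            for key, value in re.findall(r"\b(free|min|low|high):(\d+)kB", line)}


# Excerpts of the "Normal free:" lines from the two dumps in this report.
dumps = {
    "description": "Normal free:3776kB min:3436kB low:4292kB high:5152kB",
    "comment 1": "Normal free:2844kB min:3436kB low:4292kB high:5152kB",
}

results = {}
for label, line in dumps.items():
    z = zone_stats(line)
    results[label] = (z["free"] < z["low"], z["free"] < z["min"])
    print(label, "below low:", results[label][0], "below min:", results[label][1])
```

In the first dump the zone is below `low` but still above `min`; in the second it has dropped below `min` (2844 kB free against a 3436 kB minimum).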
Comment 2 Zhu Yi 2008-08-15 00:28:11 UTC
Can you confirm that the memory shortage doesn't happen without ipw2200?
Also, did you enable fragmentation? You can tell from the iwconfig output.
Comment 3 Andrej Podzimek 2008-08-15 00:57:30 UTC
The error only appears under *heavy* network traffic. That's a situation I cannot reproduce without ipw2200.

Fragmentation is off.
Comment 4 Zhu Yi 2008-08-25 01:45:50 UTC
Maybe you can try it on a wired network, i.e. connect to your AP with an Ethernet cable?
