Bug 14440

Summary: Kernel Panic with lots of page allocation errors, related to the iwlagn driver
Product: Drivers Reporter: Detlev Casanova (detlev.casanova)
Component: network-wirelessAssignee: drivers_network-wireless (drivers_network-wireless)
Status: RESOLVED DUPLICATE    
Severity: high CC: linville, reinette.chatre
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.31.2 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: Log from metalog, up to the kernel panic.

Description Detlev Casanova 2009-10-19 21:43:34 UTC
Created attachment 23469 [details]
Log from metalog, up to the kernel panic.

Hi, I'm reporting a bug caused, apparently by multiple page allocation errors.
When using a KDE desktop, and scrolling in multiple PDFs, the computer starts slowing down. The hard drive seems to be used a lot even though I don't have any swap partition on it.

After some time, it nearly freezes. Sometimes it's recoverable (the application crashes or I'm able to Ctrl+Alt+Esc kill it) but sometimes, this leads to a kernel panic (I have been able to reproduce that only twice though).

I think, but I'm not sure at all, that the problem occurs after a suspend-to-ram.

The problem seems to be there since 2.6.30.

If you need more details, ask me.

lspci :
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
03:01.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
03:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
03:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
03:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
03:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)
09:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5784M Gigabit Ethernet PCIe (rev 10)
0c:00.0 Network controller: Intel Corporation Device 4232
Comment 1 John W. Linville 2009-10-19 21:47:17 UTC
Have you tried the latest linux-2.6 kernel?
Comment 2 Reinette Chatre 2009-10-20 04:57:36 UTC
(In reply to comment #0)
> Created an attachment (id=23469) [details]
> Log from metalog, up to the kernel panic.
> 

I do not see a kernel panic. There are several allocation failures that are pretty noisy when they occur - are these what you refer to? There are a few things going on here. The driver is behaving in a way that easily triggers some memory allocation problems. This problem is still being debugged upstream. For now, can you try http://lkml.org/lkml/2009/10/16/89 and commits 373c0a7e and 8aa7e847 reverted ?

At the same time there has been some work in the driver to make this problems not so easy to trigger. When you use latest from linux-2.6 you will get the two patches below (so if you are not interested in entire linux-2.6 then you can just backport these two patches). We are also in process of moving to paged skb. This work is not yet in linux-2.6. This work will reduce the amount of memory required per allocation from order-2 to order-1. To get this new work you can use compat-wireless. There is one patch needed that is not yet available to compat-wireless either. It is "iwlwifi: fix use after free bug for paged rx" from iwlwifi-2.6 repository.

Here are the patches you will get when using linux-2.6:

commit de0bd50845eb5935ce3d503c5d2f565d6cb9ece1
Author: Reinette Chatre <reinette.chatre@intel.com>
Date:   Fri Sep 11 10:38:12 2009 -0700

    iwlwifi: fix potential rx buffer loss


commit f82a924cc88a5541df1d4b9d38a0968cd077a051
Author: Reinette Chatre <reinette.chatre@intel.com>
Date:   Thu Sep 17 10:43:56 2009 -0700

    iwlwifi: reduce noise when skb allocation fails
Comment 3 Detlev Casanova 2009-10-20 09:07:09 UTC
The kernel panic is not in the log but by kernel panic I mean a system freeze and the shift lock led blinking (which seems to be a kernel panic).
I'm not totaly sure that this is the reason of the kernel panic but I don't see what else would be.

I'll try these patches tonight. Thanks for your fast answers.
Comment 4 Detlev Casanova 2009-10-20 13:55:36 UTC
I tried kernel 2.6.32-rc5. In fact, iwlwifi is quiet now.
But the problem is still there :
With 7 pdf opened, I got a freeze (but no kernel panic) with lots of activity on the hard drive.
I get lots of lines like those :

Oct 20 15:41:07 [kernel] [  225.415985] [drm:i915_gem_execbuffer] *ERROR* Failed to pin buffer 317 of 390, total 222339072 bytes: -28
Oct 20 15:41:07 [kernel] [  225.415989] [drm:i915_gem_execbuffer] *ERROR* 4868 objects [320 pinned], 730185728 object bytes [231391232 pinned], 231391232/260308992 gtt bytes

And I got this :

Oct 20 15:41:22 [kernel] [  240.326769] X invoked oom-killer: gfp_mask=0x0, order=0, oom_adj=0
Oct 20 15:41:22 [kernel] [  240.326774] X cpuset=/ mems_allowed=0
Oct 20 15:41:22 [kernel] [  240.326778] Pid: 5112, comm: X Not tainted 2.6.32-rc5 #1
Oct 20 15:41:22 [kernel] [  240.326780] Call Trace:
Oct 20 15:41:22 [kernel] [  240.326789]  [<ffffffff81079130>] ? cpuset_print_task_mems_allowed+0x8d/0x98
Oct 20 15:41:22 [kernel] [  240.326794]  [<ffffffff81099cac>] oom_kill_process+0x82/0x241
Oct 20 15:41:22 [kernel] [  240.326798]  [<ffffffff8105ce78>] ? ktime_get_ts+0xb1/0xbe
Oct 20 15:41:22 [kernel] [  240.326801]  [<ffffffff8109a320>] __out_of_memory+0x134/0x14b
Oct 20 15:41:22 [kernel] [  240.326804]  [<ffffffff8109a424>] pagefault_out_of_memory+0x55/0x7d
Oct 20 15:41:22 [kernel] [  240.326807]  [<ffffffff810266a6>] mm_fault_error+0x3b/0xf6
Oct 20 15:41:22 [kernel] [  240.326811]  [<ffffffff810af0a1>] ? handle_mm_fault+0x2e4/0x5b2
Oct 20 15:41:22 [kernel] [  240.326813]  [<ffffffff81026923>] do_page_fault+0x1c2/0x251
Oct 20 15:41:22 [kernel] [  240.326818]  [<ffffffff814fdd3f>] page_fault+0x1f/0x30
Oct 20 15:41:22 [kernel] [  240.326820] Mem-Info:
Oct 20 15:41:22 [kernel] [  240.326821] DMA per-cpu:
Oct 20 15:41:22 [kernel] [  240.326823] CPU    0: hi:    0, btch:   1 usd:   0
Oct 20 15:41:22 [kernel] [  240.326825] CPU    1: hi:    0, btch:   1 usd:   0
Oct 20 15:41:22 [kernel] [  240.326826] DMA32 per-cpu:
Oct 20 15:41:22 [kernel] [  240.326828] CPU    0: hi:  186, btch:  31 usd: 114
Oct 20 15:41:22 [kernel] [  240.326830] CPU    1: hi:  186, btch:  31 usd:  91
Oct 20 15:41:22 [kernel] [  240.326836] active_anon:403267 inactive_anon:20055 isolated_anon:0
Oct 20 15:41:22 [kernel] [  240.326837]  active_file:27912 inactive_file:26435 isolated_file:0
Oct 20 15:41:22 [kernel] [  240.326838]  unevictable:0 dirty:11 writeback:0 unstable:0 buffer:4965
Oct 20 15:41:22 [kernel] [  240.326838]  free:3422 slab_reclaimable:4574 slab_unreclaimable:7458
Oct 20 15:41:22 [kernel] [  240.326839]  mapped:32879 shmem:224227 pagetables:9702 bounce:0
Oct 20 15:41:22 [kernel] [  240.326846] DMA free:8012kB min:40kB low:48kB high:60kB active_anon:5172kB inactive_anon:1536kB active_file:80kB inactive_file:152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15364kB mlocked:0kB dirty:0kB writeback:0kB mapped:52kB shmem:2752kB slab_reclaimable:8kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:32kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable? no
Oct 20 15:41:22 [kernel] [  240.326850] lowmem_reserve[]: 0 1994 1994 1994
Oct 20 15:41:22 [kernel] [  240.326858] DMA32 free:5676kB min:5692kB low:7112kB high:8536kB active_anon:1607768kB inactive_anon:78812kB active_file:111568kB inactive_file:105588kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2042580kB mlocked:0kB dirty:44kB writeback:0kB mapped:131464kB shmem:894156kB slab_reclaimable:18288kB slab_unreclaimable:29808kB kernel_stack:2240kB pagetables:38776kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:348 all_unreclaimable? no
Oct 20 15:41:22 [kernel] [  240.326863] lowmem_reserve[]: 0 0 0 0
Oct 20 15:41:22 [kernel] [  240.326866] DMA: 23*4kB 12*8kB 13*16kB 6*32kB 2*64kB 1*128kB 2*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
Oct 20 15:41:22 [kernel] [  240.326873] DMA32: 91*4kB 96*8kB 26*16kB 7*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 5676kB
Oct 20 15:41:22 [kernel] [  240.326880] 278613 total pagecache pages
Oct 20 15:41:22 [kernel] [  240.326882] 0 pages in swap cache
Oct 20 15:41:22 [kernel] [  240.326883] Swap cache stats: add 0, delete 0, find 0/0
Oct 20 15:41:22 [kernel] [  240.326885] Free swap  = 0kB
Oct 20 15:41:22 [kernel] [  240.326886] Total swap = 0kB
Oct 20 15:41:22 [kernel] [  240.335655] 521820 pages RAM
Oct 20 15:41:22 [kernel] [  240.335657] 10340 pages reserved
Oct 20 15:41:22 [kernel] [  240.335658] 357434 pages shared
Oct 20 15:41:22 [kernel] [  240.335659] 406456 pages non-shared
Oct 20 15:41:22 [kernel] [  240.335663] Out of memory: kill process 6437 (kdeinit4) score 2628013 or a child
Oct 20 15:41:22 [kernel] [  240.335666] Killed process 6438 (klauncher)
Oct 20 15:41:57 [kernel] [  273.566020] okular invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Oct 20 15:42:10 [kernel] [  273.566028] okular cpuset=/ mems_allowed=0
Oct 20 15:42:43 [kernel] [  273.566033] Pid: 6640, comm: okular Not tainted 2.6.32-rc5 #1
[REBOOT]

I'm not sure it's still related to iwlagn.

Are 2 GB of memory not enough ? (I don't have a swap partition).
Comment 5 Reinette Chatre 2009-10-20 16:53:21 UTC
(In reply to comment #4)
> I tried kernel 2.6.32-rc5. In fact, iwlwifi is quiet now.

Great - thanks for testing.

Your logs below does show that you are having issues with memory allocation. It may be related to what is currently being discussed upstream (you can also follow in bug http://bugzilla.kernel.org/show_bug.cgi?id=14141 .

Since what you are seeing now is not related to wireless, perhaps John can reassign it to the default or perhaps somebody specific who will be able to figure out what is going on here?

I do not know if adding swap will help ... it is something to try.


> But the problem is still there :
> With 7 pdf opened, I got a freeze (but no kernel panic) with lots of activity
> on the hard drive.
> I get lots of lines like those :
> 
> Oct 20 15:41:07 [kernel] [  225.415985] [drm:i915_gem_execbuffer] *ERROR*
> Failed to pin buffer 317 of 390, total 222339072 bytes: -28
> Oct 20 15:41:07 [kernel] [  225.415989] [drm:i915_gem_execbuffer] *ERROR*
> 4868
> objects [320 pinned], 730185728 object bytes [231391232 pinned],
> 231391232/260308992 gtt bytes
> 
> And I got this :
> 
> Oct 20 15:41:22 [kernel] [  240.326769] X invoked oom-killer: gfp_mask=0x0,
> order=0, oom_adj=0
> Oct 20 15:41:22 [kernel] [  240.326774] X cpuset=/ mems_allowed=0
> Oct 20 15:41:22 [kernel] [  240.326778] Pid: 5112, comm: X Not tainted
> 2.6.32-rc5 #1
> Oct 20 15:41:22 [kernel] [  240.326780] Call Trace:
> Oct 20 15:41:22 [kernel] [  240.326789]  [<ffffffff81079130>] ?
> cpuset_print_task_mems_allowed+0x8d/0x98
> Oct 20 15:41:22 [kernel] [  240.326794]  [<ffffffff81099cac>]
> oom_kill_process+0x82/0x241
> Oct 20 15:41:22 [kernel] [  240.326798]  [<ffffffff8105ce78>] ?
> ktime_get_ts+0xb1/0xbe
> Oct 20 15:41:22 [kernel] [  240.326801]  [<ffffffff8109a320>]
> __out_of_memory+0x134/0x14b
> Oct 20 15:41:22 [kernel] [  240.326804]  [<ffffffff8109a424>]
> pagefault_out_of_memory+0x55/0x7d
> Oct 20 15:41:22 [kernel] [  240.326807]  [<ffffffff810266a6>]
> mm_fault_error+0x3b/0xf6
> Oct 20 15:41:22 [kernel] [  240.326811]  [<ffffffff810af0a1>] ?
> handle_mm_fault+0x2e4/0x5b2
> Oct 20 15:41:22 [kernel] [  240.326813]  [<ffffffff81026923>]
> do_page_fault+0x1c2/0x251
> Oct 20 15:41:22 [kernel] [  240.326818]  [<ffffffff814fdd3f>]
> page_fault+0x1f/0x30
> Oct 20 15:41:22 [kernel] [  240.326820] Mem-Info:
> Oct 20 15:41:22 [kernel] [  240.326821] DMA per-cpu:
> Oct 20 15:41:22 [kernel] [  240.326823] CPU    0: hi:    0, btch:   1 usd:  
> 0
> Oct 20 15:41:22 [kernel] [  240.326825] CPU    1: hi:    0, btch:   1 usd:  
> 0
> Oct 20 15:41:22 [kernel] [  240.326826] DMA32 per-cpu:
> Oct 20 15:41:22 [kernel] [  240.326828] CPU    0: hi:  186, btch:  31 usd:
> 114
> Oct 20 15:41:22 [kernel] [  240.326830] CPU    1: hi:  186, btch:  31 usd: 
> 91
> Oct 20 15:41:22 [kernel] [  240.326836] active_anon:403267
> inactive_anon:20055
> isolated_anon:0
> Oct 20 15:41:22 [kernel] [  240.326837]  active_file:27912
> inactive_file:26435
> isolated_file:0
> Oct 20 15:41:22 [kernel] [  240.326838]  unevictable:0 dirty:11 writeback:0
> unstable:0 buffer:4965
> Oct 20 15:41:22 [kernel] [  240.326838]  free:3422 slab_reclaimable:4574
> slab_unreclaimable:7458
> Oct 20 15:41:22 [kernel] [  240.326839]  mapped:32879 shmem:224227
> pagetables:9702 bounce:0
> Oct 20 15:41:22 [kernel] [  240.326846] DMA free:8012kB min:40kB low:48kB
> high:60kB active_anon:5172kB inactive_anon:1536kB active_file:80kB
> inactive_file:152kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:15364kB mlocked:0kB dirty:0kB writeback:0kB mapped:52kB shmem:2752kB
> slab_reclaimable:8kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:32kB
> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:32 all_unreclaimable?
> no
> Oct 20 15:41:22 [kernel] [  240.326850] lowmem_reserve[]: 0 1994 1994 1994
> Oct 20 15:41:22 [kernel] [  240.326858] DMA32 free:5676kB min:5692kB
> low:7112kB
> high:8536kB active_anon:1607768kB inactive_anon:78812kB active_file:111568kB
> inactive_file:105588kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
> present:2042580kB mlocked:0kB dirty:44kB writeback:0kB mapped:131464kB
> shmem:894156kB slab_reclaimable:18288kB slab_unreclaimable:29808kB
> kernel_stack:2240kB pagetables:38776kB unstable:0kB bounce:0kB
> writeback_tmp:0kB pages_scanned:348 all_unreclaimable? no
> Oct 20 15:41:22 [kernel] [  240.326863] lowmem_reserve[]: 0 0 0 0
> Oct 20 15:41:22 [kernel] [  240.326866] DMA: 23*4kB 12*8kB 13*16kB 6*32kB
> 2*64kB 1*128kB 2*256kB 3*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
> Oct 20 15:41:22 [kernel] [  240.326873] DMA32: 91*4kB 96*8kB 26*16kB 7*32kB
> 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 5676kB
> Oct 20 15:41:22 [kernel] [  240.326880] 278613 total pagecache pages
> Oct 20 15:41:22 [kernel] [  240.326882] 0 pages in swap cache
> Oct 20 15:41:22 [kernel] [  240.326883] Swap cache stats: add 0, delete 0,
> find
> 0/0
> Oct 20 15:41:22 [kernel] [  240.326885] Free swap  = 0kB
> Oct 20 15:41:22 [kernel] [  240.326886] Total swap = 0kB
> Oct 20 15:41:22 [kernel] [  240.335655] 521820 pages RAM
> Oct 20 15:41:22 [kernel] [  240.335657] 10340 pages reserved
> Oct 20 15:41:22 [kernel] [  240.335658] 357434 pages shared
> Oct 20 15:41:22 [kernel] [  240.335659] 406456 pages non-shared
> Oct 20 15:41:22 [kernel] [  240.335663] Out of memory: kill process 6437
> (kdeinit4) score 2628013 or a child
> Oct 20 15:41:22 [kernel] [  240.335666] Killed process 6438 (klauncher)
> Oct 20 15:41:57 [kernel] [  273.566020] okular invoked oom-killer:
> gfp_mask=0x201da, order=0, oom_adj=0
> Oct 20 15:42:10 [kernel] [  273.566028] okular cpuset=/ mems_allowed=0
> Oct 20 15:42:43 [kernel] [  273.566033] Pid: 6640, comm: okular Not tainted
> 2.6.32-rc5 #1
> [REBOOT]
> 
> I'm not sure it's still related to iwlagn.
> 
> Are 2 GB of memory not enough ? (I don't have a swap partition).
Comment 6 Detlev Casanova 2009-10-20 19:58:42 UTC
Well, I had the problem with swap activated, I deactivated it because I thought  that it was the problem, obviously not.
Comment 7 John W. Linville 2009-10-26 19:33:26 UTC

*** This bug has been marked as a duplicate of bug 14141 ***