Bug 15680

Summary: kswapd NULL pointer dereference
Product: Memory Management
Reporter: Steinar H. Gunderson (steinar+kernel)
Component: Page Allocator
Assignee: Andrew Morton (akpm)
Status: RESOLVED UNREPRODUCIBLE
Severity: normal
CC: alan, shurik.morozov
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.34-rc2
Subsystem:
Regression: No
Bisected commit-id:

Description Steinar H. Gunderson 2010-04-02 13:01:00 UTC
Hi,

My server, running 2.6.34-rc2 for the occasion, suddenly came under a _ton_ more load than it usually sees (including a few VLC processes that gobble up several hundred GB of address space due to some bug, which might be related), and then gave me:

[584163.116507] BUG: unable to handle kernel NULL pointer dereference at (null)
[584163.117456] IP: [<ffffffff810af4ea>] page_referenced+0xef/0x1d5
[584163.117456] PGD 0 
[584163.117456] Oops: 0000 [#1] SMP 
[584163.117456] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
[584163.117456] CPU 0 
[584163.117456] Modules linked in: ipt_REJECT iptable_filter ip_tables af_packet tun ext2 ext4 jbd2 crc16 coretemp w83627ehf hwmon_vid psmouse ide_generic ide_gd_mod ide_cd_mod cdrom forcedeth i2c_i801 pcspkr i2c_core rtc_cmos rtc_core rtc_lib evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod usbhid raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 md_mod ide_pci_generic ide_core e1000e uhci_hcd ehci_hcd sd_mod unix [last unloaded: scsi_wait_scan]
[584163.117456] 
[584163.117456] Pid: 320, comm: kswapd0 Not tainted 2.6.34-rc2 #2 C2SBC-Q/C2SBC-Q
[584163.117456] RIP: 0010:[<ffffffff810af4ea>]  [<ffffffff810af4ea>] page_referenced+0xef/0x1d5
[584163.117456] RSP: 0018:ffff88023fe6dc20  EFLAGS: 00010206
[584163.117456] RAX: ffff880169111fc8 RBX: ffffffffffffffe0 RCX: ffff8801f3fa6080
[584163.117456] RDX: ffff880169111fc1 RSI: 0000000000000000 RDI: ffff880169111fc0
[584163.117456] RBP: ffff88023fe6dca0 R08: 0000000000000020 R09: ffff880215c390c0
[584163.117456] R10: ffffffff814afee8 R11: ffffffff814afde8 R12: ffffea0004979968
[584163.117456] R13: 0000000000000000 R14: ffff880169111fc0 R15: ffff88023fe6dd40
[584163.117456] FS:  0000000000000000(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
[584163.117456] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[584163.117456] CR2: 0000000000000000 CR3: 00000000014ee000 CR4: 00000000000006f0
[584163.117456] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[584163.117456] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[584163.117456] Process kswapd0 (pid: 320, threadinfo ffff88023fe6c000, task ffff88023fe34380)
[584163.117456] Stack:
[584163.117456]  0000000000000001 ffff880169111fc8 ffff88023402c240 0000000000002000
[584163.117456] <0> 0000000000000000 0000000000000000 ffff88023a940538 00000000000011e2
[584163.117456] <0> 00000000000011e2 00000014fffffffe ffff88023fe6dca0 ffffea0004979968
[584163.117456] Call Trace:
[584163.117456]  [<ffffffff8109a511>] shrink_active_list+0x1be/0x289
[584163.117456]  [<ffffffff81306087>] ? schedule+0x7a0/0x867
[584163.117456]  [<ffffffff8109bbfc>] kswapd+0x41d/0x865
[584163.117456]  [<ffffffff8109972e>] ? isolate_pages_global+0x0/0x1f2
[584163.117456]  [<ffffffff8104e6ae>] ? autoremove_wake_function+0x0/0x38
[584163.117456]  [<ffffffff8109b7df>] ? kswapd+0x0/0x865
[584163.117456]  [<ffffffff8104e252>] kthread+0x7d/0x85
[584163.117456]  [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10
[584163.117456]  [<ffffffff8104e1d5>] ? kthread+0x0/0x85
[584163.117456]  [<ffffffff81002cd0>] ? kernel_thread_helper+0x0/0x10
[584163.117456] Code: 3b 56 10 73 1e 48 83 fa f2 74 18 4d 89 f8 48 8d 4d cc 4c 89 e7 e8 44 f2 ff ff 41 01 c5 83 7d cc 00 74 19 48 8b 43 20 48 8d 58 e0 <48> 8b 43 20 0f 18 08 48 8d 43 20 48 39 45 88 75 a7 41 fe 06 e9 
[584163.117456] RIP  [<ffffffff810af4ea>] page_referenced+0xef/0x1d5
[584163.117456]  RSP <ffff88023fe6dc20>
[584163.117456] CR2: 0000000000000000
[584163.418575] ---[ end trace 689f7702fb2ed439 ]---

I haven't seen it before, and it has happened only once so far.

Comment 1 Alex Montana 2011-03-03 11:16:28 UTC
Hi,

I'm seeing almost the same issue with a 2.6.35.7 kernel.
The machine simply died with no warning and under no load; after the reboot I found this:

//=====================================================
Mar  3 03:49:30  kernel: BUG: unable to handle kernel NULL pointer dereference at (nil)
Mar  3 03:49:30  kernel: IP: [<ffffffff811939ac>] 
Mar  3 03:49:30  kernel: PGD 205131067 PUD 141261067 PMD 0 
Mar  3 03:49:30  kernel: Oops: 0000 [#1] SMP 
Mar  3 03:49:30  kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Mar  3 03:49:30  kernel: CPU 0 
Mar  3 03:49:30  kernel: Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_recent xt_owner xt_conntrack iptable_mangle ipt_REJECT ipt_LOG xt_limit xt_multiport xt_state iptable_filter ip_tables ipv6 dm_mirror dm_multipath dm_region_hash dm_log shpchp ohci_hcd
Mar  3 03:49:30  kernel: 
Mar  3 03:49:30  kernel: Pid: 500, comm: kswapd0 Not tainted 2.6.35.7-grsec #1 X8DTL/X8DTL
Mar  3 03:49:30  kernel: RIP: 0010:[<ffffffff811939ac>]  [<ffffffff811939ac>] 
Mar  3 03:49:30  kernel: RSP: 0018:ffff88023e863d30  EFLAGS: 00010297
Mar  3 03:49:30  kernel: RAX: ffff880074775670 RBX: ffffffffffffff18 RCX: ffff880074775660
Mar  3 03:49:30  kernel: RDX: ffff880074775670 RSI: ffff880074775658 RDI: ffff88007477579c
Mar  3 03:49:30  kernel: RBP: ffff88023e863d70 R08: 0000000000000000 R09: 0000000000000000
Mar  3 03:49:30  kernel: R10: 0000000000000000 R11: 00000000ffffff02 R12: ffff880074775670
Mar  3 03:49:30  kernel: R13: ffff8800747756f0 R14: ffff880074775660 R15: 0000000000000012
Mar  3 03:49:30  kernel: FS:  0000000000000000(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
Mar  3 03:49:30  kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar  3 03:49:30  kernel: CR2: 0000000000000000 CR3: 0000000001841000 CR4: 00000000000006f0
Mar  3 03:49:30  kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  3 03:49:30  kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar  3 03:49:30  kernel: Process kswapd0 (pid: 500, threadinfo ffff88023e862000, task ffff88023f459900)
Mar  3 03:49:30  kernel: Stack:
Mar  3 03:49:30  kernel:  ffff8801d805e298 ffff8801d805e298 ffff88023e863d40 0000000000004bc8
Mar  3 03:49:30  kernel: <0> ffffffff818dffe0 0000000000000081 00000000000000d0 0000000000000000
Mar  3 03:49:30  kernel: <0> ffff88023e863dc0 ffffffff8109bcf5 ffff88023e863dc0 000000000018ecb6
Mar  3 03:49:30  kernel: Call Trace:
Mar  3 03:49:30  kernel:  [<ffffffff8109bcf5>] 
Mar  3 03:49:30  kernel:  [<ffffffff8109c780>] 
Mar  3 03:49:30  kernel:  [<ffffffff81055dc0>] ? 
Mar  3 03:49:30  kernel:  [<ffffffff8109c3b0>] ? 
Mar  3 03:49:30  kernel:  [<ffffffff810559ee>] 
Mar  3 03:49:30  kernel:  [<ffffffff81003b94>] 
Mar  3 03:49:30  kernel:  [<ffffffff81492911>] ? 
Mar  3 03:49:30  kernel:  [<ffffffff81055960>] ? 
Mar  3 03:49:30  kernel:  [<ffffffff81003b90>] ? 
Mar  3 03:49:30  kernel: Code: 89 10 4d 89 64 24 08 4c 89 a3 e8 00 00 00 f0 80 a3 90 00 00 00 fb 41 fe 85 ac 00 00 00 48 8b 9b e8 00 00 00 48 81 eb e8 00 00 00 <48> 8b 83 e8 00 00 00 4c 8d a3 e8 00 00 00 0f 18 08 49 81 fc d0 
Mar  3 03:49:30  kernel: RIP  [<ffffffff811939ac>] 
Mar  3 03:49:30  kernel:  RSP <ffff88023e863d30>
Mar  3 03:49:30  kernel: CR2: 0000000000000000
Mar  3 03:49:30  kernel: ---[ end trace 09c41ba9fa71fb72 ]---
Mar  3 03:49:31  kernel: swap_free: Bad swap offset entry 00800000
Mar  3 03:49:31  kernel: BUG: Bad page map in process httpd  pte:100000000 pmd:74775067
Mar  3 03:49:31 cx96 kernel: addr:0000000002af3000 vm_flags:00100073 anon_vma:ffff88006e09cc00 mapping:(nil) index:2af3
Mar  3 03:49:31 kernel: Pid: 3970, comm: httpd Tainted: G      D     2.6.35.7-grsec #1
Mar  3 03:49:31 kernel: Call Trace:
Mar  3 03:49:31 kernel:  [<ffffffff810a492b>] 
Mar  3 03:49:31 kernel:  [<ffffffff810a5bb5>] 
Mar  3 03:49:31 kernel:  [<ffffffff810aacc2>] 
Mar  3 03:49:31 kernel:  [<ffffffff810396f6>] 
Mar  3 03:49:31 kernel:  [<ffffffff810cb516>] 
Mar  3 03:49:31 kernel:  [<ffffffff8110de18>] 
Mar  3 03:49:31 kernel:  [<ffffffff81095336>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff8102a7db>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810989e4>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810afc22>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810989e4>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810a6afe>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810a5442>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810a6e48>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff811cb915>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff81203d61>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff8110b380>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810a7184>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810ca08f>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810bdfe9>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810ca009>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810ca425>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff810ca644>] 
Mar  3 03:49:31 kernel:  [<ffffffff810cc58e>] 
Mar  3 03:49:31 kernel:  [<ffffffff81207f3a>] ? 
Mar  3 03:49:31 kernel:  [<ffffffff8100b4c9>] 
Mar  3 03:49:31 kernel:  [<ffffffff810031ea>] 
//=====================================================

It appears to happen intermittently and I can't find a cause (the error has occurred twice in the last 10 days).
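
One way to cross-check the two traces in this report is to recompute the faulting address from the register dumps. The sketch below is an editorial reading of the "Code:" lines, not something stated by either reporter: it assumes the trapping bytes `<48> 8b 43 20` and `<48> 8b 83 e8 00 00 00` decode to 8-byte loads from RBX+0x20 and RBX+0xe8 respectively, and takes the RBX and CR2 values from the dumps as posted. The helper name `effective_address` is hypothetical.

```python
MASK64 = (1 << 64) - 1  # x86-64 address arithmetic wraps modulo 2**64


def effective_address(base: int, displacement: int) -> int:
    """Address read by a `mov disp(%base), %reg` load on x86-64."""
    return (base + displacement) & MASK64


# (kernel, RBX from the register dump, displacement of the trapping load, CR2)
# The displacements are an assumed decoding of the bytes marked <..> in "Code:".
oopses = [
    ("2.6.34-rc2", 0xFFFFFFFFFFFFFFE0, 0x20, 0x0),
    ("2.6.35.7-grsec", 0xFFFFFFFFFFFFFF18, 0xE8, 0x0),
]

results = {}
for kernel, rbx, disp, cr2 in oopses:
    addr = effective_address(rbx, disp)
    results[kernel] = addr
    print(f"{kernel}: RBX+{disp:#x} = {addr:#x}, matches CR2: {addr == cr2}")
```

In both cases RBX equals minus the displacement, so the load lands exactly on address 0, which agrees with CR2. That pattern would fit a linked-list walk in which a NULL "next" pointer was converted to its containing structure before being dereferenced, but that interpretation is an inference from the bytes and is not confirmed anywhere in this report.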

Comment 2 Alan 2012-06-18 16:30:09 UTC
Alas, there is not much we can do with this data against such an old kernel, so closing.