Bug 51291

Summary: [-next-20121129] GP fault in mm/slub:get_freepointer()
Product: Memory Management Reporter: Peter Hurley (peter)
Component: Slab AllocatorAssignee: Pekka Enberg (penberg)
Status: NEW ---    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: -next-20121129 Subsystem:
Regression: No Bisected commit-id:
Attachments: kernel log of gp fault
listing file of mm/slub.c
hand-annotated mm/slub.lst
kernel log of 2nd event triggering BUG in mm/slub.c

Description Peter Hurley 2012-12-04 15:49:04 UTC
Created attachment 88411 [details]
kernel log of gp fault

Page allocator failure produced general protection fault in SLUB version of __kmalloc_node_track_caller().

Attached is the log and mm/slub.lst for the kernel I was running at the time.

peter@thor:~/src/kernels/next$ grep SLUB .config
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set

Not sure I got my analysis correct but it looks like r12 has some status bits set.

static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
	return *(void **)(object + s->offset);
    6e6c:	49 8b 1c 04          	mov    (%r12,%rax,1),%rbx           <======= faulting instruction
		 * 3. If they were not changed replace tid and freelist
		 *
		 * Since this is without lock semantics the protection is only against
		 * code executing on this cpu *not* from access by other cpus.
		 */
		if (unlikely(!this_cpu_cmpxchg_double(
    6e70:	4c 89 e0             	mov    %r12,%rax
    6e73:	48 8d 37             	lea    (%rdi),%rsi
    6e76:	e8 00 00 00 00       	callq  6e7b <__kmalloc_node_track_caller+0x16b>
			6e77: R_X86_64_PC32	this_cpu_cmpxchg16b_emu-0x4
    6e7b:	84 c0                	test   %al,%al
    6e7d:	0f 84 d2 fe ff ff    	je     6d55 <__kmalloc_node_track_caller+0x45>
	return *(void **)(object + s->offset);
}
Comment 1 Peter Hurley 2012-12-04 15:50:50 UTC
Created attachment 88421 [details]
listing file of mm/slub.c
Comment 2 Andrew Morton 2012-12-04 19:50:27 UTC
hm, I can't spot it.  Reassigned to Pekka...

Dec  4 09:38:25 thor kernel: [322576.747744] general protection fault: 0000 [#1] PREEMPT SMP 
Dec  4 09:38:25 thor kernel: [322576.747769] Modules linked in: netconsole configfs bnep rfcomm ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc binfmt_misc snd_ctxfi snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event i5400_edac snd_seq snd_timer snd_seq_device coretemp edac_core kvm_intel psmouse btusb snd joydev hid_logitech_dj mac_hid i5k_amb firewire_serial(C) bluetooth kvm gpio_ich soundcore hid_generic serio_raw snd_page_alloc lpc_ich shpchp ppdev dcdbas microcode parport_pc ext2 lp parport dm_mirror dm_region_hash dm_log usbhid hid usb_storage tg3 firewire_ohci firewire_core crc_itu_t nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi
Dec  4 09:38:25 thor kernel: [322576.748955] CPU 7 
Dec  4 09:38:25 thor kernel: [322576.748960] Pid: 1426, comm: Xorg Tainted: G        WC   3.7.0-next-20121129-ge83124f-xeon #ge83124f Dell Inc. Precision WorkStation T5400  /0RW203
Dec  4 09:38:25 thor kernel: [322576.749899] RIP: 0010:[<ffffffff811885ec>]  [<ffffffff811885ec>] __kmalloc_node_track_caller+0x15c/0x1f0
Dec  4 09:38:25 thor kernel: [322576.750383] RSP: 0018:ffff8802a0f7fa08  EFLAGS: 00010246
Dec  4 09:38:25 thor kernel: [322576.750864] RAX: 0000000000000000 RBX: 00000000000004d0 RCX: 000000000b30ddef
Dec  4 09:38:25 thor kernel: [322576.751354] RDX: 000000000b30dde7 RSI: 0000000000000000 RDI: 0000000000016dc0
Dec  4 09:38:25 thor kernel: [322576.751683] RBP: ffff8802a0f7fa68 R08: ffff8802a0f7fae7 R09: 0000000000000001
Dec  4 09:38:25 thor kernel: [322576.751683] R10: 0000000000000000 R11: 0000000000003246 R12: 040000e86c5a001c
Dec  4 09:38:25 thor kernel: [322576.751683] R13: ffff8802a5002a00 R14: 00000000ffffffff R15: 0000000000000200
Dec  4 09:38:25 thor kernel: [322576.751683] FS:  00007f6fd67138a0(0000) GS:ffff8802afdc0000(0000) knlGS:0000000000000000
Dec  4 09:38:25 thor kernel: [322576.751683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  4 09:38:25 thor kernel: [322576.751683] CR2: 00007f7541ba2000 CR3: 00000002a08ff000 CR4: 00000000000027f0
Dec  4 09:38:25 thor kernel: [322576.751683] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec  4 09:38:25 thor kernel: [322576.751683] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec  4 09:38:25 thor kernel: [322576.751683] Process Xorg (pid: 1426, threadinfo ffff8802a0f7e000, task ffff88029ebf2e20)
Dec  4 09:38:25 thor kernel: [322576.751683] Stack:
Dec  4 09:38:25 thor kernel: [322576.751683]  ffff8802a0f7fa28 ffffffff816f7fcc 0000000000000082 ffff88028382c530
Dec  4 09:38:25 thor kernel: [322576.751683]  ffffffff815d64eb 000106d0810912dc ffff8802a0f7fa88 00000000000004d0
Dec  4 09:38:25 thor kernel: [322576.751683]  00000000ffffffff 0000000000000200 ffff8802a0f7fae7 0000000000000200
Dec  4 09:38:25 thor kernel: [322576.751683] Call Trace:
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff816f7fcc>] ? _raw_spin_unlock_irqrestore+0x2c/0x50
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815d64eb>] ? __alloc_skb+0x8b/0x2a0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815d598b>] __kmalloc_reserve+0x3b/0xa0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815d64eb>] __alloc_skb+0x8b/0x2a0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815ce90a>] sock_alloc_send_pskb+0x11a/0x2a0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815ceaa5>] sock_alloc_send_skb+0x15/0x20
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff816767f0>] unix_stream_sendmsg+0x2f0/0x430
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815c9f72>] do_sock_write.isra.16+0xc2/0xe0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815c9f90>] ? do_sock_write.isra.16+0xe0/0xe0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff815c9ff3>] sock_aio_write+0x63/0x90
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff811a6520>] ? __pollwait+0xf0/0xf0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff81194333>] do_sync_readv_writev+0xa3/0xe0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff811945fd>] do_readv_writev+0xcd/0x1d0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff8119473c>] vfs_writev+0x3c/0x50
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff811948b2>] sys_writev+0x52/0xc0
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff816f92c5>] ? do_device_not_available+0x15/0x20
Dec  4 09:38:25 thor kernel: [322576.751683]  [<ffffffff81700019>] system_call_fastpath+0x16/0x1b
Dec  4 09:38:25 thor kernel: [322576.751683] Code: ff ff 01 48 8b 80 38 e0 ff ff a8 08 0f 84 6a ff ff ff e8 48 e5 56 00 e9 60 ff ff ff 0f 1f 00 49 63 45 20 48 8d 4a 08 49 8b 7d 00 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 d2 fe 
Dec  4 09:38:25 thor kernel: [322576.751683] RIP  [<ffffffff811885ec>] __kmalloc_node_track_caller+0x15c/0x1f0
Dec  4 09:38:25 thor kernel: [322576.751683]  RSP <ffff8802a0f7fa08>
Comment 3 Peter Hurley 2012-12-05 16:40:01 UTC
Well, this fault comes from a free list corruption.

I've attached a hand-annotated extract of the relevant code in mm/slub.lst

My premise now is that some OOM error path does a write-after-free. Back to this-could-be-anywhere-in-kernel bug... although modules caught on the other cpus go to the head of the suspect list.
Comment 4 Peter Hurley 2012-12-05 16:41:09 UTC
Created attachment 88511 [details]
hand-annotated mm/slub.lst
Comment 5 Peter Hurley 2012-12-10 15:04:53 UTC
Please reassign to Drivers/DRI(non-Intel)

A similar event triggered a BUG in mm/slub.c with the same stack trace into nouveau. Attached is the kernel log from this new event for comparison.

For cross-reference, this has been reported here
https://bugs.freedesktop.org/show_bug.cgi?id=58087
Comment 6 Peter Hurley 2012-12-10 15:05:40 UTC
Created attachment 88831 [details]
kernel log of 2nd event triggering BUG in mm/slub.c