Bug 51291
Summary: | [-next-20121129] GP fault in mm/slub:get_freepointer() | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Peter Hurley (peter) |
Component: | Slab Allocator | Assignee: | Pekka Enberg (penberg) |
Status: | NEW --- | ||
Severity: | normal | ||
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | -next-20121129 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
kernel log of gp fault
listing file of mm/slub.c hand-annotated mm/slub.lst kernel log of 2nd event triggering BUG in mm/slub.c |
Created attachment 88421 [details]
listing file of mm/slub.c
hm, I can't spot it. Reassigned to Pekka... Dec 4 09:38:25 thor kernel: [322576.747744] general protection fault: 0000 [#1] PREEMPT SMP Dec 4 09:38:25 thor kernel: [322576.747769] Modules linked in: netconsole configfs bnep rfcomm ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc binfmt_misc snd_ctxfi snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event i5400_edac snd_seq snd_timer snd_seq_device coretemp edac_core kvm_intel psmouse btusb snd joydev hid_logitech_dj mac_hid i5k_amb firewire_serial(C) bluetooth kvm gpio_ich soundcore hid_generic serio_raw snd_page_alloc lpc_ich shpchp ppdev dcdbas microcode parport_pc ext2 lp parport dm_mirror dm_region_hash dm_log usbhid hid usb_storage tg3 firewire_ohci firewire_core crc_itu_t nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video wmi Dec 4 09:38:25 thor kernel: [322576.748955] CPU 7 Dec 4 09:38:25 thor kernel: [322576.748960] Pid: 1426, comm: Xorg Tainted: G WC 3.7.0-next-20121129-ge83124f-xeon #ge83124f Dell Inc. Precision WorkStation T5400 /0RW203 Dec 4 09:38:25 thor kernel: [322576.749899] RIP: 0010:[<ffffffff811885ec>] [<ffffffff811885ec>] __kmalloc_node_track_caller+0x15c/0x1f0 Dec 4 09:38:25 thor kernel: [322576.750383] RSP: 0018:ffff8802a0f7fa08 EFLAGS: 00010246 Dec 4 09:38:25 thor kernel: [322576.750864] RAX: 0000000000000000 RBX: 00000000000004d0 RCX: 000000000b30ddef Dec 4 09:38:25 thor kernel: [322576.751354] RDX: 000000000b30dde7 RSI: 0000000000000000 RDI: 0000000000016dc0 Dec 4 09:38:25 thor kernel: [322576.751683] RBP: ffff8802a0f7fa68 R08: ffff8802a0f7fae7 R09: 0000000000000001 Dec 4 09:38:25 thor kernel: [322576.751683] R10: 0000000000000000 R11: 0000000000003246 R12: 040000e86c5a001c Dec 4 09:38:25 thor kernel: [322576.751683] R13: ffff8802a5002a00 R14: 00000000ffffffff R15: 0000000000000200 Dec 4 09:38:25 thor kernel: [322576.751683] FS: 00007f6fd67138a0(0000) GS:ffff8802afdc0000(0000) knlGS:0000000000000000 Dec 4 09:38:25 thor kernel: [322576.751683] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Dec 4 09:38:25 thor kernel: [322576.751683] CR2: 00007f7541ba2000 CR3: 00000002a08ff000 CR4: 00000000000027f0 Dec 4 09:38:25 thor kernel: [322576.751683] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 4 09:38:25 thor kernel: [322576.751683] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 4 09:38:25 thor kernel: [322576.751683] Process Xorg (pid: 1426, threadinfo ffff8802a0f7e000, task ffff88029ebf2e20) Dec 4 09:38:25 thor kernel: [322576.751683] Stack: Dec 4 09:38:25 thor kernel: [322576.751683] ffff8802a0f7fa28 ffffffff816f7fcc 0000000000000082 ffff88028382c530 Dec 4 09:38:25 thor kernel: [322576.751683] ffffffff815d64eb 000106d0810912dc ffff8802a0f7fa88 00000000000004d0 Dec 4 09:38:25 thor kernel: [322576.751683] 00000000ffffffff 0000000000000200 ffff8802a0f7fae7 0000000000000200 Dec 4 09:38:25 thor kernel: [322576.751683] Call Trace: Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff816f7fcc>] ? _raw_spin_unlock_irqrestore+0x2c/0x50 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815d64eb>] ? __alloc_skb+0x8b/0x2a0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815d598b>] __kmalloc_reserve+0x3b/0xa0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815d64eb>] __alloc_skb+0x8b/0x2a0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815ce90a>] sock_alloc_send_pskb+0x11a/0x2a0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815ceaa5>] sock_alloc_send_skb+0x15/0x20 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff816767f0>] unix_stream_sendmsg+0x2f0/0x430 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815c9f72>] do_sock_write.isra.16+0xc2/0xe0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815c9f90>] ? do_sock_write.isra.16+0xe0/0xe0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff815c9ff3>] sock_aio_write+0x63/0x90 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff811a6520>] ? __pollwait+0xf0/0xf0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff81194333>] do_sync_readv_writev+0xa3/0xe0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff811945fd>] do_readv_writev+0xcd/0x1d0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff8119473c>] vfs_writev+0x3c/0x50 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff811948b2>] sys_writev+0x52/0xc0 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff816f92c5>] ? do_device_not_available+0x15/0x20 Dec 4 09:38:25 thor kernel: [322576.751683] [<ffffffff81700019>] system_call_fastpath+0x16/0x1b Dec 4 09:38:25 thor kernel: [322576.751683] Code: ff ff 01 48 8b 80 38 e0 ff ff a8 08 0f 84 6a ff ff ff e8 48 e5 56 00 e9 60 ff ff ff 0f 1f 00 49 63 45 20 48 8d 4a 08 49 8b 7d 00 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 d2 fe Dec 4 09:38:25 thor kernel: [322576.751683] RIP [<ffffffff811885ec>] __kmalloc_node_track_caller+0x15c/0x1f0 Dec 4 09:38:25 thor kernel: [322576.751683] RSP <ffff8802a0f7fa08> Well, this fault comes from a free list corruption. I've attached a hand-annotated extract of the relevant code in mm/slub.lst My premise now is that some OOM error path does a write-after-free. Back to this-could-be-anywhere-in-kernel bug... although modules caught on the other cpus go to the head of the suspect list. Created attachment 88511 [details]
hand-annotated mm/slub.lst
Please reassign to Drivers/DRI(non-Intel) A similar event triggered a BUG in mm/slub.c with the same stack trace into nouveau. Attached is the kernel log from this new event for comparison. For cross-reference, this has been reported here https://bugs.freedesktop.org/show_bug.cgi?id=58087 Created attachment 88831 [details]
kernel log of 2nd event triggering BUG in mm/slub.c
|
Created attachment 88411 [details] kernel log of gp fault Page allocator failure produced general protection fault in SLUB version of __kmalloc_node_track_caller(). Attached is the log and mm/slub.lst for the kernel I was running at the time. peter@thor:~/src/kernels/next$ grep SLUB .config CONFIG_SLUB_DEBUG=y CONFIG_SLUB=y # CONFIG_SLUB_DEBUG_ON is not set # CONFIG_SLUB_STATS is not set Not sure I got my analysis correct but it looks like r12 has some status bits set. static inline void *get_freepointer(struct kmem_cache *s, void *object) { return *(void **)(object + s->offset); 6e6c: 49 8b 1c 04 mov (%r12,%rax,1),%rbx <======= faulting instruction * 3. If they were not changed replace tid and freelist * * Since this is without lock semantics the protection is only against * code executing on this cpu *not* from access by other cpus. */ if (unlikely(!this_cpu_cmpxchg_double( 6e70: 4c 89 e0 mov %r12,%rax 6e73: 48 8d 37 lea (%rdi),%rsi 6e76: e8 00 00 00 00 callq 6e7b <__kmalloc_node_track_caller+0x16b> 6e77: R_X86_64_PC32 this_cpu_cmpxchg16b_emu-0x4 6e7b: 84 c0 test %al,%al 6e7d: 0f 84 d2 fe ff ff je 6d55 <__kmalloc_node_track_caller+0x45> return *(void **)(object + s->offset); }