When trying to use an old trick for finding lost data by grep'ing /proc/kcore, I managed to oops my server's kernel. I tried again on my desktop with cat /proc/kcore >/dev/null. cat was killed, and a similar oops appeared in my dmesg, which I managed to capture:

Jul 26 23:04:13 mike kernel: BUG: unable to handle kernel paging request at e07cf000
Jul 26 23:04:13 mike kernel: IP: [<c0224dd1>] read_kcore+0x2c1/0x4b0
Jul 26 23:04:13 mike kernel: *pde = 1b5f4067 *pte = 00000000
Jul 26 23:04:13 mike kernel: Oops: 0000 [#2] PREEMPT SMP
Jul 26 23:04:13 mike kernel: last sysfs file: /sys/power/state
Jul 26 23:04:13 mike kernel: Modules linked in: ipv6 sg sd_mod fuse usb_storage usbhid hid snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ppdev snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer ohci_hcd parport_pc lp parport snd soundcore snd_page_alloc nvidia(P) agpgart k8temp ehci_hcd forcedeth i2c_nforce2 i2c_core usbcore evdev thermal processor fan button battery ac rtc_cmos rtc_core rtc_lib ext3 jbd mbcache ide_gd_mod ide_cd_mod cdrom sata_nv libata amd74xx ide_pci_generic ide_core scsi_mod
Jul 26 23:04:13 mike kernel:
Jul 26 23:04:13 mike kernel: Pid: 4835, comm: cat Tainted: P D (2.6.30-ARCH #1) W3107
Jul 26 23:04:13 mike kernel: EIP: 0060:[<c0224dd1>] EFLAGS: 00210286 CPU: 0
Jul 26 23:04:13 mike kernel: EIP is at read_kcore+0x2c1/0x4b0
Jul 26 23:04:13 mike kernel: EAX: ddb71ac0 EBX: 00001000 ECX: 00000400 EDX: e07d0000
Jul 26 23:04:13 mike kernel: ESI: e07cf000 EDI: da20e000 EBP: d73fbf30 ESP: d73fbefc
Jul 26 23:04:13 mike kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 26 23:04:13 mike kernel: Process cat (pid: 4835, ti=d73fa000 task=d4b5cc00 task.ti=d73fa000)
Jul 26 23:04:13 mike kernel: Stack:
Jul 26 23:04:13 mike kernel: da20e000 e07cf000 d73fbf90 00000000 09459000 00000000 00008000 00001000
Jul 26 23:04:13 mike kernel: 00000000 6798ed89 ddaf0000 ce920c80 c0224b10 fffffffb c0219e89 d73fbf90
Jul 26 23:04:13 mike kernel: 09459000 00008000 d73fbf90 6798ed89 ce920c80 00008000 09459000 d73fbf80
Jul 26 23:04:13 mike kernel: Call Trace:
Jul 26 23:04:13 mike kernel: [<c0224b10>] ? read_kcore+0x0/0x4b0
Jul 26 23:04:13 mike kernel: [<c0219e89>] ? proc_reg_read+0x79/0xc0
Jul 26 23:04:13 mike kernel: [<c01d1b43>] ? vfs_read+0xc3/0x1a0
Jul 26 23:04:13 mike kernel: [<c0219e10>] ? proc_reg_read+0x0/0xc0
Jul 26 23:04:13 mike kernel: [<c01d1d28>] ? sys_read+0x58/0xb0
Jul 26 23:04:13 mike kernel: [<c0103c73>] ? sysenter_do_call+0x12/0x28
Jul 26 23:04:13 mike kernel: Code: 89 fb 0f 43 f2 89 ca 29 f2 29 f3 39 f9 0f 46 da 29 5c 24 14 f6 40 0c 01 8d 14 33 75 19 89 d9 89 f7 c1 e9 02 2b 7c 24 04 03 3c 24 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 4c 24 14 8b 00 85 c9 74 0a
Jul 26 23:04:13 mike kernel: EIP: [<c0224dd1>] read_kcore+0x2c1/0x4b0 SS:ESP 0068:d73fbefc
Jul 26 23:04:13 mike kernel: CR2: 00000000e07cf000
Jul 26 23:04:13 mike kernel: ---[ end trace 3bb140bf57c1987e ]---
Jul 26 23:04:13 mike kernel: note: cat[4835] exited with preempt_count 1

I understand it's quite a ridiculous thing to do, but userspace shouldn't be able to cause kernel errors, no matter what kind of insane things I try.
(switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface).

On Mon, 27 Jul 2009 03:19:11 GMT bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=13850
>
>            Summary: reading /proc/kcore causes oops
>            Product: Other
>            Version: 2.5
>     Kernel Version: 2.6.30
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: other_other@kernel-bugs.osdl.org
>         ReportedBy: scgtrp@gmail.com
>         Regression: No
>
> When trying to use an old trick for finding lost data by grep'ing /proc/kcore,
> I managed to oops my server's kernel. I tried again on my desktop with cat
> /proc/kcore >/dev/null. cat was killed, and a similar oops appeared in my dmesg
> which I managed to capture:
>
> Jul 26 23:04:13 mike kernel: BUG: unable to handle kernel paging request at e07cf000
> Jul 26 23:04:13 mike kernel: IP: [<c0224dd1>] read_kcore+0x2c1/0x4b0
> Jul 26 23:04:13 mike kernel: *pde = 1b5f4067 *pte = 00000000
> Jul 26 23:04:13 mike kernel: Oops: 0000 [#2] PREEMPT SMP
> [...]
> Jul 26 23:04:13 mike kernel: Code: 89 fb 0f 43 f2 89 ca 29 f2 29 f3 39 f9 0f 46
> da 29 5c 24 14 f6 40 0c 01 8d 14 33 75 19 89 d9 89 f7 c1 e9 02 2b 7c 24 04 03
> 3c 24 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 4c 24 14 8b 00 85 c9 74 0a
> Jul 26 23:04:13 mike kernel: EIP: [<c0224dd1>] read_kcore+0x2c1/0x4b0 SS:ESP 0068:d73fbefc
> Jul 26 23:04:13 mike kernel: CR2: 00000000e07cf000
> [...]
>
> I understand it's quite a ridiculous thing to do, but userspace shouldn't be
> able to cause kernel errors, no matter what kind of insane things I try.

gee, read_kcore() is huge. This makes it pretty hard to work out where exactly the kernel died.

Is it reproducible, or do you still have the vmlinux from the above oops on-disk?

If so, can you please help work out where it crashed? You could run something like

	addr2line -e vmlinux 0xc0224dd1

or

	gdb vmlinux
	(gdb) l *0xc0224dd1

Both of these will need CONFIG_DEBUG_INFO=y.

It is possible to work out where the kernel crashed using the above Code: line, but it's a bit of a pain.

Thanks.
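Even without debug info, a little arithmetic on the oops itself narrows things down. The sketch below is only an illustration (the helper name parse_ip_line is made up here): it recovers the start address of read_kcore from the IP line and checks the registers against the instruction marked in the Code: line. The bytes <f3> a5 are the x86 encoding of rep movsl, a dword copy that reads from ESI and writes to EDI with ECX as the count.

```python
import re

# Values copied from the oops in the original report.
IP_LINE = "IP: [<c0224dd1>] read_kcore+0x2c1/0x4b0"
REGS = {"EBX": 0x00001000, "ECX": 0x00000400,
        "ESI": 0xE07CF000, "EDI": 0xDA20E000}
CR2 = 0xE07CF000  # faulting address reported by the CPU


def parse_ip_line(line):
    """Split '[<addr>] symbol+0xoff/0xsize' into (addr, symbol, off, size)."""
    m = re.search(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", line)
    if m is None:
        raise ValueError("not an oops IP line: %r" % line)
    addr, symbol, offset, size = m.groups()
    return int(addr, 16), symbol, int(offset, 16), int(size, 16)


addr, symbol, offset, size = parse_ip_line(IP_LINE)
func_start = addr - offset  # should match the read_kcore+0x0 call-trace entry
print("%s starts at %#x; fault is %#x bytes into a %#x-byte function"
      % (symbol, func_start, offset, size))

# rep movsl copies ECX dwords from [ESI] to [EDI].
bytes_left = REGS["ECX"] * 4
print("copy length still pending: %#x bytes" % bytes_left)
print("source pointer equals CR2:", REGS["ESI"] == CR2)
```

The computed start address agrees with the read_kcore+0x0 entry in the call trace, the pending copy length equals EBX (one 4 KiB page), and ESI equals CR2. Read together, these suggest the copy faulted on the very first read of the source page, before any data was moved.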
On Tue, 28 Jul 2009 16:05:27 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Mon, 27 Jul 2009 03:19:11 GMT
> bugzilla-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=13850
> >
> >            Summary: reading /proc/kcore causes oops
> > [...]
> > Jul 26 23:04:13 mike kernel: BUG: unable to handle kernel paging request at e07cf000
> > Jul 26 23:04:13 mike kernel: IP: [<c0224dd1>] read_kcore+0x2c1/0x4b0
> > [...]
>
> gee, read_kcore() is huge. This makes it pretty hard to work out where
> exactly the kernel died.
>
> Is it reproducible, or do you still have the vmlinux from the above
> oops on-disk?
>
> If so, can you please help work out where it crashed? You could run
> something like
>
> 	addr2line -e vmlinux 0xc0224dd1
>
> or
>
> 	gdb vmlinux
> 	(gdb) l *0xc0224dd1
>

Yes, a disassembly will be helpful. If you compiled the kernel yourself,

	# objdump -d fs/proc/kcore.o

will also help us.

Hmm, but this message is curious.

	unable to handle kernel paging request at e07cf000

What memory layout does your server have? Could you show the output of

	# grep "System RAM" /proc/iomem

or the head of dmesg?

IIUC, the current code doesn't assume any memory hole in the direct-map area. (And my new patch series should handle it even in the CONFIG_HIGHMEM case.)

Thanks,
-Kame
> What memory layout does your server have?

The log I gave was from my desktop, so I'll assume you wanted that instead of the server:

[mike: mike in ~]$ grep "System RAM" /proc/iomem
00010000-0009efff : System RAM
00100000-1dedffff : System RAM

> If so, can you please help work out where it crashed?

I didn't compile the kernel myself; it's the stock Arch Linux kernel, which may explain the following useless output (this was the only file I could find called vmlinux):

[mike: mike in /usr/src/linux-2.6.30-ARCH]$ addr2line -e vmlinux 0xc0224dd1
kcore.c:0
[mike: mike in /usr/src/linux-2.6.30-ARCH]$ gdb vmlinux
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
(gdb) l *0xc0224dd1
No symbol table is loaded. Use the "file" command.

I'll compile a minimal kernel later with debug symbols and see if I can reproduce it on that.
On Tue, 28 Jul 2009 22:46:56 -0400 Mike Smith <scgtrp@gmail.com> wrote:

> > What memory layout does your server have?
> The log I gave was from my desktop, so I'll assume you wanted that
> instead of the server:
> [mike: mike in ~]$ grep "System RAM" /proc/iomem
> 00010000-0009efff : System RAM
> 00100000-1dedffff : System RAM
>

From this, your kernel's valid direct-map address ranges will be

	c0010000-c009efff
	c0100000-ddedffff

And,
==
unable to handle kernel paging request at e07cf000
==
e07cf000 doesn't exist in the direct map. It seems to be in the vmalloc() area.

Looking into mm/vmalloc.c, this area is unmapped under
	- purge_lock
but /proc/kcore accesses it under vmlist_lock only. No guards at all.

This is _a_ problem. But it seems the race is not easily reproducible.

I'll think more, but is it guaranteed that a vmalloc area (struct vm_struct) linked to vmlist always has valid pages? Considering get_vm_area(), I think it is not.

I think fs/proc/kcore.c's vmalloc area access needs some fix. Let me try.

Thanks,
-Kame
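The direct-map ranges above can be re-derived mechanically from the /proc/iomem output. The following is a small sketch, not kernel code: it assumes PAGE_OFFSET is 0xC0000000 (the usual 3G/1G split on x86-32, and the value the quoted ranges imply) and that all of this RAM is lowmem. The function names are made up for illustration.

```python
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on x86-32

# "System RAM" lines as posted in the thread.
IOMEM = """\
00010000-0009efff : System RAM
00100000-1dedffff : System RAM
"""


def direct_map_ranges(iomem_text, page_offset=PAGE_OFFSET):
    """Translate physical System RAM ranges to direct-map virtual ranges."""
    ranges = []
    for line in iomem_text.splitlines():
        span, _, label = line.partition(" : ")
        if label.strip() != "System RAM":
            continue
        start, end = (int(x, 16) for x in span.strip().split("-"))
        ranges.append((start + page_offset, end + page_offset))
    return ranges


def in_direct_map(addr, ranges):
    """True if addr falls inside any (inclusive) direct-map range."""
    return any(lo <= addr <= hi for lo, hi in ranges)


ranges = direct_map_ranges(IOMEM)
for lo, hi in ranges:
    print("%08x-%08x" % (lo, hi))

fault_addr = 0xE07CF000  # from "unable to handle kernel paging request"
print("fault address in direct map:", in_direct_map(fault_addr, ranges))
```

The faulting address lies above the top of the second range (ddedffff), which is consistent with it belonging to the vmalloc area rather than the direct map.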
On Wed, 29 Jul 2009 12:32:09 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> On Tue, 28 Jul 2009 22:46:56 -0400
> Mike Smith <scgtrp@gmail.com> wrote:
>
> > > What memory layout does your server have?
> > The log I gave was from my desktop, so I'll assume you wanted that
> > instead of the server:
> > [mike: mike in ~]$ grep "System RAM" /proc/iomem
> > 00010000-0009efff : System RAM
> > 00100000-1dedffff : System RAM
> >
> From this, your kernel's valid direct-map address ranges will be
>
> 	c0010000-c009efff
> 	c0100000-ddedffff
>

Ok, I reproduced the bug on x86-32 and here is a fix.

I reproduced the bug on an x86-32 host with the mem=512M boot option.

As I expected, the bug comes from holes in the vmalloc area. (Here a hole means a memory hole within [start ... start+size-PAGE_SIZE) of a valid vm_struct.)

For review, I divided the fix into 3 patches. The whole series will be sent as replies to this email.

Thanks,
-Kame
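To make the "hole" idea concrete, here is a toy user-space model. It is not the patch from this thread: the function name, the set-based page table, and the choice to return zeros for a hole are all assumptions made for illustration, and the addresses are a hypothetical layout built around the faulting address. It walks the interval [start, start+size-PAGE_SIZE) from the message above page by page and never touches a page that has no mapping.

```python
PAGE_SIZE = 4096


def read_area(start, size, mapped_pages, read_page):
    """Copy [start, start+size-PAGE_SIZE) page by page.

    mapped_pages: set of page-aligned addresses that have a backing page.
    read_page:    callable returning PAGE_SIZE bytes for a mapped address.
    Unmapped pages (holes) are reported as zeros instead of being read.
    """
    out = bytearray()
    end = start + size - PAGE_SIZE  # interval bound quoted in the thread
    for addr in range(start, end, PAGE_SIZE):
        if addr in mapped_pages:
            out += read_page(addr)
        else:
            out += bytes(PAGE_SIZE)  # hole: zero-fill, do not dereference
    return bytes(out)


# Hypothetical area: three data pages plus one trailing page, with a hole
# at the address that faulted in the oops.
start = 0xE07CE000
size = 4 * PAGE_SIZE
mapped = {0xE07CE000, 0xE07D0000}  # 0xE07CF000 is deliberately missing
data = read_area(start, size, mapped, lambda addr: b"\xaa" * PAGE_SIZE)
print(len(data) // PAGE_SIZE, "pages returned")
```

A reader that instead assumes every page in the interval is mapped would dereference 0xE07CF000 in this layout, which is the shape of failure the oops shows.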
On Fri, 31 Jul 2009 16:07:48 +0900 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:

> Ok, I reproduced the bug on x86-32 and here is a fix.
>
> I reproduced the bug on x86-32 host with mem=512M boot option.
>
> As I expected, the bug is from holes in vmalloc area.
> [...]
> For review, I divided all into 3 patches. all series will be reply to this
> email.
>

A memo.

[1/3] also fixes the bug in /dev/kmem. Without it, the following causes a machine check on my host (and one of the CPUs was disabled):

	# dd if=/dev/kmem of=/dev/null bs=1024 count=1048576 skip=3145728

I hope all users of kmem are sane people... but the patch fixes this. (If kmem has to access the IOREMAP area, please let me know.)

[2/3] is the fix for this /proc/kcore bug, based on [1/3].

[3/3] is a fix for vread/vwrite race conditions. It is not related to this reproducible bug itself, but fixes a potential bug.

Thanks,
-Kame
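For readers wondering which addresses that dd invocation touches, the arithmetic is short. This is just a worked calculation from the bs, skip and count values in the command; the variable names are made up.

```python
BS = 1024          # block size in bytes
SKIP = 3145728     # blocks skipped before reading starts
COUNT = 1048576    # blocks read

start = SKIP * BS      # first byte offset read from /dev/kmem
length = COUNT * BS    # number of bytes requested
end = start + length   # one past the last byte requested

print("start  = %#x" % start)
print("length = %#x (%d GiB)" % (length, length >> 30))
print("end    = %#x" % end)
```

The read starts at offset 0xc0000000 and asks for 1 GiB, so on x86-32 with the usual 3G/1G split it sweeps the whole kernel virtual address range, vmalloc and ioremap regions included.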
On Tue, 28 Jul 2009 22:46:56 -0400 Mike Smith <scgtrp@gmail.com> wrote:

> > What memory layout does your server have?
> The log I gave was from my desktop, so I'll assume you wanted that
> instead of the server:
> [mike: mike in ~]$ grep "System RAM" /proc/iomem
> 00010000-0009efff : System RAM
> 00100000-1dedffff : System RAM
>

This is v3. Updated against 2.6.31-rc5, with several fixes in addition to type fixes. The patches are reordered again, but there are no changes in concept.

Thanks,
-Kame