When I run the cachestat script from perf-tools (https://github.com/brendangregg/perf-tools/blob/master/fs/cachestat), my kernel oops. Here's a piece of the log: Aug 7 00:06:45 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 Aug 7 00:06:45 host kernel: IP: profile_graph_entry+0x70/0x160 Aug 7 00:06:45 host kernel: PGD 1bd7ca067 Aug 7 00:06:45 host kernel: P4D 1bd7ca067 Aug 7 00:06:45 host kernel: PUD 1c835e067 Aug 7 00:06:45 host kernel: PMD 0 Aug 7 00:06:45 host kernel: Aug 7 00:06:45 host kernel: Oops: 0002 [#1] PREEMPT SMP Aug 7 00:06:45 host kernel: Modules linked in: fuse bbswitch(O) nvidia(PO) ctr ccm xt_REDIRECT nf_nat_redirect xt_comment xt_owner xt_hl xt_u32 xt_tcpudp iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media btusb btrtl arc4 intel_rapl snd_hda_codec_hdmi joydev mousedev iwlmvm snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support mac80211 snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_hda_codec kvm_intel snd_hda_core thinkpad_acpi snd_hwdep snd_pcm snd_timer iwlwifi cfg80211 rtsx_pci_ms memstick input_leds tpm_crb e1000e kvm irqbypass ptp intel_cstate intel_rapl_perf Aug 7 00:06:45 host kernel: pps_core psmouse hci_uart btbcm btqca pcspkr btintel mei_me bluetooth i2c_i801 mei nvram ecdh_generic snd intel_pch_thermal soundcore rfkill shpchp crc16 led_class wmi thermal battery tpm_tis ac tpm_tis_core i2c_hid tpm evdev intel_lpss_acpi intel_lpss acpi_pad mac_hid sch_fq_codel pci_stub vboxpci(O) overlay vboxnetadp(O) vboxnetflt(O) vboxdrv(O) loop ip_tables x_tables xfs libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax uas usb_storage hid_generic usbhid hid sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc serio_raw mmc_core atkbd libps2 aesni_intel ahci libahci aes_x86_64 crypto_simd xhci_pci glue_helper cryptd libata xhci_hcd rtsx_pci usbcore scsi_mod usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper Aug 7 00:06:45 host kernel: syscopyarea sysfillrect sysimgblt fb_sys_fops drm Aug 7 00:06:45 host kernel: CPU: 2 PID: 12258 Comm: sh Tainted: P O 4.12.4-1-lily #10 Aug 7 00:06:45 host kernel: Hardware name: LENOVO 20J6A01ACD/20J6A01ACD, BIOS R0FET31W (1.11 ) 03/10/2017 Aug 7 00:06:45 host kernel: task: ffff8801e74cbb80 task.stack: ffffc90009050000 Aug 7 00:06:45 host kernel: RIP: 0010:profile_graph_entry+0x70/0x160 Aug 7 00:06:45 host kernel: RSP: 0018:ffffc90009053c30 EFLAGS: 00010297 Aug 7 00:06:45 host kernel: RAX: 0000000000000000 RBX: ffffc90009053d60 RCX: 0000000000000000 Aug 7 00:06:45 host kernel: RDX: ffff8801e74cbb80 RSI: ffffc90009053d60 RDI: ffffc90009053c44 Aug 7 00:06:45 host kernel: RBP: ffffc90009053c30 R08: ffffea0008fdc11c R09: ffffea0008fdc100 Aug 7 00:06:45 host kernel: R10: 0000000000000000 R11: ffffc90009053ce0 R12: ffffffff811aa6d0 Aug 7 00:06:45 host kernel: R13: ffffffff811998b9 R14: 0000000000000000 R15: ffffea0008fdc100 Aug 7 00:06:45 host kernel: FS: 00007fe87ed91b40(0000) GS:ffff880252480000(0000) knlGS:0000000000000000 Aug 7 00:06:45 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 7 00:06:45 host kernel: CR2: 0000000000000018 CR3: 00000001bde96000 CR4: 00000000003406e0 Aug 7 00:06:45 host kernel: Call Trace: Aug 7 00:06:45 host kernel: prepare_ftrace_return+0x78/0xd0 Aug 7 00:06:45 host kernel: ftrace_graph_caller+0x78/0xa8 Aug 7 00:06:45 host kernel: ? mark_page_accessed+0x5/0x170 Aug 7 00:06:45 host kernel: ? pagecache_get_page+0x2c/0x1f0 Aug 7 00:06:45 host kernel: mark_page_accessed+0x5/0x170 Aug 7 00:06:45 host kernel: generic_file_read_iter+0x2f9/0x700 Aug 7 00:06:46 host kernel: ? mark_page_accessed+0x5/0x170 Aug 7 00:06:46 host kernel: ? ftrace_graph_caller+0xa8/0xa8 Aug 7 00:06:46 host kernel: ? pagevec_lru_move_fn+0xcb/0xf0 Aug 7 00:06:46 host kernel: xfs_file_buffered_aio_read+0x47/0xe0 [xfs] Aug 7 00:06:46 host kernel: xfs_file_read_iter+0x62/0xb0 [xfs] Aug 7 00:06:46 host kernel: __vfs_read+0xe1/0x130 Aug 7 00:06:46 host kernel: vfs_read+0x96/0x130 Aug 7 00:06:46 host kernel: SyS_read+0x55/0xc0 Aug 7 00:06:46 host kernel: entry_SYSCALL_64_fastpath+0x1e/0xa9 Aug 7 00:06:46 host kernel: RIP: 0033:0x7fe87e42af4c Aug 7 00:06:46 host kernel: RSP: 002b:00007fff56c2f958 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 Aug 7 00:06:46 host kernel: RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fe87e42af4c Aug 7 00:06:46 host kernel: RDX: 0000000000001000 RSI: 00000000018f3bb0 RDI: 0000000000000004 Aug 7 00:06:46 host kernel: RBP: 0000000000002710 R08: 0000000000000000 R09: 0000000000000010 Aug 7 00:06:46 host kernel: R10: 00000000018f4b8c R11: 0000000000000246 R12: 00007fe87e756ad8 Aug 7 00:06:46 host kernel: R13: 0000000000000070 R14: 0000000001909fd0 R15: 00007fe87e756a80 Aug 7 00:06:46 host kernel: Code: bd 00 45 85 c9 75 33 57 9d 0f 1f 44 00 00 83 f9 31 77 20 48 8d 04 89 65 48 8b 14 25 80 d3 00 00 48 8b 8a 88 18 00 00 48 8d 04 c1 <48> c7 40 18 00 00 00 00 b8 01 00 00 00 5d c3 48 b8 eb 83 b5 80 Aug 7 00:06:46 host kernel: RIP: profile_graph_entry+0x70/0x160 RSP: ffffc90009053c30 Aug 7 00:06:46 host kernel: CR2: 0000000000000018 Aug 7 00:06:46 host kernel: ---[ end trace c59c5161a34998c3 ]--- Aug 7 00:06:46 host kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 Aug 7 00:06:46 host kernel: IP: profile_graph_entry+0x70/0x160 Aug 7 00:06:46 host kernel: PGD 1bd7ca067 Aug 7 00:06:46 host kernel: P4D 1bd7ca067 Aug 7 00:06:46 host kernel: PUD 1c835e067 Aug 7 00:06:46 host kernel: PMD 0 Aug 7 00:06:46 host kernel: Aug 7 00:06:46 host kernel: Oops: 0002 [#2] PREEMPT SMP Aug 7 00:06:46 host kernel: Modules linked in: fuse bbswitch(O) nvidia(PO) ctr ccm xt_REDIRECT nf_nat_redirect xt_comment xt_owner xt_hl xt_u32 xt_tcpudp iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media btusb btrtl arc4 intel_rapl snd_hda_codec_hdmi joydev mousedev iwlmvm snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support mac80211 snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_hda_codec kvm_intel snd_hda_core thinkpad_acpi snd_hwdep snd_pcm snd_timer iwlwifi cfg80211 rtsx_pci_ms memstick input_leds tpm_crb e1000e kvm irqbypass ptp intel_cstate intel_rapl_perf Aug 7 00:06:46 host kernel: pps_core psmouse hci_uart btbcm btqca pcspkr btintel mei_me bluetooth i2c_i801 mei nvram ecdh_generic snd intel_pch_thermal soundcore rfkill shpchp crc16 led_class wmi thermal battery tpm_tis ac tpm_tis_core i2c_hid tpm evdev intel_lpss_acpi intel_lpss acpi_pad mac_hid sch_fq_codel pci_stub vboxpci(O) overlay vboxnetadp(O) vboxnetflt(O) vboxdrv(O) loop ip_tables x_tables xfs libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax uas usb_storage hid_generic usbhid hid sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc serio_raw mmc_core atkbd libps2 aesni_intel ahci libahci aes_x86_64 crypto_simd xhci_pci glue_helper cryptd libata xhci_hcd rtsx_pci usbcore scsi_mod usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper Aug 7 00:06:46 host kernel: syscopyarea sysfillrect sysimgblt fb_sys_fops drm Aug 7 00:06:46 host kernel: CPU: 2 PID: 12258 Comm: sh Tainted: P D O 4.12.4-1-lily #10 Aug 7 00:06:46 host kernel: Hardware name: LENOVO 20J6A01ACD/20J6A01ACD, BIOS R0FET31W (1.11 ) 03/10/2017 Aug 7 00:06:46 host kernel: task: ffff8801e74cbb80 task.stack: ffffc90009050000 Aug 7 00:06:46 host kernel: RIP: 0010:profile_graph_entry+0x70/0x160 Aug 7 00:06:46 host kernel: RSP: 0018:ffffc90009053b80 EFLAGS: 00010297 Aug 7 00:06:46 host kernel: RAX: 0000000000000000 RBX: ffffc90009053cb0 RCX: 0000000000000000 Aug 7 00:06:46 host kernel: RDX: ffff8801e74cbb80 RSI: ffffc90009053cb0 RDI: ffffc90009053b94 Aug 7 00:06:46 host kernel: RBP: ffffc90009053b80 R08: ffff8802070d6c80 R09: 00000000004c6000 Aug 7 00:06:46 host kernel: R10: 0000000000000000 R11: ffffffff81ea85ad R12: ffffffff811aa6d0 Aug 7 00:06:46 host kernel: R13: ffffffff811cff2e R14: 0000000000000000 R15: ffffc90009053e10 Aug 7 00:06:46 host kernel: FS: 0000000000000000(0000) GS:ffff880252480000(0000) knlGS:0000000000000000 Aug 7 00:06:46 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 7 00:06:46 host kernel: CR2: 0000000000000018 CR3: 00000001bde96000 CR4: 00000000003406e0 Aug 7 00:06:46 host kernel: Call Trace: Aug 7 00:06:46 host kernel: prepare_ftrace_return+0x78/0xd0 Aug 7 00:06:46 host kernel: ftrace_graph_caller+0x78/0xa8 Aug 7 00:06:46 host kernel: ? profile_graph_entry+0x70/0x160 Aug 7 00:06:46 host kernel: ? mark_page_accessed+0x5/0x170 Aug 7 00:06:46 host kernel: mark_page_accessed+0x5/0x170 Aug 7 00:06:46 host kernel: unmap_page_range+0x58e/0x990 Aug 7 00:06:46 host kernel: ? mark_page_accessed+0x5/0x170 Aug 7 00:06:46 host kernel: ? ftrace_graph_caller+0xa8/0xa8 Aug 7 00:06:46 host kernel: ? set_next_entity+0x162/0x630 Aug 7 00:06:46 host kernel: unmap_single_vma+0x7d/0xf0 Aug 7 00:06:46 host kernel: unmap_vmas+0x51/0xb0 Aug 7 00:06:46 host kernel: exit_mmap+0xab/0x190 Aug 7 00:06:46 host kernel: mmput+0x3c/0xf0 Aug 7 00:06:46 host kernel: do_exit+0x592/0xb40 Aug 7 00:06:46 host kernel: ? SyS_read+0x55/0xc0 Aug 7 00:06:46 host kernel: rewind_stack_do_exit+0x17/0x20 Aug 7 00:06:46 host kernel: RIP: 0033:0x7fe87e42af4c Aug 7 00:06:46 host kernel: RSP: 002b:00007fff56c2f958 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 Aug 7 00:06:46 host kernel: RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fe87e42af4c Aug 7 00:06:46 host kernel: RDX: 0000000000001000 RSI: 00000000018f3bb0 RDI: 0000000000000004 Aug 7 00:06:46 host kernel: RBP: 0000000000002710 R08: 0000000000000000 R09: 0000000000000010 Aug 7 00:06:46 host kernel: R10: 00000000018f4b8c R11: 0000000000000246 R12: 00007fe87e756ad8 Aug 7 00:06:46 host kernel: R13: 0000000000000070 R14: 0000000001909fd0 R15: 00007fe87e756a80 Aug 7 00:06:46 host kernel: Code: bd 00 45 85 c9 75 33 57 9d 0f 1f 44 00 00 83 f9 31 77 20 48 8d 04 89 65 48 8b 14 25 80 d3 00 00 48 8b 8a 88 18 00 00 48 8d 04 c1 <48> c7 40 18 00 00 00 00 b8 01 00 00 00 5d c3 48 b8 eb 83 b5 80 Aug 7 00:06:46 host kernel: RIP: profile_graph_entry+0x70/0x160 RSP: ffffc90009053b80 Aug 7 00:06:46 host kernel: CR2: 0000000000000018 Aug 7 00:06:46 host kernel: ---[ end trace c59c5161a34998c4 ]--- Aug 7 00:06:46 host kernel: Fixing recursive fault but reboot is needed! Aug 7 00:06:46 host kernel: BUG: scheduling while atomic: sh/12258/0x00000002 Aug 7 00:06:46 host kernel: Modules linked in: fuse bbswitch(O) nvidia(PO) ctr ccm xt_REDIRECT nf_nat_redirect xt_comment xt_owner xt_hl xt_u32 xt_tcpudp iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media btusb btrtl arc4 intel_rapl snd_hda_codec_hdmi joydev mousedev iwlmvm snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support mac80211 snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_hda_codec kvm_intel snd_hda_core thinkpad_acpi snd_hwdep snd_pcm snd_timer iwlwifi cfg80211 rtsx_pci_ms memstick input_leds tpm_crb e1000e kvm irqbypass ptp intel_cstate intel_rapl_perf Aug 7 00:06:46 host kernel: pps_core psmouse hci_uart btbcm btqca pcspkr btintel mei_me bluetooth i2c_i801 mei nvram ecdh_generic snd intel_pch_thermal soundcore rfkill shpchp crc16 led_class wmi thermal battery tpm_tis ac tpm_tis_core i2c_hid tpm evdev intel_lpss_acpi intel_lpss acpi_pad mac_hid sch_fq_codel pci_stub vboxpci(O) overlay vboxnetadp(O) vboxnetflt(O) vboxdrv(O) loop ip_tables x_tables xfs libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod dax uas usb_storage hid_generic usbhid hid sd_mod crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc rtsx_pci_sdmmc serio_raw mmc_core atkbd libps2 aesni_intel ahci libahci aes_x86_64 crypto_simd xhci_pci glue_helper cryptd libata xhci_hcd rtsx_pci usbcore scsi_mod usb_common i8042 serio i915 video button intel_gtt i2c_algo_bit drm_kms_helper Aug 7 00:06:46 host kernel: syscopyarea sysfillrect sysimgblt fb_sys_fops drm Aug 7 00:06:46 host kernel: CPU: 2 PID: 12258 Comm: sh Tainted: P D O 4.12.4-1-lily #10 Aug 7 00:06:46 host kernel: Hardware name: LENOVO 20J6A01ACD/20J6A01ACD, BIOS R0FET31W (1.11 ) 03/10/2017 Aug 7 00:06:46 host kernel: Call Trace: Aug 7 00:06:46 host kernel: dump_stack+0x63/0x8d Aug 7 00:06:46 host kernel: __schedule_bug+0x55/0x70 Aug 7 00:06:46 host kernel: __schedule+0x6e4/0x860 Aug 7 00:06:46 host kernel: ? vprintk_func+0x20/0x50 Aug 7 00:06:46 host kernel: schedule+0x3d/0x90 Aug 7 00:06:46 host kernel: do_exit+0x928/0xb40 Aug 7 00:06:46 host kernel: ? SyS_read+0x55/0xc0 Aug 7 00:06:46 host kernel: rewind_stack_do_exit+0x17/0x20 Aug 7 00:06:46 host kernel: RIP: 0033:0x7fe87e42af4c Aug 7 00:06:46 host kernel: RSP: 002b:00007fff56c2f958 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 Aug 7 00:06:46 host kernel: RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fe87e42af4c Aug 7 00:06:46 host kernel: RDX: 0000000000001000 RSI: 00000000018f3bb0 RDI: 0000000000000004 Aug 7 00:06:46 host kernel: RBP: 0000000000002710 R08: 0000000000000000 R09: 0000000000000010 Aug 7 00:06:46 host kernel: R10: 00000000018f4b8c R11: 0000000000000246 R12: 00007fe87e756ad8 Aug 7 00:06:46 host kernel: R13: 0000000000000070 R14: 0000000001909fd0 R15: 00007fe87e756a80 This is a lightly customized kernel based on the one from Arch Linux, with some options turned on and the uksm patch. After I saw messages like the above, I stopped the script. Then I found I had a process stuck in D state. This also happened with my 4.9.33 kernel. Kernel oops, then some processes entered D state forever, needing a forceful reset. I have all the debug symbols available. The kernel can be downloaded from here: https://repo.archlinuxcn.org/x86_64/linux-lily{,-debug,-headers}-4.12.4-1-x86_64.pkg.tar.xz{,.sig} If you need more information, please tell me how to get it.
I've been informed that a Russian operative has infiltrated our boarders. They are traveling to Las Vegas and plan on gambling away the secret Russian tapes of Trump and his Russian escapades. I need to become the Dark Knight and track them down before they have the ability to destroy the reputation of the United States Presidency. I have been given a large sum of money, and will be playing high stakes poker to make sure the tapes do not get into the wrong hands. This mission will last till Aug 13th and since I'll be under a secret identity, I will not be able to reply back till the 14th. Until then, same Bat Time, same Bat channel! -- Steve
You have several out of tree modules (those labeled with '(O)') and a proprietary module (those labeled with '(P)', basically nvidia). Can you reproduce this bug without those loaded? If so, please post another report of the bug without the tainted modules. Thanks, -- Steve
Nevermind. I just triggered the bug. I'll take a look. -- Steve
Created attachment 257999 [details] Check for null ret_stack on profile function graph entry There's a small race when function graph shutsdown and the calling of the registered function graph entry callback. The callback must not reference the task's ret_stack without first checking that it is not NULL. Note, when a ret_stack is allocated for a task, it stays allocated until the task exits. The problem here, is that function_graph is shutdown, and a new task was created, which doesn't have its ret_stack allocated. But since some of the functions are still being traced, the callbacks can still be called. The normal function_graph code handles this, but starting with commit 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler") the profiler code references the ret_stack on function entry, but doesn't check if it is NULL first.
The fix is in mainline commit a8f0f9e49956a ("ftrace: Check for null ret_stack on profile function graph entry function")