Bug 194843
Summary: [amdgpu] oops [drm:gfx_v8_0_priv_reg_irq] *ERROR* Illegal register access in command stream
Product: Drivers
Reporter: Johannes Hirte (johannes.hirte)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED INVALID
Severity: normal
CC: alexdeucher
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.11.0-rc1
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg-4.10.0, Xorg.0.log
Description
Johannes Hirte
2017-03-10 20:34:49 UTC
What chip is this? Please attach your Xorg log and dmesg output.

Created attachment 255173
dmesg-4.10.0

Created attachment 255175
Xorg.0.log

It's a Carrizo, A10-8700B R6.

Requested logs attached, both running kernel 4.10.0 at the moment. Do you need them from 4.11-rc1?

OK, it's not 4.11 specific. I just had a system hang with 4.10.0 and, after the reboot, found only this in the logs:

Mar 12 12:12:48 probook kernel: ------------[ cut here ]------------
Mar 12 12:12:48 probook kernel: WARNING: CPU: 1 PID: 872 at ./include/linux/dma-fence.h:349 amdgpu_vm_grab_id+0x7ef/0x810
Mar 12 12:12:48 probook kernel: Modules linked in: uas usb_storage cmac rfcomm uhid bnep btusb btrtl btbcm btintel bluetooth hp_wmi kvm_amd kvm iwlmvm irqbypass mac80211 aesni_intel aes_x86_64 crypto_simd cryptd glue_helper fam15h_power snd_hda_codec_conexant snd_hda_codec_generic i2c_piix4 k10temp snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm iwlwifi rtsx_pci_ms snd_timer cfg80211 memstick snd rfkill r8169 soundcore mii wmi i2c_designware_platform i2c_designware_core hp_wireless rtsx_pci_sdmmc mmc_core ehci_pci ehci_hcd xhci_pci xhci_hcd rtsx_pci mfd_core efivarfs autofs4
Mar 12 12:12:48 probook kernel: CPU: 1 PID: 872 Comm: sdma0 Not tainted 4.10.0 #91
Mar 12 12:12:48 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
Mar 12 12:12:48 probook kernel: Call Trace:
Mar 12 12:12:48 probook kernel: dump_stack+0x4f/0x73
Mar 12 12:12:48 probook kernel: __warn+0xc6/0xe0
Mar 12 12:12:48 probook kernel: warn_slowpath_null+0x18/0x20
Mar 12 12:12:48 probook kernel: amdgpu_vm_grab_id+0x7ef/0x810
Mar 12 12:12:48 probook kernel: ? dma_fence_wait_timeout+0x110/0x110
Mar 12 12:12:48 probook kernel: amdgpu_job_dependency+0x5a/0x90
Mar 12 12:12:48 probook kernel: amd_sched_main+0x9e/0x500
Mar 12 12:12:48 probook kernel: ? wake_atomic_t_function+0x50/0x50
Mar 12 12:12:48 probook kernel: kthread+0xfc/0x130
Mar 12 12:12:48 probook kernel: ? amd_sched_process_job+0xe0/0xe0
Mar 12 12:12:48 probook kernel: ? kthread_create_on_node+0x40/0x40
Mar 12 12:12:48 probook kernel: ret_from_fork+0x29/0x40
Mar 12 12:12:48 probook kernel: ---[ end trace 4591763eee9b4ab4 ]---
Mar 12 12:12:48 probook kernel: ------------[ cut here ]------------
Mar 12 12:12:48 probook kernel: WARNING: CPU: 2 PID: 863 at ./include/linux/dma-fence.h:349 amdgpu_vm_grab_id+0x7ef/0x810
Mar 12 12:12:48 probook kernel: Modules linked in: uas usb_storage cmac rfcomm uhid bnep btusb btrtl btbcm btintel bluetooth hp_wmi kvm_amd kvm iwlmvm irqbypass mac80211 aesni_intel aes_x86_64 crypto_simd cryptd glue_helper fam15h_power snd_hda_codec_conexant snd_hda_codec_generic i2c_piix4 k10temp snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm iwlwifi rtsx_pci_ms snd_timer cfg80211 memstick snd rfkill r8169 soundcore mii wmi i2c_designware_platform i2c_designware_core hp_wireless rtsx_pci_sdmmc mmc_core ehci_pci ehci_hcd xhci_pci xhci_hcd rtsx_pci mfd_core efivarfs autofs4
Mar 12 12:12:48 probook kernel: CPU: 2 PID: 863 Comm: gfx Tainted: G W 4.10.0 #91
Mar 12 12:12:48 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
Mar 12 12:12:48 probook kernel: Call Trace:
Mar 12 12:12:48 probook kernel: dump_stack+0x4f/0x73
Mar 12 12:12:48 probook kernel: __warn+0xc6/0xe0
Mar 12 12:12:48 probook kernel: warn_slowpath_null+0x18/0x20
Mar 12 12:12:48 probook kernel: amdgpu_vm_grab_id+0x7ef/0x810
Mar 12 12:12:48 probook kernel: ? dma_fence_wait_timeout+0x110/0x110
Mar 12 12:12:48 probook kernel: amdgpu_job_dependency+0x5a/0x90
Mar 12 12:12:48 probook kernel: amd_sched_main+0x9e/0x500
Mar 12 12:12:48 probook kernel: ? wake_atomic_t_function+0x50/0x50
Mar 12 12:12:48 probook kernel: kthread+0xfc/0x130
Mar 12 12:12:48 probook kernel: ? amd_sched_process_job+0xe0/0xe0
Mar 12 12:12:48 probook kernel: ? kthread_create_on_node+0x40/0x40
Mar 12 12:12:48 probook kernel: ? umh_complete+0x40/0x40
Mar 12 12:12:48 probook kernel: ? call_usermodehelper_exec_async+0x137/0x140
Mar 12 12:12:48 probook kernel: ret_from_fork+0x29/0x40
Mar 12 12:12:48 probook kernel: ---[ end trace 4591763eee9b4ab5 ]---

Some more observations: the hangs seem to happen much more frequently with kernel 4.11 than with 4.10. Where 4.10 kernels usually run for several days, I get a hang with 4.11 within a day. Additionally, I've found some of the

WARNING: CPU: 1 PID: 872 at ./include/linux/dma-fence.h:349 amdgpu_vm_grab_id

entries in the logs without a hang at that time. As far as I've seen, this was always with a 4.10 kernel.

I wonder if there might be memory corruption going on, in which case enabling CONFIG_KASAN for the kernel build might give more clues.

(In reply to Michel Dänzer from comment #7)
> I wonder if there might be memory corruption going on, in which case
> enabling CONFIG_KASAN for the kernel build might give more clues.

I was testing the last days with KASAN enabled and didn't hit one hang or other BUG message in the logs. Today I upgraded the RAM from one 4G module to two 8G modules, and now the first hit came directly after boot:

[ 104.834811] wlp2s0: authenticate with 02:a0:f9:37:8e:a6
[ 104.838674] ==================================================================
[ 104.838715] BUG: KASAN: global-out-of-bounds in iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] at addr ffffffffa0d4a336
[ 104.838724] Read of size 2 by task wpa_supplicant/4039
[ 104.838739] Address belongs to variable iwl_drv_exit+0xf66f/0x339 [iwlwifi]
[ 104.838750] CPU: 2 PID: 4039 Comm: wpa_supplicant Not tainted 4.11.0-rc7-kasan-00001-g73080f5e1d5b #171
[ 104.838755] Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
[ 104.838760] Call Trace:
[ 104.838772] dump_stack+0x4f/0x66
[ 104.838781] kasan_report+0x4da/0x510
[ 104.838798] ? iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm]
[ 104.838805] ? update_curr+0x14b/0x490
[ 104.838812] ? wake_atomic_t_function+0x2b0/0x2b0
[ 104.838819] __asan_report_load2_noabort+0x14/0x20
[ 104.838835] iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm]
[ 104.838854] ? iwl_mvm_channel_switch_noa_notif+0x40f/0x410 [iwlmvm]
[ 104.838870] ? iwl_mvm_mac_ctxt_send_beacon+0xcb0/0xcb0 [iwlmvm]
[ 104.838885] ? iwl_mvm_send_cmd_pdu+0x91/0xb0 [iwlmvm]
[ 104.838901] ? iwl_mvm_send_cmd+0x160/0x160 [iwlmvm]
[ 104.838917] iwl_mvm_mac_ctxt_cmd_sta+0xd1/0xe70 [iwlmvm]
[ 104.838933] ? iwl_mvm_mac_ctxt_cmd_common+0x1610/0x1610 [iwlmvm]
[ 104.838949] ? iwl_mvm_phy_ctxt_apply.constprop.3+0x31f/0x5d0 [iwlmvm]
[ 104.838966] ? iwl_mvm_ref_taken+0x150/0x150 [iwlmvm]
[ 104.838982] iwl_mvm_mac_ctx_send+0x68/0x110 [iwlmvm]
[ 104.838996] iwl_mvm_mac_ctxt_changed+0x68/0x180 [iwlmvm]
[ 104.839011] iwl_mvm_bss_info_changed+0x2f8/0xec0 [iwlmvm]
[ 104.839043] ieee80211_bss_info_change_notify+0x177/0x4c0 [mac80211]
[ 104.839070] ? __ieee80211_recalc_txpower+0x111/0x320 [mac80211]
[ 104.839097] ieee80211_assign_vif_chanctx+0x7ce/0xf80 [mac80211]
[ 104.839123] ieee80211_vif_use_channel+0x3ad/0x780 [mac80211]
[ 104.839149] ieee80211_prep_connection+0x55b/0x1cf0 [mac80211]
[ 104.839174] ? ieee80211_handle_bss_capability+0x220/0x220 [mac80211]
[ 104.839182] ? __kmalloc+0x126/0x220
[ 104.839207] ieee80211_mgd_auth+0x69d/0xdd0 [mac80211]
[ 104.839232] ? ieee80211_mlme_notify_scan_completed+0x1c0/0x1c0 [mac80211]
[ 104.839261] ieee80211_auth+0x13/0x20 [mac80211]
[ 104.839291] cfg80211_mlme_auth+0x2a7/0x6b0 [cfg80211]
[ 104.839298] ? unwind_get_return_address+0x1e0/0x1e0
[ 104.839319] ? cfg80211_rx_mgmt+0x710/0x710 [cfg80211]
[ 104.839342] ? parse_station_flags.isra.36+0x490/0x490 [cfg80211]
[ 104.839363] nl80211_authenticate+0x8f7/0xfe0 [cfg80211]
[ 104.839385] ? nl80211_parse_key+0xe70/0xe70 [cfg80211]
[ 104.839406] ? nl80211_pre_doit+0xcd/0x560 [cfg80211]
[ 104.839414] ? nla_parse+0xde/0x210
[ 104.839422] genl_family_rcv_msg+0x5c8/0x10f0
[ 104.839429] ? __alloc_skb+0x31f/0x560
[ 104.839435] ? genl_rcv+0x40/0x40
[ 104.839443] ? try_to_wake_up+0xb8/0x1080
[ 104.839450] ? alloc_skb_with_frags+0x8d/0x4c0
[ 104.839458] genl_rcv_msg+0x9b/0x120
[ 104.839465] netlink_rcv_skb+0x23b/0x340
[ 104.839471] ? genl_family_rcv_msg+0x10f0/0x10f0
[ 104.839477] genl_rcv+0x23/0x40
[ 104.839483] netlink_unicast+0x438/0x620
[ 104.839489] ? netlink_attachskb+0x640/0x640
[ 104.839497] netlink_sendmsg+0x86f/0xb60
[ 104.839503] ? netlink_broadcast+0x10/0x10
[ 104.839510] ? netlink_broadcast+0x10/0x10
[ 104.839516] sock_sendmsg+0xb5/0xf0
[ 104.839522] ___sys_sendmsg+0x6a2/0x8c0
[ 104.839529] ? ___sys_recvmsg+0x333/0x590
[ 104.839535] ? SYSC_sendto+0x300/0x300
[ 104.839541] ? sock_sendmsg+0xb5/0xf0
[ 104.839547] ? sock_write_iter+0x1e0/0x3b0
[ 104.839553] ? _raw_spin_unlock_irq+0x39/0x60
[ 104.839559] ? sock_sendmsg+0xf0/0xf0
[ 104.839567] ? __vfs_write+0x299/0x620
[ 104.839573] ? vfs_dedupe_get_page.isra.20+0x1d0/0x1d0
[ 104.839580] ? __fdget+0xe/0x10
[ 104.839587] __sys_sendmsg+0xc1/0x140
[ 104.839592] ? __sys_sendmsg+0xc1/0x140
[ 104.839598] ? SyS_shutdown+0x170/0x170
[ 104.839605] ? vfs_write+0x305/0x490
[ 104.839613] ? exit_to_usermode_loop+0x75/0xf0
[ 104.839620] SyS_sendmsg+0xd/0x20
[ 104.839626] entry_SYSCALL_64_fastpath+0x13/0x94
[ 104.839632] RIP: 0033:0x7fb02a23fad7
[ 104.839637] RSP: 002b:00007ffdd3d73b28 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 104.839645] RAX: ffffffffffffffda RBX: 000000000185daf0 RCX: 00007fb02a23fad7
[ 104.839649] RDX: 0000000000000000 RSI: 00007ffdd3d73b80 RDI: 0000000000000006
[ 104.839654] RBP: 00007fb02a4e6ae0 R08: 0000000000000000 R09: 00000000000000a6
[ 104.839658] R10: 0000000001867d90 R11: 0000000000000246 R12: 0000000000000000
[ 104.839663] R13: 0000000000000003 R14: 0000000000000011 R15: 000000000185d8c0
[ 104.839669] Memory state around the buggy address:
[ 104.839676] ffffffffa0d4a200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 104.839682] ffffffffa0d4a280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 104.839687] >ffffffffa0d4a300: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00
[ 104.839691] ^
[ 104.839696] ffffffffa0d4a380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 104.839702] ffffffffa0d4a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 104.839705] ==================================================================
[ 104.839709] Disabling lock debugging due to kernel taint
[ 104.843536] wlp2s0: send auth to 02:a0:f9:37:8e:a6 (try 1/3)
[ 104.849308] wlp2s0: authenticated

(In reply to Johannes Hirte from comment #8)
> (In reply to Michel Dänzer from comment #7)
> > I wonder if there might be memory corruption going on, in which case
> > enabling CONFIG_KASAN for the kernel build might give more clues.
>
> I was testing the last days with KASAN enabled and didn't hit one hang or
> other BUG message in the logs.

I have to correct this.
I found three use-after-free reports from find_cpio_data in the logs. The most detailed was this one:

Apr 23 11:55:16 probook kernel: smpboot: Booting Node 0 Processor 1 APIC 0x11
Apr 23 11:55:16 probook kernel: ==================================================================
Apr 23 11:55:16 probook kernel: BUG: KASAN: use-after-free in find_cpio_data+0x4d8/0x570 at addr ffff880037991000
Apr 23 11:55:16 probook kernel: Read of size 1 by task swapper/1/0
Apr 23 11:55:16 probook kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.11.0-rc7-00006-g3e06d0af3e4b #164
Apr 23 11:55:16 probook kernel: Hardware name: HP HP ProBook 645 G2/80FE, BIOS N77 Ver. 01.07 11/01/2016
Apr 23 11:55:16 probook kernel: Call Trace:
Apr 23 11:55:16 probook kernel: dump_stack+0x4f/0x66
Apr 23 11:55:16 probook kernel: kasan_object_err+0x1c/0x70
Apr 23 11:55:16 probook kernel: kasan_report+0x252/0x510
Apr 23 11:55:16 probook kernel: ? find_cpio_data+0x4d8/0x570
Apr 23 11:55:16 probook kernel: ? put_dec+0xb0/0xb0
Apr 23 11:55:16 probook kernel: __asan_report_load1_noabort+0x14/0x20
Apr 23 11:55:16 probook kernel: find_cpio_data+0x4d8/0x570
Apr 23 11:55:16 probook kernel: ? dump_stack+0x66/0x66
Apr 23 11:55:16 probook kernel: ? snprintf+0x87/0xb0
Apr 23 11:55:16 probook kernel: ? vsprintf+0x20/0x20
Apr 23 11:55:16 probook kernel: find_microcode_in_initrd+0x229/0x3c0
Apr 23 11:55:16 probook kernel: ? get_builtin_firmware+0x5e/0x120
Apr 23 11:55:16 probook kernel: __load_ucode_amd+0x11c/0x240
Apr 23 11:55:16 probook kernel: ? clockevents_program_event+0x1a2/0x2c0
Apr 23 11:55:16 probook kernel: ? apply_microcode_amd+0x3d0/0x3d0
Apr 23 11:55:16 probook kernel: ? pick_next_task_fair+0x7a3/0xfe0
Apr 23 11:55:16 probook kernel: ? pick_next_task_fair+0x7a3/0xfe0
Apr 23 11:55:16 probook kernel: load_ucode_amd_ap+0x90/0x100
Apr 23 11:55:16 probook kernel: ? load_ucode_amd_ap+0x90/0x100
Apr 23 11:55:16 probook kernel: ? __load_ucode_amd+0x240/0x240
Apr 23 11:55:16 probook kernel: ? flat_send_IPI_mask+0x2b/0x40
Apr 23 11:55:16 probook kernel: ? sched_clock_cpu+0x1b/0x1e0
Apr 23 11:55:16 probook kernel: ? default_send_IPI_single+0x77/0xa0
Apr 23 11:55:16 probook kernel: load_ucode_ap+0x80/0x90
Apr 23 11:55:16 probook kernel: cpu_init+0x7dc/0xd40
Apr 23 11:55:16 probook kernel: ? smp_call_function_single+0xf7/0x340
Apr 23 11:55:16 probook kernel: ? syscall_init+0x140/0x140
Apr 23 11:55:16 probook kernel: ? debug_smp_processor_id+0x17/0x20
Apr 23 11:55:16 probook kernel: ? native_play_dead+0xf2/0x120
Apr 23 11:55:16 probook kernel: ? arch_cpu_idle_dead+0x28/0x40
Apr 23 11:55:16 probook kernel: ? do_idle+0x206/0x2d0
Apr 23 11:55:16 probook kernel: start_secondary+0x12/0x2c0
Apr 23 11:55:16 probook kernel: ? start_secondary+0x12/0x2c0
Apr 23 11:55:16 probook kernel: start_cpu+0x14/0x14
Apr 23 11:55:16 probook kernel: Object at ffff880037990f00, in cache kmalloc-512 size: 512
Apr 23 11:55:16 probook kernel: Allocated:
Apr 23 11:55:16 probook kernel: PID = 4012
Apr 23 11:55:16 probook kernel: save_stack_trace+0x16/0x20
Apr 23 11:55:16 probook kernel: save_stack+0x46/0xd0
Apr 23 11:55:16 probook kernel: kasan_kmalloc+0xad/0xe0
Apr 23 11:55:16 probook kernel: kasan_slab_alloc+0x12/0x20
Apr 23 11:55:16 probook kernel: __kmalloc_node_track_caller+0xfe/0x290
Apr 23 11:55:16 probook kernel: __kmalloc_reserve.isra.36+0x2c/0xc0
Apr 23 11:55:16 probook kernel: __alloc_skb+0xd0/0x560
Apr 23 11:55:16 probook kernel: alloc_skb_with_frags+0x8d/0x4c0
Apr 23 11:55:16 probook kernel: sock_alloc_send_pskb+0x587/0x6f0
Apr 23 11:55:16 probook kernel: unix_stream_sendmsg+0x57d/0x880
Apr 23 11:55:16 probook kernel: sock_sendmsg+0xb5/0xf0
Apr 23 11:55:16 probook kernel: sock_write_iter+0x1e0/0x3b0
Apr 23 11:55:16 probook kernel: __do_readv_writev+0x2b7/0x350
Apr 23 11:55:16 probook kernel: do_readv_writev+0x79/0xb0
Apr 23 11:55:16 probook kernel: vfs_writev+0x37/0x50
Apr 23 11:55:16 probook kernel: do_writev+0x4d/0xd0
Apr 23 11:55:16 probook kernel: SyS_writev+0xb/0x10
Apr 23 11:55:16 probook kernel: entry_SYSCALL_64_fastpath+0x13/0x94
Apr 23 11:55:16 probook kernel: Freed:
Apr 23 11:55:16 probook kernel: PID = 4281
Apr 23 11:55:16 probook kernel: save_stack_trace+0x16/0x20
Apr 23 11:55:16 probook kernel: save_stack+0x46/0xd0
Apr 23 11:55:16 probook kernel: kasan_slab_free+0x73/0xc0
Apr 23 11:55:16 probook kernel: kfree+0x91/0x1c0
Apr 23 11:55:16 probook kernel: skb_free_head+0x6a/0x90
Apr 23 11:55:16 probook kernel: skb_release_data+0x279/0x330
Apr 23 11:55:16 probook kernel: skb_release_all+0x3d/0x50
Apr 23 11:55:16 probook kernel: consume_skb+0x62/0x180
Apr 23 11:55:16 probook kernel: unix_stream_read_generic+0x1493/0x1b50
Apr 23 11:55:16 probook kernel: unix_stream_recvmsg+0x8a/0xa0
Apr 23 11:55:16 probook kernel: sock_recvmsg+0xc2/0x100
Apr 23 11:55:16 probook kernel: ___sys_recvmsg+0x227/0x590
Apr 23 11:55:16 probook kernel: __sys_recvmsg+0xbe/0x140
Apr 23 11:55:16 probook kernel: SyS_recvmsg+0xd/0x20
Apr 23 11:55:16 probook kernel: entry_SYSCALL_64_fastpath+0x13/0x94
Apr 23 11:55:16 probook kernel: Memory state around the buggy address:
Apr 23 11:55:16 probook kernel: ffff880037990f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Apr 23 11:55:16 probook kernel: ffff880037990f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Apr 23 11:55:16 probook kernel: >ffff880037991000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Apr 23 11:55:16 probook kernel: ^
Apr 23 11:55:16 probook kernel: ffff880037991080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Apr 23 11:55:16 probook kernel: ffff880037991100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Apr 23 11:55:16 probook kernel: ==================================================================
Apr 23 11:55:16 probook kernel: Disabling lock debugging due to kernel taint

The other two entries don't have the Allocated/Freed part.

(In reply to Michel Dänzer from comment #7)
> I wonder if there might be memory corruption going on, in which case
> enabling CONFIG_KASAN for the kernel build might give more clues.

You're right, KASAN pointed me at two other bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=195677
https://bugzilla.kernel.org/show_bug.cgi?id=196145

After eliminating these, no more problems with amdgpu occurred. Closing this report as invalid.
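
For anyone triaging similar logs: the KASAN reports in this thread had to be picked out of long journal output by hand. A minimal sketch of how that search could be automated is shown below. It is not part of the original report; the function name summarize_kasan and the regular expression are illustrative assumptions, and the pattern only covers the headline format seen in the logs quoted above ("BUG: KASAN: <kind> in <function>+0x.../0x...").

```python
import re
from collections import Counter

# Assumed headline format, taken from the reports quoted in this bug, e.g.
#   BUG: KASAN: use-after-free in find_cpio_data+0x4d8/0x570 at addr ...
KASAN_RE = re.compile(
    r"BUG: KASAN: (?P<kind>[\w-]+) in (?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_kasan(log_text):
    """Count KASAN report headlines in log_text, keyed by (kind, function)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = KASAN_RE.search(line)
        if match:
            counts[(match.group("kind"), match.group("func"))] += 1
    return counts


if __name__ == "__main__":
    # Two headline lines copied from the logs in this report.
    sample = "\n".join([
        "Apr 23 11:55:16 probook kernel: BUG: KASAN: use-after-free in "
        "find_cpio_data+0x4d8/0x570 at addr ffff880037991000",
        "[ 104.838715] BUG: KASAN: global-out-of-bounds in "
        "iwl_mvm_mac_ctxt_cmd_common+0x14b5/0x1610 [iwlmvm] at addr ffffffffa0d4a336",
    ])
    for (kind, func), n in summarize_kasan(sample).items():
        print(f"{n}x {kind} in {func}")
```

Feeding it the saved output of dmesg or the journal gives a quick per-function tally, which makes it easier to see whether the reports point at the driver under suspicion or, as happened here, at unrelated code.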