Bug 211101 - (AMDGPU) crash when using OpenCL
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(Other)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-09 12:59 UTC by Stefan de Konink
Modified: 2021-01-14 00:27 UTC

See Also:
Kernel Version: 5.10.5-gentoo
Subsystem:
Regression: No
Bisected commit-id:



Description Stefan de Konink 2021-01-09 12:59:02 UTC
Running Tesseract (OCR) under Xorg triggers an OpenCL benchmark. While that benchmark runs, the screen and all connected monitors go blank; the system remains accessible over the network. Even if Raven Ridge does not support OpenCL (anymore), this should not crash the driver.

[ 5906.578015] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 5906.578126] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 5911.628033] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=854610, emitted seq=854612
[ 5911.628104] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid 833 thread X:cs0 pid 834
[ 5911.628109] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
[ 5911.848066] [drm] free PSP TMR buffer
[ 5911.909148] amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 5911.909595] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[ 5911.910074] [drm] PSP is resuming...
[ 5911.930123] [drm] reserve 0x400000 from 0xf40fc00000 for PSP TMR
[ 5912.437956] amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 5912.497949] amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 5912.818636] [drm] kiq ring mec 2 pipe 1 q 0
[ 5913.668777] ------------[ cut here ]------------
[ 5913.668845] WARNING: CPU: 5 PID: 14459 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc.c:922 dc_commit_state+0x933/0xaf0 [amdgpu]
[ 5913.668847] Modules linked in: ctr ccm cmac bnep joydev ath10k_pci ath10k_core ath zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 zcommon(PO) znvpair(PO) videodev amdgpu spl(O) mac80211 videobuf2_common btusb btrtl zlib_deflate btbcm btintel mfd_core gpu_sched zlib_inflate bluetooth snd_hda_codec_conexant ttm kvm_amd i2c_algo_bit snd_hda_codec_generic snd_hda_codec_hdmi ecdh_generic drm_kms_helper ecc syscopyarea sdhci_pci kvm sysfillrect snd_hda_intel sysimgblt iosf_mbi fb_sys_fops cqhci irqbypass cec snd_intel_dspcfg cfg80211 sdhci snd_hda_codec wmi_bmof libarc4 r8169 mmc_core snd_hda_core aesni_intel realtek snd_pcm mdio_devres crypto_simd snd_timer wireguard libphy cryptd thinkpad_acpi ccp psmouse glue_helper nvram evdev ip6_udp_tunnel ledtrig_audio udp_tunnel i2c_piix4 sha1_generic snd soundcore rfkill wmi battery ac i2c_scmi video button sch_fq_codel drm backlight i2c_core fuse configfs efivarfs
[ 5913.668928] CPU: 5 PID: 14459 Comm: kworker/5:2 Tainted: P           O      5.10.5-gentoo #1
[ 5913.668930] Hardware name: LENOVO 20KU000NMH/20KU000NMH, BIOS R0UET77W (1.57 ) 04/07/2020
[ 5913.668936] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 5913.668995] RIP: 0010:dc_commit_state+0x933/0xaf0 [amdgpu]
[ 5913.668999] Code: 04 24 48 c7 00 00 00 00 00 48 c7 40 08 00 00 00 00 e9 01 f8 ff ff 31 d2 e9 54 f8 ff ff 80 b8 e0 02 00 00 00 0f 84 c0 fd ff ff <0f> 0b e9 b9 fd ff ff 48 89 ef e8 6e b1 00 00 48 89 ef e8 b6 9b 1e
[ 5913.669001] RSP: 0018:ffffc900094a7c50 EFLAGS: 00010202
[ 5913.669004] RAX: ffff8881f3caf800 RBX: ffff8882376e0690 RCX: 0000000000000005
[ 5913.669005] RDX: 0000000000000e24 RSI: 00000000000007cd RDI: 00000aca1f576270
[ 5913.669007] RBP: 0000000000000000 R08: ffffc900094a7bd4 R09: ffffc900094a7b20
[ 5913.669008] R10: 0000000000000002 R11: 000000000000000c R12: ffff8882376e0000
[ 5913.669010] R13: ffff8882376e1ec8 R14: ffff888110550000 R15: ffff8882376e1ec8
[ 5913.669012] FS:  0000000000000000(0000) GS:ffff88844ef40000(0000) knlGS:0000000000000000
[ 5913.669014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5913.669015] CR2: 00003a9a093e1000 CR3: 000000012fec8000 CR4: 00000000003506e0
[ 5913.669017] Call Trace:
[ 5913.669093]  dm_resume+0x3b0/0x510 [amdgpu]
[ 5913.669164]  ? psm_adjust_power_state_dynamic+0xec/0x1c0 [amdgpu]
[ 5913.669207]  amdgpu_device_ip_resume_phase2+0x52/0xb0 [amdgpu]
[ 5913.669268]  amdgpu_do_asic_reset+0x26c/0x39c [amdgpu]
[ 5913.669329]  amdgpu_device_gpu_recover.cold+0x6b9/0x98d [amdgpu]
[ 5913.669393]  amdgpu_job_timedout+0x11c/0x140 [amdgpu]
[ 5913.669398]  drm_sched_job_timedout+0x60/0xd0 [gpu_sched]
[ 5913.669403]  process_one_work+0x1dc/0x370
[ 5913.669406]  worker_thread+0x4d/0x3d0
[ 5913.669409]  ? rescuer_thread+0x3f0/0x3f0
[ 5913.669412]  kthread+0x125/0x140
[ 5913.669415]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5913.669418]  ret_from_fork+0x1f/0x30
[ 5913.669421] ---[ end trace c1d81a78b4c82ff4 ]---
[ 5913.683049] [drm] VCN decode and encode initialized successfully(under SPG Mode).
[ 5913.683060] amdgpu 0000:05:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[ 5913.683063] amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 5913.683065] amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 5913.683067] amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 5913.683070] amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 5913.683072] amdgpu 0000:05:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 5913.683074] amdgpu 0000:05:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 5913.683076] amdgpu 0000:05:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 5913.683078] amdgpu 0000:05:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 5913.683080] amdgpu 0000:05:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 5913.683082] amdgpu 0000:05:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
[ 5913.683084] amdgpu 0000:05:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
[ 5913.683086] amdgpu 0000:05:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
[ 5913.683088] amdgpu 0000:05:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
[ 5913.683090] amdgpu 0000:05:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
[ 5914.738098] amdgpu 0000:05:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
[ 5914.738106] amdgpu 0000:05:00.0: amdgpu: ib ring test failed (-110).
[ 5914.927936] [drm] free PSP TMR buffer

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
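
The ring timeout line in the log above carries two fence sequence numbers, and the gap between them gives a rough idea of how many submitted jobs had not completed when the scheduler gave up. For triaging similar logs, a minimal Python sketch that pulls those fields out of dmesg text could look like the following. The names `RingTimeout` and `find_ring_timeouts` and the regular expression are illustrative assumptions based only on the message format shown in this report; other kernel versions may word the message differently.

```python
import re
from typing import List, NamedTuple


class RingTimeout(NamedTuple):
    """One 'ring <name> timeout' event (hypothetical helper type)."""
    timestamp: float  # seconds since boot, from the dmesg prefix
    ring: str         # ring name, e.g. "gfx"
    signaled: int     # last fence sequence number that completed
    emitted: int      # last fence sequence number that was submitted

    @property
    def pending(self) -> int:
        # Fences submitted but not yet signaled at the time of the timeout.
        return self.emitted - self.signaled


# Assumed format, copied from the log in this report:
# [ 5911.628033] [drm:...] *ERROR* ring gfx timeout, signaled seq=N, emitted seq=M
_TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s.*\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(dmesg_text: str) -> List[RingTimeout]:
    """Return every ring timeout found in the given dmesg output."""
    events = []
    for line in dmesg_text.splitlines():
        match = _TIMEOUT_RE.search(line)
        if match:
            events.append(
                RingTimeout(
                    timestamp=float(match.group("ts")),
                    ring=match.group("ring"),
                    signaled=int(match.group("signaled")),
                    emitted=int(match.group("emitted")),
                )
            )
    return events
```

Feeding it the saved output of `dmesg` returns one entry per timeout; for the log above that is a single `gfx` event with two fences outstanding.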
Comment 1 Artur Bac 2021-01-13 23:47:55 UTC
Hint: if you build the kernel with clang on Gentoo, don't; the amdgpu kernel module will not work properly. For months I built my kernel with clang, and OpenCL did not work properly with a 5700 XT: I got a kernel null pointer dereference whenever an application that used OpenCL exited.
Then I found that a kernel built with gcc works fine with OpenCL and amdgpu.
Comment 2 Stefan de Konink 2021-01-14 00:27:01 UTC
I am using gcc (Gentoo 10.2.0-r5 p6) 10.2.0 and have never used clang for that part of my system.
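
For anyone who wants to confirm which compiler built the running kernel, the banner in `/proc/version` normally names it. A minimal Python sketch, assuming the banner mentions either `gcc` or `clang`, is shown below. The function name `kernel_compiler` is hypothetical, and the check is a plain substring heuristic, so an unusual build host name could in principle confuse it.

```python
import re


def kernel_compiler(version_banner: str) -> str:
    """Return 'clang', 'gcc', or 'unknown' for a /proc/version style banner."""
    # Heuristic only: look for the compiler name as a whole word.
    # clang is checked first so a clang banner is never reported as gcc.
    if re.search(r"\bclang\b", version_banner):
        return "clang"
    if re.search(r"\bgcc\b", version_banner):
        return "gcc"
    return "unknown"


if __name__ == "__main__":
    try:
        with open("/proc/version", encoding="utf-8") as fh:
            print(kernel_compiler(fh.read()))
    except OSError:
        # /proc may be unavailable, for example in a restricted container.
        print("unknown (could not read /proc/version)")
```

Run as a script, it prints the detected compiler for the current kernel; the function can also be applied to a banner copied from another machine.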
