Bug 212425

Summary: Kernel warning at drivers/gpu/drm/ttm/ttm_bo.c:517
Product: Drivers
Reporter: Grzegorz Kowal (custos.mentis)
Component: Video(DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: alexdeucher, christian.koenig
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 5.11.9
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg (kernel 5.11.9, Talos II)

Description Grzegorz Kowal 2021-03-24 20:00:58 UTC
After installing today's release of kernel 5.11.9 I am getting a bunch of kernel warnings like:

 [   70.902471] WARNING: CPU: 6 PID: 2147 at drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x2b1/0x300
[   70.902481] Modules linked in: nls_iso8859_2 nls_cp852 vfat fat uinput ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables x_tables kvm_amd kvm pcspkr irqbypass sp5100_tco snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer lm92 lm63 it87 hwmon_vid fam15h_power nfsd k10temp auth_rpcgss fuse oid_registry lockd grace hid_logitech_hidpp hid_logitech_dj hid_generic sr_mod cdrom sd_mod megaraid_sas 8250 8250_base serial_core usbhid sunrpc dm_mod dax
[   70.902541] CPU: 6 PID: 2147 Comm: kmail Tainted: G        W         5.11.9-acrux #9
[   70.902554] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2901 05/04/2016
[   70.902556] RIP: 0010:ttm_bo_release+0x2b1/0x300
[   70.902559] Code: e8 74 2b 2f 00 e9 d9 fd ff ff 48 8b 7d 88 b9 30 75 00 00 31 d2 be 01 00 00 00 e8 3a 52 2f 00 48 8b 45 d8 eb 9d 4c 89 e0 eb 98 <0f> 0b c7 85 9c 00 00 00 00 00 00 00 4c 89 ef e8 4b f2 ff ff 48 8d
[   70.902561] RSP: 0018:ffffaf4a83313bb8 EFLAGS: 00010202
[   70.902566] RAX: 0000000000000001 RBX: ffff961b7553c500 RCX: 0000000000000008
[   70.902568] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa1db5248
[   70.902570] RBP: ffff961b7553c570 R08: ffff961ba14f7a38 R09: ffff9621dedaa3f8
[   70.902571] R10: ffff961ac60c1250 R11: ffff961ac60c1240 R12: ffff961ac5fe5588
[   70.902572] R13: ffff961b7553c400 R14: 0000000000000000 R15: ffff961ac5fe5f48
[   70.902573] FS:  00007ff78c2f5f00(0000) GS:ffff9621ded80000(0000) knlGS:0000000000000000
[   70.902575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   70.902576] CR2: 00007ff691820000 CR3: 0000000203d5d000 CR4: 00000000000406e0
[   70.902578] Call Trace:
[   70.902580]  ttm_bo_move_accel_cleanup+0x19d/0x398
[   70.902583]  amdgpu_bo_move+0x15c/0x698
[   70.902586]  ? amdgpu_vram_mgr_new+0x373/0x3d8
[   70.902587]  ttm_bo_handle_move_mem+0x8c/0x170
[   70.902590]  ttm_bo_validate+0x147/0x178
[   70.902592]  amdgpu_bo_fault_reserve_notify+0xbf/0x148
[   70.902594]  amdgpu_ttm_fault+0x33/0x80
[   70.902596]  __do_fault+0x33/0x90
[   70.902599]  handle_mm_fault+0xdff/0x1498
[   70.902601]  exc_page_fault+0x1a5/0x500
[   70.902604]  ? exit_to_user_mode_prepare+0x5d/0x118
[   70.902607]  ? asm_exc_page_fault+0x8/0x30
[   70.902609]  asm_exc_page_fault+0x1e/0x30
[   70.902611] RIP: 0033:0x7ff78d8f290d
[   70.902612] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 6f 46 bc f3 0f 6f 4e cc 4c 8b 4e dc 4c 8b 56 e4 4c 8b 5e ec 48 8b 4e f4 8b 56 fc <f3> 0f 7f 47 bc f3 0f 7f 4f cc 4c 89 4f dc 4c 89 57 e4 4c 89 5f ec
[   70.902614] RSP: 002b:00007ffdbd5bb838 EFLAGS: 00010207
[   70.902616] RAX: 00007ff691820000 RBX: 0000000000000044 RCX: 3f80000000000000
[   70.902617] RDX: 0000000000000000 RSI: 000055d1049acad4 RDI: 00007ff691820044
[   70.902618] RBP: 000055d1049aca90 R08: 000055d103fd9a30 R09: 0000000000000000
[   70.902619] R10: 000000003f800000 R11: 3f800000bf800000 R12: 0000000000000000
[   70.902620] R13: 000055d103f58320 R14: 0000000000000000 R15: 0000000000000044

The warnings apparently started with commit 7d09e9725b5dcc8d14e101de931e4969d033a6ad, whose commit message says the condition it warns about is "very likely a driver bug".
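
To check how many distinct warning sites show up in a saved dmesg, a small script along these lines could be used. This is only a sketch: the regular expression is modelled on the WARNING lines quoted in this report (x86-64, and ppc64 where symbols carry a leading dot), and the name count_warning_sites is made up for illustration.

```python
import re
from collections import Counter

# Modelled on lines such as:
# [   70.902471] WARNING: CPU: 6 PID: 2147 at drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x2b1/0x300
# The optional "\." covers ppc64 traces, where the symbol is printed as ".ttm_bo_release".
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) "
    r"\.?(?P<func>[A-Za-z0-9_]+)\+"
)


def count_warning_sites(dmesg_text):
    """Count kernel WARNING lines per (source file, line, function).

    Hypothetical helper, not part of any kernel tooling.
    """
    sites = Counter()
    for entry in dmesg_text.splitlines():
        match = WARNING_RE.search(entry)
        if match:
            key = (match["file"], int(match["line"]), match["func"])
            sites[key] += 1
    return sites
```

Running it over the output of `dmesg` saved to a file and printing `sites.most_common()` shows whether all warnings come from the same place in ttm_bo.c or from several sites.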
Comment 1 Alex Deucher 2021-03-24 20:09:25 UTC
I don't know that this should have gone to stable.  Looks like Sasha's bot picked it up.  I think it may depend on other ttm changes that are not in stable.
Comment 2 Erhard F. 2021-03-24 23:50:42 UTC
Created attachment 296039
dmesg (kernel 5.11.9, Talos II)

Yep, getting these too on my Talos II (ppc64) on kernel 5.11.9. Card is a Radeon HD6670.

[...]
------------[ cut here ]------------
WARNING: CPU: 8 PID: 1229 at drivers/gpu/drm/ttm/ttm_bo.c:517 .ttm_bo_release+0x298/0x810 [ttm]
Modules linked in: input_leds led_class joydev hid_generic usbhid hid fuse rfkill sd_mod uas usb_storage scsi_mod ecb xts ctr cbc radeon aes_generic libaes evdev snd_hda_codec_hdmi>
CPU: 8 PID: 1229 Comm: X Tainted: G        W         5.11.9-gentoo-TalosII #1
NIP:  c00800001a1a7278 LR: c00800001a1a7264 CTR: 0000000000000000
REGS: c000000144763370 TRAP: 0700   Tainted: G        W          (5.11.9-gentoo-TalosII)
MSR:  9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 44244248  XER: 20040000
CFAR: c000000000bcd1f4 IRQMASK: 0 
 GPR00: c00800001a1a7264 c000000144763610 c00800001a1ba900 c00800001a1b47e8 
 GPR04: 00000000ffffffff 0000000028b10000 00000000722135af 0000000000000008 
 GPR08: 0000000000000001 0000000000000001 0000000000000001 c00000000155d3f8 
 GPR12: 0000000044244248 c0000007ffffd000 0000000000003fff 0000000000000313 
 GPR16: 0000000000000010 000000003ed97f00 00000001467b5a30 0000000000000000 
 GPR20: c000000001206d10 c00000004b960000 c000000001206d18 0000000000000001 
 GPR24: c00000000101deb8 c00000002d5166c0 c00800001a1b47e8 c00000002d516400 
 GPR28: 0000000000000000 c00000002a668a28 c00800001a1b41d0 c00000002d516650 
NIP [c00800001a1a7278] .ttm_bo_release+0x298/0x810 [ttm]
LR [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm]
Call Trace:
[c000000144763610] [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm] (unreliable)
[c0000001447636e0] [c00800001a1ab154] .ttm_bo_move_accel_cleanup+0x2a4/0x570 [ttm]
[c0000001447637a0] [c00800001bcdea3c] .radeon_bo_move+0x40c/0x610 [radeon]
[c000000144763870] [c00800001a1a621c] .ttm_bo_handle_move_mem+0xac/0x1e0 [ttm]
[c000000144763920] [c00800001a1a9570] .ttm_bo_validate+0x1b0/0x260 [ttm]
[c000000144763a20] [c00800001bce1480] .radeon_bo_fault_reserve_notify+0x130/0x260 [radeon]
[c000000144763ae0] [c00800001bcde518] .radeon_ttm_fault+0x98/0x1b0 [radeon]
[c000000144763b70] [c000000000347cf8] .__do_fault+0x58/0x120
[c000000144763bf0] [c00000000034f32c] .handle_mm_fault+0x15ec/0x1f80

Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs nfs_ssc lockd grace sunrpc input_leds led_class joydev hid_generic usbhid hid fuse rfkill sd_mod uas usb_storage scsi_mod ecb >
CPU: 1 PID: 1229 Comm: X Tainted: G      D W         5.11.9-gentoo-TalosII #1
NIP:  c00800001a1a7278 LR: c00800001a1a7264 CTR: c000000000bcd190
REGS: c000000144763370 TRAP: 0700   Tainted: G      D W          (5.11.9-gentoo-TalosII)
MSR:  9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 44044244  XER: 20040000
CFAR: c000000000bcd1f4 IRQMASK: 0 
 GPR00: c00800001a1a7264 c000000144763610 c00800001a1ba900 c00800001a1b47e8 
 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 
 GPR08: 0000000000000001 0000000000000001 0000000000000001 000000007fffffff 
 GPR12: c000000000bcd190 c0000007ffffec00 0000000000000080 0000000146899640 
 GPR16: 00003fffa7cf1208 00003fffa7c8d440 0000000146899840 00003fffa7c889f8 
 GPR20: c000000001206d10 c00000004b960000 c000000001206d18 0000000000000001 
 GPR24: c00000000101deb8 c0000000bd492ac0 c00800001a1b47e8 c0000000bd492800 
 GPR28: 0000000000000000 c00000002a668a28 0000000000000000 c0000000bd492a50 
NIP [c00800001a1a7278] .ttm_bo_release+0x298/0x810 [ttm]
LR [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm]
Call Trace:
[c000000144763610] [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm] (unreliable)
[c0000001447636e0] [c00800001a1ab154] .ttm_bo_move_accel_cleanup+0x2a4/0x570 [ttm]
[c0000001447637a0] [c00800001bcdea3c] .radeon_bo_move+0x40c/0x610 [radeon]
[c000000144763870] [c00800001a1a621c] .ttm_bo_handle_move_mem+0xac/0x1e0 [ttm]
[c000000144763920] [c00800001a1a9570] .ttm_bo_validate+0x1b0/0x260 [ttm]
[c000000144763a20] [c00800001bce1480] .radeon_bo_fault_reserve_notify+0x130/0x260 [radeon]
[c000000144763ae0] [c00800001bcde518] .radeon_ttm_fault+0x98/0x1b0 [radeon]
[c000000144763b70] [c000000000347cf8] .__do_fault+0x58/0x120
[c000000144763bf0] [c00000000034f32c] .handle_mm_fault+0x15ec/0x1f80
[c000000144763d40] [c000000000063e84] .do_page_fault+0x2b4/0xa00
[c000000144763e10] [c00000000000c218] handle_page_fault+0x10/0x2c
--- interrupt: 300 at 0x3fffa7ca6d24
NIP:  00003fffa7ca6d24 LR: 00003fffa7ca6d00 CTR: 00003fffb21186bc
REGS: c000000144763e80 TRAP: 0300   Tainted: G      D W          (5.11.9-gentoo-TalosII)
MSR:  900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI>  CR: 44028842  XER: 00000000
CFAR: 00003fffb3ac5a58 DAR: 00003fffa6727000 DSISR: 42000000 IRQMASK: 0 
 GPR00: 00003fffa7ca6d00 00003fffdfe98710 00003fffa7cf7600 00003fffa78a2010 
 GPR04: 000000014689d8b0 0000000000000011 0000000146fc3b40 0000000000000068 
 GPR08: 0000000000000001 0000000000000001 0000000000000096 0000000000000000 
 GPR12: 0000000000000001 00003fffb4cd4810 0000000000000080 0000000146899640 
 GPR16: 00003fffa7cf1208 00003fffa7c8d440 0000000146899840 00003fffa7c889f8 
 GPR20: 00003fffdfe98ba8 0000000000000001 0000000146899640 000000014787b3c8 
 GPR24: 00003fffa6727000 000000014787b300 00003fffdfe98fb0 00003fffdfe98fa0 
 GPR28: 000000000000002a 000000014770f960 0000000000000068 0000000000000001 
NIP [00003fffa7ca6d24] 0x3fffa7ca6d24
LR [00003fffa7ca6d00] 0x3fffa7ca6d00
--- interrupt: 300
Instruction dump:
e8898188 48009675 e8410028 39200001 7f43d378 993f0058 48008d61 e8410028 
813f009c 39000001 2c290000 7d40409e <0b0a0000> 40820498 39200001 913f0000 
irq event stamp: 2872594
hardirqs last  enabled at (2872593): [<c000000000bcd694>] ._raw_spin_unlock_irqrestore+0x84/0xc0
hardirqs last disabled at (2872594): [<c000000000bc4834>] .__schedule+0x524/0xe00
softirqs last  enabled at (2872210): [<c000000000bcea9c>] .__do_softirq+0x64c/0x6c0
softirqs last disabled at (2872205): [<c0000000000de4b8>] .irq_exit+0x198/0x1f0
---[ end trace e1da02ebb1a3a186 ]---


 # inxi -b
System:    Kernel: 5.11.9-gentoo-TalosII ppc64 bits: 64 Desktop: MATE 1.24.1 
           Distro: Gentoo Base System release 2.7 
Machine:   Type: PowerPC Device System: T2P9D01 REV 1.01 details: PowerNV T2P9D01 REV 1.01 rev: 2.2 (pvr 004e 1202) 
CPU:       Info: 32-Core POWER9 altivec supported [MCP] speed: 2237 MHz min/max: 2154/3800 MHz 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Turks XT [Radeon HD 6670/7670] driver: radeon v: kernel 
           Device-2: ASPEED Graphics Family driver: N/A 
           Display: x11 server: X.Org 1.20.10 driver: ati,radeon unloaded: fbdev,modesetting resolution: 1920x1080~60Hz 
           OpenGL: renderer: AMD TURKS (DRM 2.50.0 / 5.11.9-gentoo-TalosII LLVM 11.0.0) v: 3.2 Mesa 21.0.0 
Network:   Device-1: Broadcom and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3 
           Device-2: Broadcom and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
Comment 3 Christian König 2021-03-25 07:51:52 UTC
Yeah, Alex is right: this patch should never have been backported in the first place.
Comment 4 Alex Deucher 2021-03-25 14:08:52 UTC
Reverted in stable:

commit bec771b5e0901f4b0bc861bcb58056de5151ae3a
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Mar 25 09:52:40 2021 +0100

    Revert "drm/ttm: Warn on pinning without holding a reference"
    
    This reverts commit 7d09e9725b5dcc8d14e101de931e4969d033a6ad which is
    commit 57fcd550eb15bce14a7154736379dfd4ed60ae81 upstream.
    
    It is causing too many warnings on 5.11.y, so should be dropped for now.
    
    Cc: Huang Rui <ray.huang@amd.com>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Christian Koenig <christian.koenig@amd.com>
    Cc: Huang Rui <ray.huang@amd.com>
    Cc: Sasha Levin <sashal@kernel.org>
    Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Link: https://lore.kernel.org/r/8c3da8bc-0bf3-496f-1fd6-4f65a07b2d13@amd.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>