After installing today's release of kernel 5.11.9 I am getting a bunch of kernel warnings like:

[ 70.902471] WARNING: CPU: 6 PID: 2147 at drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x2b1/0x300
[ 70.902481] Modules linked in: nls_iso8859_2 nls_cp852 vfat fat uinput ipt_REJECT nf_reject_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables x_tables kvm_amd kvm pcspkr irqbypass sp5100_tco snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer lm92 lm63 it87 hwmon_vid fam15h_power nfsd k10temp auth_rpcgss fuse oid_registry lockd grace hid_logitech_hidpp hid_logitech_dj hid_generic sr_mod cdrom sd_mod megaraid_sas 8250 8250_base serial_core usbhid sunrpc dm_mod dax
[ 70.902541] CPU: 6 PID: 2147 Comm: kmail Tainted: G W 5.11.9-acrux #9
[ 70.902554] Hardware name: To be filled by O.E.M. To be filled by O.E.M./SABERTOOTH 990FX R2.0, BIOS 2901 05/04/2016
[ 70.902556] RIP: 0010:ttm_bo_release+0x2b1/0x300
[ 70.902559] Code: e8 74 2b 2f 00 e9 d9 fd ff ff 48 8b 7d 88 b9 30 75 00 00 31 d2 be 01 00 00 00 e8 3a 52 2f 00 48 8b 45 d8 eb 9d 4c 89 e0 eb 98 <0f> 0b c7 85 9c 00 00 00 00 00 00 00 4c 89 ef e8 4b f2 ff ff 48 8d
[ 70.902561] RSP: 0018:ffffaf4a83313bb8 EFLAGS: 00010202
[ 70.902566] RAX: 0000000000000001 RBX: ffff961b7553c500 RCX: 0000000000000008
[ 70.902568] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa1db5248
[ 70.902570] RBP: ffff961b7553c570 R08: ffff961ba14f7a38 R09: ffff9621dedaa3f8
[ 70.902571] R10: ffff961ac60c1250 R11: ffff961ac60c1240 R12: ffff961ac5fe5588
[ 70.902572] R13: ffff961b7553c400 R14: 0000000000000000 R15: ffff961ac5fe5f48
[ 70.902573] FS: 00007ff78c2f5f00(0000) GS:ffff9621ded80000(0000) knlGS:0000000000000000
[ 70.902575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.902576] CR2: 00007ff691820000 CR3: 0000000203d5d000 CR4: 00000000000406e0
[ 70.902578] Call Trace:
[ 70.902580] ttm_bo_move_accel_cleanup+0x19d/0x398
[ 70.902583] amdgpu_bo_move+0x15c/0x698
[ 70.902586] ? amdgpu_vram_mgr_new+0x373/0x3d8
[ 70.902587] ttm_bo_handle_move_mem+0x8c/0x170
[ 70.902590] ttm_bo_validate+0x147/0x178
[ 70.902592] amdgpu_bo_fault_reserve_notify+0xbf/0x148
[ 70.902594] amdgpu_ttm_fault+0x33/0x80
[ 70.902596] __do_fault+0x33/0x90
[ 70.902599] handle_mm_fault+0xdff/0x1498
[ 70.902601] exc_page_fault+0x1a5/0x500
[ 70.902604] ? exit_to_user_mode_prepare+0x5d/0x118
[ 70.902607] ? asm_exc_page_fault+0x8/0x30
[ 70.902609] asm_exc_page_fault+0x1e/0x30
[ 70.902611] RIP: 0033:0x7ff78d8f290d
[ 70.902612] Code: 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 6f 46 bc f3 0f 6f 4e cc 4c 8b 4e dc 4c 8b 56 e4 4c 8b 5e ec 48 8b 4e f4 8b 56 fc <f3> 0f 7f 47 bc f3 0f 7f 4f cc 4c 89 4f dc 4c 89 57 e4 4c 89 5f ec
[ 70.902614] RSP: 002b:00007ffdbd5bb838 EFLAGS: 00010207
[ 70.902616] RAX: 00007ff691820000 RBX: 0000000000000044 RCX: 3f80000000000000
[ 70.902617] RDX: 0000000000000000 RSI: 000055d1049acad4 RDI: 00007ff691820044
[ 70.902618] RBP: 000055d1049aca90 R08: 000055d103fd9a30 R09: 0000000000000000
[ 70.902619] R10: 000000003f800000 R11: 3f800000bf800000 R12: 0000000000000000
[ 70.902620] R13: 000055d103f58320 R14: 0000000000000000 R15: 0000000000000044

Apparently the warnings show after commit 7d09e9725b5dcc8d14e101de931e4969d033a6ad, which explains that the warning is triggered by "very likely a driver bug".
I don't know that this should have gone to stable. Looks like Sasha's bot picked it up. I think it may depend on other ttm changes that are not in stable.
Created attachment 296039
dmesg (kernel 5.11.9, Talos II)

Yep, getting these too on my Talos II (ppc64) on kernel 5.11.9. Card is a Radeon HD6670.

[...]
------------[ cut here ]------------
WARNING: CPU: 8 PID: 1229 at drivers/gpu/drm/ttm/ttm_bo.c:517 .ttm_bo_release+0x298/0x810 [ttm]
Modules linked in: input_leds led_class joydev hid_generic usbhid hid fuse rfkill sd_mod uas usb_storage scsi_mod ecb xts ctr cbc radeon aes_generic libaes evdev snd_hda_codec_hdmi>
CPU: 8 PID: 1229 Comm: X Tainted: G W 5.11.9-gentoo-TalosII #1
NIP: c00800001a1a7278 LR: c00800001a1a7264 CTR: 0000000000000000
REGS: c000000144763370 TRAP: 0700 Tainted: G W (5.11.9-gentoo-TalosII)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 44244248 XER: 20040000
CFAR: c000000000bcd1f4 IRQMASK: 0
GPR00: c00800001a1a7264 c000000144763610 c00800001a1ba900 c00800001a1b47e8
GPR04: 00000000ffffffff 0000000028b10000 00000000722135af 0000000000000008
GPR08: 0000000000000001 0000000000000001 0000000000000001 c00000000155d3f8
GPR12: 0000000044244248 c0000007ffffd000 0000000000003fff 0000000000000313
GPR16: 0000000000000010 000000003ed97f00 00000001467b5a30 0000000000000000
GPR20: c000000001206d10 c00000004b960000 c000000001206d18 0000000000000001
GPR24: c00000000101deb8 c00000002d5166c0 c00800001a1b47e8 c00000002d516400
GPR28: 0000000000000000 c00000002a668a28 c00800001a1b41d0 c00000002d516650
NIP [c00800001a1a7278] .ttm_bo_release+0x298/0x810 [ttm]
LR [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm]
Call Trace:
[c000000144763610] [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm] (unreliable)
[c0000001447636e0] [c00800001a1ab154] .ttm_bo_move_accel_cleanup+0x2a4/0x570 [ttm]
[c0000001447637a0] [c00800001bcdea3c] .radeon_bo_move+0x40c/0x610 [radeon]
[c000000144763870] [c00800001a1a621c] .ttm_bo_handle_move_mem+0xac/0x1e0 [ttm]
[c000000144763920] [c00800001a1a9570] .ttm_bo_validate+0x1b0/0x260 [ttm]
[c000000144763a20] [c00800001bce1480] .radeon_bo_fault_reserve_notify+0x130/0x260 [radeon]
[c000000144763ae0] [c00800001bcde518] .radeon_ttm_fault+0x98/0x1b0 [radeon]
[c000000144763b70] [c000000000347cf8] .__do_fault+0x58/0x120
[c000000144763bf0] [c00000000034f32c] .handle_mm_fault+0x15ec/0x1f80
Modules linked in: auth_rpcgss nfsv4 dns_resolver nfs nfs_ssc lockd grace sunrpc input_leds led_class joydev hid_generic usbhid hid fuse rfkill sd_mod uas usb_storage scsi_mod ecb >
CPU: 1 PID: 1229 Comm: X Tainted: G D W 5.11.9-gentoo-TalosII #1
NIP: c00800001a1a7278 LR: c00800001a1a7264 CTR: c000000000bcd190
REGS: c000000144763370 TRAP: 0700 Tainted: G D W (5.11.9-gentoo-TalosII)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 44044244 XER: 20040000
CFAR: c000000000bcd1f4 IRQMASK: 0
GPR00: c00800001a1a7264 c000000144763610 c00800001a1ba900 c00800001a1b47e8
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR08: 0000000000000001 0000000000000001 0000000000000001 000000007fffffff
GPR12: c000000000bcd190 c0000007ffffec00 0000000000000080 0000000146899640
GPR16: 00003fffa7cf1208 00003fffa7c8d440 0000000146899840 00003fffa7c889f8
GPR20: c000000001206d10 c00000004b960000 c000000001206d18 0000000000000001
GPR24: c00000000101deb8 c0000000bd492ac0 c00800001a1b47e8 c0000000bd492800
GPR28: 0000000000000000 c00000002a668a28 0000000000000000 c0000000bd492a50
NIP [c00800001a1a7278] .ttm_bo_release+0x298/0x810 [ttm]
LR [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm]
Call Trace:
[c000000144763610] [c00800001a1a7264] .ttm_bo_release+0x284/0x810 [ttm] (unreliable)
[c0000001447636e0] [c00800001a1ab154] .ttm_bo_move_accel_cleanup+0x2a4/0x570 [ttm]
[c0000001447637a0] [c00800001bcdea3c] .radeon_bo_move+0x40c/0x610 [radeon]
[c000000144763870] [c00800001a1a621c] .ttm_bo_handle_move_mem+0xac/0x1e0 [ttm]
[c000000144763920] [c00800001a1a9570] .ttm_bo_validate+0x1b0/0x260 [ttm]
[c000000144763a20] [c00800001bce1480] .radeon_bo_fault_reserve_notify+0x130/0x260 [radeon]
[c000000144763ae0] [c00800001bcde518] .radeon_ttm_fault+0x98/0x1b0 [radeon]
[c000000144763b70] [c000000000347cf8] .__do_fault+0x58/0x120
[c000000144763bf0] [c00000000034f32c] .handle_mm_fault+0x15ec/0x1f80
[c000000144763d40] [c000000000063e84] .do_page_fault+0x2b4/0xa00
[c000000144763e10] [c00000000000c218] handle_page_fault+0x10/0x2c
--- interrupt: 300 at 0x3fffa7ca6d24
NIP: 00003fffa7ca6d24 LR: 00003fffa7ca6d00 CTR: 00003fffb21186bc
REGS: c000000144763e80 TRAP: 0300 Tainted: G D W (5.11.9-gentoo-TalosII)
MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 44028842 XER: 00000000
CFAR: 00003fffb3ac5a58 DAR: 00003fffa6727000 DSISR: 42000000 IRQMASK: 0
GPR00: 00003fffa7ca6d00 00003fffdfe98710 00003fffa7cf7600 00003fffa78a2010
GPR04: 000000014689d8b0 0000000000000011 0000000146fc3b40 0000000000000068
GPR08: 0000000000000001 0000000000000001 0000000000000096 0000000000000000
GPR12: 0000000000000001 00003fffb4cd4810 0000000000000080 0000000146899640
GPR16: 00003fffa7cf1208 00003fffa7c8d440 0000000146899840 00003fffa7c889f8
GPR20: 00003fffdfe98ba8 0000000000000001 0000000146899640 000000014787b3c8
GPR24: 00003fffa6727000 000000014787b300 00003fffdfe98fb0 00003fffdfe98fa0
GPR28: 000000000000002a 000000014770f960 0000000000000068 0000000000000001
NIP [00003fffa7ca6d24] 0x3fffa7ca6d24
LR [00003fffa7ca6d00] 0x3fffa7ca6d00
--- interrupt: 300
Instruction dump:
e8898188 48009675 e8410028 39200001 7f43d378 993f0058 48008d61 e8410028
813f009c 39000001 2c290000 7d40409e <0b0a0000> 40820498 39200001 913f0000
irq event stamp: 2872594
hardirqs last enabled at (2872593): [<c000000000bcd694>] ._raw_spin_unlock_irqrestore+0x84/0xc0
hardirqs last disabled at (2872594): [<c000000000bc4834>] .__schedule+0x524/0xe00
softirqs last enabled at (2872210): [<c000000000bcea9c>] .__do_softirq+0x64c/0x6c0
softirqs last disabled at (2872205): [<c0000000000de4b8>] .irq_exit+0x198/0x1f0
---[ end trace e1da02ebb1a3a186 ]---

# inxi -b
System: Kernel: 5.11.9-gentoo-TalosII ppc64 bits: 64 Desktop: MATE 1.24.1 Distro: Gentoo Base System release 2.7
Machine: Type: PowerPC Device System: T2P9D01 REV 1.01 details: PowerNV T2P9D01 REV 1.01 rev: 2.2 (pvr 004e 1202)
CPU: Info: 32-Core POWER9 altivec supported [MCP] speed: 2237 MHz min/max: 2154/3800 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Turks XT [Radeon HD 6670/7670] driver: radeon v: kernel
  Device-2: ASPEED Graphics Family driver: N/A
  Display: x11 server: X.Org 1.20.10 driver: ati,radeon unloaded: fbdev,modesetting resolution: 1920x1080~60Hz
  OpenGL: renderer: AMD TURKS (DRM 2.50.0 / 5.11.9-gentoo-TalosII LLVM 11.0.0) v: 3.2 Mesa 21.0.0
Network: Device-1: Broadcom and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
  Device-2: Broadcom and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
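For anyone comparing their own dmesg against the reports above: a rough Python sketch that pulls the warning site (source file, line, function) out of `WARNING:` headlines, so you can check whether you are hitting the same `ttm_bo.c:517` site. The names `WARNING_RE` and `warning_sites` are made up for this illustration, and the pattern is only fitted to the two headline shapes quoted in this report (x86_64 without a module tag, ppc64 with a leading dot and a `[ttm]` tag); other kernels may format the line differently.

```python
import re

# Hypothetical helper, fitted only to the two headlines quoted in this report.
# Example input:
#   WARNING: CPU: 6 PID: 2147 at drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x2b1/0x300
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    # ppc64 prints a leading dot before the symbol name; skip it if present.
    r"\.?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    # Optional module tag such as "[ttm]" when the code is built as a module.
    r"(?: \[(?P<module>\w+)\])?"
)


def warning_sites(log_text):
    """Return the distinct (file, line, function) warning sites, sorted."""
    sites = set()
    for match in WARNING_RE.finditer(log_text):
        sites.add((match["file"], int(match["line"]), match["func"]))
    return sorted(sites)


if __name__ == "__main__":
    # The two headlines from this report (x86_64 and ppc64).
    sample = "\n".join([
        "[ 70.902471] WARNING: CPU: 6 PID: 2147 at "
        "drivers/gpu/drm/ttm/ttm_bo.c:517 ttm_bo_release+0x2b1/0x300",
        "WARNING: CPU: 8 PID: 1229 at "
        "drivers/gpu/drm/ttm/ttm_bo.c:517 .ttm_bo_release+0x298/0x810 [ttm]",
    ])
    for site in warning_sites(sample):
        print(site)
```

Fed the two headlines above, both collapse to a single site, which matches the observation that the amdgpu and radeon reports trip the same check in TTM rather than two separate driver paths.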
Yeah, Alex is right: this patch should never have been backported in the first place.
Reverted in stable:

commit bec771b5e0901f4b0bc861bcb58056de5151ae3a
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Mar 25 09:52:40 2021 +0100

    Revert "drm/ttm: Warn on pinning without holding a reference"

    This reverts commit 7d09e9725b5dcc8d14e101de931e4969d033a6ad which is
    commit 57fcd550eb15bce14a7154736379dfd4ed60ae81 upstream.

    It is causing too many warnings on 5.11.y, so should be dropped for now.

    Cc: Huang Rui <ray.huang@amd.com>
    Cc: Christian König <christian.koenig@amd.com>
    Cc: Daniel Vetter <daniel.vetter@intel.com>
    Cc: Christian Koenig <christian.koenig@amd.com>
    Cc: Huang Rui <ray.huang@amd.com>
    Cc: Sasha Levin <sashal@kernel.org>
    Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
    Link: https://lore.kernel.org/r/8c3da8bc-0bf3-496f-1fd6-4f65a07b2d13@amd.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>