Bug 204181 - NULL pointer dereference regression in amdgpu
Summary: NULL pointer dereference regression in amdgpu
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 high
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-15 10:11 UTC by Sergey Kondakov
Modified: 2019-10-12 20:19 UTC
CC List: 23 users

See Also:
Kernel Version: 5.2.1
Tree: Mainline
Regression: No


Attachments
dmesg (164.81 KB, text/plain)
2019-07-15 10:11 UTC, Sergey Kondakov
dmesg with "drm=debug=4" (171.17 KB, text/plain)
2019-07-15 15:43 UTC, Sergey Kondakov
kernel build config (227.25 KB, application/x-config)
2019-07-15 15:43 UTC, Sergey Kondakov
amdgpu parameters (2.83 KB, text/plain)
2019-07-15 15:45 UTC, Sergey Kondakov
X.log (53.91 KB, text/plain)
2019-07-15 15:48 UTC, Sergey Kondakov
lsmem (10.04 KB, text/plain)
2019-07-15 15:50 UTC, Sergey Kondakov
lspci -vv (57.29 KB, text/plain)
2019-07-15 15:50 UTC, Sergey Kondakov
lspci -t -PP -q -k -v (2.57 KB, text/plain)
2019-07-15 15:53 UTC, Sergey Kondakov
/proc/interrupts (5.40 KB, text/plain)
2019-07-15 15:59 UTC, Sergey Kondakov
dmesg with "drm.debug=4" (237.09 KB, text/plain)
2019-07-16 15:29 UTC, Sergey Kondakov
tail -n 2000 from dmesg with "drm.debug=5" (214.64 KB, application/octet-stream)
2019-07-16 16:36 UTC, Sergey Kondakov
dmesg_2019-08-02-amdgpu_fail_on_patched_5.2.5 (181.40 KB, text/plain)
2019-08-02 02:21 UTC, Sergey Kondakov
dmesg_2019-08-04-amdgpu-new_dereference-with-shadowprimary (175.65 KB, text/plain)
2019-08-04 05:17 UTC, Sergey Kondakov
dmesg_2019-09-26-amdgpu-old_dereference_on_patched_5.3.1 (197.49 KB, text/plain)
2019-09-27 03:50 UTC, Sergey Kondakov

Description Sergey Kondakov 2019-07-15 10:11:49 UTC
Created attachment 283693 [details]
dmesg

After updating from 5.1 to 5.2.1, I now get a complete lock-up of video output after about 5-10 minutes of watching a YouTube video in Firefox, and I am unable to shut down using the power button. Using the "magic keys" (SysRq) allows me to reboot and get the kernel log via `journalctl -b -1 -k`. Here is the relevant part:
BUG: kernel NULL pointer dereference, address: 00000000000002b4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 8200 Comm: kworker/u16:1 Tainted: G          IO      5.2.1-1383.gd5bbc26-HSF #1 openSUSE Tumbleweed (unreleased)
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
Workqueue: events_unbound commit_work
RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Code: 04 00 00 49 8b bc 02 80 02 00 00 48 8b 07 48 8b 40 50 e8 ed 88 a8 d6 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00 00
RSP: 0018:ffffa5568b1b7c00 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
RDX: ffffffffc07fbd50 RSI: 0000000000000000 RDI: ffff8e9ee9500000
RBP: ffff8e9d90618000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffa5568b1b7c30 R11: 0000000000000000 R12: ffff8e9ee9500000
R13: ffff8e9ededb4448 R14: ffff8e9e47e10c00 R15: ffff8e9ededa0000
FS:  0000000000000000(0000) GS:ffff8e9eee000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 00000003bcf46000 CR4: 00000000000406e0
Call Trace:
 dc_commit_state+0x79/0xb0 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0x3c0/0xdb0 [amdgpu]
 ? finish_task_switch+0x74/0x300
 ? __switch_to+0x152/0x4e0
 ? __switch_to_asm+0x34/0x70
 ? __lock_acquire+0x3c8/0x7a0
 ? find_held_lock+0x32/0x90
 ? find_held_lock+0x32/0x90
 ? sched_clock+0x5/0x10
 ? mark_held_locks+0x2d/0x80
 ? preempt_count_sub+0x98/0xe0
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? wait_for_completion_timeout+0xe9/0x110
 ? commit_tail+0x3c/0x70
 commit_tail+0x3c/0x70
 process_one_work+0x271/0x5f0
 worker_thread+0x4a/0x3d0
 ? process_one_work+0x5f0/0x5f0
 kthread+0x118/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x27/0x50
Modules linked in: af_packet ts_bm xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq zram snd_pcm_oss rfcomm snd_mixer_oss it87 hwmon_vid bnep msr rc_avermedia tuner_simple tuner_types amd64_edac_mod tuner tda7432 edac_mce_amd btusb tvaudio kvm_amd ath9k btrtl btbcm msp3400 ath9k_common btintel bluetooth ath9k_hw kvm irqbypass ath bttv tea575x joydev tveeprom videobuf_dma_sg snd_usb_audio videobuf_core snd_usbmidi_lib rc_core snd_rawmidi snd_hda_codec_realtek mac80211 snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi v4l2_common snd_seq_device
 snd_hda_intel videodev sp5100_tco pcspkr snd_hda_codec wmi_bmof mxm_wmi amdgpu fam15h_power k10temp media i2c_piix4 cfg80211 r8169 snd_hda_core gpu_sched realtek snd_hwdep libphy ttm rfkill snd_pcm mac_hid hid_generic usbhid uas usb_storage ohci_pci serio_raw sd_mod ehci_pci ohci_hcd xhci_pci ehci_hcd xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs
CR2: 00000000000002b4
---[ end trace 0633d97cb3f2d2d6 ]---
RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Code: 04 00 00 49 8b bc 02 80 02 00 00 48 8b 07 48 8b 40 50 e8 ed 88 a8 d6 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00 00
RSP: 0018:ffffa5568b1b7c00 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000002
RDX: ffffffffc07fbd50 RSI: 0000000000000000 RDI: ffff8e9ee9500000
RBP: ffff8e9d90618000 R08: 0000000000000001 R09: 0000000000000000
R10: ffffa5568b1b7c30 R11: 0000000000000000 R12: ffff8e9ee9500000
R13: ffff8e9ededb4448 R14: ffff8e9e47e10c00 R15: ffff8e9ededa0000
FS:  0000000000000000(0000) GS:ffff8e9eee000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002b4 CR3: 00000003bcf46000 CR4: 00000000000406e0
Comment 1 Nicholas Kazlauskas 2019-07-15 13:07:54 UTC
Do you mind posting a dmesg log with drm=debug=4 as part of your boot parameters?

An xorg log would be good too if applicable.

I'm curious to know what the actual sequence / system setup is for reproducing this as this isn't really a typical sequence. I think you'd run into other NULL pointer dereferences even if this one is guarded.

I think the stream itself is NULL and it shouldn't be in the context.
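
Note on the "stream itself is NULL" reading: as a rough cross-check, the bytes at the `<8b>` marker in the Code: line of the oops can be decoded by hand. The sketch below is a minimal Python illustration, not a general disassembler: it handles only the single `MOV r32, [base + disp32]` form that appears here, and the helper names (`faulting_bytes`, `decode_mov_disp32`) are made up for this example. Under the x86-64 SysV calling convention RSI carries the second integer argument, which is presumably the stream pointer passed to dc_stream_log.

```python
# Register names indexed by the 3-bit ModRM "rm" field (no REX.B extension).
REGS = ["RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI"]


def faulting_bytes(code_line):
    """Return the instruction bytes starting at the <xx> marker of a Code: line."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_disp32(insn):
    """Decode only opcode 0x8b with mod=10: MOV r32, [base + disp32].

    Returns (base_register, displacement); raises ValueError for anything else.
    """
    if len(insn) < 6:
        raise ValueError("need at least opcode, ModRM and 4 displacement bytes")
    opcode, modrm = insn[0], insn[1]
    mod, rm = modrm >> 6, modrm & 0b111
    # rm == 0b100 would mean a SIB byte follows, which this sketch does not handle.
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not a plain MOV r32, [base + disp32]")
    disp = int.from_bytes(bytes(insn[2:6]), "little", signed=True)
    return REGS[rm], disp


# Tail of the Code: line and the RSI value, copied from the oops in this report.
code = "0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2"
registers = {"RSI": 0x0000000000000000}

base, disp = decode_mov_disp32(faulting_bytes(code))
fault_address = registers[base] + disp
print(base, hex(disp), hex(fault_address))
```

The base register comes out as RSI with displacement 0x2b4, and since RSI is 0 in the register dump, the computed address equals the reported CR2 of 00000000000002b4. That is consistent with a read of a field at offset 0x2b4 through a NULL pointer, although it does not by itself say why the pointer was NULL.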
Comment 2 Sergey Kondakov 2019-07-15 15:43:03 UTC
Created attachment 283695 [details]
dmesg with "drm=debug=4"
Comment 3 Sergey Kondakov 2019-07-15 15:43:30 UTC
Created attachment 283697 [details]
kernel build config
Comment 4 Sergey Kondakov 2019-07-15 15:45:40 UTC
Created attachment 283699 [details]
amdgpu parameters

These don't seem to change anything about the hang. With larger scheduling limits (max_num_of_queues_per_device, sched_hw_submission, sched_jobs) the hang may happen sooner, but I'm not sure.
Comment 5 Sergey Kondakov 2019-07-15 15:48:51 UTC
Created attachment 283701 [details]
X.log

amdgpu has TearFree and VariableRefresh enabled (the LCDs don't support the latter, though). Dual-screen setup with two 60 fps 1080p LCDs (one VA, one TN), recently overclocked to ~73 and ~72 fps via CVT-1.2 modelines on both Linux and Windows.
Comment 6 Sergey Kondakov 2019-07-15 15:50:36 UTC
Created attachment 283703 [details]
lsmem
Comment 7 Sergey Kondakov 2019-07-15 15:50:56 UTC
Created attachment 283705 [details]
lspci -vv
Comment 8 Sergey Kondakov 2019-07-15 15:53:09 UTC
Created attachment 283707 [details]
lspci -t -PP -q -k -v
Comment 9 Sergey Kondakov 2019-07-15 15:56:11 UTC
(In reply to Nicholas Kazlauskas from comment #1)
> Do you mind posting a dmesg log with drm=debug=4 as part of your boot
> parameters?
> 
> An xorg log would be good too if applicable.
> 
> I'm curious to know what the actual sequence / system setup is for
> reproducing this as this isn't really a typical sequence. I think you'd run
> into other NULL pointer dereferences even if this one is guarded.
> 
> I think the stream itself is NULL and it shouldn't be in the context.

I don't think putting 'drm=debug=4' on the boot command line changed anything, but here's some more data. I also recently stumbled into another baffling regression (bug #203703, from 5.0 to 5.1) in network packet scheduling (the fq_codel qdisc) that halts the affected Ethernet device. It also produces a repeatable kernel trace on random network activity unless the qdisc is changed to the dumb "pfifo_fast" early on, much like this bug produces the same repeatable amdgpu trace on some random GPU activity. Weird.
Comment 10 Nicholas Kazlauskas 2019-07-15 15:58:19 UTC
Thanks for all the logs.

I meant drm.debug=4 actually, the drm=debug=4 was a typo on my part - sorry!
Comment 11 Sergey Kondakov 2019-07-15 15:59:08 UTC
Created attachment 283709 [details]
/proc/interrupts
Comment 12 Sergey Kondakov 2019-07-16 15:29:06 UTC
Created attachment 283741 [details]
dmesg with "drm.debug=4"

Here's the actual debug dmesg. The PCI subsystem uses 'pci=x=y' syntax, so it didn't occur to me that the same form wouldn't be valid for drm.

Just as I was about to upload the first dump from a hang with debugging enabled, which happened after >16 hours of uptime and >30 minutes of video, it crashed again before Firefox even had a chance to render a single page. That page happened to be the same YouTube page everything had hung on, because Firefox starts at the last opened page. So, after >30 minutes the first time, it took less than a second to hang again. This dump is from that time.

I haven't tried launching a local video player or a 3D app. Without opening YouTube or a video in Firefox, ordinary 2D non-accelerated desktop work doesn't seem to trigger it.
Comment 13 Sergey Kondakov 2019-07-16 16:36:41 UTC
Created attachment 283745 [details]
tail -n 2000 from dmesg with "drm.debug=5"

drm.debug=4 seems to produce only 1 new relevant line:
"[drm:dc_commit_state [amdgpu]] dc_commit_state: 2 streams"
so I tried increasing it. drm.debug=5 creates a horrible stream that bogs down the system with I/O load from journald, but it sure did write some more at the moment of the hang. I'm not going any further than that, though.
Comment 14 Sergey Kondakov 2019-07-16 16:52:30 UTC
(In reply to Nicholas Kazlauskas from comment #10)
> Thanks for all the logs.
> 
> I meant drm.debug=4 actually, the drm=debug=4 was a typo on my part - sorry!

So, I've got all I could on this.

Could this be related to my recent LCD overclock? I haven't tried going back to 60 fps yet.
The cvt executable and modes/xf86cvt.c in the X server haven't been updated for years and can't produce CVT-1.2 modes or any useful "reduced blanking" modes, so I had to resort to things like:
https://github.com/kevinlekiller/cvt_modeline_calculator_12 and
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=899066
On Windows I had to use https://www.monitortests.com/forum/Thread-Custom-Resolution-Utility-CRU because the AMD driver refuses to use the custom modes it generates itself, nagging with "unsupported" (yeah, right…) "errors".
Comment 15 Nicholas Kazlauskas 2019-07-16 16:55:14 UTC
Thanks for the logs. I don't think this is related to your overclock.

Since this behavior wasn't previously observed during our 5.2 testing I think that either a patch got lost or changed during the submission process, or something from 5.3 was backported into 5.2 that shouldn't have been.

I don't think it's necessarily setup specific.
Comment 16 Sergey Kondakov 2019-07-24 18:33:04 UTC
(In reply to Nicholas Kazlauskas from comment #15)
> Thanks for the logs. I don't think this is related to your overclock.
> 
> Since this behavior wasn't previously observed during our 5.2 testing I
> think that either a patch got lost or changed during the submission process,
> or something from 5.3 was backported into 5.2 that shouldn't have been.
> 
> I don't think it's necessarily setup specific.

Does that mean you were able to reproduce it? If so, is there any known workaround or ETA for the fix? Is 5.3-rc1 affected? Any plans to backport to 5.2.x?
Comment 17 Yann HN 2019-07-25 10:52:44 UTC
I was facing the same issue: video output stopped completely and the X server process became unresponsive.

I had switched hardware the day before:
GPU: PNY GTX 1060 -> Asus Vega 56
Mainboard: Asus Z370P -> MSI Z390A Pro

A friend suggested that I install some packages to enhance GPU support; one of them was "xf86-video-amdgpu".

It seems that package was responsible for the issue.
Removing it fixed the issue without any other notable effects.

Some more info for context:
X: X.Org X Server 1.20.5
Desktop: plasmashell 5.16.3
Kernel: 5.2.2-arch1-1-ARCH #1 SMP PREEMPT Sun Jul 21 19:18:34 UTC 2019 x86_64 GNU/Linux
Comment 18 Michel Dänzer 2019-07-25 14:21:55 UTC
(In reply to Yann HN from comment #17)
> A friend suggested me to install some packages to enhance the GPU Support,
> one of them was "xf86-video-amdgpu".
> 
> Seems like that package was responsible for the issues.
> Removing it fixed the issue without any other (notable) effects.

Did you get the same amdgpu_dm_atomic_commit_tail => dc_commit_state => dc_stream_log NULL pointer dereference as reported here?

If yes, this is a kernel driver bug, xf86-video-amdgpu just triggers it / the Xorg modesetting driver avoids it somehow.

If not, please file your own report at https://bugs.freedesktop.org/enter_bug.cgi?product=xorg&component=Driver/AMDgpu and attach the corresponding Xorg log file and output of dmesg.
Comment 19 Yann HN 2019-07-25 15:42:40 UTC
(In reply to Michel Dänzer from comment #18)
> (In reply to Yann HN from comment #17)
> > A friend suggested me to install some packages to enhance the GPU Support,
> > one of them was "xf86-video-amdgpu".
> > 
> > Seems like that package was responsible for the issues.
> > Removing it fixed the issue without any other (notable) effects.
> 
> Did you get the same amdgpu_dm_atomic_commit_tail => dc_commit_state =>
> dc_stream_log NULL pointer dereference as reported here?
> 
> If yes, this is a kernel driver bug, xf86-video-amdgpu just triggers it /
> the Xorg modesetting driver avoids it somehow.
> 
> If not, please file your own report at
> https://bugs.freedesktop.org/enter_bug.cgi?product=xorg&component=Driver/
> AMDgpu and attach the corresponding Xorg log file and output of dmesg.

Yes, I reinstalled the package and was able to reproduce the error pretty quickly, which confirms the package as the trigger. Here is the whole stack trace:

Jul 25 17:38:12 arch-workstation kernel: BUG: kernel NULL pointer dereference, address: 00000000000002b4
Jul 25 17:38:12 arch-workstation kernel: #PF: supervisor read access in kernel mode
Jul 25 17:38:12 arch-workstation kernel: #PF: error_code(0x0000) - not-present page
Jul 25 17:38:12 arch-workstation kernel: PGD 0 P4D 0 
Jul 25 17:38:12 arch-workstation kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jul 25 17:38:12 arch-workstation kernel: CPU: 3 PID: 296 Comm: kworker/u24:4 Not tainted 5.2.2-arch1-1-ARCH #1
Jul 25 17:38:12 arch-workstation kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B98/Z390-A PRO (MS-7B98), BIOS 1.60 03/21/2019
Jul 25 17:38:12 arch-workstation kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Jul 25 17:38:12 arch-workstation kernel: RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Jul 25 17:38:12 arch-workstation kernel: Code: 04 00 00 49 8b bc 02 80 02 00 00 48 8b 07 48 8b 40 50 e8 1d 35 f7 cd b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00>
Jul 25 17:38:12 arch-workstation kernel: RSP: 0018:ffff9ced83f5faf0 EFLAGS: 00010202
Jul 25 17:38:12 arch-workstation kernel: RAX: 0000000000000000 RBX: ffff8b9687199000 RCX: 0000000000000002
Jul 25 17:38:12 arch-workstation kernel: RDX: ffffffffc1112710 RSI: 0000000000000000 RDI: ffff8b9687199000
Jul 25 17:38:12 arch-workstation kernel: RBP: ffff8b95c7868000 R08: ffff8b95c7868000 R09: 0000000000000000
Jul 25 17:38:12 arch-workstation kernel: R10: ffff8b95c7868000 R11: 0000000000000018 R12: 0000000000000001
Jul 25 17:38:12 arch-workstation kernel: R13: ffff9ced83f5fd58 R14: ffff8b967420cff0 R15: 0000000000000000
Jul 25 17:38:12 arch-workstation kernel: FS:  0000000000000000(0000) GS:ffff8b968d8c0000(0000) knlGS:0000000000000000
Jul 25 17:38:12 arch-workstation kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 25 17:38:12 arch-workstation kernel: CR2: 00000000000002b4 CR3: 000000080c284006 CR4: 00000000003606e0
Jul 25 17:38:12 arch-workstation kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 25 17:38:12 arch-workstation kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jul 25 17:38:12 arch-workstation kernel: Call Trace:
Jul 25 17:38:12 arch-workstation kernel:  dc_commit_state+0x9a/0x5a0 [amdgpu]
Jul 25 17:38:12 arch-workstation kernel:  ? dm_plane_helper_cleanup_fb+0xa3/0x120 [amdgpu]
Jul 25 17:38:12 arch-workstation kernel:  amdgpu_dm_atomic_commit_tail+0xc5d/0x1a10 [amdgpu]
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x34/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? __switch_to_asm+0x40/0x70
Jul 25 17:38:12 arch-workstation kernel:  ? _raw_spin_unlock_irq+0x1d/0x30
Jul 25 17:38:12 arch-workstation kernel:  ? finish_task_switch+0x84/0x2d0
Jul 25 17:38:12 arch-workstation kernel:  ? preempt_schedule_common+0x32/0x80
Jul 25 17:38:12 arch-workstation kernel:  ? commit_tail+0x3c/0x70 [drm_kms_helper]
Jul 25 17:38:12 arch-workstation kernel:  commit_tail+0x3c/0x70 [drm_kms_helper]
Jul 25 17:38:12 arch-workstation kernel:  process_one_work+0x1d1/0x3e0
Jul 25 17:38:12 arch-workstation kernel:  worker_thread+0x4a/0x3d0
Jul 25 17:38:12 arch-workstation kernel:  kthread+0xfb/0x130
Jul 25 17:38:12 arch-workstation kernel:  ? process_one_work+0x3e0/0x3e0
Jul 25 17:38:12 arch-workstation kernel:  ? kthread_park+0x90/0x90
Jul 25 17:38:12 arch-workstation kernel:  ret_from_fork+0x35/0x40
Jul 25 17:38:12 arch-workstation kernel: Modules linked in: fuse xt_nat xt_tcpudp veth xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defra>
Jul 25 17:38:12 arch-workstation kernel:  snd_usbmidi_lib ppdev iTCO_wdt snd_hda_codec iTCO_vendor_support snd_rawmidi snd_seq_device media snd_hda_core agpgart snd_hwdep syscopyarea snd_pcm aesni_intel sysfillrect snd_timer aes_x86_64 c>
Jul 25 17:38:12 arch-workstation kernel: CR2: 00000000000002b4
Jul 25 17:38:12 arch-workstation kernel: ---[ end trace 8659bfc7daefd7ef ]---
Jul 25 17:38:12 arch-workstation kernel: RIP: 0010:dc_stream_log+0x6/0xb0 [amdgpu]
Jul 25 17:38:12 arch-workstation kernel: Code: 04 00 00 49 8b bc 02 80 02 00 00 48 8b 07 48 8b 40 50 e8 1d 35 f7 cd b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 <8b> 86 b4 02 00 00 48 89 f3 48 89 f2 8b 8e 10 01 00 00 bf 04 00>
Jul 25 17:38:12 arch-workstation kernel: RSP: 0018:ffff9ced83f5faf0 EFLAGS: 00010202
Comment 20 Nicholas Kazlauskas 2019-07-25 15:50:25 UTC
I haven't been able to reproduce this on my setup yet with xf86-video-amdgpu on Arch's 5.2.2 kernel. I don't see anything really missing between that and staging that could affect this issue.

It would probably help to have a dmesg log with drm.debug=0x54 - this will enable DRM atomic state debug prints.

You'll probably need to increase your log buffer size to get the state relevant to the crash.

ie: " log_buf_len=64M drm.debug=84 "
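
For reference, the two spellings above name the same value (0x54 hex is 84 decimal), and drm.debug is a bitmask of message categories. The sketch below decodes such a mask in Python. The bit table is an assumption taken from the DRM_UT_* categories in the kernel's drm_print.h around the 5.x series; it may be incomplete or differ in other kernel versions, and `decode_drm_debug` is a hypothetical helper name.

```python
# Assumed mapping of drm.debug bits to categories (DRM_UT_* in drm_print.h).
# Check the header of the kernel actually in use before relying on it.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


# int(text, 0) accepts both the hex and the decimal spelling used in this thread.
for text in ("4", "5", "0x54", "84"):
    print(text, decode_drm_debug(int(text, 0)))
```

Under that table, drm.debug=4 enables only KMS messages, drm.debug=5 adds the much noisier CORE messages, and drm.debug=0x54 enables KMS, ATOMIC and STATE, the last being the atomic state dumps requested here.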
Comment 21 Frank Steinborn 2019-07-26 12:23:43 UTC
Facing the same issue (Vega64). I captured a dmesg (drm.debug=0x54) with lockup and uploaded it here:

https://nognu.de/p/dmesg_amdgpu.txt

Thanks!
Comment 22 Nicholas Kazlauskas 2019-07-26 16:02:22 UTC
Thanks for the log!

I can reproduce the issue now by emulating the sequence using IGT. It doesn't seem to show up in desktop usage for me.
Comment 23 Sergey Kondakov 2019-07-30 21:41:57 UTC
(In reply to Nicholas Kazlauskas from comment #22)
> Thanks for the log!
> 
> I can reproduce the issue now by emulating the sequence using IGT. It
> doesn't seem to show up in desktop usage for me.

Indeed. I tried using the modesetting X11 driver and got a bunch of errors in Xorg.0.log about being unable to do "page flips", so I set `PageFlip false` for it and `EnablePageFlip false` for amdgpu, and removed 'TearFree true' (why isn't it always on by default?), just in case. No hangs for about 24 hours, even with a lot of YouTube in Firefox and even with amdgpu.

There seem to be a lot of patches for AMD GPUs queued for 5.2.5. Any chance the complete fix is among them?
Comment 24 Nicholas Kazlauskas 2019-07-31 16:28:15 UTC
This should be fixed with the series linked below:

https://patchwork.freedesktop.org/series/64505/

But it still needs review and backporting to older kernels.
Comment 25 Sergey Kondakov 2019-08-01 06:13:54 UTC
(In reply to Nicholas Kazlauskas from comment #24)
> This should be fixed with the series linked below:
> 
> https://patchwork.freedesktop.org/series/64505/
> 
> But it still needs review and backporting to older kernels.

So, I've patched my 5.2.5 kernel package with that set and re-enabled page flipping. So far, everything seems fine. When it's merged and released, this issue may be closed. Thanks !
Comment 26 Sergey Kondakov 2019-08-02 02:21:31 UTC
Created attachment 284083 [details]
dmesg_2019-08-02-amdgpu_fail_on_patched_5.2.5

(In reply to Nicholas Kazlauskas from comment #24)
> This should be fixed with the series linked below:
> 
> https://patchwork.freedesktop.org/series/64505/
> 
> But it still needs review and backporting to older kernels.

The celebration might have been premature. Hours later I got another freeze with a different error in amdgpu. This time, though, the mouse cursor could still be moved over the frozen frame right up until I tried switching VT. Here's the trace:
BUG: unable to handle page fault for address: 0000000800000184
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 21044 Comm: kworker/u16:0 Tainted: G        W IO      5.2.5-1396.g79b6a9c-HSF #1 openSUSE Tumbleweed (unreleased)
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
Workqueue: events_unbound commit_work
RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2e6/0xd60 [amdgpu]
Code: ff 48 89 de 48 8b b8 40 43 01 00 e8 94 3b 09 00 49 8b 54 24 08 48 89 9d 30 fe ff ff 8b 82 00 09 00 00 85 c0 0f 85 fb fd ff ff <80> bb 80 01 00 00 01 0f 86 a0 00 00 00 48 b9 00 00 00 00 01 00 00
RSP: 0018:ffff98198b837c30 EFLAGS: 00010202
RAX: 0000000000000023 RBX: 0000000800000004 RCX: ffff8aca7b146f18
RDX: ffff8acc2a2d9000 RSI: ffffffffc0994f00 RDI: 0000000000000002
RBP: ffff98198b837e10 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aca97bf3540
R13: ffff8acc114b1000 R14: ffff8acc035da000 R15: 0000000000000006
FS:  0000000000000000(0000) GS:ffff8acc2e000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000800000184 CR3: 00000003747c2000 CR4: 00000000000406e0
Call Trace:
 ? mark_held_locks+0x2d/0x80
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? finish_task_switch+0xa2/0x300
 ? __lock_acquire+0x3c3/0x7c0
 ? find_held_lock+0x32/0x90
 ? find_held_lock+0x32/0x90
 ? sched_clock+0x5/0x10
 ? mark_held_locks+0x2d/0x80
 ? preempt_count_sub+0x98/0xe0
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? wait_for_completion_timeout+0xe9/0x110
 ? commit_tail+0x3c/0x70
 commit_tail+0x3c/0x70
 process_one_work+0x271/0x5f0
 worker_thread+0x4a/0x3d0
 ? process_one_work+0x5f0/0x5f0
 kthread+0x118/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x27/0x50
Modules linked in: r8169 binfmt_misc af_packet ts_bm xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss zram snd_mixer_oss bnep it87 hwmon_vid msr joydev amd64_edac_mod edac_mce_amd btusb btrtl btbcm rc_avermedia btintel kvm_amd tuner_simple tuner_types bluetooth snd_usb_audio tuner kvm tda7432 snd_usbmidi_lib snd_rawmidi irqbypass tvaudio msp3400 snd_seq_device ath9k bttv ath9k_common ath9k_hw tea575x tveeprom ath videobuf_dma_sg videobuf_core rc_core v4l2_common pcspkr wmi_bmof videodev mxm_wmi mac80211 fam15h_power k10temp sp5100_tco
 media amdgpu i2c_piix4 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio snd_hda_intel cfg80211 snd_hda_codec snd_hda_core realtek gpu_sched libphy snd_hwdep ttm rfkill snd_pcm mac_hid hid_generic usbhid uas usb_storage ohci_pci serio_raw sd_mod ohci_hcd ehci_pci ehci_hcd xhci_pci xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs [last unloaded: r8169]
CR2: 0000000800000184
---[ end trace 7da703104c8acbc9 ]---
RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2e6/0xd60 [amdgpu]
Code: ff 48 89 de 48 8b b8 40 43 01 00 e8 94 3b 09 00 49 8b 54 24 08 48 89 9d 30 fe ff ff 8b 82 00 09 00 00 85 c0 0f 85 fb fd ff ff <80> bb 80 01 00 00 01 0f 86 a0 00 00 00 48 b9 00 00 00 00 01 00 00
RSP: 0018:ffff98198b837c30 EFLAGS: 00010202
RAX: 0000000000000023 RBX: 0000000800000004 RCX: ffff8aca7b146f18
RDX: ffff8acc2a2d9000 RSI: ffffffffc0994f00 RDI: 0000000000000002
RBP: ffff98198b837e10 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aca97bf3540
R13: ffff8acc114b1000 R14: ffff8acc035da000 R15: 0000000000000006
FS:  0000000000000000(0000) GS:ffff8acc2e000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000800000184 CR3: 00000003747c2000 CR4: 00000000000406e0

How ironic for it to manifest again during discussion video on Youtube about recent "JoJo" Part 5 finale's "perpetually trapped in the repeating nightmare of a frozen time" theme…
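
A note on how this trace differs from the original one: the fault address is no longer a small offset from zero. The sketch below, in Python, sorts a CR2 value into three rough buckets and checks the arithmetic against the register dump. The bucket names and the `classify_fault` helper are made up for illustration, the 48-bit address layout is an assumption, and the 0x180 displacement is read by hand from the bytes after the `<80>` marker (`bb 80 01 00 00 01`), which appear to encode a byte compare at [RBX + 0x180].

```python
PAGE_SIZE = 4096
# Start of the canonical upper half with 48-bit virtual addresses (assumed layout).
UPPER_HALF_START = 0xFFFF800000000000


def classify_fault(cr2):
    """Very rough classification of a faulting address from an oops."""
    if cr2 < PAGE_SIZE:
        return "null-offset"   # field access through a NULL pointer
    if cr2 >= UPPER_HALF_START:
        return "kernel-range"  # looks like a kernel pointer, possibly stale
    return "lower-half"        # not a kernel address at all


# Values copied from the register dumps in this report.
rbx = 0x0000000800000004
cr2 = 0x0000000800000184

print(hex(rbx + 0x180), classify_fault(cr2))
print(classify_fault(0x2B4))  # CR2 of the original oops, for comparison
```

RBX + 0x180 matches the reported CR2, so the bad value seems to be RBX itself. Unlike the original report it is neither NULL nor a kernel-range address, which may point to an uninitialized or overwritten pointer rather than a missing NULL check, but that is only a guess from the register values.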
Comment 27 Sergey Kondakov 2019-08-04 05:17:02 UTC
Created attachment 284153 [details]
dmesg_2019-08-04-amdgpu-new_dereference-with-shadowprimary

So, I've been running with the "EnablePageFlip" and "TearFree" options explicitly disabled as a workaround for the original dereference. Since page flipping is disabled anyway, I decided to try out "ShadowPrimary" while fiddling with mvtools' motion-interpolation optimization in mpv. The result was ANOTHER NULL pointer dereference mere seconds after login:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 3272 Comm: X:cs0 Tainted: G          IO      5.2.5-1407.g79b6a9c-HSF #1 openSUSE Tumbleweed
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
Code: 89 08 48 8d 4a 40 48 89 48 08 48 89 42 40 48 8b 78 f0 c6 40 10 00 4c 8b a7 80 06 00 00 4d 85 e4 74 08 4d 8b a4 24 40 04 00 00 <4d> 8b 6c 24 08 31 f6 49 8b 95 80 06 00 00 48 85 d2 74 0f 48 8b 92
RSP: 0018:ffffafc2478aba10 EFLAGS: 00010246
RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
FS:  00007f3ee03d7700(0000) GS:ffff98746de00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0
Call Trace:
 amdgpu_cs_vm_handling+0x308/0x440 [amdgpu]
 amdgpu_cs_ioctl+0x154/0xa10 [amdgpu]
 ? amdgpu_cs_vm_handling+0x440/0x440 [amdgpu]
 drm_ioctl_kernel+0xaa/0xf0
 drm_ioctl+0x208/0x385
 ? amdgpu_cs_vm_handling+0x440/0x440 [amdgpu]
 ? _raw_spin_unlock_irqrestore+0x59/0x70
 ? preempt_count_sub+0x98/0xe0
 ? _raw_spin_unlock_irqrestore+0x46/0x70
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 do_vfs_ioctl+0x3ed/0x720
 ? __fget+0xf9/0x1b0
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x66/0xc0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f3ee641c7c7
Code: 00 00 90 48 8b 05 d1 86 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 86 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f3ee03d6a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f3ee03d6a70 RCX: 00007f3ee641c7c7
RDX: 00007f3ee03d6a70 RSI: 00000000c0186444 RDI: 000000000000000e
RBP: 00000000c0186444 R08: 00007f3ee03d6b80 R09: 0000000000000020
R10: 00007f3ee03d6b80 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000e R14: 000055d55e6f8bf0 R15: 000055d55e6f91a8
Modules linked in: af_packet xt_pkttype xt_string nf_nat_ftp nf_conntrack_ftp xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss msr bnep it87 hwmon_vid zram amd64_edac_mod edac_mce_amd kvm_amd kvm rc_avermedia tuner_simple tuner_types irqbypass tuner tda7432 btusb btrtl btbcm btintel tvaudio msp3400 bluetooth snd_usb_audio ath9k joydev bttv ath9k_common snd_usbmidi_lib tea575x ath9k_hw tveeprom snd_rawmidi videobuf_dma_sg mxm_wmi wmi_bmof pcspkr ath videobuf_core snd_seq_device k10temp fam15h_power rc_core snd_hda_codec_realtek v4l2_common snd_hda_codec_generic
 sp5100_tco snd_hda_codec_hdmi ledtrig_audio mac80211 amdgpu videodev media i2c_piix4 snd_hda_intel cfg80211 snd_hda_codec r8169 snd_hda_core realtek snd_hwdep libphy snd_pcm gpu_sched rfkill ttm mac_hid hid_generic usbhid uas usb_storage ohci_pci serio_raw sd_mod ehci_pci ohci_hcd ehci_hcd xhci_pci xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs
CR2: 0000000000000008
---[ end trace a7f0ed14134a76ad ]---
RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
Code: 89 08 48 8d 4a 40 48 89 48 08 48 89 42 40 48 8b 78 f0 c6 40 10 00 4c 8b a7 80 06 00 00 4d 85 e4 74 08 4d 8b a4 24 40 04 00 00 <4d> 8b 6c 24 08 31 f6 49 8b 95 80 06 00 00 48 85 d2 74 0f 48 8b 92
RSP: 0018:ffffafc2478aba10 EFLAGS: 00010246
RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
FS:  00007f3ee03d7700(0000) GS:ffff98746de00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0
Comment 28 vr00m 2019-08-07 17:43:34 UTC
I experienced issues after upgrading kernel from 5.1 to 5.2 on my notebook with 2500 U. I tried kernel boot param iommu=soft and that fixed it.
Comment 29 bl0rp 2019-08-14 06:43:51 UTC
(In reply to vr00m from comment #28)
> I experienced issues after upgrading kernel from 5.1 to 5.2 on my notebook
> with 2500 U. I tried kernel boot param iommu=soft and that fixed it.


I've encountered this issue with kernel 5.2 (tried 5.2.8 just now) and also have a Ryzen 5 2500U notebook (Huawei Matebook D 14" (AMD)). Running Manjaro. The login screen appears fine, but after that, black screen. I know nothing's locked up because I was able to launch GZDoom from typing in the dark in the whisker menu and heard the sounds of Doom, or at least the title screen.
Comment 30 Andrey Grodzovsky 2019-08-14 19:06:17 UTC
(In reply to Sergey Kondakov from comment #27)
> Created attachment 284153 [details]
> dmesg_2019-08-04-amdgpu-new_dereference-with-shadowprimary
> 
> So, I've been using explicitly disabled "EnablePageFlip" and "TearFree"
> options as workaround for the original dereference but then decided to try
> out "ShadowPrimary" during fiddling with mvtools' motion-interpolation
> optimization in mpv, since page flipping is disabled anyway. But the result
> was ANOTHER null pointer dereference mere seconds after login:
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0 
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 1 PID: 3272 Comm: X:cs0 Tainted: G          IO     
> 5.2.5-1407.g79b6a9c-HSF #1 openSUSE Tumbleweed
> Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS
> F14e 09/09/2014
> RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
> Code: 89 08 48 8d 4a 40 48 89 48 08 48 89 42 40 48 8b 78 f0 c6 40 10 00 4c
> 8b a7 80 06 00 00 4d 85 e4 74 08 4d 8b a4 24 40 04 00 00 <4d> 8b 6c 24 08 31
> f6 49 8b 95 80 06 00 00 48 85 d2 74 0f 48 8b 92
> RSP: 0018:ffffafc2478aba10 EFLAGS: 00010246
> RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
> RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
> RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
> FS:  00007f3ee03d7700(0000) GS:ffff98746de00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0
> Call Trace:
>  amdgpu_cs_vm_handling+0x308/0x440 [amdgpu]
>  amdgpu_cs_ioctl+0x154/0xa10 [amdgpu]
>  ? amdgpu_cs_vm_handling+0x440/0x440 [amdgpu]
>  drm_ioctl_kernel+0xaa/0xf0
>  drm_ioctl+0x208/0x385
>  ? amdgpu_cs_vm_handling+0x440/0x440 [amdgpu]
>  ? _raw_spin_unlock_irqrestore+0x59/0x70
>  ? preempt_count_sub+0x98/0xe0
>  ? _raw_spin_unlock_irqrestore+0x46/0x70
>  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
>  do_vfs_ioctl+0x3ed/0x720
>  ? __fget+0xf9/0x1b0
>  ksys_ioctl+0x5e/0x90
>  __x64_sys_ioctl+0x16/0x20
>  do_syscall_64+0x66/0xc0
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f3ee641c7c7
> Code: 00 00 90 48 8b 05 d1 86 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff
> ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d a1 86 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007f3ee03d6a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007f3ee03d6a70 RCX: 00007f3ee641c7c7
> RDX: 00007f3ee03d6a70 RSI: 00000000c0186444 RDI: 000000000000000e
> RBP: 00000000c0186444 R08: 00007f3ee03d6b80 R09: 0000000000000020
> R10: 00007f3ee03d6b80 R11: 0000000000000246 R12: 0000000000000000
> R13: 000000000000000e R14: 000055d55e6f8bf0 R15: 000055d55e6f91a8
> Modules linked in: af_packet xt_pkttype xt_string nf_nat_ftp
> nf_conntrack_ftp xt_tcpudp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack
> ebtable_nat ip6table_nat ip6table_mangle ip6table_raw ip6table_security
> iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables
> scsi_transport_iscsi ip6table_filter ip6_tables iptable_filter ip_tables
> x_tables bpfilter snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> snd_pcm_oss snd_mixer_oss msr bnep it87 hwmon_vid zram amd64_edac_mod
> edac_mce_amd kvm_amd kvm rc_avermedia tuner_simple tuner_types irqbypass
> tuner tda7432 btusb btrtl btbcm btintel tvaudio msp3400 bluetooth
> snd_usb_audio ath9k joydev bttv ath9k_common snd_usbmidi_lib tea575x
> ath9k_hw tveeprom snd_rawmidi videobuf_dma_sg mxm_wmi wmi_bmof pcspkr ath
> videobuf_core snd_seq_device k10temp fam15h_power rc_core
> snd_hda_codec_realtek v4l2_common snd_hda_codec_generic
>  sp5100_tco snd_hda_codec_hdmi ledtrig_audio mac80211 amdgpu videodev media
> i2c_piix4 snd_hda_intel cfg80211 snd_hda_codec r8169 snd_hda_core realtek
> snd_hwdep libphy snd_pcm gpu_sched rfkill ttm mac_hid hid_generic usbhid uas
> usb_storage ohci_pci serio_raw sd_mod ehci_pci ohci_hcd ehci_hcd xhci_pci
> xhci_hcd wmi exfat(O) l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel
> udp_tunnel pppox ppp_generic slhc vhba(O) uinput sg nbd dm_multipath
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua ecryptfs
> CR2: 0000000000000008
> ---[ end trace a7f0ed14134a76ad ]---
> RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]
> Code: 89 08 48 8d 4a 40 48 89 48 08 48 89 42 40 48 8b 78 f0 c6 40 10 00 4c
> 8b a7 80 06 00 00 4d 85 e4 74 08 4d 8b a4 24 40 04 00 00 <4d> 8b 6c 24 08 31
> f6 49 8b 95 80 06 00 00 48 85 d2 74 0f 48 8b 92
> RSP: 0018:ffffafc2478aba10 EFLAGS: 00010246
> RAX: ffff98742e20e670 RBX: ffff98742e20e658 RCX: ffff98744fc66040
> RDX: ffff98744fc66000 RSI: ffff98742e20e638 RDI: ffff9873a295f800
> RBP: ffff987459e00000 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffafc2478abb58 R14: ffff98744fc66000 R15: ffffafc2478abb58
> FS:  00007f3ee03d7700(0000) GS:ffff98746de00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000003f27aa000 CR4: 00000000000406e0

Sergey, I tried to reproduce your latest issue on Ellesmere (Polaris 10) with "ShadowPrimary" enabled and flip disabled and didn't observe any crash.
In case you built your own kernel can you give me the output of this command -

Run gdb on amdgpu.ko
gdb drivers/gpu/drm/amd/amdgpu/amdgpu.ko

Then do - 
list *(amdgpu_vm_update_directories+0xe7)
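
The two steps above can also be run non-interactively. This is only a minimal sketch, assuming gdb is installed and amdgpu.ko was built with debug info. The helper names rip_to_location and resolve_location are made up for illustration, and the sed pattern only handles kernel RIP lines of the form shown in the trace above:

```shell
#!/bin/sh
# Extract "symbol+offset" from a kernel oops RIP line.
# Input example: "RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260 [amdgpu]"
# The "/0x260" function size and the "[amdgpu]" module tag are dropped.
# Lines without a symbol (e.g. the user-space RIP) produce no output.
rip_to_location() {
    printf '%s\n' "$1" |
        sed -n 's|^.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*+0x[0-9a-f]*\)/0x[0-9a-f]*.*$|\1|p'
}

# Ask gdb for the source lines around that location.
# $1 = path to a module built with debug info, $2 = symbol+offset
resolve_location() {
    gdb -batch -ex "list *($2)" "$1"
}
```

For example, `resolve_location drivers/gpu/drm/amd/amdgpu/amdgpu.ko "$(rip_to_location "$line")"` with $line holding the RIP line from dmesg. The result is only meaningful if the module is from the same build as the kernel that crashed.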
Comment 31 Sergey Kondakov 2019-08-15 22:05:22 UTC
(In reply to Andrey Grodzovsky from comment #30)
> (In reply to Sergey Kondakov from comment #27)
> 
> Sergey, I tried to reproduce your latest issue on Ellesmere (Polaris 10) with
> "ShadowPrimary" enabled and flip disabled and didn't observe any crash.
> In case you built your own kernel can you give me the output of this command
> -
> 
> Run gdb on amdgpu.ko
> gdb drivers/gpu/drm/amd/amdgpu/amdgpu.ko
> 
> Then do - 
> list *(amdgpu_vm_update_directories+0xe7)

The crash may take a while (hours) to manifest and requires some video-watching via Firefox and/or mpv (with '--opengl-pbo' option on opengl-hq profile). It also may or may not need VAAPI to be used ('--hwdec=vaapi-copy' in case of mpv).

My kernel is built on the OBS build server, so I had to enable debuginfo packaging and rebuild it; the debuginfo package then used up a mind-boggling 5.1 GB of space, leaving me with a measly ~400 MB on / ! After that I managed to get this:
0x2e127 is in amdgpu_vm_update_directories (../drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1191).
where line #1191 is:
struct amdgpu_bo *bo = parent->base.bo, *pbo;

But it is a different build of the kernel, so I don't know if this is even relevant. I'm not going to stick around with this monstrosity. You may check out the packages at https://build.opensuse.org/package/binaries/home:X0F:HSF:Kernel/kernel-HSF/standard - they have pretty much all kernel modules that x86_64 supports, so it should run anywhere.
Comment 32 Sergey Kondakov 2019-08-17 05:13:28 UTC
Just got exactly the same 0010:amdgpu_vm_update_directories+0xe7/0x260 dereference immediately on login even with PageFlip & TearFree disabled and ShadowPrimary NOT enabled. Even with all the same addresses as before. So, now I'm not sure about what actually triggers it. However, my setup is as non-default as it gets:
amdgpu has these parameters: cik_support=1 si_support=1 msi=1 sched_policy=1 compute_multipipe=1 gartsize=1024 vm_fragment_size=9 max_num_of_queues_per_device=65536 sched_hw_submission=32 sched_jobs=1024 job_hang_limit=8000 halt_if_hws_hang=1 vm_fault_stop=0 vm_update_mode=3 vm_size=20 disp_priority=2 deep_color=1 gpu_recovery=1
irqbalance is enabled with interval=1 and rtirq has this:
RTIRQ_NAME_LIST="timer rtc snd drm amdgpu radeon i915 nvidia usb i8042 ahci"
RTIRQ_HIGH_LIST="watchdogd oom_reaper rcu_preempt rcu_sched rcu_bh rcub rcuc gfx sdma ksoftirqd khugepaged"
RTIRQ_PRIO_HIGH=80
RTIRQ_PRIO_DECR=2
RTIRQ_PRIO_LOW=50
RTIRQ_RESET_ALL=0
to boost amdgpu's processes to highest RT/FIFO priorities in hope to avoid video stuttering and audio x-runs under full load. Transparent hugepages are enabled in attempt to spare crappy AMD FX's TLB cache and MMU (hence the vm_fragment_size=9).

Maybe it's the non-default vm_update_mode that does it. Also, a few kernel versions back the default GART size of 256MB was triggering some kind of fault, probably a stall and reset; maybe it still does, but I'm not going to check. Or maybe it's all irrelevant.
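
To compare what the loaded module actually uses against the parameter list above, the values can be read back from sysfs. This is a minimal sketch, assuming the module exposes its parameters under /sys/module/amdgpu/parameters (the usual location for a loaded module); dump_params is a hypothetical helper name:

```shell
#!/bin/sh
# Print every readable module parameter as name=value, one per line.
# $1 = parameter directory (defaults to the amdgpu one in sysfs)
dump_params() {
    dir="${1:-/sys/module/amdgpu/parameters}"
    for f in "$dir"/*; do
        # Skip unreadable entries and the unexpanded glob of an empty directory.
        [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}
```

Running `dump_params | grep vm_update_mode` would then show whether the non-default value is really in effect.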
Comment 33 Nicholas Kazlauskas 2019-08-19 13:39:26 UTC
(In reply to Sergey Kondakov from comment #26)
> Created attachment 284083 [details]
> dmesg_2019-08-02-amdgpu_fail_on_patched_5.2.5
> 
> (In reply to Nicholas Kazlauskas from comment #24)
> > This should be fixed with the series linked below:
> > 
> > https://patchwork.freedesktop.org/series/64505/
> > 
> > But it still needs review and backporting to older kernels.
> 
> Celebration might have been premature. Hours later I've got another freeze
> with different error in amdgpu. Only this time, mouse cursor was movable
> over frozen frame right until I tried switching VT. Here's trace:
> BUG: unable to handle page fault for address: 0000000800000184
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0 
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 21044 Comm: kworker/u16:0 Tainted: G        W IO     
> 5.2.5-1396.g79b6a9c-HSF #1 openSUSE Tumbleweed (unreleased)
> Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS
> F14e 09/09/2014
> Workqueue: events_unbound commit_work
> RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2e6/0xd60 [amdgpu]

Are you able to consistently reproduce this issue? Is it the same setup and same conditions as before? I haven't been able to see it in my testing at least.
Comment 34 Sergey Kondakov 2019-08-19 15:11:14 UTC
(In reply to Nicholas Kazlauskas from comment #33)
> (In reply to Sergey Kondakov from comment #26)
> > Created attachment 284083 [details]
> > dmesg_2019-08-02-amdgpu_fail_on_patched_5.2.5
> > 
> > (In reply to Nicholas Kazlauskas from comment #24)
> > > This should be fixed with the series linked below:
> > > 
> > > https://patchwork.freedesktop.org/series/64505/
> > > 
> > > But it still needs review and backporting to older kernels.
> > 
> > Celebration might have been premature. Hours later I've got another freeze
> > with different error in amdgpu. Only this time, mouse cursor was movable
> > over frozen frame right until I tried switching VT. Here's trace:
> > BUG: unable to handle page fault for address: 0000000800000184
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0 
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 2 PID: 21044 Comm: kworker/u16:0 Tainted: G        W IO     
> > 5.2.5-1396.g79b6a9c-HSF #1 openSUSE Tumbleweed (unreleased)
> > Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS
> > F14e 09/09/2014
> > Workqueue: events_unbound commit_work
> > RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2e6/0xd60 [amdgpu]
> 
> Are you able to consistently reproduce this issue? Is it the same setup and
> same conditions as before? I haven't been able to see it in my testing at
> least.

Yes, just having PageFlip enabled in amdgpu guarantees it. Changing anything other than PageFlip doesn't seem to affect it. Forcing TearFree on with PageFlip disabled may also trigger it, I think. You may try my previously linked kernel build in your testing but I doubt that it has something specific for it.

It may not be reproducible with the modesetting X driver because it fails to engage page flipping on init and throws a bunch of errors about it in Xorg.0.log. For some reason I'm unable to use the modesetting X driver at all: even with page flipping disabled, it draws only the mouse cursor on a black background instead of the sddm login screen. So I have to use amdgpu with PageFlip and TearFree explicitly disabled. But then another, rarer 0010:amdgpu_vm_update_directories+0xe7/0x260 dereference may happen regardless (which I suspect is connected with the vm_update_mode option, unlike the first one).

By the way, is there any disadvantage in forcing TearFree to be always on when it works ? Like additional frame of latency or something like that ?
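
For anyone wanting to try the same workaround (amdgpu DDX with PageFlip and TearFree explicitly disabled), a configuration along these lines should do it. This is only a sketch: "EnablePageFlip" and "TearFree" are the option names from the amdgpu(4) man page, while the Identifier string and the helper name write_amdgpu_noflip_conf are arbitrary choices made for this example:

```shell
#!/bin/sh
# Write an xorg.conf.d snippet that turns off page flipping and TearFree
# for the xf86-video-amdgpu DDX.
# $1 = output file, so the result can be inspected before installing it
write_amdgpu_noflip_conf() {
    cat > "$1" <<'EOF'
Section "Device"
    Identifier "AMDGPU"
    Driver "amdgpu"
    Option "EnablePageFlip" "off"
    Option "TearFree" "off"
EndSection
EOF
}
```

The file would typically go under /etc/X11/xorg.conf.d/ (the exact file name is up to you), and X has to be restarted for it to take effect.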
Comment 35 Nicholas Kazlauskas 2019-08-21 13:38:29 UTC
Do you mind posting your compositor settings in plasma? That would certainly influence flip timing and submission and I haven't been able to reproduce the issue with the settings I'm using.
Comment 36 Sergey Kondakov 2019-08-21 14:37:12 UTC
(In reply to Nicholas Kazlauskas from comment #35)
> Do you mind posting your compositor settings in plasma? That would certainly
> influence flip timing and submission and I haven't been able to reproduce
> the issue with the settings I'm using.

Sure. They are also quite funky:
~/.config/kwinrc:
[Compositing]
AnimationSpeed=2
Backend=OpenGL
Enabled=true
GLColorCorrection=true
GLCore=true
GLPlatformInterface=glx
GLPreferBufferSwap=c
GLTextureFilter=2
HiddenPreviews=5
OpenGLIsUnsafe=false
OpenGLIsUnsafe0=false
OpenGLIsUnsafe1=false
UnredirectFullscreen=false
WindowsBlockCompositing=false
XRenderSmoothScale=false

However, I run LXQt with this in startup /usr/local/bin/kwin.sh script:
export __GL_YIELD=USLEEP
export KWIN_TRIPLE_BUFFER=0
export KWIN_USE_BUFFER_AGE=1
export KWIN_OPENGL_INTERFACE=egl
export KWIN_DIRECT_GL=1
export KWIN_FORCE_LANCZOS=1
export KWIN_PERSISTENT_VBO=1
export KWIN_EFFECTS_FORCE_ANIMATIONS=1
…
if [ -z "$WAYLAND_DISPLAY" ]; then
        export WINDOWMANAGER="env mesa_glthread=true nice -n -5 ionice -c 2 -n 0 -t chrt -v -r 5 kwin_x11 $KWIN_OPTIONS"
        exec /etc/X11/xinit/xinitrc
        return 0
else
        export WINDOWMANAGER="env mesa_glthread=true nice -n -5 ionice -c 2 -n 0 -t chrt -v -r 5 kwin_wayland"
        export QT_QPA_PLATFORM=wayland-egl
        export GDK_BACKEND=wayland
        export CLUTTER_BACKEND=wayland
        export SDL_VIDEODRIVER=wayland
        return 0
fi

X is run by /usr/local/bin/Xhp:
nice -n -10 ionice -c 2 -n 0 -t chrt -v -r 10 X "$@"
It hangs the system, by the way, if RT limit is not set by sched_rt_runtime_us

Here's ~/.drirc, just in case:
<driconf>
    <device screen="0" driver="radeonsi">
        <application name="Default">
            <option name="allow_glsl_relaxed_es" value="true" />
            <option name="radeonsi_enable_sisched" value="true" />
            <option name="allow_glsl_builtin_const_expression" value="true" />
            <option name="mesa_glthread" value="true" />
            <option name="radeonsi_enable_nir" value="true" />
            <option name="allow_glsl_extension_directive_midshader" value="true" />
            <option name="allow_rgb10_configs" value="true" />
            <option name="allow_glsl_cross_stage_interpolation_mismatch" value="true" />
            <option name="radeonsi_assume_no_z_fights" value="true" />
            <option name="allow_glsl_builtin_variable_redeclaration" value="true" />
            <option name="allow_glsl_layout_qualifier_on_function_parameters" value="true" />
            <option name="adaptive_sync" value="true" />
            <option name="radeonsi_commutative_blend_add" value="true" />
            <option name="allow_higher_compat_version" value="true" />
        </application>
    </device>
</driconf>

Some things from tuned.conf:
governor=schedutil
transparent_hugepages=always
/sys/kernel/mm/ksm/sleep_millisecs=250
/sys/kernel/mm/transparent_hugepage/shmem_enabled=advise 
/sys/kernel/mm/transparent_hugepage/defrag=defer+madvise
/sys/kernel/mm/transparent_hugepage/khugepaged/defrag=0
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan=512
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs=1000
/sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs=10000
dev.hpet.max-user-freq=4096
vm.zone_reclaim_mode=0
kernel.sched_autogroup_enabled=0
kernel.sched_latency_ns=1000000
kernel.sched_min_granularity_ns=100000
kernel.sched_wakeup_granularity_ns=1000
kernel.sched_nr_migrate=256
kernel.sched_migration_cost_ns=125
kernel.sched_cfs_bandwidth_slice_us=100
kernel.sched_tunable_scaling=1
kernel.sched_rt_period_us=1000000
kernel.sched_rt_runtime_us=900000
kernel.sched_rr_timeslice_ms=3

Originally the issue manifested with GLPreferBufferSwap=n and without double-buffering & EGL enforcement, I've made those in hope to compensate for disabled TearFree and PageFlip.

Please, answer the question about TearFree, if you can. I've been trying to find out since its creation and wasn't able to get even a hint. Can it really be just this perfect thing that everyone should have all the time, unless bugged ?
Comment 37 Alex Deucher 2019-08-21 15:27:00 UTC
(In reply to Sergey Kondakov from comment #34)
> By the way, is there any disadvantage in forcing TearFree to be always on
> when it works ? Like additional frame of latency or something like that ?

The TearFree option is there to deal with compositors that do not support sync to vblank.  The ddx allocates another front buffer and then that buffer is updated synchronized with vblank with the data from the real front buffer.  So it uses an additional buffer.
Comment 38 Sergey Kondakov 2019-08-21 18:36:30 UTC
(In reply to Alex Deucher from comment #37)
> (In reply to Sergey Kondakov from comment #34)
> > By the way, is there any disadvantage in forcing TearFree to be always on
> > when it works ? Like additional frame of latency or something like that ?
> 
> The TearFree option is there to deal with compositors that do not support
> sync to vblank. The ddx allocates another front buffer and then that buffer
> is updated synchronized with vblank with the data from the real front
> buffer.  So it uses an additional buffer.

Thanks ! It's a shame, I've already begun believing in "The Silver Bullet of VSync". And it's completely "software" GPU-agnostic function, so alternatives like Wayland would have to just reimplement it the same way ? It always adds a buffer or "smart-enough" compositor can opt-out ? Or "the correct fix for latency" with TF is disabling vsync everywhere (such as kwin's GLPreferBufferSwap=n) else and let it handle it ?

No matter how I previously tried, nothing other than TearFree guaranteed actual lack of tearing in all times in simple 2x1080p configuration but there is abundance of buffering as it is in apps and a compositor + latency of LCD displays. I'm sure, you're aware of https://gitlab.freedesktop.org/xorg/xserver/issues/244 too. Strange that "the magic" of TF isn't done directly in compositors or kernel then.
Comment 39 Alex Deucher 2019-08-21 19:28:57 UTC
(In reply to Sergey Kondakov from comment #38)
> 
> Thanks ! It's a shame, I've already begun believing in "The Silver Bullet of
> VSync". And it's completely "software" GPU-agnostic function, so
> alternatives like Wayland would have to just reimplement it the same way ?
> It always adds a buffer or "smart-enough" compositor can opt-out ? Or "the
> correct fix for latency" with TF is disabling vsync everywhere (such as
> kwin's GLPreferBufferSwap=n) else and let it handle it ?
> 
> No matter how I previously tried, nothing other than TearFree guaranteed
> actual lack of tearing in all times in simple 2x1080p configuration but
> there is abundance of buffering as it is in apps and a compositor + latency
> of LCD displays. I'm sure, you're aware of
> https://gitlab.freedesktop.org/xorg/xserver/issues/244 too. Strange that
> "the magic" of TF isn't done directly in compositors or kernel then.

Here is your issue: "simple 2x1080p"

Multiple displays are really hard to deal with.  The display timing may be different, the blanking periods may not align, etc.  X uses a single surface for each multi-display desktop, so when you are updating multiple displays, if the timings are not aligned, one display will show older content.  For this to work smoothly, you really need the compositor to have each display using its own set of buffers and doing vsynced rendering to each display separately.
Comment 40 Sergey Kondakov 2019-08-21 21:39:29 UTC
(In reply to Alex Deucher from comment #39)
> (In reply to Sergey Kondakov from comment #38)
> Here is your issue: "simple 2x1080p"
> 
> multiple displays are really hard to deal with.  The display timing may be
> different, the blanking periods may not align, etc.  X uses a single surface
> for each multi-display desktop, so when you are updating multiple displays, if
> the timings are not aligned, one display will show older content.  For this
> to work smoothly, you really need the compositor to have each display using
> its own set of buffers and doing vsynced rendering to each display
> separately.

It's a little bit strange to call 2x1080p on AMD's fancy 5-port GPU (+ possible DP multiplexing) "my issue". If anything is an issue with AMD's modern output controllers it's the lack of analogue signal in the DVI port for my proper 1280x1024@89 CRT monitor with majestic >10k:1 contrast. Timing on both outputs is definitely different, though.

I still cannot fathom how it is that all outputs are still lumped together like that. Anyway, I was searching on my suspicion about kwin's vsync behaviour and stumbled on this treat: https://bugs.kde.org/show_bug.cgi?id=395632#c45 - new kwin developer working on that and multi-threaded per-output vsync _right now_, wants testers. Surely, this new version of kwin will "blow up" the kernel module with this page-flipping bug ! And he would really benefit from your advice. Then we might not even need TearFree anywhere anymore !
https://phabricator.kde.org/T11071 - quite a progress already. Aims to make double-only per-output mandatory vsync via GLX_OML_sync_control.

Right now `qdbus-qt5 org.kde.KWin /KWin supportInformation` says:
…
maxFpsInterval: 16666666
refreshRate: 0
vBlankTime: 6000000
glStrictBinding: false
glStrictBindingFollowsDriver: true
…
Screens
=======
Multi-Head: no
Active screen follows mouse:  no
Number of Screens: 2

Screen 0:
---------
Name: DVI-D-0
Geometry: 0,0,1920x1080
Scale: 1
Refresh Rate: 72.9249

Screen 1:
---------
Name: HDMI-A-0
Geometry: 1920,0,1920x1080
Scale: 1
Refresh Rate: 71.8263

glxgears shows proper FPS (~72.923) but, judging by that bug, it's either mistiming updates or "cutting out" some frames. It would not tear if it let apps render at their own pace and then limited its own output to 60, would it ? And I'm as clueless as those bug-reporters on how to check its real rate on the currently released version.
Comment 41 Alex Deucher 2019-08-21 21:51:32 UTC
(In reply to Sergey Kondakov from comment #40)

> 
> It's a little bit strange to call 2x1080p on AMD's fancy 5-port GPU (+ possible
> DP multiplexing) "my issue". 

It's a limitation of desktop environments on Linux in general.  Other OSes handle this differently, but in general, regardless of OS, it's a hard problem to solve.  If you have multiple displays running at different refresh rates how do you update them without tearing and also without non-synchronized content on some of the displays?
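
To put numbers on this, take the two refresh rates reported in comment #40 (72.9249 Hz and 71.8263 Hz). The sketch below only does the arithmetic: it prints each frame period and the beat frequency, i.e. how often the two vblank phases drift through a full cycle relative to each other. vblank_beat is a made-up helper name, and treating the difference of the two rates as the drift frequency is a simplification that ignores any jitter:

```shell
#!/bin/sh
# Frame periods and beat frequency of two outputs.
# $1, $2 = refresh rates in Hz
vblank_beat() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        # Period of one frame on each output, in milliseconds,
        # and how many times per second the phases realign.
        printf "period_a_ms=%.3f period_b_ms=%.3f beat_hz=%.4f\n", 1000 / a, 1000 / b, d
    }'
}
```

With `vblank_beat 72.9249 71.8263` the periods come out roughly 0.2 ms apart and the beat is about 1.1 Hz, so the two vblanks line up only about once per second; a single flip applied to both outputs cannot land in both blanking periods the rest of the time.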
Comment 42 Michel Dänzer 2019-08-22 13:14:28 UTC
(In reply to Alex Deucher from comment #37)
> The TearFree option is there to deal with compositors that do not support
> sync to vblank.

Also for cases where a compositor cannot prevent tearing, in particular with rotation and other transforms via RandR.


> The ddx allocates another front buffer and then that buffer
> is updated synchronized with vblank with the data from the real front
> buffer.  So it uses an additional buffer.

Right, that's one of the main reasons why TearFree isn't enabled in all cases by default, another one being that it can incur (at least) one refresh cycle output latency.


(In reply to Sergey Kondakov from comment #38)
> It's a shame, I've already begun believing in "The Silver Bullet of VSync". 

If TearFree was a silver bullet, it would be enabled by default in all cases. :) (It already is in cases where a compositor cannot prevent tearing, per above)


> And it's completely "software" GPU-agnostic function, so alternatives like
> Wayland would have to just reimplement it the same way ?

More like the other way around actually; I consider TearFree sort of a poor man's Wayland compositor. The latter generally handle this better, or are at least in a better position to handle it.


> It always adds a buffer or "smart-enough" compositor can opt-out ?

Currently the former. It would be possible to eliminate the additional buffers while a compositor / other fullscreen client is using page flipping, but I never got around to implementing that.


> Or "the correct fix for latency" with TF is disabling vsync everywhere (such
> as kwin's GLPreferBufferSwap=n) else and let it handle it ?

I doubt that'll help for latency, and will waste energy generating frames which are never visible.


> Strange that "the magic" of TF isn't done directly in compositors or kernel
> then.

Compositors can do so, with some exceptions per above, but in cases where they can prevent tearing, they're generally preferable to TearFree.

The kernel cannot do this transparently though.


(In reply to Alex Deucher from comment #39)
> multiple displays are really hard to deal with.  The display timing may be
> different, the blanking periods may not align, etc.  X uses a single surface
> for each multi-display desktop, so when you are updating multiple displays, if
> the timings are not aligned, one display will show older content.  For this
> to work smoothly, you really need the compositor to have each display using
> its own set of buffers and doing vsynced rendering to each display
> separately.

That wouldn't necessarily make much if any visible difference though, as the displays can still show inconsistent contents sometimes if their timings aren't synchronized, and tearing within a display can be avoided even with a single scanout buffer. The main benefit of separate scanout buffers is that the application can re-use buffers for rendering new frames earlier, but OTOH there's the overhead cost of compositing (because the client buffers can't be used directly for page flipping like this). This is pretty much how TearFree works, BTW (except due to the Xorg architecture, it can't actually allow a compositor / other fullscreen client to re-use buffers earlier when sync-to-vblank is enabled for the client).


(In reply to Sergey Kondakov from comment #40)
> Anyway, I was searching on my suspicion about kwin's vsync
> behaviour and stumbled on this treat:
> https://bugs.kde.org/show_bug.cgi?id=395632#c45 - new kwin developer working
> on that and multi-threaded per-output vsync _right now_, wants testers.

That's not possible with the current Xorg architecture. It only allows flipping all displays (connected to a single GPU) together.

The way forward to solve this is Wayland.
Comment 43 Tom Seewald 2019-08-23 21:02:07 UTC
(In reply to Nicholas Kazlauskas from comment #24)
> This should be fixed with the series linked below:
> 
> https://patchwork.freedesktop.org/series/64505/
> 
> But it still needs review and backporting to older kernels.

That patch series (applied to mainline 5.2.x) appears to fix the hangs on my RX 560 while playing video with vaapi acceleration.

It would be great if this could be back-ported.
Comment 44 Sergey Kondakov 2019-08-24 09:43:50 UTC
(In reply to Alex Deucher from comment #41)
> (In reply to Sergey Kondakov from comment #40)
> 
> > 
> > It's a little bit strange to call 2x1080p on AMD's fancy 5-port GPU (+ possible
> > DP multiplexing) "my issue". 
> 
> It's a limitation of desktop environments on Linux in general.  Other OSes
> handle this differently, but in general, regardless of OS, it's a hard
> problem to solve.  If you have multiple displays running at different
> refresh rates how do you update them without tearing and also without
> non-synchronized content on some of the displays?

I guess, you always would have to have at least 2 (currently_rendering/next_shown and previously_rendered/currently_shown) buffers in an app, 2 per each viewport in compositor, 2 in each of system's video output controllers (what if a viewport shares several outputs or vice versa ?; "crtc" on GPU but I would prefer a tendency to simplification and de-specialization of GPUs by replacing them with separate general-purpose vector processors, rasterization or BVH ASICs, FPGA codec accelerators, output controllers, all with wider faster common system bus and RAM instead of GPU-daughterboard-on-CPU's-MB monstrosities) and 2 in each display's controller (the latter is especially a problem because of slow scalers wanting to do their bad scaling and other in-display transformations while adding unpredictable unknown latency). And then make them all work asynchronously with their own safe timeframes for flipping.

(In reply to Michel Dänzer from comment #42)
>…
> The way forward to solve this is Wayland.

Thanks for the detailed explanations, stuff like that should be in manuals. As for Wayland, I even managed to launch Wayland LXQt sessions with kwin via sddm where most things work. But something made me postpone the transition to it indefinitely, I don't remember what exactly. Right now at least 2 reasons would be: custom display modes (I want my 72-73 "free" fps on 60 fps almost-trash-level displays and my CRT has to have its non-standard modes defined manually) and per-display colour correction (with auto-generated and custom profiles).

(In reply to Tom Seewald from comment #43)
> (In reply to Nicholas Kazlauskas from comment #24)
> > This should be fixed with the series linked below:
> > 
> > https://patchwork.freedesktop.org/series/64505/
> > 
> > But it still needs review and backporting to older kernels.
> 
> That patch series (applied to mainline 5.2.x) appears to fix the hangs on my
> RX 560 while playing video with vaapi acceleration.
> 
> It would be great if this could be back-ported.

Unfortunately, it didn't fix the page flip-triggered dereference for me. Do you have page flip related errors in Xorg log on "modesetting" X driver ? With and without it ?
Comment 45 Tom Seewald 2019-08-26 05:32:33 UTC
> Unfortunately, it didn't fix the page flip-triggered dereference for me. Do
> you have page flip related errors in Xorg log on "modesetting" X driver ?
> With and without it ?

I don't believe so. Glancing over my Xorg.0.log and Xorg.0.log.old, I don't see any errors about page flipping.  I just use the standard modesetting driver for Xorg.
Comment 46 Tom Seewald 2019-09-04 04:50:26 UTC
Will these patches[1] be back ported to 5.2/5.3 or will we need to wait until this hopefully lands in 5.4?

[1] https://patchwork.freedesktop.org/series/64505/
Comment 47 jamespharvey20 2019-09-06 10:37:42 UTC
I've been getting this crash about once a week.  Would be nice if something were done here.
Comment 48 jamespharvey20 2019-09-06 10:38:38 UTC
Vega 64.
Comment 49 Christopher Snowhill 2019-09-20 01:58:24 UTC
RX 480. Applied patch, haven't had any spurious crashes since. Using patchset since kernel 5.2.14, now using it on 5.3. Haven't had any suspend/wake crashes yet, either, but that may be unrelated.

Will continue applying it to successive 5.3 kernels until it is officially backported, and will report if there are any further crashes.
Comment 50 Sergey Kondakov 2019-09-20 13:19:09 UTC
(In reply to Christopher Snowhill from comment #49)
> RX 480. Applied patch, haven't had any spurious crashes since. Using
> patchset since kernel 5.2.14, now using it on 5.3. Haven't had any
> suspend/wake crashes yet, either, but that may be unrelated.
> 
> Will continue applying it to successive 5.3 kernels until it is officially
> backported, and will report if there are any further crashes.

I also built 5.3 with these patches, almost just as it came out:
https://patchwork.freedesktop.org/series/64505/
https://patchwork.freedesktop.org/series/64614/
https://patchwork.freedesktop.org/series/65192/

No failures with X11's amdgpu driver so far, BUT I've changed both the TearFree and vm_update_mode options to defaults (though pci=big_root_window, which makes BAR=VRAM, is still active), so the problem may be just worked around and not completely gone; I will try vm_update_mode=3 later. It would be nice to have some clue about what the vm_* options actually entail for OpenCL, compute-shader and general rendering performance. I just set them for whatever; the code in amdgpu_vm.c goes way over my head.

The modesetting X11 driver behaves weirdly for me: enabling PageFlip in it still gives me errors, and in both cases it just draws a black screen with a movable cursor on top instead of the sddm greeter screen. But amdgpu works, so, fine.
Comment 51 Alex Deucher 2019-09-20 14:04:37 UTC
Patches have been sent to stable and should land soon.
Comment 52 Sergey Kondakov 2019-09-21 05:26:50 UTC
(In reply to Alex Deucher from comment #51)
> Patches have been sent to stable and should land soon.

Thanks !
However, it seems that not all is well, after all: using vm_update_mode=3 has resulted in an immediate 'RIP: 0010:amdgpu_vm_update_directories+0xe7/0x260' dereference hang before sddm could draw anything, so the second one is not fixed yet. I will use vm_update_mode=0 for now to make sure that the offending code is never touched.
Comment 53 Sergey Kondakov 2019-09-27 03:50:30 UTC
Created attachment 285209 [details]
dmesg_2019-09-26-amdgpu-old_dereference_on_patched_5.3.1

After about a day of uptime, my patched 5.3.1 hung during an hours-long YouTube video with a dereference that is almost identical to the original one:
BUG: unable to handle page fault for address: 00000008000001b4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 2 PID: 396 Comm: kworker/u16:2 Tainted: G        W IO      5.3.1-1482.g27a0123-HSF #1 openSUSE Tumbleweed
Hardware name: Gigabyte Technology Co., Ltd. GA-990XA-UD3/GA-990XA-UD3, BIOS F14e 09/09/2014
Workqueue: events_unbound commit_work
RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2ee/0xfd0 [amdgpu]
…
Call Trace:
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 ? _raw_spin_unlock_irq+0x29/0x50
 ? trace_hardirqs_on+0x2c/0xf0
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? finish_task_switch+0xa3/0x2e0
 ? finish_task_switch+0x75/0x2e0
 ? __switch_to+0x152/0x4e0
 ? __switch_to_asm+0x34/0x70
 ? __schedule+0x353/0x900
 ? wait_for_completion_timeout+0x31/0x110
 ? _raw_spin_unlock_irq+0x29/0x50
 ? preempt_count_sub+0x9b/0xd0
 ? _raw_spin_unlock_irq+0x3a/0x50
 ? wait_for_completion_timeout+0xe9/0x110
 ? commit_tail+0x3c/0x70
 commit_tail+0x3c/0x70
 process_one_work+0x271/0x5b0
 worker_thread+0x4a/0x3d0
 ? process_one_work+0x5b0/0x5b0
 kthread+0x118/0x140
 ? kthread_create_worker_on_cpu+0x70/0x70
 ret_from_fork+0x27/0x50
…
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out

Could this be due to these additional patches ?
https://patchwork.freedesktop.org/series/64614/
https://patchwork.freedesktop.org/series/65192/

Or the fact that I patched kwin-5.16.5 with https://phabricator.kde.org/T11071 and added KWIN_USE_INTEL_SWAP_EVENT=1 & KWIN_USE_BUFFER_AGE=3, so it works with tighter timings now ?

Or any of these ?
options amdgpu cik_support=1 si_support=1 msi=1 disp_priority=2 dpm=1 runpm=1 sched_policy=1 compute_multipipe=1 vm_fragment_size=9 gartsize=1024 max_num_of_queues_per_device=65536 sched_hw_submission=32 sched_jobs=1024 job_hang_limit=8000 halt_if_hws_hang=1 vm_fault_stop=0 vm_update_mode=0 deep_color=1 gpu_recovery=1 lockup_timeout=2500,5000,8000,1000 ras_enable=1 mcbp=1 queue_preemption_timeout_ms=48 mes=1 hws_gws_support=1 discovery=1
Comment 54 Alex Deucher 2019-09-27 12:50:26 UTC
(In reply to Sergey Kondakov from comment #53)
> Or any of these ?
> options amdgpu cik_support=1 si_support=1 msi=1 disp_priority=2 dpm=1
> runpm=1 sched_policy=1 compute_multipipe=1 vm_fragment_size=9 gartsize=1024
> max_num_of_queues_per_device=65536 sched_hw_submission=32 sched_jobs=1024
> job_hang_limit=8000 halt_if_hws_hang=1 vm_fault_stop=0 vm_update_mode=0
> deep_color=1 gpu_recovery=1 lockup_timeout=2500,5000,8000,1000 ras_enable=1
> mcbp=1 queue_preemption_timeout_ms=48 mes=1 hws_gws_support=1 discovery=1

remove all of those.  You should use the defaults unless you are specifically debugging something.
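
One way to get back toward defaults step by step is to parse the `options` line and drop parameters a few at a time. The following is a minimal sketch, assuming the usual modprobe.d `options <module> key=value ...` line format; the helper names (`parse_modprobe_options`, `drop_params`, `format_options`) are hypothetical and the parameter subset is taken from the line quoted above.

```python
def parse_modprobe_options(line, module="amdgpu"):
    """Parse an 'options <module> key=value ...' line into a dict."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "options" or tokens[1] != module:
        return {}
    params = {}
    for token in tokens[2:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            params[key] = value
    return params


def drop_params(params, names):
    """Return a copy without the named parameters, so driver defaults apply."""
    return {k: v for k, v in params.items() if k not in names}


def format_options(params, module="amdgpu"):
    """Rebuild an 'options' line from a parameter dict."""
    return " ".join(["options", module] + [f"{k}={v}" for k, v in params.items()])


line = ("options amdgpu vm_update_mode=0 deep_color=1 gpu_recovery=1 "
        "lockup_timeout=2500,5000,8000,1000")
params = parse_modprobe_options(line)
reduced = drop_params(params, {"vm_update_mode", "deep_color"})
print(format_options(reduced))
```

An empty dict (all parameters dropped) corresponds to running the module entirely on its defaults.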
Comment 55 Tom Seewald 2019-09-27 13:19:05 UTC
(In reply to Sergey Kondakov from comment #53)
> Created attachment 285209 [details]
> dmesg_2019-09-26-amdgpu-old_dereference_on_patched_5.3.1
> 
> After about a day of uptime my patched 5.3.1 hanged during hours-long
> Youtube video with dereference that is almost identical to the original one:

I don't believe the patches[1] have landed in a stable kernel release yet, at least going by the 5.3.1 change log[2] I don't see any reference to them.

[1] https://patchwork.freedesktop.org/series/64505/
[2] https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.1
Comment 56 Sergey Kondakov 2019-09-27 20:18:47 UTC
(In reply to Alex Deucher from comment #54)
> (In reply to Sergey Kondakov from comment #53)
> > Or any of these ?
> > options amdgpu cik_support=1 si_support=1 msi=1 disp_priority=2 dpm=1
> > runpm=1 sched_policy=1 compute_multipipe=1 vm_fragment_size=9 gartsize=1024
> > max_num_of_queues_per_device=65536 sched_hw_submission=32 sched_jobs=1024
> > job_hang_limit=8000 halt_if_hws_hang=1 vm_fault_stop=0 vm_update_mode=0
> > deep_color=1 gpu_recovery=1 lockup_timeout=2500,5000,8000,1000 ras_enable=1
> > mcbp=1 queue_preemption_timeout_ms=48 mes=1 hws_gws_support=1 discovery=1
> 
> remove all of those.  You should use the defaults unless you are
> specifically debugging something.

Then you may consider that I am "specifically debugging" THIS. Because when I ask these questions here or on freedesktop.org, I specifically hope for a factual response from people with actual understanding and experience of how it works and of what would be a proper way to debug without guesswork, based on knowledge that would compensate for the lack of meaningful documentation and one of the highest entry barriers in software (even a corporate monstrosity like Intel still can't figure out GPUs; it's a market dominated by 2 oligopolists that run it with impunity however they feel like, after all). This third dereference would be really hard to debug, though, because there are no clear reproduction steps, UNLESS you KNOW where and how to look as a developer. Or are you all just going to ignore the presence of kernel-crashing code because it "may" (or may not) not be triggered by your defaults ?

So, can you actually tell which code-path may result in this or, better yet, test it yourself so things like that just would not go into releases ?
The original dereference is triggered by mere presence of PageFlip which is on by default, so blindly running developer defaults (you can see what exactly I think about them here: https://bugzilla.kernel.org/show_bug.cgi?id=203703#c9 and c11) didn't help much anyone now, did it ?

Or can you at least explain on what exactly each of these options does, what may be desired and undesired consequences and how your consensus about defaults came to be ? Short summary (but not as short as modinfo) or links to mailing list discussions maybe ? Because my goals (as they are for any desktop user) are: minimal guaranteed latency (meaning, full aggressive preemption, lowest scheduling granularity and strict RT priorities) of audio/video/input/network pipelines under stress-load and in that specific order of priority, with working fast fail-over or recovery instead of hangs and reboots.

If I were using defaults, I would still be sitting on a 3.3 GHz (instead of 4 GHz + 2.4 GHz for MMU & cache) FX CPU, non-ECC RAM run by the literally retarded AMD FX MMU (you KNOW the one, the laughing stock of 2011-2017 x86 CPUs !) at slow default JEDEC timings, a ~200W (instead of down-clocked and/or under-volted 90-120W) RX580 GPU (that would, no doubt, fry itself at some point like my previous 6870 did) with slow memory timings, sluggish non-patched kwin, 64ms of audio latency (instead of 8-12ms) and a whole bunch of random hangs/drops in audio, video stuttering and input delays/skips due to scheduling priorities that are all over the place by default. So, no, thank you very much, on that. And YOU should NOT be testing exclusively on defaults either.

(In reply to Tom Seewald from comment #55)
> (In reply to Sergey Kondakov from comment #53)
> > Created attachment 285209 [details]
> > dmesg_2019-09-26-amdgpu-old_dereference_on_patched_5.3.1
> > 
> > After about a day of uptime my patched 5.3.1 hanged during hours-long
> > Youtube video with dereference that is almost identical to the original
> one:
> 
> I don't believe the patches[1] have landed in a stable kernel release yet,
> at least going by the 5.3.1 change log[2] I don't see any reference to them.
> 
> [1] https://patchwork.freedesktop.org/series/64505/
> [2] https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.3.1

They seem to be in queue for 5.3.2: https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=7f2f9d496c3b8809143f1fc14e8cb093cc981d78
BUT those only address the #1 (PageFlip) dereference, NOT #2 (when vm_update_mode is not 0) and #3 !
Comment 57 Andrey Grodzovsky 2019-09-28 00:07:24 UTC
Sergey, instead of throwing tantrums why can't you just do what you are asked ? You present an extremely convoluted set of driver config params and demand that we resolve the bug with those parameters in place. This introduces unneeded complication of the failure scenario, which in turn introduces a lot of unknowns. Alex asks you to simplify the settings so there are fewer unknowns in the system and it's easier for us to try and figure out what goes wrong while we inspect the code.
So please, bring the parameters back to default as this is the most well tested configuration and gives a baseline and also please provide addr2line for 0010:amdgpu_dm_atomic_commit_tail+0x2ee so we can get a better idea where in code the NULL ptr happened.
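
For anyone unfamiliar with the request, the symbol and offset can be pulled out of the `RIP:` line of the oops and turned into a gdb `list *(symbol+offset)` expression. The following is a minimal sketch; the regular expression and helper names are hypothetical, and it assumes the `RIP: <segment>:<symbol>+0x<offset>/0x<size> [module]` layout seen in the logs in this report. Which object file to load into gdb (the module built with debug info) is left to the reader.

```python
import re

# Matches e.g. "RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2ee/0xfd0 [amdgpu]"
RIP_RE = re.compile(
    r"RIP:\s+(?:[0-9a-f]{4}:)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_rip(line):
    """Extract symbol, offset, function size and module from an oops RIP line."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m["symbol"],
        "offset": int(m["offset"], 16),
        "size": int(m["size"], 16),
        "module": m["module"],  # None when the line names no module
    }


def gdb_list_expr(info):
    """Build the gdb expression that maps symbol+offset to a source line."""
    return f"list *({info['symbol']}+{info['offset']:#x})"


info = parse_rip("RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2ee/0xfd0 [amdgpu]")
print(gdb_list_expr(info))
```

The resulting expression is meant to be pasted into a gdb session that has the matching debug symbols loaded.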
Comment 58 Damian Nowak 2019-09-29 18:10:46 UTC
I encounter this error once a week on average on my Radeon 7 (Vega 20). Great to see you guys actively working on it. When 5.3.2 releases to Arch, I'll keep using it for a week or two and report back whether I encounter the issue again or not. Thanks!

@Sergey You could revert to defaults just for the duration of testing/debugging. It'll sure make things easier for developers, and you can still go back to your settings once the issue is fixed. Great settings nonetheless, do these kernel parameters really improve the power performance of RX 580, or did you need to do something in addition too? By the way, I used RX 580 on default Arch Linux settings (so most likely kernel defaults) for a year and it was fine so you probably don't have to worry about frying it. Now I'm using Radeon 7, while RX 580 is still alive in a different Windows-based computer.
Comment 59 Sergey Kondakov 2019-09-29 21:54:23 UTC
(In reply to Andrey Grodzovsky from comment #57)
> Sergey, instead of throwing tantrums why can't you just do what you are
> asked ? You present an extremely convoluted set of driver config params and
> demand from us resolving the bug with those parameters in place. This
> introduces unneeded complication of the failure scenario which in turn
> introduces a lot of unknowns. Alex asks you to simplify the settings so less
> unknows are in the system so it's easier for us to try and figure out what
> goes wrong while we inspect the code. 
> So please, bring the parameters back to default as this is the most well
> tested configuration and gives a baseline and also please provide addr2line
> for 0010:amdgpu_dm_atomic_commit_tail+0x2ee so we can get a better idea
> where in code the NULL ptr happened.

And how about instead of knowingly pushing untested code with known fatal errors you stop taking QA notes from FGLRX in the first place and do your own full testing ? You do realize that I, as all others, paid for that card to your employer, right ? And people don't buy your top cards, RX[4-5][7-8]0, VEGAs and so on, to use them as expensive bare output controllers.

DO NOT SHOOT THE MESSENGER. What you ignore from me, others will get one way or another, most of whom would be unable to even report it and would just resort to cursing you and selling the hardware, going with an Nvidia & Intel combo forever instead. Do you have any idea how many times in my life I've heard the "at least it's without hassle" spiel about all (yes, all) AMD stuff from "normal people" ?

I don't demand that you resolve this personally for me and whatever I might configure. But I do demand that you not push untested code, hide it under parameters that limit all cards to the bare minimum, and then use that as an excuse to continue not testing it. And then silently expect me to work as your QA, as if I were trained in how to debug kernel-level code and telepathically knew what might be on your minds. What else, should I be expected to whip out a chip programmer and write custom asm code for your mystery chips by myself ? I don't have a laboratory or a dedicated debug station.

_Regarding this notion of "testing on defaults"_. Maybe I was not clear on that: the #3 dereference happened just once, after about a full day of uptime. The machine has sometimes run for more than a week straight without issues. So, defaulting will not show any difference on my end unless I run both configs for no less than 2 weeks of pure uptime each without shutting down the machine. And it would still be useless guesswork which will not produce any more pointers on what exactly goes wrong; at best it will just repeat or not.

However, you, as the developers of that code and trained experts, can use what little data there is to recheck the exact offending code that no one else has a clue about. You can also fully reproduce my configuration (including the exact packages of my kernel with debug-info) and work with full data of your own, since you are not willing to test all your codepaths regularly as a rule.

I will try to figure out what the hell this "addr2line" is but it will probably include installing gigabytes of debug-symbols on SSD that has no space for them, so… it will take a while.

By the way, what happened with my answer about #2 ? You know, the `list *(amdgpu_vm_update_directories+0xe7)` part, which was a real time-consuming pain to get, with:
0x2e127 is in amdgpu_vm_update_directories (../drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1191).
where line #1191 is:
struct amdgpu_bo *bo = parent->base.bo, *pbo;

Have you even seen it ? Was it the right thing ? Any thoughts on the cause for this one ? Should I do the same for the #3 ? Will it also go into a void of silence ?

(In reply to Damian Nowak from comment #58)
> I encounter this error once a week on average on my Radeon 7 (Vega 20).
> Great on see you guys actively working on it. When 5.3.2 releases to Arch,
> I'll keep using it for a week or two and report back whether I encounter an
> issue again or not. Thanks! 
> 
> @Sergey You could revert to defaults just for the duration of
> testing/debugging. It'll sure make things easier for developers, and you can
> still go back to your settings once the issue is fixed. Great settings
> nonetheless, do these kernel parameters really improve the power performance
> of RX 580, or did you need to do something in addition too? By the way, I
> used RX 580 on default Arch Linux settings (so most likely kernel defaults)
> for a year and it was fine so you probably don't have to worry about frying
> it. Now I'm using Radeon 7, while RX 580 is still alive in a different
> Windows-based computer.

Ok, I can. But what's next ? How exactly would that give any more data ? What exactly should I do after booting the machine ?

Power ? No, the custom hacked GPU BIOS does. Although, after fiddling with voltages, I just left them on auto-defaults, where the driver/firmware uses the built-in per-card "chip quality" as a multiplier for defaults, and limited the frequency to 1300. Power draw increases exponentially with frequency, and after 1300 it increases ridiculously on RX580's 14nm chips. I also made the fans never stop and act more aggressively, but not to the point of out-noising the case and CPU fans. And I tightened memory timings too. 90-120W are numbers from MSI Afterburner: mostly about 90W and rarely 120W in some specific loads.

Pre-RX cards, the whole 2008-2015 generation of AMD GPU chips (and chipsets, for that matter), especially mobile ones, are well known to be self-destructive. And not long ago my 6870 joined them. Ironically, default firmware settings on commercial GPUs are not safe, at least not on those generations. They are balanced by the manufacturers to barely survive warranty periods. That's why pre-overclocked cards, or any chips, are not a product that anyone should be excited about. AMD chips are known as "the stoves" for a reason, but device manufacturers bring them from "inefficient" to "half-dead". The price is good, though.

With the software parameters I mainly try to balance latency and CPU time, remove sources of stuttering, do proper prioritization during CPU & I/O contention, and enable features that can be safely enabled, so when I run my live test/install distro build on unknown hardware, I could test and/or use it fully without redoing and customizing the whole damn thing. But it's more of a guesswork with GPU than with everything else. Unfortunately, developers in general are not much of the fans of "multi-task desktop user experience" on last-gen ("last" being "older than one in laboratory") hardware.
Comment 60 jamespharvey20 2019-09-30 02:07:49 UTC
(In reply to Sergey Kondakov from comment #59)
> 
> And how about instead of knowingly pushing untested code with known fatal
> errors you stop taking QA notes from FGLRX in the first place and do your
> own full testing ? You do realize that I, as all others, paid for that card
> to your employer, right ? And people don't buy your top cards,
> RX[4-5][7-8]0, VEGAs and so on, to use them as expensive bare output
> controllers.

This.  If this were just a free project with volunteers giving their time, many of us who occasionally throw a tantrum towards AMD wouldn't be.  But, some of us are throwing money at AMD to try to have a stable system again, and keep getting regressions introduced that are either fixed very slowly, or not at all.

I'm here, because I was running an R9 390, and kernel 4.19 introduced a regression that causes a complete boot failure.  Others confirmed the same.  See https://bugs.freedesktop.org/show_bug.cgi?id=108781  (As I explain way below, this is still unfixed in 5.3.)

On that bug, I'm asked by an amd.com developer to bisect.  I run into hundreds, or even a thousand, commits that don't even compile, and only a later commit fixes that issue.  Fun, thanks for pushing those, guys.  I finally achieve a bisected commit, where 0d9988910989 causes a boot hang and the one previous to it doesn't.  Upon being told this shouldn't have to do with the bug I've posted, I do discover that this bug causes a black screen boot hang, but it's a different bug!  I then go on to document that I've found between 3 and 5 crashing commits in the new 4.19 commits.

So, how am I supposed to bisect this garbage, when a lot doesn't even compile, and there are multiple bugs popping in and out of existence causing the same symptom?  Boot crashes with black screen, and I'm supposed to know to mark that commit as good because it's a different bug causing the same issue?

I ask the AMD devs to tell me exactly which card they use in testing (if any, at all) so I can just buy that and be done with this.  No response.

So, I pay AMD more money and buy a RX 580, which is mostly a downgrade from the R9 390.  Get frequent crashes from that as well.

So, I just decide to buy a Vega 64.  I don't need the extra power, I just want to run a stable machine.  Since AMD devs aren't saying what card I could use that they do, in a hope that they might fix crashes before they push them, I figure the latest and greatest might be getting more attention.

All goes well until this regression is introduced.

I go back to try my R9 390, and guess what?  The same bug introduced in kernel 4.19 is still there in 5.3!  AMD's just ignored it, and hasn't bothered to try to reproduce it themselves and try to untangle the mess of spaghetti.

Since running a custom kernel with the patchset, I haven't had this crash, but come on guys!  Couldn't AMD have a bank of 50 computers running different cards, constantly running the latest unpushed code and going through different stress tests?  Hey, Jim, monitor #14 and #36 keep crashing, let's look into it.....
Comment 61 jamespharvey20 2019-09-30 02:09:47 UTC
It might look like I'm just ranting.  That's not the reason I posted my comment.  I'm trying to give feedback to AMD about how bad so many customer experiences are right now, and have been for quite some time, and how there should be easy and affordable (for AMD) ways to make it better.
