Bug 207383 - [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
Summary: [Regression] 5.7 amdgpu/polaris11 gpf: amdgpu_atomic_commit_tail
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All
OS: Linux
Importance: P1 blocking
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-04-21 09:51 UTC by Duncan
Modified: 2020-07-01 19:08 UTC
CC List: 8 users

See Also:
Kernel Version: 5.7-rc1 - 5.7 - 5.8-rc1+
Tree: Mainline
Regression: Yes


Attachments
kernel config (95.83 KB, text/plain)
2020-04-21 09:51 UTC, Duncan
automated boot-time dmesg dump (83.97 KB, text/plain)
2020-04-21 09:57 UTC, Duncan
Partial git bisect log (1.69 KB, text/plain)
2020-06-28 15:30 UTC, Duncan

Description Duncan 2020-04-21 09:51:33 UTC
Created attachment 288649 [details]
kernel config

5.7-rc1 and rc2 regression from kernel 5.6.0

After starting X/Plasma on 5.7-rc1 and rc2, the system runs for anywhere from a few seconds to a few hours, then the display freezes.  The pointer remains movable and audio continues to play for some seconds, but both eventually stop as well.  The kernel remains alive at least enough to reboot with SRQ-b; I'm not sure whether the preceding SRQs have any effect.
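
For reference, magic SysRq has to be enabled for those key combos to work at all.  A quick check/enable sketch (as root; 1 enables all SysRq functions, and many distros default to a restricted bitmask):

cat /proc/sys/kernel/sysrq
echo 1 > /proc/sys/kernel/sysrq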

Sometimes, but not always, a gpf is left in the log, which appears to confirm it's amdgpu (the -dirty tag is simply due to a patch making mounts noatime by default):

Apr 20 03:25:55 h2 kernel: general protection fault, probably for non-canonical address 0xc1316515e40a92f6: 0000 [#1] SMP
Apr 20 03:25:55 h2 kernel: CPU: 3 PID: 3921 Comm: kworker/u16:5 Tainted: G                T 5.7.0-rc2-dirty #194
Apr 20 03:25:55 h2 kernel: Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS F6 03/30/2012
Apr 20 03:25:55 h2 kernel: Workqueue: events_unbound commit_work
Apr 20 03:25:55 h2 kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x102d/0x1fd8
Apr 20 03:25:55 h2 kernel: Code: 48 89 9d a0 fc ff ff 8b 90 e0 02 00 00 85 d2 0f 85 26 f1 ff ff 48 8b 85 e0 fc ff ff 48 89 85 a0 fc ff ff 48 8b b5 e0 fc ff ff <80> be b0 01 00 00 01 0f 86 b4 00 00 00 31 c0 48 b9 00 00 00 00 01
Apr 20 03:25:55 h2 kernel: RSP: 0018:ffffc9000216bad0 EFLAGS: 00010286
Apr 20 03:25:55 h2 kernel: RAX: ffff88842a6e1000 RBX: ffff8883d1d5b800 RCX: ffff8884283db200
Apr 20 03:25:55 h2 kernel: RDX: ffff8884283db2e0 RSI: c1316515e40a92f6 RDI: 0000000000000002
Apr 20 03:25:55 h2 kernel: RBP: ffffc9000216be50 R08: 0000000000000001 R09: 0000000000000001
Apr 20 03:25:55 h2 kernel: R10: 0000000000030000 R11: 0000000000000000 R12: 0000000000000000
Apr 20 03:25:55 h2 kernel: R13: 0000000000000005 R14: ffff88842bb76000 R15: ffff88841c08cc00
Apr 20 03:25:55 h2 kernel: FS:  0000000000000000(0000) GS:ffff88842ecc0000(0000) knlGS:0000000000000000
Apr 20 03:25:55 h2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 03:25:55 h2 kernel: CR2: 000078617de4fffc CR3: 000000040ca0e000 CR4: 00000000000406e0
Apr 20 03:25:55 h2 kernel: Call Trace:
Apr 20 03:25:55 h2 kernel:  ? 0xffffffff81000000
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x34/0x70
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x40/0x70
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x34/0x70
Apr 20 03:25:55 h2 kernel:  ? __switch_to_asm+0x40/0x70
Apr 20 03:25:55 h2 kernel:  ? commit_tail+0x8e/0x120
Apr 20 03:25:55 h2 kernel:  ? process_one_work+0x1a9/0x300
Apr 20 03:25:55 h2 kernel:  ? worker_thread+0x45/0x3b8
Apr 20 03:25:55 h2 kernel:  ? kthread+0xf3/0x130
Apr 20 03:25:55 h2 kernel:  ? process_one_work+0x300/0x300
Apr 20 03:25:55 h2 kernel:  ? __kthread_create_on_node+0x180/0x180
Apr 20 03:25:55 h2 kernel:  ? ret_from_fork+0x22/0x40
Apr 20 03:25:55 h2 kernel: ---[ end trace 33869116def8e8ad ]---
Apr 20 03:25:55 h2 kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x102d/0x1fd8
Apr 20 03:25:55 h2 kernel: Code: 48 89 9d a0 fc ff ff 8b 90 e0 02 00 00 85 d2 0f 85 26 f1 ff ff 48 8b 85 e0 fc ff ff 48 89 85 a0 fc ff ff 48 89 85 a0 fc ff ff 48 8b b5 e0 fc ff ff <80> be b0 01 00 00 01 0f 86 b4 00 00 00 31 c0 48 b9 00 00 00 00 01
Apr 20 03:25:55 h2 kernel: RSP: 0018:ffffc9000216bad0 EFLAGS: 00010286
Apr 20 03:25:55 h2 kernel: RAX: ffff88842a6e1000 RBX: ffff8883d1d5b800 RCX: ffff8884283db200
Apr 20 03:25:55 h2 kernel: RDX: ffff8884283db2e0 RSI: c1316515e40a92f6 RDI: 0000000000000002
Apr 20 03:25:55 h2 kernel: RBP: ffffc9000216be50 R08: 0000000000000001 R09: 0000000000000001
Apr 20 03:25:55 h2 kernel: R10: 0000000000030000 R11: 0000000000000000 R12: 0000000000000000
Apr 20 03:25:55 h2 kernel: R13: 0000000000000005 R14: ffff88842bb76000 R15: ffff88841c08cc00
Apr 20 03:25:55 h2 kernel: FS:  0000000000000000(0000) GS:ffff88842ecc0000(0000) knlGS:0000000000000000
Apr 20 03:25:55 h2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 20 03:25:55 h2 kernel: CR2: 000078617de4fffc CR3: 000000040ca0e000 CR4: 00000000000406e0

That's it.  Nothing in the log between the boot messages and the gpf, and the next entry is from after the reboot.

gcc 9.3.0 on Gentoo.  AMD FX-6100 on the Gigabyte board shown in the log above.
xorg-server 1.20.8, mesa 20.0.4, xf86-video-amdgpu 19.1.0, linux-firmware 20200413

kernel config attached
Comment 1 Duncan 2020-04-21 09:57:59 UTC
Created attachment 288651 [details]
automated boot-time dmesg dump
Comment 2 Duncan 2020-04-21 10:04:28 UTC
I build kernels from git and can apply test patches as necessary.  I may bisect, but haven't yet; it would take a while and may not be reliable since the trigger time is variable.  Plus of course I can't do anything I don't want interrupted while attempting to bisect.  So I'm hoping the Polaris 11 identification, the log, and the pin to v5.6..v5.7-rc1 are enough.
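
Should I end up bisecting, the setup over that range would look roughly like this (a sketch; each step gets built, booted, and run until it either crashes or survives long enough to call good):

git bisect start
git bisect bad v5.7-rc1
git bisect good v5.6
# build, boot, test, then mark the result:
git bisect good    # or: git bisect bad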
Comment 3 Duncan 2020-04-23 04:59:01 UTC
CCed the two addresses from MAINTAINERS that bugzilla would let me add.  It wouldn't let me add amd-gfx@ or david1.zhou@, and Alex's gmail address according to bugzilla isn't what's in MAINTAINERS.
Comment 4 Duncan 2020-04-27 19:24:54 UTC
Still there with 5.7-rc3, although /maybe/ it's not triggering as quickly.  It took 13 hours to trigger this time, and I'd almost decided it was fixed since it had been triggering sooner than that, but that could simply be luck.  Rebooted to rc3 again.  We'll see...
Comment 5 Duncan 2020-04-27 19:42:32 UTC
Well, that didn't take long.  Four konsole terminals open to do (various aspects of) a system update.  Just a few seconds after I entered the (git-based) sync command, display-FREEZE!

Back on 5.6.0 now.  I'll probably test again with rc4, perhaps earlier if I see a set of drm/amdgpu updates in mainline git.
Comment 6 Alex Deucher 2020-04-27 19:43:50 UTC
Can you bisect?
Comment 7 Duncan 2020-05-01 08:20:43 UTC
Bisecting, but it's slow going when the bug can take 12+ hours to trigger, and even then I can't be sure a "good" is actually so.

So far (at 5.6.0-01623-g12ab316ce, ~7 bisect steps to go, under 100 commits "after"), the first few steps were all "good".  The one I'm currently testing obviously isn't "bad" in terms of this bug yet, but it does display a nasty buffer-sync issue, with off-frame read-outs and eventual crashes when trying to play 4K@30fps YouTube in Firefox.  That's a bit of a struggle for this kit but usually OK (4K@60fps is the real problem in Firefox/Chromium, though it tends to be fine without the browser overhead in mpv/smplayer/vlc).

But I hadn't seen that issue with the full 5.7-rc1 through rc3, so it was apparently already fixed by rc1.  And there have been no incidents of this bug, i.e. full-system or full-graphics lockups with a fault in amdgpu_dm_atomic_commit_tail, during the bisect yet.
Comment 8 Duncan 2020-05-01 08:28:50 UTC
Hmm.  I don't think I've mentioned on this bug yet that I'm running dual 4K TVs as monitors.  So it could be that it only triggers on dual display, and two 4K displays mean the card is pumping a lot more pixels than most, too.
Comment 9 Duncan 2020-05-02 16:03:21 UTC
I'm not there yet, but it's starting to look like a possibly dud bisect: everything is showing good so far.  Maybe I didn't wait long enough for the bug to trigger at some step and I'm running up the wrong side of the tree, or maybe it's not drm after all (I thought I'd try something new and limit the paths to drivers/gpu/drm/ and include/drm/, but that may have been a critical mistake).  Right now there are only 3-4 even remotely reasonable candidates (out of 14 left to test... the rest being mediatek or similar):

4064b9827 mm: allow VM_FAULT_RETRY for multiple times (Peter Xu)
6bfef2f91 mm/hmm: remove HMM_FAULT_SNAPSHOT (Jason Gunthorpe)
17ffdc482 mm: simplify device private page handling in hmm_range_fault (Christoph Hellwig)

And maybe (though I'm running neither EFI nor 32-bit):

72e0ef0e5 PCI: Use ioremap(), not phys_to_virt() for platform ROM (Mikel Rychliski)


Meanwhile, on the user side I've gotten vulkan/mesa/etc. updates recently.  I'm considering checking out linus-master/HEAD again, doing a pull, and seeing if by chance either the last week's kernel updates or the user-side updates have eliminated the problem.  If not, I can come back and finish the bisect (or try just reverting those four on current linus-master/HEAD) before starting a new clean bisect if necessary.  I've just saved the bisect log and current pointer.
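
For the record, the path-limited bisect and the log save/restore look roughly like this (a sketch; the path limit is the experiment mentioned above, and the file name is arbitrary):

git bisect start -- drivers/gpu/drm include/drm
git bisect bad v5.7-rc1
git bisect good v5.6
git bisect log > ~/bisect.log
git bisect replay ~/bisect.log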
Comment 10 Duncan 2020-05-03 15:10:59 UTC
(In reply to Duncan from comment #9)
> I'm not there yet but it's starting to look like a possibly dud bisect:
> everything showing good so far

Good but not ideal news!

I did get an apparent graphics crash at the bisect point above, but it didn't dump anything in the log this time, and the behavior was a bit different from usual for this bug -- audio continued playing longer, and I was able to confirm SRQ-E termination via audio and cpu-fan, and SRQ-S sync via the sata-activity LED.

So I'm not sure whether it's the same bug or a different one; I'm bisecting pre-rc1 code after all, so other bugs aren't unlikely.

So I've rebooted to the same bisect step to try again; with any luck I'll get that gpf dump in the log confirming it's the same bug this time.

If it *is* the same bug, it looks like I avoided a dud bisect after all; it just happened to be all "good" until almost the very end.  I'm only a few steps away from pinning it down, and it's almost certainly one of the commits listed in comment #9. =:^)

> Meanwhile, user-side I've gotten vulkan/mesa/etc updates recently.  I'm
> considering checking out linus-master/HEAD again, doing a pull, and seeing
> if by chance either the last week's kernel updates or the user-side updates
> have eliminated the problem.

Been there, done that, still had the bug, with gpf-log-dump confirmation.  Back to the bisect.
Comment 11 Duncan 2020-05-05 04:23:26 UTC
(In reply to Duncan from comment #10)
> I did get an apparent graphics crash at the bisect-point above, but it
> didn't dump anything in the log this time

Got a gpf dump with amdgpu_dm_atomic_commit_tail, confirming it's the same bug.  Still a couple of bisect steps to go, but the EFI candidate is out now, leaving only three (plus mediatek and nouveau, and an amdgpu commit that says it was a doc fix only).  The current round is testing between 4064b9827 and the 6bfef2f91/17ffdc482 pair, so I should eliminate at least one of the three this round:

4064b9827 mm: allow VM_FAULT_RETRY for multiple times (Peter Xu)
6bfef2f91 mm/hmm: remove HMM_FAULT_SNAPSHOT (Jason Gunthorpe)
17ffdc482 mm: simplify device private page handling in hmm_range_fault (Christoph Hellwig)
Comment 12 Duncan 2020-05-06 17:46:44 UTC
OK, bisect says:

4064b9827 mm: allow VM_FAULT_RETRY for multiple times (Peter Xu)

... which came in via akpm and touches drivers/gpu/drm/ttm/ttm_bo_vm.c.

But I'm not entirely confident in that result ATM.  Among other things, I had set ZSWAP_DEFAULT_ON for 5.7, and I'd had zswap configured but not active previously, so that could be it too.  I'm not typically under enough memory pressure to trigger it, but...

Luckily, a git show -R generated patch still applies cleanly on current master (5.7.0-rc4-00029-gdc56c5acd, though I've only built it, not rebooted to test it yet), so I can test both the commit-revert patch and the changed zswap options now.
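
The revert-patch approach, sketched (git show -R emits the reverse diff of the suspect commit; the patch file name here is arbitrary):

git show -R 4064b9827 > revert-vm-fault-retry.patch
git apply --check revert-vm-fault-retry.patch
git apply revert-vm-fault-retry.patch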

So I'm still confirming.

But perhaps take another look at that commit and see if there's some way that allowing unlimited VM_FAULT_RETRY could leave drm, at least on amdgpu, eternally stalled; that does seem to fit the symptoms, whether it's unlimited VM_FAULT_RETRY or not.
Comment 13 Duncan 2020-05-06 22:06:49 UTC
Well, so much for /that/ bisect!  It took a few hours, but then I had the graphics stall twice within a few minutes... with the above commit reverted AND with memory compression off.

So it's back to square one, except I know that my originally chosen new memory compression options aren't involved.  New bisect time.
Comment 14 Duncan 2020-06-03 00:04:49 UTC
Unfortunately the bug's still there in 5.7 release. =:^(

Not properly bisected yet: after the first failed bisect I needed something reasonably stable for a while, as I had about a dozen live-git kde-plasma userspace bugs to track down and report.  Kernel 5.6.0-07388-gf365ab31e has been exactly that, stable for me, for weeks now (built May 6), and the bug definitely triggered in 5.7-rc1, so it's got to be between those.  With the unrelated userspace side mostly fixed now, and this kernelspace bug known to remain unfixed over the normal development cycle, maybe I can get back to bisecting again.
Comment 15 Duncan 2020-06-21 07:01:42 UTC
Bug's in v5.8-rc1-226-g4333a9b0b too.
Comment 16 rtmasura+kernel 2020-06-22 15:20:33 UTC
Reporting that I've had the same issue with kernels 5.7.2 and 5.7.4:

Jun 22 07:10:24 abiggun kernel: general protection fault, probably for non-canonical address 0xd3d74027d6d8fad4: 0000 [#1] PREEMPT SMP NOPTI
Jun 22 07:10:24 abiggun kernel: CPU: 0 PID: 32680 Comm: kworker/u12:9 Not tainted 5.7.4-arch1-1 #1
Jun 22 07:10:24 abiggun kernel: Hardware name: System manufacturer System Product Name/Crosshair IV Formula, BIOS 1102    08/24/2010
Jun 22 07:10:24 abiggun kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Jun 22 07:10:24 abiggun kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 22 07:10:24 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 f>
Jun 22 07:10:24 abiggun kernel: RSP: 0018:ffffb0cc421abaf8 EFLAGS: 00010286
Jun 22 07:10:24 abiggun kernel: RAX: 0000000000000006 RBX: ffffa21b8e16c400 RCX: ffffa21cab9c8800
Jun 22 07:10:24 abiggun kernel: RDX: ffffa21ca7326200 RSI: ffffffffc10de1a0 RDI: d3d74027d6d8fad4
Jun 22 07:10:24 abiggun kernel: RBP: ffffb0cc421abe60 R08: 0000000000000001 R09: 0000000000000001
Jun 22 07:10:24 abiggun kernel: R10: 00000000000002be R11: 00000000001c57a1 R12: 0000000000000000
Jun 22 07:10:24 abiggun kernel: R13: 0000000000000006 R14: ffffa218e4959800 R15: ffffa219e5b12780
Jun 22 07:10:24 abiggun kernel: FS:  0000000000000000(0000) GS:ffffa21cbfc00000(0000) knlGS:0000000000000000
Jun 22 07:10:24 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 22 07:10:24 abiggun kernel: CR2: 00007fec2b573008 CR3: 0000000344bd8000 CR4: 00000000000006f0
Jun 22 07:10:24 abiggun kernel: Call Trace:
Jun 22 07:10:24 abiggun kernel:  ? cpumask_next_and+0x19/0x20
Jun 22 07:10:24 abiggun kernel:  ? update_sd_lb_stats.constprop.0+0x115/0x8f0
Jun 22 07:10:24 abiggun kernel:  ? __update_load_avg_cfs_rq+0x277/0x2f0
Jun 22 07:10:24 abiggun kernel:  ? update_load_avg+0x58f/0x660
Jun 22 07:10:24 abiggun kernel:  ? update_curr+0x108/0x1f0
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 22 07:10:24 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 22 07:10:24 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 22 07:10:24 abiggun kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 22 07:10:24 abiggun kernel:  process_one_work+0x1da/0x3d0
Jun 22 07:10:24 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 22 07:10:24 abiggun kernel:  worker_thread+0x4d/0x3e0
Jun 22 07:10:24 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 22 07:10:24 abiggun kernel:  kthread+0x13e/0x160
Jun 22 07:10:24 abiggun kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 22 07:10:24 abiggun kernel:  ret_from_fork+0x22/0x40
Jun 22 07:10:24 abiggun kernel: Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_plantronics mc vhost_net vhost tap vhost_iotlb snd_seq_dumm>
Jun 22 07:10:24 abiggun kernel:  crypto_simd cryptd glue_helper xts dm_crypt hid_generic usbhid hid raid456 libcrc32c crc32c_generic async_raid6_recov async>
Jun 22 07:10:24 abiggun kernel: ---[ end trace 536cfe34e3c36293 ]---
Jun 22 07:10:24 abiggun kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 22 07:10:24 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 f>
Jun 22 07:10:25 abiggun kernel: RSP: 0018:ffffb0cc421abaf8 EFLAGS: 00010286
Jun 22 07:10:25 abiggun kernel: RAX: 0000000000000006 RBX: ffffa21b8e16c400 RCX: ffffa21cab9c8800
Jun 22 07:10:25 abiggun kernel: RDX: ffffa21ca7326200 RSI: ffffffffc10de1a0 RDI: d3d74027d6d8fad4
Jun 22 07:10:25 abiggun kernel: RBP: ffffb0cc421abe60 R08: 0000000000000001 R09: 0000000000000001
Jun 22 07:10:25 abiggun kernel: R10: 00000000000002be R11: 00000000001c57a1 R12: 0000000000000000
Jun 22 07:10:25 abiggun kernel: R13: 0000000000000006 R14: ffffa218e4959800 R15: ffffa219e5b12780
Jun 22 07:10:25 abiggun kernel: FS:  0000000000000000(0000) GS:ffffa21cbfc00000(0000) knlGS:0000000000000000
Jun 22 07:10:25 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 22 07:10:25 abiggun kernel: CR2: 00007fec2b573008 CR3: 0000000344bd8000 CR4: 00000000000006f0
Comment 17 Duncan 2020-06-22 17:44:59 UTC
(In reply to rtmasura+kernel from comment #16)
> Reporting I've had the same issue with kernel 5.7.2 and 5.7.4:

Thanks!

> Jun 22 07:10:24 abiggun kernel: Hardware name: System manufacturer System
> Product Name/Crosshair IV Formula, BIOS 1102    08/24/2010

So socket AM3 from 2010, slightly older than my AM3+ from 2012.  Both are PCIe-2.0.

What's your CPU and GPU?

As above, my GPU is Polaris 11 (AMD Radeon RX 460, Arctic Islands/GCN4 series, PCIe 3), with an AMD FX-6100 CPU.

I'm guessing the bug is specific to the code for certain GPU series, or there'd be more people howling, so what you're running for a GPU is significant.  It's /possible/ it's specific to people running a PCIe mismatch as well (note my PCIe 3 GPU card on a PCIe 2 mobo).
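
You can check what the card actually negotiated with something like this (a sketch; <gpu-bdf> is a placeholder for the GPU's bus address from plain lspci, and LnkSta shows the negotiated speed/width vs. the LnkCap maximum):

sudo lspci -vv -s <gpu-bdf> | grep -E 'LnkCap|LnkSta'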

> Jun 22 07:10:24 abiggun kernel: Workqueue: events_unbound commit_work
> [drm_kms_helper]
> 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]

That's the bit of the dump I understand, similar to mine...

If you can find a quicker/more-reliable way to trigger the crash, it'd sure be helpful for bisecting.  Also, if you're running a bad kernel long enough to tell (not just back on 5.6 after finding 5.7 bad), does it reliably dump to the log before the reboot for you?  I'm back to a veerrry-sloowww second bisect attempt.  For instance, my current kernel has crashed three times now, so it's obviously bugged, but nothing has been dumped to the log on the way down yet, so I can't guarantee it's the _same_ bug (the bisect is in pre-rc1 code, so the chances of a different bug are definitely non-zero).  Given the bad results of the first bisect, I'm trying to confirm each bisect-bad with a log dump and each bisect-good with at least 3-4 days of no crashes.  But this one's in between right now: frequent crashing, but no log dump to confirm it's the same bug.
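
On the log-dump question: when a crash does get written out before the forced reboot, something like this should dig the trace out of the previous boot (assuming persistent journald logging):

journalctl -k -b -1 | grep -A 20 'general protection fault'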
Comment 18 rtmasura+kernel 2020-06-22 17:57:25 UTC
lspci:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 Northbridge only single slot PCI-e GFX Hydra part (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GFX port 0)
00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 0)
00:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 3)
00:0b.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD990 PCI to PCI bridge (PCI Express GFX2 port 0)
00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP2 Port 0)
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [RAID5 mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
02:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:04.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:05.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:06.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:08.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:09.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P4000] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
0a:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
0b:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
0b:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
0c:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1470 (rev c3)
0d:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1471
0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
0e:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
                                                                        
A few notes on that: the AMD Vega 56 is used for this PC; the Quadro P4000 is disabled on my system and passed through to VMs.

I haven't found any way to trigger it; it seems completely random.  I sat down this morning to update a VM (not the one with the nvidia passthrough) and it froze; there wasn't anything really graphical going on other than normal KDE stuff.


lscpu:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          6
On-line CPU(s) list:             0-5
Thread(s) per core:              1
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      16
Model:                           10
Model name:                      AMD Phenom(tm) II X6 1090T Processor
Stepping:                        0
CPU MHz:                         3355.192
BogoMIPS:                        6421.46
Virtualization:                  AMD-V
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-5
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter


I would be happy to help with any testing, just let me know what information you need.
Comment 19 Duncan 2020-06-22 19:36:29 UTC
(In reply to rtmasura+kernel from comment #18)
> 09:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P4000]
> (rev a1)

> 0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)

> A few notes on that: The AMD Vega56 is used for this PC, the Quadro P4000 is
> disabled on my system and passed through to VMs. 

So newer graphics, Vega56/gcn5 compared to my gcn4.

No VMs at all here so that can be excluded as a factor (unless it's a minor trigger similar to my zooming or video play).

> I haven't found any way to trigger it. Seems completely random. Sat down
> this morning to update a VM (not the one with the nvidia passthrough) and it
> froze, wasn't any real graphical things going on other than normal KDE
> stuff. 

KDE/Plasma here too.  I think kwin exercises OpenGL a bit more than some WMs do, in part because it's a compositor as well.  The bug most often hits here when playing video or using kwin's zoom effect, both of which exercise the graphics a bit.

So if the triggers are mostly kde/kwin ones, that would lower the population hitting the bug, and kde/kwin being a factor fits with both of us running it.

> Model name:                      AMD Phenom(tm) II X6 1090T Processor

Newer graphics (GCN5 vs. my GCN4) and an older CPU (Phenom II vs. my FX) than here.

So we know GCN4 and GCN5 are affected, and a PCIe 2 bus with a PCIe 3 card, plus kde/kwin, are the common-factor possible triggers so far.

> I would be happy to help with any testing, just let me know what information
> you need.

If you happen to run anything besides KDE/Plasma on X, duplicating (or failing to duplicate) the bug on non-KDE and/or on Wayland would be useful info.  I only run KDE Plasma on X here.  Well, that and the CLI (on the amdgpu drm framebuffer), more than some do, but not enough that I'd have expected to see the bug there, and I haven't.
Comment 20 rtmasura+kernel 2020-06-22 20:00:46 UTC
I have XFCE4 installed as well.  I'll give it a test and let you know in 24 hours; a GPF should have happened by then.
Comment 21 rtmasura+kernel 2020-06-23 15:36:40 UTC
OK. I've uninstalled the vast majority of KDE and am using a vanilla XFCE4. It's been about 12 hours on 5.7.4-arch1-1 and I have yet to have a crash. It is looking like it may be something with KDE.
Comment 22 Duncan 2020-06-23 23:41:25 UTC
(In reply to rtmasura+kernel from comment #21)
> OK. I've uninstalled the vast majority of KDE and am using a vanilla XFCE4.
> It's been about 12 hours on 5.7.4-arch1-1 and I have yet to have a crash. It
> is looking like it may be something with KDE.

Note that it is possible to run kwin (kwin_x11 being the actual executable) on another desktop, or conversely, a different WM on plasma.  To run kwin and make it replace the existing WM, you'd simply type kwin_x11 --replace in the xfce runner or a terminal window (it can be done from a different VT as well, but then you have to feed kwin the display information too; see the sketch below).  Presumably other WMs have a similar command-line option.
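
From a different VT, a rough sketch, assuming the X session is on display :0:

DISPLAY=:0 kwin_x11 --replace &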

I've never actually done it on a non-plasma desktop (though I run live-git plasma and frameworks, so I must always be prepared to restart kwin or various other plasma components, to the point that I have non-kde-invoked shortcuts set up to do so), but I /think/ kwin would continue to use the configuration set up on kde: the various window rules, configured kwin keyboard shortcuts and effects, etc.

That could prove whether it's actually kwin triggering it or not (though it's a kernel bug regardless).  I suspect the proof is academic at this point, given that you've demonstrated the trigger does appear to be kde/plasma related, at least; IMO kwin triggering is a reasonably safe assumption given that.  But it does explain why the bug isn't widely reported: with plasma being the apparent biggest trigger and the bug limited to specific, now older, generations of hardware, few people, even among those running the latest kernels, are going to see it.

Meanwhile, I actually got a log dump on the 4th crash of the kernel at that bisect step, confirming it is indeed this bug, and have advanced a bisect step.  But git says I still have ~11 steps (1000+ commits), so the range is still far too large to start trying to pick out candidate buggy commits from the remainder.  Slow going indeed.  At this rate a full bisect and fix could well land after the 5.8 release, giving us two full bad release cycles and kernels before a fix.  Not good. =:^(
Comment 23 rtmasura+kernel 2020-06-24 08:55:18 UTC
Yeah, over 24 hours and still stable.  And I'm glad I could help; I rarely have anything I can give back to the community.

And wow, that much work.  Truly, we all do appreciate your work, but I don't think most of us understand how much.  Thank you from all of us :)
Comment 24 rtmasura+kernel 2020-06-27 04:37:18 UTC
I've been up and stable on XFCE4 since that last message, but it just crashed today with a bit of a different error.  This happened after I turned on a screen-tear fix:

xfconf-query -c xfwm4 -p /general/vblank_mode -s glx

I also didn't reboot to activate it, I just hot loaded it with:

xfwm4 --replace --vblank=glx &

I don't think that changes anything, but just in case.  Not sure if it's related, but I had a game idling on my monitor while I was cooking, and it was the first time I'd played it: Battle for Wesnoth.  Anyway, here's the log:


Jun 26 21:08:03 abiggun kernel: general protection fault, probably for non-canonical address 0x3b963e011fb9f84: 0000 [#1] PREEMPT SMP NOPTI
Jun 26 21:08:03 abiggun kernel: CPU: 4 PID: 362093 Comm: kworker/u12:1 Not tainted 5.7.4-arch1-1 #1
Jun 26 21:08:03 abiggun kernel: Hardware name: System manufacturer System Product Name/Crosshair IV Formula, BIOS 1102    08/24/2010
Jun 26 21:08:03 abiggun kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Jun 26 21:08:03 abiggun kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 26 21:08:03 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00 00 00 00 01 00 00
Jun 26 21:08:03 abiggun kernel: RSP: 0018:ffff993cc4037af8 EFLAGS: 00010206
Jun 26 21:08:03 abiggun kernel: RAX: 0000000000000006 RBX: ffff931ae09c0800 RCX: ffff931bfe478000
Jun 26 21:08:03 abiggun kernel: RDX: ffff931bf2dd2600 RSI: ffffffffc10a51a0 RDI: 03b963e011fb9f84
Jun 26 21:08:03 abiggun kernel: RBP: ffff993cc4037e60 R08: 0000000000000001 R09: 0000000000000001
Jun 26 21:08:03 abiggun kernel: R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000
Jun 26 21:08:03 abiggun kernel: R13: 0000000000000006 R14: ffff931bd0450c00 R15: ffff931b3574dc80
Jun 26 21:08:03 abiggun kernel: FS:  0000000000000000(0000) GS:ffff931c3fd00000(0000) knlGS:0000000000000000
Jun 26 21:08:03 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 21:08:03 abiggun kernel: CR2: 00007fe602dc0008 CR3: 0000000418080000 CR4: 00000000000006e0
Jun 26 21:08:03 abiggun kernel: Call Trace:
Jun 26 21:08:03 abiggun kernel:  ? tomoyo_write_self+0x100/0x1d0
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x34/0x70
Jun 26 21:08:03 abiggun kernel:  ? __switch_to_asm+0x40/0x70
Jun 26 21:08:03 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 26 21:08:03 abiggun kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 26 21:08:03 abiggun kernel:  process_one_work+0x1da/0x3d0
Jun 26 21:08:03 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 26 21:08:03 abiggun kernel:  worker_thread+0x4d/0x3e0
Jun 26 21:08:03 abiggun kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 26 21:08:03 abiggun kernel:  kthread+0x13e/0x160
Jun 26 21:08:03 abiggun kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 26 21:08:03 abiggun kernel:  ret_from_fork+0x22/0x40
Jun 26 21:08:03 abiggun kernel: Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device mc hid_plantronics macvtap macvlan vhost_net vhost tap vhost_iotlb fuse xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc rfkill tun lm92 hwmon_vid input_leds amdgpu squashfs nouveau loop edac_mce_amd kvm_amd ccp rng_core mxm_wmi snd_hda_codec_via gpu_sched snd_hda_codec_generic snd_hda_codec_hdmi ledtrig_audio kvm ttm snd_hda_intel snd_intel_dspcfg wmi_bmof snd_hda_codec drm_kms_helper snd_hda_core pcspkr sp5100_tco k10temp snd_hwdep snd_pcm cec i2c_piix4 joydev rc_core mousedev igb syscopyarea snd_timer sysfillrect snd sysimgblt i2c_algo_bit dca fb_sys_fops soundcore asus_atk0110 evdev mac_hid wmi drm crypto_user agpgart ip_tables x_tables ext4 crc16 mbcache jbd2 ecb crypto_simd cryptd
Jun 26 21:08:03 abiggun kernel:  glue_helper xts hid_generic usbhid hid dm_crypt raid456 libcrc32c crc32c_generic async_raid6_recov async_memcpy async_pq async_xor xor async_tx ohci_pci raid6_pq md_mod ehci_pci ehci_hcd ohci_hcd xhci_pci xhci_hcd ata_generic pata_acpi pata_jmicron vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio dm_mod
Jun 26 21:08:03 abiggun kernel: ---[ end trace 4e7c8ad2195077a2 ]---
Jun 26 21:08:03 abiggun kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 26 21:08:03 abiggun kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00 00 00 00 01 00 00
Jun 26 21:08:03 abiggun kernel: RSP: 0018:ffff993cc4037af8 EFLAGS: 00010206
Jun 26 21:08:03 abiggun kernel: RAX: 0000000000000006 RBX: ffff931ae09c0800 RCX: ffff931bfe478000
Jun 26 21:08:03 abiggun kernel: RDX: ffff931bf2dd2600 RSI: ffffffffc10a51a0 RDI: 03b963e011fb9f84
Jun 26 21:08:03 abiggun kernel: RBP: ffff993cc4037e60 R08: 0000000000000001 R09: 0000000000000001
Jun 26 21:08:03 abiggun kernel: R10: 0000000000000018 R11: 0000000000000018 R12: 0000000000000000
Jun 26 21:08:03 abiggun kernel: R13: 0000000000000006 R14: ffff931bd0450c00 R15: ffff931b3574dc80
Jun 26 21:08:03 abiggun kernel: FS:  0000000000000000(0000) GS:ffff931c3fd00000(0000) knlGS:0000000000000000
Jun 26 21:08:03 abiggun kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 26 21:08:03 abiggun kernel: CR2: 00007fe602dc0008 CR3: 0000000418080000 CR4: 00000000000006e0
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.137Z - debug: [REPOSITORY] fetch request: /cytrus.json
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.138Z - debug: [REPOSITORY] request: /cytrus.json
Jun 26 21:08:23 abiggun Thunar[3946]: { repository: 'https://launcher.cdn.ankama.com' }
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.155Z - debug: [REPOSITORY] fetchJson: Parsing data for /cytrus.json
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.156Z - debug: [REGISTRY] update
Jun 26 21:08:23 abiggun Thunar[3946]: 2020-06-27T04:08:23.156Z - debug: [REGISTRY] Parse repository Data
Jun 26 21:08:40 abiggun audit[241624]: ANOM_ABEND auid=1000 uid=1000 gid=985 ses=2 subj==unconfined pid=241624 comm="GpuWatchdog" exe="/opt/google/chrome/chrome" sig=11 res=1
Jun 26 21:08:40 abiggun kernel: GpuWatchdog[241650]: segfault at 0 ip 0000556ef31897ad sp 00007f11132a95d0 error 6 in chrome[556eeeadc000+785b000]
Jun 26 21:08:40 abiggun kernel: Code: 00 79 09 48 8b 7d b0 e8 f1 95 6c fe c7 45 b0 aa aa aa aa 0f ae f0 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 f3 5a ba fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 38 5b 41 5c 41 5d 41 5e
Jun 26 21:08:40 abiggun audit: BPF prog-id=71 op=LOAD
Jun 26 21:08:40 abiggun audit: BPF prog-id=72 op=LOAD
Jun 26 21:08:40 abiggun systemd[1]: Started Process Core Dump (PID 362491/UID 0).
Jun 26 21:08:40 abiggun audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-coredump@4-362491-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 26 21:08:45 abiggun systemd-coredump[362492]: Process 241624 (chrome) of user 1000 dumped core.
                                                  
                                                  Stack trace of thread 241650:
                                                  #0  0x0000556ef31897ad n/a (chrome + 0x62b07ad)
                                                  #1  0x0000556ef17e5c93 n/a (chrome + 0x490cc93)
                                                  #2  0x0000556ef17f7199 n/a (chrome + 0x491e199)
                                                  #3  0x0000556ef17ad6cf n/a (chrome + 0x48d46cf)
                                                  #4  0x0000556ef17f795c n/a (chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a (chrome + 0x48f78b9)
                                                  #6  0x0000556ef180ea1b n/a (chrome + 0x4935a1b)
                                                  #7  0x0000556ef184ae78 n/a (chrome + 0x4971e78)
                                                  #8  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #9  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241624:
                                                  #0  0x00007f1117c2a05f __poll (libc.so.6 + 0xf505f)
                                                  #1  0x00007f11190c663b n/a (libxcb.so.1 + 0xc63b)
                                                  #2  0x00007f11190c845b xcb_wait_for_special_event (libxcb.so.1 + 0xe45b)
                                                  #3  0x00007f11128cd381 n/a (libGLX_mesa.so.0 + 0x57381)
                                                  #4  0x00007f11128c132b n/a (libGLX_mesa.so.0 + 0x4b32b)
                                                  #5  0x0000556ef295706e n/a (chrome + 0x5a7e06e)
                                                  #6  0x0000556ef2955cb8 n/a (chrome + 0x5a7ccb8)
                                                  #7  0x0000556ef17e5c93 n/a (chrome + 0x490cc93)
                                                  #8  0x0000556ef17f7199 n/a (chrome + 0x491e199)
                                                  #9  0x0000556ef17ad999 n/a (chrome + 0x48d4999)
                                                  #10 0x0000556ef17f795c n/a (chrome + 0x491e95c)
                                                  #11 0x0000556ef17d08b9 n/a (chrome + 0x48f78b9)
                                                  #12 0x0000556ef59a9ed9 n/a (chrome + 0x8ad0ed9)
                                                  #13 0x0000556ef13329b4 n/a (chrome + 0x44599b4)
                                                  #14 0x0000556ef139addd n/a (chrome + 0x44c1ddd)
                                                  #15 0x0000556ef1330901 n/a (chrome + 0x4457901)
                                                  #16 0x0000556eeede80ce ChromeMain (chrome + 0x1f0f0ce)
                                                  #17 0x00007f1117b5c002 __libc_start_main (libc.so.6 + 0x27002)
                                                  #18 0x0000556eeeadc6aa _start (chrome + 0x1c036aa)
                                                  
                                                  Stack trace of thread 241636:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241642:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241644:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241643:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 359981:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241651:
                                                  #0  0x00007f1117c34f3e epoll_wait (libc.so.6 + 0xfff3e)
                                                  #1  0x0000556ef192ea1a n/a (chrome + 0x4a55a1a)
                                                  #2  0x0000556ef192c227 n/a (chrome + 0x4a53227)
                                                  #3  0x0000556ef18588d0 n/a (chrome + 0x497f8d0)
                                                  #4  0x0000556ef17f795c n/a (chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a (chrome + 0x48f78b9)
                                                  #6  0x0000556ef1809624 n/a (chrome + 0x4930624)
                                                  #7  0x0000556ef180ea1b n/a (chrome + 0x4935a1b)
                                                  #8  0x0000556ef184ae78 n/a (chrome + 0x4971e78)
                                                  #9  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #10 0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241655:
                                                  #0  0x00007f1119245158 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x10158)
                                                  #1  0x0000556ef1846f60 n/a (chrome + 0x496df60)
                                                  #2  0x0000556ef18475b0 n/a (chrome + 0x496e5b0)
                                                  #3  0x0000556ef17ad716 n/a (chrome + 0x48d4716)
                                                  #4  0x0000556ef17f795c n/a (chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a (chrome + 0x48f78b9)
                                                  #6  0x0000556ef180ea1b n/a (chrome + 0x4935a1b)
                                                  #7  0x0000556ef184ae78 n/a (chrome + 0x4971e78)
                                                  #8  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #9  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241656:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 242011:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241646:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241657:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241658:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 351071:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 351072:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 359972:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241659:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 361357:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241647:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241652:
                                                  #0  0x00007f1119245158 pthread_cond_timedwait@@GLIBC_2.3.2 (libpthread.so.0 + 0x10158)
                                                  #1  0x0000556ef1846f60 n/a (chrome + 0x496df60)
                                                  #2  0x0000556ef18475b0 n/a (chrome + 0x496e5b0)
                                                  #3  0x0000556ef1809c6a n/a (chrome + 0x4930c6a)
                                                  #4  0x0000556ef180a54c n/a (chrome + 0x493154c)
                                                  #5  0x0000556ef180a234 n/a (chrome + 0x4931234)
                                                  #6  0x0000556ef184ae78 n/a (chrome + 0x4971e78)
                                                  #7  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #8  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241653:
                                                  #0  0x00007f1117c34f3e epoll_wait (libc.so.6 + 0xfff3e)
                                                  #1  0x0000556ef192ea1a n/a (chrome + 0x4a55a1a)
                                                  #2  0x0000556ef192c227 n/a (chrome + 0x4a53227)
                                                  #3  0x0000556ef18588d0 n/a (chrome + 0x497f8d0)
                                                  #4  0x0000556ef17f795c n/a (chrome + 0x491e95c)
                                                  #5  0x0000556ef17d08b9 n/a (chrome + 0x48f78b9)
                                                  #6  0x0000556ef180ea1b n/a (chrome + 0x4935a1b)
                                                  #7  0x0000556ef184ae78 n/a (chrome + 0x4971e78)
                                                  #8  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #9  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241660:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241661:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241662:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241665:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241666:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241663:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241667:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241664:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241851:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241852:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241853:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 245560:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x0000556ef1846e48 n/a (chrome + 0x496de48)
                                                  #2  0x0000556ef18475d9 n/a (chrome + 0x496e5d9)
                                                  #3  0x0000556ef184739f n/a (chrome + 0x496e39f)
                                                  #4  0x0000556ef17ad751 n/a (chrome + 0x48d4751)
                                                  #5  0x0000556ef17f795c n/a (chrome + 0x491e95c)
                                                  #6  0x0000556ef17d08b9 n/a (chrome + 0x48f78b9)
                                                  #7  0x0000556ef180ea1b n/a (chrome + 0x4935a1b)
                                                  #8  0x0000556ef184ae78 n/a (chrome + 0x4971e78)
                                                  #9  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #10 0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241862:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 361354:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 361028:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241902:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 361345:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 361358:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241645:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241638:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241639:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241640:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241641:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241750:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241855:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 309100:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 359991:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
                                                  
                                                  Stack trace of thread 241637:
                                                  #0  0x00007f1119244e32 pthread_cond_wait@@GLIBC_2.3.2 (libpthread.so.0 + 0xfe32)
                                                  #1  0x00007f111158e3bc n/a (radeonsi_dri.so + 0x4ae3bc)
                                                  #2  0x00007f111158cdb8 n/a (radeonsi_dri.so + 0x4acdb8)
                                                  #3  0x00007f111923e422 start_thread (libpthread.so.0 + 0x9422)
                                                  #4  0x00007f1117c34bf3 __clone (libc.so.6 + 0xffbf3)
Comment 25 rtmasura+kernel 2020-06-27 04:38:47 UTC
Same kernel (5.7.4). I'll try to reproduce it, and if it happens I'll turn off the screen-tear setting and try to reproduce again.

Let me know if there's anything I can provide.
Comment 26 rtmasura+kernel 2020-06-27 05:16:46 UTC
Just got another crash, while only watching a video in chrome. I guess the chrome bit at the end might be more important than I thought.

I *think* I've turned off the GLX vblank for xfwm... we'll see. My computer had been playing video in chrome every day without issue before today, and I hadn't updated since last week either; no changes to the system.
Comment 27 rtmasura+kernel 2020-06-27 06:08:43 UTC
And another crash; chrome is good at causing them (watching youtube). I had used -s "" for the setting, which I think should set it to 'auto' and is what I assumed was the default. I've changed that to -s "off" to see if that helps.
Comment 28 Duncan 2020-06-27 07:07:39 UTC
(In reply to rtmasura+kernel from comment #27)
> And another crash; chrome is good at causing them (watching youtube). I had
> used -s "" for the setting, which I think should set it to 'auto' and is
> what I assumed was the default. I've changed that to -s "off" to see if
> that helps.

You added those updates just as I was typing a comment pointing out the chrome/chromium in your trace; bugzilla warned of a mid-air collision!  Chrom(e|ium) has new vulkan acceleration code and very likely exercises some of the same relatively new amdgpu kernel code that kwin does, so both of them triggering the bug wouldn't surprise me at all.

As it happens I switched back to firefox during the 5.6 kernel cycle, so I haven't seen chromium's interaction with the (kernel 5.7) bug myself, but once I saw it in that trace I said to myself: I bet that's his trigger!


FWIW I advanced a couple more bisect steps fairly quickly, since the bug was triggering as I tried to complete system updates (which on gentoo of course means building the packages). But then I hit an apparently good kernel, and uptime now says 3 days, something I've not seen in a while!  The only catch is that I finished those updates and the next couple of days were fairly calm, so I've not been stressing the system to the same extent, either.  Given the problems I got myself into on the first bisect run, I'm going to run on this kernel a bit longer before I mark it bisect-good and advance a step.  If it reaches a week and I've done either a full system update or some heavy 4k@60 youtube on firefox, I'll call it good, but I'm not ready to yet.

The good news is that in a couple more bisect steps I'll be down to a practical number of remaining commits to report as a range here, and a dev with a practiced eye should, time permitting, be able to narrow that down by say 3/4 (the equivalent of two steps ahead of my bisect), leaving something actually practical to examine more closely.  At that point my bisect will no longer be the only bottleneck, assuming the bug is big enough to get priority dev time.  If not, I'll just have to keep plugging away at the bisect...
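
(For the arithmetic: each bisect step roughly halves the remaining range, so e.g. a ~4000-commit range takes about 12 steps to pin down a single commit, since 2^12 = 4096, and every two additional steps cut whatever remains to about a quarter.)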
Comment 29 zzyxpaw 2020-06-27 22:26:41 UTC
Just hit this on Archlinux with linux-5.7.6 on a Vega 64. So far I've had three crashes, mostly occurring within the first few minutes of uptime. I'm not running kwin or chrome, just a light window manager (bspwm) and compton.

During the first two, steam's fossilize was running, which led me to suspect the bug was triggered by an interaction with that. However, the third crash happened before I even managed to start steam, so either I'm just lucky or my system is good at triggering this. @Duncan, I'm not sure if you want to muddle your bisect results with a different system configuration, but I'm happy to help test commits if that would be helpful.

I've noticed the call traces reported in the kernel log are slightly different for each crash; I'm not sure whether they're likely to be useful. Here's at least the one from my first crash:

Jun 27 14:04:40 erebor kernel: general protection fault, probably for non-canonical address 0x5dda9795528973db: 0000 [#1] PREEMPT SMP NOPTI
Jun 27 14:04:40 erebor kernel: CPU: 14 PID: 193610 Comm: kworker/u32:14 Tainted: G           OE     5.7.6-arch1-1 #1
Jun 27 14:04:40 erebor kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./AB350 Pro4, BIOS P4.90 06/14/2018
Jun 27 14:04:40 erebor kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Jun 27 14:04:40 erebor kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 27 14:04:40 erebor kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00>
Jun 27 14:04:40 erebor kernel: RSP: 0018:ffffbcec0a4afaf8 EFLAGS: 00010206
Jun 27 14:04:40 erebor kernel: RAX: 0000000000000006 RBX: ffff9b71dbaed000 RCX: ffff9b7472e4b800
Jun 27 14:04:40 erebor kernel: RDX: ffff9b72504ea400 RSI: ffffffffc13181e0 RDI: 5dda9795528973db
Jun 27 14:04:40 erebor kernel: RBP: ffffbcec0a4afe60 R08: 0000000000000001 R09: 0000000000000001
Jun 27 14:04:40 erebor kernel: R10: 0000000000000082 R11: 00000000000730e2 R12: 0000000000000000
Jun 27 14:04:40 erebor kernel: R13: 0000000000000006 R14: ffff9b71dbaed800 R15: ffff9b71a8fdb580
Jun 27 14:04:40 erebor kernel: FS:  0000000000000000(0000) GS:ffff9b747ef80000(0000) knlGS:0000000000000000
Jun 27 14:04:40 erebor kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 14:04:40 erebor kernel: CR2: 000056460ce164b0 CR3: 0000000341c86000 CR4: 00000000003406e0
Jun 27 14:04:40 erebor kernel: Call Trace:
Jun 27 14:04:40 erebor kernel:  ? __erst_read+0x160/0x1d0
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x34/0x70
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x40/0x70
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x34/0x70
Jun 27 14:04:40 erebor kernel:  ? __switch_to_asm+0x40/0x70
Jun 27 14:04:40 erebor kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 27 14:04:40 erebor kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 27 14:04:40 erebor kernel:  process_one_work+0x1da/0x3d0
Jun 27 14:04:40 erebor kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 27 14:04:40 erebor kernel:  worker_thread+0x4d/0x3e0
Jun 27 14:04:40 erebor kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 27 14:04:40 erebor kernel:  kthread+0x13e/0x160
Jun 27 14:04:40 erebor kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 27 14:04:40 erebor kernel:  ret_from_fork+0x22/0x40
Jun 27 14:04:40 erebor kernel: Modules linked in: snd_seq_midi snd_seq_dummy snd_seq_midi_event snd_hrtimer snd_seq fuse ccm 8021q garp mrp stp llc snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_de>
Jun 27 14:04:40 erebor kernel:  blake2b_generic libcrc32c crc32c_generic xor uas usb_storage raid6_pq crc32c_intel xhci_pci xhci_hcd
Jun 27 14:04:40 erebor kernel: ---[ end trace cb5c0d96dd991657 ]---
Jun 27 14:04:40 erebor kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 27 14:04:40 erebor kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00>
Jun 27 14:04:40 erebor kernel: RSP: 0018:ffffbcec0a4afaf8 EFLAGS: 00010206
Jun 27 14:04:40 erebor kernel: RAX: 0000000000000006 RBX: ffff9b71dbaed000 RCX: ffff9b7472e4b800
Jun 27 14:04:40 erebor kernel: RDX: ffff9b72504ea400 RSI: ffffffffc13181e0 RDI: 5dda9795528973db
Jun 27 14:04:40 erebor kernel: RBP: ffffbcec0a4afe60 R08: 0000000000000001 R09: 0000000000000001
Jun 27 14:04:40 erebor kernel: R10: 0000000000000082 R11: 00000000000730e2 R12: 0000000000000000
Jun 27 14:04:40 erebor kernel: R13: 0000000000000006 R14: ffff9b71dbaed800 R15: ffff9b71a8fdb580
Jun 27 14:04:40 erebor kernel: FS:  0000000000000000(0000) GS:ffff9b747ef80000(0000) knlGS:0000000000000000
Jun 27 14:04:40 erebor kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 27 14:04:40 erebor kernel: CR2: 000056460ce164b0 CR3: 0000000341c86000 CR4: 00000000003406e0
Comment 30 mnrzk 2020-06-28 01:12:58 UTC
I've been looking at this bug for a while now and I'll try to share what I've found about it.

Under some conditions, when amdgpu_dm_atomic_commit_tail calls dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct dm_atomic_state* with a garbage context pointer.
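
For reference, the state object in question looks roughly like this (an abbreviated sketch of the 5.7-era amdgpu_dm code, not the verbatim source):

 /* sketch of struct dm_atomic_state, abbreviated */
 struct dm_atomic_state {
         struct drm_private_state base;
         struct dc_state *context;  /* the pointer that comes back as garbage */
 };

 /* roughly what amdgpu_dm_atomic_commit_tail() does with it */
 dm_state = dm_atomic_get_new_state(state);
 dc_state = dm_state->context;      /* GPF when the garbage pointer is dereferenced */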

I've also found that this bug occurs exclusively when commit_work is on the workqueue. After forcing drm_atomic_helper_commit to run all of the commits synchronously, without adding them to the workqueue (see the sketch below), the issue seems to have disappeared: the system was stable for at least 1.5 hours before I manually shut it down, whereas it had usually crashed within 30-45 minutes.

Perhaps there's some sort of race condition occurring after commit_work is queued?
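
To illustrate, the tail end of drm_atomic_helper_commit looks roughly like this (a simplified sketch from memory of the mainline helper, not the verbatim source):

 /* drm_atomic_helper_commit(), simplified.  commit_work() is a thin
  * wrapper that just calls commit_tail(state). */
 INIT_WORK(&state->commit_work, commit_work);
 ...
 if (nonblock)
         queue_work(system_unbound_wq, &state->commit_work); /* async: commit_tail runs later, on a kworker */
 else
         commit_tail(state); /* sync: runs in the caller's context */

My test effectively forced the else branch for every commit, and with that the GPF never appeared.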
Comment 31 Duncan 2020-06-28 10:48:15 UTC
(In reply to mnrzk from comment #30)
> Under some conditions, when amdgpu_dm_atomic_commit_tail calls
> dm_atomic_get_new_state, dm_atomic_get_new_state returns a struct
> dm_atomic_state* with a garbage context pointer.

Good! Now we have someone who's hitting the bug and can actually read and work the code. That portends well for a fix.  =:^)

> I've also found that this bug occurs exclusively when commit_work is on
> the workqueue. After forcing drm_atomic_helper_commit to run all of the
> commits synchronously, without adding them to the workqueue, the issue
> seems to have disappeared.

I always see it with the workqueue too, but not being a dev I simply assumed that was how it worked; I had no idea the commits could be taken off the workqueue.

> The system was stable for at least 1.5 hours before I manually shut it
> down, whereas it had usually crashed within 30-45 minutes.

You're seeing crashes much faster than I am.  I believe my longest uptime before a crash with the telltale trace was something like two and a half days, with the obvious implications for marking a bisect step good: it's always a gamble that I simply haven't tested long enough.

> Perhaps there's some sort of race condition occurring after commit_work is
> queued?

Agreed, FWIW, though you've taken it farther than I could; I'm not able to work with code much beyond bisecting or modifying an existing patch here and there.
Comment 32 Duncan 2020-06-28 15:30:47 UTC
Created attachment 289911 [details]
Partial git bisect log

(In reply to zzyxpaw from comment #29)
> @Duncan I'm not sure if you want to muddle your
> bisect results with a different system configuration, but I'm happy to help
> test commits if that would be helpful.

Here's my current git bisect log you can replay.

I believe that should leave you at v5.6-rc2-245-gcf6c26ec7, which I'm going to build and boot to as soon as I post this.

But if your system is as good at triggering the bug as you suggest, try deleting that last good entry before the replay, as I'm only ~98% sure about it given the potential trigger time of days on my system.  That should leave you at 7be97138e, which you can try to trigger it on.  If your system reliably triggers within minutes and that commit doesn't trigger, you can confirm my bisect-good and go from there.
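
For reference, replaying is just this (assuming you save the attachment as bisect.log; to drop that last good, delete its "git bisect good" line from the file before replaying):

 $ git bisect replay bisect.log

git then checks out the next candidate commit for you to build and test.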

Note that if you're building with gcc-10.x you'll likely need a couple of patches that were committed later in the 5.7 cycle, depending on whether they land before or after whatever you're testing.  If you're building with gcc-9.3 (and presumably earlier) they shouldn't be necessary.

a9a3ed1ef and e78d334a5 are the commits in question.  One was necessary to build with gcc-10 at all, the other to get past a boot-time crash when built with gcc-10.  Only one of them applies at cf6c26ec7, I don't remember which, but both were necessary for 7be97138e.

At my somewhat limited git skill level it was easiest to redirect a git show of each commit to a patchfile, apply the patch on top of whatever git bisect gave me, then git reset --hard to clean the patches up before the next git bisect good/bad.  I guess git cherry-pick would be the usual way to apply them, but I'm not entirely sure how that interacts with git bisect, so applying the patches on top was the easier way for me, particularly given that I already have scripts to automate patch application for my local default-to-noatime patch.
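
In concrete terms, my per-step routine is something like this (a sketch; the patchfile names are just my own):

 # once, from a tree that contains the two fixes:
 $ git show a9a3ed1ef > gcc10-build.patch
 $ git show e78d334a5 > gcc10-boot.patch

 # at each bisect step (one may fail to apply where it's not needed):
 $ git apply gcc10-build.patch
 $ git apply gcc10-boot.patch
 # ... build, boot, test ...
 $ git reset --hard    # drop the applied patches again
 $ git bisect good     # or: git bisect bad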
Comment 33 Michel Dänzer 2020-06-29 07:39:39 UTC
(In reply to rtmasura+kernel from comment #24)
> xfwm4 --replace --vblank=glx &

FWIW, I recommend

 xfwm4 --vblank=xpresent

instead. --vblank=glx is less efficient and relies on rather exotic GLX functionality, which can be quirky with Mesa.
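
If you want that to persist instead of passing the flag at each login, it should also be settable via xfconf (assuming xfwm4 4.14's /general/vblank_mode property; check with "xfconf-query -c xfwm4 -l" first):

 xfconf-query -c xfwm4 -p /general/vblank_mode -s xpresent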
Comment 34 mnrzk 2020-06-29 22:09:23 UTC
Has anyone tried 5.8-rc3? I've been testing it out for the past 3 hours and it seems stable to me. Also, some amdgpu drm fixes were pushed between rc2 and rc3 that could have fixed this.

Could someone else experiencing this bug test 5.8-rc3 and see if it's fixed?

I have some debug code and kernel options that may have interfered with my testing, so I wouldn't exactly say the bug is fixed based on my findings alone.
Comment 35 Duncan 2020-07-01 19:08:44 UTC
(In reply to mnrzk from comment #34)
> Has anyone tried 5.8-rc3? I've been testing it out for the past 3 hours and
> it seems stable to me.

I have now (well, v5.8.0-rc3-00017-g7c30b859a).  Unfortunately I got a freeze with our familiar trace fairly quickly (building kde updates at the time), so it's not fixed yet.  =:^(
