Bug 209019 - [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Summary: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read tra...
Status: NEW
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-24 04:03 UTC by rtmasura+kernel
Modified: 2020-08-27 03:19 UTC
CC List: 1 user

See Also:
Kernel Version: 5.8.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg (97.84 KB, text/plain)
2020-08-25 02:01 UTC, rtmasura+kernel
Xorg.0.log (1.79 MB, text/plain)
2020-08-27 03:19 UTC, rtmasura+kernel

Description rtmasura+kernel 2020-08-24 04:03:35 UTC
On kernels 5.8.1 and 5.8.2 I am having an issue with multiple displays. The system loses detection of one of my monitors, which triggers a re-detection of all displays. It finds the monitor, loses it again, re-detects, and so on, until it eventually detects the monitor for good and the issue goes away for some time. It appears to be random; I can't reproduce it at will.

I have managed to trigger it on occasion:
1. By maximizing a window on a monitor other than the one that loses its connection.

2. With a full-screen graphical effect, such as when xfce-screenshooter darkens the screen to capture a region.


Let me know if there's anything else I can provide. I think Arch Linux is currently at 5.8.3, so I'll do an upgrade.
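
One way to record exactly when a connector drops out is to watch the connector status files that the DRM subsystem exposes under /sys/class/drm. The following is only a sketch under stated assumptions: the function names (read_connector_status, diff_status, watch) are hypothetical, and it assumes connector directories named like card0-DP-1, each holding a status file that reads "connected" or "disconnected".

```python
import time
from pathlib import Path


def read_connector_status(root="/sys/class/drm"):
    """Map connector name (e.g. 'card0-DP-1') to its status string."""
    statuses = {}
    for status_file in sorted(Path(root).glob("card*-*/status")):
        try:
            statuses[status_file.parent.name] = status_file.read_text().strip()
        except OSError:
            # A connector can vanish between listing and reading; skip it.
            continue
    return statuses


def diff_status(before, after):
    """Return (connector, old, new) for every connector whose status changed."""
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes.append((name, old, new))
    return changes


def watch(root="/sys/class/drm", interval=1.0, iterations=60):
    """Poll the status files and print each transition with a wall-clock time."""
    previous = read_connector_status(root)
    for _ in range(iterations):
        time.sleep(interval)
        current = read_connector_status(root)
        for name, old, new in diff_status(previous, current):
            print(f"{time.strftime('%H:%M:%S')} {name}: {old} -> {new}")
        previous = current
```

Calling watch() in a terminal while maximizing a window or starting a region screenshot would show whether the affected connector actually flips to "disconnected" at that moment, and the printed times can be lined up against dmesg.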




uname -a:                                                                                                
Linux abiggun 5.8.2-arch1-1 #1 SMP PREEMPT Thu, 20 Aug 2020 20:45:00 +0000 x86_64 GNU/Linux



dmesg:
[21159.093638] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21159.542494] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21169.281470] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21169.712677] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21179.389388] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21179.832841] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21180.189786] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[21193.108746] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[21203.108238] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21203.563296] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21231.665363] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21232.103796] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21232.201005] [drm:dc_link_detect_helper [amdgpu]] *ERROR* No EDID read.
[21232.450766] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[21234.635424] [drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read dpcd data failed.
[21255.768880] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21256.214181] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21266.041592] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21266.514366] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21268.894874] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21269.334409] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21269.431262] [drm:dc_link_detect_helper [amdgpu]] *ERROR* No EDID read.
[21269.679883] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
[21272.604484] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21273.016019] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
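
Several of the AUX read errors above are about ten seconds apart. To make that kind of pattern easier to see in a longer log, the excerpt can be tallied by reporting function and the gaps between consecutive dpcd_set_source_specific_data errors computed. This is a rough sketch, not a diagnostic tool: the helper names are hypothetical, and the regular expression assumes the exact "[timestamp] [drm:function [amdgpu]] *ERROR*" layout shown in this excerpt.

```python
import re
from collections import Counter

# Matches lines such as:
# [21159.093638] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* ...
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] \[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)$"
)


def parse_errors(text):
    """Return (timestamp, function, message) for each amdgpu *ERROR* line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            events.append((float(m["ts"]), m["func"], m["msg"]))
    return events


def summarize(text):
    """Count errors per function and list gaps (seconds) between AUX read errors."""
    events = parse_errors(text)
    counts = Counter(func for _, func, _ in events)
    aux = [ts for ts, func, _ in events if func == "dpcd_set_source_specific_data"]
    gaps = [round(later - earlier, 1) for earlier, later in zip(aux, aux[1:])]
    return counts, gaps


# First few lines of the excerpt above, used as sample input.
SAMPLE = """\
[21159.093638] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21159.542494] [drm] amdgpu_dm_irq_schedule_work FAILED src 1
[21169.281470] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21179.389388] [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
[21180.189786] [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
"""

counts, gaps = summarize(SAMPLE)
print(counts)
print(gaps)
```

Lines without the *ERROR* tag, such as the amdgpu_dm_irq_schedule_work messages, are deliberately ignored by this pattern; widening the expression would be needed to include them.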



lscpu:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          6
On-line CPU(s) list:             0-5
Thread(s) per core:              1
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      16
Model:                           10
Model name:                      AMD Phenom(tm) II X6 1090T Processor
Stepping:                        0
CPU MHz:                         3297.796
BogoMIPS:                        6423.85
Virtualization:                  AMD-V
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        3 MiB
L3 cache:                        6 MiB
NUMA node0 CPU(s):               0-5
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt cpb hw_pstate vmmcall npt lbrv svm_lock nrip_save pausefilter




lspci (note: the NVIDIA card is blacklisted on the host and passed through to KVM guests; the Vega 56 is the card connected to the monitors):
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 Northbridge only single slot PCI-e GFX Hydra part (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD890S/RD990 I/O Memory Management Unit (IOMMU)
00:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GFX port 0)
00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 0)
00:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP Port 3)
00:0b.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD990 PCI to PCI bridge (PCI Express GFX2 port 0)
00:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890/RD9x0/RX980 PCI to PCI bridge (PCI Express GPP2 Port 0)
00:11.0 RAID bus controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [RAID5 mode] (rev 40)
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 42)
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:16.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 10h Processor Link Control
02:00.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:04.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:05.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:06.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:08.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
03:09.0 PCI bridge: PLX Technology, Inc. PEX 8624 24-lane, 6-Port PCI Express Gen 2 (5.0 GT/s) Switch [ExpressLane] (rev bb)
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
09:00.0 VGA compatible controller: NVIDIA Corporation GP104GL [Quadro P4000] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
0a:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
0b:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
0b:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 03)
0c:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge (rev c3)
0d:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
0e:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
Comment 1 Alex Deucher 2020-08-24 15:27:29 UTC
Please attach your dmesg output and Xorg log (if using X).  If this is a regression, can you bisect?
Comment 2 rtmasura+kernel 2020-08-24 19:42:58 UTC
Hmm, I don't appear to have an up-to-date Xorg log, unless I'm misunderstanding where it should be:

head /var/log/Xorg.0.log                                                                           
[1105825.601] (--) Log file renamed from "/var/log/Xorg.pid-1681661.log" to "/var/log/Xorg.0.log"
[1105825.603] 
X.Org X Server 1.20.8
X Protocol Version 11, Revision 0
[1105825.603] Build Operating System: Linux Arch Linux
[1105825.603] Current Operating System: Linux abiggun 5.6.11-arch1-1 #1 SMP PREEMPT Wed, 06 May 2020 17:32:37 +0000 x86_64
[1105825.603] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=ef7e3964-346a-44d8-b5e9-ee81e59833b9 rw cryptdevice=UUID=64e88839-e390-4431-bbe9-9f25b41860aa:cryptroot root=/dev/mapper/cryptroot usb-storage.quirks=0bc2:ab44:u usb-storage.quirks=0bc2:ab38:u usb-storage.quirks=0bc2:ab45:u amd_iommu=on iommu=pt apparmor=1 security=apparmor vfio-pci.ids=10de:1bb1,10de:10f0
[1105825.603] Build Date: 05 May 2020  05:08:17AM
[1105825.603]  
[1105825.603] Current version of pixman: 0.40.0

tail /var/log/Xorg.0.log                                                                                
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.133] (II) UnloadModule: "libinput"
[1105827.134] (II) UnloadModule: "libinput"
[1105827.188] (II) Server terminated successfully (0). Closing log file.


I have no dmesg output other than what I already provided; there were no other errors in dmesg.


So the journal is probably the best bet. I can attach all of it if you like:
sudo journalctl -b -1 -p 3 
-- Logs begin at Wed 2020-07-22 10:17:18 PDT, end at Mon 2020-08-24 12:40:41 PDT. --
Aug 23 11:56:00 abiggun systemd-modules-load[391]: Failed to find module 'vboxdrv'
Aug 23 11:56:00 abiggun systemd-modules-load[391]: Failed to find module 'vboxnetflt'
Aug 23 11:56:00 abiggun systemd-modules-load[391]: Failed to find module 'vboxnetadp'
Aug 23 11:56:00 abiggun systemd-modules-load[391]: Failed to find module 'vboxpci'
Aug 23 11:56:00 abiggun systemd-udevd[421]: /etc/udev/rules.d/40-libsane.rules:26: GOTO="libsane_rules_end" has no matching label, ignoring
Aug 23 11:56:01 abiggun systemd-udevd[421]: /etc/udev/rules.d/S99-2000S1.rules:26: GOTO="libsane_rules_end" has no matching label, ignoring
Aug 23 11:56:13 abiggun systemd[1367]: pam_systemd_home(systemd-user:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 11:56:13 abiggun smbd[1229]: [2020/08/23 11:56:13.018977,  0] ../../lib/util/become_daemon.c:135(daemon_ready)
Aug 23 11:56:13 abiggun smbd[1229]:   daemon_ready: daemon 'smbd' finished starting up and ready to serve connections
Aug 23 11:56:19 abiggun gdm-password][1994]: PAM unable to dlopen(/usr/lib/security/pam_gnome_keyring.so): /usr/lib/security/pam_gnome_keyring.so: cannot open shared object file: No such file or directory
Aug 23 11:56:19 abiggun gdm-password][1994]: PAM adding faulty module: /usr/lib/security/pam_gnome_keyring.so
Aug 23 11:56:22 abiggun gdm-password][1994]: pam_systemd_home(gdm-password:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 11:56:22 abiggun systemd[2024]: pam_systemd_home(systemd-user:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 11:56:24 abiggun systemd-resolved[1181]: Failed to send hostname reply: Invalid argument
Aug 23 19:13:45 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:13:52 abiggun xscreensaver[2469]: pam_systemd_home(xscreensaver:auth): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 19:13:53 abiggun kernel: ata3: softreset failed (1st FIS failed)
Aug 23 19:13:53 abiggun kernel: ata4: softreset failed (device not ready)
Aug 23 19:13:53 abiggun kernel: ata2: softreset failed (device not ready)
Aug 23 19:13:53 abiggun kernel: ata1: softreset failed (device not ready)
Aug 23 19:13:53 abiggun kernel: ata6: softreset failed (device not ready)
Aug 23 19:13:53 abiggun kernel: ata5: softreset failed (device not ready)
Aug 23 19:46:38 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:46:48 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:46:58 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:46:59 abiggun kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
Aug 23 19:47:12 abiggun kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
Aug 23 19:47:22 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:47:51 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:47:51 abiggun kernel: [drm:dc_link_detect_helper [amdgpu]] *ERROR* No EDID read.
Aug 23 19:47:52 abiggun kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
Aug 23 19:47:54 abiggun kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read dpcd data failed.
Aug 23 19:48:15 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:48:25 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:48:28 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:48:28 abiggun kernel: [drm:dc_link_detect_helper [amdgpu]] *ERROR* No EDID read.
Aug 23 19:48:29 abiggun kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
Aug 23 19:48:32 abiggun kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
Aug 23 21:04:05 abiggun sudo[37251]: pam_systemd_home(sudo:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 22:29:22 abiggun sudo[48011]: pam_systemd_home(sudo:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 22:29:30 abiggun sudo[48099]: pam_systemd_home(sudo:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 22:29:34 abiggun sudo[48131]: pam_systemd_home(sudo:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 22:29:44 abiggun sudo[48157]: pam_systemd_home(sudo:account): Failed to query user record: Unit dbus-org.freedesktop.home1.service not found.
Aug 23 22:29:45 abiggun systemd-udevd[421]: /etc/udev/rules.d/40-libsane.rules:26: GOTO="libsane_rules_end" has no matching label, ignoring
Aug 23 22:29:45 abiggun systemd-udevd[421]: /etc/udev/rules.d/S99-2000S1.rules:26: GOTO="libsane_rules_end" has no matching label, ignoring
Aug 23 22:29:45 abiggun gdm[1234]: GLib: g_hash_table_foreach: assertion 'version == hash_table->version' failed
Aug 23 22:29:48 abiggun kernel: watchdog: watchdog0: watchdog did not stop!
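
Most of the journal output above is unrelated noise (vbox modules, PAM, udev rules). As a sketch of how the amdgpu kernel messages could be separated out and the "-12" decoded, the snippet below filters journal lines and maps the number to an errno name. It assumes the "Mon DD HH:MM:SS host kernel: message" layout shown here and that the reported value is a negated errno, as is the usual kernel convention; the function names are hypothetical.

```python
import errno
import re

JOURNAL_RE = re.compile(
    r"^(?P<when>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: (?P<msg>.*)$"
)
FAILED_WITH_RE = re.compile(r"failed with -(\d+)")


def amdgpu_kernel_lines(journal_text):
    """Return (timestamp, message) for kernel lines that mention [amdgpu]."""
    matches = []
    for line in journal_text.splitlines():
        m = JOURNAL_RE.match(line)
        if m and "[amdgpu]" in m["msg"]:
            matches.append((m["when"], m["msg"]))
    return matches


def decode_errno(message):
    """Translate 'failed with -N' into a symbolic errno name, if present."""
    m = FAILED_WITH_RE.search(message)
    if not m:
        return None
    return errno.errorcode.get(int(m.group(1)))


# A few lines copied from the journal output above.
SAMPLE_JOURNAL = """\
Aug 23 11:56:00 abiggun systemd-modules-load[391]: Failed to find module 'vboxdrv'
Aug 23 19:13:45 abiggun kernel: [drm:dpcd_set_source_specific_data [amdgpu]] *ERROR* Error in DP aux read transaction, not writing source specific data
Aug 23 19:13:53 abiggun kernel: ata3: softreset failed (1st FIS failed)
Aug 23 19:46:59 abiggun kernel: [drm:dm_restore_drm_connector_state [amdgpu]] *ERROR* Restoring old state failed with -12
"""

for when, msg in amdgpu_kernel_lines(SAMPLE_JOURNAL):
    code = decode_errno(msg)
    print(when, msg, f"({code})" if code else "")
```

On Linux, errno 12 is ENOMEM, so under the assumption above "Restoring old state failed with -12" would indicate an out-of-memory result from the restore path rather than a link-level failure.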
Comment 3 Alex Deucher 2020-08-24 20:17:37 UTC
Please attach your full dmesg output.
Comment 4 rtmasura+kernel 2020-08-25 02:01:24 UTC
Created attachment 292159
dmesg

I had to wait for the issue to happen again. This is on kernel 5.8.3, and it did not resolve itself until I unplugged a monitor.
Comment 5 rtmasura+kernel 2020-08-26 18:23:38 UTC
Since this started, I have been running the LTS kernel during the day because this is my work-from-home machine. It just happened with 5.4.60-1-lts. A virtual machine running on this host IS using 5.8.2, but since nothing is passed through to it other than USB devices and its graphical display is SPICE, I think the issue is unlikely to be caused by the kernel. It's likely something else, which means this ticket should probably be closed. I was one of the people affected by https://bugzilla.kernel.org/show_bug.cgi?id=207383, and it seemed suspicious that this began only after the upgrade to 5.8.1, which included the fix.

As a troubleshooting step I disconnected the monitor's power but not its video cable, and the issue remained. Only unplugging the video input stopped it. I've replaced the cable as a first step. If you have an idea what could cause it, I'm all ears; for now I'll assume hardware until I can rule it out.

Thanks for your help.
Comment 6 rtmasura+kernel 2020-08-26 20:04:02 UTC
Hmm, the new cable did not help. The display also always recovers during shutdown, that is, when the system leaves graphical mode. I don't think that rules out hardware, though, so I'm still testing.

Also, I didn't realize they had backported the hack to the LTS kernel, so the kernel is not ruled out either.
Comment 7 rtmasura+kernel 2020-08-27 03:19:29 UTC
Created attachment 292183
Xorg.0.log

Found the second location for the log; let me know if it helps at all.
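
The comment does not say where the second copy of the log was found. A common arrangement, offered here as an assumption rather than a statement about this system, is that an Xorg server started without root privileges writes its log under $XDG_DATA_HOME/xorg/ (typically ~/.local/share/xorg/) instead of /var/log/, which would leave a stale /var/log/Xorg.0.log like the one quoted in Comment 2. The sketch below lists both candidate paths and picks whichever exists and was modified most recently; the function names are hypothetical.

```python
from pathlib import Path


def candidate_xorg_logs(display=0, home=None, xdg_data_home=None):
    """Return the system-wide and per-user candidate paths for Xorg.<display>.log."""
    home = Path(home) if home else Path.home()
    data = Path(xdg_data_home) if xdg_data_home else home / ".local" / "share"
    name = f"Xorg.{display}.log"
    return [Path("/var/log") / name, data / "xorg" / name]


def newest_existing(paths):
    """Return the most recently modified path that exists, or None if none do."""
    existing = [p for p in paths if p.is_file()]
    return max(existing, key=lambda p: p.stat().st_mtime, default=None)
```

For example, newest_existing(candidate_xorg_logs()) returns the path of the log the running session is most likely writing to, which is the one worth attaching.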
