Bug 194645
| Summary: | amdgpu: amdgpu_resume failed | | |
|---|---|---|---|
| Product: | Drivers | Reporter: | Mateusz Lenik (mlen) |
| Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
| Status: | NEW --- | | |
| Severity: | normal | CC: | deathsimple |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 4.10 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | full dmesg | | |
Description
Mateusz Lenik
2017-02-21 08:31:14 UTC
Created attachment 254851
full dmesg
Can you bisect?

I'll try to do it later today.

I did try to bisect, but wasn't very successful. It reported a totally unrelated commit as the first “bad” commit.

I'm sure that resume for amdgpu works correctly on v4.9 and is broken on 4.10-rc1.

$ git bisect log
git bisect start
# good: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
git bisect good 69973b830859bc6529a7a0468ba0d80ee5117826
# bad: [7ce7d89f48834cefece7804d38fc5d85382edf77] Linux 4.10-rc1
git bisect bad 7ce7d89f48834cefece7804d38fc5d85382edf77
# bad: [72cca7baf4fba777b8ab770b902cf2e08941773f] Merge tag 'staging-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 72cca7baf4fba777b8ab770b902cf2e08941773f
# bad: [b8d2798f32785398fcd1c48ea80c0c6c5ab88537] Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect bad b8d2798f32785398fcd1c48ea80c0c6c5ab88537
# bad: [ea3349a03519dcd4f32d949cd80ab995623dc5ac] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active
git bisect bad ea3349a03519dcd4f32d949cd80ab995623dc5ac
# bad: [54dbf3a57731da6e21c6e65bd1a7f9ee009708ca] Merge branch 'nway-reset'
git bisect bad 54dbf3a57731da6e21c6e65bd1a7f9ee009708ca
# bad: [0c288c86928e50d6d8d2efa4ca23dca58d28543e] tipc: create TIPC_LISTEN as a new sk_state
git bisect bad 0c288c86928e50d6d8d2efa4ca23dca58d28543e
# bad: [e1da71ca88170d1a6232951294b44dc0c824e464] i40e: Drop code for unsupported flow types
git bisect bad e1da71ca88170d1a6232951294b44dc0c824e464
# bad: [0f524a80ff35af8a7664d7661d948107da142e04] net: Add warning if any lower device is still in adjacency list
git bisect bad 0f524a80ff35af8a7664d7661d948107da142e04
# bad: [f9dbd5a343eeb3e8bf8853256d05188dd27c1ecf] Merge branch 'ila-cached-route'
git bisect bad f9dbd5a343eeb3e8bf8853256d05188dd27c1ecf
# bad: [687d911466774808ed4926edadb20cc4f0153bed] Merge branch 's390-net'
git bisect bad 687d911466774808ed4926edadb20cc4f0153bed
# bad: [b07bf5fa32e0991b2634444652de56066fe311f2] Merge branch 'xgene-gpio'
git bisect bad b07bf5fa32e0991b2634444652de56066fe311f2
# bad: [86e3a04002378610d77d3941f45ebbf531cf605e] net: ti: netcp_ethss: use new api ethtool_{get|set}_link_ksettings
git bisect bad 86e3a04002378610d77d3941f45ebbf531cf605e
# bad: [d6d50c7ea42d3659782695bfebf4ae6548d00db5] net: stmmac: use phydev from struct net_device
git bisect bad d6d50c7ea42d3659782695bfebf4ae6548d00db5
# bad: [a54e1612bc3dd4a3974b69a6043dd4ddbab14e6d] net: mv643xx_eth: use new api ethtool_{get|set}_link_ksettings
git bisect bad a54e1612bc3dd4a3974b69a6043dd4ddbab14e6d
# bad: [1e8a655db2f4dd8777eda08c2af7d1381b9eecca] net: mv643xx_eth: use phydev from struct net_device
git bisect bad 1e8a655db2f4dd8777eda08c2af7d1381b9eecca
# first bad commit: [1e8a655db2f4dd8777eda08c2af7d1381b9eecca] net: mv643xx_eth: use phydev from struct net_device

(In reply to Mateusz Lenik from comment #4)
> I did try to bisect, but wasn't very successful. It reported a totally
> unrelated commit as the first “bad” commit.
>
> I'm sure that resume for amdgpu works correctly on v4.9 and is broken on
> 4.10-rc1

Make sure that you reboot into the correct kernel after each retry.

On Ubuntu I also usually change /etc/grub.d/10_linux so that it automatically boots into the most recently installed kernel rather than the one with the highest version number.

I did check manually that it was using the freshly built kernel before every suspend test, and I rebuilt the x11 drivers every time the revision was changed.

Can incremental builds (that is, without clean/distclean, just building whatever files have changed) have some effect on the bisect outcome?

The other weird thing is that after a few bisection steps `make kernelrelease` returned `4.8.0+`, even though I wanted to find out what is wrong in the commits between v4.9 and v4.10-rc1. Any idea what I did incorrectly?
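The verdicts in a pasted `git bisect log` can be tallied mechanically, which makes it easier to see how the search was steered. The following is only a minimal Python sketch: `summarize_bisect_log` is a hypothetical helper, not part of git, and the sample input is just the first three verdicts from the log above.

```python
import re
from collections import Counter

# Matches the command lines of a `git bisect log`, for example:
#   git bisect bad 7ce7d89f48834cefece7804d38fc5d85382edf77
# Comment lines starting with '#' are ignored.
MARK = re.compile(r"^git bisect (good|bad) ([0-9a-f]{7,40})$")


def summarize_bisect_log(text):
    """Return (Counter of verdicts, list of (verdict, commit) in log order)."""
    marks = []
    for line in text.splitlines():
        match = MARK.match(line.strip())
        if match:
            marks.append((match.group(1), match.group(2)))
    counts = Counter(verdict for verdict, _ in marks)
    return counts, marks


# Sample input: the first three verdicts of the log in this report.
sample_log = """\
git bisect start
# good: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
git bisect good 69973b830859bc6529a7a0468ba0d80ee5117826
# bad: [7ce7d89f48834cefece7804d38fc5d85382edf77] Linux 4.10-rc1
git bisect bad 7ce7d89f48834cefece7804d38fc5d85382edf77
# bad: [72cca7baf4fba777b8ab770b902cf2e08941773f] Merge tag 'staging-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 72cca7baf4fba777b8ab770b902cf2e08941773f
"""

counts, marks = summarize_bisect_log(sample_log)
print("good:", counts["good"], "bad:", counts["bad"])
```

In the full log in this report, every step after the initial good/bad pair is marked bad, so the search never moved forward past a commit known to work. One possible reading, not a confirmed diagnosis, is that the failure seen during those steps did not depend on the kernel revision being tested.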
(In reply to Mateusz Lenik from comment #6)
> I did check that it was using the freshly built kernel manually before every
> suspend test and I rebuilt x11 drivers every time the revision was changed.

Please don't rebuild the x11 drivers! If it's a kernel bug you only need to build and check the kernel. X11 shouldn't be involved here.

Or does it work with an old X11 and a new kernel? Could be that it is a new feature which causes the issue and you need both new kernel and new X drivers to trigger the problem.

> Can incremental builds (that means without clean/distclean, just building
> whatever files have changed) have some effect on bisect outcome?

Well, in theory that should work fine. But in practice it is possible (though unlikely) that the build system didn't track some dependencies correctly. It's just that in this case you usually get a kernel which doesn't boot at all, so you notice the problem immediately.

> The other weird thing is that after few bisections `make kernelrelease`
> returned `4.8.0+` even when I wanted to find out what is wrong in commits
> between v4.9 and v4.10-rc1. Any idea what did I do incorrectly?

You did nothing wrong; bisect just drifted into a branch which was merged in 4.10 but was originally based on 4.8. That happens when a branch was under development for a long time before merging (e.g. it started off as something 4.8-based).

(In reply to Christian König from comment #7)
> Please don't rebuild the x11 drivers! If it's a kernel bug you only need to
> build and check the kernel. X11 shouldn't be involved here.
>
> Or does it work with an old X11 and a new kernel? Could be that it is a new
> feature which causes the issue and you need both new kernel and new X
> drivers to trigger the problem.

I didn't think about it this way, because I used exactly the same version of the drivers and just rebuilt them via `emerge @x11-module-rebuild @module-rebuild`.
On the other hand, initially I didn't rebuild them and used the version built against 4.10. It failed to resume on v4.9, so I assumed I should rebuild them too. I'll check this; maybe the X drivers are the real issue here.

I retried it with x11 drivers built for 4.9.11; it still crashes:

[ 64.460823] suspend debug: Waiting for 5 second(s).
[ 69.490160] sd 2:0:0:0: [sda] Starting disk
[ 69.490214] sd 4:0:0:0: [sdb] Starting disk
[ 69.490280] sd 5:0:0:0: [sdc] Starting disk
[ 69.491999] power_meter ACPI000D:00: Found ACPI power meter.
[ 69.492179] rtc_cmos 00:00: System wakeup disabled by ACPI
[ 69.493742] serial 00:03: activated
[ 69.495145] serial 00:04: activated
[ 69.502637] pcieport 0000:00:1c.3: System wakeup disabled by ACPI
[ 69.502808] pcieport 0000:00:1c.2: System wakeup disabled by ACPI
[ 69.527164] [drm] PCIE GART of 8192M enabled (table at 0x0000000000040000).
[ 69.530900] DPM is already running
[ 69.540893] [drm] ring test on 0 succeeded in 14 usecs
[ 69.561779] igb 0000:05:00.0 enp5s0: igb: enp5s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 69.806059] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 69.806089] ata1: SATA link down (SStatus 0 SControl 300)
[ 69.806109] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 69.806129] ata2: SATA link down (SStatus 0 SControl 300)
[ 69.807977] ata3.00: supports DRM functions and may not be fully accessible
[ 69.810946] ata4.00: configured for UDMA/100
[ 69.812759] ata3.00: disabling queued TRIM support
[ 69.813020] ata3.00: supports DRM functions and may not be fully accessible
[ 69.814376] ata8: SATA link down (SStatus 0 SControl 300)
[ 69.814392] ata7: SATA link down (SStatus 0 SControl 300)
[ 69.814410] ata10: SATA link down (SStatus 0 SControl 300)
[ 69.814427] ata9: SATA link down (SStatus 0 SControl 300)
[ 69.814445] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 69.814478] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 69.817823] ata3.00: disabling queued TRIM support
[ 69.817890] ata3.00: configured for UDMA/133
[ 70.862372] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 1 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.007943] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 2 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.153518] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 3 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.299093] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 4 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.444670] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 5 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.590256] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 6 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.735830] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 7 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.881433] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 8 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 71.988538] [drm:sdma_v3_0_ring_test_ring] *ERROR* amdgpu: ring 9 test failed (0xCAFEDEAD)
[ 71.988543] [drm:amdgpu_resume] *ERROR* resume of IP block <sdma_v3_0> failed -22
[ 71.988546] [drm:amdgpu_device_resume] *ERROR* amdgpu_resume failed (-22).
[ 73.031518] [drm:gfx_v8_0_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[ 73.031523] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[ 73.031525] [drm:amdgpu_device_resume] *ERROR* ib ring test failed (-110).
[ 73.061223] [drm] PCIE GART of 8192M enabled (table at 0x0000000000040000).
[ 73.064209] DPM is already running
[ 73.074176] [drm] ring test on 0 succeeded in 14 usecs
[ 74.428386] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 1 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 74.578515] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 2 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 74.728630] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 3 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 74.878757] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 4 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 74.951465] ata6.00: qc timeout (cmd 0x27)
[ 74.951473] ata6.00: failed to read native max address (err_mask=0x4)
[ 74.951474] ata6.00: HPA support seems broken, skipping HPA handling
[ 74.951475] ata6.00: revalidation failed (errno=-5)
[ 74.951485] ata5.00: qc timeout (cmd 0x27)
[ 74.951491] ata5.00: failed to read native max address (err_mask=0x4)
[ 74.951492] ata5.00: HPA support seems broken, skipping HPA handling
[ 74.951493] ata5.00: revalidation failed (errno=-5)
[ 75.028962] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 5 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 75.179242] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 6 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 75.261502] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 75.261522] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 75.330302] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 7 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 75.480460] [drm:gfx_v8_0_ring_test_ring] *ERROR* amdgpu: ring 8 test failed (scratch(0xC040)=0xCAFEDEAD)
[ 75.590842] [drm:sdma_v3_0_ring_test_ring] *ERROR* amdgpu: ring 9 test failed (0xCAFEDEAD)
[ 75.590845] [drm:amdgpu_resume] *ERROR* resume of IP block <sdma_v3_0> failed -22
[ 75.590847] [drm:amdgpu_device_resume] *ERROR* amdgpu_resume failed (-22).
[ 76.615310] [drm:gfx_v8_0_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[ 76.615314] [drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[ 76.615316] [drm:amdgpu_device_resume] *ERROR* ib ring test failed (-110).
[ 76.617239] amdgpu 0000:02:00.0: ffff94d7cbbd6000 pin failed
[ 78.315987] ata5.00: configured for UDMA/133
[ 78.343296] ata6.00: configured for UDMA/133
[ 78.366810] PM: resume of devices complete after 8878.648 msecs
[ 78.387537] PM: Finishing wakeup.
[ 78.387538] Restarting tasks ... done.

(In reply to Mateusz Lenik from comment #6)
> Can incremental builds (that means without clean/distclean, just building
> whatever files have changed) have some effect on bisect outcome?

It's more likely that the problem doesn't always occur even with an affected commit, and that you therefore mistakenly declared some commits as good. Try suspending / resuming multiple times, maybe even rebooting a couple of times in between, before declaring a commit as good.
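The advice to test several suspend/resume cycles before calling a commit good can be written down as a small decision rule. This is a rough Python sketch under stated assumptions: the function names and the default of five cycles are hypothetical, and the failure marker is simply the `amdgpu_resume failed` line from the dmesg excerpt above, which may not cover every way the resume can break.

```python
import re

# Failure marker copied from the dmesg excerpt in this report.
# Assumption: this line appears whenever the amdgpu resume fails.
RESUME_FAILED = re.compile(
    r"\[drm:amdgpu_device_resume\] \*ERROR\* amdgpu_resume failed"
)


def resume_failed(dmesg_text):
    """True if the dmesg text of one suspend/resume cycle shows the failure."""
    return RESUME_FAILED.search(dmesg_text) is not None


def bisect_verdict(cycle_logs, required_cycles=5):
    """Return 'bad' on any failure, 'good' only after enough clean cycles."""
    if any(resume_failed(log) for log in cycle_logs):
        return "bad"
    if len(cycle_logs) >= required_cycles:
        return "good"
    return "retest"


def miss_probability(p_fail, cycles):
    """Chance that a failure with independent per-cycle probability p_fail
    never shows up in the given number of cycles."""
    return (1.0 - p_fail) ** cycles


# Sample inputs: single lines taken from the log above.
clean = "[ 78.387537] PM: Finishing wakeup."
broken = "[ 71.988546] [drm:amdgpu_device_resume] *ERROR* amdgpu_resume failed (-22)."

print(bisect_verdict([clean, broken]))   # a single failure is enough
print(bisect_verdict([clean, clean]))    # too few clean cycles so far
print(bisect_verdict([clean] * 5))       # enough clean cycles
print(miss_probability(0.5, 5))
```

The asymmetry is deliberate: one failed resume is enough to mark a commit bad, while a good verdict needs repeated clean cycles, because an intermittent failure can easily stay hidden in a single test.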