Created attachment 253581 [details]
dmesg logfile

I have a Gigabyte RX460 2GB GPU card, Debian testing Xfce and the agd5f drm-next-4.11-wip kernel, downloaded and compiled today. The computer works OK, but the dmesg command shows the following boot errors that might interest amdgpu driver developers.

Mounting my home partition fails amdgpu IB tests:
[ 7.001953] [drm] ib test on ring 12 succeeded
[ 7.055163] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
[ 8.011874] [drm:0xffffffffa01360ce] *ERROR* amdgpu: IB test timed out.
[ 8.011910] [drm:0xffffffffa00e1b4b] *ERROR* amdgpu: failed testing IB on ring 13 (-110).
[ 8.011943] [drm:0xffffffffa00be574] *ERROR* ib ring test failed (-110).

Some powerplay errors:
[ 4.888584] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[ 4.891452] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
[ 4.894807] amdgpu: [powerplay] failed to send message 309 ret is 254
[ 4.894824] amdgpu: [powerplay] failed to send pre message 14e ret is 254

BIOS recognition errors:
[ 4.729628] [drm] BIOS signature incorrect 20 7
[ 4.729635] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
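For anyone triaging a similar dmesg log, the three groups of messages above can be separated mechanically. The following is only a minimal sketch: the category names and the `classify` helper are hypothetical, and the substrings are copied from the log lines in this report. On Linux, the -110 in the IB test lines corresponds to -ETIMEDOUT.

```python
import re
from collections import defaultdict

# Matches a dmesg line such as "[ 8.011874] text"; the timestamp is in seconds.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Hypothetical category names mapped to substrings taken from the log above.
CATEGORIES = {
    "ib_test": ("IB test timed out", "failed testing IB", "ib ring test failed"),
    "powerplay": ("[powerplay]",),
    "vbios": ("BIOS signature incorrect", "Invalid PCI ROM header signature"),
}


def classify(dmesg_text):
    """Return {category: [(timestamp, message), ...]} for matching lines."""
    found = defaultdict(list)
    for raw in dmesg_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a timestamped dmesg line
        stamp, message = float(m.group(1)), m.group(2)
        for name, needles in CATEGORIES.items():
            if any(needle in message for needle in needles):
                found[name].append((stamp, message))
                break  # one category per line is enough for triage
    return dict(found)


SAMPLE = """\
[ 7.001953] [drm] ib test on ring 12 succeeded
[ 8.011874] [drm:0xffffffffa01360ce] *ERROR* amdgpu: IB test timed out.
[ 4.888584] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[ 4.729628] [drm] BIOS signature incorrect 20 7
"""
```

Calling `classify(SAMPLE)` groups the three error lines by category and skips the successful ring 12 test, which makes it easier to say which of the separate issues a given log actually shows.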
Created attachment 253671 [details] dmesg log
Created attachment 253681 [details] xorg log file
Created attachment 253691 [details] journal log file
Created attachment 253701 [details] /var/log/messages
I began having problems with my AMD GPU when Fedora 25 switched from their 4.8.16-300.fc25 kernel to a 4.9.3 kernel, as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=1414025

The initial symptom was that there was no kernel frame buffer, so the system dropped back to using an accelerated video interface. With the latest Fedora kernel (4.9.6-200.fc25), the system eventually runs normally, but it takes upwards of 6 minutes for the system to boot.

As shown in the files I attached, I too get many messages of the form:
[ 346.235933] failed to send pre message 148 ret is 0
[ 346.455587] failed to send message 148 ret is 0

I'd like the importance of this bug raised to medium or high, as it is a clear regression from the 4.8.16 kernel to the 4.9.3 kernel.
Typo in the above comment: s/an accelerated video/an un-accelerated video/
Does using the new ucode here help? https://people.freedesktop.org/~agd5f/radeon_ucode/polaris/
Alex, thanks for the new firmware. Still BIOS recognition errors at boot, but otherwise OK.

[ 3.461112] [drm] BIOS signature incorrect 20 7
[ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff

Steven, you are using a Tonga GPU with the radeon kernel driver, which fails at boot, so the system falls back to the VESA driver. The amdgpu driver has support for Tonga, but you need to make a custom 4.11-wip kernel. Stock distribution kernels do not have stable amdgpu code.

Creating a custom kernel in Debian:

Use the command:
git clone -b drm-next-4.11-wip git://people.freedesktop.org/~agd5f/linux

The kernel configuration files of the official Debian kernels are available in /boot, named after the kernel release. Copy the .config file to the linux directory. Connect all your devices and run the command: make localmodconfig. You can also use the command make defconfig to create an initial .config file.

Use the command make xconfig and check that you have enabled: Reroute Broken IRQ, Virtualization KVM and the 300Hz CPU timer. I also disabled Swap, Kernel Debug, CPU Freq scaling and CPU handling in ACPI, and used the BIOS to control the CPU and devices. In drivers->graphics->amdgpu, enable cik support for a GCN 1.1 GPU and si support for a GCN 1.0 GPU.

Create the Debian kernel package:
export CONCURRENCY_LEVEL=4
fakeroot make-kpkg --initrd kernel_image

Install the kernel package with Gdebi. To make a custom kernel boot, add a line to /etc/initramfs-tools/modules:
unix
And run: sudo update-initramfs
Reboot.
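Before starting a long build, it can help to confirm that the .config really has the amdgpu options switched on. The sketch below is an illustration only: the symbol names CONFIG_DRM_AMDGPU, CONFIG_DRM_AMDGPU_SI and CONFIG_DRM_AMDGPU_CIK are assumed to be the Kconfig names behind the "si support" and "cik support" entries mentioned above, and the helper names are hypothetical.

```python
def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" lines are comments
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


# Assumed Kconfig symbol names; check them against your kernel tree.
WANTED = ("CONFIG_DRM_AMDGPU", "CONFIG_DRM_AMDGPU_SI", "CONFIG_DRM_AMDGPU_CIK")


def missing_options(config_text, wanted=WANTED):
    """Return the wanted symbols that are not enabled, in the order given."""
    enabled = enabled_options(config_text)
    return [name for name in wanted if name not in enabled]


SAMPLE_CONFIG = """\
CONFIG_DRM_AMDGPU=m
# CONFIG_DRM_AMDGPU_SI is not set
CONFIG_DRM_AMDGPU_CIK=y
"""
```

Reading the real file with `open(".config").read()` and passing it to `missing_options` lists whatever still needs to be enabled in make xconfig.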
After updating the firmware I still have powerplay errors:
[ 3.574222] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[ 3.577052] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
Thanks for the information on building a new kernel. I'll give that a try. I'm running Fedora 25, but I think I can follow your Debian instructions.
(In reply to fin4478 from comment #8)
> Alex, thanks for the new firmware. Still BIOS recognition errors at boot,
> but otherwise OK.
>
> [ 3.461112] [drm] BIOS signature incorrect 20 7
> [ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature:
> expecting 0xaa55, got 0xffff
>

This is harmless. The driver tries several methods to fetch the vbios image. The driver would not load at all if it failed to fetch the vbios image.
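To illustrate what the "expecting 0xaa55" message is checking: a PCI expansion ROM image starts with the bytes 0x55 0xAA, which read as a little-endian 16-bit value give 0xaa55. The sketch below applies that check to a ROM dump held in memory. It is not the kernel's code, and the function names are hypothetical.

```python
import struct

PCI_ROM_SIGNATURE = 0xAA55  # the value the kernel message says it expects


def rom_signature(image):
    """Return the first two bytes of a ROM image as a little-endian 16-bit value."""
    if len(image) < 2:
        raise ValueError("ROM image is shorter than two bytes")
    (value,) = struct.unpack_from("<H", image, 0)
    return value


def has_valid_signature(image):
    """True if the image starts with the 0x55 0xAA expansion ROM signature."""
    return rom_signature(image) == PCI_ROM_SIGNATURE
```

A dump that begins with 0xFF 0xFF, as in the log above, fails this check, which matches the explanation that one fetch method came back empty and the driver moved on to another.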
Created attachment 253891 [details] New dmesg file
I successfully built a custom kernel. It appears to be working well. Thanks for the help!

I included a new dmesg.log file because I still see messages like:
[ 9.719278] amdgpu: [powerplay] failed to send pre message 15b ret is 0
[ 10.158327] amdgpu: [powerplay] failed to send message 15b ret is 0

Are these harmless or do they indicate a problem?
One other error message I just noticed: [ 5.538117] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
(In reply to fin4478 from comment #8)
> Alex, thanks for the new firmware. Still BIOS recognition errors at boot,
> but otherwise OK.
>
> [ 3.461112] [drm] BIOS signature incorrect 20 7
> [ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature:
> expecting 0xaa55, got 0xffff
>
> Steven, you are using a Tonga GPU with the radeon kernel driver, which fails
> at boot, so the system falls back to the VESA driver. The amdgpu driver has
> support for Tonga, but you need to make a custom 4.11-wip kernel. Stock
> distribution kernels do not have stable amdgpu code.
>
> Creating a custom kernel in Debian:
>
> Use the command:
> git clone -b drm-next-4.11-wip git://people.freedesktop.org/~agd5f/linux
>
> The kernel configuration files of the official Debian kernels are available
> in /boot, named after the kernel release. Copy the .config file to the linux
> directory. Connect all your devices and run the command: make
> localmodconfig. You can also use the command make defconfig to create an
> initial .config file.
>
> Use the command make xconfig and check that you have enabled: Reroute
> Broken IRQ, Virtualization KVM and the 300Hz CPU timer. I also disabled
> Swap, Kernel Debug, CPU Freq scaling and CPU handling in ACPI, and used the
> BIOS to control the CPU and devices. In drivers->graphics->amdgpu, enable
> cik support for a GCN 1.1 GPU and si support for a GCN 1.0 GPU.
>
> Create the Debian kernel package:
> export CONCURRENCY_LEVEL=4
> fakeroot make-kpkg --initrd kernel_image
>
> Install the kernel package with Gdebi. To make a custom kernel boot, add
> a line to /etc/initramfs-tools/modules:
> unix
> And run: sudo update-initramfs
> Reboot.

I tried the above but noticed that it fails when trying to build headers. If I am trying to build a 4.10.5 kernel, could I copy files from what was downloaded when I ran the git command?
(In reply to Milo from comment #15)
>
> I tried the above but noticed that it fails when trying to build headers.

You do not need kernel headers unless you are using some dkms drivers. Currently there is a temperature bug in the wip kernel. The 4.11-rc3 kernel from kernel.org works.

Some kernel versions fail to build headers because the REPORTING-BUGS file is missing. Create the file in the linux directory.
(In reply to fin4478 from comment #16)
> (In reply to Milo from comment #15)
> >
> > I tried the above but noticed that it fails when trying to build headers.
>
> You do not need kernel headers unless you are using some dkms drivers.
> Currently there is a temperature bug in the wip kernel. The 4.11-rc3 kernel
> from kernel.org works.
>
> Some kernel versions fail to build headers because the REPORTING-BUGS file
> is missing. Create the file in the linux directory.

Thanks. Yes, I do have some dkms packages installed that need headers. As you suggested, creating REPORTING-BUGS solved the headers build failure.

So it built as version 4.10.0-rc5-gec3fa8e6ca19. When I booted from it, after 20 minutes the screen was still blank, so I rebooted. I had built 4.9.10 before from kernel.org, which after 10 minutes eventually booted into X. I used that .config along with your comments in comment #8 to build 4.10.0-rc5-gec3fa8e6ca19.
(In reply to Milo from comment #17)
> So it built as version 4.10.0-rc5-gec3fa8e6ca19. When I booted from it,
> after 20 minutes the screen was still blank so I rebooted.

All your software must be in sync, so use Debian testing Xfce, the Oibaf PPA yakkety version and the latest kernels. I will post my kernel 4.11-rc config.
Created attachment 255553 [details] Kernel 4.11-rc3 config file for RX400 series, add drivers for your hardware
This bug report is partly a duplicate of this one:
https://bugs.freedesktop.org/show_bug.cgi?id=100443

I'm getting the same AVFS/Powerplay messages, and updating the firmware didn't help.

The topic headline is very unspecific and the replies appear very confusing to me. Has this issue been solved or not? Does a custom kernel change the messages? I'm using the newest Ubuntu mainline kernel. Is something wrong with the configuration the kernel team used to build it? How is this issue related to Tonga GPUs? Polaris is the first dGPU with AVFS.

How many issues do we count here, and which replies belong to which issue? Perhaps someone could make a summary or something; that would be very pleasant.
As I previously reported, building a custom kernel as suggested in comment 8 allows me to use my video card. I do continue to get the powerplay error messages, but aside from slowing down boot a little, they don't seem to do any harm.
Thanks for the clarification. That means building the kernel was not a fix for the messages, but for getting the driver to start with Tonga.

So the AVFS issue and the missing/wrong value in the voltage dependency table still exist, as do your remaining Powerplay messages.
(In reply to Christian Lanig from comment #22)
> Thanks for the clarification. That means building the kernel was not a fix
> for the messages, but for getting the driver to start with Tonga.
>
> So the AVFS issue and the missing/wrong value in the voltage dependency
> table still exist, as do your remaining Powerplay messages.

I didn't have the AVFS issue, but I had the other two on my R9 M390, with booting taking as long as 10 minutes when I moved from kernel 4.8 to 4.9/4.10.

Having built 4.11-rc4, my boot times are back down to around 30 seconds, but the messages still persist in dmesg, and it seems I have gained some IB-related messages.

So I was just confirming that, from my perspective, there is an issue that was solved, though not the messages that led me here.
With kernel 4.11.3-200.fc25.x86_64, I no longer need a custom kernel to use my video card. I still see messages like:

[ 13.599542] [drm] ib test on ring 13 succeeded
[ 13.606627] [drm:amdgpu_device_init [amdgpu]] *ERROR* ib ring test failed (-110).
[ 14.500572] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 14.983369] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 15.120567] amdgpu: [powerplay] failed to send pre message 155 ret is 0
[ 15.609965] amdgpu: [powerplay] failed to send message 155 ret is 0
[ 16.014478] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 16.165919] amdgpu: [powerplay] failed to send pre message 15b ret is 0
[ 16.570511] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 16.721456] amdgpu: [powerplay] failed to send message 15b ret is 0
[ 17.498715] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 17.951062] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 18.843123] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 19.295427] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 20.187154] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 20.639852] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 21.531857] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 21.984781] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 21.998448] [drm] Initialized amdgpu 3.10.0 20150101 for 0000:05:00.0 on minor 0

but at least the card is otherwise functional.
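Since several reports in this thread tie these messages to slow boots, one way to put a number on it is to pair each "failed to send pre message X" line with the following "failed to send message X" line and look at the gap between their timestamps. This is a rough sketch only: it assumes each such pair belongs to one attempt, and the `message_delays` helper is hypothetical.

```python
import re

# Captures the timestamp, whether this is the "pre" line, and the hex message id.
MSG_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\].*failed to send (pre )?message ([0-9a-fA-F]+) ret is"
)


def message_delays(dmesg_text):
    """Pair each 'pre message X' line with the next 'message X' line.

    Returns a list of (message_id, seconds_between_the_two_lines).
    """
    pending = {}  # message id -> timestamp of the unmatched "pre" line
    delays = []
    for line in dmesg_text.splitlines():
        m = MSG_RE.match(line.strip())
        if not m:
            continue
        stamp, is_pre, msg_id = float(m.group(1)), bool(m.group(2)), m.group(3)
        if is_pre:
            pending[msg_id] = stamp
        elif msg_id in pending:
            delays.append((msg_id, stamp - pending.pop(msg_id)))
    return delays


SAMPLE = """\
[ 14.500572] amdgpu: [powerplay] failed to send pre message 260 ret is 0
[ 14.983369] amdgpu: [powerplay] failed to send message 260 ret is 0
[ 15.120567] amdgpu: [powerplay] failed to send pre message 155 ret is 0
[ 15.609965] amdgpu: [powerplay] failed to send message 155 ret is 0
"""
```

Summing the second element of each tuple, for example `sum(d for _, d in message_delays(text))`, gives a rough figure for how much boot time falls between the paired lines, which is easier to compare across kernel versions than a raw log.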
I tried installing a 4.13 kernel, and boot times are back to more than 5 minutes (the "failed to send message" errors appear for more than 5 minutes when I run journalctl -e).

journalctl log: https://pastebin.com/7dScPxNn
Milo, try adding amdgpu.audio=0 to the kernel command line.
Created attachment 260365 [details]
attachment-474-0.html

I have added it to /etc/default/grub as follows:
GRUB_CMDLINE_LINUX_DEFAULT="nointremap quiet amdgpu.audio=0"
and then ran update-grub. However, it still takes more than 5 minutes to boot.

On Mon, Oct 23, 2017 at 8:11 PM <bugzilla-daemon@bugzilla.kernel.org> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=193651
>
> --- Comment #26 from fin4478@hotmail.com ---
> Milo, try adding amdgpu.audio=0 to the kernel command line.
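After editing /etc/default/grub and rebooting, it is worth confirming that the option actually reached the running kernel, since an edit that never made it into the boot entry would look the same as an option that had no effect. The sketch below parses a kernel command line string such as the contents of /proc/cmdline. The `module_params` helper is hypothetical.

```python
def module_params(cmdline, module):
    """Return {param: value} for 'module.param=value' tokens on a kernel command line."""
    params = {}
    prefix = module + "."
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            key, _, value = token[len(prefix):].partition("=")
            params[key] = value
    return params


# Example command line built from the GRUB setting quoted above; the
# BOOT_IMAGE and root= parts are made up for illustration.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 nointremap quiet amdgpu.audio=0"
```

On a running system, `module_params(open("/proc/cmdline").read(), "amdgpu")` should include `"audio": "0"` if the GRUB change took effect.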
(In reply to Alex Deucher from comment #11)
> (In reply to fin4478 from comment #8)
> > Alex, thanks for the new firmware. Still BIOS recognition errors at boot,
> > but otherwise OK.
> >
> > [ 3.461112] [drm] BIOS signature incorrect 20 7
> > [ 3.461117] amdgpu 0000:01:00.0: Invalid PCI ROM header signature:
> > expecting 0xaa55, got 0xffff
> >
>
> This is harmless. The driver tries several methods to fetch the vbios image.
> The driver would not load at all if it failed to fetch the vbios image.

All messages that use the dev_err function slow down booting and make it look ugly. AMD should get this fix into the PCI driver: change the "Invalid PCI ROM header signature" message in drivers/pci/rom.c to use the dev_info function.