Bug 201763
Summary: | amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0! | ||
---|---|---|---|
Product: | Drivers | Reporter: | Rogério Brito (rbrito) |
Component: | Video(DRI - non Intel) | Assignee: | drivers_video-dri |
Status: | NEW --- | ||
Severity: | normal | CC: | fin4478, vyanitskiy |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 4.18.10 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
dmesg log with kernel 4.18.10
AMD wip kernel config with 1000Hz timer for Ryzen 5 1600 desktop PC
dmesg log of kernel 4.19 with error messages amdgpu
corresponding Xorg log to dmesg with error messages from amdgpu |
Description
Rogério Brito
2018-11-22 06:41:47 UTC
Created attachment 279599 [details]
dmesg log with kernel 4.18.10
From the dmesg output, it looks like the AMD GPU is powered off most of the time. Do the freezes happen when you explicitly use it for something, e.g. for a game via DRI_PRIME=1?

Before reporting, test with the latest drivers. That means with these:
https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.21-wip
https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers

Use the Oibaf PPA bionic version with Debian testing. Google how to use PPAs in Debian. Disable vsync in the Xfce compositor settings. Disable Thunar thumbnails too. You can use my kernel config as a base and add more drivers for your hardware with make xconfig.

Created attachment 279641 [details]
AMD wip kernel config with 1000Hz timer for Ryzen 5 1600 desktop PC
Dear Michel,

First of all, sorry for the late reply. I had a really bad start of the year (death in the family, complications caused by that, health problems, fire at home and also recovering from that hard hit, etc.) So, I'm really sorry for the late reply.

(In reply to Michel Dänzer from comment #2)
> From the dmesg output, it looks like the AMD GPU is powered off most of the
> time. Do the freezes happen when you explicitly use it for something, e.g.
> for a game via DRI_PRIME=1?

I never play games (really, the only game that I played in the last few years was 2048 on a browser), but I guess that other applications may use the discrete AMD GPU that this notebook has. I have just set the DRI_PRIME variable in my .bash_profile file and I will observe whether I still get the lock-ups.

OTOH, while opening terminal sessions (I live by them), I just observed the following in my dmesg logs:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
[ 4335.591693] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4335.594671] amdgpu: [powerplay] can't get the mac of 5
[ 4335.595690] amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 4341.181479] amdgpu: [powerplay] VI should always have 2 performance levels
[ 4341.231068] amdgpu 0000:04:00.0: GPU pci config reset
[ 4433.700699] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4433.705976] amdgpu: [powerplay] can't get the mac of 5
[ 4433.707025] amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 4439.230380] amdgpu: [powerplay] VI should always have 2 performance levels
[ 4439.276205] amdgpu 0000:04:00.0: GPU pci config reset
[ 4843.838487] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4843.842649] amdgpu: [powerplay] can't get the mac of 5
[ 4843.844046] amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 4849.072890] amdgpu: [powerplay] VI should always have 2 performance levels
[ 4849.121352] amdgpu 0000:04:00.0: GPU pci config reset
[ 4954.354975] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4954.358935] amdgpu: [powerplay] can't get the mac of 5
[ 4954.360287] amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 4960.173664] amdgpu: [powerplay] VI should always have 2 performance levels
[ 4960.219082] amdgpu 0000:04:00.0: GPU pci config reset
[ 4982.871619] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4982.874760] amdgpu: [powerplay] can't get the mac of 5
[ 4982.875794] amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 4988.077968] amdgpu: [powerplay] VI should always have 2 performance levels
[ 4988.126289] amdgpu 0000:04:00.0: GPU pci config reset
[ 5023.317917] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 5023.321614] amdgpu: [powerplay] can't get the mac of 5
[ 5023.322918] amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 5029.036045] amdgpu: [powerplay] VI should always have 2 performance levels
[ 5029.081469] amdgpu 0000:04:00.0: GPU pci config reset
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

I'm using, as you may expect, Debian's testing distribution (I can give you the precise details), upgraded almost daily (when I am not up to date, I am 1 or 2 days behind due to weekends, when I have to take care of my son).

I observed a few more details with respect to the bug:

1 - The freezes have always occurred when I am using the GUI and clicking or typing something.
It is my (wild) guess that the problem occurs when many interrupts happen, but I have no way to prove it. I have not yet seen the freezes when I leave the computer running scripts that perform some long job with ffmpeg (say, re-encoding some lecture videos that I get from YouTube to make them smaller), even if it takes many days of uninterrupted computation (and heat being generated). OTOH, if I am interacting with it intensely with the mouse (say, with a program like scantailor or some other programs), switching windows or editing some texts in Emacs, then I get freezes within just a few hours (say, 3 or 4 hours). In fact, I hope that it doesn't happen while I am typing this report (fingers crossed; I am copying the contents to Emacs to save them and paste them back in case it freezes).

2 - The problem isn't detected by Dell's built-in UEFI system diagnostics application (as I said, it seems to happen when I interact with the computer and the screen is constantly being updated).

3 - I discovered that whatever bug this is, it doesn't actually freeze the computer *completely*, since at least the sound card keeps playing sound in a loop (not that I intend it to; it is probably the samples that are already in the sound card's memory).

I recorded a few (short) videos of the problem and uploaded them to YouTube:

* https://www.youtube.com/watch?v=6o7Fl8kqtwg
* https://www.youtube.com/watch?v=9zPluvySdIM

If you have any idea, please let me know. Even if the freezes have nothing to do with the video card, I would like to have the GPU messages (which, as you mention, may be indicative of something) fixed, in the hope that this fixes things for other users who may not have the initiative to file a report with capable developers.

As a last resort, I may end up selling this computer (even though the money will not be enough to buy one with similar specs). :-(

Thanks,
Rogério Brito.
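In the dmesg excerpt above, the same sequence repeats: a "[drm] PCIE GART ... enabled" line, the powerplay warnings, and, roughly five to six seconds later, "GPU pci config reset". A small Python sketch can pair those lines by their dmesg timestamps to count the cycles and measure how long each one lasts. The marker strings are copied from this log; the function name `gpu_cycles` and the regular expression are illustrative assumptions, and output in other formats (for example `dmesg -T` with wall-clock times) would need a different pattern.

```python
import re
from typing import List, Optional, Tuple

# "[ 4335.591693] message" -> seconds since boot, message text.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
# Start and end markers taken from the log in this report.
START_RE = re.compile(r"PCIE GART of \d+M enabled")
END_MARKER = "GPU pci config reset"


def gpu_cycles(dmesg_text: str) -> List[Tuple[float, float]]:
    """Pair each 'GART enabled' line with the next 'pci config reset' line."""
    cycles: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for raw in dmesg_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip lines without a dmesg timestamp
        timestamp, message = float(match.group(1)), match.group(2)
        if START_RE.search(message):
            start = timestamp
        elif END_MARKER in message and start is not None:
            cycles.append((start, timestamp))
            start = None
    return cycles


SAMPLE = """\
[ 4335.591693] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4335.594671] amdgpu: [powerplay] can't get the mac of 5
[ 4341.231068] amdgpu 0000:04:00.0: GPU pci config reset
[ 4433.700699] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 4433.705976] amdgpu: [powerplay] can't get the mac of 5
[ 4439.276205] amdgpu 0000:04:00.0: GPU pci config reset
"""

if __name__ == "__main__":
    for begin, end in gpu_cycles(SAMPLE):
        print(f"cycle at {begin:.3f} s lasted {end - begin:.2f} s")
```

Running it over a full dmesg capture gives a quick count of how often the GPU was brought up, which could then be compared with the times at which the freezes occurred.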
Oh, I forgot to say that the kernel I am currently using identifies itself as:

Linux zatz 4.19.0-2-amd64 #1 SMP Debian 4.19.16-1 (2019-01-17) x86_64 GNU/Linux

I can report the versions of the graphics stack once I know what is relevant. I can also try to stress test anything here.

Thanks once again,
Rogério Brito.

(In reply to Rogério Brito from comment #5)
> First of all, sorry for the late reply. I had a really bad start of
> the year (death in the family, complications caused by that, health problems,
> fire at home and also recovering from that hard hit, etc.)

Nothing to apologize for, I hope things are (getting) better for you now!

> (In reply to Michel Dänzer from comment #2)
> > From the dmesg output, it looks like the AMD GPU is powered off most of the
> > time. Do the freezes happen when you explicitly use it for something, e.g.
> > for a game via DRI_PRIME=1?
>
> I never play games (really, the only game that I played in the last few
> years was 2048 on a browser), but I guess that other applications may use
> the discrete AMD GPU that this notebook has.

The AMD GPU should only be used if you explicitly choose to, by setting DRI_PRIME=1 or maybe using a corresponding setting of your desktop environment. Maybe the AMD GPU is only getting powered up accidentally, and the freezes happen due to something going wrong while powering it up/down.

Please attach the corresponding Xorg log file, preferably captured after dmesg has at least two instances of

[drm] PCIE GART of 256M enabled (table at 0x000000F400000000).

You could also try modprobe.blacklist=amdgpu on the kernel command line, to see if the freezes happen even if the amdgpu driver never initializes the AMD GPU.

Dear Michel,

(In reply to Michel Dänzer from comment #7)
> (In reply to Rogério Brito from comment #5)
> > First of all, sorry for the late reply.
> > I had a really bad start of the year (death in the family, complications
> > caused by that, health problems, fire at home and also recovering from
> > that hard hit, etc.)
>
> Nothing to apologize for, I hope things are (getting) better for you now!

Things are slowly getting better now (I am still working on fixing things related to the fire at home).

> > (In reply to Michel Dänzer from comment #2)
> > > From the dmesg output, it looks like the AMD GPU is powered off most of the
> > > time. Do the freezes happen when you explicitly use it for something, e.g.
> > > for a game via DRI_PRIME=1?
> >
> > I never play games (really, the only game that I played in the last few
> > years was 2048 on a browser), but I guess that other applications may use
> > the discrete AMD GPU that this notebook has.
>
> The AMD GPU should only be used if you explicitly choose to, by setting
> DRI_PRIME=1 or maybe using a corresponding setting of your desktop
> environment. Maybe the AMD GPU is only getting powered up accidentally, and
> the freezes happen due to something going wrong while powering it up/down.

Nice to know that. I may have mentioned it before, but I put DRI_PRIME=1 in my .bash_profile file. I notice that when I open/close Firefox, I get one instance of:

------------
[drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
amdgpu: [powerplay] can't get the mac of 5
amdgpu: [powerplay] VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
------------

> Please attach the corresponding Xorg log file, preferably captured after
> dmesg has at least two instances of
>
> [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).

OK, I am attaching both a dmesg log and the corresponding Xorg log from the moment I am writing this (I just performed a cold boot, to rule things out), but the Xorg log doesn't contain anything after the first 50 seconds or so... I can turn on some debug options, if you want me to.
> You could also try modprobe.blacklist=amdgpu on the kernel command line, to
> see if the freezes happen even if the amdgpu driver never initializes the
> AMD GPU.

OK, I will do that after I finish this message.

Thanks,
Rogério Brito.

Created attachment 282069 [details]
dmesg log of kernel 4.19 with error messages amdgpu
Created attachment 282071 [details]
corresponding Xorg log to dmesg with error messages from amdgpu
Dear Michel and other people,

Since the last time I reported this bug, the lock-ups have not happened anymore. OTOH, the messages in the dmesg log persist. I can include newer logs (but I don't think much has changed since then). Just as a reminder, here is what I'm getting:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(...)
[468341.815312] amdgpu: can't get the mac of 5
[468341.816323] amdgpu: VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[468347.792326] amdgpu: VI should always have 2 performance levels
(...)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Of course, the cause may be something else earlier in the logs.

Since I first reported the issue, I have upgraded the BIOS/firmware from Dell's site, but I'm guessing that it is very conservative and only includes an "updated" (not so much) microcode for the CPU vulnerabilities of all these years.

I'm running an up-to-date Debian testing distribution, but I can perform any (non-destructive :-)) tests that you want me to.

Thanks,
Rogério Brito.

Hello all,

I have a DELL Inspiron 5547 with Radeon R7 M256:

03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445 / 530/535 / 620/625 Mobile]

I never experienced any lockups with it, most likely because I was running a fairly old kernel most of the time. About a year ago, I started to use Arch Linux (thus more or less recent kernels), and also started to see these messages:

[ 3916.822707] amdgpu: can't get the mac of 5
[ 3916.824691] amdgpu: VBIOS did not find boot engine clock value in dependency table. Using Memory DPM level 0!
[ 3923.082543] amdgpu: VI should always have 2 performance levels

They don't seem to indicate any problem; the GPU actually works: with hashcat and the proprietary OpenCL run-time on top of the open-source amdgpu driver I get nearly the same performance as under Windows, and even OpenGL/Vulkan rendering seems to work (although performance is significantly worse compared to Intel Graphics).

Even though I use Intel Graphics most of the time, I was always interested in investigating the cause of those warnings. I had a quick look at the kernel's code, and from what I can see they are all related to power management (powerplay). I patched and compiled my own kernel to get a bit more information, and here is what I managed to understand:

> [ 3916.822707] amdgpu: can't get the mac of 5

According to 'drivers/gpu/drm/amd/powerplay/inc/smumgr.h', 'mac 5' corresponds to SMU_MAX_LEVELS_VDDGFX. This value is neither handled in iceland_get_mac_definition() nor defined in 'drivers/gpu/drm/amd/powerplay/inc/smu71.h'. For other GPU families this constant is used in '*_Discrete_DpmTable', while in 'SMU71_Discrete_DpmTable' I could not find anything related to VDDGFX. Therefore I guess this GPU family (Iceland, SMU71) does not support this kind of power control.

> [57695.583784] amdgpu: VBIOS did not find boot engine clock value in
> dependency table. Using Memory DPM level 0!

This is something I would love to investigate further, but unfortunately I have no time. The warning itself comes from iceland_populate_smc_boot_level() defined in 'drivers/gpu/drm/amd/powerplay/smumgr/iceland_smumgr.c'. This function attempts to get the initial clock levels for Graphics DPM and Memory DPM from the VBIOS. Since we see only one warning, it successfully gets the clock value for Graphics DPM, but not for Memory DPM.
The function attempts to find the value 'data->vbios_boot_state.mclk_bootup_value' in the table 'data->dpm_table.mclk_table', which in turn is populated by iceland_populate_all_memory_levels(). I need to add some more debug statements to see the contents of this table and the value that it attempts to find in it.

> [ 3923.082543] amdgpu: VI should always have 2 performance levels

I patched the kernel to provide more details in this message, so:

> [ 5312.502812] amdgpu: VI should always have 2 performance levels, however 1
> was detected

This one comes from smu7_apply_state_adjust_rules() defined in 'drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c'. As far as I can see, the code is able to handle values != 2, and in some places I see checks like == 1, so most likely this warning can be safely ignored.

In conclusion, I would say that none of those warnings is critical.

P.S. I am not a kernel developer, nor am I familiar with the amdgpu code base. I just had some spare time :)

Best regards,
Vadim.

Here we go:

[ 582.721066] amdgpu: iceland_populate_all_memory_levels(): mclk_table has 3 entries
[ 582.721081] amdgpu: iceland_populate_all_memory_levels(): dpm_levels[0] is 30000
[ 582.721095] amdgpu: iceland_populate_all_memory_levels(): dpm_levels[1] is 60000
[ 582.721110] amdgpu: iceland_populate_all_memory_levels(): dpm_levels[2] is 90000
[ 582.722669] amdgpu: VBIOS did not find boot engine clock value (29900) in dependency table. Using Memory DPM level 0!

As can be seen, the driver falls back to level 0, which is very close to the requested value (29900 vs 30000). This looks like a bug in the VBIOS because, AFAIU, the value 29900 comes from there (see smu7_dpm_patch_boot_state() in 'drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c'). In any case, this does not look critical to me either.

Best regards,
Vadim. |
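Vadim's debug output suggests that the boot-level lookup only succeeds on an exact match, so a VBIOS boot value of 29900 misses the table entry 30000 and the driver falls back to Memory DPM level 0. The Python sketch below models that behaviour under stated assumptions; it is a simplified illustration, not the kernel's code. The exact-equality search is an assumption consistent with the log, the function names (`find_boot_level`, `populate_memory_boot_level`, `nearest_level`) are hypothetical, and the table values are the ones from Vadim's patched kernel.

```python
from typing import Optional, Sequence

# Memory DPM levels reported by the patched kernel (mclk_table, 3 entries).
MCLK_LEVELS = [30000, 60000, 90000]
# Boot memory clock that, per the log, comes from the VBIOS.
VBIOS_MCLK_BOOTUP_VALUE = 29900


def find_boot_level(levels: Sequence[int], bootup_value: int) -> Optional[int]:
    """Hypothetical model: index of an entry equal to bootup_value, else None."""
    for index, value in enumerate(levels):
        if value == bootup_value:  # exact match only (assumption based on the log)
            return index
    return None


def populate_memory_boot_level(levels: Sequence[int], bootup_value: int) -> int:
    """Hypothetical model of the fallback: use level 0 when nothing matches."""
    level = find_boot_level(levels, bootup_value)
    if level is None:
        print(f"VBIOS did not find boot engine clock value ({bootup_value}) "
              "in dependency table. Using Memory DPM level 0!")
        return 0
    return level


def nearest_level(levels: Sequence[int], bootup_value: int) -> int:
    """Index of the entry closest to bootup_value (for comparison only)."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - bootup_value))


if __name__ == "__main__":
    chosen = populate_memory_boot_level(MCLK_LEVELS, VBIOS_MCLK_BOOTUP_VALUE)
    closest = nearest_level(MCLK_LEVELS, VBIOS_MCLK_BOOTUP_VALUE)
    print(f"fallback level: {chosen}, nearest level: {closest}")
```

With this table the fallback level and the nearest entry coincide (both are level 0, 100 units apart), which is consistent with Vadim's conclusion that the warning is not critical on this machine.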