I have this problem with every distribution that ships a recent kernel, both the ones I tried to boot from a live CD and the ones installed on my hard drive. Faster-booting ones manage to finish booting before they freeze, while slower ones freeze in the middle of the boot process.

[klod@klod ~]$ lspci | grep VGA
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] BeaverCreek [Radeon HD 6520G]
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6630M/6650M/6750M/7670M/7690M]

When I turn off the discrete GPU in my laptop's BIOS, it works well.
Does booting with radeon.runpm=0 on the kernel command line in grub help?
Yes, it does. Thank you :) I can't find any documentation on radeon parameters, but I guess that has something to do with power management. I tried radeon.dpm=0 earlier, but it didn't work. I hope this issue will be resolved soon :)
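For anyone who wants to confirm that the option from the GRUB entry actually reached the running kernel, here is a minimal Python sketch. It only reads /proc/cmdline and picks out the radeon.* tokens; the function names are made up for illustration and are not part of any radeon tooling.

```python
from pathlib import Path


def parse_radeon_params(cmdline: str) -> dict:
    """Return the radeon.* options found on a kernel command line string."""
    params = {}
    for token in cmdline.split():
        # Only tokens of the form radeon.<name>=<value> are of interest here.
        if token.startswith("radeon.") and "=" in token:
            name, value = token[len("radeon."):].split("=", 1)
            params[name] = value
    return params


def running_kernel_radeon_params() -> dict:
    """Parse the command line the currently running kernel was booted with."""
    return parse_radeon_params(Path("/proc/cmdline").read_text())
```

Calling running_kernel_radeon_params() on a machine booted with the workaround should return a dictionary containing "runpm": "0". An empty dictionary would suggest the option never made it onto the command line.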
These patches should fix it:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=9babd35ad72af631547c7ca294bc2e931cc40e58
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7848865914c6a63ead674f0f5604b77df7d3874f
Just a few questions:

1. What are "PX" and "non-PX" cards?
2. Aren't these going to disable power management in my GPU? What's the difference between applying those and using "radeon.runpm=0" in grub?
3. How can I apply those?

Thank you! :)
(In reply to klod from comment #4)
> Just a few questions:
>
> 1. - What are "PX" and "non-PX" cards?

PX = PowerXpress. PX systems are laptops with two GPUs, an integrated and a discrete GPU.

> 2. - Aren't these going to disable power management in my GPU? What's the
> difference between applying those and using "radeon.runpm=0" in grub?

It does not disable power management. It only disables the special handling for PX systems which gets incorrectly applied to non-PX systems in certain cases. When you apply the patches you shouldn't need to add the radeon.runpm=0 option. If radeon.runpm=0 fixes the issue, so should the patches.

> 3. - How can I apply those?

If you are using git:
git am <patch file>
Otherwise:
patch -p1 -i <patch file>
Well, "radeon.runpm=0" allows me to boot and use the system, but with much higher temperature and shorter battery life. I wouldn't call that "fixing the issue", as it's still worse than what I have with 3.12 and "radeon.dpm=1" parameter.
I think those patches are applied in 3.14 and 3.13.8, but I still need to use radeon.runpm=0 in order to boot with my discrete card.
(In reply to klod from comment #7)
> I think those patches are applied in 3.14 and 3.13.8, but i still need to
> use radeon.runpm=0 in order to boot with my discrete card.

You have a PX system so those patches are not relevant for you. It seems runpm is not working properly on your system. Booting with radeon.runpm=0 reverts back to the 3.12 behavior (PX dGPUs are not dynamically powered down).
Well, it seems so. What can I do to find what the problem is?
Did manually powering on/off the dGPU via debugfs ever work on your system? See the "Forcing the power state of the devices" section of this page: http://nouveau.freedesktop.org/wiki/Optimus/ for how to test that.
Created attachment 131601: possible fix. Does the attached kernel patch help?
I'm sorry, I'm very busy these days. I will try that when I have time.
Created attachment 131741: possible fix. Updated patch.
Created attachment 131781: possible fix. Fixed a stupid typo.
Hi. I believe I am in exactly the same situation as the OP. The system locks up about 10 (sometimes 14-15) seconds into boot, even if I never get past the LUKS password prompt for mounting the rootfs. It works fine if I use radeon.runpm=0, or if I disable the CONFIG_VGA_SWITCHEROO kernel option (=n) and recompile the kernel. It also works with kernel 3.12.23 (and 3.12.24, if I remember correctly). It doesn't work with 3.15.4 or 3.16-rc4. I diffed the sources a bit and noticed that the runpm option (and a lot of related ones) didn't exist in 3.12.23/24, so it makes sense that those kernels work.

I tried to apply the patch from your comment #14, but 4 hunks failed and the kernel wouldn't compile.

I also tried
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
which locks up the running system: I had audio playing (from a video) and it kept repeating the last buffer while locked up. Only holding the power button for 4 seconds turned the laptop off (Lenovo Z575, same lspci output as the OP); SysRq has no effect and the NumLock/CapsLock LEDs don't respond in this state.

Before echoing OFF to it, the switch file looked like this:
sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:01.0
1:DIS: :Pwr:0000:01:00.0

I'd be happy to try anything; I have time. I've been trying to track this down for 2 days by changing and matching kernel configs (which was silly, because I should have focused on the video settings only), knowing that 3.12.23 worked but 3.15.4 didn't. It all eventually led me to this page, thank goodness :) I'm not sure what more info I should give at this point, but I'd be happy to provide it, just say what you need.

What else I can say: turning the discrete card off in the BIOS (switching a setting from Dynamic (both cards on) to UDMA (only the integrated card on, discrete card off)) works fine.
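To compare the switch file across boots without reading it by eye, a small Python sketch like the following could parse it. The field meanings (index, IGD/DIS name, "+" for the active GPU, power state, PCI address) are inferred from the output quoted in this report, and the function name is hypothetical.

```python
def parse_switcheroo(text: str) -> list:
    """Parse lines such as '1:DIS: :Pwr:0000:01:00.0' into dictionaries."""
    clients = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The PCI address itself contains colons, so split at most 4 times.
        index, name, active, power, pci = line.split(":", 4)
        clients.append({
            "index": int(index),
            "name": name,             # IGD or DIS
            "active": active == "+",  # '+' marks the GPU currently in use
            "power": power,           # e.g. Pwr or DynPwr, as seen in this report
            "pci": pci,
        })
    return clients
```

Feeding it the contents of /sys/kernel/debug/vgaswitcheroo/switch (which needs root to read) gives one dictionary per GPU, which makes it easy to log whether the DIS entry ever changes power state.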
Probably worth noting: when I had Windows 7 64-bit with the laptop's own drivers, it worked fine. The discrete card was powered off whenever it was not in use while Windows was running. This was with the Lenovo-provided video drivers (heavily outdated, from around 2011-2012).

However, with the ATI Mobility (generic?) drivers from the AMD site (even the very latest ones as of a month or two ago, when I last tried), the system would freeze, likely when the driver tried, after a short while, to power off the discrete card. One option prevented the freeze: a registry setting called "enableulps", which is set to 1 by default. When I set it to 0 manually (in safe mode, then rebooted), the discrete card stayed at 100 MHz GPU and 150 MHz memory whenever idle; it would go higher when in use, but never lower, and it never powered down with these drivers. With the original manufacturer drivers (from Lenovo), enableulps is 1 and the card is at 0 MHz GPU and 27 MHz memory (as reported by GPU-Z), i.e. turned off. However incorrect those readings may be, they were reported consistently all the time.

So the Lenovo drivers may do something extra to be able to power down the discrete card, something that the generic ATI Mobility drivers don't do (at least not without freezing the system on an out-of-the-box install).

I have also tried the fglrx driver on Linux, but I can't remember much about how it behaved, because I was running the system mostly in UDMA mode (only the integrated graphics card on, set in the BIOS). I do remember that it didn't lock up the few times I tried it with both cards on (maybe because it didn't try to power down the discrete card?). Should you need me to retry something with the fglrx driver, let me know; it may not be easy, but I am willing to try again. I am now using the radeon driver with no intent to switch back to fglrx.
The temperatures while I'm just writing this are:

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +53.0°C (crit = +98.0°C)
temp2: +44.0°C (crit = +126.0°C)

radeon-pci-0008
Adapter: PCI adapter
temp1: +53.0°C (crit = +120.0°C, hyst = +90.0°C)

radeon-pci-0100
Adapter: PCI adapter
temp1: +61.0°C

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +52.6°C (high = +70.0°C)
       (crit = +100.0°C, hyst = +99.0°C)

When I have UDMA set in the BIOS (so only one card), they are around 40-44, max 45 (and of course radeon-pci-0100 is gone).

acpitz-virtual-0 temp1 is equal to k10temp-pci-00c3 temp1 = CPU temp
acpitz-virtual-0 temp2 is the motherboard temp
radeon-pci-0008 temp1 is the integrated gfx card
radeon-pci-0100 temp1 is the discrete gfx card

Some lspci -vvv (the VGA parts):

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] BeaverCreek [Radeon HD 6520G] (prog-if 00 [VGA controller])
    Subsystem: Lenovo Device 3970
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 42
    Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
    Region 1: I/O ports at 3000 [size=256]
    Region 2: Memory at f0200000 (32-bit, non-prefetchable) [size=256K]
    Expansion ROM at <unassigned> [disabled]
    Capabilities: <access denied>
    Kernel driver in use: radeon

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6630M/6650M/6750M/7670M/7690M] (prog-if 00 [VGA controller])
    Subsystem: Lenovo Radeon HD 6650M
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 43
    Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 2: Memory at f0100000 (64-bit, non-prefetchable) [size=128K]
    Region 4: I/O ports at 2000 [size=256]
    [virtual] Expansion ROM at f0120000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: radeon

Thank you for your time on this. Much appreciated!
Created attachment 142981: dmesg with radeon.test=3

With radeon.test=3, it took about 80 seconds to show me the boot screen text. Before that the screen was black with no cursor and the NumLock/CapsLock LEDs weren't turning on; it seemed locked up, but it wasn't, it was running the tests. [With radeon.test=1 the black screen (right after loading the initrd image) takes less time, maybe 40-50 seconds.]

When I booted into X, I tried to start Parole (media player) to play a video (which it autoplays on startup from the last playlist), but it crashed and the window closed without notice. I guess these were the errors:
[ 336.437925] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (1544192, 2, 4096, -12)
Maybe a side effect of the tests.

On another note: I enabled a few other flags (like radeon.dpm=1) to reduce the temperatures. (Now the cards' frequencies look as they do at idle, even when playing a video in a window; I've noticed it uses only the IGD for everything.)

# cat $(find /sys/kernel/debug/dri/ -iname \*pm\*)
uvd vclk: 0 dclk: 0
power level 0 sclk: 10000 mclk: 15000 vddc: 900 vddci: 0
uvd vclk: 0 dclk: 0
power level 0 sclk: 10000 mclk: 15000 vddc: 900 vddci: 0
uvd vclk: 0 dclk: 0
power level 0 sclk: 27587 vddc: 888
uvd vclk: 0 dclk: 0
power level 0 sclk: 27587 vddc: 888

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +47.0°C (crit = +98.0°C)
temp2: +43.0°C (crit = +126.0°C)

radeon-pci-0008
Adapter: PCI adapter
temp1: +47.0°C (crit = +120.0°C, hyst = +90.0°C)

radeon-pci-0100
Adapter: PCI adapter
temp1: +52.0°C (crit = +120.0°C, hyst = +90.0°C)

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +47.2°C (high = +70.0°C)
       (crit = +100.0°C, hyst = +99.0°C)

These readings were taken while playing a (720p) video in the background, in a window (which I resized to less than 720p) that takes up about 1/4 of the screen, behind a semi-transparent xfce4-terminal.
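The "power level" lines are easier to compare with the Windows clock readings mentioned earlier if they are converted to MHz. The sketch below is only an illustration: it assumes the driver reports sclk and mclk in units of 10 kHz (which would make 10000 and 15000 correspond to 100 MHz and 150 MHz), and the function name is made up.

```python
import re

# Assumption, not confirmed in this report: clocks are in units of 10 kHz,
# so dividing the raw value by 100 yields MHz.
UNITS_PER_MHZ = 100


def parse_power_level(line: str) -> dict:
    """Extract the 'name: value' pairs from one 'power level' line."""
    fields = {name: int(value)
              for name, value in re.findall(r"(\w+):\s*(\d+)", line)}
    for clock in ("sclk", "mclk"):
        if clock in fields:
            # Keep the raw value and add a converted one next to it.
            fields[clock + "_mhz"] = fields[clock] / UNITS_PER_MHZ
    return fields
```

Under that assumption, the discrete card's line above would come out as 100 MHz engine and 150 MHz memory clock, and the integrated GPU's line as roughly 276 MHz.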
Created attachment 142991: dmesg with radeon.test=0. Here is another dmesg in which the only thing I changed (from the previous dmesg) is radeon.test=0. (For the previous boot I had edited radeon.test to 3 from the grub menu, so for this boot I didn't have to do anything to get radeon.test=0.)
Created attachment 143001: kernel .config used in previous dmesgs (3.16-rc4)
Created attachment 143011: ati catalyst screenshot of the Information tab when both graphic cards were on. I found a screenshot of the ATI Catalyst Control Center Information tab from when I used the fglrx driver; it shows information about both graphics cards and might be useful.
Created attachment 143031: sloppy patch try. I modified your previous patch slightly (because some hunks failed and there was a compilation error). It still handles only the runpm=-1 case, not runpm=1, but I wanted to test it regardless. Just as expected, it doesn't work (the system still locks up). I'm realizing that this is because vgaswitcheroo is what turns OFF the DIS card, and since that is what causes the freeze anyway, I guess the problem is that vgaswitcheroo cannot turn off DIS without crashing (just like AMD Mobility does on Windows). Too tired, can't keep my eyes open, but I'll try to find out more about it in the morning. :)
Created attachment 157401: updated sloppy patch to kernel 3.18-rc4

With this patch, it doesn't freeze (works) with these kernel params:
radeon.dpm=1 radeon.runpm=1
radeon.dpm=0 radeon.runpm=1
(but this doesn't turn off the DIScrete card)

sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:01.0
1:DIS: :DynPwr:0000:01:00.0

# echo 'OFF' > /sys/kernel/debug/vgaswitcheroo/switch
has no effect.

It freezes the system with:
radeon.dpm=0 radeon.runpm=-1
(this is what I tested; it freezes the system about 10 seconds after boot)

Without this patch, it is tested to freeze only when running
# echo 'OFF' > /sys/kernel/debug/vgaswitcheroo/switch
with:
radeon.dpm=1 radeon.runpm=0
radeon.dpm=0 radeon.runpm=0
To avoid any freezes, with the caveat that the DIScrete card will never be turned off (not even by
# echo 'OFF' > /sys/kernel/debug/vgaswitcheroo/switch
which has no effect), I am using:
radeon.dpm=1 radeon.runpm=1
with the above patch (originally made by Alex Deucher in comment #14), which I also put here:
https://github.com/emanueLczirai/coostomhuston/blob/a3c118ac44b616ebcc049419cc08c4d13ebb44bd/system/lenovo%20z575/OS/manjaro/filesystem%20now/home/emacs/build/kernel/linuxgit/2100_DIScrete_gfx_card_systemfreeze.patch

As per our IRC conversation, it seems the Lenovo board may require some quirk to avoid the system freeze, so I accept the current workaround. Besides, I always keep my DIS card off in the BIOS anyway (the BIOS "Graphics: UDMA" setting, instead of Dynamic, does this).

I am giving up on this for now, but I'm always ready to test new ideas, should any arise. Thank you.
This bug also affects me, but in a different way. I'm using DRI_PRIME to switch between the integrated and discrete GPU. Running the system on the integrated GPU works perfectly. When I launch games or videos on the discrete card (not all of them, but a lot), the system freezes after a short time. Netconsole shows nothing and kdump leaves no record, so I have no idea how to debug it. radeon.runpm=0 is set. Looking forward to some discussion.