Bug 193981

Summary: AMDGPU: R9 380 Fan rotates all the time (loud!)
Product: Drivers
Reporter: Fabian (fabiscafe)
Component: Video (DRI - non Intel)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
CC: alexdeucher, dryady, felix.schwarz, fin4478
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.9.7
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output

Description Fabian 2017-02-04 16:01:06 UTC
Created attachment 254121
dmesg output

The fan of my "Sapphire R9 380 ITX" is really loud on Linux. 

cat /sys/class/hwmon/hwmon2/pwm1
reports that the fan constantly runs at
"94"

while
sensors | grep amdgpu -A 3
tells me that the temperature is at

amdgpu-pci-0100
Adapter: PCI adapter
temp1:        +31.0°C  (crit =  +0.0°C, hyst =  +0.0°C)

When I now set /sys/class/hwmon/hwmon2/pwm1 to 0, the temperature still stays at 31-34°C while idle. So overheating is not a concern there.


Before Linux loads (UEFI routines, GRUB) and on Windows, the fan stays off unless there is enough load to need it. On Linux the fan starts to rotate at boot and keeps rotating all the time (under load it rotates faster).
At 41°C pwm1 goes up to 104.


find /sys/class/hwmon/hwmon2/ -maxdepth 1 -type f -print -execdir cat '{}' \; -exec echo \;
/sys/class/hwmon/hwmon2/pwm1_min
0

/sys/class/hwmon/hwmon2/pwm1_max
255

/sys/class/hwmon/hwmon2/pwm1
94

/sys/class/hwmon/hwmon2/pwm1_enable
1

/sys/class/hwmon/hwmon2/temp1_crit
0

/sys/class/hwmon/hwmon2/uevent

/sys/class/hwmon/hwmon2/temp1_crit_hyst
0

/sys/class/hwmon/hwmon2/temp1_input
32000

/sys/class/hwmon/hwmon2/name
amdgpu
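
Note that the hwmon index (hwmon2 here) is not guaranteed to stay the same across boots or kernel versions. The following is a minimal sketch, not part of the original report, that looks up the directory by its name file instead and prints the fan duty and temperature. The function names and the optional root-directory argument are hypothetical and only exist for illustration; the sysfs file names are the ones listed in the dump above.

```shell
# Find the hwmon directory whose "name" file contains "amdgpu".
# $1 is an optional root directory (assumption: defaults to /sys/class/hwmon).
find_amdgpu_hwmon() {
    root=${1:-/sys/class/hwmon}
    for d in "$root"/hwmon*; do
        [ -r "$d/name" ] || continue
        if [ "$(cat "$d/name")" = "amdgpu" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Convert a raw pwm value into an integer percentage of pwm1_max.
pwm_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Print one summary line for the first amdgpu hwmon device found.
fan_report() {
    dir=$(find_amdgpu_hwmon "$1") || { echo "no amdgpu hwmon found" >&2; return 1; }
    pwm=$(cat "$dir/pwm1")
    max=$(cat "$dir/pwm1_max")
    temp=$(cat "$dir/temp1_input")   # reported in millidegrees Celsius
    echo "pwm1=$pwm/$max ($(pwm_percent "$pwm" "$max")%) temp=$(( temp / 1000 ))C"
}
```

Calling fan_report without arguments reads from /sys/class/hwmon. With the values from this report (pwm1 94, pwm1_max 255) the duty comes out at roughly a third of the maximum.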
Comment 1 fin4478 2017-02-09 11:32:59 UTC
When you file bugs against the amdgpu driver, use the latest kernel and Mesa code:
https://cgit.freedesktop.org/~agd5f/linux/?h=drm-next-4.11-wip
https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers

Problems might be fixed in the latest code.

The Yakkety version of the Oibaf PPA works with Debian testing Xfce, which is the best distribution. Manjaro is much more difficult to use.


Are you sure it is the gpu fan? Disable cpufreq, CPU temperature and fan handling in the kernel config and use the BIOS to control these.

My computer with X4 845 and Gigabyte RX460 2GB with 2 fans is quiet.
Comment 2 Fabian 2017-02-09 12:05:45 UTC
Is there some kind of live image? I cannot just switch to another distro, since I need to work with this computer.

> Are you sure it is the gpu fan? 
Yep. I can just write "0" to /sys/class/hwmon/hwmon2/pwm1 and it's quiet.

By the way, this wasn't the case from the beginning. There was a kernel and/or linux-firmware release that introduced this bug. I just can't figure out which version it was.
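
A caution on the pwm1 workaround above: forcing the fan to 0 leaves it there under load as well. The sketch below assumes the pwm1_enable values documented for amdgpu in later kernels (0 no fan speed control, 1 manual, 2 automatic); the semantics on the 4.9-era kernels discussed here may differ, so treat this as an illustration only. Both function names are hypothetical.

```shell
# Translate a pwm1_enable value into a label.
# Assumption: the 0/1/2 meanings documented for amdgpu in later kernels.
describe_pwm_enable() {
    case "$1" in
        0) echo "no fan speed control" ;;
        1) echo "manual" ;;
        2) echo "automatic" ;;
        *) echo "unknown" ;;
    esac
}

# Ask the driver to take over fan control again.
# $1 is the hwmon directory, e.g. /sys/class/hwmon/hwmon2 (needs root there).
restore_auto_fan() {
    echo 2 > "$1/pwm1_enable"
}
```

The dump in the description shows pwm1_enable at 1, which under this assumption would mean the fan was in manual mode when the values were read.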
Comment 3 Fabian 2017-02-09 12:19:46 UTC
Linux 4.4.47 is quiet. So it has to be somewhere between 4.5 and 4.8. Wasn't there some fan/hwmon-related work in 4.6 or 4.7?


4.10-rc7 has no fix for this. Mesa/Xorg also seem to be unrelated here.
Comment 4 Fabian 2017-02-09 12:26:20 UTC
Also, maybe not related, but according to Sapphire, on OS start the fan will rotate for a short time (about 0.5s) at a higher speed to remove some dust.

Maybe the Linux hwmon code took this dust-removal speed as the default value and keeps it for the whole runtime?

On Windows it works this way: a short high rotation speed and then a slowdown to (almost?) 0.
Comment 5 fin4478 2017-02-09 13:43:36 UTC
(In reply to Fabian from comment #2)
> Is there some kind of live image? I cannot just switch to another distro,
> since I need to work with this computer.

No live images, and for example today the agd5f 4.11-wip kernel received 20 or so patches. To test with Debian testing Xfce, create a 15 GB test partition with the GParted live CD.

Burn Debian netinstaller to a cdrw or usb memory:
http://cdimage.debian.org/cdimage/stretch_di_alpha7/amd64/iso-cd/
Choose Debian Desktop and Xfce in the installer. Use the Whisker menu plugin along with the Weather plugin and remove the original applications menu. You can configure Xfce freely, so make it suitable for your liking.

Install AMD firmware:
https://packages.debian.org/stretch/firmware-amd-graphics
With Synaptic install Gdebi. With Gdebi you can install downloaded .deb packages easily. Give the root password when asked.

Latest mesa:
Oibaf PPA:
https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers

PPAs with Debian, use case B:
http://www.webupd8.org/2014/10/how-to-add-launchpad-ppas-in-debian-via.html


Latest AMD driver:
Use the command: git clone -b drm-next-4.11-wip git://people.freedesktop.org/~agd5f/linux

The configuration files of the official Debian kernels are available in /boot, named after the kernel release. Copy one to the linux directory as .config. Connect all your devices and run the command: make localmodconfig. You can also use the command make defconfig to create an initial .config file.

Run make xconfig and check that you have enabled Reroute Broken IRQ, Virtualization (KVM) and the 300 Hz CPU timer. I also disabled swap, kernel debugging, CPU frequency scaling and CPU handling in ACPI, and used the BIOS to control the CPU and devices. Under drivers->graphics->amdgpu, enable CIK support for a GCN 1.1 GPU and SI support for a GCN 1.0 GPU.

With Synaptic, install kernel-package and fakeroot packages. 
Create debian kernel package:
export CONCURRENCY_LEVEL=4
fakeroot make-kpkg --initrd kernel_image

Install the kernel package with Gdebi. To make the custom kernel boot, add a line to /etc/initramfs-tools/modules:
unix
And run: sudo update-initramfs
Reboot.
Comment 6 Alex Deucher 2017-02-09 13:54:19 UTC
(In reply to Fabian from comment #3)
> Linux 4.4.47 is quiet. So it has to be somewhere between 4.5 and 4.8. Wasn't
> there some fan/hwmon-related work in 4.6 or 4.7?

kernel 4.4 did not have power management support enabled for your card so the clocks and voltages were always left at the lowest levels.
Comment 7 fin4478 2017-02-09 14:14:27 UTC
(In reply to fin4478 from comment #5)

> 
> Burn Debian netinstaller to a cdrw or usb memory:
> http://cdimage.debian.org/cdimage/stretch_di_alpha7/amd64/iso-cd/

Link above is old, should be:
https://www.debian.org/devel/debian-installer/
Comment 8 Michel Dänzer 2017-02-10 02:40:24 UTC
(In reply to fin4478 from comment #5)

fin4478@hotmail.com, this kind of comment is essentially noise and makes the lives of those of us working with bug reports and fixing problems harder. Please stop, thanks.
Comment 9 winches 2017-03-21 19:32:34 UTC
Hello, I have the same bug. On my Sapphire R9 380 ITX, the fan seems to run at 100% from the moment the kernel loads (no problem in the BIOS or at the GRUB stage).
The bug seems to have been present since the 4.9 series.
The Arch Linux USB key I use to downgrade my kernel has 4.9.11, and I have to remove my R9 380 in order to boot; otherwise my system hangs.

The bug also occurs in the latest Arch Linux kernel (4.10.4).

Maybe it's specific to this model. I have a Sapphire R7 370 in this computer with no fan problem, and last week I had a Sapphire 7870 GHz Edition, also with no problem.
Comment 10 Alex Deucher 2017-03-21 20:21:38 UTC
(In reply to winches from comment #9)
> Hello, I have the same bug. On my Sapphire R9 380 ITX, the fan seems to run
> at 100% from the moment the kernel loads (no problem in the BIOS or at the
> GRUB stage).
> The bug seems to have been present since the 4.9 series.

Can you bisect?
Comment 11 winches 2017-03-21 22:36:30 UTC
I can try. I've seen how to do it on the wiki.
I will test kernel 4.9.1 and 4.11-rc1 tomorrow.
The full bisect may take a long time, since each build takes nearly 40-50 minutes to compile.
Comment 12 winches 2017-03-24 18:43:43 UTC
I have started to bisect.
My problem is that between linux-git-4.8.r15051.g133d970e0dad and linux-git-4.9rc5.r369.g697ed8d03909 my system doesn't find my hard drive by its UUID and refuses to boot. It's a bug found in the RC releases.
Since the fan bug occurs after the drive is loaded, I don't know whether the fan bug is present or not...

I will continue to bisect, but do you know how to get past the "UUID not found" error? Maybe it is something in the kernel config.
Comment 13 Felix Schwarz 2017-03-24 22:46:43 UTC
> I will continue to bisect, but do you know how to get past the "UUID not found" error?

You could try to find the exact commit which fixed the hard drive detection (at worst by bisecting it). If the fix is sufficiently simple, you can apply it at every bisection step. This makes bisection a bit harder, but if the fix is easy this should be doable even for non-developers.
Comment 14 Michel Dänzer 2017-03-25 05:30:01 UTC
First of all, you can just run "git bisect skip" for any commits where you can't test for the problem you're bisecting, for any reason. With luck, this will allow git bisect to identify the regressing commit anyway. Even if not, it'll provide a range of candidate commits, which can be further narrowed down using Felix's technique.
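
The skip mechanism described above can be tried safely in a throwaway repository before using it on a kernel tree. This is a minimal sketch, assuming git is installed; it uses "git bisect run", which is not mentioned in this thread, where exit code 125 from the test script is git's documented way to request a skip. The repository layout, commit messages and function name are made up for the demo.

```shell
# Build a six-commit toy repository and bisect it automatically.
# Commits 1, 2 and 4 are good, 3 cannot be tested, 5 and 6 are bad.
bisect_skip_demo() {
    demo=$(mktemp -d) || return 1
    (
        cd "$demo" || exit 1
        git init -q .
        git config user.email demo@example.invalid
        git config user.name demo
        for entry in 1:good 2:good 3:untestable 4:good 5:bad 6:bad; do
            echo "$entry" > state
            git add state
            git commit -q -m "commit $entry"
        done
        # HEAD is known bad, the first commit (HEAD~5) is known good.
        git bisect start HEAD HEAD~5 >/dev/null
        # Exit 125 = skip this commit, 1 = bad, 0 = good.
        git bisect run sh -c '
            case "$(cat state)" in
                *untestable) exit 125 ;;
                *bad)        exit 1 ;;
                *)           exit 0 ;;
            esac' >/dev/null
        # The bisect log records the result as a comment line.
        git bisect log | grep "first bad commit"
    )
    rm -rf "$demo"
}
```

On a real kernel bisection the same effect comes from typing "git bisect skip" by hand whenever a build does not boot far enough to judge the fan.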
Comment 15 winches 2017-03-25 18:43:05 UTC
Thank you. "git bisect skip" could be useful next time. To avoid the problem I used "git bisect log" to save the state to a file and "git bisect replay" to restore it from that file.

So the result:
646cccb55b26b95b981ea9a63512260d0c21cac3 is the first bad commit
commit 646cccb55b26b95b981ea9a63512260d0c21cac3
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Wed Oct 26 16:41:39 2016 -0400

    drm/amdgpu: add support for new smc firmware on tonga
    
    Newer tonga parts require new smc firmware.
    
    Reviewed-by: Huang Rui <ray.huang@amd.com>
    Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org

:040000 040000 0e55b61a1495742ae3b60078e49fcdf53cb5030e 182f12f02fa1aa8d9cf12b28743369be587e33a9 M	drivers

That's the bad part if it's a firmware bug.
The Sapphire R9 380 ITX has only one fan and is shorter than any other R9 380, so its BIOS/firmware may be specific to it...
Comment 16 winches 2017-03-25 19:12:19 UTC
I've checked the code. If I understand it correctly, to solve the problem on my computer I have to rename tonga_smc.bin to tonga_k_smc.bin.
If it works, it's a firmware problem.
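
Before renaming anything, it may help to see which of the two files named above are actually installed. A minimal sketch, assuming the firmware lives uncompressed in /lib/firmware/amdgpu (the usual location, but an assumption here); the function name is hypothetical and the check changes nothing on disk.

```shell
# Report which Tonga SMC firmware files exist in a firmware directory.
# $1 is an optional directory (assumption: defaults to /lib/firmware/amdgpu).
list_tonga_smc() {
    fwdir=${1:-/lib/firmware/amdgpu}
    for f in tonga_smc.bin tonga_k_smc.bin; do
        if [ -e "$fwdir/$f" ]; then
            echo "$f present"
        else
            echo "$f missing"
        fi
    done
}
```

If a rename is tried as an experiment, keeping a backup copy of the original file makes it easy to undo.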