Bug 42162 - [bisected] continuous gpu resets on radeon
Status: CLOSED CODE_FIX
Product: Drivers
Classification: Unclassified
Component: PCI
Hardware: All Linux
Importance: P1 normal
Assigned To: drivers_pci@kernel-bugs.osdl.org
Duplicates: 42172
Depends on:
Blocks:
Reported: 2011-09-01 15:05 UTC by Niels Ole Salscheider
Modified: 2012-06-13 15:12 UTC (History)
CC List: 6 users

See Also:
Kernel Version: b03e7495a862b028294f59fc87286d6d78ee7fa1
Tree: Mainline
Regression: Yes


Attachments
dmesg output (112.38 KB, application/octet-stream)
2011-09-01 15:06 UTC, Niels Ole Salscheider
output of lspci -vvvxxx (54.80 KB, application/octet-stream)
2011-09-01 15:11 UTC, Niels Ole Salscheider
fix (3.14 KB, patch)
2011-09-01 17:12 UTC, Alex Deucher
v2 (2.96 KB, patch)
2011-09-01 18:40 UTC, Alex Deucher
Remove MRRS modification from PCI code (1.80 KB, patch)
2011-09-01 21:52 UTC, Jon Mason
Remove MRRS modification from PCI code, version 2 (3.25 KB, patch)
2011-09-07 22:01 UTC, Jon Mason

Description Niels Ole Salscheider 2011-09-01 15:05:05 UTC
Since commit b03e7495a862b028294f59fc87286d6d78ee7fa1 I have been experiencing continuous GPU resets on my Radeon HD 6870 (see attached dmesg output).
They happen during desktop usage but more often (nearly immediately) when playing a game.

Booting with pci=pcie_bus_safe does not help.
Comment 1 Niels Ole Salscheider 2011-09-01 15:06:26 UTC
Created attachment 71112 [details]
dmesg output
Comment 2 Niels Ole Salscheider 2011-09-01 15:11:33 UTC
Created attachment 71122 [details]
output of lspci -vvvxxx
Comment 3 Alex Deucher 2011-09-01 17:12:04 UTC
Created attachment 71142 [details]
fix

Possible fix.
Comment 4 Alex Deucher 2011-09-01 18:40:15 UTC
Created attachment 71182 [details]
v2

Slightly better version.
Comment 5 Jon Mason 2011-09-01 20:30:30 UTC
It might be better to simply rip out the MRRS tweaking code from pcie_bus_configure_set.
Comment 6 Jon Mason 2011-09-01 21:52:50 UTC
Created attachment 71222 [details]
Remove MRRS modification from PCI code

Do not modify the value of MRRS when determining the MPS value.
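
For readers following along with the lspci dump: MPS (Max_Payload_Size) and MRRS (Max_Read_Request_Size) are both 3-bit fields in the PCIe Device Control register, at bits 7:5 and 14:12 respectively, and each encodes a size of 128 << n bytes, with encodings 6 and 7 reserved. The sketch below only illustrates that decoding. It is not taken from any of the attached patches, and the helper names decode_size and decode_devctl are made up for this example.

```python
# Field masks for the PCIe Device Control register (PCIe capability offset 0x08).
DEVCTL_PAYLOAD_MASK = 0x00E0  # Max_Payload_Size, bits 7:5
DEVCTL_READRQ_MASK = 0x7000   # Max_Read_Request_Size, bits 14:12


def decode_size(field):
    """Map a 3-bit size encoding to bytes (hypothetical helper).

    Encodings 0..5 mean 128, 256, ..., 4096 bytes; 6 and 7 are reserved,
    so None is returned for them.
    """
    if not 0 <= field <= 7:
        raise ValueError("field must be a 3-bit value")
    if field > 5:
        return None  # reserved encoding
    return 128 << field


def decode_devctl(devctl):
    """Return (mps_bytes, mrrs_bytes) for a 16-bit Device Control value."""
    mps = decode_size((devctl & DEVCTL_PAYLOAD_MASK) >> 5)
    mrrs = decode_size((devctl & DEVCTL_READRQ_MASK) >> 12)
    return mps, mrrs
```

Because the two fields are independent, code that picks an MPS value has no need to write the MRRS bits at all, which is the separation this patch aims for.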
Comment 7 Jon Mason 2011-09-01 21:53:32 UTC
Please try the patch from comment #6 and verify that it resolves your issues.
Comment 8 Niels Ole Salscheider 2011-09-02 06:57:33 UTC
Yes, this patch solves my issue, too.
Comment 9 Jon Mason 2011-09-02 13:29:03 UTC
Thanks, I'll push the patch shortly (and give you some "Tested-by" credit).
Comment 10 nissarin 2011-09-03 01:18:38 UTC
*** Bug 42172 has been marked as a duplicate of this bug. ***
Comment 11 Nicolas Mailhot 2011-09-04 13:52:06 UTC
(In reply to comment #7)
> Please try the patch from comment #6 and verify that it resolves your issues.

This fixes the radeon problems here too:
https://bugzilla.redhat.com/show_bug.cgi?id=734201
http://koji.fedoraproject.org/koji/taskinfo?taskID=3323239
Comment 12 Jon Mason 2011-09-07 22:01:07 UTC
Created attachment 71932 [details]
Remove MRRS modification from PCI code, version 2

Updated version of the patch which uses the MPS "safe" method by default (as the "performance" method was causing issues on some systems).
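
As a rough illustration of the "safe" method: the kernel parameter documentation describes pcie_bus_safe as setting every device's MPS to the largest value supported by all devices below the root complex, which amounts to taking the minimum of the supported sizes. The sketch below assumes the supported sizes have already been read out per device. The function name safe_mps and the device addresses in the usage note are hypothetical, and this is not the kernel implementation.

```python
def safe_mps(supported_by_device):
    """Pick one MPS (in bytes) that every listed device can handle.

    supported_by_device maps a device address (any label) to the largest
    payload size, in bytes, that the device advertises as supported.
    The common "safe" value is the smallest of those maxima.
    """
    if not supported_by_device:
        raise ValueError("need at least one device")
    return min(supported_by_device.values())
```

For example, calling safe_mps({"00:02.0": 256, "01:00.0": 128}) with these made-up addresses selects 128 bytes for both devices, since the second device cannot accept larger payloads.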
Comment 13 Jon Mason 2011-09-07 22:39:35 UTC
If it's not too much trouble, I'd appreciate the updated patch being tested.
Comment 14 Florian Mickler 2011-09-08 10:15:23 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc5:

commit d054ac16eeb658bccadb06b12c39cee22243b10f
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu Sep 1 17:46:15 2011 +0000

    drm/radeon/kms: make sure pci max read request size is valid on evergreen+ (v2)
Comment 15 Florian Mickler 2012-01-12 21:25:17 UTC
A patch referencing this bug report has been merged in Linux v3.1-rc10:

commit ed2888e906b56769b4ffabb9c577190438aa68b8
Author: Jon Mason <mason@myri.com>
Date:   Thu Sep 8 16:41:18 2011 -0500

    PCI: Remove MRRS modification from MPS setting code
