Bug 47471 - Radeon - NMI: PCI system error (SERR) for reason a1 on CPU 0.
Status: RESOLVED DOCUMENTED
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-13 15:00 UTC by 4Strings
Modified: 2017-03-09 03:37 UTC
CC List: 3 users

See Also:
Kernel Version: 3.2.28 - 3.4.9
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg - kernel 3.2.28 (109.99 KB, application/octet-stream)
2012-09-13 15:00 UTC, 4Strings
syslog - kernel 3.2.28 (7.29 KB, application/octet-stream)
2012-09-13 15:02 UTC, 4Strings
messages - kernel 3.2.28 (59.02 KB, application/octet-stream)
2012-09-13 15:04 UTC, 4Strings
dmesg - kernel 3.4.9 (52.75 KB, application/octet-stream)
2012-09-13 15:05 UTC, 4Strings

Description 4Strings 2012-09-13 15:00:48 UTC
Created attachment 80091 [details]
dmesg - kernel 3.2.28

Distribution: slackware -current  
kernel version: 3.2.28-smp
Hardware Environment: Dell Inspiron 6400 - video card: Ati X1400

Hi! 
On my Slackware -current system, whenever I try to change my ATI X1400 power profile (by echoing a value into the "/sys/class/drm/card0/device/power_profile" file), I always get the following NMI alert:
NMI: PCI system error (SERR) for reason a1 on CPU 0.
Dazed and confused, but trying to continue

(The reason is sometimes reported as "b1" instead of "a1".)
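
For reference, the two-digit reason in the message is a hexadecimal byte. On PC-compatible hardware the kernel reads it from the legacy NMI status and control port (0x61), where bit 7 flags a PCI SERR and bit 4 is a refresh toggle that flips continuously, which would explain seeing both a1 and b1. The sketch below decodes such a byte; decode_nmi_reason is a hypothetical helper name, and the bit labels assume the conventional port 0x61 layout rather than anything confirmed in this report.

```shell
#!/bin/sh
# Sketch: decode the hex reason byte from
# "NMI: PCI system error (SERR) for reason XX".
# decode_nmi_reason is a hypothetical helper; the bit labels assume the
# conventional PC NMI status/control port (0x61) layout.
decode_nmi_reason() {
    reason=$((0x$1))    # e.g. "a1" -> 161
    if [ $((reason & 0x80)) -ne 0 ]; then echo "bit 7: PCI SERR status set"; fi
    if [ $((reason & 0x40)) -ne 0 ]; then echo "bit 6: IOCHK status set"; fi
    if [ $((reason & 0x20)) -ne 0 ]; then echo "bit 5: timer 2 output high"; fi
    if [ $((reason & 0x10)) -ne 0 ]; then echo "bit 4: refresh toggle high"; fi
    if [ $((reason & 0x0f)) -ne 0 ]; then
        printf 'bits 3-0: control bits 0x%x\n' $((reason & 0x0f))
    fi
}

decode_nmi_reason a1
decode_nmi_reason b1
```

Under these assumptions, a1 and b1 differ only in bit 4, so both values would point at the same SERR condition.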

The NMI error message does not appear during system boot.
At startup the system sets the video card power profile to its "default" state, so the startup scenario is the following:
---
root@darkstar:~# cat /sys/class/drm/card0/device/power_method 
profile
root@darkstar:~# cat /sys/class/drm/card0/device/power_profile 
default

root@darkstar:~# cat /sys/kernel/debug/dri/0/radeon_pm_info 
default engine clock: 432000 kHz
current engine clock: 432000 kHz
default memory clock: 396000 kHz
current memory clock: 396000 kHz
PCIE lanes: 0
---

Every time I change the video card clock frequency by switching to another power profile, I get the NMI alert:
---
root@darkstar:~# echo low > /sys/class/drm/card0/device/power_profile
NMI: PCI system error (SERR) for reason a1 on CPU 0.
Dazed and confused, but trying to continue
...
root@darkstar:~# cat /sys/kernel/debug/dri/0/radeon_pm_info 
default engine clock: 432000 kHz
current engine clock: 324000 kHz
default memory clock: 396000 kHz
current memory clock: 135000 kHz
PCIE lanes: 1
---
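
In the two outputs above, the "PCIE lanes" value goes from 0 to 1 together with the clocks. A minimal sketch for capturing that comparison in one run is shown below. pm_field is a hypothetical helper name and the --switch flag is an assumption of this sketch; the sysfs and debugfs paths are the ones used in this report. It needs root and a mounted debugfs, and it only writes to the profile file when --switch is passed.

```shell
#!/bin/sh
# Sketch: compare radeon_pm_info fields before and after a profile switch.
# pm_field is a hypothetical helper: it reads radeon_pm_info-style text on
# stdin and prints the value of the field whose label matches $1.
pm_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

PM_INFO=/sys/kernel/debug/dri/0/radeon_pm_info
PROFILE=/sys/class/drm/card0/device/power_profile

# Only touch the hardware when explicitly asked and the files are usable.
if [ "${1:-}" = "--switch" ] && [ -r "$PM_INFO" ] && [ -w "$PROFILE" ]; then
    before=$(pm_field "PCIE lanes" < "$PM_INFO")
    echo low > "$PROFILE"
    after=$(pm_field "PCIE lanes" < "$PM_INFO")
    echo "PCIE lanes: $before -> $after"
    # Show the most recent NMI line, if any, from the kernel log.
    dmesg | grep "NMI: PCI system error" | tail -n 1
fi
```

Keeping the parsing in a function that reads stdin means it can also be run against a saved copy of the radeon_pm_info output, without touching the card.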

The NMI alert still occurs with KMS disabled (booting with the "nomodeset" kernel option).
With KMS disabled, I tried to reduce power consumption by adding the following three lines to the "Device" section of a new "xorg.conf":
  Option "DynamicPM"         "on"
  Option "ClockGating"       "on"
  Option "ForceLowPowerMode" "on"
But as soon as I run startx, I receive the NMI alert!

I noticed this issue on other kernel versions up to 3.2.28 (specifically: 3.2.23, 3.2.26, 3.2.27, 3.2.28).

I tried upgrading to the Slackware vanilla kernel version 3.4.9-smp (now in testing). I noticed slightly different behaviour compared to the 3.2.x kernels:
Changing the power profile no longer produces the NMI error message, but the NMI error message always occurs during system bootup!
(I've attached the dmesg from kernel 3.4.9, too.)
The situation is the same running Gentoo live 12.1 (kernel 3.3.0) and Knoppix live 7.0.4 (kernel 3.4.9).

Despite the NMI alerts the system works well, but I'm worried that the NMI signal is warning about (or could lead to) a video card failure.
Comment 1 4Strings 2012-09-13 15:02:38 UTC
Created attachment 80101 [details]
syslog - kernel 3.2.28
Comment 2 4Strings 2012-09-13 15:04:49 UTC
Created attachment 80111 [details]
messages - kernel 3.2.28
Comment 3 4Strings 2012-09-13 15:05:17 UTC
Created attachment 80121 [details]
dmesg - kernel 3.4.9
Comment 4 Alex Deucher 2012-09-13 15:06:26 UTC
This is a duplicate of bug 43078.  The NMI comes from changing the number of PCIE lanes.  I was never able to track down why.  As far as I know, it's harmless.
Comment 5 4Strings 2012-09-13 15:33:49 UTC
Thanks very much for your answer! From now on I will read the NMI alert with no fear... Your word "harmless" is quite reassuring! :)
