Bug 16295 - radeon conflict/interference with rt2800pci
Summary: radeon conflict/interference with rt2800pci
Status: RESOLVED OBSOLETE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-25 19:37 UTC by Rafał Miłecki
Modified: 2012-08-09 14:12 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dmesg with modprobe/rmmod operations and lockups (21.51 KB, text/plain)
2010-06-25 19:42 UTC, Rafał Miłecki
lspci -nnvv (29.18 KB, text/plain)
2010-06-25 19:46 UTC, Rafał Miłecki
dmesg with rt2x00pci_regbusy_read: Error - Indirect register access failed (1.69 KB, text/plain)
2010-07-02 21:47 UTC, Rafał Miłecki

Description Rafał Miłecki 2010-06-25 19:37:54 UTC
This may be a duplicate of FDO bug #28106:
https://bugs.freedesktop.org/show_bug.cgi?id=28106

I use KMS in all my tests.

Normally I can load and unload rt2800pci dozens of times without a problem. However, if I unload it after "ifconfig wlan0 up", my GPU becomes unstable.

When I tried to modprobe rt2800pci after the 4th GPU soft reset, I got a lockup and the machine rebooted itself. After rebooting, I saw a BIOS warning that the last boot ended with a "sync flood" (I also got this warning in another radeon bug, when it tried to divide by 0).
Comment 1 Rafał Miłecki 2010-06-25 19:42:08 UTC
Created attachment 26948
dmesg with modprobe/rmmod operations and lockups
Comment 2 Rafał Miłecki 2010-06-25 19:46:03 UTC
Created attachment 26949
lspci -nnvv

This is an MSI U230 netbook.

01:05.0 VGA compatible controller: ATI Technologies Inc RS780M/RS780MN [Radeon HD 3200 Graphics]
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 Network controller: RaLink RT3090 Wireless 802.11n 1T/1R PCIe
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Comment 3 Rafał Miłecki 2010-06-28 14:14:44 UTC
I'm getting confused. Now I get GPU lockups even without loading any WiFi driver :| It happens frequently with KDE4's effects enabled, not so often with effects disabled (but still).

It was really annoying to get a lockup as often as every few seconds, so I tried removing the "radeon_gpu_reset" call from the lockup handling. Now I get warnings in dmesg, but the GPU continues to work (even without restarting).

However, it's far from a perfect solution.

Any idea how I can fix/debug these lockups?
Comment 4 Rafał Miłecki 2010-07-02 20:51:17 UTC
(In reply to comment #3)
> I'm getting confused. Now I get GPU lockups even without loading any WiFi
> driver :| It happens frequently with KDE4's effects enabled, not so often
> with effects disabled (but still).
> 
> It was really annoying to get a lockup as often as every few seconds, so I
> tried removing the "radeon_gpu_reset" call from the lockup handling. Now I
> get warnings in dmesg, but the GPU continues to work (even without
> restarting).

I got those problems while using my debugging patch created for bug https://bugs.freedesktop.org/show_bug.cgi?id=28745

After removing it and recompiling, I am back to the previous problem: no more GPU lockups during normal work. Unfortunately, I was silly enough not to save my debugging patch. It could be some timing issue that my patch exposed, and I cannot reproduce it anymore :|

Anyway, the stability problem after removing rt2800pci is still present. It doesn't happen with rt2860sta, however. Unfortunately, the differences between those two drivers are too big to make this debuggable by comparison :|
Comment 5 Rafał Miłecki 2010-07-02 21:47:56 UTC
Created attachment 27002
dmesg with rt2x00pci_regbusy_read: Error - Indirect register access failed

Now, for a change, I get the "Indirect register access failed" error immediately after loading rt2800pci a second time. The GPU doesn't even have time to lock up...
Comment 6 Rafał Miłecki 2010-07-02 22:37:32 UTC
Plus, sometimes I get an additional error when loading the module:

[  235.095924] cfg80211: Calling CRDA to update world regulatory domain
[  235.292128] cfg80211: World regulatory domain updated:
[  235.292142]     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
[  235.292159]     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  235.292164]     (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[  235.292173]     (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
[  235.292182]     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  235.292192]     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
[  235.585215] rt2800pci 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[  235.585253] rt2800pci 0000:02:00.0: setting latency timer to 64
[  235.651541] phy0: Selected rate control algorithm 'minstrel'
[  235.654001] Registered led device: rt2800pci-phy0::radio
[  235.655446] Registered led device: rt2800pci-phy0::assoc
[  235.655545] Registered led device: rt2800pci-phy0::quality
[  235.934490] phy0 -> rt2800pci_mcu_status: Error - MCU request failed, no response from hardware
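
Since the log above shows rt2800pci being routed to IRQ 16 through a legacy INTx line, one quick check is whether any other device ends up on the same IRQ. The following is only a minimal sketch for scanning saved dmesg text; the function names `irq_routes` and `shared_irqs` are made up for illustration, and the pattern only covers the "PCI INT x -> GSI n ... -> IRQ n" message format seen in this log.

```python
import re

# Matches legacy (INTx) routing lines in the format seen in the log above, e.g.
# "rt2800pci 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16".
# Other interrupt message formats are NOT handled by this sketch.
IRQ_LINE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCI INT (?P<pin>[A-D]) -> GSI (?P<gsi>\d+) .*-> IRQ (?P<irq>\d+)"
)


def irq_routes(dmesg_text):
    """Return {irq: [(driver, pci_address), ...]} for each routing line found."""
    routes = {}
    for line in dmesg_text.splitlines():
        m = IRQ_LINE.search(line)
        if m is None:
            continue
        devices = routes.setdefault(int(m.group("irq")), [])
        entry = (m.group("driver"), m.group("bdf"))
        # Repeated modprobe/rmmod cycles print the same line again; keep one copy.
        if entry not in devices:
            devices.append(entry)
    return routes


def shared_irqs(routes):
    """Keep only the IRQs that more than one device was routed to."""
    return {irq: devs for irq, devs in routes.items() if len(devs) > 1}
```

Feeding the full text of a saved dmesg to `irq_routes` and then passing the result to `shared_irqs` lists any IRQ that two or more devices were routed to. An empty result only means no sharing was visible in that log, not that interrupts are healthy.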
Comment 7 Alex Deucher 2010-07-03 15:24:23 UTC
(In reply to comment #3)
> I'm getting confused. Now I get GPU lockups even without loading any WiFi
> driver :| It happens frequently with KDE4's effects enabled, not so often
> with effects disabled (but still).
> 
> It was really annoying to get a lockup as often as every few seconds, so I
> tried removing the "radeon_gpu_reset" call from the lockup handling. Now I
> get warnings in dmesg, but the GPU continues to work (even without
> restarting).
> 
> However, it's far from a perfect solution.
> 
> Any idea how I can fix/debug these lockups?

I'd guess the GPU hasn't hung, but some fences haven't come up in a timely enough manner, so the driver attempts to reset the GPU thinking that it has hung. I'd guess an interrupt problem of some sort (either high latency in dealing with interrupts or missed interrupts). This would explain the GPU reset, as the fence code is interrupt driven. It might be worth checking the CP scratch regs in the hang check code to make sure it hasn't missed a fence, in which case a GPU reset isn't necessary. Does disabling MSIs help? Try booting with pci=nomsi.
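
The scratch-register check suggested above can be modelled roughly as follows. This is not radeon driver code, just a small Python sketch of the decision logic. It assumes the GPU writes the sequence number of the last completed fence to a scratch register that the driver can read back, and all names (`FenceState`, `on_fence_timeout`, the returned strings) are hypothetical. Sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass
class FenceState:
    last_emitted: int    # newest fence sequence number submitted to the GPU
    last_processed: int  # newest sequence number the driver has seen signaled


def on_fence_timeout(state, scratch_seq):
    """Decide what to do when waiting for a fence has timed out.

    scratch_seq is the value read back from the (assumed) scratch register
    holding the sequence number of the last fence the GPU completed.
    """
    if scratch_seq > state.last_processed:
        # The GPU made progress that the interrupt path never reported:
        # a missed or late interrupt, not a hang. Catch up instead of resetting.
        state.last_processed = scratch_seq
        return "process_fences"
    # No progress since the last fence we processed: treat it as a real lockup.
    return "reset_gpu"
```

With this shape, a timeout where the scratch value has moved ahead of `last_processed` is handled by processing the missed fences, and only a timeout with no progress at all leads to a reset.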
