Bug 80901 - [radeon] loading corrupts lspci entry + unloading crashes kernel
Status: RESOLVED INVALID
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: IA-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-22 14:00 UTC by Andrea Paternò
Modified: 2014-07-22 22:22 UTC

See Also:
Kernel Version: 3.16.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
lspci w/ radeon (36.43 KB, text/plain)
2014-07-22 14:00 UTC, Andrea Paternò
System Journal log (99.73 KB, text/plain)
2014-07-22 14:02 UTC, Andrea Paternò
Kernel config (108.59 KB, text/plain)
2014-07-22 14:04 UTC, Andrea Paternò
dmesg (103.73 KB, text/plain)
2014-07-22 14:37 UTC, Andrea Paternò
System Journal log after module unloading (105.22 KB, text/plain)
2014-07-22 18:00 UTC, Andrea Paternò

Description Andrea Paternò 2014-07-22 14:00:38 UTC
Created attachment 143901
lspci w/ radeon

Hi all. I've been searching the internet for quite a while now but couldn't find a solution to my problem. I am on a Dell Inspiron 15R SE laptop with an i7 3632QM CPU (whose integrated Intel GPU uses the i915 driver) and a discrete AMD/ATI Radeon HD 7730M.

I know support for this kind of hybrid graphics setup has been improving recently, but I am still facing some problems with the open source driver.

The main problem is that I don't really know whether my card is working at all with PRIME. When I tried the tool radeontop, it couldn't find my card. I suspect this is because loading the "radeon" module messes up the device: I get a clean lspci -vvv when the module is unloaded, and an "unknown header type 7f" when it is loaded. Please see the attachments.

Also, the module takes a few seconds to load, and unloading it (modprobe -r radeon) crashes my machine after a few seconds. The system log shows a dereference problem, but "live" I can only see the stack trace the kernel outputs and nothing more.


Steps to Reproduce:
modprobe radeon
modprobe -r radeon

Actual Results:
The radeon card's lspci entry is corrupted and the card is not recognized by tools. Unloading the module crashes the kernel.

Expected Results:
The radeon card's lspci entry is intact, the card is recognized by tools, and unloading the module has no ill effect.
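
A minimal sketch of how the "Actual Results" symptom could be detected from a script. It assumes lspci flags the unreadable device with a line containing "Unknown header type" (the report quotes it as "unknown header type 7f"). The function name has_bad_header is hypothetical; it reads lspci output on stdin.

```shell
# Hypothetical helper: succeed if the lspci output on stdin reports an
# unreadable config header. Matching is case-insensitive because the exact
# capitalisation of the message is an assumption.
has_bad_header() {
    grep -qi 'unknown header type'
}

# Example use (modprobe needs root; only the load step is shown because
# unloading crashed the reporter's machine):
#   modprobe radeon
#   lspci -vvv | has_bad_header && echo "radeon entry is unreadable"
```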

Build Date & Hardware:
Laptop Dell Inspiron 15R SE
Intel i7 3632QM
AMD Radeon HD 7730M Cape Verde

Software:
Arch Linux
Custom Kernel 3.16.0-rc6
Xorg-server 1.16.0-2
xf86-video-intel 2.99.912-4
xf86-video-ati 1:7.4.0-3

Additional Builds and Platforms:
Problem encountered also with stock kernel config and with kernel version 3.13.
Comment 1 Andrea Paternò 2014-07-22 14:02:29 UTC
Created attachment 143911
System Journal log

System log on failure
Comment 2 Andrea Paternò 2014-07-22 14:04:34 UTC
Created attachment 143921
Kernel config
Comment 3 Alex Deucher 2014-07-22 14:29:55 UTC
Does rendering with PRIME work?  E.g., DRI_PRIME=1 glxinfo

I'm not sure if radeontop supports all cards or not.
Comment 4 Andrea Paternò 2014-07-22 14:36:45 UTC
It does.

gandalf@the_shire ~ » LIBGL_DEBUG=1 DRI_PRIME=1 glxinfo | grep "renderer string"
libGL: Can't open configuration file /home/gandalf/.drirc: No such file or directory.
libGL: Can't open configuration file /home/gandalf/.drirc: No such file or directory.
OpenGL renderer string: Gallium 0.4 on AMD CAPE VERDE

Also, I just noticed that every time I launch anything with PRIME, it takes a few seconds before returning the output. If you check the dmesg output I just attached, you'll see that it shows several driver-related outputs. Basically, it prints them out every time I launch the program with PRIME.
Comment 5 Andrea Paternò 2014-07-22 14:37:15 UTC
Created attachment 143931
dmesg
Comment 6 Alex Deucher 2014-07-22 16:13:02 UTC
(In reply to Andrea Paternò from comment #4)
> It does.
> 
> gandalf@the_shire ~ » LIBGL_DEBUG=1 DRI_PRIME=1 glxinfo | grep "renderer
> string"
> libGL: Can't open configuration file /home/gandalf/.drirc: No such file or
> directory.
> libGL: Can't open configuration file /home/gandalf/.drirc: No such file or
> directory.
> OpenGL renderer string: Gallium 0.4 on AMD CAPE VERDE
> 
> Also, I just noticed that every time I launch anything with PRIME, it takes a
> few seconds before returning the output. If you check the dmesg output I
> just attached, you'll see that it shows several driver-related outputs.
> Basically, it prints them out every time I launch the program with PRIME.

Everything appears to be working fine.  The delay and messages are due to the card being powered down by default to save power and then powered back up when you use it.
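
One way to observe the power-down behaviour described above is the kernel's runtime PM attribute in sysfs, power/runtime_status, which reports values such as "active" or "suspended". The sketch below is only an illustration: the function name pci_runtime_status and the SYSFS_PCI_ROOT override are hypothetical, and the PCI address 0000:01:00.0 is an assumption (the thread does not state the card's address).

```shell
# Hypothetical helper: print the runtime PM state of a PCI device, or
# "unknown" if the attribute cannot be read.
# SYSFS_PCI_ROOT is an illustrative override so the function can be tried
# against a fake directory tree; by default the real sysfs path is used.
pci_runtime_status() {
    dev="$1"
    root="${SYSFS_PCI_ROOT:-/sys/bus/pci/devices}"
    file="$root/$dev/power/runtime_status"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
    fi
}

# Example use (address is an assumption; list addresses with lspci -D):
#   pci_runtime_status 0000:01:00.0
```

If this prints "suspended" while nothing is rendering and "active" during a DRI_PRIME=1 run, that matches the explanation in this comment.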
Comment 7 Andrea Paternò 2014-07-22 18:00:06 UTC
Created attachment 143941
System Journal log after module unloading

Makes sense! Nonetheless, the "unknown header type 7f" problem remains, as well as the unloading problem. I managed to capture the journal log of the system just before the crash, which is attached.
Comment 8 Alex Deucher 2014-07-22 18:30:30 UTC
(In reply to Andrea Paternò from comment #7)
> Makes sense! Nonetheless, the "unknown header type 7f" problem remains, as
> well as the unloading problem. I managed to capture the journal log of the
> system just before the crash, which is attached.

The lspci -vvv problem is that lspci doesn't power up the GPU, so it just reads back garbage while the GPU is powered down.  It should work if you run lspci when the GPU is powered up (e.g., when rendering with DRI_PRIME=1) or if you disable runtime power management (boot with radeon.runpm=0 on the kernel command line in grub).  The crash is, I think, a bug in vgaswitcheroo rather than in the radeon driver.
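
As background not stated in the thread, the specific value 7f is consistent with this explanation: a PCI device that does not respond typically returns all ones on config-space reads, so the one-byte header type register reads 0xff. Bit 7 of that register is the multi-function flag, and masking it off leaves 0x7f. A small arithmetic sketch, with variable names chosen for illustration:

```shell
# Header type register as read back from a device that does not respond.
raw=0xff
# Drop bit 7 (the multi-function flag) to get the header layout type.
header_type=$(printf '%02x' $(( $raw & 0x7f )))
echo "$header_type"
```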
Comment 9 Andrea Paternò 2014-07-22 18:41:52 UTC
That is.. well. I feel quite dumb, because it totally makes sense. It totally works when the card is powered up: both lspci and radeontop work like a charm!

Now I only have to figure out what may cause the vgaswitcheroo problem
Comment 10 Andrea Paternò 2014-07-22 20:57:30 UTC
It turns out that the vgaswitcheroo crash can be avoided by building the radeon driver directly into the kernel. Of course, this prevents the user from unloading it, but with the power management features on, I see no reason why I should want to unload it.

I had to build the radeon firmware into the kernel as well, but as of now I am not facing any issues, even after suspend/resume.
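
A minimal sketch for checking which variant a given kernel config uses, assuming the driver is controlled by the CONFIG_DRM_RADEON option (firmware is usually bundled through CONFIG_EXTRA_FIRMWARE; the firmware file list is not given in this thread and is left out here). The function name radeon_build_mode is hypothetical; it reads a kernel config on stdin.

```shell
# Hypothetical helper: read a kernel config on stdin and report how the
# radeon driver is built: "y" (built in), "m" (module), or "unset".
radeon_build_mode() {
    mode=$(sed -n 's/^CONFIG_DRM_RADEON=\([ym]\)$/\1/p' | head -n 1)
    echo "${mode:-unset}"
}

# Example use, assuming the running kernel exposes its config
# (this requires CONFIG_IKCONFIG_PROC):
#   zcat /proc/config.gz | radeon_build_mode
```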
Comment 11 Alex Deucher 2014-07-22 22:22:59 UTC
You can leave the radeon driver as a module and just not unload it.  There's generally no reason to.
