Bug 68301

Summary: [bisected] Headless OpenCL broken
Product: Drivers
Component: Video(DRI - non Intel)
Reporter: Niels Ole Salscheider (niels_ole)
Assignee: drivers_video-dri
Status: NEW
Severity: normal
Priority: P1
CC: alexdeucher, niels_ole
Hardware: All
OS: Linux
Kernel Version: 3.13
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output

Description Niels Ole Salscheider 2014-01-08 19:41:56 UTC
Created attachment 121331
dmesg output

Since 10ebc0bc09344ab6310309169efc73dfe6c23d72, headless OpenCL is broken on my university's Radeon S7000 unless I pass "radeon.runpm=0".

The first OpenCL program run after a reboot seems to work, but every subsequent one prints an error similar to the following:

radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 4352 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 4352 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000

In dmesg, I can see that some parts of the initialization routine are repeated:
[  348.906146] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  348.906151] [drm] PCIE gen 3 link speeds already enabled
[  348.909465] [drm] PCIE GART of 1024M enabled (table at 0x0000000000478000).
[...]

After that, the GPU keeps locking up.

I have attached the dmesg output.

Comment 2 Alex Deucher 2014-01-13 19:56:27 UTC
Do graphics work ok for you with runpm=1?  I.e., is it just compute that's causing a problem?

Comment 3 Niels Ole Salscheider 2014-01-13 21:54:34 UTC
> Make sure your kernel has this patch:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736

My kernel has this patch but it does not help.

> Do graphics work ok for you with runpm=1?  I.e., is it just compute that's
> causing a problem?

It is a bit difficult to test this because I only have SSH access to the machine at the moment.
Without any module parameter, compute fails with the error I already mentioned, and I get a GPU lockup entry in dmesg when I try to start X.

With runpm=1, my SSH session hangs a few seconds after I load the radeon module and I cannot open another one. I can still ping the computer, though.

Unfortunately, it will be a few weeks until I have physical access to the machine again.
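
When comparing the runpm=0, runpm=1 and no-parameter cases over SSH, it may help to confirm which value the loaded module actually picked up. The following is a minimal sketch, not part of the original report. It assumes the parameter is exposed at /sys/module/radeon/parameters/runpm, and the value meanings in the table are taken from my reading of the radeon module parameter description, so treat them as an assumption for your kernel version. The helper names are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the parameter once the radeon module is loaded.
RUNPM_PATH = Path("/sys/module/radeon/parameters/runpm")

# Assumed meanings, based on the module parameter description; verify
# against `modinfo radeon` on the kernel under test.
RUNPM_MEANING = {
    -1: "driver default",
    0: "disabled",
    1: "force enabled",
}


def describe_runpm(raw: str) -> str:
    """Translate the raw sysfs text (e.g. "0\n") into a readable state."""
    value = int(raw.strip())
    return RUNPM_MEANING.get(value, f"unknown value {value}")


def read_runpm(path: Path = RUNPM_PATH) -> Optional[str]:
    """Return the runpm state, or None if the parameter file is absent."""
    try:
        return describe_runpm(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    state = read_runpm()
    if state is None:
        print("radeon.runpm: not available (module not loaded?)")
    else:
        print("radeon.runpm:", state)
```

Running this before each OpenCL test, and noting the result next to the dmesg excerpt, makes it easier to tell which configuration a given failure belongs to.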