68301 – [bisected] Headless OpenCL broken

Summary: [bisected] Headless OpenCL broken

Status:	NEW

Alias:	None

Product:	Drivers
Classification:	Unclassified
Component:	Video(DRI - non Intel)
Hardware:	All Linux

Importance:	P1 normal
Assignee:	drivers_video-dri

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-01-08 19:41 UTC by Niels Ole Salscheider
Modified:	2014-01-13 21:54 UTC
CC List:	2 users

See Also:
Kernel Version:	3.13
Subsystem:
Regression:	No
Bisected commit-id:

Attachments
dmesg output (86.16 KB, application/octet-stream) 2014-01-08 19:41 UTC, Niels Ole Salscheider

Description Niels Ole Salscheider 2014-01-08 19:41:56 UTC

Created attachment 121331
dmesg output

Since 10ebc0bc09344ab6310309169efc73dfe6c23d72, headless OpenCL is broken on my university's Radeon S7000 unless I pass "radeon.runpm=0".
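To confirm which runtime power management setting the loaded radeon module is actually using, the parameter can be read back from sysfs. The following is a minimal sketch, not part of the original report. It assumes the parameter is exposed at /sys/module/radeon/parameters/runpm, which may not hold on every kernel build; the helper name read_runpm is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the radeon module's runpm parameter.
RUNPM_PATH = Path("/sys/module/radeon/parameters/runpm")


def read_runpm(path: Path = RUNPM_PATH) -> Optional[int]:
    """Return the runpm value as an int, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Module not loaded, parameter not exposed, or unexpected content.
        return None


if __name__ == "__main__":
    value = read_runpm()
    if value is None:
        print("radeon module not loaded or runpm parameter not exposed")
    else:
        print(f"radeon.runpm = {value}")
```

A value of 0 would indicate that runtime power management is disabled, which is the workaround described above.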

The first OpenCL program run after a reboot seems to work, but every subsequent one outputs an error similar to the following:

radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 4352 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 4352 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000000800000

In dmesg, I can see that some parts of the initialization routine are repeated:
[  348.906146] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
[  348.906151] [drm] PCIE gen 3 link speeds already enabled
[  348.909465] [drm] PCIE GART of 1024M enabled (table at 0x0000000000478000).
[...]

After that, the GPU keeps locking up.

I have attached the dmesg output.
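One way to check how often the initialization is repeated is to count the "PCIE GART of ... enabled" lines in the log. The following is a minimal sketch, not part of the original report. It assumes dmesg lines carry the bracketed timestamp format shown above; the function name gart_enable_times is hypothetical.

```python
import re
import sys
from typing import List

# Matches lines such as:
# [  348.909465] [drm] PCIE GART of 1024M enabled (table at 0x...).
GART_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[drm\] PCIE GART of \d+M enabled"
)


def gart_enable_times(dmesg_text: str) -> List[float]:
    """Return the timestamps (in seconds) at which the GART was enabled."""
    return [float(m.group(1)) for m in GART_RE.finditer(dmesg_text)]


if __name__ == "__main__":
    # Usage: dmesg | python3 this_script.py
    times = gart_enable_times(sys.stdin.read())
    print(f"GART enabled {len(times)} time(s)")
    for t in times:
        print(f"  at {t:.6f} s")
```

More than one match after boot would suggest that the GPU was re-initialized, for example on a runtime resume or after a reset.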

Comment 1 Alex Deucher 2014-01-13 19:52:57 UTC

Make sure your kernel has this patch:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736

Comment 2 Alex Deucher 2014-01-13 19:56:27 UTC

Does graphics work OK for you with runpm=1? I.e., is it just compute that's causing a problem?

Comment 3 Niels Ole Salscheider 2014-01-13 21:54:34 UTC

> Make sure your kernel has this patch:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f244d8b623dae7a7bc695b0336f67729b95a9736

My kernel has this patch, but it does not help.

> Do graphics work ok for you with runpm=1?  I.e., is it just compute that's
> causing a problem?

It is a bit difficult to test this because I only have SSH access to the machine at the moment.
Without any parameter, I get the error mentioned above for compute, and a GPU lockup entry appears in dmesg when I try to start X.

With runpm=1, my SSH session hangs a few seconds after I load the radeon module and I cannot open another one. I can still ping the computer, though.

Unfortunately, it will be a few weeks until I have physical access to the machine again.
