Bug 89661 - Kernel panic when trying to use amdkfd driver on Kaveri
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Video(DRI - non Intel)
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: drivers_video-dri
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-13 09:32 UTC by Bernd Steinhauser
Modified: 2016-04-13 17:08 UTC (History)

See Also:
Kernel Version: 3.18.0 + drm-next branch
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Picture of the kernel panic output (143.07 KB, image/jpeg)
2014-12-13 09:32 UTC, Bernd Steinhauser
Print errors in case of NULL pointers and don't dereference them (717 bytes, patch)
2014-12-15 21:37 UTC, Oded Gabbay
More checks on pointers being used (1.24 KB, patch)
2014-12-16 11:26 UTC, Oded Gabbay
workaround for the module order problem (5.73 KB, patch)
2014-12-17 11:28 UTC, Oded Gabbay
hacky workaround for module order problem (5.76 KB, patch)
2014-12-17 11:32 UTC, Oded Gabbay

Description Bernd Steinhauser 2014-12-13 09:32:19 UTC
The kernel I tried to use was 3.18.0 and I merged the drm-next branch from
git://people.freedesktop.org/~airlied/linux

which includes the HSA driver amdkfd. CONFIG_HSA_AMD=y is set.
When trying to boot the kernel, I get a kernel panic, as shown in the uploaded picture.

CPU is a A10-7800 Kaveri, Motherboard is ASRock FM2A88X Extreme6+.
Comment 1 Bernd Steinhauser 2014-12-13 09:32:54 UTC
Created attachment 160441
Picture of the kernel panic output
Comment 2 Michel Dänzer 2014-12-15 02:07:40 UTC
Does it also happen with CONFIG_HSA_AMD=m?
Comment 3 Oded Gabbay 2014-12-15 21:37:46 UTC
Created attachment 160721
Print errors in case of NULL pointers and don't dereference them
Comment 4 Oded Gabbay 2014-12-15 21:38:25 UTC
Hi,
Please try the attached patch.
Comment 5 Bernd Steinhauser 2014-12-15 21:45:32 UTC
(In reply to Michel Dänzer from comment #2)
> Does it also happen with CONFIG_HSA_AMD=m?

Only tried CONFIG_HSA_AMD=n, not module, but this happens so early that I'm confident it does not matter.

Will try the patch, thanks.
Comment 6 Bernd Steinhauser 2014-12-16 11:14:04 UTC
Tried the patch, exactly the same result.
Comment 7 Oded Gabbay 2014-12-16 11:26:49 UTC
Created attachment 160751
More checks on pointers being used
Comment 8 Oded Gabbay 2014-12-16 11:29:43 UTC
Hi,
Three things, please:

1. Please try the attached patch. It tries to verify more pointers before using them.

2. You said CONFIG_HSA_AMD=y. What's the value of CONFIG_DRM_RADEON? If it's "m", could you change it to "y"?

3. I would still like to ask if you could check with the following config:
CONFIG_DRM_RADEON=m
CONFIG_HSA_AMD=m

Thanks

Oded
Comment 9 Oded Gabbay 2014-12-16 11:38:25 UTC
One more thing,
I'm trying to understand the exact tree you are using so we will look at the same code.
Did you just take drm-next, or did you manually merge between trees?
If you did a manual merge, could you instead try just taking drm-next? It's already based on 3.18.0.
Comment 10 Oded Gabbay 2014-12-16 15:11:12 UTC
Hi,
So I managed to recreate the bug on my setup.
This is happening because you compiled all the modules into the kernel. I need to address that, but for now, if you compile them as modules ("m"), everything should work.
Comment 11 Bernd Steinhauser 2014-12-16 21:16:59 UTC
Hm, ok. So should I still try the steps above? Using drm_radeon as a module would require me to do some testing with that setup beforehand.

(In reply to Oded Gabbay from comment #8)

> 2. You said CONFIG_HSA_AMD=y. What's the value of CONFIG_DRM_RADEON? If it's
> "m", could you change it to "y"?
I'm using a static initrd (only a basic system that doesn't contain any kernel modules), so all drivers necessary to start the system (including drm_radeon) are compiled in.

Regarding the tree:
I took plain 3.18 (b2776b) and then merged the drm-next branch from the repo mentioned above.
IIRC, it was a fast-forward.
Comment 12 Oded Gabbay 2014-12-17 07:40:17 UTC
As I said, there is definitely a bug when compiling both radeon and amdkfd inside the kernel.
I'm working on fixing it, but that could take a few days.
In the meantime, the only way to make it work without touching the code is to compile either both drivers as modules or just radeon as a module.

No need for further experiments.
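
For readers following along, here is a rough user-space sketch of how an initialization-order problem between two built-in components can end in a NULL pointer dereference, and how a guard turns it into a reported error instead. This is not the amdkfd or radeon code and not the content of any attached patch; every name in it (provider_ops, provider_init, consumer_init) is hypothetical and only illustrates the general pattern the thread describes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical interface that a "provider" component publishes
 * once its own initialization has run. */
struct provider_ops {
    int (*probe)(int device_id);
};

/* Stays NULL until provider_init() has been called. */
static const struct provider_ops *registered_ops;

static int provider_probe(int device_id)
{
    return device_id * 2;
}

static const struct provider_ops ops = { .probe = provider_probe };

/* Provider initialization: publishes the interface. */
static void provider_init(void)
{
    registered_ops = &ops;
}

/* Consumer initialization: relies on the provider's interface.
 * If the consumer happens to run first, registered_ops is still NULL,
 * so check it and report an error rather than dereferencing it. */
static int consumer_init(int device_id)
{
    if (registered_ops == NULL || registered_ops->probe == NULL) {
        fprintf(stderr, "consumer: provider not initialized yet\n");
        return -1;
    }
    return registered_ops->probe(device_id);
}
```

In this sketch, calling consumer_init(21) before provider_init() returns -1 and prints an error; calling it afterwards returns 42. With loadable modules the load order is normally resolved through symbol dependencies, which may be why building the drivers as modules avoids the problem here, but that is an assumption and not something confirmed in this report.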
Comment 13 Oded Gabbay 2014-12-17 11:28:16 UTC
Created attachment 160951
workaround for the module order problem
Comment 14 Oded Gabbay 2014-12-17 11:29:19 UTC
I attached a new patch which should solve the problem for you when compiling all the drivers into the kernel image.
This is a hacky workaround, not the final solution, but I hope it will help you continue with your setup.
Comment 15 Oded Gabbay 2014-12-17 11:32:03 UTC
Created attachment 160961
hacky workaround for module order problem
Comment 16 Bernd Steinhauser 2014-12-17 21:45:54 UTC
Thanks, I'll give it a try.
Comment 17 Bernd Steinhauser 2014-12-18 18:41:45 UTC
OK, it now boots and seems to work.
Comment 18 Bernd Steinhauser 2016-04-13 17:08:43 UTC
At some point (I didn't take a closer look), this was fixed, and it now works as expected without workarounds.
(Tested: 4.5.1)
