Bug 89661

Summary: Kernel panic when trying to use amdkfd driver on Kaveri
Product: Drivers
Reporter: Bernd Steinhauser (linux)
Component: Video (DRI - non Intel)
Assignee: drivers_video-dri
Status: RESOLVED CODE_FIX
Severity: normal
CC: oded.gabbay
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.18.0 + drm-next branch
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
  Picture of the kernel panic output
  Print errors in case of NULL pointers and don't dereference them
  More checks on pointers being used
  workaround for the module order problem
  hacky workaround for module order problem

Description Bernd Steinhauser 2014-12-13 09:32:19 UTC
The kernel I tried to use was 3.18.0 and I merged the drm-next branch from
git://people.freedesktop.org/~airlied/linux

which includes the HSA driver amdkfd. CONFIG_HSA_AMD=y is set.
When trying to boot the kernel, I get a kernel panic, as shown in the uploaded picture.

The CPU is an A10-7800 (Kaveri); the motherboard is an ASRock FM2A88X Extreme6+.
Comment 1 Bernd Steinhauser 2014-12-13 09:32:54 UTC
Created attachment 160441
Picture of the kernel panic output
Comment 2 Michel Dänzer 2014-12-15 02:07:40 UTC
Does it also happen with CONFIG_HSA_AMD=m?
Comment 3 Oded Gabbay 2014-12-15 21:37:46 UTC
Created attachment 160721
Print errors in case of NULL pointers and don't dereference them
Comment 4 Oded Gabbay 2014-12-15 21:38:25 UTC
Hi,
Please try the attached patch.
Comment 5 Bernd Steinhauser 2014-12-15 21:45:32 UTC
(In reply to Michel Dänzer from comment #2)
> Does it also happen with CONFIG_HSA_AMD=m?

Only tried CONFIG_HSA_AMD=n, not module, but this happens so early that I'm confident it does not matter.

Will try the patch, thanks.
Comment 6 Bernd Steinhauser 2014-12-16 11:14:04 UTC
Tried the patch, exactly the same result.
Comment 7 Oded Gabbay 2014-12-16 11:26:49 UTC
Created attachment 160751
More checks on pointers being used
Comment 8 Oded Gabbay 2014-12-16 11:29:43 UTC
Hi,
Three things, please:

1. Please try the attached patch. It tries to verify more pointers before using them.

2. You said CONFIG_HSA_AMD=y. What's the value of CONFIG_DRM_RADEON? If it's "m", could you change it to "y"?

3. I would still like to ask if you could check with the following config:
CONFIG_DRM_RADEON="m"
CONFIG_HSA_AMD="m"

Thanks

Oded
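
The kind of check described in item 1 can be sketched in user-space C as follows. This is not the attached patch: the types `struct hw_ops` and `struct gpu_device`, the function `query_queue_count()`, and the sample ops table are hypothetical names chosen only to illustrate "verify each pointer, print an error, and return instead of dereferencing".

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical types for illustration; not the amdkfd or radeon structures. */
struct hw_ops {
    int (*get_queue_count)(void);
};

struct gpu_device {
    const char *name;
    const struct hw_ops *ops;
};

/* Hypothetical callback and ops table used by the usage example. */
static int sample_get_queue_count(void)
{
    return 8;
}

static const struct hw_ops sample_ops = {
    .get_queue_count = sample_get_queue_count,
};

/*
 * Check every pointer on the path before dereferencing it.
 * Returns 0 and stores the result in *count on success, or -EINVAL
 * after printing which pointer was missing.
 */
static int query_queue_count(const struct gpu_device *dev, int *count)
{
    if (dev == NULL) {
        fprintf(stderr, "query_queue_count: device pointer is NULL\n");
        return -EINVAL;
    }
    if (dev->ops == NULL || dev->ops->get_queue_count == NULL) {
        fprintf(stderr, "query_queue_count: %s has no usable ops table\n",
                dev->name ? dev->name : "(unnamed)");
        return -EINVAL;
    }
    if (count == NULL) {
        fprintf(stderr, "query_queue_count: output pointer is NULL\n");
        return -EINVAL;
    }

    *count = dev->ops->get_queue_count();
    return 0;
}
```

With a device whose `ops` field is NULL, `query_queue_count()` reports the problem and returns `-EINVAL` rather than crashing. Checks like this only turn a crash into an error message; they do not explain why the pointer was never set.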
Comment 9 Oded Gabbay 2014-12-16 11:38:25 UTC
One more thing,
I'm trying to understand the exact tree you are using, so that we are looking at the same code.
Did you just take drm-next, or did you manually merge trees?
If you did a manual merge, could you instead try plain drm-next? It's already based on 3.18.0.
Comment 10 Oded Gabbay 2014-12-16 15:11:12 UTC
Hi,
So I managed to recreate the bug on my setup.
This is happening because you compiled all the modules into the kernel. I need to address that, but for now, if you compile them as "m", everything should work.
Comment 11 Bernd Steinhauser 2014-12-16 21:16:59 UTC
Hm, OK. So should I still try the steps above? I ask because using drm_radeon as a module would require me to test that setup first.

(In reply to Oded Gabbay from comment #8)

> 2. You said CONFIG_HSA_AMD=y. What's the value of CONFIG_DRM_RADEON? If it's
> "m", could you change it to "y"?
I'm using a static initrd (just a basic system that doesn't contain any kernel modules), so all drivers necessary to start the system (including drm_radeon) are compiled in.

Regarding the tree:
I took plain 3.18 (b2776b) and then merged the drm-next branch from the repo mentioned above.
IIRC, it was a fast-forward.
Comment 12 Oded Gabbay 2014-12-17 07:40:17 UTC
As I said, there is definitely a bug when compiling both radeon and amdkfd inside the kernel.
I'm working on fixing it, but that could take a few days.
In the meantime, the only way to make it work without touching the code is to compile either both drivers as modules or just radeon as a module.

No need for further experiments.
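
The ordering dependency behind this can be illustrated with a small user-space sketch. It assumes the failure arises because one built-in driver's init function runs before the other has published the interface it needs. Built-in init functions at the same level run in link order, whereas loadable modules are ordered by their dependencies. All names here (`provider_init`, `consumer_init`, `run_initcalls`, `struct compute_iface`) are hypothetical and are not the radeon or amdkfd code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical interface that the provider publishes during its init. */
struct compute_iface {
    int (*probe)(int device_id);
};

static const struct compute_iface *published_iface; /* NULL until provider_init() runs */
static int probed_devices;

static int provider_probe(int device_id)
{
    (void)device_id;
    probed_devices++;
    return 0;
}

static const struct compute_iface provider_iface = {
    .probe = provider_probe,
};

/* Provider init: makes the interface available to other code. */
static int provider_init(void)
{
    published_iface = &provider_iface;
    return 0;
}

/* Consumer init: needs the interface, so it fails if it runs too early. */
static int consumer_init(void)
{
    if (published_iface == NULL) {
        fprintf(stderr, "consumer: compute interface not published yet\n");
        return -1;
    }
    return published_iface->probe(0);
}

typedef int (*initcall_fn)(void);

/* Run init functions strictly in array order and count the failures. */
static int run_initcalls(const initcall_fn *calls, size_t n)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        if (calls[i]() != 0)
            failures++;
    }
    return failures;
}

/* Return to the "nothing initialized" state between experiments. */
static void reset_state(void)
{
    published_iface = NULL;
    probed_devices = 0;
}
```

Running `run_initcalls()` with the array `{ consumer_init, provider_init }` reports one failure and probes no device, while `{ provider_init, consumer_init }` succeeds. In this sketch the consumer checks the pointer and fails cleanly; without that check, the first ordering would dereference NULL.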
Comment 13 Oded Gabbay 2014-12-17 11:28:16 UTC
Created attachment 160951
workaround for the module order problem
Comment 14 Oded Gabbay 2014-12-17 11:29:19 UTC
I attached a new patch which should solve the problem for you when compiling all the drivers into the kernel image.
This is a hacky workaround, not the final solution, but I hope it will let you continue with your setup.
Comment 15 Oded Gabbay 2014-12-17 11:32:03 UTC
Created attachment 160961
hacky workaround for module order problem
Comment 16 Bernd Steinhauser 2014-12-17 21:45:54 UTC
Thanks, I'll give it a try.
Comment 17 Bernd Steinhauser 2014-12-18 18:41:45 UTC
OK, it now boots and seems to work.
Comment 18 Bernd Steinhauser 2016-04-13 17:08:43 UTC
At some point (I didn't look more closely), this was fixed, and it now works as expected without workarounds.
(Tested: 4.5.1)