Created attachment 107420 [details] dmesg output showing the memory corruption report Distribution: Fedora 18 x86_64 Kernel Version: 3.11 My machine has a flat panel display that does not report EDID correctly to my video card. I found the edid_firmware parameter to drm_kms_helper, and I tried the following line in rc.local: echo -n edid/1280x1024.bin > /sys/module/drm_kms_helper/parameters/edid_firmware Sometimes, nothing bad happens. Other times, I get a memory corruption backtrace like the one in the attached dmesg report.
Created attachment 107421 [details] Configuration file used to build the kernel
You need to set that parameter on the command line for it to have effect for LVDS. I guess no one checked whether changing it on the fly was safe...
(In reply to Chris Wilson from comment #2) > You need to set that parameter on the command line for it to have effect for > LVDS. I guess no one checked whether changing it on the fly was safe... Hmm, it looked like a list corruption in the firmware loader to me. Which shouldn't happen regardless of whether it's safe to set edid_firmware on the fly. Alex, please see if you can reproduce the problem with this patch: diff --git a/drivers/gpu/drm/drm_edid_load.c b/drivers/gpu/drm/drm_edid_load.c index 271b42b..0a54bdd7 100644 --- a/drivers/gpu/drm/drm_edid_load.c +++ b/drivers/gpu/drm/drm_edid_load.c @@ -152,7 +152,7 @@ static u8 *edid_load(struct drm_connector *connector, const char *name, goto out; } - err = request_firmware(&fw, name, &pdev->dev); + err = 1; platform_device_unregister(pdev); if (err) { @@ -231,7 +231,6 @@ static u8 *edid_load(struct drm_connector *connector, const char *name, name, connector_name); relfw_out: - release_firmware(fw); out: if (err)
Side note, it seems wasteful to do the full firmware loading dance of checking the firmware cache, checking the kernel builtin firmware blobs, trying to load the firmware directly from kernel, asking the user space helper to please give us the firmware, and finally checking the DRM built-in EDID blobs... every time drm_helper_probe_single_connector_modes() gets run with the edid_firmware parameter set.
(In reply to Chris Wilson from comment #2) > You need to set that parameter on the command line for it to have effect for > LVDS. I guess no one checked whether changing it on the fly was safe... In mainline kernel with Fedora 18, if I set the kernel parameter, the kernel hangs for more than a minute before mounting the root filesystem and booting normally. The distro kernel does not hang (3.10.10), but I already downloaded the source RPM and cannot find which patch, if any, solves this.
(In reply to Alex Villacis Lasso from comment #5) > (In reply to Chris Wilson from comment #2) > > You need to set that parameter on the command line for it to have effect > for > > LVDS. I guess no one checked whether changing it on the fly was safe... > > In mainline kernel with Fedora 18, if I set the kernel parameter, the kernel > hangs for more than a minute before mounting the root filesystem and booting > normally. The distro kernel does not hang (3.10.10), but I already > downloaded the source RPM and cannot find which patch, if any, solves this. Wild guess, the distro kernel has CONFIG_FW_LOADER_USER_HELPER=n.
It does. I thought I made a point of basing the compiled kernel config off the distro kernel config except in the places I wanted to change, and I did not intend to set CONFIG_FW_LOADER_USER_HELPER in my config. If I set CONFIG_FW_LOADER_USER_HELPER=n in my config and recompile, could that influence whether the bug might trigger, even before applying the debug patch?
(In reply to Alex Villacis Lasso from comment #7) > It does. I thought I made a point of basing the compiled kernel config off > the distro kernel config except in the places I wanted to change, and I did > not intend to set CONFIG_FW_LOADER_USER_HELPER in my config. If I set > CONFIG_FW_LOADER_USER_HELPER=n in my config and recompile, could that > influence whether the bug might trigger, even before applying the debug > patch? With CONFIG_FW_LOADER_USER_HELPER=n the kernel doesn't try to ask the user space to supply the firmware (=edid in this case). Which also means there's no 60 second timeout for the userspace reply which apparently doesn't happen. And, *if* the problem is in the firmware loader code, that might vastly influence whether the bug might trigger. Now, the 60 second timeout isn't this bug. (It might be considered a different bug.) This bug is the list corruption you see. All else being equal, if you can reproduce it with the patch, it's in DRM. If the patch makes the problem go away, it's in the firmware loader. So we'd appreciate you trying the debug patch, thanks. :)
I have compiled a 3.11 kernel with CONFIG_FW_LOADER_USER_HELPER=y, with the patch applied, and without the edid_firmware boot parameter. After three boots and manually writing the string to the sysfs file, I am yet unable to trigger the bug. However, I did notice that the bug was random, so I will keep testing.
(In reply to Alex Villacis Lasso from comment #9) > I have compiled a 3.11 kernel with CONFIG_FW_LOADER_USER_HELPER=y, with the > patch applied, and without the edid_firmware boot parameter. After three > boots and manually writing the string to the sysfs file, I am yet unable to > trigger the bug. However, I did notice that the bug was random, so I will > keep testing. Please also try with the rc.local approach you had earlier (just to try to reproduce the bug; this is not the recommended approach for regular use).
Alternatively, please try this patch on top of 3.11, with CONFIG_FW_LOADER_USER_HELPER=y and the rc.local thing: diff --git a/drivers/gpu/drm/drm_edid_load.c b/drivers/gpu/drm/drm_edid_load.c index 271b42b..7a0a1d3 100644 --- a/drivers/gpu/drm/drm_edid_load.c +++ b/drivers/gpu/drm/drm_edid_load.c @@ -155,6 +155,13 @@ static u8 *edid_load(struct drm_connector *connector, const err = request_firmware(&fw, name, &pdev->dev); platform_device_unregister(pdev); + if (err) + goto out; + else { + err = -EINVAL; + goto relfw_out; + } + if (err) { i = 0; while (i < GENERIC_EDIDS && strcmp(name, generic_edid_name[i])) If you can reproduce the issue with this, the problem is in the firmware loader.
(In reply to Jani Nikula from comment #11) > Alternatively, please try this patch on top of 3.11, with > CONFIG_FW_LOADER_USER_HELPER=y and the rc.local thing: > > diff --git a/drivers/gpu/drm/drm_edid_load.c > b/drivers/gpu/drm/drm_edid_load.c > index 271b42b..7a0a1d3 100644 > --- a/drivers/gpu/drm/drm_edid_load.c > +++ b/drivers/gpu/drm/drm_edid_load.c > @@ -155,6 +155,13 @@ static u8 *edid_load(struct drm_connector *connector, > const > err = request_firmware(&fw, name, &pdev->dev); > platform_device_unregister(pdev); > > + if (err) > + goto out; > + else { > + err = -EINVAL; > + goto relfw_out; > + } > + > if (err) { > i = 0; > while (i < GENERIC_EDIDS && strcmp(name, > generic_edid_name[i])) > > If you can reproduce the issue with this, the problem is in the firmware > loader. Not exactly what I expected. I recompiled the test kernel with this patch. The boot process ran normally, except that there was no backtrace at that moment, and the screen resolution failed to change. However, after a few hours, I attempted to shut down, and then the system hung on the shutdown splash screen. The keyboard was unresponsive. I could not even dismiss the splash screen to check whether there was a backtrace behind.
Please retry recent kernels and report back. There's been some changes in the area. Thanks.
Ping for the re-test.
Timeout, closing. Assumed fixed by some of the changes in the edid firmware loader. Please reopen if the problem reappears with recent kernels.