Bug 15550

Summary: black screen after close/open LID
Product: Drivers Reporter: Thomas Bächler (thomas)
Component: Video(DRI - Intel) Assignee: Chris Wilson (chris)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: high CC: chris, yakui.zhao
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.34-rc4 Subsystem:
Regression: No Bisected commit-id:
Bug Depends on:    
Bug Blocks: 56331    
Attachments: dmesg output
lsmod output
kernel configuration
dmidecode output
script to get acpi tables
ACPI tables
ACPI interrupts before closing the LID
ACPI interrupts after opening the LID
ACPI events

Description Thomas Bächler 2010-03-16 20:19:11 UTC
Created attachment 25554 [details]
dmesg output

When I close and then open the LID, the display sometimes stops working and kacpi_notify uses 100% CPU forever. I need to reboot to fix the problem.

Most of the time, the display is first off, then when I close/open again, it is on but only displays a black screen.

This seems to be related to the LID button being hit too frequently, as it mostly happens when I don't open the LID "fast enough".

I am attaching dmesg, lsmod, kernel config and dmidecode.
Comment 1 Thomas Bächler 2010-03-16 20:20:02 UTC
Created attachment 25555 [details]
lsmod output
Comment 2 Thomas Bächler 2010-03-16 20:20:43 UTC
Created attachment 25556 [details]
kernel configuration
Comment 3 Thomas Bächler 2010-03-16 20:21:27 UTC
Created attachment 25557 [details]
dmidecode output
Comment 4 Zhang Rui 2010-03-18 07:08:38 UTC
Please attach the output of "grep . /sys/firmware/acpi/interrupts/*" taken both before closing the lid and after reopening it.
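For anyone repeating this test, here is a minimal sketch of how the two snapshots could be taken and compared. The function names and the optional directory argument are made up for illustration; only the grep command itself comes from the request above.

```shell
#!/bin/sh
# Sketch only: function names and the optional directory argument are
# illustrative and not part of any ACPI tooling.

# Save every ACPI interrupt counter ("path:value" lines) into file $1.
snapshot_acpi_interrupts() {
    out=$1
    dir=${2:-/sys/firmware/acpi/interrupts}
    grep . "$dir"/* > "$out"
}

# Print only the counters that differ between snapshots $1 and $2.
# Prints nothing if no counter changed.
changed_acpi_interrupts() {
    diff "$1" "$2" | grep '^[<>]' || true
}
```

Typical use would be to call snapshot_acpi_interrupts before.txt, close and reopen the lid, call snapshot_acpi_interrupts after.txt, and then run changed_acpi_interrupts before.txt after.txt to see which counters moved.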
Comment 5 Zhang Rui 2010-03-18 07:09:40 UTC
Created attachment 25578 [details]
script to get acpi tables

Please run this script as root to collect all the ACPI tables of your laptop,
and attach the result here as a tarball.
Comment 6 Thomas Bächler 2010-03-18 09:20:46 UTC
Created attachment 25586 [details]
ACPI tables

ACPI tables are attached, the other stuff will follow later, as soon as I get time to crash my machine again.
Comment 7 Thomas Bächler 2010-03-20 16:06:41 UTC
Okay, I was unable to reproduce the precise bug. I could only reproduce that the display just didn't turn itself on again after closing and opening the LID several times, but kacpi_notify didn't hang this time. I am still attaching the acpi interrupts as you requested.
Comment 8 Thomas Bächler 2010-03-20 16:07:11 UTC
Created attachment 25620 [details]
ACPI interrupts before closing the LID
Comment 9 Thomas Bächler 2010-03-20 16:07:43 UTC
Created attachment 25621 [details]
ACPI interrupts after opening the LID
Comment 10 Thomas Bächler 2010-03-20 16:13:19 UTC
Okay, in the latest test (comment #7 to #9), I always got

Mar 20 17:09:13 evey kernel: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Mar 20 17:09:13 evey kernel: render error detected, EIR: 0x00000000
Mar 20 17:09:13 evey kernel: [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 33329 at 32515)

in my log repeatedly (about once a second).
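To put a number on "repeatedly", the hangcheck reports can be counted in a saved copy of the log. This is only a sketch: count_gpu_hangs is an illustrative name, and the log file to pass in depends on where your syslog writes kernel messages.

```shell
#!/bin/sh
# Sketch only: count_gpu_hangs is an illustrative helper name.

# Print the number of i915 hangcheck reports found in log file $1.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_gpu_hangs() {
    grep -c 'Hangcheck timer elapsed' "$1" || true
}
```

Running it twice a known interval apart and comparing the two counts gives a rough rate.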
Comment 11 Zhang Rui 2010-03-22 01:25:38 UTC
Re-assigning to the graphics experts.
Comment 12 ykzhao 2010-03-22 08:42:18 UTC
Could you please boot with the "nomodeset" option and run the following test in console mode?
    1. Kill the process that is using /proc/acpi/event (run "lsof /proc/acpi/event" to get the process ID).
    2. cat /proc/acpi/event >lid_event
    3. Close and reopen the LID several times.
    
After the test, please attach the resulting lid_event file.
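The capture step could be wrapped as in the rough sketch below, which records events for a fixed number of seconds and then counts the lid events. The helper names and the optional source argument are made up for illustration, and the sketch assumes that lid events in /proc/acpi/event start with "button/lid".

```shell
#!/bin/sh
# Sketch only: helper names and the optional source argument are illustrative.

# Record ACPI events into file $1 for $2 seconds.
# /proc/acpi/event never reaches end-of-file, so cat is stopped explicitly.
record_acpi_events() {
    out=$1
    seconds=$2
    src=${3:-/proc/acpi/event}
    cat "$src" > "$out" &
    pid=$!
    sleep "$seconds"
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
}

# Print the number of lid events in a recorded file.
# Assumption: lid events begin with "button/lid".
count_lid_events() {
    grep -c '^button/lid' "$1" || true
}
```

For example, record_acpi_events lid_event 30 leaves half a minute for closing and reopening the lid, and count_lid_events lid_event then shows how many lid events were delivered.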

thanks.
Comment 13 Thomas Bächler 2010-03-27 12:04:09 UTC
Created attachment 25728 [details]
ACPI events

Attaching the ACPI events.

It seems that closing the LID only very briefly does not generate an ACPI event; it has to stay closed for a while (around one second).
Comment 14 Thomas Bächler 2010-03-27 16:31:38 UTC
Another fun fact: When I open the LID, the display always enables itself quickly, then disables itself again. When I press a key, it usually enables again - at least that is the "normal" behaviour when I don't experience the bug.

Maybe this is related to the problem.
Comment 15 Thomas Bächler 2010-04-12 21:33:55 UTC
Any new ideas or possible tests on this? At the time of this writing, I am afraid to close the LID at all, as it is likely that I won't get my machine back without a reboot (I actually haven't closed the LID of this laptop for weeks, at least while it was on).

I know it doesn't help that the bug reproduces with different symptoms each time I try it, but unfortunately that's not up to me.
Comment 16 Thomas Bächler 2010-04-17 16:03:10 UTC
The problem is still present in 2.6.34. To reduce confusion, this is what I am seeing now:

1) When the bug occurs, I can now always reproduce the behaviour from comment #10.

The hanging kacpi_notify as I originally described does not occur at all anymore.

2) When I close the LID, ACPI reports it as closed only after a timeout of 3 seconds; until then it still shows as open.

3) When I open the LID, the display enables, then disables again. In about 3/4 of the attempts it then enables again; in the remaining 1/4 the bug occurs.

4) When the bug occurs, the LID switch still works, it shows "open" and "closed" as expected.
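The delay in 2) and the state in 4) can be observed by polling the lid state file. The following is only a sketch: the helper names are illustrative, the path /proc/acpi/button/lid/<name>/state is an assumption about the ACPI button driver's procfs interface, and the lid device name differs between machines.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative; the state file path is an
# assumption and the lid device name (LID, LID0, ...) differs per machine.

# Print "open" or "closed" from a lid state file whose content
# looks like "state:      open".
lid_state() {
    awk '/^state:/ { print $2 }' "$1"
}

# Poll state file $1 and print a timestamp whenever the state changes.
# Runs until interrupted with Ctrl-C. The fractional sleep needs a
# sleep implementation that accepts it (GNU sleep does).
watch_lid() {
    last=
    while :; do
        now=$(lid_state "$1")
        if [ "$now" != "$last" ]; then
            echo "$(date +%T) $now"
            last=$now
        fi
        sleep 0.2
    done
}
```

Starting watch_lid on the state file and then closing the lid shows how long after the physical close the reported state actually changes.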
Comment 17 Thomas Bächler 2010-04-17 16:20:51 UTC
Okay, now it gets interesting, I hope the following information will help to solve this bug.

I applied the following (test) kernel patch to the latest Linus tree (2.6.34-rc4):

diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c
index 216e9f5..c429e6c 100644
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -1121,11 +1121,12 @@ out:
                pwm |= PWM_PCH_ENABLE;
                I915_WRITE(BLC_PWM_PCH_CTL1, pwm);
        }
-       dev_priv->lid_notifier.notifier_call = intel_lid_notify;
+       /*dev_priv->lid_notifier.notifier_call = intel_lid_notify;
        if (acpi_lid_notifier_register(&dev_priv->lid_notifier)) {
                DRM_DEBUG_KMS("lid notifier registration failed\n");
                dev_priv->lid_notifier.notifier_call = NULL;
-       }
+       }*/
+       dev_priv->lid_notifier.notifier_call = NULL;
        /* keep the LVDS connector */
        dev_priv->int_lvds_connector = connector;
        drm_sysfs_connector_add(connector);

The following behaviour occurs:
1) The display is still disabled when I close the LID.
2) The weird enable->disable->enable behaviour when reopening the LID is gone. Instead, the display only enables once and stays on.
3) The bug is gone as far as I can determine (closed/opened the LID a dozen times).

This seems like a race condition between the intel_lid_notify in intel_lvds.c and something the BIOS (or any other component) does. What would be a proper fix here?
Comment 18 Thomas Bächler 2010-04-17 16:50:45 UTC
Narrowing it down further: The problem is definitely caused by
        mutex_lock(&dev->mode_config.mutex);
        drm_helper_resume_force_mode(dev);
        mutex_unlock(&dev->mode_config.mutex);
in intel_lid_notify(). If I return NOTIFY_OK before that, the bug is gone as well.
Comment 19 Thomas Bächler 2010-04-18 13:35:50 UTC
The problem persists after upgrading to the latest ACPI/EC provided by Toshiba.

Also, with my modified 2.6.34-rc4, the display does not enable after resume from suspend to RAM. This might be caused by my patch or by something else; I haven't tested that yet.
Comment 20 Thomas Bächler 2010-04-21 15:29:42 UTC
After testing this for a few days, I can confirm that the above workaround fixes the problem.

Does someone know what could interfere with drm_helper_resume_force_mode() to make it cause such breakage? Or what a proper fix would look like? I am pretty lost at this point, due to lack of knowledge of the i915 code.
Comment 21 Thomas Bächler 2010-05-04 23:51:48 UTC
Bump. Nobody replied to this bug report in way over a month, despite new and potentially useful information being added.
Comment 22 Thomas Bächler 2010-05-10 07:51:16 UTC
Bump. No response from the assignees in 7 weeks.
Comment 23 Zhang Rui 2010-05-10 07:55:51 UTC
ping Yakui, :p
Comment 24 Thomas Bächler 2010-05-17 06:50:52 UTC
Bump. No response in 8 weeks.
Comment 25 Thomas Bächler 2010-06-01 08:46:08 UTC
I hate to have to do this ... I'd really like to have a response on this bug, considering it's been 10 weeks and 1 day since the last one.

It'd be great if someone with knowledge about the i915 drm driver would comment on my findings from comment 17/18 and tell me how to proceed further.
Comment 26 Chris Wilson 2010-07-24 09:21:11 UTC
We need to add your system to the intel_no_modeset_on_lid[] quirk table. Let me figure out how.
Comment 27 Chris Wilson 2010-07-24 09:24:28 UTC
In fact you've already done so! :)

commit 1073af33fdd4e960c70b828e899b1291b44f0b3d
Author: Thomas Bächler <thomas@archlinux.org>
Date:   Fri Jul 2 10:44:23 2010 +0200

    gpu/drm/i915: Add a blacklist to omit modeset on LID open
    
    On some machines (currently only the Toshiba Tecra A11 is known), the GPU
    locks up when modeset is forced on LID open. This patch adds a new DMI
    blacklist and omits modesetting for all matches.
    
    Fixes https://bugzilla.kernel.org/show_bug.cgi?id=15550
    
    Signed-off-by: Thomas Bächler <thomas@archlinux.org>
    Signed-off-by: Eric Anholt <eric@anholt.net>

currently in Eric Anholt's tree.
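Since the blacklist matches on DMI data, users of other machines with the same symptoms may want to see what their system reports. The sketch below reads the DMI strings from sysfs and checks them against the one model named in the commit message. The helper name is made up, and the exact match strings used by the kernel table are not quoted in this report, so the case-insensitive substring test here is only an approximation.

```shell
#!/bin/sh
# Sketch only: is_tecra_a11 is an illustrative helper. The strings compared
# here are taken from the commit message, not from the kernel table itself.

# Succeed if the DMI strings under directory $1 (default: the sysfs DMI
# directory) look like a Toshiba Tecra A11.
is_tecra_a11() {
    dmi_dir=${1:-/sys/class/dmi/id}
    vendor=$(tr '[:upper:]' '[:lower:]' < "$dmi_dir/sys_vendor")
    product=$(tr '[:upper:]' '[:lower:]' < "$dmi_dir/product_name")
    case "$vendor" in
        *toshiba*) ;;
        *) return 1 ;;
    esac
    case "$product" in
        *"tecra a11"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Called without arguments, is_tecra_a11 && echo "probably covered by the blacklist" gives a quick hint; the same strings also appear in the dmidecode output attached above.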
Comment 28 Thomas Bächler 2010-07-24 16:42:00 UTC
Any chance of getting that merged before 2.6.35? Linus announced a "last call" the other day, and there are two more patches in Eric's tree.