Bug 12904

Summary: hard lockup when connecting some external monitors unless acpi_osi="!Windows 2006" - Sony Vaio SR290
Product: ACPI
Reporter: Nik A. Melchior (melchior+kernel)
Component: Power-Video
Assignee: Zhang Rui (rui.zhang)
Status: CLOSED CODE_FIX
Severity: normal
CC: alan, harestomper, lenb, rui.zhang, shaohua.li, something_for_the_pain, yakui.zhao
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.29-rc6
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
lspci -v
dmidecode
dmesg 2.6.29-rc6
dmesg 2.6.29-rc6 with acpi=off irqpoll
acpidump
lspci -vxxx
dmesg with acpi patch
Xorg.0.log
grep . /sys/firmware/acpi/interrupts/*
grep . /sys/firmware/acpi/interrupts/* with acpi_osi="!Windows 2006"
dmesg with acpi_osi="!Windows 2006"
customized DSDT: don't invoke SMI when connecting projector
sudo acpidump > acpi_dump
sudo dmidecode > dmi_decode
DMI patch: disable Vista compatibility
fix string matching Sony SR290 laptop
patch: disable vista compatibility for Sony VGN-NS50B_L

Description Nik A. Melchior 2009-03-20 11:43:19 UTC
Latest working kernel version: none
Earliest failing kernel version: 2.6.26
Distribution: Debian (unstable)
Hardware Environment: Sony Vaio SR290 with Mobile 4 Series Chipset Integrated Graphics (PCI ids: 8086:2a42 and 8086:2a43)
Software Environment: 64-bit Debian unstable
Problem Description:

This laptop locks up when certain external display devices are attached after boot.  Only overhead projectors seem to cause the lockup.  I can attach external LCD monitors at any time and use (e.g.) xrandr without problem.  Likewise, if the projector is attached before turning the laptop on, the display will be mirrored and it will work fine.  In this case, unplugging the projector will cause the lockup.

Steps to reproduce:

This problem occurs in X or at the console, with or without the i915 module loaded.  It occurs at the console with the framebuffer enabled or with vga=normal.  In the latter case, the cursor continues to blink, but the system doesn't respond to keystrokes.  Alt+SysRq doesn't work in any of these cases.
Comment 1 Nik A. Melchior 2009-03-20 11:44:11 UTC
Created attachment 20610 [details]
lspci -v

lspci -v
Comment 2 Nik A. Melchior 2009-03-20 11:53:49 UTC
Created attachment 20611 [details]
dmidecode
Comment 3 Nik A. Melchior 2009-03-20 11:56:46 UTC
Created attachment 20612 [details]
dmesg 2.6.29-rc6

dmesg output after starting X (without plugging any external monitors)
Comment 4 Anonymous Emailer 2009-03-20 14:16:17 UTC
Reply-To: akpm@linux-foundation.org

On Fri, 20 Mar 2009 11:43:19 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=12904
> 
>            Summary: hard lockup on Sony Vaio SR when connecting some
>                     external monitors
>            Product: Platform Specific/Hardware
>            Version: 2.5
>      KernelVersion: 2.6.29-rc6
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: x86-64
>         AssignedTo: platform_x86_64@kernel-bugs.osdl.org
>         ReportedBy: melchior+kernel@cmu.edu
> 
> 
> Latest working kernel version: none
> Earliest failing kernel version: 2.6.26
> Distribution: Debian (unstable)
> Hardware Environment: Sony Vaio SR290 with Mobile 4 Series Chipset Integrated
> Graphics (PCI ids: 8086:2a42 and 8086:2a43)
> Software Environment: 64-bit Debian unstable
> Problem Description:
> 
> This laptop locks up when certain external display devices are attached after
> boot.  Only overhead projectors seem to cause the lockup.  I can attach
> external LCD monitors at any time and use (e.g.) xrandr without problem. 
> Likewise, if the projector is attached before turning the laptop on, the
> display will be mirrored and it will work fine.  In this case, unplugging the
> projector will cause the lockup.
> 
> Steps to reproduce:
> 
> This problem occurs in X or at the console, with or without the i915 module
> loaded.  It occurs at the console with the framebuffer enabled or with
> vga=normal.  In the latter case, the cursor continues to blink, but the system
> doesn't respond to keystrokes.  Alt+SysRq doesn't work in any of these cases.
> 

Gee, I'm struggling to work out which kernel subsystem might be involved
here.  If anything I suppose it is ACPI.

Help?
Comment 5 Alan 2009-03-21 07:21:19 UTC
Boot with acpi=off and see if it's repeatable. If it is, then see if booting with irqpoll has any effect (I suspect it won't).
Comment 6 Nik A. Melchior 2009-03-21 12:34:13 UTC
With both options, the problem seems to go away.  I can plug and unplug the projector without lockups, and even use xrandr in X to play with resolutions and turn the outputs on and off.  The system seems to mostly freeze up (though the mouse pointer is still responsive) for several seconds with each xrandr command, but after a bit of flashing, the command always works.  I'll attach the dmesg output in case it is useful.

irqpoll alone has no effect.

The system does not finish booting with acpi=off alone.  It seems to have trouble talking to the hard drive, and eventually sits waiting for the root filesystem to appear in /dev/mapper.  If I plug in a projector at this point, the console still echoes my keystrokes, so I guess it is working.
Comment 7 Nik A. Melchior 2009-03-21 12:35:00 UTC
Created attachment 20620 [details]
dmesg 2.6.29-rc6 with acpi=off irqpoll
Comment 8 Zhang Rui 2009-03-27 01:09:21 UTC
Boot without acpi=off and make sure the ACPI video driver is loaded.
Look for the file /proc/acpi/video/*/DOS and run "echo 1 > DOS" before connecting external display devices. Is the problem still reproducible in this case?
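
For reference, the DOS step above can be scripted roughly as follows. This is only a sketch: the function name and the optional directory argument are made up for illustration (the argument exists so the loop can be tried on a scratch directory), and writing to the real files under /proc/acpi/video needs root.

```shell
#!/bin/sh
# Sketch: write 1 to every ACPI video DOS file found under a directory.
# set_acpi_video_dos and its optional argument are hypothetical helpers,
# not part of any kernel or distribution tooling.
set_acpi_video_dos() {
    root="${1:-/proc/acpi/video}"
    found=0
    for f in "$root"/*/DOS; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$f" ] || continue
        echo 1 > "$f" && found=1
    done
    # Succeed only if at least one DOS file was written.
    [ "$found" -eq 1 ]
}
```

On the affected machine this would be run as root with no argument, for example: set_acpi_video_dos || echo "no DOS files found"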
Comment 9 ykzhao 2009-03-27 02:29:19 UTC
Hi, Nik
    Could you please confirm whether the hard lockup still occurs in console mode when the external monitor is hot-plugged? (Please don't load the i915 driver.) If it still locks up, does the lockup also occur when the ACPI video driver is not loaded?
    Could you also attach the output of acpidump and lspci -vxxx?
    thanks.
Comment 10 Nik A. Melchior 2009-03-30 16:16:41 UTC
Zhang: Yes, the problem is still reproducible in that case.

ykzhao: My previous combinations of tests (comment #6) were all conducted first at the console, without the i915 driver.  With "acpi=off irqpoll", the console test worked, so I loaded the i915 module and found that it still worked in X.  Without these kernel options, the lockup occurs at the console without the i915 module and even without the acpi video module.
Comment 11 Nik A. Melchior 2009-03-30 16:18:19 UTC
Created attachment 20738 [details]
acpidump
Comment 12 Nik A. Melchior 2009-03-30 16:18:49 UTC
Created attachment 20739 [details]
lspci -vxxx
Comment 13 Zhang Rui 2009-03-31 01:27:02 UTC
please apply this patch, set CONFIG_DRM_I915 and rebuild your kernel.
make sure the ACPI video driver is loaded, is the problem fixed this time?
Comment 14 Zhang Rui 2009-03-31 01:27:55 UTC
(In reply to comment #13)
> please apply this patch, set CONFIG_DRM_I915 and rebuild your kernel.

http://patchwork.kernel.org/patch/13147/
Hah, this patch.
Comment 15 Nik A. Melchior 2009-03-31 15:17:09 UTC
Sorry, that patch doesn't help.  With the ACPI video module loaded, I tried plugging in the projector with and without the i915 module, using the framebuffer and vga=normal.  In all cases, the lockup still occurred.

I also noticed that, even though the ACPI video module was loaded, /proc/acpi/video/ did not exist.  I will attach dmesg in case it gives you some hints.
Comment 16 Nik A. Melchior 2009-03-31 15:18:51 UTC
Created attachment 20760 [details]
dmesg with acpi patch

Here is dmesg with this patch applied:
http://patchwork.kernel.org/patch/13147/

With this patch, /proc/acpi/video no longer appears
Comment 17 Zhang Rui 2009-04-01 01:22:34 UTC
hah, you must make sure ACPI video driver is loaded AFTER i915 driver.
so please reload the acpi video driver manually after boot and make sure /proc/acpi/video/ exists before connecting the external display.
Comment 18 Nik A. Melchior 2009-04-01 20:59:36 UTC
I can't find a way to do that.  It seems that i915 depends on the video module.  If I try to manually insmod i915, it complains about an unknown symbol: acpi_video_register

modprobe, of course, helpfully loads 'video' to resolve the dependency when I ask it to load i915.
Comment 19 Shaohua 2009-04-08 01:30:11 UTC
please try latest intel Xorg gfx driver and attach the output of /var/log/Xorg.0.log here with the projector plugged in.
Comment 20 Nik A. Melchior 2009-04-10 16:02:20 UTC
Created attachment 20925 [details]
Xorg.0.log

kernel: 2.6.29-rc6 (without the acpi patch referenced above) KMS off
libdrm: git tag libdrm-2.4.6
intel X driver: git HEAD (7e516b)
Comment 21 Zhang Rui 2009-05-11 06:51:43 UTC
please try the latest git kernel, and set CONFIG_ACPI_VIDEO=y, CONFIG_DRM_I915=y and CONFIG_DRM_I915_KMS=y.
does the problem still exist?
Comment 22 Nik A. Melchior 2009-05-11 16:05:41 UTC
Yes, the problem still occurs with kernel 2.6.30-rc5 and Intel X driver 2.7.0.  It locks up in X with or without KMS enabled, and even at the console without the i915 module loaded (the ACPI video module is loaded, though).
Comment 23 Zhang Rui 2009-05-25 08:26:09 UTC
please attach the output of "grep . /sys/firmware/acpi/interrupts/*"
Comment 24 Nik A. Melchior 2009-05-25 18:30:58 UTC
Created attachment 21543 [details]
grep . /sys/firmware/acpi/interrupts/*

output of
  grep . /sys/firmware/acpi/interrupts/*
after a fresh reboot of 2.6.30-rc5
Comment 25 Zhang Rui 2009-05-26 02:07:39 UTC
what if you boot with acpi_osi="!Windows 2006"?
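
A quick way to check that the option actually reached the kernel is to look at /proc/cmdline after booting. A minimal sketch, assuming the bootloader passes the double quotes through unchanged (the exact form seen in /proc/cmdline can differ between bootloaders); the function name and the file argument are illustrative only.

```shell
#!/bin/sh
# Sketch: report whether the kernel command line carries the OSI override.
# has_osi_override is a hypothetical helper; the file argument defaults to
# /proc/cmdline and exists only so the check can be tried on a sample file.
has_osi_override() {
    cmdline_file="${1:-/proc/cmdline}"
    # -F: match the string literally; -q: report only through the exit status.
    grep -qF 'acpi_osi="!Windows 2006"' "$cmdline_file"
}
```

Usage after a reboot: has_osi_override && echo "override present"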
Comment 26 Nik A. Melchior 2009-05-26 02:38:37 UTC
Created attachment 21555 [details]
grep . /sys/firmware/acpi/interrupts/* with acpi_osi="!Windows 2006"

Here's the interrupts info with that kernel option.  I don't have a projector to attach at the moment to see if the machine locks up, but I can try that tomorrow.
Comment 27 Nik A. Melchior 2009-05-26 18:06:34 UTC
Aha!  That seems to have helped.  No more lockups!  I was able to connect a projector in the console, and in X, and fiddle with xrandr repeatedly.  I'll attach dmesg.  Let me know if you need any other information.
Comment 28 Nik A. Melchior 2009-05-26 18:07:12 UTC
Created attachment 21560 [details]
dmesg with acpi_osi="!Windows 2006"
Comment 29 Zhang Rui 2009-05-27 07:10:43 UTC
I guess the lockup is caused by an SMI call.
this is the EC AML query method that I think is invoked when plugging the projector,
                    Method (_Q28, 0, NotSerialized)
                    {
                        P8XH (0x00, 0x28)
                        If (_OSI ("Windows 2006"))
                        {
                            If (LEqual (\_SB.PCI0.PEGP.VGA.ATID, 0xFFFF))
                            {
                                Notify (\_SB.PCI0.GFX0, 0x81)
                            }
                            Else
                            {
                                Store (0x8F, I_AL)
                                Store (0xA0, I_AH)
                                Store (0x00, I_BL)
                                Store (0x11, I_BH)
                                PHDD (0xE2, I20B)
                            }
                        }
                        Else
                        {
                            Notify (\_SB.PCI0.LPCB.SNC, 0x91)
                        }
                    }

If the laptop is running Windows Vista and the external ATI graphics card is not used, an SMI is invoked.
But if it's not Vista, an event is sent to the Sony platform device \_SB.PCI0.LPCB.SNC.

This is a problem because it's hard to know why Linux locks up when the SMI is invoked.
To confirm my conclusion, please apply the customized DSDT attached below, in which I have commented out the SMI call in _Q28.
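
To make the branches easier to follow, here is the control flow of the _Q28 listing above restated as a small shell function. It is only a model for reading the AML, not something to run against the firmware: the function name and its two arguments are invented, and labelling the PHDD call as the SMI follows the analysis in this comment rather than anything in the listing itself.

```shell
#!/bin/sh
# Model of the _Q28 branches, for reading purposes only.
#   $1: 1 if _OSI("Windows 2006") returns true, 0 otherwise
#   $2: value read from \_SB.PCI0.PEGP.VGA.ATID (decimal or 0x-prefixed hex)
# q28_action is a hypothetical name; nothing here talks to real hardware.
q28_action() {
    if [ "$1" -eq 1 ]; then
        if [ "$(($2))" -eq "$((0xFFFF))" ]; then
            echo "Notify GFX0 0x81"
        else
            # The PHDD (0xE2, I20B) call, described above as the SMI.
            echo "PHDD 0xE2"
        fi
    else
        echo "Notify SNC 0x91"
    fi
}
```

With acpi_osi="!Windows 2006" the first argument is 0, so only the last branch (the notification to the Sony platform device) is reachable.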
Comment 30 Zhang Rui 2009-05-27 07:12:34 UTC
Created attachment 21580 [details]
customized DSDT: don't invoke SMI when connecting projector
Comment 31 Zhang Rui 2009-05-27 07:13:25 UTC
note that you don't need the acpi_osi="!Windows 2006" any more after applying the custom DSDT.
Comment 32 Nik A. Melchior 2009-05-29 17:36:20 UTC
That worked.  I used the custom DSDT, removed the acpi_osi kernel parameter, and the lockups are gone.  Thanks for all your help!

Again, let me know if you need additional information or you want me to test anything else before this fix is committed.
Comment 33 Len Brown 2009-05-29 18:09:26 UTC
Linux claims Vista compatibility to the BIOS for a number of reasons,
but we are getting burnt by that on this Sony.

Unless we can figure out what Vista is doing for this platform,
I think we have to disable Vista compatibility via DMI for this system.
Comment 34 something_for_the_pain 2009-06-16 20:49:51 UTC
I too see a similar problem - I have opened a bug in Launchpad, below is a brief overview of the details.

Based on the above comments it seems like there is a partial fix for this - do we know if/when this will be rolled into the kernel?


-------------------------------------------------------------------
Symptoms:
-------------------------------------------------------------------

Attaching an external monitor causes a complete lock-up - no response to pings, keyboard/mouse, etc.

Booting with a monitor connected shows no lock-up, however I get critical battery warnings (when the battery is full).


-------------------------------------------------------------------
System:
-------------------------------------------------------------------

Manufacturer:     Sony
Model:            Vaio VGN-NS10J
OS:               Ubuntu 8.10
Kernel:           Linux newton 2.6.27-14-generic #1 SMP Fri Jun 5 10:22:01 UTC 2009 x86_64 GNU/Linux


-------------------------------------------------------------------
Launchpad:
-------------------------------------------------------------------

Additional details can be found in Launchpad - please let me know if you require any more details.

ID:               319717
Link:             https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/319717
Comment 35 something_for_the_pain 2009-06-16 21:08:49 UTC
Just to confirm - adding acpi_osi="!Windows 2006" fixes the problem for me also.
Comment 36 Zhang Rui 2009-06-17 02:01:21 UTC
please attach your acpidump as well.
Comment 37 something_for_the_pain 2009-06-17 18:18:11 UTC
Created attachment 21968 [details]
sudo acpidump > acpi_dump

This is the output of acpidump having booted with the acpi_osi="!Windows 2006" bootarg.
Comment 38 Zhang Rui 2009-06-18 01:48:33 UTC
right, this is the same problem. hard lockup in SMM.

please attach your dmidecode output,
let's disable the Vista compatibility for these two laptops in Linux.
Comment 39 something_for_the_pain 2009-06-18 18:27:25 UTC
Created attachment 21997 [details]
sudo dmidecode > dmi_decode


This is the output of dmidecode having booted with the acpi_osi="!Windows 2006"
bootarg.
Comment 40 Zhang Rui 2009-06-19 01:57:09 UTC
Created attachment 22000 [details]
DMI patch: disable Vista compatibility

please remove acpi_osi="!Windows 2006", apply this patch and see if it helps.
Comment 41 something_for_the_pain 2009-06-19 18:02:17 UTC
To what file do I apply this patch?
Comment 42 Zhang Rui 2009-06-22 01:20:33 UTC
you just need to
1. enter the root directory of the Linux kernel source
2. copy the patch to this directory
3. run "patch -p1 < patch"
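
The three steps can be wrapped in a small helper that first checks that the patch applies cleanly. A sketch only: the function name is made up, the patch path must be absolute because the helper changes directory, and --dry-run assumes GNU patch.

```shell
#!/bin/sh
# Sketch: apply a patch from the top of a source tree, but only if a dry run
# succeeds first. apply_kernel_patch is a hypothetical helper name.
#   $1: root directory of the kernel source
#   $2: absolute path to the patch file
apply_kernel_patch() {
    src_dir="$1"
    patch_file="$2"
    (
        cd "$src_dir" || exit 1
        # Dry run: verify every hunk applies without touching any file.
        patch -p1 --dry-run < "$patch_file" > /dev/null || exit 1
        # Real run: -p1 strips the leading a/ and b/ path components.
        patch -p1 < "$patch_file"
    )
}
```

If the dry run fails, nothing in the tree is modified and the helper returns a non-zero status.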
Comment 43 Len Brown 2009-06-24 03:31:17 UTC
patch in comment #40 applied to acpi-test
Comment 44 Len Brown 2009-06-25 16:06:05 UTC
shipped in linux-2.6.31-rc1
closed
Comment 45 Nik A. Melchior 2010-05-02 00:24:22 UTC
Created attachment 26195 [details]
fix string matching Sony SR290 laptop

I'm sorry I never tested the fix released with 2.6.31, but it seems that the patch doesn't quite detect my laptop model.  This patch indicates the necessary change, which you should be able to verify from my earlier 'dmidecode' attachment.
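
For anyone checking their own machine: the kernel's DMI_MATCH performs a substring comparison, so the match string has to appear inside the DMI field exactly as the firmware reports it. The sketch below mimics that check from userspace. The function name is made up, and whatever string is passed in is the caller's guess, not something copied from the attached patch.

```shell
#!/bin/sh
# Sketch: test whether a DMI field contains a given substring.
# dmi_field_contains is a hypothetical helper.
#   $1: file holding the field, e.g. /sys/class/dmi/id/product_name
#   $2: substring to look for
# Returns 0 on a match, 1 on no match, 2 if the file cannot be read.
dmi_field_contains() {
    field_file="$1"
    needle="$2"
    [ -r "$field_file" ] || return 2
    value=$(cat "$field_file")
    case "$value" in
        *"$needle"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage, with $model set to the string under test: dmi_field_contains /sys/class/dmi/id/product_name "$model" && echo "match". A string that is longer than what the firmware reports, for instance one with an extra vendor prefix, will not match.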
Comment 46 Voldemar 2010-05-16 12:44:50 UTC
I have the same problem. The kernel options 'acpi_osi="!Windows 2006"' and 'acpi=off irqpoll' have no effect in my case. With kernel 2.6.31 I get a kernel panic; kernel 2.6.33 boots normally, but still hangs when an external display is connected and stays hung. The hang also occurs in single-user mode after connecting or disconnecting the external display. The laptop is a Sony Vaio VGN-NS50B. What other information about the system should I add?
Thanks!

[url=http://paste.ubuntu.com/434346/]Xorg.0.log_with_2.6.31-6_acpi_off[/url]
[url=http://paste.ubuntu.com/434347/]Xorg.0.log_with_2.6.33-ARCH_acpi_off[/url]
[url=http://paste.ubuntu.com/434348/]dmesg_2.6.31-6_with_acpi_off[/url]
[url=http://paste.ubuntu.com/434350/]dmesg_2.6.33-ARCH_with_acpi_off[/url]
[url=http://paste.ubuntu.com/434351/]dmi_decode[/url]
[url=http://paste.ubuntu.com/434352/]lspci -v[/url]
[url=http://paste.ubuntu.com/434353/]lspci -vxxx[/url]
Comment 47 Zhang Rui 2010-05-17 07:33:46 UTC
please verify if the problem still exists in the latest vanilla kernel (with boot option acpi_osi="!Windows 2006").

BTW, the links above don't work, please make a double check.
Comment 48 Voldemar 2010-05-17 08:54:58 UTC
Sorry about the links; BB-code does not work here.
I am building the vanilla kernel now. Here are the links again.

Xorg.0.log_with_2.6.31-6_acpi_off      http://paste.ubuntu.com/434346/
Xorg.0.log_with_2.6.33-ARCH_acpi_off   http://paste.ubuntu.com/434347/
dmesg_2.6.31-6_with_acpi_off           http://paste.ubuntu.com/434348/
dmesg_2.6.33-ARCH_with_acpi_off        http://paste.ubuntu.com/434350/
dmi_decode       http://paste.ubuntu.com/434351/
lspci -v         http://paste.ubuntu.com/434352/
lspci -vxxx      http://paste.ubuntu.com/434353/
Comment 49 Voldemar 2010-05-17 11:31:00 UTC
I built vanilla kernel 2.6.34. The kernel option 'acpi_osi="!Windows 2006"' does not work for me. The kernel hangs when connecting or disconnecting an external display.

dmesg for kernel 2.6.34 with kernel option acpi_osi="!Windows 2006"   http://paste.ubuntu.com/434894/

What information is needed from me?
Thanks!
Comment 50 Zhang Rui 2010-05-18 01:31:37 UTC
Created attachment 26416 [details]
patch: disable vista compatibility for Sony VGN-NS50B_L

this seems to be a grub2 bug http://savannah.gnu.org/bugs/?27641
Please try the patch attached without the acpi_osi="!Windows 2006" boot option.
Comment 51 Voldemar 2010-05-18 07:36:11 UTC
Yes. That's correct.
Please tell me if you need more information or you want me to test anything
Thank you very much!
Comment 52 Zhang Rui 2010-05-18 08:13:40 UTC
did you test the patch?
is the problem fixed by the patch in comment #50?
Comment 53 Voldemar 2010-05-18 09:49:34 UTC
Yes. I applied the patch from comment #50 and rebuilt the kernel. It works: hot-connecting and disconnecting an external display causes no problems.
Here is a link to a new dmesg:
http://paste.ubuntu.com/435429/
Comment 54 Zhang Rui 2010-06-08 07:25:48 UTC
Len, please apply the patch in comment #50.
Comment 55 Nik A. Melchior 2010-06-19 18:31:22 UTC
Please don't forget the small string change in comment #45.  Thanks!
Comment 56 Zhang Rui 2010-06-21 06:45:40 UTC
patch has been sent to the ACPI mailing list. :)
https://patchwork.kernel.org/patch/107123/
Comment 57 Len Brown 2010-09-28 22:02:41 UTC
commit 096486eece7ef38cf1ee46b704482c75c4010fb1
Author: Nik A. Melchior <melchior+kernel@cmu.edu>
Date:   Mon Jun 21 12:47:05 2010 +0800

    ACPI video: fix string mismatch for Sony SR290 laptop


shipped in v2.6.35-rc2-19-g096486e
closed