Bug 13148

Summary: resume after suspend-to-ram broken on Sony Vaio VGN-SR19VN when sony-laptop driver present
Product: Drivers
Reporter: fanderay (fanderay4)
Component: Platform
Assignee: Rafael J. Wysocki (rjw)
Status: CLOSED UNREPRODUCIBLE
Severity: normal
CC: florian, jnm11, krummas, lenb, loppituu, malattia, rjw, rui.zhang, saintiss, vyacheslavovich
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 2.6.30rc2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 12398, 56331
Attachments: dmesg output
lspci -vvv -xxx output
dsdt decode
resume failure bisection log
acpidump
customized DSDT: using fake _DOS
dmesg-acpi-debug
dmesg-acpi-debug after s2ram and resume
PCI: Clear saved_state after the state has been restored
ACPI: Always try to get control of the PCIe capability structure from the BIOS
ACPI / sony-laptop: Suspend late and resume early
do not initialize SNC
disable hot keys setup
do sony resume in pm notifier chain

Description fanderay 2009-04-22 14:39:30 UTC
Hardware: Sony Vaio VGN-SR19VN

In 2.6.30rc2, resume after suspend-to-ram on a Vaio SR19VN with the sony-laptop driver compiled into the kernel results immediately in a hard freeze: the screen stays off/black and the system appears completely hung, e.g. pressing Caps Lock does not cycle the keyboard LED.

Steps to reproduce:

1. Build kernel with ACPI and sony-laptop driver compiled in (CONFIG_SONY_LAPTOP=y)
2. Boot with init=/bin/bash
3. Run s2ram -f -p to suspend
4. Attempt to resume as usual by pressing the power button.

If the kernel is built without sony-laptop, step (4) succeeds.
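
As a rough aid for anyone repeating these steps, here is a minimal sketch of a helper that reports how CONFIG_SONY_LAPTOP is set in a kernel config file before testing. It is not part of the original report: the function name sony_laptop_config and the config path are assumptions, and the s2ram invocation in the comments is simply the one from step 3.

```shell
#!/bin/sh
# Print how CONFIG_SONY_LAPTOP is set in a kernel .config file: y, m or n.
# The function name and the path argument are hypothetical; pass whatever
# config file your build actually used.
sony_laptop_config() {
    config_file="$1"
    # A set option appears as CONFIG_SONY_LAPTOP=y or =m; an unset one
    # only appears as a "# ... is not set" comment, so nothing matches.
    value=$(sed -n 's/^CONFIG_SONY_LAPTOP=\([ym]\)$/\1/p' "$config_file" | head -n 1)
    echo "${value:-n}"
}

# Usage on the test machine (not executed here):
#   sony_laptop_config /path/to/linux/.config
#   s2ram -f -p     # step 3; then press the power button to resume
```

A result of y or m corresponds to the failing case described above (m only once the module is loaded); n corresponds to the case where step (4) succeeds.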
Comment 1 Rafael J. Wysocki 2009-04-22 20:51:50 UTC
2.6.29 didn't have this problem, did it?
Comment 2 Zhang Rui 2009-04-23 02:31:42 UTC
re-assign to Mattia
Comment 3 fanderay 2009-04-23 10:37:06 UTC
This problem does also occur in 2.6.29.
Comment 4 Mattia Dongili 2009-04-23 15:26:33 UTC
Could you attach a boot log?
Comment 5 Mattia Dongili 2009-04-23 15:29:57 UTC
Sorry, also lspci and DSDT.
Thanks
Comment 6 fanderay 2009-04-23 16:56:43 UTC
Created attachment 21094 [details]
dmesg output
Comment 7 fanderay 2009-04-23 17:00:25 UTC
Created attachment 21095 [details]
lspci -vvv -xxx output
Comment 8 fanderay 2009-04-23 17:01:36 UTC
Created attachment 21096 [details]
dsdt decode
Comment 9 Rafael J. Wysocki 2009-04-25 11:16:33 UTC
Handled-By : Mattia Dongili <malattia@linux.it>
Comment 10 Len Brown 2009-04-28 01:33:32 UTC
is CONFIG_SONY_LAPTOP=y important,
or do you see the failure also with CONFIG_SONY_LAPTOP=m ?
(I assume you see no failure with CONFIG_SONY_LAPTOP=n)

Can you git-bisect what between 2.6.29 and the 2.6.30-rc
caused the regression?
Comment 11 fanderay 2009-04-28 10:17:25 UTC
"m" behaves the same as "y" when the module is loaded.

As mentioned above, this problem does occur also in 2.6.29 (but not in 2.6.26).  I'm not really familiar with git, I'm afraid.
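
For readers who have not used git bisect before, the following is a minimal sketch of the usual workflow, assuming a clone of the mainline tree and taking v2.6.26 as good and v2.6.29 as bad, as stated above. The helper bisect_steps is hypothetical; it only gives a rough estimate of how many build-and-test rounds to expect for a given number of candidate commits.

```shell
#!/bin/sh
# Rough upper bound on bisection rounds: the smallest k with 2^k >= n,
# where n is the number of commits between the good and bad points.
# (Hypothetical helper; git prints its own estimate while bisecting.)
bisect_steps() {
    n="$1"
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Typical session inside a kernel git tree (not executed here):
#   git rev-list --count v2.6.26..v2.6.29   # number of candidate commits
#   git bisect start
#   git bisect bad v2.6.29
#   git bisect good v2.6.26
#   ... build, boot and test the checked-out commit, then report it:
#   git bisect good        # or: git bisect bad
#   git bisect log > bisect.log
#   git bisect reset
```

Each round halves the remaining range, which is why even tens of thousands of commits need only around fifteen kernel builds.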
Comment 12 Mattia Dongili 2009-04-30 08:27:55 UTC
fanderay,

can you also test 2.6.30-rc4 (there is some new code that might help) and 2.6.28 (to try to narrow down when the breakage started).

thanks
Comment 13 fanderay 2009-04-30 10:40:14 UTC
Hi Mattia,

The same problem still occurs in rc4.  It does not occur in 2.6.28, so it seems to have broken between .28 and .29.
Comment 14 Mattia Dongili 2009-04-30 11:12:04 UTC
Really, not much happened on sony-laptop between .28 and .29:

$ git log v2.6.28..v2.6.29 -- drivers/misc/sony-laptop.c drivers/platform/x86/sony-laptop.c
commit d97c0defba25a959a990f6d4759f43075540832e
Merge: ec9f168 b4f9fe1
Author: Len Brown <len.brown@intel.com>
Date:   Fri Jan 9 04:01:26 2009 -0500

    Merge branch 'drivers-platform' into release
    
    Conflicts:
        drivers/misc/Kconfig
    
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 30823736162ff91512965e3c730557e34fa71d6d
Author: Lin Ming <ming.m.lin@intel.com>
Date:   Tue Dec 16 16:59:35 2008 +0800

    ACPI: sony-laptop.c: call acpi_get_object_info to get node info
    
    Avoid using internal acpica structures acpi_namespace_node and acpi_operand_object
    Call acpi_get_object_info to get node ascii name and method arg count
    
    Signed-off-by: Lin Ming <ming.m.lin@intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>

commit 41b16dce390510f550a4d2b12b98e0258bbed6e2
Author: Len Brown <len.brown@intel.com>
Date:   Mon Dec 1 00:09:47 2008 -0500

    create drivers/platform/x86/ from drivers/misc/
    
    Move x86 platform specific drivers from drivers/misc/
    to a new home under drivers/platform/x86/.
    
    The community has been maintaining x86 vendor-specific
    platform specific drivers under /drivers/misc/ for a few years.
    The oldest ones started life under drivers/acpi.
    They moved out of drivers/acpi/ because they don't actually
    implement the ACPI specification, but either simply
    use ACPI, or implement vendor-specific ACPI extensions.
    
    In the future we anticipate...
    drivers/misc/ will go away.
    other architectures will create drivers/platform/<arch>
    
    Signed-off-by: Len Brown <len.brown@intel.com>

So the only real change that touched the driver was 30823736162ff91512965e3c730557e34fa71d6d which just modifies sony_walk_callback that is not even called on resume.
Comment 15 fanderay 2009-04-30 20:06:42 UTC
I learned how to use bisect.  The first bad commit, according to this exercise, was:

----------------------------------------------------------------
commit b424e8d3b438e841cd1700f6433a100a5d611e4a
Merge: 7c7758f f6dc1e5
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Wed Jan 7 15:41:01 2009 -0800

    Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci
----------------------------------------------------------------

The bisection process was however complicated by the presence of another suspend problem which appeared intermittently between .28 and .29.  I'll call this Problem 1 since it seems to have appeared first chronologically, and the original resume problem Problem 2.  The symptoms of Problem 1 are a system hang during the suspend process, immediately after the line "Suspending console(s) (use no_console_suspend to debug)" is printed.  The screen stays alive but in all other respects the system is hard-frozen, e.g. Caps Lock does not cycle the keyboard LED.  The system never actually suspends and must be physically powered off.

The first thing I did was to bisect Problem 1.  That yielded this first bad commit:

----------------------------------------------------------------
commit 9ea09af3bd3090e8349ca2899ca2011bd94cda85
Author: Heiko Carstens <heiko.carstens@de.ibm.com>
Date:   Mon Dec 22 12:36:30 2008 +0100

    stop_machine: introduce stop_machine_create/destroy.
----------------------------------------------------------------

After that I repeated the original bisection but this time marked Problem 1 points as good.  That yielded the first bad commit for Problem 2 mentioned at the beginning.

It's interesting to note that at the second to last bisection point
(7c7758f99d39d529a64d4f60d22129bbf2f16d74) both suspend and resume were successful.  Then at the last bisection point (f6dc1e5e3d4b523e1616b43beddb04e4fb1d376a) Problem 1 reappeared.  Finally at b424e8d3b438e841cd1700f6433a100a5d611e4a Problem 2 took over.  The bisection log is attached for reference.
Comment 16 fanderay 2009-04-30 20:09:13 UTC
Created attachment 21173 [details]
resume failure bisection log
Comment 17 Rafael J. Wysocki 2009-05-13 10:33:38 UTC
Not-Handled-By : Mattia Dongili <malattia@linux.it>
Notify-Also : Mattia Dongili <malattia@linux.it>

Fanderay, can you please check if the kernel where commit e8c331e963c58b83db24b7d0e39e8c07f687dbc6 is the head works for you correctly?
Comment 18 fanderay 2009-05-14 21:27:53 UTC
Hi Rafael,

The kernel with HEAD e8c331e... exhibits Problem 1, i.e. the suspend itself fails.
Comment 19 Rafael J. Wysocki 2009-05-15 21:20:03 UTC
Thanks for testing!

Can you also test the kernel where the head is commit 9eff02e2042f96fb2aedd02e032eca1c5333d767, please?
Comment 20 fanderay 2009-05-16 09:32:26 UTC
HEAD 9eff02e... also yields a Problem 1 kernel.
Comment 21 Rafael J. Wysocki 2009-05-16 20:11:52 UTC
I'm out of ideas.

Could you carry out a bisection between 9eff02e and the last known good kernel (presumably 2.6.28) ?
Comment 22 fanderay 2009-05-16 22:03:13 UTC
Will that actually be of any help?  It would seem that at best it would give more information about Problem 1, but that was fixed long ago anyway.
Comment 23 Rafael J. Wysocki 2009-05-16 22:11:48 UTC
Ah, sorry, my bad.

In that case the bisection isn't really going to help.

We'll need to figure out what happens to the sony-laptop driver during suspend in your box, then.

Mattia, do you have any ideas how to debug this?
Comment 24 Mattia Dongili 2009-05-17 08:54:54 UTC
Not really actually, I suspect some bad interaction with something else.
What I'd do though is comment out parts of the resume function and see which bit causes the problem.

Fanderay,
There are four blocks in the sony_nc_resume function in 2.6.29, try to comment them out. All of them first and then uncomment them one by one from top to bottom.
If you build sony-laptop as a module the process will be faster.

thanks
Mattia
Comment 25 fanderay 2009-05-17 09:38:01 UTC
Hi Mattia,

Good call.  I did as you suggested and the results seem to point to this block as the problem:

        /* set the last requested brightness level */
        if (sony_backlight_device &&
                        !sony_backlight_update_status(sony_backlight_device))
                printk(KERN_WARNING DRV_PFX "unable to restore brightness level\n");
Comment 26 Mattia Dongili 2009-05-17 13:09:14 UTC
Hah! and now why has this stopped working?
Let's ask on linux-acpi and see if anyone has any clue.
Comment 27 Zhang Rui 2009-05-18 01:19:47 UTC
what if you comment out these two lines?
can the laptop come back?
Comment 28 fanderay 2009-05-18 12:01:10 UTC
Yes, without those lines the resume succeeds; with them it fails (tested again with 30rc6).
Comment 29 Mattia Dongili 2009-05-18 13:57:23 UTC
Zhang,
Those two lines have been there since the dawn of time, could it be that some other change in ec.c (?) had a bad effect on restoring the brightness?

Is it worth maybe bisecting limiting the scope to drivers/acpi ?

Thanks,
Mattia
Comment 30 Zhang Rui 2009-05-19 06:29:50 UTC
please attach the acpidump output.

(In reply to comment #29)
> Is it worth maybe bisecting limiting the scope to drivers/acpi ?
> 

sounds reasonable.

Fanderay,
can you please git bisect drivers/acpi to see which commit introduces this regression?
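
A path-limited bisection of the kind suggested here could be started roughly as follows. This is a sketch, assuming the v2.6.28 (good) and v2.6.29 (bad) tags reported earlier in this bug; the helper name bisect_start_cmd is hypothetical and only assembles the command line so it can be reviewed before being run.

```shell
#!/bin/sh
# Build the command that starts a bisection restricted to commits that
# touch one path. Arguments: bad revision, good revision, path.
# (Hypothetical helper; it prints the command rather than running it.)
bisect_start_cmd() {
    bad="$1"
    good="$2"
    path="$3"
    echo "git bisect start $bad $good -- $path"
}

# Usage inside a kernel git tree (not executed here):
#   eval "$(bisect_start_cmd v2.6.29 v2.6.28 drivers/acpi)"
```

Limiting the path makes the bisection much shorter, but it can only find a culprit that actually lives under that path; a breaking change elsewhere in the tree would be missed.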
Comment 31 fanderay 2009-05-19 09:08:07 UTC
Created attachment 21426 [details]
acpidump
Comment 32 fanderay 2009-05-19 09:18:57 UTC
The problem with the bisections is the interference from Problem 1 between .28 and .29.  Since it's a suspend failure, it is impossible to tell for any bisection point that hits it whether resume is also failing.  Can you think of any way around this?

Another note: unfortunately it appears that resume failures are caused by more than just the backlight restore.  I'm currently running a 30rc6 kernel with the backlight lines removed; this seems to allow the resume to succeed when booting with init=/bin/bash, but today when I tried to resume under normal conditions (after suspend from vesafb console, X running on another vt) the resume produced a hardlock just as before.  I haven't yet looked into this further to see whether X, fbcon, etc. make any difference - one thing at a time...
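
One possible way around the interference from Problem 1, offered as a sketch rather than something that was tried in this bug, is git's standard "bisect skip" subcommand: a revision that cannot be judged (here, one where suspend itself hangs) is marked as untestable instead of good or bad. The helper bisect_verdict and its outcome labels are hypothetical.

```shell
#!/bin/sh
# Map an observed test outcome to the git bisect subcommand to use.
#   ok            suspend and resume both worked
#   resume-hang   Problem 2: hard freeze on resume
#   suspend-hang  Problem 1: hang during suspend, resume cannot be judged
# (Hypothetical helper and labels, for illustration only.)
bisect_verdict() {
    case "$1" in
        ok)           echo good ;;
        resume-hang)  echo bad ;;
        suspend-hang) echo skip ;;
        *)            echo "unknown outcome: $1" >&2; return 1 ;;
    esac
}

# Usage after testing the currently checked-out revision (not executed here):
#   git bisect "$(bisect_verdict suspend-hang)"
```

The trade-off is that if the skipped revisions sit next to the first bad commit, git can only report a range of candidate commits instead of a single one.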
Comment 33 Zhang Rui 2009-05-20 01:31:20 UTC
>        if (sony_backlight_device &&
>                        !sony_backlight_update_status(sony_backlight_device))


ACPI video backlight control methods are available on this laptop, which means that sony_backlight_device is NULL. So commenting the above two lines should have no effect...
please attach the output of "grep . /sys/class/backlight/*/*"

Plus, would you please change the above two lines to:
if (sony_backlight_device)
    if(!sony_backlight_update_status(sony_backlight_device))
and see if this helps.
I have run into this kind of problem before.
Comment 34 fanderay 2009-05-20 09:37:41 UTC
% grep . /sys/class/backlight/*/*
/sys/class/backlight/sony/actual_brightness:6
/sys/class/backlight/sony/bl_power:0
/sys/class/backlight/sony/brightness:6
/sys/class/backlight/sony/max_brightness:7
%
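
The output above shows a single backlight device named "sony", i.e. the platform-specific driver rather than ACPI video is providing backlight control on this kernel. A minimal sketch for listing the registered backlight devices by name follows; the function name backlight_drivers and the optional root argument are assumptions added for illustration.

```shell
#!/bin/sh
# List the names of registered backlight devices, one per line.
# The optional argument overrides the sysfs directory (useful for testing);
# it defaults to the standard /sys/class/backlight location.
# (Hypothetical helper name.)
backlight_drivers() {
    root="${1:-/sys/class/backlight}"
    for dir in "$root"/*/; do
        # If nothing matches, the glob stays literal and is skipped here.
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Usage (not executed here):
#   backlight_drivers
```

A name such as "sony" points at sony-laptop's own backlight code, while a name such as "acpi_video0" indicates the generic ACPI video driver.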
Comment 35 fanderay 2009-05-20 10:54:31 UTC
In 30rc6 the lines are slightly different:

        if (sony_backlight_device &&
                        sony_backlight_update_status(sony_backlight_device) < 0)
                printk(KERN_WARNING DRV_PFX "unable to restore brightness level\n");

If I change this to

        if (sony_backlight_device)
                if (sony_backlight_update_status(sony_backlight_device) < 0)
                        printk(KERN_WARNING DRV_PFX "unable to restore brightness level\n");

then the resulting object file is byte-for-byte identical with the original one according to cmp(1), so I guess it doesn't make a difference...
Comment 36 Zhang Rui 2009-05-21 01:41:04 UTC
There are two sets of ACPI video backlight control methods available.
One is for the integrated Intel graphics, which is not available on this laptop.
The other is for the external ATI graphics, but unfortunately the _DOS method is not implemented, which results in no ACPI backlight control.

So a simple solution is enabling the ACPI backlight control on this laptop instead of Sony platform specific methods, like the other sony laptops, so that we will not invoke sony_backlight_update_status any more.

But as you said, this is a regression, we'd better root cause the problem.
could you please run git-bisect to find out which commit introduces this regression?
Comment 37 fanderay 2009-05-21 10:05:08 UTC
As I explained in Comments # 15 and 32, interference from another problem makes bisection unreliable.  If you have a way around this then I'll try again.

I'm also concerned that removing the backlight_update_status call in sony_nc_resume is not solving the problem completely (Comment #32); is there a patch for "enabling the ACPI backlight control on this laptop
instead of Sony platform specific methods" that I can try to see if it helps?
Comment 38 Zhang Rui 2009-05-22 02:10:14 UTC
(In reply to comment #37)
> As I explained in Comments # 15 and 32, interference from another problem
> makes bisection unreliable.  If you have a way around this then I'll try again.
> 
sorry I forgot this.

> I'm also concerned that removing the backlight_update_status call in
> sony_nc_resume is not solving the problem completely (Comment #32); is there
> a patch for "enabling the ACPI backlight control on this laptop
> instead of Sony platform specific methods" that I can try to see if it helps?
no, there is no such kind of patch,
the way to fix it is using a customized DSDT.
I'll attach the DSDT later.
Comment 39 Zhang Rui 2009-05-22 02:11:57 UTC
Created attachment 21479 [details]
customized DSDT: using fake _DOS
Comment 40 Zhang Rui 2009-05-22 02:15:38 UTC
hmm, can you make sure sony_backlight_update_status is also invoked with the same parameter in 2.6.28?
Comment 41 Mattia Dongili 2009-05-22 07:56:22 UTC
(In reply to comment #40)
> hmm, can you make sure sony_backlight_update_status is also invoked with the
> same parameter in 2.6.28?

Zhang,

see comment #14, there has been almost no change to sony-laptop between .28 and .29, the only change is related to code that is run at load time when the driver lists the available methods in the SNC device definition.
Comment 42 fanderay 2009-05-22 11:40:28 UTC
Results with Zhang's custom DSDT:

1. Boot with init=/bin/bash: resume succeeds.
2. Boot normally: resume fails with a hardlock.

These are the same results as those obtained by commenting out the call to sony_backlight_update_status in sony_nc_resume.

Note: The "normal boot" case for me means booting to standard multiuser mode with no X and an ordinary VGA console (no framebuffer).  This is a Debian system.

One minor additional piece of data about the failure case with the custom DSDT: after the hardlock occurs and the power is physically cycled, the system comes up to the BIOS boot screen with the LCD brightness set to 0.  This doesn't happen in other cases.
Comment 43 Zhang Rui 2009-05-25 08:06:02 UTC
could you please set CONFIG_ACPI_DEBUG,
rebuild with the custom DSDT,
boot with init=/bin/bash, acpi.debug_layer=0xffffffff, acpi.debug_level=0x07
and attach the dmesg output after boot.
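
Before collecting the log it can help to confirm that the kernel really was booted with the requested debug parameters. The following is a minimal sketch; the function name acpi_debug_args_present and the optional file argument are assumptions, while the two parameter values are the ones requested above.

```shell
#!/bin/sh
# Succeed only if both ACPI debug parameters requested above appear on the
# kernel command line. The optional argument overrides /proc/cmdline.
# (Hypothetical helper name.)
acpi_debug_args_present() {
    cmdline_file="${1:-/proc/cmdline}"
    for arg in acpi.debug_layer=0xffffffff acpi.debug_level=0x07; do
        # Split on spaces and require an exact, literal whole-word match.
        tr ' ' '\n' < "$cmdline_file" | grep -qxF "$arg" || return 1
    done
    return 0
}

# Usage after booting the CONFIG_ACPI_DEBUG kernel (not executed here):
#   acpi_debug_args_present && dmesg > dmesg-acpi-debug
```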
Comment 44 fanderay 2009-05-25 13:03:57 UTC
Created attachment 21532 [details]
dmesg-acpi-debug
Comment 45 Rafael J. Wysocki 2009-05-25 23:20:12 UTC
On Monday 25 May 2009, Mattia Dongili wrote:
> On Sun, May 24, 2009 at 09:11:50PM +0200, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> > 
> > The following bug entry is on the current list of known regressions
> > from 2.6.29.  Please verify if it still should be listed and let me know
> > (either way).
> > 
> > 
> > Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=13148
> > Subject             : resume after suspend-to-ram broken on Sony Vaio VGN-SR19VN when sony-laptop driver present
> > Submitter   : fanderay <fanderay4@googlemail.com>
> > Date                : 2009-04-22 14:39 (33 days old)
> 
> sorry for not mentioning this before, but it looks like this regression
> was introduced between .28 and .29
Comment 46 Zhang Rui 2009-05-26 02:01:44 UTC
(In reply to comment #43)
> could you please set CONFIG_ACPI_DEBUG,
> rebuild with the custom DSDT,
> boot with init=/bin/bash, acpi.debug_layer=0xffffffff, acpi.debug_level=0x07
> and attach the dmesg output after boot.

oops, what I really want is the dmesg output after resume,
can you reattach that info please?
Comment 47 fanderay 2009-05-26 09:19:51 UTC
Created attachment 21557 [details]
dmesg-acpi-debug after s2ram and resume
Comment 48 Zhang Rui 2009-06-18 09:22:01 UTC
sorry, the DSDT in comment #39 is way off the target.

According to the comment #15,
you don't have any suspend/resume problem before f6dc1e5e3d4b523e1616b43beddb04e4fb1d376a is applied, is that true?
Comment 49 fanderay 2009-06-30 09:29:51 UTC
I can't really say anything more beyond that comment and the attached bisection log.  Also since all of those tests were carried out only with init=/bin/bash, even the "working" cases should be viewed with a grain of salt in light of Comment #42.
Comment 50 fanderay 2009-07-03 13:33:33 UTC
I just moved to .30.1 and something significant seems to have changed: now resume succeeds when booting with init=/bin/bash, although it still hardlocks on a normal boot ("normal" as described in Comment #42).  Perhaps some module is causing problems.
Comment 51 Mattia Dongili 2009-07-03 14:54:22 UTC
Could it be some resume scriptlet in the suspend utilities from Debian?

Also, out of curiosity (I don't think it has been tried before), does booting with "acpi_backlight=vendor" help?

Thanks
-- mattia
Comment 52 fanderay 2009-08-01 10:44:11 UTC
Hi Mattia,

Moved to .30.4 and results are the same.  It doesn't seem like it would be a script, since things appear to hardlock immediately before even hardware-level functions are restored (after pressing the resume button, the hard drive light comes on for a moment, then goes off, and the screen stays black, keyboard input is not recognized and caps lock does not even cycle the keyboard LED), but anyway I killed acpid which is the driver for such scripts and it didn't change anything.  Nor does booting with acpi_backlight=vendor (wouldn't this mean to use the sony rather than generic ACPI backlight control? if so, that's what happens on this system anyway even without this option.)
Comment 53 fanderay 2009-08-02 11:51:09 UTC
(Er, just realized I was being daft in Comment #50 - I'd simply built with sony-laptop as a module, so naturally resume worked with init=/bin/bash since it wasn't loaded.  The real state of things is still as summarized in Comment #42.)
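
To avoid this kind of mix-up, it is worth checking whether the module is actually loaded before each suspend test. A minimal sketch follows; the function name module_loaded and the optional file argument are assumptions. Note that a driver built into the kernel (=y) is not listed in /proc/modules, so this check only applies to the =m case.

```shell
#!/bin/sh
# Succeed if the named module appears in the loaded-module list.
# Arguments: module name, optional path overriding /proc/modules.
# (Hypothetical helper name.)
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # The first field of each /proc/modules line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Usage before suspending (not executed here):
#   module_loaded sony_laptop && echo "sony-laptop is loaded"
```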
Comment 54 Marcus 2009-08-18 13:54:36 UTC
Any progress on this? Do you need any testing done? I have a vgn-sr19vn
Comment 55 fanderay 2009-08-26 12:41:16 UTC
Hi Marcus,

I think it would be hugely useful if you have a SR19VN that you can help test with!  The most basic question is, does resume after s2ram work for you on that system?
Comment 56 Zhang Rui 2009-09-03 06:51:31 UTC
Hi, Mattia, what's the status of this bug?
Comment 57 Marcus 2009-09-03 07:33:29 UTC
(In reply to comment #55)
> Hi Marcus,
> 
> I think it would be hugely useful if you have a SR19VN that you can help test
> with!  The most basic question is, does resume after s2ram work for you on
> that system?

It is not working: blank screen and I have to hard reboot
Comment 58 Zhang Rui 2009-09-15 05:34:07 UTC
*** Bug 13225 has been marked as a duplicate of this bug. ***
Comment 59 Ari Alanko 2009-09-16 10:51:08 UTC
Hi,

Suspend to RAM worked prior to 2.6.27.12 on my VGN-SR19VN laptop. Git bisect tells me that
5f94bb6eda87dcd0136ceb52b62c03ebbb651443 is first bad commit
commit 5f94bb6eda87dcd0136ceb52b62c03ebbb651443
Author: Rafael J. Wysocki <rjw@sisk.pl>
Date:   Wed Jan 14 00:38:27 2009 +0100

    PCI: Suspend and resume PCI Express ports with interrupts disabled
    
    commit 90d25f246ddefbb743764f8d45ae97e545a6ee86 upstream

SELinux breaks sony-laptop-module in 2.6.27.11 kernel with Fedora 11.
I tested using the instructions from the description and I can test patches.
Comment 60 Rafael J. Wysocki 2009-09-16 21:37:17 UTC
I'm not sure at all if this is the same bug and I wouldn't expect this particular commit to cause problem.

I have an idea to test, but you'll need to test 2.6.31 before that.  So, please try 2.6.31 and let me know if that works.
Comment 61 Ari Alanko 2009-09-18 17:21:31 UTC
Hi,

2.6.31 didn't work.
Comment 62 Rafael J. Wysocki 2009-09-18 20:44:38 UTC
Created attachment 23116 [details]
PCI: Clear saved_state after the state has been restored

Please try this patch on top of 2.6.31 and see if it changes anything.
Comment 63 Ari Alanko 2009-09-19 12:46:20 UTC
(In reply to comment #62)
> Created an attachment (id=23116) [details]
> PCI: Clear saved_state after the state has been restored
> 
> Please try this patch on top of 2.6.31 and see if it changes anything.

Nope, didn't work. No changes.
Comment 64 Rafael J. Wysocki 2009-09-21 21:49:52 UTC
Created attachment 23134 [details]
ACPI: Always try to get control of the PCIe capability structure from the BIOS

Please try this patch too (preferably along with the previous one).
Comment 65 Ari Alanko 2009-09-27 11:00:11 UTC
(In reply to comment #64)
> Created an attachment (id=23134) [details]
> ACPI: Always try to get control of the PCIe capability structure from the
> BIOS
> 
> Please try this patch too (preferably along with the previous one).

I tried with and without #62 on 2.6.31. Didn't work. I also tested both, #62 and #64, on 2.6.31.1 and no changes.
Comment 66 Rafael J. Wysocki 2009-09-27 19:16:55 UTC
Hmm.   Mattia, what's Sony PIC?
Comment 67 Rafael J. Wysocki 2009-09-27 20:38:44 UTC
Created attachment 23193 [details]
ACPI / sony-laptop: Suspend late and resume early

If it is what I think it is, it likely should be resumed before the PCIe ports.

Please try this patch (on top of 2.6.31) and report back (I couldn't test it).
Comment 68 Ari Alanko 2009-09-30 08:05:23 UTC
(In reply to comment #67)
> Created an attachment (id=23193) [details]
> ACPI / sony-laptop: Suspend late and resume early
> 
> If it is what I think it is, it likely should be resumed before the PCIe
> ports.
> 
> Please try this patch (on top of 2.6.31) and report back (I couldn't test
> it).

Tested patch on top of clean 2.6.31 and didn't work.
Comment 69 Rafael J. Wysocki 2009-09-30 21:06:47 UTC
Too bad.

I'm out of ideas for now.  At least I can't figure out in what way sony-laptop may depend on PCIe root ports and vice versa.  If anyone can explain that to me, I'll appreciate it very much.
Comment 70 Rafael J. Wysocki 2009-09-30 21:11:34 UTC
Ari, can you please check if the problem is present in the kernel where mainline commit c70e0d9dfef3d826c8ae4f7544acc53887cb161d is the head (ie. download the Linus' git tree, do 'git checkout c70e0d9dfef3d826c8ae4f7544acc53887cb161d', compile and install the resulting kernel and see if resume works)?
Comment 71 Ari Alanko 2009-10-05 21:27:55 UTC
(In reply to comment #70)
> Ari, can you please check if the problem is present in the kernel where
> mainline commit c70e0d9dfef3d826c8ae4f7544acc53887cb161d is the head (ie.
> download the Linus' git tree, do 'git checkout
> c70e0d9dfef3d826c8ae4f7544acc53887cb161d', compile and install the resulting
> kernel and see if resume works)?

Hi, suspend halts/crashes somewhere between the "Suspending console(s)" text and the display shutting off; the Caps Lock LED doesn't respond. I tested with and without the sony-laptop module.
Comment 72 Rafael J. Wysocki 2009-10-05 22:23:27 UTC
So, I'm not sure the results of -stable bisection can be applied to the mainline.  Let's go back to the top of the git, then.

Can you confirm that suspend/resume works with 2.6.32-rc3 without sony-laptop and doesn't work with sony-laptop?
Comment 73 Ari Alanko 2009-10-06 19:59:21 UTC
(In reply to comment #72)
> So, I'm not sure the results of -stable bisection can be applied to the
> mainline.  Let's go back to the top of the git, then.
> 
> Can you confirm that suspend/resume works with 2.6.32-rc3 without sony-laptop
> and doesn't work with sony-laptop?

Tested and confirmed.
Comment 74 Rafael J. Wysocki 2009-10-06 21:00:34 UTC
Thanks.

So, it looks like we need to focus on sony-laptop and find out why it causes resume to fail.

I'm not really familiar with this driver, so I need some time to understand how it works.
Comment 75 Mattia Dongili 2009-10-06 22:22:41 UTC
Rafael, sony-laptop doesn't do much on resume, really.

My understanding was that resuming succeeds with init=/bin/bash (is the case for single user mode too?). Is it still true?

Could it be that it's not really the resume process that triggers the bug but some script poking at the sysfs files from sony-laptop?

I sent a couple of patches that are now in the acpi-test tree, they are on the linux-acpi list, this one specifically could help:
http://www.spinics.net/lists/linux-acpi/msg24620.html
Could you give it a try?

Thanks
Comment 76 Rafael J. Wysocki 2009-10-06 22:55:25 UTC
(In reply to comment #75)
> Rafael, sony-laptop doesn't do much on resume, really.

I know, but apparently something it does affects suspend/resume.  To be precise, it probably affects the BIOS which then triggers the issue.

Of course, it also is possible that one of the sony-laptop scripts does something wrong.  To rule this out, can you please advise Ari how to disable those scripts without unloading the sony-laptop driver?

Ari, did you try hibernation on this box?
Comment 77 Mattia Dongili 2009-10-07 23:00:54 UTC
I don't know of any sony-laptop specific scripts, I was more thinking of the distribution specific resume callbacks that restore brightness or rfkill statuses.
But I just re-read the whole thread and it looks like booting with init=/bin/bash actually never succeeded. What I was going to suggest if it was working was to resume and then poke at the files manually to see if anything there is causing the failure.

Rafael, apologies for missing your previous question at #66, Sony PIC is a PCI device that was used to control most of the special features on sony laptops. The SR series is not equipped with it so only the SNC (SNY5001 in the DSDT) part of the sony-laptop driver is active.

Something else we could start with is commenting out the SNC driver registration in sony-laptop (see attached patch) and see how that goes with a suspend/resume cycle... Sounds dumb but I have no real clues about what's wrong, maybe one of the initializations we do in sony_nc_function_setup needs to be undone when suspending.
Comment 78 Mattia Dongili 2009-10-07 23:02:03 UTC
Created attachment 23306 [details]
do not initialize SNC
Comment 79 Ari Alanko 2009-10-09 11:14:39 UTC
(In reply to comment #76)
> (In reply to comment #75)
> > Rafael, sony-laptop doesn't do much on resume, really.
> 
> I know, but apparently something it does affects suspend/resume.  To be
> precise, it probably affects the BIOS which then triggers the issue.
> 
> Of course, it also is possible that one of the sony-laptop scripts does
> something wrong.  To rule this out, can you please advise Ari how to disable
> those scripts without unloading the sony-laptop driver?
> 
> Ari, did you try hibernation on this box?

Hibernation works
Comment 80 Ari Alanko 2009-10-12 21:13:29 UTC
(In reply to comment #78)
> Created an attachment (id=23306) [details]
> do not initialize SNC

I tested patch on top of clean 2.6.31.3 and suspend/resume to ram works, but there is no control of brightness.
Comment 81 Ari Alanko 2009-10-12 21:23:58 UTC
(In reply to comment #75)
> Rafael, sony-laptop doesn't do much on resume, really.
> 
> My understanding was that resuming succeeds with init=/bin/bash (is the case
> for single user mode too?). Is it still true?
> 
> Could it be that it's not really the resume process that triggers the bug but
> some script poking at the sysfs files from sony-laptop?
> 
> I sent a couple of patches that are now in the acpi-test tree, they are on
> the linux-acpi list, this one specifically could help:
> http://www.spinics.net/lists/linux-acpi/msg24620.html
> Could you give it a try?

Tested on top of clean 2.6.31.3 and resume didn't work.
Comment 82 Mattia Dongili 2009-10-13 11:45:54 UTC
(In reply to comment #80)
> (In reply to comment #78)
> > Created an attachment (id=23306) [details]
> > do not initialize SNC
> 
> I tested patch on top of clean 2.6.31.3 and suspend/resume to ram works, but
> there is no control of brightness.

ok, this gives us a starting point. Could you revert the patch and try the one I am about to post?

Thanks
Comment 83 Mattia Dongili 2009-10-13 11:46:48 UTC
Created attachment 23384 [details]
disable hot keys setup
Comment 84 Ari Alanko 2009-10-13 13:36:28 UTC
(In reply to comment #83)
> Created an attachment (id=23384) [details]
> disable hot keys setup

Resume didn't work.
Comment 85 Zhang Rui 2009-12-04 03:23:50 UTC
Mattia,
what's the status of this bug?
Ari,
the problem still exists in the latest kernel, i.e. 2.6.32, right?
Comment 86 Mattia Dongili 2009-12-04 04:09:04 UTC
(In reply to comment #85)
> Mattia,
> what's the status of this bug?

Still open as far as i know.
I started basically a binary search to figure out what is breaking suspend in sony-laptop but never had time to follow up.

I'll try to do so after Ari confirms.
Comment 87 Jim McElwaine 2009-12-13 23:25:56 UTC
I have a Sony PCG-6122M  (VAIO Z51X) running Fedora 12
Linux localhost.localdomain 2.6.31.6-166.fc12.x86_64 #1 SMP Wed Dec 9 10:46:22 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

The suspend works fine if I unload the sony_laptop module but fails to shutdown if the module is loaded
Comment 88 Zhang Rui 2009-12-15 02:23:59 UTC
(In reply to comment #85)
> Ari,
> the problem still exists in the latest kernel, i.e. 2.6.32, right?

ping ari...
Comment 89 Zhang Rui 2009-12-22 03:30:30 UTC
please re-open it if the problem still exists in the latest git kernel.
Comment 90 fanderay 2010-03-16 12:52:35 UTC
Reopening this bug as the problem continues to exist as of 2.6.33.1.  Booting with init=/bin/bash and sony-laptop driver loaded, resume succeeds.  Booting normally (multiuser state but without X), resume fails with a hard lockup.
Comment 91 Zhang Rui 2010-03-17 08:12:01 UTC
Created attachment 25563 [details]
do sony resume in pm notifier chain

please apply this patch on top of 2.6.34-rc1 and see if it helps.
Comment 92 Zhang Rui 2010-03-25 06:23:30 UTC
ping fanderay...
Comment 93 Jim McElwaine 2010-03-25 16:36:35 UTC
I've been running fedora 13 for a while now and the problem is the same.
I've tried compiling the git kernel several times but my machine won't boot. I've spent a lot of time trying to do this but can't get a working .config. If anyone can give me a pointer on how to do this I'd be grateful. I've compiled plenty of kernels in the past and never had this problem.
Comment 94 fanderay 2010-03-30 02:55:03 UTC
Hi Zhang Rui, thanks for the patch.  Unfortunately it does not change the result: resume fails with the same hard lockup.  Following the failed resume, the system starts with the LCD backlight brightness level set to 0 and has to be manually corrected.  [I believe this symptom is coincident with a change that was made some time ago that results in the ACPI rather than the platform-specific backlight driver being used on this system (i.e. /sys/class/backlight for some time now has acpi_video0 instead of sony).]
Comment 95 fanderay 2010-04-09 11:55:43 UTC
Just wanted to emphasize an important point about this problem in case it got lost in the noise:

Originally the hard lockup on resume occurred even when booting with init=/bin/bash provided the sony-laptop driver was loaded.  As we saw in Comments #25 - #28, commenting out the backlight restore seemed to fix the problem *when booting with init=/bin/bash*.  Also, using the ACPI rather than sony-specific backlight routines (which now happens by default on this system in current kernels) yields a successful resume *when booting with init=/bin/bash*.

However, on a normal system boot (but without X/fbcon), the resume always fails.  I have never seen it succeed under these conditions, except possibly as far back as .26 or so.

It therefore seems possible that there are multiple causes for the resume failure, and that the sony backlight restore was just one of them (which is now "resolved" since the ACPI backlight control routines are now used instead).  This raises the question of whether the current resume failure has anything to do with the sony-laptop driver at all; it may not.

This suggests a couple of other tests, like verifying whether resume still fails in current kernels even without the sony-laptop driver loaded.  I don't have much time to do this kind of testing at the moment, so it would be helpful if others with this laptop can assist.

It would also be good for the ACPI developers to start looking beyond sony-laptop for possible causes.
Comment 96 Florian Mickler 2010-12-17 20:36:43 UTC
Still a problem in kernels younger than 2.6.36 ?
Comment 97 fanderay 2011-03-05 18:40:25 UTC
This issue appears to have been partially fixed in 2.6.35.x and fully fixed in 2.6.36.x.  In .35 it did not hard-lock on resume but resume caused a continuous stream of radeon-related errors; it was possible to shut down afterward.  In .36 resume seemed to work correctly for the first time.  All my tests were with KMS enabled.
Comment 98 Florian Mickler 2011-03-05 23:40:51 UTC
Ok, I'm closing it as unreproducible, the fix obviously made it already to the stable trees, so nothing to worry about.

Thanks for testing!