Bug 12121

Summary: Hibernation regression on Toshiba Portege R500
Product: Drivers
Reporter: Rafael J. Wysocki (rjw)
Component: PCI
Assignee: drivers_pci (drivers_pci)
Status: CLOSED CODE_FIX
Severity: normal
CC: akpm, elendil, jbarnes
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-rc6-git1
Subsystem:
Regression: No
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216
Attachments: Output of alsa-info.sh

Description Rafael J. Wysocki 2008-11-29 07:00:33 UTC
Latest working kernel version: 2.6.27
Earliest failing kernel version: 2.6.28-rc1 (AFAICS)
Distribution: openSUSE 11.0
Hardware Environment: Toshiba Portege R500
Problem Description:

Box hangs 50% of the time (or more) during resume from hibernation.

Steps to reproduce:

Hibernate and try to resume.
Comment 1 Rafael J. Wysocki 2008-11-29 07:06:23 UTC
Hibernation/resume works 100% of the time with snd_hda_intel unloaded, so this appears to be related to ALSA.
Comment 2 Rafael J. Wysocki 2008-11-29 16:00:43 UTC
I identified all patches that modified the files in sound/pci/hda between 2.6.27 and 2.6.28-rc1.  I checked out 2.6.28-rc1 and reverted them all except for one that is necessary for the kernel to compile.  Then I built the kernel and retested, and hibernation/resume worked 100% of the time.

Tomorrow I'm going to carry out bisection within these patches (there are only 91 of them, so it shouldn't take a lot of time).
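
For reference, a minimal sketch of how the patches in question could be listed and bisected with git. This is not necessarily the procedure used above: it swaps in a path-limited `git bisect` and assumes a clone of the mainline tree with the v2.6.27 and v2.6.28-rc1 tags. The function names and the GIT override (set GIT=echo to preview the commands) are hypothetical.

```shell
# Sketch only: run from inside a clone of the mainline kernel tree.
# GIT is a hypothetical override; GIT=echo prints the git arguments
# instead of running git.

# List the commits that touched sound/pci/hda between the two releases.
list_hda_patches() {
    ${GIT:-git} log --oneline v2.6.27..v2.6.28-rc1 -- sound/pci/hda
}

# Start a bisection limited to commits touching that directory
# (bad revision first, then the good one).
start_hda_bisect() {
    ${GIT:-git} bisect start v2.6.28-rc1 v2.6.27 -- sound/pci/hda
}
```

After each test boot, the result would be recorded with `git bisect good` or `git bisect bad` until git names a single commit.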
Comment 3 Takashi Iwai 2008-11-30 00:58:42 UTC
Please run alsa-info.sh with --no-upload option, and attach the generated file.
This will contain detailed codec information.
The script is found in
    http://www.alsa-project.org/alsa-info.sh
Comment 4 Takashi Iwai 2008-11-30 01:53:28 UTC
Also, try sound git tree for-linus branch
    git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git
There are some pending patches that are not yet in the upstream.
Also it'd be helpful to try master branch of sound git tree as well.
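
A possible way to pull these branches into an existing kernel clone is sketched below. The remote name "sound", the "try-" branch prefix, the function name and the GIT override (GIT=echo previews the commands) are assumptions made for illustration; only the repository URL and the branch names come from this comment.

```shell
# Sketch only: run from inside an existing kernel git clone.
# Usage: fetch_sound_branch [for-linus|master]
fetch_sound_branch() {
    branch=${1:-for-linus}
    # Register the sound tree as an extra remote, fetch it, and create a
    # local test branch tracking the requested remote branch.
    ${GIT:-git} remote add sound \
        git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git &&
    ${GIT:-git} fetch sound &&
    ${GIT:-git} checkout -b "try-$branch" "sound/$branch"
}
```

Calling `fetch_sound_branch` with no argument checks out for-linus; `fetch_sound_branch master` checks out master. A second call in the same clone would need the `remote add` step skipped, since the remote already exists.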
Comment 5 Rafael J. Wysocki 2008-11-30 05:27:12 UTC
Created attachment 19077
Output of alsa-info.sh

Output of 'alsa-info.sh --no-upload' attached as requested.
Comment 6 Rafael J. Wysocki 2008-11-30 05:28:11 UTC
(In reply to comment #4)
> Also, try sound git tree for-linus branch
>     git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git
> There are some pending patches that are not yet in the upstream.
> Also it'd be helpful to try master branch of sound git tree as well.

Will do, thanks.
Comment 7 Rafael J. Wysocki 2008-11-30 05:28:38 UTC
Handled-By : Takashi Iwai <tiwai@suse.de>
Comment 8 Rafael J. Wysocki 2008-11-30 12:10:52 UTC
Comment #2 is wrong: resume from hibernation also fails with all of the sound/pci/hda patches from between 2.6.27 and 2.6.28-rc1 reverted.

However, resume from hibernation still succeeds 100% of the time with snd_hda_intel unloaded.

I'm now going to follow the advice from comment #4 and try the new patches.
Comment 9 Rafael J. Wysocki 2008-11-30 12:41:33 UTC
The patches from the for-linus branch do not fix the problem.
Comment 10 Rafael J. Wysocki 2008-11-30 13:00:43 UTC
The patches from the master branch do not fix it either.

It appears that the breakage was caused by a patch outside of the ALSA tree, but unfortunately quite a few pre-2.6.28-rc1 patches break hibernation on this box.
Comment 11 Takashi Iwai 2008-11-30 23:12:09 UTC
Thanks for checking.  I also don't think that it comes from the ALSA side, because there have been few fundamental PM changes for Realtek codecs.  (I had guessed it was a Sigmatel/IDT codec.)

The major difference between unloading and suspend is that on suspend the driver calls pci_set_power_state(), for example:
    pci_set_power_state(pci, pci_choose_state(pci, state));
Could this be the culprit?

BTW, you can set the power_save=2 option for snd-hda-intel.  This enables automatic power saving after two seconds of inactivity.  In your case, make sure that all unnecessary mixer elements are muted; otherwise it might not be triggered.

I mention power_save because it is essentially an in-driver suspend/resume without the core PM, and I'm interested in whether it works at all.
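
The power_save suggestion above could be tried roughly as follows. This is a sketch that assumes the driver was built with power-saving support and exposes the parameter under /sys/module/snd_hda_intel/parameters/. The function name and the optional second argument (an alternative path, handy for a dry run on a scratch file) are hypothetical.

```shell
# Default location of the parameter once snd_hda_intel is loaded
# (assumes sysfs is mounted on /sys; writing to it needs root).
HDA_POWER_SAVE=/sys/module/snd_hda_intel/parameters/power_save

# Usage: set_hda_power_save SECONDS [PARAM_FILE]
# Writes the idle timeout in seconds to the parameter file.
set_hda_power_save() {
    seconds=$1
    param=${2:-$HDA_POWER_SAVE}
    echo "$seconds" > "$param"
}

# Load-time alternative: modprobe snd-hda-intel power_save=2
```

Running `set_hda_power_save 2` as root would request the two-second timeout on an already loaded driver.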
Comment 12 Rafael J. Wysocki 2008-12-01 12:31:53 UTC
(In reply to comment #11)

First of all, it turns out not to be a regression from 2.6.27 (see below).

> Thanks for checking.  I also don't think that it comes from the ALSA side,
> because there have been few fundamental PM changes for Realtek codecs.
> (I had guessed it was a Sigmatel/IDT codec.)

The chip in the affected box is Intel 82801G (ICH7) (rev 02)

> The major difference between unloading and suspend is that on suspend the
> driver calls pci_set_power_state(), for example:
>     pci_set_power_state(pci, pci_choose_state(pci, state));
> Could this be the culprit?

No, I don't think so (BTW, you can change that to 'pci_set_power_state(pci, PCI_D3hot);', because you don't call pci_enable_wake() anyway; also you don't need to free IRQs on resume).

Now, more info about the bug:
- I can say for certain that 2.6.27-rc6 and all of the later kernels (including current mainline and -stable) are affected.
- So far, I haven't been able to reproduce the breakage with 2.6.27-rc3 (still under stress testing).
- I am unable to reproduce it with snd_hda_intel unloaded with any of the otherwise affected kernels.
- In fact it is rather hard to reproduce.  Sometimes it happens on the first attempt to resume from hibernation, sometimes on the third or even fourth one.
- The symptom is that it doesn't resume.  Sometimes it hangs solid during resume and hard reset is necessary.  Sometimes it hangs during resume, but it can be rebooted with the magic SysRq (no useful debug info, though).  Sometimes, it just powers off immediately after trying to switch to the image kernel.
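
Because the failure only shows up after several attempts, repeated cycles are needed to tell an affected kernel from a good one. The report does not say how the cycles were run; the loop below is only a sketch. Its default command, `rtcwake -m disk -s 120`, assumes util-linux's rtcwake is installed and that the RTC can wake this machine from hibernation. The function name and the optional command argument are hypothetical.

```shell
# Usage: hibernate_cycles COUNT [COMMAND]
# Runs COMMAND (default: hibernate and wake via the RTC after 120 s)
# COUNT times, printing the cycle number before each attempt.
hibernate_cycles() {
    count=$1
    hibernate=${2:-"rtcwake -m disk -s 120"}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "cycle $i of $count"
        # Stop at the first failure reported by the command itself;
        # a hang during resume simply never returns here.
        $hibernate || return 1
        i=$((i + 1))
    done
}
```

Run as root, for example `hibernate_cycles 10`, once with snd_hda_intel loaded and once after `modprobe -r snd_hda_intel`. Redirecting the output to a file that is synced to disk would preserve the last cycle number across a hard reset.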
Comment 13 Rafael J. Wysocki 2008-12-01 12:33:42 UTC
"also you don't need to free IRQs on resume" -> that should be "on suspend", sorry.
Comment 14 Rafael J. Wysocki 2008-12-01 18:23:07 UTC
The problem is independent of snd_hda_intel.  Most probably it was introduced by the following commit:

commit 5f17cfce5776c566d64430f543a289e5cfa4538b
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Sep 4 01:33:59 2008 -0700

    PCI: fix pbus_size_mem() resource alignment for CardBus controllers
Comment 15 Rafael J. Wysocki 2008-12-01 18:25:23 UTC
Not-Handled-By : Takashi Iwai <tiwai@suse.de>
Notify-Also : Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Notify-Also : Jesse Barnes <jbarnes@virtuousgeek.org>
Notify-Also : Andrew Morton <akpm@linux-foundation.org>
Comment 16 Rafael J. Wysocki 2008-12-03 15:32:54 UTC
References : http://marc.info/?l=linux-kernel&m=122818451003644&w=4
Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
Notify-Also : Frans Pop <elendil@planet.nl>
Comment 17 Frans Pop 2008-12-04 03:15:34 UTC
*** Bug 11545 has been marked as a duplicate of this bug. ***
Comment 18 Rafael J. Wysocki 2008-12-07 14:34:34 UTC
This turned out to be an old problem with handling suspend-resume of PCI bridges and PCI Express ports.  For this reason, I'm dropping it from the list of regressions introduced between 2.6.26 and 2.6.27.

For completeness, patches fixing the problem for me have been posted in this thread: http://lkml.org/lkml/2008/12/6/69
Comment 19 Frans Pop 2009-08-01 19:06:16 UTC
Rafael,

AFAIK this BR could be closed as it should be fixed with the suspend/resume rework series you did for 2.6.29.

Cheers,
FJP
Comment 20 Rafael J. Wysocki 2009-08-01 19:20:28 UTC
Yes, it can be closed now, thanks for the ping.