Bug 13886

Summary: Suspend to disk no longer works in 2.6.30.2 with an EIDE drive
Product: IO/Storage
Reporter: akwatts
Component: IDE
Assignee: io_ide (io_ide)
Status: CLOSED INVALID
Severity: high
CC: akpm, pavel, rjw
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.30.2
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 7216, 13070

Description akwatts 2009-08-01 10:12:50 UTC
s2disk worked in 2.6.28 and no longer works in 2.6.30.2

(s2disk here means: echo disk > /sys/power/state).

Screen blanks and system hangs with no writes to disk. Hard restart
required afterwards.

Bisection identifies the problem commit as
295f00042aaf6b553b5f37348f89bab463d4a469.
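
The trigger described above (writing "disk" to /sys/power/state) can be sketched in Python as follows. This is a minimal illustration, not part of the original report: the function names `supports_hibernate` and `request_hibernate` are hypothetical, and the sketch assumes the usual sysfs behaviour where reading /sys/power/state returns a space-separated list of supported sleep states. Writing to the file needs root, so the write is wrapped in a function that is never called here.

```python
from pathlib import Path


def supports_hibernate(state_text: str) -> bool:
    """Return True if 'disk' appears among the advertised sleep states.

    `state_text` is the content of /sys/power/state, for example
    "freeze mem disk\n" (assumed space-separated, as on typical kernels).
    """
    return "disk" in state_text.split()


def request_hibernate(state_file: Path = Path("/sys/power/state")) -> None:
    """Hypothetical helper: equivalent of `echo disk > /sys/power/state`.

    Requires root. Raises instead of writing if the kernel does not
    advertise hibernation support.
    """
    if not supports_hibernate(state_file.read_text()):
        raise RuntimeError(f"'disk' is not listed in {state_file}")
    state_file.write_text("disk")
```

Checking the advertised states first separates "hibernation is not available at all" from the failure reported here, where the write is accepted and the machine then hangs.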
Comment 1 Andrew Morton 2009-08-01 10:22:03 UTC
marked as regression, cc'ed s2d developers.
Comment 2 Rafael J. Wysocki 2009-08-01 10:52:07 UTC
Caused by:

commit 295f00042aaf6b553b5f37348f89bab463d4a469
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Fri Jan 2 16:12:48 2009 +0100

    ide: don't execute the next queued command from the hard-IRQ context (v2)

First-Bad-Commit : 295f00042aaf6b553b5f37348f89bab463d4a469
Comment 3 Bartlomiej Zolnierkiewicz 2009-08-01 12:50:21 UTC
On Saturday 01 August 2009 12:52:08 bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=13886
>
> --- Comment #2 from Rafael J. Wysocki <rjw@sisk.pl>  2009-08-01 10:52:07 ---
> Caused by:
> 
> commit 295f00042aaf6b553b5f37348f89bab463d4a469
> Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date:   Fri Jan 2 16:12:48 2009 +0100
> 
>     ide: don't execute the next queued command from the hard-IRQ context (v2)
> 
> First-Bad-Commit : 295f00042aaf6b553b5f37348f89bab463d4a469

Hi Rafal,

We have been through one similar case already, please see:

	http://bugzilla.kernel.org/show_bug.cgi?id=13371

s2d bisection will always point to this commit because it really did break s2d
for the "whole" 12 days during the early -rc phase.  The fix:

commit 2ea5521022ac8f4f528dcbae02668e02a3501a5a
Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date:   Wed Jan 14 19:19:04 2009 +0100

    ide: fix suspend regression

Last time I spent about a week working with the reporter to find the real
bugger (in the e100 networking driver; it stays unfixed after over 2 months).

I'm not going to do it again.  If the real bugger turns out to be my patch
or a patch passed through me then please ping me again; till then I'm off cc:.

Also a small reminder -- I'm no longer the maintainer of drivers/ide/, so please
pass everything through Dave first, as he should be able to verify whether
the issue was fixed already, e.g. there are still some outstanding bugfixes
from June needed for -stable:

	http://marc.info/?l=linux-ide&m=124910557313722&w=2

[ yes, those are fixes for Sparc Ultra 10 specific problems reported on
  one sunny Saturday evening and fully debugged/fixed after 2 days.. ]
Comment 4 akwatts 2009-08-01 15:13:32 UTC
The fix mentioned above (commit 2ea5521022ac8f4f528dcbae02668e02a3501a5a) does not address the problem I am encountering (which is present in 2.6.30.2).

There is no e100 hardware present here.

I am more than willing to help with the debugging process from this end.

Might we test by reverting his commit on 2.6.30.2?
Comment 5 Rafael J. Wysocki 2009-08-01 19:12:54 UTC
(In reply to comment #4)
> The fix mentioned above (commit 2ea5521022ac8f4f528dcbae02668e02a3501a5a) does
> not address the problem I am encountering (which is present in 2.6.30.2).
> 

Bart's point is that the problem introduced by commit 295f00042aaf6b553b5f37348f89bab463d4a469 was fixed by commit 2ea5521022ac8f4f528dcbae02668e02a3501a5a, so if you have a bisection point between the two commits, the bisection may lead to commit 295f00042aaf6b553b5f37348f89bab463d4a469 instead of the real culprit.

> There is no e100 hardware present here.

How is e100 relevant here?

> I am more than willing to help with the debugging process from this end.
> 
> Might we test by reverting his commit on 2.6.30.2?

That might be difficult, because I guess commit 295f00042aaf6b553b5f37348f89bab463d4a469 doesn't revert cleanly.

Could you instead test the kernel where commit 2ea5521022ac8f4f528dcbae02668e02a3501a5a is the head and see if the problem is present there?

[Bart, I know you're not maintaining IDE any more, but your commit was pointed out by the bisection.  Thanks for the information that it was a false positive.]

First-Bad-Commit: unknown
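
The false positive explained in comment 5 can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a model of how git bisect walks the real kernel history: the history is a made-up linear sequence of 16 commits, and the indices `BREAK_A`, `FIX_A` and `BREAK_B` are hypothetical stand-ins for 295f00042aaf6b553b5f37348f89bab463d4a469, 2ea5521022ac8f4f528dcbae02668e02a3501a5a and the still unknown real culprit. Bisection assumes the result flips from good to bad exactly once; a temporary breakage that was later fixed violates that assumption.

```python
from typing import Callable


def bisect_first_bad(is_bad: Callable[[int], bool], good: int, bad: int) -> int:
    """Binary search for the first bad commit index.

    Assumes is_bad(good) is False and is_bad(bad) is True, and (like any
    bisection) that the result changes only once in between.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Hypothetical linear history, commits numbered 0..15.
BREAK_A, FIX_A = 5, 9   # temporary suspend breakage and its fix
BREAK_B = 12            # the change that actually matters for the reporter


def suspend_fails(commit: int) -> bool:
    """True if suspend to disk fails when this commit is checked out."""
    broken_by_a = BREAK_A <= commit < FIX_A
    broken_by_b = commit >= BREAK_B
    return broken_by_a or broken_by_b


# Bisecting the whole range lands a test point inside the temporary
# breakage window and converges on BREAK_A, which is a false positive.
false_positive = bisect_first_bad(suspend_fails, good=0, bad=15)

# Testing FIX_A itself shows suspend works there, so it is a valid
# "good" starting point; bisecting from it finds BREAK_B.
real_culprit = bisect_first_bad(suspend_fails, good=FIX_A, bad=15)
```

In this toy history the first search returns `BREAK_A` and the second returns `BREAK_B`, which mirrors the suggestion above: first confirm that the kernel with the fix as head works, then bisect between that commit and the failing release.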
Comment 6 akwatts 2009-08-04 07:58:37 UTC
By way of an update:

(1) You were right, git zeroed in on a false positive. Compiling with 2ea5521022ac8f4f528dcbae02668e02a3501a5a as HEAD fixed s2disk. I apologize for not having understood what was meant the first time around.

(2) Bisecting between that and 2.6.30 narrowed down the "problem" commit to: 2f0d0fd2a605666d38e290c5c0d2907484352dc4. This is not really a problem (see below).

(3) Between 2.6.26 and 2.6.28 things broke for me (CPU ran about 17 degrees hotter while idle). The fix was noapic, nolapic, and pci=noacpi. This issue was resolved between 2.6.28 and 2.6.30.2 but pci=noacpi was needed to prevent IRQ problems (because IOAPIC support had been removed from my kernel). It turns out pci=noacpi, 2f0d0fd2a605666d38e290c5c0d2907484352dc4, and s2disk don't get along.

(4) I am happy to confirm that 2.6.30.4 works great with the re-insertion of IOAPIC uniprocessor support in the kernel and removing all APIC and ACPI boot parameters.

Sorry for the false alarm.

~Andy
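
Since the root cause turned out to be leftover boot parameters, a quick check of the running kernel's command line can help before starting a bisection. The sketch below is illustrative only: the function name `suspect_boot_params` is hypothetical, and the parameter list is simply the three options named in this comment. It relies on /proc/cmdline, where Linux exposes the boot command line.

```python
from pathlib import Path
from typing import List

# The APIC/ACPI overrides mentioned in this report.
SUSPECT_PARAMS = ("noapic", "nolapic", "pci=noacpi")


def suspect_boot_params(cmdline: str) -> List[str]:
    """Return the suspect parameters present on a kernel command line."""
    present = set(cmdline.split())
    return [param for param in SUSPECT_PARAMS if param in present]


def running_kernel_suspects(path: Path = Path("/proc/cmdline")) -> List[str]:
    """Check the currently running kernel (Linux only)."""
    return suspect_boot_params(path.read_text())
```

An empty result from `running_kernel_suspects()` means none of the three overrides is active, which matches the working 2.6.30.4 setup described above.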
Comment 7 Rafael J. Wysocki 2009-08-04 15:11:59 UTC
Thanks for the update, closing.