Bug 9530 - STD regression rc1 -> rc234, suspend fails completely
Status: CLOSED CODE_FIX
Product: Power Management
Classification: Unclassified
Component: Hibernation/Suspend
Hardware: All
OS: Linux
Importance: P1 normal
Assigned To: Andreas Mohr
Depends on: 9320
Blocks: 7216 9243
Reported: 2007-12-09 05:39 UTC by Rafael J. Wysocki
Modified: 2007-12-22 04:58 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.24-rc1
Tree: Mainline
Regression: Yes


Attachments
2.6.24-rc4 (failing suspend) boot log (dmesg) (25.58 KB, text/plain)
2007-12-09 11:52 UTC, Andreas Mohr
lspci -nn (on -rc4, JFYI) (1.63 KB, text/plain)
2007-12-09 11:54 UTC, Andreas Mohr
bug9320-dbg2.patch (14.27 KB, patch)
2007-12-09 22:16 UTC, Tejun Heo
dmesg of clean 2.6.24-rc4 plus bug9320-dbg2 patch, suspend works fine (33.55 KB, text/plain)
2007-12-10 13:36 UTC, Andreas Mohr
bug9320-dbg3.patch (14.88 KB, patch)
2007-12-10 19:45 UTC, Tejun Heo
dmesg of 2.6.24-rc4 plus dbg3, boot plus suspend (33.38 KB, text/plain)
2007-12-11 15:43 UTC, Andreas Mohr
DSDT source of my EPOX 8K5A2+ board, latest BIOS (2003) (157.98 KB, text/plain)
2007-12-11 15:44 UTC, Andreas Mohr
bug9530-dbg0.patch (18.38 KB, patch)
2007-12-11 17:00 UTC, Tejun Heo
dmesg of clean 2.6.24-rc4 plus dbg0 patch, suspend works fine (45.50 KB, text/plain)
2007-12-12 13:26 UTC, Andreas Mohr
bug9320-dbg6.patch (22.41 KB, patch)
2007-12-14 05:08 UTC, Tejun Heo
2.6.24-rc4 with dbg6 (32.34 KB, text/plain)
2007-12-14 16:12 UTC, Andreas Mohr

Description Rafael J. Wysocki 2007-12-09 05:39:07 UTC
Subject         : STD regression rc1 -> rc234, suspend fails completely
Submitter       : Andreas Mohr <andi@lisas.de>
References      : http://lkml.org/lkml/2007/12/8/34
Handled-By      : Robert Hancock <hancockr@shaw.ca>
                  Tejun Heo <htejun@gmail.com>
Comment 1 Andreas Mohr 2007-12-09 11:52:57 UTC
Created attachment 13927 [details]
2.6.24-rc4 (failing suspend) boot log (dmesg)
Comment 2 Andreas Mohr 2007-12-09 11:54:07 UTC
Created attachment 13928 [details]
lspci -nn (on -rc4, JFYI)
Comment 3 Andreas Mohr 2007-12-09 13:39:13 UTC
OK, problem semi-solved I think, see http://lkml.org/lkml/2007/12/9/139
Comment 4 Andreas Mohr 2007-12-09 14:07:16 UTC
Bug #9320 is the root cause of this problem, and it seems they may be able to do something about the _GTM failure.
Comment 5 Tejun Heo 2007-12-09 22:16:08 UTC
Created attachment 13933 [details]
bug9320-dbg2.patch

Please test this patch and report the kernel log.  Thanks.
Comment 6 Andreas Mohr 2007-12-10 13:36:43 UTC
Created attachment 13954 [details]
dmesg of clean 2.6.24-rc4 plus bug9320-dbg2 patch, suspend works fine

OK, on a cleanly rebuilt 2.6.24-rc4 with the bug9320-dbg2 patch, suspend/resume works fine and no failing _GTM invocation shows up. However, I think the _GTF handling isn't quite right yet (although this seems to be the invocation for the primary port, so I'm a bit confused). I should probably do some more ASL investigation...
Comment 7 Tejun Heo 2007-12-10 19:45:26 UTC
Created attachment 13964 [details]
bug9320-dbg3.patch

* Please try this patch and post the result.

* Please post ASL of DSDT.

Thanks.
Comment 8 Andreas Mohr 2007-12-11 15:43:10 UTC
Created attachment 13983 [details]
dmesg of 2.6.24-rc4 plus dbg3, boot plus suspend
Comment 9 Andreas Mohr 2007-12-11 15:44:51 UTC
Created attachment 13984 [details]
DSDT source of my EPOX 8K5A2+ board, latest BIOS (2003)
Comment 10 Robert Hancock 2007-12-11 16:23:03 UTC
OK, so the taskfile your BIOS is trying to send is a SET FEATURES (set transfer mode) command. To figure out the right mode, it looks up the values of PMUE, PMUT, and PMPT in some lookup tables. These are located in PCI config space at 0x53 (low 3 bits), 0x53 (top 4 bits) and 0x4B (all 8 bits) respectively. I'm not sure what those values hold on your system (lspci -vvvxxx would show this), but a good bet is that one of them isn't in the lookup tables, so the subsequent table access used to derive one of the taskfile parameters fails.

This seems like a very similar braindead BIOS implementation to what Torsten has on his NVIDIA chipset board (bug 9320). I think it's again a matter of libata programming the controller with slightly different register values from what the BIOS expects to be there and then the BIOS code choking when it sees those.
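To make the failure mode Robert describes concrete, here is a minimal Python sketch of such a table lookup. It assumes the BIOS searches one package with the AML Match operator (which returns Ones, i.e. 0xFFFFFFFF with 32-bit AML integers, when nothing matches) and then uses that result as an Index into a second package; the 0FFFFFFFF index in the ACPI exception later in this thread suggests this pattern, but it is an assumption. The table contents and function names are hypothetical and are not taken from the attached DSDT.

```python
# Hypothetical model of a BIOS "Match then Index" table lookup.
# Table contents are invented for illustration; they are NOT from the EPOX DSDT.

ONES = 0xFFFFFFFF  # what AML Match returns when no element matches (32-bit integers)


class PackageLimitError(Exception):
    """Stand-in for ACPICA's AE_AML_PACKAGE_LIMIT exception."""


def aml_match(package, value):
    """Return the index of the first element equal to value, or ONES if none match."""
    for i, item in enumerate(package):
        if item == value:
            return i
    return ONES


def aml_index(package, idx):
    """Fetch package[idx], failing like the interpreter does when idx is out of range."""
    if idx >= len(package):
        raise PackageLimitError(f"Index (0{idx:X}) is beyond end of object")
    return package[idx]


# Hypothetical tables: PMPT register values the BIOS recognises, and the
# SET FEATURES transfer-mode value (PIO0..PIO4 = 0x08..0x0C) each maps to.
PMPT_VALUES = [0xA8, 0x65, 0x42, 0x22, 0x20]
XFER_MODES = [0x08, 0x09, 0x0A, 0x0B, 0x0C]


def mode_for_pmpt(pmpt):
    """Look up the transfer mode for a PMPT value, as the BIOS _GTF code might."""
    return aml_index(XFER_MODES, aml_match(PMPT_VALUES, pmpt))
```

With a listed value such as 0x20 the lookup returns a mode; with an unlisted value such as 0x99 the Match result is Ones and the Index step raises, which is the shape of the AE_AML_PACKAGE_LIMIT failure seen in the logs.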
Comment 11 Tejun Heo 2007-12-11 16:33:05 UTC
Those mode values are supposed to be programmed by _STM. From ATA-ACPI's point of view, a suspend/resume cycle looks like this:

1. _GTM is called to store the current transfer mode setting.
2. suspend.
3. resume.
4. _STM is called with the parameter saved from #1 to restore transfer mode setting.
5. _GTF is called and the resulting TFs are executed.

I'll attach a debug patch.  Let's see whether we're missing _STM.
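The five-step sequence Tejun lists can be sketched in Python to show the ordering and the fact that step 4 feeds the value saved in step 1 back to the firmware. FakePort and its methods are hypothetical stand-ins for the ACPI methods _GTM, _STM and _GTF; they are not libata or ACPICA functions.

```python
# Hypothetical model of the ATA-ACPI suspend/resume sequence.
# FakePort is an illustrative stand-in, not a libata or ACPICA API.


class FakePort:
    def __init__(self, timing):
        self.timing = timing  # stands in for the controller's timing registers
        self.log = []

    def gtm(self):
        """_GTM: report the current transfer mode setting."""
        self.log.append("_GTM")
        return self.timing

    def stm(self, saved):
        """_STM: restore a previously saved transfer mode setting."""
        self.log.append("_STM")
        self.timing = saved

    def gtf(self):
        """_GTF: return taskfiles derived from the current timing setting."""
        self.log.append("_GTF")
        return [("SETXFER", self.timing)]


def suspend_resume_cycle(port, execute_tf):
    saved = port.gtm()          # 1. save the current transfer mode setting
    port.log.append("suspend")  # 2. suspend
    port.timing = None          #    (controller state is lost while powered down)
    port.log.append("resume")   # 3. resume
    port.stm(saved)             # 4. restore the setting saved in step 1
    for tf in port.gtf():       # 5. evaluate _GTF and execute the returned taskfiles
        execute_tf(tf)
```

For example, running suspend_resume_cycle(FakePort(0x20), executed.append) records the calls in the order _GTM, suspend, resume, _STM, _GTF and executes one SETXFER taskfile built from the restored setting.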
Comment 12 Tejun Heo 2007-12-11 16:38:29 UTC
Okay, dbg3 already contains that.  Here's an excerpt from the log in comment #8.

ata1: XXX _GTM saved on suspend
... (suspend/resume)
ata1: XXX _STM performed on resume 78:14/78:3c 15
...
ata1.00: _GTF evaluation failed (AE 0x300d)
ata1.00: ACPI: _GTF invalid, disabled
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33

So, the above _STM is supposed to fill whatever _GTF needs.  I'll post a debug patch to dump 0x53 and 0x4B before and after _GTM/_STM calls.  Arghh...
Comment 13 Tejun Heo 2007-12-11 17:00:07 UTC
Created attachment 13985 [details]
bug9530-dbg0.patch

Please apply this patch on top of -rc4 and report the log.  Thanks.
Comment 14 Andreas Mohr 2007-12-12 13:26:19 UTC
Created attachment 14002 [details]
dmesg of clean 2.6.24-rc4 plus dbg0 patch, suspend works fine

Note the I/O errors in dmesg a couple of seconds after resume...
Comment 15 Robert Hancock 2007-12-12 16:03:50 UTC
It looks like after the _STM, we have reasonable values (PMPT=0x20):

pata_via 0000:00:11.1: XXX PCI 0x48=0x20209999 0x50=0xf1f60707

but by the time _GTF is evaluated, they've been changed to ones not in the BIOS tables (PMPT=0x99):

ata1: XXX evaluating _GTF
pata_via 0000:00:11.1: XXX PCI 0x48=0x99999999 0x50=0x37370707
ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
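Robert's reading of the two register dumps can be checked mechanically. PCI configuration space is little-endian, so in a dword read at 0x48 the byte at offset 0x4B is bits 31:24, and likewise 0x53 is the top byte of the dword read at 0x50. The helper below is a hypothetical illustration, not part of the debug patch.

```python
def config_byte(dword_base, dword_value, offset):
    """Extract the byte at `offset` from a little-endian dword read at `dword_base`."""
    shift = (offset - dword_base) * 8
    if not 0 <= shift <= 24:
        raise ValueError("offset is not inside this dword")
    return (dword_value >> shift) & 0xFF


# Values copied from the two "XXX PCI" log lines above.
dumps = {
    "after _STM": {0x48: 0x20209999, 0x50: 0xF1F60707},
    "at _GTF": {0x48: 0x99999999, 0x50: 0x37370707},
}

for label, regs in dumps.items():
    pmpt = config_byte(0x48, regs[0x48], 0x4B)
    reg53 = config_byte(0x50, regs[0x50], 0x53)
    print(f"{label}: 0x4B (PMPT)=0x{pmpt:02X}  0x53=0x{reg53:02X}")
```

This prints PMPT=0x20 after _STM and PMPT=0x99 when _GTF is evaluated, matching the analysis above; the byte at 0x53 also changes, from 0xF1 to 0x37.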
Comment 16 Tejun Heo 2007-12-13 22:38:20 UTC
Oh, I see; thanks a lot, Robert.  It seems we'll have to evaluate _GTF right after _STM and cache the result.  I'll prepare another patch.
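A minimal sketch of the idea in Python, assuming (as the register dumps in comment 15 suggest) that the timing registers get reprogrammed between _STM and the point where the _GTF taskfiles are needed. Port, driver_reconfigure and the lookup table are hypothetical and only model the ordering problem; this is not the actual patch.

```python
# Hypothetical sketch: evaluate _GTF right after _STM and cache the result.

KNOWN_PMPT = {0x20: "PIO4"}  # invented stand-in for the BIOS lookup tables


class GtfError(Exception):
    """Stand-in for a failed _GTF evaluation."""


class Port:
    def __init__(self):
        self.pmpt = 0x20
        self.gtf_cache = None

    def stm(self, saved_pmpt):
        self.pmpt = saved_pmpt

    def gtf(self):
        # Like the BIOS in this bug: fails on a register value it doesn't know.
        if self.pmpt not in KNOWN_PMPT:
            raise GtfError(f"_GTF failed: PMPT=0x{self.pmpt:02X} not in table")
        return [("SETXFER", KNOWN_PMPT[self.pmpt])]


def driver_reconfigure(port):
    # Hypothetical: stands in for the driver reprogramming the timing
    # registers (0x99 is the value seen when _GTF was evaluated in comment 15).
    port.pmpt = 0x99


def resume_late_gtf(port, saved):
    port.stm(saved)
    driver_reconfigure(port)
    return port.gtf()  # evaluated too late: the BIOS lookup fails


def resume_cached_gtf(port, saved):
    port.stm(saved)
    port.gtf_cache = port.gtf()  # evaluated while registers still match _STM
    driver_reconfigure(port)     # later reprogramming no longer matters
    return port.gtf_cache
```

The only difference between the two resume paths is when _GTF runs relative to the reprogramming; caching its output immediately after _STM sidesteps the BIOS's dependence on the register contents.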
Comment 17 Tejun Heo 2007-12-14 05:08:01 UTC
Created attachment 14031 [details]
bug9320-dbg6.patch

Please test this patch and report the kernel log.  Thanks.
Comment 18 Andreas Mohr 2007-12-14 16:12:55 UTC
Created attachment 14045 [details]
2.6.24-rc4 with dbg6

Hmm... I think I don't want to like this ;)
Now it "filters out" every _GTF or _GTM invocation (or maybe you simply removed the corresponding logging).

Thanks!
Comment 19 Tejun Heo 2007-12-14 17:01:28 UTC
Actually, that's exactly the way it's intended, so there's no problem evaluating _STM and _GTF; great.  All commands returned by _GTF here are SETXFERs, which only disturb libata's device configuration process, so filtering them out is the right thing to do.

I'll forward the patchset upstream.  Thanks a lot for testing.
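The filtering Tejun describes can be sketched in Python, assuming the ACPI _GTF layout in which each command is a 7-byte register set for taskfile registers 1F1h through 1F7h (features first, command last). SET FEATURES is ATA opcode 0xEF and subcommand 0x03 selects "set transfer mode". The function names are hypothetical and this is not libata's code.

```python
ATA_CMD_SET_FEATURES = 0xEF  # ATA SET FEATURES command opcode
SETFEATURES_XFER = 0x03      # SET FEATURES subcommand: set transfer mode
GTF_TF_SIZE = 7              # each _GTF entry covers registers 1F1h..1F7h


def split_gtf(buf):
    """Split a raw _GTF buffer into 7-byte taskfiles."""
    if len(buf) % GTF_TF_SIZE:
        raise ValueError("_GTF buffer length is not a multiple of 7")
    return [bytes(buf[i:i + GTF_TF_SIZE]) for i in range(0, len(buf), GTF_TF_SIZE)]


def is_setxfer(tf):
    """True if the taskfile is SET FEATURES / set transfer mode."""
    feature, command = tf[0], tf[6]  # 1F1h (features) and 1F7h (command)
    return command == ATA_CMD_SET_FEATURES and feature == SETFEATURES_XFER


def filter_gtf(buf):
    """Drop SETXFER taskfiles and keep everything else for execution."""
    return [tf for tf in split_gtf(buf) if not is_setxfer(tf)]
```

For example, given a SETXFER taskfile selecting UDMA5 (sector count 0x45) followed by a SET FEATURES "enable write cache" taskfile (subcommand 0x02), filter_gtf keeps only the write-cache one, leaving transfer mode selection to libata.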
Comment 20 Tejun Heo 2007-12-14 22:20:05 UTC
Patchset posted.  Thanks for all the testing.

  http://thread.gmane.org/gmane.linux.ide/26379

Andreas, can you please resolve this bug as CODE_FIX?
Comment 21 Andreas Mohr 2007-12-15 03:44:09 UTC
Nice to hear that it works absolutely correctly!
Thanks for all your hard work and very timely help!
Resolving but not closing (closing should probably wait until the fix has arrived upstream).
Comment 22 Rafael J. Wysocki 2007-12-20 16:02:01 UTC
Fixed by:

commit ededa4d396b15c282aa60d6aacddfc07f0142dbf
Merge: 64396ac... 140b5e5...
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date:   Mon Dec 17 19:29:32 2007 -0800

    Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ededa4d396b15c282aa60d6aacddfc07f0142dbf
Comment 23 Andreas Mohr 2007-12-22 04:58:34 UTC
[bug already closed, JFYI]

-rc6 (includes Tejun's libata patch) verified to work nicely, thanks!
