Subject : STD regression rc1 -> rc234, suspend fails completely
Submitter : Andreas Mohr <andi@lisas.de>
References : http://lkml.org/lkml/2007/12/8/34
Handled-By : Robert Hancock <hancockr@shaw.ca>
             Tejun Heo <htejun@gmail.com>
Created attachment 13927 [details] 2.6.24-rc4 (failing suspend) boot log (dmesg)
Created attachment 13928 [details] lspci -nn (on -rc4, JFYI)
OK, problem semi-solved I think, see http://lkml.org/lkml/2007/12/9/139
#9320 is the root cause of this problem, and it seems they may be able to do something about _GTM failure.
Created attachment 13933 [details] bug9320-dbg2.patch Please test this patch and report kernel log. Thanks.
Created attachment 13954 [details] dmesg of clean 2.6.24-rc4 plus bug9320-dbg2 patch, suspend works fine OK, on a cleanly remade 2.6.24-rc4 with the bug9320-dbg2 patch, suspend/resume works fine with no actually failing _GTM invocation to be seen, however methinks the _GTF handling isn't quite perfect yet (this seems to be the invocation for the primary port, though, so I'm a bit confused). I should possibly do some more ASL investigations...
Created attachment 13964 [details] bug9320-dbg3.patch
* Please try this patch and post the result.
* Please post ASL of DSDT.
Thanks.
Created attachment 13983 [details] dmesg of 2.6.24-rc4 plus db3, boot plus suspend
Created attachment 13984 [details] DSDT source of my EPOX 8K5A2+ board, latest BIOS (2003)
OK, so the taskfile your BIOS is trying to send is a SET FEATURES - transfer mode command. To figure out the right mode, it looks up the values of PMUE, PMUT, and PMPT in some lookup tables. These are located in PCI config space at 0x53 (low 3 bits), 0x53 (top 4 bits) and 0x4B (all 8 bits) respectively. I'm not sure what those values hold on your system (lspci -vvvxxx would show this) but it's likely a good bet that one of them isn't in the lookup tables and so the subsequent dereference to help it figure out one of the taskfile parameters fails. This seems like a very similar braindead BIOS implementation to what Torsten has on his NVIDIA chipset board (bug 9320). I think it's again a matter of libata programming the controller with slightly different register values from what the BIOS expects to be there and then the BIOS code choking when it sees those.
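To make the suspected failure concrete, here is a minimal sketch of a table lookup that misses and a dereference that then blows up. It assumes the DSDT uses something like ASL's Match operator, which yields Ones (0xFFFFFFFF) when nothing matches. The table contents, `MODE_FOR_INDEX`, and all function names are hypothetical and are not taken from the actual DSDT.

```python
ONES = 0xFFFFFFFF  # value ASL's Match operator yields when nothing matches

# Hypothetical lookup table; the real values live in the board's DSDT.
KNOWN_PMPT = [0x20, 0x22, 0x42, 0x65, 0xA8]

# Hypothetical per-index taskfile parameter (for example a transfer mode byte).
MODE_FOR_INDEX = [0x0C, 0x0B, 0x0A, 0x09, 0x08]


def match(table, value):
    """Mimic an ASL Match for equality: index of value, or ONES on a miss."""
    return table.index(value) if value in table else ONES


def deref_index(package, index):
    """Mimic DerefOf(Index(package, index)), failing on an out-of-range index."""
    if index >= len(package):
        raise IndexError(f"Index ({index:09X}) is beyond end of object")
    return package[index]


def taskfile_param(pmpt):
    """Look up the taskfile parameter for the current register value."""
    return deref_index(MODE_FOR_INDEX, match(KNOWN_PMPT, pmpt))
```

With these made-up tables, `taskfile_param(0x20)` succeeds, while any register value missing from `KNOWN_PMPT` raises on the dereference, which is the shape of failure described above.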
Those mode values are supposed to be programmed by _STM. Suspend/resume cycles look like the following from ATA-ACPI's POV.
1. _GTM is called to store the current transfer mode setting.
2. suspend.
3. resume.
4. _STM is called with the parameter saved from #1 to restore transfer mode setting.
5. _GTF is called and the resulting TFs are executed.
I'll attach a debug patch. Let's see whether we're missing _STM.
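The five steps can be sketched as a small simulation. This is only an illustration of the ordering, not libata code: `FakeFirmware`, its methods, and the mode string are hypothetical stand-ins for the ACPI methods.

```python
class FakeFirmware:
    """Hypothetical stand-in for a port's _GTM/_STM/_GTF methods."""

    def __init__(self, mode):
        self.mode = mode  # models the controller's transfer mode registers

    def gtm(self):
        return self.mode

    def stm(self, saved):
        self.mode = saved

    def gtf(self):
        # Pretend _GTF builds its taskfiles from the current register state.
        return [("SET FEATURES - XFER", self.mode)]


def suspend_resume(fw, trace):
    """Run one suspend/resume cycle and record the order of the steps."""
    saved = fw.gtm()          # 1. save the current transfer mode setting
    trace.append("_GTM")
    trace.append("suspend")   # 2.
    fw.mode = None            # register state is lost while suspended
    trace.append("resume")    # 3.
    fw.stm(saved)             # 4. restore the setting saved in step 1
    trace.append("_STM")
    taskfiles = fw.gtf()      # 5. fetch the taskfiles to execute
    trace.append("_GTF")
    return taskfiles
```

If step 4 were skipped, `gtf()` in this model would see the lost state rather than the saved one, which is the case the debug patch is meant to rule out.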
Okay, dbg3 already contains that. Here's an excerpt from the log in comment #8.
  ata1: XXX _GTM saved on suspend
  ... (suspend/resume)
  ata1: XXX _STM performed on resume 78:14/78:3c 15
  ...
  ata1.00: _GTF evaluation failed (AE 0x300d)
  ata1.00: ACPI: _GTF invalid, disabled
  ata1.00: configured for UDMA/100
  ata1.01: configured for UDMA/33
So, the above _STM is supposed to fill whatever _GTF needs. I'll post a debug patch to dump 0x53 and 0x4B before and after _GTM/_STM calls. Arghh...
Created attachment 13985 [details] bug9530-dbg0.patch Please apply this patch on top of -rc4 and report the log. Thanks.
Created attachment 14002 [details] dmesg of clean 2.6.24-rc4 plus dbg0 patch, suspend works fine Note the I/O errors in dmesg a couple seconds after resume...
It looks like after the _STM, we have reasonable values (PMPT=0x20):
  pata_via 0000:00:11.1: XXX PCI 0x48=0x20209999 0x50=0xf1f60707
but by the time _GTF is evaluated, they've been changed to ones not in the BIOS tables (PMPT=0x99):
  ata1: XXX evaluating _GTF
  pata_via 0000:00:11.1: XXX PCI 0x48=0x99999999 0x50=0x37370707
  ACPI Exception (exoparg2-0442): AE_AML_PACKAGE_LIMIT, Index (0FFFFFFFF) is beyond end of object [20070126]
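For anyone following along, this is roughly how the fields fall out of the logged dwords. The sketch assumes the debug patch prints 32-bit little-endian config reads at 0x48 and 0x50, and uses the field layout given earlier in this thread (PMPT = byte 0x4B, PMUE = low 3 bits of 0x53, PMUT = top 4 bits of 0x53). The function names are made up for illustration.

```python
def config_byte(dword_offset, dword, byte_offset):
    """Extract one config-space byte from a little-endian dword read."""
    shift = 8 * (byte_offset - dword_offset)
    return (dword >> shift) & 0xFF


def decode(reg48, reg50):
    """Pull PMPT, PMUE and PMUT out of the dwords logged at 0x48 and 0x50."""
    b4b = config_byte(0x48, reg48, 0x4B)
    b53 = config_byte(0x50, reg50, 0x53)
    return {"PMPT": b4b, "PMUE": b53 & 0x07, "PMUT": b53 >> 4}


after_stm = decode(0x20209999, 0xF1F60707)   # values logged right after _STM
at_gtf = decode(0x99999999, 0x37370707)      # values logged when _GTF runs
```

Here `after_stm["PMPT"]` comes out as 0x20 and `at_gtf["PMPT"]` as 0x99, matching the values quoted above. Byte 0x53 changes as well, from 0xf1 to 0x37.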
OIC, thanks a lot Robert. It seems we'll have to evaluate _GTF right after _STM and cache the result. Will prep another patch.
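A rough sketch of the idea, under the assumption that _GTF only succeeds while the registers still hold what _STM wrote. This is not the actual patch. `FakeFirmware`, `Port`, and the `KNOWN_TIMINGS` table are hypothetical.

```python
KNOWN_TIMINGS = {0x20: "UDMA/100"}  # hypothetical BIOS lookup table


class FakeFirmware:
    """Hypothetical stand-in for the port's ACPI methods."""

    def __init__(self, timing):
        self.timing = timing  # models the PCI timing register

    def gtm(self):
        return self.timing

    def stm(self, saved):
        self.timing = saved

    def gtf(self):
        # Fails, like the BIOS code, when the register holds an unknown value.
        return [("SETXFER", KNOWN_TIMINGS[self.timing])]


class Port:
    def __init__(self, fw):
        self.fw = fw
        self.saved_gtm = None
        self.cached_gtf = None

    def suspend(self):
        self.saved_gtm = self.fw.gtm()

    def resume(self):
        self.fw.stm(self.saved_gtm)
        # Evaluate _GTF now, while the registers still hold what _STM wrote.
        self.cached_gtf = self.fw.gtf()

    def taskfiles(self):
        # Later callers get the cached result instead of re-evaluating _GTF.
        return self.cached_gtf
```

In this model the driver can reprogram the timing register after `resume()` without affecting `taskfiles()`, whereas calling `fw.gtf()` again at that point would fail.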
Created attachment 14031 [details] bug9320-dbg6.patch Please test this patch and report the kernel log. Thanks.
Created attachment 14045 [details] 2.6.24-rc4 with dbg6 Hmm... I think I don't want to like this ;) Now it "filtered out" any _GTF or _GTM invocation (or maybe you simply removed corresponding logging). Thanks!
Actually, that's exactly the way it's intended, so no problem evaluating _STM and _GTF, great. All commands in the _GTF are SETXFERs which only disturb libata device configuration process. Filtering them out is DTRT. I'll forward the patchset upstream. Thanks a lot for testing.
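The filtering can be pictured like this. It is a sketch, not the libata implementation: it assumes the usual _GTF layout of 7-byte entries (feature, sector count, LBA low, LBA mid, LBA high, device, command) and treats SET FEATURES (0xEF) with subcommand 0x03 as SETXFER. The constant names mirror those in linux/ata.h. The helper names and the sample buffer are made up.

```python
ATA_CMD_SET_FEATURES = 0xEF  # SET FEATURES command opcode
SETFEATURES_XFER = 0x03      # "set transfer mode" subcommand (feature register)
GTF_ENTRY_LEN = 7            # registers 0x1F1..0x1F7, one byte each


def split_gtf(buf):
    """Split a raw _GTF buffer into 7-byte taskfile entries."""
    if len(buf) % GTF_ENTRY_LEN:
        raise ValueError("_GTF buffer length is not a multiple of 7")
    return [buf[i:i + GTF_ENTRY_LEN] for i in range(0, len(buf), GTF_ENTRY_LEN)]


def is_setxfer(tf):
    """True if the entry is SET FEATURES - set transfer mode."""
    return tf[6] == ATA_CMD_SET_FEATURES and tf[0] == SETFEATURES_XFER


def filter_gtf(buf):
    """Drop SETXFER entries and keep everything else."""
    return [tf for tf in split_gtf(buf) if not is_setxfer(tf)]


# Made-up sample: one SETXFER (mode byte 0x45) followed by one other command.
sample = bytes([0x03, 0x45, 0x00, 0x00, 0x00, 0xA0, 0xEF,
                0x00, 0x00, 0x00, 0x00, 0x00, 0xA0, 0xF5])
kept = filter_gtf(sample)
```

On a board like this one, where every _GTF entry is a SETXFER, the filtered list would simply be empty, which is consistent with the quiet log.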
Patchset posted. Thanks for all the testing. http://thread.gmane.org/gmane.linux.ide/26379 Andreas, can you please resolve this bug as CODE_FIX?
Nice to hear that it works absolutely correctly! Thanks for all your hard work and very timely help! Resolving but not closing (should probably be done once the fix has arrived upstream).
Fixed by:
commit ededa4d396b15c282aa60d6aacddfc07f0142dbf
Merge: 64396ac... 140b5e5...
Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Mon Dec 17 19:29:32 2007 -0800

    Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ededa4d396b15c282aa60d6aacddfc07f0142dbf
[bug already closed, JFYI] -rc6 (includes Tejun's libata patch) verified to work nicely, thanks!