Bug 74021 - crash upon resuming from suspend to ram
crash upon resuming from suspend to ram
Status: CLOSED PATCH_ALREADY_AVAILABLE
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
x86-64 Linux
: P1 normal
Assigned To: Lv Zheng
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-04-14 08:50 UTC by Oswald Buddenhagen
Modified: 2014-06-24 04:31 UTC (History)
11 users (show)

See Also:
Kernel Version: 3.14
Tree: Mainline
Regression: Yes


Attachments
dmesg (54.31 KB, application/octet-stream)
2014-04-14 08:50 UTC, Oswald Buddenhagen
Details
dmi (15.48 KB, text/plain)
2014-04-14 08:51 UTC, Oswald Buddenhagen
Details
bisect.log (8.04 KB, text/plain)
2014-04-24 23:08 UTC, boohbah
Details
.config used for the kernel (144.02 KB, application/octet-stream)
2014-04-28 04:08 UTC, pedro.nariyoshi
Details
The binary comparison result (43.43 KB, patch)
2014-04-29 06:38 UTC, Lv Zheng
Details | Diff
The utility to perform binary comparison. (1.46 KB, text/plain)
2014-04-29 06:40 UTC, Lv Zheng
Details
acpi dump (92.54 KB, application/octet-stream)
2014-05-08 09:35 UTC, James Duley
Details
patch to fix the failure (1.22 KB, text/plain)
2014-05-09 16:28 UTC, Jiang Liu
Details
acpi dump from DP45SG (82.02 KB, text/plain)
2014-05-10 11:48 UTC, Oswald Buddenhagen
Details
Revert "ACPICA: Add option to favor 32-bit FADT addresses." (16.88 KB, patch)
2014-05-10 12:03 UTC, Rafael J. Wysocki
Details | Diff
TPM / ACPI: Fix resume problem on Chromebook (887 bytes, patch)
2014-05-10 12:12 UTC, Rafael J. Wysocki
Details | Diff
TPM / ACPI: Fix resume problem on Chromebook (877 bytes, patch)
2014-05-10 12:22 UTC, Rafael J. Wysocki
Details | Diff
Format refined commit (5.56 KB, patch)
2014-05-12 05:36 UTC, Lv Zheng
Details | Diff
dmesg of 3.15-rc5 with revert of broken patch (56.11 KB, text/plain)
2014-05-12 06:07 UTC, Oswald Buddenhagen
Details
[PATCH] Tables: Restore old behavor to favor 32-bit FADT addresses. (1.05 KB, patch)
2014-05-13 05:01 UTC, Lv Zheng
Details | Diff
facp-rsdt.txt (999 bytes, text/plain)
2014-05-13 07:36 UTC, Oswald Buddenhagen
Details
facp-xsdt.txt (1.38 KB, text/plain)
2014-05-13 07:37 UTC, Oswald Buddenhagen
Details
facs-32.txt (535 bytes, text/plain)
2014-05-13 07:37 UTC, Oswald Buddenhagen
Details
facs-64.txt (535 bytes, text/plain)
2014-05-13 07:38 UTC, Oswald Buddenhagen
Details
partial revert against 3.15-rc5 (919 bytes, patch)
2014-05-13 07:39 UTC, Oswald Buddenhagen
Details | Diff
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector. (3.45 KB, patch)
2014-05-13 08:55 UTC, Lv Zheng
Details | Diff
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector. (4.76 KB, patch)
2014-05-13 14:00 UTC, Lv Zheng
Details | Diff
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector. (5.81 KB, patch)
2014-05-16 01:36 UTC, Lv Zheng
Details | Diff
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector. (6.09 KB, patch)
2014-05-16 01:49 UTC, Lv Zheng
Details | Diff
Smarter way to enable 64-bit FADT addresses favor (14.80 KB, patch)
2014-05-26 08:58 UTC, Lv Zheng
Details | Diff

Description Oswald Buddenhagen 2014-04-14 08:50:14 UTC
Created attachment 132131 [details]
dmesg

with kernel 3.14, my system reboots upon attempting to resume from suspend.
3.13.7 works just fine.

attached is the dmesg and the dmidecode output.
Comment 1 Oswald Buddenhagen 2014-04-14 08:51:03 UTC
Created attachment 132141 [details]
dmi
Comment 2 Lan Tianyu 2014-04-18 01:52:55 UTC
This should be a regression between v3.13.7 and v3.14. Could you do git bisect to find the bad commit?

Please follow this link to do bisect
http://wiki.gentoo.org/wiki/Kernel_git-bisect
Comment 3 pedro.nariyoshi 2014-04-24 01:41:09 UTC
I'm having a similar problem. On a Acer Chromebook C720, kernel 3.14 (and 3.14.1) break suspend. The computer suspends fine as shown by the blinking leds, but reboots on resume. Kernel 3.13 works fine.

Should I send a dmesg/dmidecode as well?
Comment 4 boohbah 2014-04-24 22:58:12 UTC
I am also experiencing this issue on the Acer C720, and I have performed a git bisect.

[8b48463f89429af408ff695244dc627e1acff4f7] ACPI: Clean up inclusions of ACPI header files
8b48463f89429af408ff695244dc627e1acff4f7 is the first bad commit

Please let me know if you need any additional information.
Comment 5 boohbah 2014-04-24 23:08:02 UTC
Created attachment 133681 [details]
bisect.log
Comment 6 Lan Tianyu 2014-04-28 03:10:25 UTC
Hi LV:
     Please look this bug. It seems caused by your commit.
Comment 7 Lv Zheng 2014-04-28 03:54:20 UTC
Hi,

Please offer the .config that is used to build your kernel for us to examine.

Thanks in advance.
Comment 8 pedro.nariyoshi 2014-04-28 04:08:18 UTC
Created attachment 133981 [details]
.config used for the kernel

Attached is my config. (I'm using the default Arch kernel)
Comment 9 Lv Zheng 2014-04-28 06:47:57 UTC
The commit affects too many modules.
Let me split it into pieces to narrow down the affected code.
Let me upload split patches and post test requests after that.
Comment 10 Lv Zheng 2014-04-29 06:37:22 UTC
(In reply to pedro.nariyoshi from comment #8)
> Created attachment 133981 [details]
> .config used for the kernel
> 
> Attached is my config. (I'm using the default Arch kernel)

The parent commit of the bisected commit is
6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae
Linux 3.13-rc1v3.13-rc1

Please refer to:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8b48463f

I performed the following confirmation:

1. download the kernel source

2. build the kernel with this commit
# git reset --hard 8b48463f
# copy <attachment 133981 [details]> ./.config
# make menuconfig (I turned CONFIG_MODULES to n so that all modules can be examined in the same binary, I also turned CONFIG_X509_CERTIFICATE_PARSER to n in order to make kernel built)
# make
# copy vmlinux.o vmlinux.o.after

3. build the kernel without this commit
# git reset --hard 6ce4eac1
# make
# copy vmlinux.o vmlinux.o.before

4. perform a binary comparison on the vmlinux.o.after and vmlinux.o.before

The 2 binary turned out to be identical unless some differences caused by __LINE__ changes.  I can only see WARN/WARN_ON/WARN_ON_ONCE/AE_INFO generating different result.

So as we've confirmed before, this commit is a "no functional change" commit, it wasn't affected by the delayed merge and shouldn't be the culprit of your problem, please check your environment.

Let me post the result and script.  Anyone can reproduce my confirmation using the script.
Comment 11 Lv Zheng 2014-04-29 06:38:07 UTC
Created attachment 134141 [details]
The binary comparison result
Comment 12 Lv Zheng 2014-04-29 06:40:15 UTC
Created attachment 134151 [details]
The utility to perform binary comparison.

Put the 2 vmlinux.o into the same folder as this script, then execute the script.
Comment 13 Lv Zheng 2014-04-29 06:44:22 UTC
(In reply to Lv Zheng from comment #11)
> Created attachment 134141 [details]
> The binary comparison result

I added "file: WARN/AE_INFO" marks into this file.
We can see it in the right pane of the diff view.
Comment 14 Rafael J. Wysocki 2014-04-30 21:16:03 UTC
(In reply to Matt Henry from comment #4)
> I am also experiencing this issue on the Acer C720, and I have performed a
> git bisect.
> 
> [8b48463f89429af408ff695244dc627e1acff4f7] ACPI: Clean up inclusions of ACPI
> header files
> 8b48463f89429af408ff695244dc627e1acff4f7 is the first bad commit
> 
> Please let me know if you need any additional information.

Does commit 6ce4eac1f600 (Linux 3.13-rc1) work correctly for you?
Comment 15 Phillip Dixon 2014-05-04 05:38:27 UTC
I've also got an Acer C720 Chromebook running Archlinux.

It appears that this model of Chromebook relies on the tpm driver to put the tpm hardware in to the correct state on suspend or the firmware will fail to resume (this report https://code.google.com/p/chromium/issues/detail?id=221905 it occuring on a chromebook pixel but the symptoms are the same on the C720).

On 3.14 the tpm_tis module doesn't appear to work correctly on the C720. It creates entries in sysfs, but not a device node under /dev. Trying to read from from it's sysfs nodes cause a null pointer defer in the module. Since the tpm_tis driver isn't working it doesn't put the tpm hardware in the correct state and the system resets on resume.

By reverting all the 3.14 changes to drivers/char/tpm/ and rebuilding the tpm and tpm_tis modules I get a tpm_tis module that works. If I use this driver with 3.14.2 (I haven't try on 3.14 or 3.14.1) suspend and resume works on the C720.

I haven't yet tried to narrow down which of the changes to the tpm drivers introduced the breakage.
Comment 16 Rafael J. Wysocki 2014-05-07 18:23:09 UTC
OK, thanks for the effort so far, we know where to look at least.

I would bet on one of these three commits:

1569a4c4ceba ACPI / TPM: detect PPI features by checking availability of _DSM functions
84b1667dea23 ACPI / TPM: replace open-coded _DSM code with helper functions
529139c9736a ACPI / TPM: match node name instead of full path when searching for TPM device

Can you please check them?
Comment 17 James Duley 2014-05-07 23:18:13 UTC
It's 1569a4c4ceba ACPI / TPM: detect PPI features by checking availability of _DSM functions

Confirmed on acer c720 on arch linux, the other 2 work fine (albeit with a keyboard bug that has been fixed later)
Comment 18 Rafael J. Wysocki 2014-05-07 23:22:58 UTC
Does it revert cleanly on top of 3.15-rc4 and, if so, does the revert fix the problem for you?
Comment 19 James Duley 2014-05-08 03:37:37 UTC
Yes, that reverts cleanly and it works.
Comment 20 Jiang Liu 2014-05-08 08:14:43 UTC
Could anybody help to send us an ACPI dump of affected platforms? I need to check the ACPI _DSM method. Please use "acpidump > tpm_acpi.bin" to dump ACPI tables.
Comment 21 James Duley 2014-05-08 09:35:44 UTC
Created attachment 135441 [details]
acpi dump

from the stock kernel on arch (3.140
Comment 22 Jiang Liu 2014-05-09 16:28:55 UTC
Created attachment 135571 [details]
patch to fix the failure

Chromebook implements an ACPI object for TPM, but doesn't implement the _DSM method to support PPI.
Comment 23 Oswald Buddenhagen 2014-05-10 11:48:47 UTC
Created attachment 135661 [details]
acpi dump from DP45SG

i don't really appreciate the hijacking of my report for the chromebook problem, especially as it can be trivially verified that my board doesn't even have a tpm.

i bisected the problem down to 0249ed2444d65d65fc3f3f64f398f1ad0b7e54cd: ACPICA: Add option to favor 32-bit FADT addresses.
Comment 24 Rafael J. Wysocki 2014-05-10 11:56:05 UTC
OK, so we need the patch from Jiang Liu to address the Chromebook issue anyway (James, does the patch jelp?), so it is actually good that it was reported.

As to your issue, thanks for the bisect, reassigning to the ACPICA guys.
Comment 25 Rafael J. Wysocki 2014-05-10 11:57:12 UTC
(In reply to Rafael J. Wysocki from comment #24)
> OK, so we need the patch from Jiang Liu to address the Chromebook issue
> anyway (James, does the patch jelp?),

I mean "help". :-)
Comment 26 James Duley 2014-05-10 12:02:07 UTC
Yes, that patch works, thank you.
Comment 27 Rafael J. Wysocki 2014-05-10 12:03:51 UTC
Created attachment 135671 [details]
Revert "ACPICA: Add option to favor 32-bit FADT addresses."

Oswald, can you please check if this revert on top of v3.15-rc5 fixes the problem for you?
Comment 28 Rafael J. Wysocki 2014-05-10 12:11:17 UTC
(In reply to James Duley from comment #26)
> Yes, that patch works, thank you.

OK, thanks for the confirmation, I'll attach an alternative simpler patch shortly.
Comment 29 Rafael J. Wysocki 2014-05-10 12:12:18 UTC
Created attachment 135681 [details]
TPM / ACPI: Fix resume problem on Chromebook

James, does this patch help too?
Comment 30 Rafael J. Wysocki 2014-05-10 12:15:55 UTC
It may be necessary to replace -ENODEV with 0 in tpm_add_ppi() too.
Comment 31 Rafael J. Wysocki 2014-05-10 12:22:49 UTC
Created attachment 135691 [details]
TPM / ACPI: Fix resume problem on Chromebook

Yeah, it *is* necessary as far as I can say.

James, please test this patch (instead of the previous one) and let me know if it helps.
Comment 32 Oswald Buddenhagen 2014-05-10 13:08:50 UTC
yes, the revert makes it work again.

the 82567LF-2 network adapter seems to have problems with re-establishing the network link after wakeup, but this is most likely only peripherally related.
Comment 33 Brandon Casey 2014-05-11 01:27:42 UTC
Rafael, I've tested your updated patch (attachment 135691 [details]) on chromebook pixel and can confirm that it allows 3.15-rc5 to load the tpm_tis module and suspend/resume correctly.  Thanks.

Tested-by: Brandon Casey <drafnel@gmail.com>
Comment 34 Rafael J. Wysocki 2014-05-11 15:46:49 UTC
Thanks Brandon! I'll queue up that patch for 3.15, then.

Thanks Oswald! I'm going to queue up the revert for 3.15 unless Lv tells me that it will break something else.
Comment 35 James Duley 2014-05-12 00:03:59 UTC
Hi Rafael, I can also confirm it works on acer c720, thank you for your work on this.
Comment 36 Rafael J. Wysocki 2014-05-12 00:14:10 UTC
Thanks!

Patch: https://patchwork.kernel.org/patch/4153421/ (for the ACPICA issue)
Patch: https://patchwork.kernel.org/patch/4153441/ (for the TPM issue)
Comment 37 James Duley 2014-05-12 00:16:11 UTC
Also, I don't know if it makes any difference and I probably should have mentioned it earlier (woops) but, the acer c720 has always needed the kernel parameter "tpm_tis.force=1" to not hang on resume.
Comment 38 Lv Zheng 2014-05-12 01:09:00 UTC
(In reply to Oswald Buddenhagen from comment #23)
> Created attachment 135661 [details]
> acpi dump from DP45SG
> 
> i don't really appreciate the hijacking of my report for the chromebook
> problem, especially as it can be trivially verified that my board doesn't
> even have a tpm.
> 
> i bisected the problem down to 0249ed2444d65d65fc3f3f64f398f1ad0b7e54cd:
> ACPICA: Add option to favor 32-bit FADT addresses.

Hi,

Could you upload a dmesg of booting the kernel without this commit.

Thanks in advance.
Comment 39 Lv Zheng 2014-05-12 05:36:38 UTC
Created attachment 135831 [details]
Format refined commit

Hi,

I deleted the comments and empty lines from the original commit.
And re-arranged the commit by copying non "+/-" marked lines twice to be "+" and '-" marked lines.

We could talk about the regression based on this refined format.
Comment 40 Lv Zheng 2014-05-12 05:56:37 UTC
The only logic change is the following:

-		if (address32) {
-			acpi_tb_init_generic_address(address64,
-						     ACPI_ADR_SPACE_SYSTEM_IO,
-						     *ACPI_ADD_PTR(u8,
-								   &acpi_gbl_FADT,
-								   fadt_info_table
-								   [i].length),
-						     (u64) address32,
-						     fadt_info_table[i].name);
-		}

Original code unconditionally use 32bit GAS address if it is valid.

+		if (address32) {
+			if (!address64->address) {
+				acpi_tb_init_generic_address(address64,
+							     ACPI_ADR_SPACE_SYSTEM_IO,
+							     *ACPI_ADD_PTR(u8,
+									   &acpi_gbl_FADT,
+									   fadt_info_table
+									   [i].
+									   length),
+							     (u64)address32,
+							     name);
+			}

The back port will use 32bit address for GAS only when the 64bit address is not valid.

The change is reasonable, otherwise the higher versioned GAS could not be enabled.
So IMI, the root cause might be the original code has hidden the bug of XSDT checksum validation requirements with this buggy logic.

It looks we should skip the XSDT if the XSDT is not correct:
[    0.000000] ACPI BIOS Warning (bug): Incorrect checksum in table [XSDT] - 0xA0, should be 0xC9 (20131218/tbprint-214)
[    0.000000] ACPI BIOS Warning (bug): 32/64X FACS address mismatch in FADT: 0xCF661F40/0x00000000CF667E40, using 64-bit address (20131218/tbfadt-271)

Let me prepare a series of patches for you to try.
Comment 41 Oswald Buddenhagen 2014-05-12 06:07:12 UTC
Created attachment 135841 [details]
dmesg of 3.15-rc5 with revert of broken patch
Comment 42 Lv Zheng 2014-05-12 08:37:19 UTC
The old kernel:
[    0.000000] ACPI BIOS Warning (bug): Incorrect checksum in table [XSDT] - 0xA0, should be 0xC9 (20140214/tbprint-218)
[    0.000000] ACPI Warning: 32/64 FACS address mismatch in FADT - two FACS tables! (20140214/tbfadt-395)
[    0.000000] ACPI BIOS Warning (bug): 32/64X FACS address mismatch in FADT - 0xCF661F40/0x00000000CF667E40, using 32 (20140214/tbfadt-522)

The new kernel:
[    0.000000] ACPI BIOS Warning (bug): Incorrect checksum in table [XSDT] - 0xA0, should be 0xC9 (20131218/tbprint-214)
[    0.000000] ACPI BIOS Warning (bug): 32/64X FACS address mismatch in FADT: 0xCF661F40/0x00000000CF667E40, using 64-bit address (20131218/tbfadt-271)

This is because we believe 64-bit addresses should be used by default so that new features can be enabled.

We should follow Windows Vista behavior:
http://download.microsoft.com/download/5/b/9/5b97017b-e28a-4bae-ba48-174cf47d23cd/CPA002_WH06.ppt:
If the FADT Revision field is > 2, Windows Vista will use the extended 64-bit addresses in the FADT:
1. X_FIRMWARE_CTRL and X_DSDT
2. Extended addresses of ACPI fixed hardware, X_PM1a_EVT_BLK, etc.

It is 4 in you case:
[000h 0000   4]                    Signature : "FACP"    [Fixed ACPI Description Table (FADT)]
[004h 0004   4]                 Table Length : 000000F4
[008h 0008   1]                     Revision : 04

Thus it is not wrong to use 64-bit address.  Because this is the purpose of this commit, which can avoid issues on other platforms:
http://bugs.acpica.org/show_bug.cgi?id=885

For your platform, I think a quirk mechanism is proper fix.
Comment 43 Rafael J. Wysocki 2014-05-12 11:36:03 UTC
(In reply to James Duley from comment #37)
> Also, I don't know if it makes any difference and I probably should have
> mentioned it earlier (woops) but, the acer c720 has always needed the kernel
> parameter "tpm_tis.force=1" to not hang on resume.

That's a separate issue and please feel free to open another bug entry for it.
Comment 44 Rafael J. Wysocki 2014-05-12 11:43:11 UTC
(In reply to Lv Zheng from comment #42)

[cut]

> Thus it is not wrong to use 64-bit address.  Because this is the purpose of
> this commit, which can avoid issues on other platforms:
> http://bugs.acpica.org/show_bug.cgi?id=885

So the rule is: Breaking things that worked (even if that was by accident) is *much* *worse* than failing to fix things that never worked.

With that rule in mind I'm reverting your commit and you please think how to fix bug #885 without breaking the Oswald's system at the same time.

Thanks!
Comment 45 Lv Zheng 2014-05-13 00:41:54 UTC
Hi, Oswald

(In reply to Oswald Buddenhagen from comment #41)
> Created attachment 135841 [details]
> dmesg of 3.15-rc5 with revert of broken patch

According to my investigation.
On your platform, there are 2 FADT tables and 2 FACS tables.
All other table addresses are same.
All GAS addresses in the dumped FADT are same.

In order to root cause the issue, or generate a proper quirk, I need your support.
Can you upload the output of following commands using recent acpidump:

# acpidump -a 0xCF667D18 > facp-rsdt.txt
# acpidump -a 0xCF65DD98 > facp-xsdt.txt
# acpidump -a 0xCF661F40 > facs-32.txt
# acpidump -a 0xCF667E40 > facs-64.txt

Thanks in advance.
Comment 46 Lv Zheng 2014-05-13 01:15:59 UTC
(In reply to Rafael J. Wysocki from comment #44)
> (In reply to Lv Zheng from comment #42)
> 
> [cut]
> 
> > Thus it is not wrong to use 64-bit address.  Because this is the purpose of
> > this commit, which can avoid issues on other platforms:
> > http://bugs.acpica.org/show_bug.cgi?id=885
> 
> So the rule is: Breaking things that worked (even if that was by accident)
> is *much* *worse* than failing to fix things that never worked.
> 
> With that rule in mind I'm reverting your commit and you please think how to
> fix bug #885 without breaking the Oswald's system at the same time.

OK, let me reopen this bug and leverage Oswald to do more investigations here to figure out a proper way to approach BZ885 fix.

Thanks.
Comment 47 pedro.nariyoshi 2014-05-13 02:20:58 UTC
Hi Oswald,
I'm very sorry I hijacked your thread. After 3.14 was released I encountered this bug and I took a while before I reported it. I thought there was no way no one else had reported it, when I saw someone else having trouble with suspend/resume I was sure it was the same problem
Comment 48 Lv Zheng 2014-05-13 05:01:59 UTC
Created attachment 135931 [details]
[PATCH] Tables: Restore old behavor to favor 32-bit FADT addresses.

Hi, Oswald

This is a simpler function reversion than the whole commit reversion.
Could you please help to confirm if it is working for you?

Thanks in advance.
Comment 49 Oswald Buddenhagen 2014-05-13 07:36:52 UTC
Created attachment 135941 [details]
facp-rsdt.txt
Comment 50 Oswald Buddenhagen 2014-05-13 07:37:29 UTC
Created attachment 135951 [details]
facp-xsdt.txt
Comment 51 Oswald Buddenhagen 2014-05-13 07:37:56 UTC
Created attachment 135961 [details]
facs-32.txt
Comment 52 Oswald Buddenhagen 2014-05-13 07:38:17 UTC
Created attachment 135971 [details]
facs-64.txt
Comment 53 Oswald Buddenhagen 2014-05-13 07:39:51 UTC
Created attachment 135981 [details]
partial revert against 3.15-rc5

the minimal revert works, though i had to port it to the newer kernel.
Comment 54 Lv Zheng 2014-05-13 08:27:12 UTC
(In reply to Oswald Buddenhagen from comment #53)
> Created attachment 135981 [details]
> partial revert against 3.15-rc5
> 
> the minimal revert works, though i had to port it to the newer kernel.

Ah, sorry.
Thanks for your help.
It's generated on top of Rafael's bleeding-edge branch.
I think it might be helpful to reduce his rebase work if he can find a simpler reversion commit.

Best regards.
Comment 55 Lv Zheng 2014-05-13 08:55:57 UTC
Created attachment 136001 [details]
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector.

Can this also be a valid fix?

Thanks in advance.
Comment 56 Rafael J. Wysocki 2014-05-13 12:00:47 UTC
I'm going to apply the patch from comment #53.  Thanks!
Comment 57 Lv Zheng 2014-05-13 14:00:05 UTC
Created attachment 136021 [details]
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector.

Hi, Oswald

This is the refined patch of 64-bit FACS enabling.
This is still not the final version, no FACS v2 support is implemented in this patch.

The 32-bit FACS:
[000h 0000   4]                    Signature : "FACS"
[004h 0004   4]                       Length : 00000040
[008h 0008   4]           Hardware Signature : 00000000
[00Ch 0012   4]    32 Firmware Waking Vector : 00000000
[010h 0016   4]                  Global Lock : 00000000
[014h 0020   4]        Flags (decoded below) : 00000000
                      S4BIOS Support Present : 0
                  64-bit Wake Supported (V2) : 0
[018h 0024   8]    64 Firmware Waking Vector : 0000000000000000
[020h 0032   1]                      Version : 00
                                               ^^
[021h 0033   3]                     Reserved : 000000
[024h 0036   4]    OspmFlags (decoded below) : 00000000
               64-bit Wake Env Required (V2) : 0

The 64-bit FACS:
[000h 0000   4]                    Signature : "FACS"
[004h 0004   4]                       Length : 00000040
[008h 0008   4]           Hardware Signature : 00000000
[00Ch 0012   4]    32 Firmware Waking Vector : 00000000
[010h 0016   4]                  Global Lock : 00000000
[014h 0020   4]        Flags (decoded below) : 00000000
                      S4BIOS Support Present : 0
                  64-bit Wake Supported (V2) : 0
[018h 0024   8]    64 Firmware Waking Vector : 0000000000000000
[020h 0032   1]                      Version : 01
                                               ^^
[021h 0033   3]                     Reserved : 000000
[024h 0036   4]    OspmFlags (decoded below) : 00000000
               64-bit Wake Env Required (V2) : 0

The only difference introduced by the bisected patch is the FACS version.

The 64 Firmware Waking Vector field will be filled as 0 when Version=1 due to the currect implementation of acpi_set_firmware_waking_vector().

The 64 Firmware Waking Vector and Version=1 field is defined in ACPI 2.0 and 3.0.  It supersedes 32 Firmware Waking Vector field.

The 64-bit Wake Supported (V2)/64-bit Wake Env Required (V2) flag and Version=2 field is defined in ACPI 4.0 and 5.0.  Platform firmware should set the "Supported" flag to indicate its support of 64 Firmware Waking Vector field and OSPM should set "Env Required" to indicate the use of 64 Firmware Waking Vector.

When the FACS v1 table is used on your platform, I guess the platform firmware follows ACPI 2.0/3.0, using the 64-bit waking vector on resuming.  And the firmware might be thinking address 0 is a valid waking vector then jumped to it.  Thus you failed to resume from the suspend.
 
This patch tries to enable 64-bit firmware waking vector, this might be sufficient.  But we could give it a try.
Comment 58 Lv Zheng 2014-05-13 15:14:52 UTC
Sorry, some fixes.

(In reply to Lv Zheng from comment #57)
...
> to the currect implementation of acpi_set_firmware_waking_vector().
         ^^^^^^^
         current

> The 64-bit Wake Supported (V2)/64-bit Wake Env Required (V2) flag and
> Version=2 field is defined in ACPI 4.0 and 5.0.  Platform firmware should
> set the "Supported" flag to indicate its support of 64 Firmware Waking
                                                      ^^^^^^^^^^^^^^^^^^
                                                      64 bit execution
> Vector field and OSPM should set "Env Required" to indicate the use of 64
  ^^^^^^^^^^^^                                                           ^^
  environment                                                            64
> Firmware Waking Vector.
  ^^^^^^^^^^^^^^^^^^^^^^
  bit execution environment.
Thus the V2 flags are not related to this issue.  Please ignore this paragraph.

....  
> This patch tries to enable 64-bit firmware waking vector, this might be
                                                                       ^^
                                                                       not be
> sufficient.
Comment 59 Jens Frederich 2014-05-14 11:09:05 UTC
I'm having a similar problem. On a Chromebook Pixel. Kernel 3.13 (and 3.15-rc5) break suspend. The computer suspends fine as shown by the blinking leds for the first time, but reboots on second resume. No Kernel works.

Should I send a dmesg/dmidecode as well?

The tpm_tis force=1 interrupt=0 workaround has no effect.

I've tested your patch: https://patchwork.kernel.org/patch/4153441/ with a 3.15-rc5 kernel but no effect. Is my SeaBios version to old?
Comment 60 Rafael J. Wysocki 2014-05-14 11:38:47 UTC
(In reply to Jens Frederich from comment #59)
> I'm having a similar problem. On a Chromebook Pixel. Kernel 3.13 (and
> 3.15-rc5) break suspend. The computer suspends fine as shown by the blinking
> leds for the first time, but reboots on second resume. No Kernel works.
> 
> Should I send a dmesg/dmidecode as well?
> 
> The tpm_tis force=1 interrupt=0 workaround has no effect.
> 
> I've tested your patch: https://patchwork.kernel.org/patch/4153441/ with a
> 3.15-rc5 kernel but no effect. Is my SeaBios version to old?

If you had the problem before 3.14, it is a different one.  Please file a separate bug report for it.
Comment 61 Jens Frederich 2014-05-14 11:45:17 UTC
Okay
Comment 62 Lv Zheng 2014-05-16 01:36:21 UTC
Created attachment 136281 [details]
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector.

Ping Oswald.
Could you give this a try?
I refined the patch and rebased it to 3.15-rc5.
Let me describe the details after submitting this patch.
Comment 63 Lv Zheng 2014-05-16 01:39:47 UTC
Hi, Oswald

Your support is useful to improve Linux.
The bisected commit and this fix patch tries to enable 64-bit FADT/FACS addresses for Linux kernel in case they are differ from the 32-bit addresses.

The bisected commit only affects 32-bit favor of GAS/Tables addresses in FADT.

1. The comparison of dmesg:

I compared the 2 dmesg by eliminating leading timestamps.
Most of the differences for ACPI tables are caused by the format change.
And there is only address differences/printing message differences for other drivers.

- ACPI: RSDP 0x00000000000F03C0 000024 (v02 INTEL )
+ ACPI: RSDP 00000000000f03c0 000024 (v02  INTEL)
- ACPI: XSDT 0x00000000CF65EE18 000054 (v01 INTEL  DP45SG   00000117 MSFT 00010013)
+ ACPI: XSDT 00000000cf65ee18 000054 (v01 INTEL  DP45SG   00000117 MSFT 00010013)

Two boots all use XSDT.

- ACPI BIOS Warning (bug): Incorrect checksum in table [XSDT] - 0xA0, should be 0xC9 (20140214/tbprint-218)
+ ACPI BIOS Warning (bug): Incorrect checksum in table [XSDT] - 0xA0, should be 0xC9 (20131218/tbprint-214)
- ACPI: FACP 0x00000000CF65DD98 0000F4 (v04 INTEL  DP45SG   00000117 MSFT 00010013)
+ ACPI: FACP 00000000cf65dd98 0000F4 (v04  INTEL   DP45SG 00000117 MSFT 00010013)

Two boots all use same FADT.

- ACPI Warning: 32/64 FACS address mismatch in FADT - two FACS tables! (20140214/tbfadt-395)
- ACPI BIOS Warning (bug): 32/64X FACS address mismatch in FADT - 0xCF661F40/0x00000000CF667E40, using 32 (20140214/tbfadt-522)
+ ACPI BIOS Warning (bug): 32/64X FACS address mismatch in FADT: 0xCF661F40/0x00000000CF667E40, using 64-bit address (20131218/tbfadt-271)

FACS address favor is different for 2 kernel boots.

- ACPI: DSDT 0x00000000CF659018 003D83 (v01 INTEL  DP45SG   00000117 INTL 20051117)
+ ACPI: DSDT 00000000cf659018 003D83 (v01  INTEL   DP45SG 00000117 INTL 20051117)
- ACPI: FACS 0x00000000CF661F40 000040
+ ACPI: FACS 00000000cf667e40 000040

The loaded FACS is different.

- ACPI: APIC 0x00000000CF65DF18 00006C (v02 INTEL  DP45SG   00000117 MSFT 00010013)
+ ACPI: APIC 00000000cf65df18 00006C (v02  INTEL   DP45SG 00000117 MSFT 00010013)
- ACPI: MCFG 0x00000000CF668D98 00003C (v01 INTEL  DP45SG   00000117 MSFT 00000097)
+ ACPI: MCFG 00000000cf668d98 00003C (v01 INTEL  DP45SG   00000117 MSFT 00000097)
- ACPI: HPET 0x00000000CF668D18 000038 (v01 INTEL  DP45SG   00000117 AMI. 00000003)
+ ACPI: HPET 00000000cf668d18 000038 (v01 INTEL  DP45SG   00000117 AMI. 00000003)
- ACPI: ASPT 0x00000000CF668B98 00002C (v02 INTEL  PerfTune 00000117 AMI  00000003)
+ ACPI: ASPT 00000000cf668b98 00002C (v02 INTEL  PerfTune 00000117 AMI  00000003)
- ACPI: WDTT 0x00000000CF661918 0002CC (v01 INTEL  DP45SG   00000117 AMI  00000003)
+ ACPI: WDTT 00000000cf661918 0002CC (v01 INTEL  DP45SG   00000117 AMI  00000003)
- ACPI: SSDT 0x00000000CF661C18 0002CC (v01 AMI    IST      00000001 MSFT 03000001)
+ ACPI: SSDT 00000000cf661c18 0002CC (v01    AMI      IST 00000001 MSFT 03000001)
- ACPI: SSDT 0x0000000000000000 0002CC (v01 AMI    IST      00000001 MSFT 03000001)
+ ACPI: SSDT           (null) 0002CC (v01    AMI      IST 00000001 MSFT 03000001)

All other table loading is same.
And no GAS favor different can be seen.

2.
The useful FADT fields that will be examined differently by the bisected commit.
We only see FACS differs, all GAS addresses and DSDT addresses are same.

                 FACS Address : CF661F40 X------------------+
                 DSDT Address : CF659018 O------------------|+
     PM1A Event Block Address : 00000400 O------------------||+
     PM1B Event Block Address : 00000000 O------------------|||+
   PM1A Control Block Address : 00000404 O------------------||||+
   PM1B Control Block Address : 00000000 O------------------|||||+
    PM2 Control Block Address : 00000450 O------------------||||||+
       PM Timer Block Address : 00000408 O------------------|||||||+
           GPE0 Block Address : 00000420 O------------------||||||||+
           GPE1 Block Address : 00000000 O------------------|||||||||+
       PM1 Event Block Length : 04                          ||||||||||
     PM1 Control Block Length : 02                          ||||||||||
     PM2 Control Block Length : 01                          ||||||||||
        PM Timer Block Length : 04                          ||||||||||
            GPE0 Block Length : 10                          ||||||||||
            GPE1 Block Length : 00                          ||||||||||
                 FACS Address : 00000000CF667E40 -----------+|||||||||
                 DSDT Address : 00000000CF659018 ------------+||||||||
             PM1A Event Block : [Generic Address Structure]   ||||||||
                     Space ID : 01 [SystemIO]                 ||||||||
                    Bit Width : 20                            ||||||||
                   Bit Offset : 00                            ||||||||
         Encoded Access Width : 00 [Undefined/Legacy]         ||||||||
                      Address : 0000000000000400 -------------+|||||||
             PM1B Event Block : [Generic Address Structure]    |||||||
                     Space ID : 01 [SystemIO]                  |||||||
                    Bit Width : 00                             |||||||
                   Bit Offset : 00                             |||||||
         Encoded Access Width : 00 [Undefined/Legacy]          |||||||
                      Address : 0000000000000000 --------------+||||||
           PM1A Control Block : [Generic Address Structure]     ||||||
                     Space ID : 01 [SystemIO]                   ||||||
                    Bit Width : 10                              ||||||
                   Bit Offset : 00                              ||||||
         Encoded Access Width : 00 [Undefined/Legacy]           ||||||
                      Address : 0000000000000404 ---------------+|||||
           PM1B Control Block : [Generic Address Structure]      |||||
                     Space ID : 01 [SystemIO]                    |||||
                    Bit Width : 00                               |||||
                   Bit Offset : 00                               |||||
         Encoded Access Width : 00 [Undefined/Legacy]            |||||
                      Address : 0000000000000000 ----------------+||||
            PM2 Control Block : [Generic Address Structure]       ||||
                     Space ID : 01 [SystemIO]                     ||||
                    Bit Width : 08                                ||||
                   Bit Offset : 00                                ||||
         Encoded Access Width : 00 [Undefined/Legacy]             ||||
                      Address : 0000000000000450 -----------------+|||
               PM Timer Block : [Generic Address Structure]        |||
                     Space ID : 01 [SystemIO]                      |||
                    Bit Width : 20                                 |||
                   Bit Offset : 00                                 |||
         Encoded Access Width : 00 [Undefined/Legacy]              |||
                      Address : 0000000000000408 ------------------+||
                   GPE0 Block : [Generic Address Structure]         ||
                     Space ID : 01 [SystemIO]                       ||
                    Bit Width : 80                                  ||
                   Bit Offset : 00                                  ||
         Encoded Access Width : 00 [Undefined/Legacy]               ||
                      Address : 0000000000000420 -------------------+|
                   GPE1 Block : [Generic Address Structure]          |
                     Space ID : 01 [SystemIO]                        |
                    Bit Width : 00                                   |
                   Bit Offset : 00                                   |
         Encoded Access Width : 00 [Undefined/Legacy]                |
                      Address : 0000000000000000 --------------------+

All GAS Bit Width fields are same as (Length * 8).

So if you are suffering from this commit, it is likely we can improve Linux by favoring 64-bit FACS addresses using the fix patch.
Without your feedback, it is hard for us to determine if this patch is valid.
Comment 64 Lv Zheng 2014-05-16 01:49:40 UTC
Created attachment 136291 [details]
[PATCH] ACPICA: Hardware: Enable 64-bit firmware waking vector.

The attachment 136281 [details] is used for linux-pm.git/linux-next.
This is for linux.git/master.

You can choose what you want.
Comment 65 Lv Zheng 2014-05-16 01:50:14 UTC
Hi, Oswald

Here is my test request:

1. Using 3.15-rc5.
2. Do not apply attachment 135981 [details], but apply attachment 136281 [details] or attachment 136291 [details].
3. Build and boot the kernel
4. Perform the suspend/resume test
5. Report the test result here.

Thanks in advance.

Best regards
Comment 66 Oswald Buddenhagen 2014-05-17 09:05:19 UTC
the patch against 3.15-rc5 doesn't cut it:

drivers/acpi/acpica/hwxfsleep.c: In function ‘acpi_set_firmware_waking_vector’:
drivers/acpi/acpica/hwxfsleep.c:112:8: error: ‘acpi_gbl_use32_bit_facs_addresses’ undeclared (first use in this function)
   if (!acpi_gbl_use32_bit_facs_addresses
        ^
drivers/acpi/acpica/hwxfsleep.c:112:8: note: each undeclared identifier is reported only once for each function it appears in

the other patch doesn't apply on top of 07b9b1833cae4dc4375635266a965172799f732d, even with a 3-way merge (unknown sha1, so you presumably made it on top of unpublished changes).
Comment 67 Lv Zheng 2014-05-26 08:45:17 UTC
(In reply to Oswald Buddenhagen from comment #66)
> the patch against 3.15-rc5 doesn't cut it:
> 
> drivers/acpi/acpica/hwxfsleep.c: In function
> ‘acpi_set_firmware_waking_vector’:
> drivers/acpi/acpica/hwxfsleep.c:112:8: error:
> ‘acpi_gbl_use32_bit_facs_addresses’ undeclared (first use in this function)
>    if (!acpi_gbl_use32_bit_facs_addresses
>         ^
> drivers/acpi/acpica/hwxfsleep.c:112:8: note: each undeclared identifier is
> reported only once for each function it appears in
> 
> the other patch doesn't apply on top of
> 07b9b1833cae4dc4375635266a965172799f732d, even with a 3-way merge (unknown
> sha1, so you presumably made it on top of unpublished changes).

Hi, Oswald

I also have a platform reported by BIOS of a FACSv2 table, simply copying real mode waking vector address to the 64-bit address field doesn't work.
So the patch appears not to be sufficient.
I found some clues in ACPI specification.
The 32-bit waking vector address is used for the real mode environment and the 64-bit waking vector address might be used for the 4GB flat mode environment, Linux dosn't support the latter resuming environment.

For the smarter way to re-enable 64-bit address favor, I composed some commits here:
https://github.com/zetalog/acpica/tree/acpica-64bit

I have tested that they won't break resuming on my platforms. But I don't have a platform shipped with 2 different versioned FACS, I can do validation here about the 64-bit waking vector's behavior, but I cannot do validation here about which FACS BIOS will use.
Let me linuxize "smarter way" patches and port them to the linux 3.15-rc5 for you to confirm.

Thanks
Comment 68 Lv Zheng 2014-05-26 08:58:41 UTC
Created attachment 137391 [details]
Smarter way to enable 64-bit FADT addresses favor

I had trouble in rebasing one of the patches.
So I wonder if you can use this branch:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
linux-next branch.
The patch order is indicated by the series file in the archive.

I'll try again to do the back port.
Comment 69 Lv Zheng 2014-05-27 06:19:06 UTC
Hi, Oswald

I think it's better for us to wait until the current ACPICA code being merged into the linux.git/master branch.
There are too many conflicts in the table code.
Given the regression fix is ready now, it's not urgent for us to handle this (64-bit re-enabling) right now.

We just need the valid test platforms to confirm if re-enabling 64-bit addresses by excluding the 64-bit FACS and 64-bit waking vector can work for Linux.
I have tested the patches, and suspend/resume is OK on my environments.

I'll ask maintainer.  If it is necessary to perform validation on a multi-FACS platform, I'll ask you again to try the suspend/resume tests after the 201403/201404 ACPICA patches appearing in the linux upstream.

Thanks
-Lv
Comment 70 Lv Zheng 2014-06-11 00:19:27 UTC
Regression fix is shipped in v3.15-rc8.
Root caused and validated fixes are here for ACPICA upstream:
https://github.com/acpica/acpica/pull/35

Closing.

Note You need to log in before you can comment on or make changes to this bug.