Bug 88391 - Not waking up after suspend-to-RAM
Status: RESOLVED CODE_FIX
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: i386
Hardware: Intel Linux
Importance: P1 normal
Assignee: Borislav Petkov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-18 05:56 UTC by Natrio
Modified: 2015-10-16 18:14 UTC

See Also:
Kernel Version: 3.17.3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
3.17.3-1-ARCH SMP PREEMPT i686 /proc/config.gz (39.18 KB, application/gzip)
2014-11-26 10:23 UTC, Natrio
dirty fix (4.37 KB, patch)
2014-11-27 00:43 UTC, Borislav Petkov

Description Natrio 2014-11-18 05:56:00 UTC
The 3.17.3 (i686) kernel suspends normally, but after resuming it appears to be dead: no video, no ping, no sound, no logs, and no keyboard lights.

On linux-3.17.2 (i686) everything works fine.
On x86_64 (3.17.3) everything works fine.

CPU: Intel Celeron G530
M.B: Gigabyte H61M-S2PV
Chipset: Intel H61 Express
Arch Linux latest

The same bug was also reported on an MSI Wind U100 with an Intel Atom N270:
https://bugs.archlinux.org/task/42820#comment129958
Comment 1 Aaron Lu 2014-11-19 02:06:58 UTC
Regression from v3.17.2->v3.17.3 for i386 edition
Can you please do a git bisect to find the offending commit?
Comment 2 Natrio 2014-11-24 07:40:40 UTC
That is too difficult for me, but this bug is quite common across i686 builds on a number of machines:
https://bugs.archlinux.org/task/42820
Comment 3 Aaron Lu 2014-11-24 07:48:06 UTC
Can you please ask someone there to do the bisect?
Comment 4 Leo Wolf 2014-11-25 23:09:53 UTC
This bisects to fb86b97, the fix for https://bugzilla.kernel.org/show_bug.cgi?id=88001 .

The Arch Linux 3.17.3 kernel has this patch as well, hence the problems.
Comment 5 Leo Wolf 2014-11-25 23:27:47 UTC
This is with early microcode enabled and happens whether the Intel microcode is loaded with the initrd or not.  No problem without early microcode updates.
Comment 6 Aaron Lu 2014-11-26 02:13:17 UTC
Boris,

Your commit fb86b97300d9 ("x86, microcode: Update BSPs microcode on resume") seems to cause resume trouble for some people, please take a look, thanks.
Comment 7 Natrio 2014-11-26 06:17:26 UTC
That fix must be wrong, because the kernel dies on resume regardless of whether a microcode update is loaded.

In the 3.17.2 kernel I had to disable the microcode update, since it was applied to only one of the two cores (Celeron G530, without hyper-threading):

$ dmesg | grep microcode
[    0.000000] CPU0 microcode updated early to revision 0x29, date = 2013-06-12
[    0.327680] microcode: CPU0 sig=0x206a7, pf=0x2, revision=0x29
[    0.327687] microcode: CPU1 sig=0x206a7, pf=0x2, revision=0x23
[    0.327689] perf_event_intel: PEBS disabled due to CPU errata, please upgrade microcode
[    0.327749] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

and in 3.17.3 this behavior has NOT changed.
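
The per-CPU mismatch in the dmesg output above (CPU0 at 0x29, CPU1 at 0x23) can also be spotted mechanically. What follows is only a minimal Python sketch, assuming the "microcode: CPUn sig=..., pf=..., revision=..." line format shown above. The names parse_revisions and revisions_differ are hypothetical helpers for illustration, not part of any kernel tooling.

```python
import re

# Matches lines such as:
# [    0.327680] microcode: CPU0 sig=0x206a7, pf=0x2, revision=0x29
LINE_RE = re.compile(
    r"microcode: CPU(\d+) sig=0x[0-9a-fA-F]+, pf=0x[0-9a-fA-F]+, "
    r"revision=(0x[0-9a-fA-F]+)"
)

def parse_revisions(dmesg_text):
    """Return {cpu_number: revision} for every per-CPU microcode line."""
    revisions = {}
    for match in LINE_RE.finditer(dmesg_text):
        revisions[int(match.group(1))] = int(match.group(2), 16)
    return revisions

def revisions_differ(revisions):
    """True if at least two CPUs report different microcode revisions."""
    return len(set(revisions.values())) > 1

# Sample copied from the dmesg output quoted in this comment.
SAMPLE = """\
[    0.000000] CPU0 microcode updated early to revision 0x29, date = 2013-06-12
[    0.327680] microcode: CPU0 sig=0x206a7, pf=0x2, revision=0x29
[    0.327687] microcode: CPU1 sig=0x206a7, pf=0x2, revision=0x23
"""

for cpu, rev in sorted(parse_revisions(SAMPLE).items()):
    print(f"CPU{cpu}: {rev:#x}")
print("mismatch:", revisions_differ(parse_revisions(SAMPLE)))
```

To run it against a live system, the text of `dmesg | grep microcode` could be passed to parse_revisions instead of SAMPLE.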
Comment 8 Borislav Petkov 2014-11-26 10:16:40 UTC
Ok, Natrio, can you upload your kernel .config please? I need to try to reproduce it here.

Thanks.
Comment 9 Natrio 2014-11-26 10:23:43 UTC
Created attachment 158861
3.17.3-1-ARCH SMP PREEMPT i686 /proc/config.gz
Comment 10 Borislav Petkov 2014-11-26 10:43:06 UTC
Does it work if you disable CONFIG_PARAVIRT in your config? Just turn the whole damn thing off to test please.

Thanks.
Comment 11 Borislav Petkov 2014-11-26 12:01:31 UTC
Also, does your distro add microcode to the initrd, and, if so, how?
Comment 12 Borislav Petkov 2014-11-26 12:18:38 UTC
Btw, a temporary workaround is to disable the microcode loader, just add "dis_ucode_ldr" to your kernel command line.
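
After rebooting with the workaround, it may be worth confirming that the parameter actually reached the kernel. A minimal Python sketch, assuming a Linux system where the boot command line is readable from /proc/cmdline; has_boot_param and loader_disabled are hypothetical helper names used only for this illustration.

```python
def has_boot_param(cmdline, name):
    """True if `name` appears as its own token (or as name=value) in cmdline."""
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False

def loader_disabled(cmdline_path="/proc/cmdline"):
    """Read the running kernel's command line and look for dis_ucode_ldr."""
    with open(cmdline_path) as f:
        return has_boot_param(f.read(), "dis_ucode_ldr")

# Example with a made-up command line (the root= value is illustrative only).
example = "BOOT_IMAGE=/vmlinuz-linux root=/dev/sda1 rw dis_ucode_ldr"
print(has_boot_param(example, "dis_ucode_ldr"))
```

Matching whole tokens rather than substrings avoids a false positive if some other parameter merely contains the same text.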
Comment 13 Natrio 2014-11-26 12:33:11 UTC
In Arch Linux the microcode is loaded like this:
$ lsinitcpio /boot/intel-ucode.img
kernel/x86/microcode/GenuineIntel.bin

in grub.cfg:
 initrd /boot/intel-ucode.img /boot/initramfs-linux.img

As I said, I had already disabled it:

 initrd /boot/initramfs-linux.img

but it had no effect on this bug.
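
Whether a given grub.cfg still passes the microcode image can be checked by looking at its initrd lines. A minimal Python sketch, assuming the simple "initrd <image> [<image> ...]" form quoted above; initrd_images and loads_microcode_image are hypothetical names, and real grub.cfg files may contain constructs this does not handle.

```python
def initrd_images(grub_cfg_text):
    """Return one list of image paths per 'initrd' line in the config text."""
    images = []
    for line in grub_cfg_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "initrd":
            images.append(tokens[1:])
    return images

def loads_microcode_image(grub_cfg_text, image_name="intel-ucode.img"):
    """True if any initrd line lists a path ending in /<image_name>."""
    return any(
        path.endswith("/" + image_name)
        for paths in initrd_images(grub_cfg_text)
        for path in paths
    )

# The two initrd lines quoted in this comment.
with_ucode = " initrd /boot/intel-ucode.img /boot/initramfs-linux.img"
without_ucode = " initrd /boot/initramfs-linux.img"
print(loads_microcode_image(with_ucode), loads_microcode_image(without_ucode))
```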
Comment 14 Natrio 2014-11-26 12:43:27 UTC
Thank you, dis_ucode_ldr workaround works for me.
Comment 15 Natrio 2014-11-26 13:11:52 UTC
I found this patch
https://projects.archlinux.org/svntogit/packages.git/tree/trunk/fix_CPU0_microcode_on_resume.patch?h=packages/linux
in Arch Linux PKGBUILD for 3.17.3 and 3.17.4
Comment 16 Borislav Petkov 2014-11-26 14:01:51 UTC
Well, Leo in comment#4 bisected it to fb86b97300d9 ("x86,
microcode: Update BSPs microcode on resume") but from
looking at the 3.17 stable tree (I have tag v3.17.4 from
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git),
this patch is not in stable yet.

So I'm guessing if you remove that patch from your kernel, it will
resume properly, correct? Can you test that?

Thanks.
Comment 17 Natrio 2014-11-26 14:13:54 UTC
I will certainly test that next, once the kernel without CONFIG_PARAVIRT has finished compiling.
Comment 18 Natrio 2014-11-26 15:23:29 UTC
No change without CONFIG_PARAVIRT.
Comment 19 Natrio 2014-11-26 18:45:24 UTC
The 3.17.3 kernel, built without this patch, wakes up normally.

The 3.17.4-1-ARCH i686 kernel (with the patch) dies the same way as 3.17.3-1-ARCH i686; the dis_ucode_ldr workaround works there too.
Comment 20 Borislav Petkov 2014-11-26 20:30:55 UTC
Ok, thanks for testing and verifying my suggestions - that was very helpful!

Here's the deal:

fb86b97300d9 ("x86, microcode: Update BSPs microcode on resume") breaks
32-bit resume from suspend and I can reproduce it here too. So, it needs
to be fixed properly.

For now you can use 'dis_ucode_ldr' so that you can suspend
successfully. In addition, I'll send this upstream so that 3.18 is not
broken on 32-bit:

---
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 2ce9051174e6..08fe6e8a726e 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -465,6 +465,7 @@ static void mc_bp_resume(void)
 
 	if (uci->valid && uci->mc)
 		microcode_ops->apply_microcode(cpu);
+#ifdef CONFIG_X86_64
 	else if (!uci->mc)
 		/*
 		 * We might resume and not have applied late microcode but still
@@ -473,6 +474,7 @@ static void mc_bp_resume(void)
 		 * applying patches early on the APs.
 		 */
 		load_ucode_ap();
+#endif
 }
 
 static struct syscore_ops mc_syscore_ops = {
--

We still need the loading on 64-bit because it fixes
https://bugzilla.kernel.org/show_bug.cgi?id=88001 and loading microcode
on the BSP is needed there for the HLE-disabling microcode patch.

And, if you'd like to test the real fix once I have it, let me know and
I'll send it to you.

So thanks again.
Comment 21 Borislav Petkov 2014-11-26 23:06:27 UTC
Ok, fixed :)

$ dmesg | grep -i microcode
[    0.000000] CPU0 microcode updated early to revision 0x1b, date = 2014-05-29
[    0.079242] CPU2 microcode updated early to revision 0x1b, date = 2014-05-29
[    0.349532] microcode: CPU0 sig=0x306a9, pf=0x10, revision=0x1b
[    0.349608] microcode: CPU1 sig=0x306a9, pf=0x10, revision=0x1b
[    0.349685] microcode: CPU2 sig=0x306a9, pf=0x10, revision=0x1b
[    0.349762] microcode: CPU3 sig=0x306a9, pf=0x10, revision=0x1b
[    0.349932] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[   20.190146] CPU0 microcode updated early to revision 0x1b, date = 2014-05-29
[   20.255867] CPU2 microcode updated early to revision 0x1b, date = 2014-05-29

The last two dmesg lines above are from after resume. MSR 0x8b and /proc/cpuinfo agree:

$ rdmsr --all 0x8b
1b00000000
1b00000000
1b00000000
1b00000000
$ grep microcode /proc/cpuinfo
microcode       : 0x1b
microcode       : 0x1b
microcode       : 0x1b
microcode       : 0x1b

Now, the fix is neither small nor trivial, and it is pretty late for
3.18, so for 3.18 I'll send the small hunk from the previous comment, which
disables reloading on 32-bit. As a workaround there, you can do

$ echo 1 > /sys/devices/system/cpu/microcode/reload

as root and microcode will get updated, provided you have installed
latest microcode to /lib/firmware/intel-ucode/.

I'll have more time for 3.19 when it all will hopefully be sorted out
properly.

Thanks.
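
The two checks in this comment (rdmsr on MSR 0x8b and the microcode field in /proc/cpuinfo) can be cross-checked with a few lines of code. A minimal Python sketch, relying on the fact that the microcode revision sits in the upper 32 bits of MSR 0x8b (IA32_BIOS_SIGN_ID), which is why rdmsr prints 1b00000000 for revision 0x1b. The function names are hypothetical helpers for illustration only.

```python
def revision_from_msr(value_hex):
    """Extract the microcode revision from an MSR 0x8b value.

    The revision is held in bits 63:32, so shift the low half away.
    """
    return int(value_hex, 16) >> 32

def revisions_from_cpuinfo(cpuinfo_text):
    """Return the 'microcode' field of each CPU entry as an integer."""
    revisions = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.append(int(value.strip(), 16))
    return revisions

def all_cpus_at(revisions, expected):
    """True if there is at least one CPU and every CPU reports `expected`."""
    return bool(revisions) and all(rev == expected for rev in revisions)

# Values copied from the rdmsr and /proc/cpuinfo output quoted above.
msr_values = ["1b00000000"] * 4
cpuinfo_sample = "microcode       : 0x1b\n" * 4

msr_revisions = [revision_from_msr(v) for v in msr_values]
print(all_cpus_at(msr_revisions, 0x1b))
print(all_cpus_at(revisions_from_cpuinfo(cpuinfo_sample), 0x1b))
```

On a live system the inputs would come from `rdmsr --all 0x8b` (run as root, with the msr module loaded) and from reading /proc/cpuinfo.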
Comment 22 Borislav Petkov 2014-11-27 00:42:46 UTC
In case you want to give it a try, I'm attaching the dirty version which works. It should apply cleanly to 3.17 except the hunk touching mc_bp_resume() but fixing up by hand should be easy.

I'll do a proper fix with changelog and the whole shebang in the next few days, after having tested the hell out of it. The AMD side needs fixing too, but that doesn't matter for your machine as you're running an Intel box.

Thanks.
Comment 23 Borislav Petkov 2014-11-27 00:43:20 UTC
Created attachment 158881
dirty fix
Comment 24 Natrio 2014-11-27 11:55:18 UTC
>> Comment by SATO Tatsuya (tattsan) - Thursday, 27 November 2014, 15:23 GMT+4
> I've built kernel 3.17.4 with the new patch
> https://bugzilla.kernel.org/show_bug.cgi?id=88391#c23 ,
> and it seems to work fine. The author says the patch is dirty, and he'll
> clean it up.
Comment 25 Borislav Petkov 2014-11-27 12:09:20 UTC
Ok, thanks for confirming.

Just to clarify for the arch-linux kernel: the patch in there needs the
#ifdef CONFIG_X86_64 around it as in comment #20. Btw, this patch is
wrong:

https://projects.archlinux.org/svntogit/packages.git/tree/trunk/fix_CPU0_microcode_on_resume.patch

- it should be this one:

http://git.kernel.org/tip/fb86b97300d930b57471068720c52bfa8622eab7

Now, I've stopped it from going to -stable, but the fix for
https://bugzilla.kernel.org/show_bug.cgi?id=88001 is still needed. If
you guys want to have that addressed, I can give you the upstream commit
which goes on top of fb86b97300d930b57471068720c52bfa8622eab7 once I have
tested it.

Ask me if anything is still unclear.

Thanks.
Comment 26 Natrio 2014-11-27 12:34:26 UTC
Thank you, now I finally understood:

Arch Linux maintainers have taken a "test" version of your patch
https://bugzilla.kernel.org/show_bug.cgi?id=88001#c3
https://bugzilla.kernel.org/attachment.cgi?id=157641&action=diff
and applied it to the stable kernels 3.17.3 and 3.17.4.

I will repost your comment to the Arch Linux bug tracker.
Comment 27 Borislav Petkov 2014-11-27 12:41:02 UTC
Right, this was a dirty test for the bug reporter to verify.
Comment 28 Borislav Petkov 2015-01-13 20:48:14 UTC
Btw, 3.18-stable has received the full backport of fixes so if you want to upgrade...
Comment 29 Borislav Petkov 2015-10-16 18:14:02 UTC
Old one, closing.
