Bug 88391
Summary: | Not waking up after suspend-to-RAM | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Natrio (natrio) |
Component: | i386 | Assignee: | Borislav Petkov (bp) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | aaron.lu, jclw, keepitsimpleengineer, maxtram95, politesniper777, rui.zhang |
Priority: | P1 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | 3.17.3 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | 3.17.3-1-ARCH SMP PREEMPT i686 /proc/config.gz; dirty fix | ||
Description
Natrio
2014-11-18 05:56:00 UTC
Regression from v3.17.2 to v3.17.3 on the i386 build.

Can you please do a git bisect to find the offending commit?

It's too difficult for me, but this bug is quite common across different i686 builds on a number of machines: https://bugs.archlinux.org/task/42820

Can you please ask someone there to do the bisect?

This bisects to fb86b97, the fix for https://bugzilla.kernel.org/show_bug.cgi?id=88001 . The Arch Linux 3.17.3 kernel has this patch as well, hence the problems. This is with early microcode enabled and happens whether the Intel microcode is loaded with the initrd or not. No problem without early microcode updates.

Boris, your commit fb86b97300d9 ("x86, microcode: Update BSPs microcode on resume") seems to cause resume trouble for some people, please take a look, thanks.

It must be a badly wrong fix, because the kernel dies anyway, regardless of whether a microcode update is loaded. In the 3.17.2 kernel I had to disable the microcode update, since it was applied on only one of the two cores (Celeron G530, without hyper-threading):

$ dmesg | grep microcode
[ 0.000000] CPU0 microcode updated early to revision 0x29, date = 2013-06-12
[ 0.327680] microcode: CPU0 sig=0x206a7, pf=0x2, revision=0x29
[ 0.327687] microcode: CPU1 sig=0x206a7, pf=0x2, revision=0x23
[ 0.327689] perf_event_intel: PEBS disabled due to CPU errata, please upgrade microcode
[ 0.327749] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

and in 3.17.3 this behavior has NOT changed.

Ok, Natrio, can you upload your kernel .config please? I need to try to reproduce it here. Thanks.

Created attachment 158861 [details]
3.17.3-1-ARCH SMP PREEMPT i686 /proc/config.gz
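The dmesg output quoted above shows CPU0 at revision 0x29 and CPU1 at 0x23. A minimal sketch of how such a mismatch could be spotted automatically, assuming the per-CPU lines keep the "microcode: CPUn sig=..., revision=0x.." format seen in this thread; `check_ucode_revisions` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: reads dmesg-style text on stdin, prints "CPUn 0xREV"
# for every per-CPU microcode driver line, then a one-line verdict.
# Exit status: 0 = all revisions equal, 1 = mismatch, 2 = no lines found.
check_ucode_revisions() {
    awk '
        /microcode: CPU[0-9]+ sig=/ {
            if (match($0, /revision=0x[0-9a-fA-F]+/)) {
                # "revision=" is 9 characters long; keep only the value.
                rev = substr($0, RSTART + 9, RLENGTH - 9)
                cpu = $0
                sub(/.*microcode: /, "", cpu)   # drop timestamp and prefix
                sub(/ .*/, "", cpu)             # keep only "CPUn"
                printf "%s %s\n", cpu, rev
                seen[rev] = 1
            }
        }
        END {
            n = 0
            for (r in seen) n++
            if (n == 0) { print "no per-CPU microcode lines found"; exit 2 }
            if (n > 1)  { print "MISMATCH: CPUs report different revisions"; exit 1 }
            print "OK: all CPUs report the same revision"
        }
    '
}
```

It could be fed live data with `dmesg | check_ucode_revisions`, or a saved log on stdin.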
Does it work if you disable CONFIG_PARAVIRT in your config? Just turn the whole damn thing off to test please. Thanks.

Also, does your distro add microcode to the initrd, and, if so, how?

Btw, a temporary workaround is to disable the microcode loader, just add "dis_ucode_ldr" to your kernel command line.

In Arch Linux the microcode is loaded like this:

$ lsinitcpio /boot/intel-ucode.img
kernel/x86/microcode/GenuineIntel.bin

in grub.cfg:

initrd /boot/intel-ucode.img /boot/initramfs-linux.img

As I said, I already had to disable it:

initrd /boot/initramfs-linux.img

but it had no effect on this bug.

Thank you, the dis_ucode_ldr workaround works for me. I found this patch https://projects.archlinux.org/svntogit/packages.git/tree/trunk/fix_CPU0_microcode_on_resume.patch?h=packages/linux in the Arch Linux PKGBUILD for 3.17.3 and 3.17.4.

Well, Leo in comment#4 bisected it to fb86b97300d9 ("x86, microcode: Update BSPs microcode on resume") but from looking at the 3.17 stable tree (I have tag v3.17.4 from git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git), this patch is not in stable yet. So I'm guessing if you remove that patch from your kernel, it will resume properly, correct? Can you test that? Thanks.

I will certainly test it out next time, after the build without CONFIG_PARAVIRT is compiled.

No changes without CONFIG_PARAVIRT.

The 3.17.3 kernel, built without this patch, wakes normally. The 3.17.4-1-ARCH i686 (with the patch) dies the same way as 3.17.3-1-ARCH i686; the dis_ucode_ldr workaround works there too.

Ok, thanks for testing and verifying my suggestions - that was very helpful! Here's the deal: fb86b97300d9 ("x86, microcode: Update BSPs microcode on resume") breaks 32-bit resume from suspend and I can reproduce it here too. So, it needs to be fixed properly. For now you can use 'dis_ucode_ldr' so that you can suspend successfully.
In addition, I'll send this upstream so that 3.18 is not broken on 32-bit:

---
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index 2ce9051174e6..08fe6e8a726e 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -465,6 +465,7 @@ static void mc_bp_resume(void)
 	if (uci->valid && uci->mc)
 		microcode_ops->apply_microcode(cpu);
+#ifdef CONFIG_X86_64
 	else if (!uci->mc)
 		/*
 		 * We might resume and not have applied late microcode but still
@@ -473,6 +474,7 @@ static void mc_bp_resume(void)
 		 * applying patches early on the APs.
 		 */
 		load_ucode_ap();
+#endif
 }
 static struct syscore_ops mc_syscore_ops = {
--

We still need the loading on 64-bit because it fixes https://bugzilla.kernel.org/show_bug.cgi?id=88001 and loading microcode on the BSP is needed there for the HLE-disabling microcode patch. And, if you'd like to test the real fix once I have it, let me know and I'll send it to you. So thanks again.

Ok, fixed :)

$ dmesg | grep -i microcode
[ 0.000000] CPU0 microcode updated early to revision 0x1b, date = 2014-05-29
[ 0.079242] CPU2 microcode updated early to revision 0x1b, date = 2014-05-29
[ 0.349532] microcode: CPU0 sig=0x306a9, pf=0x10, revision=0x1b
[ 0.349608] microcode: CPU1 sig=0x306a9, pf=0x10, revision=0x1b
[ 0.349685] microcode: CPU2 sig=0x306a9, pf=0x10, revision=0x1b
[ 0.349762] microcode: CPU3 sig=0x306a9, pf=0x10, revision=0x1b
[ 0.349932] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[ 20.190146] CPU0 microcode updated early to revision 0x1b, date = 2014-05-29
[ 20.255867] CPU2 microcode updated early to revision 0x1b, date = 2014-05-29

Last two lines are after resume:

$ rdmsr --all 0x8b
1b00000000
1b00000000
1b00000000
1b00000000

$ grep microcode /proc/cpuinfo
microcode : 0x1b
microcode : 0x1b
microcode : 0x1b
microcode : 0x1b

Now, the fix is not small and trivial and since it is pretty late for 3.18, for it I'll send the small hunk in the previous comment which disables reloading on 32-bit. As a workaround there, you can do

$ echo 1 > /sys/devices/system/cpu/microcode/reload

as root and microcode will get updated, provided you have installed latest microcode to /lib/firmware/intel-ucode/. I'll have more time for 3.19 when it all will hopefully be sorted out properly. Thanks.

In case you want to give it a try, I'm attaching the dirty version which works. It should apply cleanly to 3.17 except the hunk touching mc_bp_resume() but fixing up by hand should be easy. I'll do proper fix with changelog and the whole shebang these days after having tested the hell out of it. The AMD side needs fixing too but that doesn't matter for your machine as you're running an Intel box. Thanks.

Created attachment 158881 [details]
dirty fix
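The verification in the comment above compares the microcode revision of every CPU after resume. A rough sketch of how that comparison might be scripted, assuming the "microcode : 0x.." field layout of /proc/cpuinfo shown in this thread; `cpuinfo_revisions` is a hypothetical helper name:

```shell
# Hypothetical helper: reads /proc/cpuinfo-style text on stdin and prints
# each distinct value of the "microcode" field once, sorted.
# A single output line means every CPU reports the same revision.
cpuinfo_revisions() {
    awk -F: '$1 ~ /^microcode[ \t]*$/ {
        gsub(/[ \t]/, "", $2)   # strip padding around the value
        print $2
    }' | sort -u
}

# Sketch of use around a suspend/resume cycle (not run here):
#   before=$(cpuinfo_revisions < /proc/cpuinfo)
#   ... suspend and resume the machine ...
#   after=$(cpuinfo_revisions < /proc/cpuinfo)
#   [ "$before" = "$after" ] || echo "microcode revision changed across resume"
```

If the revisions differ after resume on a 32-bit kernel carrying only the small #ifdef hunk, the sysfs reload mentioned above (`echo 1 > /sys/devices/system/cpu/microcode/reload` as root) is the workaround the thread suggests.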
>> Comment by SATO Tatsuya (tattsan) - Thursday, 27 November 2014, 15:23 GMT+4
> I've built kernel 3.17.4 with the new patch
> https://bugzilla.kernel.org/show_bug.cgi?id=88391#c23 ,
> and it seems to work fine. The author says the patch is dirty, and he'll
> clean it up.
Ok, thanks for confirming. Just to clarify for the arch-linux kernel: the patch in there needs the #ifdef CONFIG_X86_64 around it as in comment #20. Btw, this patch is wrong: https://projects.archlinux.org/svntogit/packages.git/tree/trunk/fix_CPU0_microcode_on_resume.patch - it should be this one: http://git.kernel.org/tip/fb86b97300d930b57471068720c52bfa8622eab7

Now, I've stopped it from going to -stable but since the fix for https://bugzilla.kernel.org/show_bug.cgi?id=88001 is still needed and if you guys want to have that addressed, I can give you the upstream commit which goes ontop of fb86b97300d930b57471068720c52bfa8622eab7 once I have tested it. Ask me if there is still any misunderstanding. Thanks.

Thank you, now I finally understand: the Arch Linux maintainers took a "test" version of your patch

https://bugzilla.kernel.org/show_bug.cgi?id=88001#c3
https://bugzilla.kernel.org/attachment.cgi?id=157641&action=diff

and applied it to the stable kernels 3.17.3 and 3.17.4. I will repost your comment to the Arch Linux bugtracker.

Right, this was a dirty test for the bug reporter to verify.

Btw, 3.18-stable has received the full backport of fixes so if you want to upgrade...

Old one, closing.
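For kernels that still carry the broken patch, the thread's temporary workaround is booting with dis_ucode_ldr. A small sketch for checking whether a given kernel command line contains it; `has_dis_ucode_ldr` is a hypothetical helper name, and reading /proc/cmdline for the running system is the only system interface assumed:

```shell
# Hypothetical helper: succeeds only if dis_ucode_ldr appears as a whole,
# space-separated parameter in the command line string passed as $1.
has_dis_ucode_ldr() {
    case " $1 " in
        *" dis_ucode_ldr "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the running system when /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    if has_dis_ucode_ldr "$(cat /proc/cmdline)"; then
        echo "microcode loader disabled on this boot"
    else
        echo "microcode loader not disabled on this boot"
    fi
fi
```

Matching on a padded string avoids false positives from parameters that merely contain the text, such as a value inside another option.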