Bug 12344

Summary: Erratic behavior observed with amd64 when using Intel microcode patch loading support with 2.6.28-gentoo on x86_64
Product: Other
Reporter: Bob Raitz (pappy_mcfae)
Component: Other
Assignee: other_other
Status: CLOSED UNREPRODUCIBLE
Severity: normal
CC: alan
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.28-gentoo, 2.6.28
Subsystem:
Regression: No
Bisected commit-id:
Attachments:
2.6.28 config.
the buggy 2.6.28-gentoo .config
dmesg with microcode enabled
/var/log/messages file with microcode enabled.
2.6.27-gentoo-r7 .config file
/var/log/dmesg for 2.6.27-gentoo-r7
/var/log/messages for 2.6.27-gentoo-r7 kernel
.config for 2.6.28-gentoo WITHOUT microcode enabled.
dmesg for 2.6.28-gentoo with microcode disabled.
/var/log/messages file for 2.6.28-gentoo without microcode enabled.
2.6.28-gentoo .config WITH microcode enabled
/var/log/dmesg for 2.6.28-gentoo with microcode enabled.
/var/log/messages for 2.6.28-gentoo with microcode enabled.

Description Bob Raitz 2009-01-01 19:28:06 UTC
Latest working kernel version: 2.6.27.10, 2.6.27-gentoo-r7
Earliest failing kernel version: 2.6.28, 2.6.28-gentoo
Distribution: Gentoo
Hardware Environment: amd64 (Core2 Duo)
Software Environment: ?
Problem Description: Enabling Intel microcode patch loading support in the kernel causes erratic operation (lockups, slow-downs, especially while booting)

Steps to reproduce: 
1-Enable Intel microcode patch loading support.
2-Boot system.
3-Erratic behavior.
Comment 1 Bob Raitz 2009-01-01 19:34:27 UTC
This bug has been reported to Gentoo Bugzilla (https://bugs.gentoo.org/show_bug.cgi?id=252798). At their request, I am opening a bug here. Partial list of erratic behaviors:
1) 1-2 second delay from lilo until anything shows up on the screen.
2) Long delays while initializing MS USB mouse.
3) Thirty second (or longer) initialization time for ntp-client.
4) X instability; more precisely, a hard lock while exiting KDE-3.5.10.
Comment 2 Bob Raitz 2009-01-01 19:36:36 UTC
Created attachment 19594 [details]
2.6.28 config.

This is the .config for the vanilla 2.6.28 kernel. Note that the microcode is turned off, and this kernel will boot and run properly, as far as I can tell.
Comment 3 Bob Raitz 2009-01-01 19:38:03 UTC
Created attachment 19595 [details]
the buggy 2.6.28-gentoo .config 

This is the gentoo-sources .config with the offending microcode setting set to on.
Comment 4 Bob Raitz 2009-01-01 19:39:53 UTC
Created attachment 19596 [details]
dmesg with microcode enabled
Comment 5 Bob Raitz 2009-01-01 19:44:43 UTC
Created attachment 19597 [details]
/var/log/messages file with microcode enabled.
Comment 6 Dave Jones 2009-01-01 19:50:35 UTC
Some of these symptoms sound unrelated. For example, there's no way the microcode loader could affect the lilo delay, as it doesn't do anything until userspace tells it to.

It seems a bit of a stretch that the other problems are related to this too.
From the look of the dmesg, the driver isn't even being used. There should be a message in there if it was updating it, saying which version it updated to.

Finally, some of the other stuff in the dmesg is kind of worrying:

[    0.000000] AMI BIOS detected: BIOS may corrupt low RAM, working it around.

BIOS update available by any chance?

[    0.195599] alg: cipher: Test 1 failed on encryption for aes-asm
[    0.195609] 00000000: 00 01 02 03 04 05 06 07 08 08 08 08 08 08 08 08 

This is really worrying.  I'd lean towards suspecting either a compiler bug, or possible hardware failure rather than a driver that isn't even being used.

I notice from the log you're also running boinc. Keeping the hardware 100% utilised will definitely expose hardware problems that would otherwise go unnoticed.
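
The two dmesg checks described in this comment (did the microcode driver log anything, and did any crypto self-test fail) could be automated roughly as follows. This is only a sketch: the function name `scan_dmesg` is made up for illustration, and it assumes that driver messages contain the word "microcode" and that self-test failures look like the `alg: ... failed ...` line quoted above. It does not assume any particular wording for a successful microcode update.

```python
import re


def scan_dmesg(text):
    """Return (microcode_lines, selftest_failures) found in dmesg text.

    Assumption: any line mentioning "microcode" is of interest, and
    self-test failures start with "alg:" and contain "failed".
    """
    microcode, failures = [], []
    for line in text.splitlines():
        # Drop a leading "[    0.195599] " style timestamp if present.
        msg = re.sub(r"^\[\s*\d+\.\d+\]\s*", "", line)
        if "microcode" in msg.lower():
            microcode.append(msg)
        if msg.startswith("alg:") and "failed" in msg:
            failures.append(msg)
    return microcode, failures


if __name__ == "__main__":
    # Sample lines copied from the dmesg excerpt quoted in this comment.
    sample = (
        "[    0.000000] AMI BIOS detected: BIOS may corrupt low RAM, working it around.\n"
        "[    0.195599] alg: cipher: Test 1 failed on encryption for aes-asm\n"
    )
    mc, bad = scan_dmesg(sample)
    print("microcode lines:", mc)
    print("self-test failures:", bad)
```

An empty list of microcode lines would be consistent with the driver never having been used. Running the same scan over the dmesg from a working kernel makes the self-test failures easy to compare.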
Comment 7 Bob Raitz 2009-01-01 20:00:09 UTC
Well, I'd buy that if the 2.6.27.10 kernel didn't work perfectly, and if turning on microcode didn't cause problems. 

This system has been running perfectly since it came on line. I see no way that this problem is related to anything but the microcode setting.

As for a compiler bug, that could be. I was using gcc-4.2.3 at the time, so if there was a bug with it, it could be a manifestation. I just updated to gcc-4.3.2, so I suppose I can check the compiler bug. I don't think it's going to make a difference, but I'll try anything.

I'm in the middle of updating all my software with gcc-4.3.2. As soon as that's done, I'll recompile the kernel, and see if that makes a difference.
Comment 8 Bob Raitz 2009-01-02 13:17:36 UTC
Hardware makeup of system:

core-too ~ # lspci
00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82945G/GZ Integrated Graphics Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
01:00.0 Ethernet controller: VIA Technologies, Inc. VT86C100A [Rhine] (rev 06)
01:01.0 Multimedia audio controller: VIA Technologies Inc. ICE1712 [Envy24] PCI Multi-Channel I/O Controller (rev 02)

So now, you all know exactly what kind of system we're working with here.
Comment 9 Bob Raitz 2009-01-02 13:20:48 UTC
Created attachment 19610 [details]
2.6.27-gentoo-r7 .config file

This would be the control. With this kernel, the machine operates normally as far as I can tell. There are no obvious errors in operation.
Comment 10 Bob Raitz 2009-01-02 13:23:20 UTC
Created attachment 19611 [details]
/var/log/dmesg for 2.6.27-gentoo-r7

Note that the phrase "alg: cipher: Test 1 failed on encryption for aes-asm" does not show up.
Comment 11 Bob Raitz 2009-01-02 13:24:27 UTC
Created attachment 19612 [details]
/var/log/messages for 2.6.27-gentoo-r7 kernel
Comment 12 Bob Raitz 2009-01-02 13:26:39 UTC
Please note that all kernels were compiled with gcc-4.3.2, and a "make mrproper" was done first to ensure no old object code from the previous gcc version (4.2.3) remained.
Comment 13 Bob Raitz 2009-01-02 13:33:48 UTC
Created attachment 19613 [details]
.config for 2.6.28-gentoo WITHOUT microcode enabled.
Comment 14 Bob Raitz 2009-01-02 13:36:49 UTC
Created attachment 19614 [details]
dmesg for 2.6.28-gentoo with microcode disabled.

dmesg with 2.6.28-gentoo kernel. Note reappearance of "alg: cipher: Test 1 failed on encryption for aes-asm"
Comment 15 Bob Raitz 2009-01-02 13:38:12 UTC
Created attachment 19615 [details]
/var/log/messages file for 2.6.28-gentoo without microcode enabled.
Comment 16 Bob Raitz 2009-01-02 13:48:46 UTC
Created attachment 19616 [details]
2.6.28-gentoo .config WITH microcode enabled
Comment 17 Bob Raitz 2009-01-02 13:49:54 UTC
Created attachment 19617 [details]
/var/log/dmesg for 2.6.28-gentoo with microcode enabled.
Comment 18 Bob Raitz 2009-01-02 13:52:04 UTC
Created attachment 19618 [details]
/var/log/messages for 2.6.28-gentoo with microcode enabled.

Note: system is fairly unstable with this kernel. This file and the two above were pulled across the network.
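
Since the 2.6.28-gentoo configs with and without microcode are both attached, one way to confirm that the microcode option is the only difference between them is to diff the two .config files symbol by symbol. The following is a minimal sketch, not something that was run for this report: the helper names `parse_config` and `diff_configs` are hypothetical, and the sample data is illustrative rather than taken from the attachments.

```python
import re


def parse_config(text):
    """Map each CONFIG_ symbol to its value.

    Lines of the form "# CONFIG_X is not set" are recorded as "n".
    """
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
            continue
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


def diff_configs(a_text, b_text):
    """Return {symbol: (value_in_a, value_in_b)} for symbols that differ.

    A symbol missing from one file shows up as None on that side.
    """
    a, b = parse_config(a_text), parse_config(b_text)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


if __name__ == "__main__":
    # Illustrative data only; not the contents of the attached configs.
    without_mc = "CONFIG_SMP=y\n# CONFIG_MICROCODE is not set\n"
    with_mc = "CONFIG_SMP=y\nCONFIG_MICROCODE=y\n"
    for sym, (old, new) in diff_configs(without_mc, with_mc).items():
        print(sym, old, "->", new)
```

If the diff reports anything beyond the microcode symbols, those other options would also need to be ruled out before blaming the microcode setting.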
Comment 19 Bob Raitz 2009-01-02 13:53:39 UTC
Hope I didn't overload, but I'd like to be able to use this kernel version.
Comment 20 Bob Raitz 2009-01-11 00:03:24 UTC
Did I miss some information I was supposed to send?