Bug 35522

Summary: microcode: CPU0: family 15 not supported
Product: Platform Specific/Hardware Reporter: Joshua Covington (joshuacov)
Component: i386Assignee: platform_i386
Status: RESOLVED INVALID    
Severity: blocking CC: bp
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.39 Subsystem:
Regression: Yes Bisected commit-id:
Attachments: dmesg with blacklisted microcode
dump who's adamant about loading the microcode
/lib/udev/rules.d/89-microcode.rules
dmesg-2.6.39-0.fc16.x86_64 with the udev rules removed

Description Joshua Covington 2011-05-20 21:10:10 UTC
I was testing with the lastest kernel-2.6.39-0.fc16 and was flooded with the error "CPU AuthenticAMD: family 15 not supported". As a result the kernel couldn't load.

I tried to find the offending commit but only got the commit for the error message: 209b6c8fa72e8b726a0cd273a56aded55be22bfa. I'm not sure if this is the right one though.

My cpu worked fine with the 2.6.38.6 and the cpu is:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 104
model name      : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping        : 1
cpu MHz         : 2000.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv
bogomips        : 1999.99
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 104
model name      : AMD Turion(tm) 64 X2 Mobile Technology TL-60
stepping        : 1
cpu MHz         : 2000.000
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch lbrv
bogomips        : 1999.99
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

The ouput from 2.6.38.6 is:
[    9.600723] microcode: microcode: CPU0: AMD CPU family 0xf not supported
[    9.600736] microcode: microcode: CPU1: AMD CPU family 0xf not supported
[    9.600839] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

and everything is fine.
Comment 1 Joshua Covington 2011-05-20 22:31:28 UTC
Created attachment 58842 [details]
dmesg with blacklisted microcode

The only way to start the computer is to blacklist the microcode module. This is the corresponding dmesg.

After succesfully booting with the disabled module I tried to manually load it and got the same error "microcode: CPU0: family 15 not supported" once again. No lockups this time.

Obviously something tries to load the module at startup and goes in an infinite loop (hense the error flood).
Comment 2 Borislav Petkov 2011-05-27 09:38:26 UTC
Hi,

it looks like some init script is trying very hard to load the microcode
driver and it doesn't want to give up.

You could grep in /etc/init.d for it:

grep -EriIn 'microcode' /etc/init.d/*

and upload it here so that I can stare at it. Btw, /etc/init.d is the
script directory on Debian, I dunno whici is on Fedora so you'll have to
do some more grepping :).

Thanks.
Comment 3 Joshua Covington 2011-05-27 18:43:35 UTC
grep -EriIn 'microcode' /etc/*

returns no results. I think maybe the fedora kernel is configured to load the code automatically. You can look at the config file in the rpm-package.
Comment 4 Borislav Petkov 2011-05-28 08:16:08 UTC
Created attachment 59722 [details]
dump who's adamant about loading the microcode
Comment 5 Borislav Petkov 2011-05-28 08:18:42 UTC
Yeah,

it could be an udev helper. Let's go and dump it, can you apply the
patch in the above attachment, build and boot the new kernel and send me
the dmesg from it?

Thanks.
Comment 6 Joshua Covington 2011-05-28 12:03:06 UTC
is there another way to test this instead of recompiling the kernel? on my mashine it'll take about an hour and it's really frustrating to recompile the it.
Comment 7 Joshua Covington 2011-05-28 12:05:30 UTC
I don't have this problem with the 2.6.38.x. This was definitely introduced in the 2.6.39. Is there a way to go throgh the commits and see what was changed about loading the code?

This way I'll save one recompilation.
Comment 8 Borislav Petkov 2011-05-28 14:45:47 UTC
In that case, you have to go over your system scripts, udev installation
files, etc and try to find out that piece of code which tries to load
the microcode driver and then send it to me.

I know you didn't have a problem with .38.x - the driver was broken
then, strictly speaking, because it would load and stay loaded even if
it encountered an error while loading (like the not suppported family,
in your case). Now it propagates proper error return codes and therefore
it refuses to load making that userspace script retry endlessly.
Comment 9 Joshua Covington 2011-05-28 15:07:42 UTC
Created attachment 59842 [details]
/lib/udev/rules.d/89-microcode.rules

I think this is the file responsible for loading the microcode. I hope this can help.
Comment 10 Borislav Petkov 2011-05-29 20:02:08 UTC
Oh ok,

in that case, simply remove that file and you should be fine. There are
currently no plans to support K8 microcode anyway so you don't need to
load the module on your system.

I still need to test that udev file on a K8 system though, to see why
udev doesn't give up and loops, as you say. Which version of udev is
installed on your system?

Thanks.
Comment 11 Joshua Covington 2011-05-29 22:51:30 UTC
I have

libudev-161-8.fc14.x86_64
udev-161-8.fc14.x86_64

A better way is just to blacklist the microcode module but I'm sure if the ordinary user can come up to this. That's why I filed this bug. Whatever is causing the error (faulty udev or bad returncode from the module that makes udev try over and over) should be fixed because at the end this is a regression from 2.6.38.x and block the loading process.
Comment 12 Borislav Petkov 2011-05-31 18:42:02 UTC
(In reply to comment #11)
> I have
> 
> libudev-161-8.fc14.x86_64
> udev-161-8.fc14.x86_64
>
> A better way is just to blacklist the microcode module but I'm sure if
> the ordinary user can come up to this.

Not really. It is actually very simple:

* a userspace agent tries to load the module
* there's an error/unsupported CPU configuration so the module errors out while
loading
* userspace agent gets the error and continues on with its merry existence

> That's why I filed this bug. Whatever is causing the error (faulty
> udev or bad returncode from the module that makes udev try over and
> over) should be fixed because at the end this is a regression from
> 2.6.38.x and block the loading process.

Yes, so udev should be fixed. But I'm having hard time reproducing this
with your rules file - I can't trigger it on Debian or Gentoo.

Btw, does upgrading your udev actually help?

Thanks.
Comment 13 Joshua Covington 2011-05-31 19:46:23 UTC
No, I upgraded to 167.4.fc15 and the problem still exists.

Do you need additional information like dmesg or some others logs?

My point here is that the very same userspace agent loads all other modules and when it catches an error then it goes on. No errors here. So why does this agent actually blocks the loading of the whole kernel? Should it just try to load the module in the backgrund so that the kernel itself should normally continue to load?
Comment 14 Borislav Petkov 2011-05-31 20:00:34 UTC
(In reply to comment #13)
> No, I upgraded to 167.4.fc15 and the problem still exists.
> 
> Do you need additional information like dmesg or some others logs?
> 
> My point here is that the very same userspace agent loads all other modules
> and
> when it catches an error then it goes on.

What does that exactly mean? Do you keep getting

[    9.600723] microcode: microcode: CPU0: AMD CPU family 0xf not supported

in dmesg continuously?

> No errors here. So why does this agent actually blocks the loading
> of the whole kernel? Should it just try to load the module in the
> backgrund so that the kernel itself should normally continue to load?

No - this module shouldn't load on your machine because it has no K8
support (simply because we don't need to add it, and probably never will :)) so
I think removing the .rules file is the simplest solution.

Thanks.
Comment 15 Joshua Covington 2011-05-31 20:54:25 UTC
Yes, I do keep getting exactly the same error.

I don't think that removing those rules is a good solution. This could be a temporary fix but I still think this is something with the module itself. 

Asking all users with k8 processors to remove those rules isn't exactly a solution. And I'm still puzzled why you cannot reproduce it. I'll try to attach a dmesg. Maybe this can help.
Comment 16 Joshua Covington 2011-05-31 21:07:16 UTC
Created attachment 60272 [details]
dmesg-2.6.39-0.fc16.x86_64 with the udev rules removed

as you can see removing the udev rules "fixes" the problem this is just a temporary solution.
Comment 17 Joshua Covington 2011-05-31 21:20:53 UTC
I tried to manually load the module after the kernel is fully loaded:

[root@localhost ~]# modprobe -v microcode
insmod /lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko 
FATAL: Error inserting microcode (/lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko): Invalid argument

and after this I got the following into dmesg:

[  170.459160] microcode: CPU0: family 15 not supported

Maybe this can give some additional clues:

[root@localhost ~]# modinfo microcode
filename:       /lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko
alias:          devname:cpu/microcode
alias:          char-major-10-184
license:        GPL
author:         Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
description:    Microcode Update Driver
license:        GPL
author:         Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
description:    Microcode Update Driver
license:        GPL v2
author:         Peter Oruba
description:    AMD Microcode Update Driver
srcversion:     0ED67A420545A3D09246FE1
depends:        
vermagic:       2.6.39-0.fc16.x86_64 SMP mod_unload

This "Invalid argument" message still makes me believe that the problem is with the microcode module itself but not with the udev.
Comment 18 Borislav Petkov 2011-06-01 06:23:47 UTC
(In reply to comment #17)
> I tried to manually load the module after the kernel is fully loaded:
> 
> [root@localhost ~]# modprobe -v microcode
> insmod /lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko 
> FATAL: Error inserting microcode
> (/lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko):
> Invalid argument
> 
> and after this I got the following into dmesg:
> 
> [  170.459160] microcode: CPU0: family 15 not supported

no, this is exactly the desired behavior. The error code is propagated
to userspace so that someone loading the module actually knows that it
failed. So if something loads the module, it should see the -EINVAL (aka
Invalid Argument) and don't try anymore.
Comment 19 Joshua Covington 2011-06-01 07:25:37 UTC
hmm, I'm out of ideas. The only difference that I saw between manually loading the module and the automatic loading is that after manually loading it I wasn't flooded with those error messages. modprobe just exited without trying for ever.

So why does only this modules causes this error? Is it something particular to it or...?

Now it's up to you.
Comment 20 Joshua Covington 2011-06-03 14:06:39 UTC
I finally recompiled the kernel with your patch. The problem is when the module is automatically loaded it send constantly the stack to the output but it's not written in the dmesg. Do you know how to "save" this output to the dmesg so that it can be viewed on the next restart?

When manually loading the module I get this:

[root@localhost ~]# modprobe -v microcode
insmod /lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko 
FATAL: Error inserting microcode (/lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko): Invalid argument

and this:

[  118.066576] microcode: CPU0: family 15 not supported
[  118.066594] Pid: 1743, comm: modprobe Not tainted 2.6.39-1.fc14.x86_64 #1
[  118.066601] Call Trace:
[  118.066629]  [<ffffffffa0425f42>] collect_cpu_info_amd+0x4a/0x78 [microcode]
[  118.066646]  [<ffffffffa0425000>] ? 0xffffffffa0424fff
[  118.066660]  [<ffffffffa0425023>] collect_cpu_info_local+0x23/0x2a [microcode]
[  118.066675]  [<ffffffff81080652>] smp_call_function_single+0x8f/0xfd
[  118.066689]  [<ffffffffa04253bf>] microcode_init_cpu+0x51/0xc7 [microcode]
[  118.066703]  [<ffffffffa042549e>] mc_sysdev_add+0x69/0x8f [microcode]
[  118.066718]  [<ffffffff812e8df5>] sysdev_driver_register+0xc3/0x139
[  118.066728]  [<ffffffffa042c000>] ? 0xffffffffa042bfff
[  118.066741]  [<ffffffffa042c0c3>] microcode_init+0xc3/0x13e [microcode]
[  118.066755]  [<ffffffff81072c1e>] ? __blocking_notifier_call_chain+0x56/0x60
[  118.066768]  [<ffffffff81002099>] do_one_initcall+0x7f/0x136
[  118.066780]  [<ffffffff810854d5>] sys_init_module+0xa6/0x1e4
[  118.066795]  [<ffffffff814828c2>] system_call_fastpath+0x16/0x1b
Comment 21 Borislav Petkov 2011-06-03 17:44:17 UTC
(In reply to comment #20)
> I finally recompiled the kernel with your patch. The problem is when
> the module is automatically loaded it send constantly the stack to the
> output but it's not written in the dmesg.

The stack dump does land in dmesg - simply do 'dmesg' on the console
after doing the modprobe.

> Do you know how to "save" this output to the dmesg so that it can be
> viewed on the next restart?

I'm afraid I don't understand what you're asking here. dmesg is the
kernel buffer of the currently running kernel. Syslog files like
/var/log/syslog or /var/log/messages contain logs from older kernels.

> 
> When manually loading the module I get this:
> 
> [root@localhost ~]# modprobe -v microcode
> insmod /lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko 
> FATAL: Error inserting microcode
> (/lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko):
> Invalid argument
> 
> and this:
> 
> [  118.066576] microcode: CPU0: family 15 not supported
> [  118.066594] Pid: 1743, comm: modprobe Not tainted 2.6.39-1.fc14.x86_64 #1
				  ^^^^^^^^

Well, this wasn't needed; I was interested in the process that loads the
microcode driver (marked above) but you've already found out by grepping
that it was udev causing it and that removing the rules file fixes the
issue.

I think you should close this bug as RESOLVED now - there's nothing else
I can do here. If you still feel that udev needs fixing please open a
bug against Fedora's udev.

Thanks.
Comment 22 Joshua Covington 2011-06-21 22:10:23 UTC
I filed a bug against udev: https://bugzilla.redhat.com/show_bug.cgi?id=715118
Comment 23 Joshua Covington 2011-06-30 18:00:32 UTC
This has been fixed with the following udev rules:

KERNEL=="microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu"
KERNEL=="devices/platform/microcode", ACTION=="add", RUN+="/sbin/microcode_ctl
-Qu"

More information here: https://bugzilla.redhat.com/show_bug.cgi?id=690930#c26

The fix is in microcode_ctl-1.17-15.fc15