Bug 35522
Summary: | microcode: CPU0: family 15 not supported | ||
---|---|---|---|
Product: | Platform Specific/Hardware | Reporter: | Joshua Covington (joshuacov) |
Component: | i386 | Assignee: | platform_i386 |
Status: | RESOLVED INVALID | ||
Severity: | blocking | CC: | bp |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.39 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg with blacklisted microcode
dump who's adamant about loading the microcode /lib/udev/rules.d/89-microcode.rules dmesg-2.6.39-0.fc16.x86_64 with the udev rules removed |
Description
Joshua Covington
2011-05-20 21:10:10 UTC
Created attachment 58842 [details]
dmesg with blacklisted microcode
The only way to start the computer is to blacklist the microcode module. This is the corresponding dmesg.
After succesfully booting with the disabled module I tried to manually load it and got the same error "microcode: CPU0: family 15 not supported" once again. No lockups this time.
Obviously something tries to load the module at startup and goes in an infinite loop (hense the error flood).
Hi, it looks like some init script is trying very hard to load the microcode driver and it doesn't want to give up. You could grep in /etc/init.d for it: grep -EriIn 'microcode' /etc/init.d/* and upload it here so that I can stare at it. Btw, /etc/init.d is the script directory on Debian, I dunno whici is on Fedora so you'll have to do some more grepping :). Thanks. grep -EriIn 'microcode' /etc/* returns no results. I think maybe the fedora kernel is configured to load the code automatically. You can look at the config file in the rpm-package. Created attachment 59722 [details]
dump who's adamant about loading the microcode
Yeah, it could be an udev helper. Let's go and dump it, can you apply the patch in the above attachment, build and boot the new kernel and send me the dmesg from it? Thanks. is there another way to test this instead of recompiling the kernel? on my mashine it'll take about an hour and it's really frustrating to recompile the it. I don't have this problem with the 2.6.38.x. This was definitely introduced in the 2.6.39. Is there a way to go throgh the commits and see what was changed about loading the code? This way I'll save one recompilation. In that case, you have to go over your system scripts, udev installation files, etc and try to find out that piece of code which tries to load the microcode driver and then send it to me. I know you didn't have a problem with .38.x - the driver was broken then, strictly speaking, because it would load and stay loaded even if it encountered an error while loading (like the not suppported family, in your case). Now it propagates proper error return codes and therefore it refuses to load making that userspace script retry endlessly. Created attachment 59842 [details]
/lib/udev/rules.d/89-microcode.rules
I think this is the file responsible for loading the microcode. I hope this can help.
Oh ok, in that case, simply remove that file and you should be fine. There are currently no plans to support K8 microcode anyway so you don't need to load the module on your system. I still need to test that udev file on a K8 system though, to see why udev doesn't give up and loops, as you say. Which version of udev is installed on your system? Thanks. I have libudev-161-8.fc14.x86_64 udev-161-8.fc14.x86_64 A better way is just to blacklist the microcode module but I'm sure if the ordinary user can come up to this. That's why I filed this bug. Whatever is causing the error (faulty udev or bad returncode from the module that makes udev try over and over) should be fixed because at the end this is a regression from 2.6.38.x and block the loading process. (In reply to comment #11) > I have > > libudev-161-8.fc14.x86_64 > udev-161-8.fc14.x86_64 > > A better way is just to blacklist the microcode module but I'm sure if > the ordinary user can come up to this. Not really. It is actually very simple: * a userspace agent tries to load the module * there's an error/unsupported CPU configuration so the module errors out while loading * userspace agent gets the error and continues on with its merry existence > That's why I filed this bug. Whatever is causing the error (faulty > udev or bad returncode from the module that makes udev try over and > over) should be fixed because at the end this is a regression from > 2.6.38.x and block the loading process. Yes, so udev should be fixed. But I'm having hard time reproducing this with your rules file - I can't trigger it on Debian or Gentoo. Btw, does upgrading your udev actually help? Thanks. No, I upgraded to 167.4.fc15 and the problem still exists. Do you need additional information like dmesg or some others logs? My point here is that the very same userspace agent loads all other modules and when it catches an error then it goes on. No errors here. So why does this agent actually blocks the loading of the whole kernel? Should it just try to load the module in the backgrund so that the kernel itself should normally continue to load? (In reply to comment #13) > No, I upgraded to 167.4.fc15 and the problem still exists. > > Do you need additional information like dmesg or some others logs? > > My point here is that the very same userspace agent loads all other modules > and > when it catches an error then it goes on. What does that exactly mean? Do you keep getting [ 9.600723] microcode: microcode: CPU0: AMD CPU family 0xf not supported in dmesg continuously? > No errors here. So why does this agent actually blocks the loading > of the whole kernel? Should it just try to load the module in the > backgrund so that the kernel itself should normally continue to load? No - this module shouldn't load on your machine because it has no K8 support (simply because we don't need to add it, and probably never will :)) so I think removing the .rules file is the simplest solution. Thanks. Yes, I do keep getting exactly the same error. I don't think that removing those rules is a good solution. This could be a temporary fix but I still think this is something with the module itself. Asking all users with k8 processors to remove those rules isn't exactly a solution. And I'm still puzzled why you cannot reproduce it. I'll try to attach a dmesg. Maybe this can help. Created attachment 60272 [details]
dmesg-2.6.39-0.fc16.x86_64 with the udev rules removed
as you can see removing the udev rules "fixes" the problem this is just a temporary solution.
I tried to manually load the module after the kernel is fully loaded: [root@localhost ~]# modprobe -v microcode insmod /lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko FATAL: Error inserting microcode (/lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko): Invalid argument and after this I got the following into dmesg: [ 170.459160] microcode: CPU0: family 15 not supported Maybe this can give some additional clues: [root@localhost ~]# modinfo microcode filename: /lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko alias: devname:cpu/microcode alias: char-major-10-184 license: GPL author: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> description: Microcode Update Driver license: GPL author: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> description: Microcode Update Driver license: GPL v2 author: Peter Oruba description: AMD Microcode Update Driver srcversion: 0ED67A420545A3D09246FE1 depends: vermagic: 2.6.39-0.fc16.x86_64 SMP mod_unload This "Invalid argument" message still makes me believe that the problem is with the microcode module itself but not with the udev. (In reply to comment #17) > I tried to manually load the module after the kernel is fully loaded: > > [root@localhost ~]# modprobe -v microcode > insmod /lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko > FATAL: Error inserting microcode > (/lib/modules/2.6.39-0.fc16.x86_64/kernel/arch/x86/kernel/microcode.ko): > Invalid argument > > and after this I got the following into dmesg: > > [ 170.459160] microcode: CPU0: family 15 not supported no, this is exactly the desired behavior. The error code is propagated to userspace so that someone loading the module actually knows that it failed. So if something loads the module, it should see the -EINVAL (aka Invalid Argument) and don't try anymore. hmm, I'm out of ideas. The only difference that I saw between manually loading the module and the automatic loading is that after manually loading it I wasn't flooded with those error messages. modprobe just exited without trying for ever. So why does only this modules causes this error? Is it something particular to it or...? Now it's up to you. I finally recompiled the kernel with your patch. The problem is when the module is automatically loaded it send constantly the stack to the output but it's not written in the dmesg. Do you know how to "save" this output to the dmesg so that it can be viewed on the next restart? When manually loading the module I get this: [root@localhost ~]# modprobe -v microcode insmod /lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko FATAL: Error inserting microcode (/lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko): Invalid argument and this: [ 118.066576] microcode: CPU0: family 15 not supported [ 118.066594] Pid: 1743, comm: modprobe Not tainted 2.6.39-1.fc14.x86_64 #1 [ 118.066601] Call Trace: [ 118.066629] [<ffffffffa0425f42>] collect_cpu_info_amd+0x4a/0x78 [microcode] [ 118.066646] [<ffffffffa0425000>] ? 0xffffffffa0424fff [ 118.066660] [<ffffffffa0425023>] collect_cpu_info_local+0x23/0x2a [microcode] [ 118.066675] [<ffffffff81080652>] smp_call_function_single+0x8f/0xfd [ 118.066689] [<ffffffffa04253bf>] microcode_init_cpu+0x51/0xc7 [microcode] [ 118.066703] [<ffffffffa042549e>] mc_sysdev_add+0x69/0x8f [microcode] [ 118.066718] [<ffffffff812e8df5>] sysdev_driver_register+0xc3/0x139 [ 118.066728] [<ffffffffa042c000>] ? 0xffffffffa042bfff [ 118.066741] [<ffffffffa042c0c3>] microcode_init+0xc3/0x13e [microcode] [ 118.066755] [<ffffffff81072c1e>] ? __blocking_notifier_call_chain+0x56/0x60 [ 118.066768] [<ffffffff81002099>] do_one_initcall+0x7f/0x136 [ 118.066780] [<ffffffff810854d5>] sys_init_module+0xa6/0x1e4 [ 118.066795] [<ffffffff814828c2>] system_call_fastpath+0x16/0x1b (In reply to comment #20) > I finally recompiled the kernel with your patch. The problem is when > the module is automatically loaded it send constantly the stack to the > output but it's not written in the dmesg. The stack dump does land in dmesg - simply do 'dmesg' on the console after doing the modprobe. > Do you know how to "save" this output to the dmesg so that it can be > viewed on the next restart? I'm afraid I don't understand what you're asking here. dmesg is the kernel buffer of the currently running kernel. Syslog files like /var/log/syslog or /var/log/messages contain logs from older kernels. > > When manually loading the module I get this: > > [root@localhost ~]# modprobe -v microcode > insmod /lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko > FATAL: Error inserting microcode > (/lib/modules/2.6.39-1.fc14.x86_64/kernel/arch/x86/kernel/microcode.ko): > Invalid argument > > and this: > > [ 118.066576] microcode: CPU0: family 15 not supported > [ 118.066594] Pid: 1743, comm: modprobe Not tainted 2.6.39-1.fc14.x86_64 #1 ^^^^^^^^ Well, this wasn't needed; I was interested in the process that loads the microcode driver (marked above) but you've already found out by grepping that it was udev causing it and that removing the rules file fixes the issue. I think you should close this bug as RESOLVED now - there's nothing else I can do here. If you still feel that udev needs fixing please open a bug against Fedora's udev. Thanks. I filed a bug against udev: https://bugzilla.redhat.com/show_bug.cgi?id=715118 This has been fixed with the following udev rules: KERNEL=="microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu" KERNEL=="devices/platform/microcode", ACTION=="add", RUN+="/sbin/microcode_ctl -Qu" More information here: https://bugzilla.redhat.com/show_bug.cgi?id=690930#c26 The fix is in microcode_ctl-1.17-15.fc15 |