Kernel Bug Tracker – Bug 12153
2.6.28-rc2: Crypto change sometimes breaks modprobing
Last modified: 2008-12-07 08:12:30 UTC
Subject : 2.6.28-rc2: runaway loop modprobe char-major-5-1
Submitter : Meelis Roos <email@example.com>
Date : 2008-10-28 10:15
References : http://marc.info/?l=linux-kernel&m=122518916014197&w=4
Notify-Also : Patrick McHardy <firstname.lastname@example.org>
This entry is being used for tracking a regression from 2.6.27. Please don't
close it until the problem is fixed in the mainline.
http://lkml.org/lkml/2008/10/28/98 is the full thread on LKML.
Need a full dmesg
Created attachment 19187 [details]
register tty before pci, acpi, video, ...
Currently it looks like something is accessing /dev/console from initramfs before it is registered in the kernel.
The "dead" device node may trigger the kernel module auto-loader, which forks a usermode process, which then may try to access /dev/console itself.
This patch moves the tty init before pci, acpi, video initialization. It may have other unwanted side-effects, we don't know now, but it boots fine here with and without initramfs.
If this fixes the modprobe loop, we would at least know what's going on.
Any chance to give it a try?
That's a no-go. The tty console device is commonly PCI, the VGA text mode console is weird exception along with certain serial consoles.
(In reply to comment #4)
> That's a no-go. The tty console device is commonly PCI, the VGA text mode
> console is weird exception along with certain serial consoles.
What's the exact problem you are seeing? It just registers the device with the driver core earlier.
It is supposed to return -ENODEV, instead of needlessly trying to load a module for that dev_t.
So we stick our fingers in our ears and pretend that returning -ENODEV to a bug in the userspace is better ?
Its a useful test, but the bisect is still needed to find the actual triggering change, and then maybe to trace the user space and see what combination of user space funny and having serial console enabled is a trigger
(In reply to comment #6)
> So we stick our fingers in our ears and pretend that returning -ENODEV
> to a bug in the userspace is better ?
It's better to return -ENODEV for an non-existing console device, then forking a module loading binary, that eventually wants to access the console device itself.
> Its a useful test, but the bisect is still needed to find the actual
> triggering change, and then maybe to trace the user space and see what
> combination of user space funny and having serial console enabled is
> a trigger
I suspect initramfs wants to log something, and accesses /dev/console, but the device is dead, and the kernel module loader forks modprobe which tries to log something itself, which accesses the dead device, and forks modprobe, and ...
I does not make much sense to me to have the kernel start random initramfs userspace, but run modprobe if /dev/console is opened.
Initializing all the pci, video, acpi drivers, but have the kernel module loader choke if /dev/console is accessed, just sound like a bug. Any random change in any of these drivers might result in a request_module() call, or any access to a device node will trigger this bug.
If the module loader asked to load 5-1 is opening the file again then the userspace is buggy and needs fixing. The real question is whose userspace is buggy, why and what change has triggered the bug - ie can we avoid it nicely. Re-ordering hacks that are likely to have undefined and untestable results are *not* an option this close to a release.
(In reply to comment #8)
> If the module loader asked to load 5-1 is opening the file again then the
> userspace is buggy and needs fixing.
If modprobe opens /dev/console it is by no means buggy, it's the kernel failure, to call modprobe again, as a result of modprobe opening a /dev file.
Having /dev/console not registered while adding all the major drivers and running userspace at that time, and have the dumb module loader active -- that combination is just a bug.
Wrong. The way the loader works for all devices and has always worked is that the user space triggered by hotplug is responsible for not causing recursions.
Please go and bisect the bug, until there is a bisection this is all timewasting.
(In reply to comment #10)
No, it's not.
> The way the loader works for all devices and has always worked is
> that the user space triggered by hotplug is responsible for not
> causing recursions.
This is wrong. Userspace can access /dev/console any time. There was never such a broken rule, not any time.
It is caused by a request_module(). And such a loop can happen with any random kernel driver change. It is a kernel bug, that the dead /dev/console device causes a modprobe fork when request_module() is called.
The earlier attached patch fixes it by providing an initialized /dev/console instead of having the module loader going crazy in a loop.
It this specific case it is caused by:
Setting CONFIG_CRYPTO_MANAGER from 'm' to 'y' makes the loop go away.
The kernel needs to be fixed, with the patch attached here, or in any other way that prevents the module loader to load stuff that is needed, compiled in, but not available.
>No, it's not.
Please go and read the original design discussions for the hotplug interface.
Reassigning and adding Herbert Xu to the cc list so he can fix the crypto test patch if feasible.
(Removed from regression blockers as its now documented how to avoid it and its incredibly obscure anyway)
(In reply to comment #12)
> >> Wrong.
> >No, it's not.
> Please go and read the original design discussions for the hotplug interface.
There is no such rule, and we will not establish completely fragile, obscure and useless rules for hotplug, because the kernel console device and the module loader logic is broken for /dev/console access.
The crypto drivers, as all other drivers too, are allowed using request_module(), it's the /dev/console registering, and the module loader which needs to be fixed.
As I said go read the original design discussion. You can bleat all you like but there is a policy, long standing, about how the module tools must behave. See the original design discussion, see the AF_UNIX socket discussion from about 1999 as well.
(In reply to comment #16)
> As I said go read the original design discussion. You can bleat all you like
> but there is a policy, long standing, about how the module tools must behave.
> See the original design discussion, see the AF_UNIX socket discussion from
> about 1999 as well.
It's a kernel bug, if you like it or not. It needs to be fixed in one way or the other. Userspace will not work around your misguided idea of hotplug. And you are in a position to define such non-sensical hotplug rules.
Userspace already does access /dev/console in shipped products, and it can rightfully expect current kernels to work with the currently shipped product.
Please reopen it (I can't) for something that needs fixing. It will just happen again, with the next innocent driver change.