It is possible to have systems (like UML) where devfs_create_partitions() is never called from register_disk() in fs/partitions/check.c. When such a system shuts down, del_gendisk() gets called. del_gendisk() does not check whether devfs_create_partitions() was ever called; it always calls devfs_remove_partitions(), which calls devfs_dealloc_unique_number(). The dealloc does a down() on space->semaphore, which was never initialized, and you get a kernel fault.
Created attachment 123: adds semaphore check in dealloc. Given the problem description, this patch makes dealloc check whether the semaphore is initialized and, if it is not, initializes it. The same check is already in the alloc function. Let me know if that helps.
This is not the correct fix. The semaphore was not initialized because none of the disks in the UML system contain partition tables. If you initialize the uninitialized semaphore, I just trap two lines later in devfs_dealloc_unique_number(), since the incoming parameters are garbage. The higher-order problem is that devfs_remove_partitions() is being called for a disk with no partitions. This works as a band-aid: if (!space->sem_initialised) return;
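For anyone following along, here is a minimal userspace sketch of what the band-aid does. It is not the devfs code: only the sem_initialised flag name comes from the comment above, and the struct, the model_* functions and the 32-number bitmap are hypothetical stand-ins. It assumes the alloc path is what sets the flag on first use, as described for the alloc function earlier in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the devfs unique-number space.
 * Only the sem_initialised name is taken from the thread. */
struct unique_space_model {
    bool sem_initialised;  /* set by the alloc path on first use */
    int locks_taken;       /* counts how often the modeled semaphore was taken */
    unsigned long bits;    /* one bit per allocated number, 0..31 in this model */
};

/* Model of alloc: lazily mark the semaphore as initialised,
 * then hand out the lowest free number. Returns -1 when full. */
int model_alloc_unique_number(struct unique_space_model *space)
{
    if (!space->sem_initialised)
        space->sem_initialised = true;
    space->locks_taken++;
    for (int n = 0; n < 32; n++) {
        if (!(space->bits & (1UL << n))) {
            space->bits |= 1UL << n;
            return n;
        }
    }
    return -1;
}

/* Model of dealloc with the band-aid: if nothing was ever allocated,
 * return before touching the semaphore or the bitmap. */
bool model_dealloc_unique_number(struct unique_space_model *space, int number)
{
    if (!space->sem_initialised)
        return false;                 /* the band-aid from the comment above */
    if (number < 0 || number >= 32)
        return false;                 /* reject out-of-range input in this model */
    space->locks_taken++;
    bool was_set = (space->bits & (1UL << number)) != 0;
    space->bits &= ~(1UL << number);
    return was_set;
}
```

In this model a dealloc on a never-used space returns without taking the lock, which is the whole effect of the band-aid. It hides the symptom but does not address why dealloc was reached for a disk that never had partitions.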
Created attachment 124: Check to see if we have partitions. This patch just adds a check in del_gendisk() that parallels the one in register_disk() for minors, to detect whether we have partitions.
The patch: if (disk->minors != 1) /* If we had partitions, do this */ devfs_remove_partitions(disk); works OK for me, although I seem to recall trying this fix a couple of months ago and hitting some problem that I can't remember anymore. Of course, other things in the kernel may have been fixed in the meantime, eliminating the old problem. I will leave it patched into my system and report any new problems.
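To make the symmetry explicit, here is a small userspace sketch of the guard. It assumes, as the attachment description implies, that register_disk() skips partition setup when minors is 1. The gendisk_model struct and all model_* functions are hypothetical and only mirror the shape of the kernel code.

```c
#include <stdbool.h>

/* Hypothetical model of the fields that matter here. */
struct gendisk_model {
    int minors;               /* 1 means the disk cannot carry partitions */
    bool partitions_created;  /* did the create step ever run? */
    int remove_calls;         /* how often the remove step ran */
};

void model_create_partitions(struct gendisk_model *disk)
{
    disk->partitions_created = true;
}

void model_remove_partitions(struct gendisk_model *disk)
{
    disk->remove_calls++;
    disk->partitions_created = false;
}

/* Registration side: assumed to skip partition setup for minors == 1. */
void model_register_disk(struct gendisk_model *disk)
{
    if (disk->minors == 1)
        return;
    model_create_partitions(disk);
}

/* Teardown side with the patch: only undo what registration did. */
void model_del_gendisk(struct gendisk_model *disk)
{
    if (disk->minors != 1)  /* If we had partitions, do this */
        model_remove_partitions(disk);
}
```

With both sides testing the same condition, teardown never runs the remove step for a disk that skipped the create step.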
Does anyone know the status of this patch? Has it made it to the mainline kernel yet?
Taking a look at 2.5.68-bk11, devfs_dealloc_unique_number() and devfs_remove_partitions() are gone. Does that mean the problem is fixed?
I don't have access to the system where I found this bug anymore. You can test this on a normal kernel by enabling exitcall processing when the kernel shuts down. A normal kernel doesn't bother doing this because you are going to reboot/power off after a shutdown. exitcall processing is used to get devices back to their power-on state and clean things up. When a UML kernel is shut down, the host OS still needs to be working, so it runs the exitcall list to reset everything it has been using.

exitcall is defined in linux/init.h. All kernels build a list of function pointers in the section .exitcall.exit. Most kernels ignore this list. To test running it, add a little loop like this to the kernel reboot code:

    exitcall_t *call = &__exitcall_end;
    while (--call >= &__exitcall_begin)
        (*call)();

BTW, it would be good for normal kernel test systems to always run with exitcalls turned on; it makes a lot of bugs show up. I haven't done this recently, so there may be a bunch of new problems. After you have an exitcall-enabled kernel, just mount a disk without a partition table and then exit the kernel. If the bug is there, the code will segfault.
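For readers who have not seen the exitcall mechanism, the loop above can be modeled in userspace roughly as follows. This is only a sketch: the real kernel walks the linker-provided .exitcall.exit section, while this model uses a plain array, and every name in it (exitcall_model_t, register_exitcall_model, the two example handlers) is hypothetical.

```c
#include <stddef.h>

typedef void (*exitcall_model_t)(void);

#define MAX_EXITCALLS 8

/* Stand-in for the .exitcall.exit section: a fixed table of handlers. */
exitcall_model_t exitcall_table[MAX_EXITCALLS];
size_t exitcall_count;

/* Records the order in which handlers actually ran. */
int exit_trace[MAX_EXITCALLS];
size_t exit_trace_len;

void record_exit(int id)
{
    if (exit_trace_len < MAX_EXITCALLS)
        exit_trace[exit_trace_len++] = id;
}

/* Two example handlers; a real kernel would have one per subsystem. */
void exit_first_registered(void)  { record_exit(1); }
void exit_second_registered(void) { record_exit(2); }

/* Append a handler to the table; returns -1 if the table is full. */
int register_exitcall_model(exitcall_model_t fn)
{
    if (exitcall_count >= MAX_EXITCALLS)
        return -1;
    exitcall_table[exitcall_count++] = fn;
    return 0;
}

/* Walk the table from the end back to the beginning, like the loop
 * above. The pointer is decremented inside the body so it never
 * moves before the start of the array. */
void run_exitcalls_model(void)
{
    exitcall_model_t *begin = exitcall_table;
    exitcall_model_t *call = exitcall_table + exitcall_count;
    while (call > begin) {
        --call;
        (*call)();
    }
}
```

Registering exit_first_registered and then exit_second_registered and calling run_exitcalls_model() runs the second handler before the first, so teardown happens in the reverse of registration order.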
devfs is no longer available as of 2.6.13-rc1.