After lots of hard work, i managed to intercept and write down kernel log messages. Somewhere after enabling my encrypted LVM, mounting root, something in initrd is loading ipw2200 early in boot process. (ipw200 looks to be initializing in parallel to the radeon drm - however disableing modeseting doesn't change anything). It looks like this libipw: 802.11 data/management/control stack, git-xxx libipw: Copyright (C) 2004-2005 Intel Corporation <jketreno@linux.intel.com> ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, xyxyx ipw2200: Copyright(c) 2003-2006 Intel Corporation ipw2200 0000:0b:02.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21 ipw2200: Detected Intel PRO/Wireless 2915ABG Network Connection drm: something with radeon/drm/kms probably, irrelevant ================================================== INFO: possible recursive locking detected ================================================== modprobe/1095 is trying lock: (rtnl_mutex){+.+.+.}, at: [<c146ba7f>] rtnl_lock+0xf/0x20 but task is already holding lock: (rtnl_mutex){+.+.+.}, at: [<c146ba7f>] rtnl_lock+0xf/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU 0 ----- lock(rtnl_mutex); lock(rtnl_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by modprobe/1095: #0: (&__lockdep_no_validate__){......}, at: [<c12fdfb8>] __driver_attach+0x48/0xa0 #1: (&__lockdep_no_validate__){......}, at: [<c12fdfc4>] __driver_attach+0x54/0xa0 #2: (rtnl_mutex){+.+.+.} at [<c146ba7f>] rtnl_lock+0xf/0x20 #2: (&priv->mutex){+.+.+.} at [<f8cc7bd1>] ipw_net_init+0x41/0x2f0 Call trace ? printk+0x18 __lock_acquire+0x126c ? _raw_spin_unlock_irqrestore+0x31 ? trace_hardirq_on_caller+0xec lock_acquire+0x78 ? rtnl_lock+0xf mutex_lock_nested+0x44 ? rtnl_lock+0xf ? mutex_unlocked+0x8 ? rfkill_register+0x1ef rtnl_lock+0xf wiphy_register+0x469 ? __kmalloc+0x115 ? ipw_net_init+0xc0 ipw_net_init+0x1b9 register_netdevice+0xdd register_netdev+0x1b ipw_pci_probe+0xb4c pci_device_probe+0x8d ? sysfs_create_link+0x12 driver_probe_device+0x66 __driver_attach+0x91 ? driver_probe_device+0x180 bus_for_each_dev+0x41 ? pci_device_probe+0x120 driver_attach+0x1c ? driver_probe_device+0x180 bus_add_driver+0x1a7 ? pci_device_probe+0x180 driver_register+0x65 ? __raw_spin_lock_init+0x31 __pci_register_driver+0x52 ipw_init+0x31 do_one_initcall+0x2f ? 0xf8cdefff sys_init_module+0xb06 After this, system have some problems with multiple subsystems, including udev timeouts, read-only file systems, etc, etc. I belive that commit ecb443355 introduced this regression. commit ecb4433550f0620f3d1471ae7099037ede30a91e Author: Stanislaw Gruszka <sgruszka@redhat.com> Date: Fri Aug 12 14:00:59 2011 +0200 mac80211: fix suspend/resume races with unregister hw Thanks.
In fact I first performed git bisect to find offending commit first. Call trace proves that indeed it is a problem. there is rtnl_mutex deadlock, and ecb443355 introduced usage of rtnl_lock() in wiphy_register function, without proper nesting notation. Writing down kernel log was hard, because boot_delay is not active in this stage of booting - it only affects up to the loading of initrd and/or mounting root fs AFAIK. Also just after call trace was displayed radeon changed mode settings, resolution and cleared screen, including Shift-PgDown/Up kernel log buffer). And becasue system have problems with file system, syslog isn't started yet, and boot is never ending, there is no way to dump kernel log. Thanks.
in drivers/net/wireless/ipw2x00/ipw2200.c ipw_net_init calls wiphy_register which will try to acquire the rtnl lock. Sadly though, ipw_net_init is appointed the .ndo_init function in the net_device_ops structure of the ipw2200 driver and that function is called from register_netdevice in net/core/dev.c which has a big fat ASSERT_RTNL() above that call. ASSERT_RTNL() is a macro which dump_stack's if the rtnl_lock is not hold. Under the assumption that Stanislaw's change to mac80211 was tested and agreed upon it seems that ipw2200 is doing something weird here and shouldnt call wiphy_register in its ndo_init callback. Adding
(s/Adding/Flo/)
Does this problem persist with kernel 3.2 or later?
Yes it is, we fixed that: commit 7cabafcea793c003503a118da58da358b0692930 Author: Stanislaw Gruszka <sgruszka@redhat.com> Date: Wed Sep 14 16:47:50 2011 +0200 ipw2x00: fix rtnl mutex deadlock