Bug 42122 - deadlock when using ipw2200
Summary: deadlock when using ipw2200
Alias: None
Product: Drivers
Classification: Unclassified
Component: network-wireless (show other bugs)
Hardware: All Linux
: P1 blocking
Assignee: Stanislaw Gruszka
Depends on:
Blocks: 40982 7216
  Show dependency tree
Reported: 2011-08-31 15:41 UTC by Witold Baryluk
Modified: 2012-01-15 09:45 UTC (History)
5 users (show)

See Also:
Kernel Version: 3.1.0-rc4+
Regression: Yes
Bisected commit-id:


Description Witold Baryluk 2011-08-31 15:41:15 UTC
After lots of hard work, i managed to intercept and write down kernel log messages.

Somewhere after enabling my encrypted LVM, mounting root, something in initrd is loading ipw2200 early in boot process. (ipw200 looks to be initializing in parallel to the radeon drm - however disableing modeseting doesn't change anything).

It looks like this

libipw: 802.11 data/management/control stack, git-xxx
libipw: Copyright (C) 2004-2005 Intel Corporation <jketreno@linux.intel.com>
ipw2200: Intel(R) PRO/Wireless 2200/2915 Network Driver, xyxyx
ipw2200: Copyright(c) 2003-2006 Intel Corporation
ipw2200 0000:0b:02.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
ipw2200: Detected Intel PRO/Wireless 2915ABG Network Connection
drm: something with radeon/drm/kms probably, irrelevant
INFO: possible recursive locking detected

modprobe/1095 is trying lock:
  (rtnl_mutex){+.+.+.}, at: [<c146ba7f>] rtnl_lock+0xf/0x20

but task is already holding lock:
  (rtnl_mutex){+.+.+.}, at: [<c146ba7f>] rtnl_lock+0xf/0x20

other info that might help us debug this:

 Possible unsafe locking scenario:

  CPU 0

  *** DEADLOCK ***

May be due to missing lock nesting notation

4 locks held by modprobe/1095:

#0: (&__lockdep_no_validate__){......}, at: [<c12fdfb8>] __driver_attach+0x48/0xa0
#1: (&__lockdep_no_validate__){......}, at: [<c12fdfc4>] __driver_attach+0x54/0xa0
#2: (rtnl_mutex){+.+.+.} at [<c146ba7f>] rtnl_lock+0xf/0x20
#2: (&priv->mutex){+.+.+.} at [<f8cc7bd1>] ipw_net_init+0x41/0x2f0

Call trace

  ? printk+0x18
  ? _raw_spin_unlock_irqrestore+0x31
  ? trace_hardirq_on_caller+0xec
  ? rtnl_lock+0xf
  ? rtnl_lock+0xf
  ? mutex_unlocked+0x8
  ? rfkill_register+0x1ef
  ? __kmalloc+0x115
  ? ipw_net_init+0xc0
  ? sysfs_create_link+0x12
  ? driver_probe_device+0x180
  ? pci_device_probe+0x120
  ? driver_probe_device+0x180
  ? pci_device_probe+0x180
  ? __raw_spin_lock_init+0x31
  ? 0xf8cdefff

After this, system have some problems with multiple subsystems, including udev timeouts, read-only file systems, etc, etc.

I belive that commit ecb443355 introduced this regression.

commit ecb4433550f0620f3d1471ae7099037ede30a91e
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Fri Aug 12 14:00:59 2011 +0200

    mac80211: fix suspend/resume races with unregister hw

Comment 1 Witold Baryluk 2011-08-31 15:48:55 UTC
In fact I first performed git bisect to find offending commit first. Call trace proves that indeed it is a problem. there is rtnl_mutex deadlock, and ecb443355 introduced usage of rtnl_lock() in wiphy_register function, without proper nesting notation.

Writing down kernel log was hard, because boot_delay is not active in this stage of booting - it only affects up to the loading of initrd and/or mounting root fs AFAIK. Also just after call trace was displayed radeon changed mode settings, resolution and cleared screen, including Shift-PgDown/Up kernel log buffer). And becasue system have problems with file system, syslog isn't started yet, and boot is never ending, there is no way to dump kernel log.

Comment 2 Florian Mickler 2011-09-05 21:16:43 UTC
in drivers/net/wireless/ipw2x00/ipw2200.c ipw_net_init calls wiphy_register which will try to acquire the rtnl lock. 

Sadly though, ipw_net_init is appointed the .ndo_init function in the net_device_ops structure of the ipw2200 driver and that function is called from 
 register_netdevice in net/core/dev.c which has a big fat ASSERT_RTNL() above that call. 

ASSERT_RTNL() is a macro which dump_stack's if the rtnl_lock is not hold. Under the assumption that Stanislaw's change to mac80211 was tested and agreed upon it seems that ipw2200 is doing something weird here and shouldnt call wiphy_register in its ndo_init callback. 

Comment 3 Florian Mickler 2011-09-05 21:18:32 UTC
Comment 4 John W. Linville 2012-01-13 19:04:32 UTC
Does this problem persist with kernel 3.2 or later?
Comment 5 Stanislaw Gruszka 2012-01-15 09:45:09 UTC
Yes it is, we fixed that:

commit 7cabafcea793c003503a118da58da358b0692930
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Wed Sep 14 16:47:50 2011 +0200

    ipw2x00: fix rtnl mutex deadlock

Note You need to log in before you can comment on or make changes to this bug.