Bug 23932

Summary: orinoco_cs BUG: sleeping function called from invalid context
Product: Drivers Reporter: Matt Domsch (Matt_Domsch)
Component: network-wirelessAssignee: drivers_network-wireless (drivers_network-wireless)
Status: CLOSED CODE_FIX    
Severity: normal CC: akpm, florian, kilroyd, linville
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.35.6-48.fc14.i686 Tree: Fedora
Regression: Yes
Attachments: Speculative fix
Fix for v2.6.35

Description Matt Domsch 2010-11-28 19:41:33 UTC
Kernel BUG reported when running 'modprobe orinico_cs' on Fedora 14 release kernel 2.6.35.6-48.fc14.i686.  Seen on two different laptops with similar orinico_cs wireless cards.  Fedora's kernel has no patches to this driver to distinguish it from mainline.  Does not fail with Fedora 12 kernel-PAE-2.6.32.23-170.fc12.i686

Trace follows.

[  378.882649] orinoco_cs 0.15 (David Gibson <hermes@gibson.dropbear.id.au>, Pavel Roskin <proski@gnu.org>, et al)
[  378.882928] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1074
[  378.882946] in_atomic(): 0, irqs_disabled(): 1, pid: 1843, name: modprobe
[  378.882962] Pid: 1843, comm: modprobe Not tainted 2.6.35.6-48.fc14.i686 #1
[  378.882973] Call Trace:
[  378.883009]  [<c042ada5>] __might_sleep+0xc3/0xca
[  378.883036]  [<c07a9d3e>] do_page_fault+0x146/0x275
[  378.883042]  [<c07a9bf8>] ? do_page_fault+0x0/0x275
[  378.883042]  [<c07a7acf>] error_code+0x73/0x78
[  378.883042]  [<d895802c>] ? orinoco_lock+0x1d/0x42 [orinoco]
[  378.883042]  [<d8959e13>] orinoco_interrupt+0x34/0x351 [orinoco]
[  378.883042]  [<c045e1d5>] ? __raw_local_irq_save+0x19/0x1e
[  378.883042]  [<c045e1d5>] ? __raw_local_irq_save+0x19/0x1e
[  378.883042]  [<c04834ba>] ? synchronize_irq+0x3c/0xab
[  378.883042]  [<c07a702d>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[  378.883042]  [<c0483ac9>] ? disable_irq_nosync+0x51/0x57
[  378.883042]  [<c04840dd>] request_threaded_irq+0x12e/0x147
[  378.883042]  [<d8959ddf>] ? orinoco_interrupt+0x0/0x351 [orinoco]
[  378.883042]  [<c0693949>] pcmcia_request_irq+0x2f/0x41
[  378.883042]  [<d893e13f>] orinoco_cs_probe+0x76/0xee [orinoco_cs]
[  378.883042]  [<c0691ab5>] pcmcia_device_probe+0x133/0x210
[  378.883042]  [<c0650f00>] ? driver_sysfs_add+0x10/0x53
[  378.883042]  [<c0651262>] driver_probe_device+0x114/0x20d
[  378.883042]  [<c06513a3>] __driver_attach+0x48/0x64
[  378.883042]  [<c0650445>] bus_for_each_dev+0x42/0x65
[  378.883042]  [<c0650eee>] driver_attach+0x19/0x1b
[  378.883042]  [<c065135b>] ? __driver_attach+0x0/0x64
[  378.883042]  [<c0650b75>] bus_add_driver+0xc9/0x215
[  378.883042]  [<c05b0c32>] ? kset_find_obj+0x12/0x4e
[  378.883042]  [<c065157d>] driver_register+0x7c/0xdb
[  378.883042]  [<c06924b9>] pcmcia_register_driver+0xdd/0x11e
[  378.883042]  [<d8968000>] ? init_orinoco_cs+0x0/0x1e [orinoco_cs]
[  378.883042]  [<d896801c>] init_orinoco_cs+0x1c/0x1e [orinoco_cs]
[  378.883042]  [<c0401246>] do_one_initcall+0x4f/0x139
[  378.883042]  [<c0451e51>] ? blocking_notifier_call_chain+0x11/0x13
[  378.883042]  [<c04621a4>] sys_init_module+0x7f/0x19b
[  378.883042]  [<c07a73cc>] syscall_call+0x7/0xb
[  378.883042] BUG: unable to handle kernel NULL pointer dereference at 00000030
[  378.883042] IP: [<d895802c>] orinoco_lock+0x1d/0x42 [orinoco]
[  378.883042] *pde = 00000000 
[  378.883042] Oops: 0000 [#1] SMP 
[  378.883042] last sysfs file: /sys/devices/platform/regulatory.0/uevent
[  378.883042] Modules linked in: orinoco_cs(+) orinoco cfg80211 rfkill fuse ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 snd_maestro3 snd_ac97_codec ac97_bus ppdev snd_seq snd_seq_device parpo
rt_pc microcode dcdbas parport snd_pcm snd_timer snd_page_alloc snd i2c_piix4 i2c_core 3c59x joydev soundcore mii video yenta_socket output [last unloaded: mperf]
[  378.883042] 
[  378.883042] Pid: 1843, comm: modprobe Not tainted 2.6.35.6-48.fc14.i686 #1 Inspiron 3800   C600GT   /Inspiron 3800   C600GT          
[  378.883042] EIP: 0060:[<d895802c>] EFLAGS: 00010046 CPU: 0
[  378.883042] EIP is at orinoco_lock+0x1d/0x42 [orinoco]
[  378.883042] EAX: d52ac4e0 EBX: d52ac2a0 ECX: 00000000 EDX: d648de14
[  378.883042] ESI: d648de14 EDI: d52ac4e0 EBP: d648ddd8 ESP: d648ddcc
[  378.883042]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  378.883042] Process modprobe (pid: 1843, ti=d648c000 task=d5add860 task.ti=d648c000)
[  378.883042] Stack:
[  378.883042]  00000282 d52ac2a0 00000000 d648de38 d8959e13 c045e1d5 d648ddf0 c045e1d5
[  378.883042] <0> 00000292 00000292 d648ddfc d52ac5bc d648de28 c04834ba c097fd58 d648de10
[  378.883042] <0> c07a702d d648de28 c0483ac9 00000286 00000003 c097fd00 c097fd00 c77bb1e9
[  378.883042] Call Trace:
[  378.883042]  [<d8959e13>] ? orinoco_interrupt+0x34/0x351 [orinoco]
[  378.883042]  [<c045e1d5>] ? __raw_local_irq_save+0x19/0x1e
[  378.883042]  [<c045e1d5>] ? __raw_local_irq_save+0x19/0x1e
[  378.883042]  [<c04834ba>] ? synchronize_irq+0x3c/0xab
[  378.883042]  [<c07a702d>] ? _raw_spin_unlock_irqrestore+0x13/0x15
[  378.883042]  [<c0483ac9>] ? disable_irq_nosync+0x51/0x57
[  378.883042]  [<c04840dd>] ? request_threaded_irq+0x12e/0x147
[  378.883042]  [<d8959ddf>] ? orinoco_interrupt+0x0/0x351 [orinoco]
[  378.883042]  [<c0693949>] ? pcmcia_request_irq+0x2f/0x41
[  378.883042]  [<d893e13f>] ? orinoco_cs_probe+0x76/0xee [orinoco_cs]
[  378.883042]  [<c0691ab5>] ? pcmcia_device_probe+0x133/0x210
[  378.883042]  [<c0650f00>] ? driver_sysfs_add+0x10/0x53
[  378.883042]  [<c0651262>] ? driver_probe_device+0x114/0x20d
[  378.883042]  [<c06513a3>] ? __driver_attach+0x48/0x64
[  378.883042]  [<c0650445>] ? bus_for_each_dev+0x42/0x65
[  378.883042]  [<c0650eee>] ? driver_attach+0x19/0x1b
[  378.883042]  [<c065135b>] ? __driver_attach+0x0/0x64
[  378.883042]  [<c0650b75>] ? bus_add_driver+0xc9/0x215
[  378.883042]  [<c05b0c32>] ? kset_find_obj+0x12/0x4e
[  378.883042]  [<c065157d>] ? driver_register+0x7c/0xdb
[  378.883042]  [<c06924b9>] ? pcmcia_register_driver+0xdd/0x11e
[  378.883042]  [<d8968000>] ? init_orinoco_cs+0x0/0x1e [orinoco_cs]
[  378.883042]  [<d896801c>] ? init_orinoco_cs+0x1c/0x1e [orinoco_cs]
[  378.883042]  [<c0401246>] ? do_one_initcall+0x4f/0x139
[  378.883042]  [<c0451e51>] ? blocking_notifier_call_chain+0x11/0x13
[  378.883042]  [<c04621a4>] ? sys_init_module+0x7f/0x19b
[  378.883042]  [<c07a73cc>] ? syscall_call+0x7/0xb
[  378.883042] Code: 89 e5 0f 1f 44 00 00 3e 80 60 08 fe 5d c3 55 89 e5 57 56 53 0f 1f 44 00 00 8b 88 28 03 00 00 89 c3 89 d6 8d b8 40 02 00 00 89 f8 <ff> 51 30 31 c0 83 bb 44 02 00 00 00 74 12 8b 8b 28 03 00 0
0 89 
[  378.883042] EIP: [<d895802c>] orinoco_lock+0x1d/0x42 [orinoco] SS:ESP 0068:d648ddcc
[  378.883042] CR2: 0000000000000030
[  378.883042] ---[ end trace a3eecd9d8c3040c5 ]---
Comment 1 Matt Domsch 2010-11-28 19:43:29 UTC
Also filed against Fedora 14 at
https://bugzilla.redhat.com/show_bug.cgi?id=657864
Comment 2 John W. Linville 2010-11-29 16:17:42 UTC
Any chance you could do a bisect between 2.6.34 and 2.6.35?

I suspect this relates to the locking changes introduced to facilitate adding orinoco_usb.
Comment 3 Matt Domsch 2010-11-29 20:08:51 UTC
yes, but it'll take some time, as that machine is a very old slow laptop, and it crashes on reboot, so automated build/bisect/test isn't possible.
Comment 4 Matt Domsch 2010-11-30 05:00:40 UTC
Stock 2.6.35 fails.
Comment 5 Matt Domsch 2010-12-03 13:48:51 UTC
bisect is going slowly, but I've got it down to the space between 
3b51cc996e81d8a113416d8094fa4a88f8360a51
and
f8965467f366fd18f01feafb5db10512d7b4422c
which is mostly David Kilroy's orinico_usb additions.

I'm continuing to bisect through that portion...

git bisect start
# bad: [9fe6206f400646a2322096b56c59891d530e8d51] Linux 2.6.35
git bisect bad 9fe6206f400646a2322096b56c59891d530e8d51
# good: [655ffee284dfcf9a24ac0343f3e5ee6db85b85c5] wireless: convert to use netdev_for_each_mc_addr
git bisect good 655ffee284dfcf9a24ac0343f3e5ee6db85b85c5
# good: [655ffee284dfcf9a24ac0343f3e5ee6db85b85c5] wireless: convert to use netdev_for_each_mc_addr
git bisect good 655ffee284dfcf9a24ac0343f3e5ee6db85b85c5
# bad: [f8965467f366fd18f01feafb5db10512d7b4422c] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
git bisect bad f8965467f366fd18f01feafb5db10512d7b4422c
# good: [3b51cc996e81d8a113416d8094fa4a88f8360a51] Merge branch 'master' into for-davem
git bisect good 3b51cc996e81d8a113416d8094fa4a88f8360a51
Comment 6 Matt Domsch 2010-12-04 16:12:43 UTC
bisect complete

593ef09c9e70c92c0d76c67a1c03a5d44d3aec82 is good, and is the commit immediately before.

bcad6e80f3fb0d6724c3814cf32258bbcf1d67db introduces the failure.

commit bcad6e80f3fb0d6724c3814cf32258bbcf1d67db
Author: David Kilroy <kilroyd@googlemail.com>
Date:   Sat May 1 14:05:40 2010 +0100

    orinoco: encapsulate driver locking
    
    Local bus and USB drivers will need to do locking differently.
    
    The original orinoco_usb patches had a boolean variable controlling
    whether spin_lock_bh was used, or irq based locking. This version
    provides wrappers for the lock functions and the drivers specify the
    functions pointers needed.
    
    This will introduce a performance penalty, but I'm not expecting it to
    be noticable.
    
    Signed-off-by: David Kilroy <kilroyd@googlemail.com>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
Comment 7 kilroyd 2010-12-04 18:47:35 UTC
Created attachment 39012 [details]
Speculative fix

Thanks for taking the time to bisect.

I've taken a quick look at the commit, paying special attention to orinoco_interrupt where the bug is being reported.

Before this change, orinoco_interrupt called orinoco_lock => spin_lock_irqsave.

After the change we have an extra level of indirection: orinoco_lock => priv->hw.ops->lock_irqsave (hermes_lock_irqsave) => spin_lock_irqsave.

The behaviour before and after this commit should be the same.

I'll have to guess that priv->hw.op->lock_irqsave is pointing somewhere odd, and you're getting a really unfortunate interrupt. I'm attaching a speculative patch (based on v2.6.35) - please give it a spin.
Comment 8 kilroyd 2010-12-05 10:53:37 UTC
If the patch is confirmed to fix this issue, we have the same problem in spectrum_cs. The other orinoco drivers look ok.

Note that the code in both these drivers has had various PCMCIA cleanups since v2.6.35, so the original patch does not apply cleanly to wireless-next/wireless-testing.
Comment 9 Matt Domsch 2010-12-07 02:03:53 UTC
Success.

Dec  6 20:03:08 localhost kernel: [   76.934094] pcmcia_socket pcmcia_socket1: pccard: PCMCIA card inserted into slot 1
Dec  6 20:03:08 localhost kernel: [   76.934180] pcmcia_socket pcmcia_socket1: cs: memory probe 0x0d0000-0x0effff: clean.
Dec  6 20:03:08 localhost kernel: [   76.951159] pcmcia_socket pcmcia_socket1: cs: memory probe 0x60000000-0x60ffffff: clean.
Dec  6 20:03:08 localhost kernel: [   76.961447] pcmcia_socket pcmcia_socket1: cs: memory probe 0xa0000000-0xa0ffffff: clean.
Dec  6 20:03:08 localhost kernel: [   76.971955] pcmcia 1.0: pcmcia: registering new device pcmcia1.0 (IRQ: 11)
Dec  6 20:03:09 localhost kernel: [   77.577335] cfg80211: Calling CRDA to update world regulatory domain
Dec  6 20:03:09 localhost kernel: [   77.876495] cfg80211: World regulatory domain updated:
Dec  6 20:03:09 localhost kernel: [   77.876570]     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Dec  6 20:03:09 localhost kernel: [   77.876637]     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:09 localhost kernel: [   77.876702]     (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:09 localhost kernel: [   77.876767]     (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:09 localhost kernel: [   77.876831]     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:09 localhost kernel: [   77.876894]     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:09 localhost kernel: [   77.984504] orinoco_cs 1.0: Hardware identity 0001:0001:0004:0002
Dec  6 20:03:09 localhost kernel: [   77.984677] orinoco_cs 1.0: Station identity  001f:0001:0006:000a
Dec  6 20:03:09 localhost kernel: [   77.984740] orinoco_cs 1.0: Firmware determined as Lucent/Agere 6.10
Dec  6 20:03:09 localhost kernel: [   78.147755] orinoco_cs 1.0: Hardware identity 0001:0001:0004:0002
Dec  6 20:03:09 localhost kernel: [   78.148070] orinoco_cs 1.0: Station identity  001f:0002:0009:0030
Dec  6 20:03:09 localhost kernel: [   78.148217] orinoco_cs 1.0: Firmware determined as Lucent/Agere 9.48
Dec  6 20:03:09 localhost kernel: [   78.148358] orinoco_cs 1.0: Ad-hoc demo mode supported
Dec  6 20:03:09 localhost kernel: [   78.148495] orinoco_cs 1.0: IEEE standard IBSS ad-hoc mode supported
Dec  6 20:03:09 localhost kernel: [   78.148636] orinoco_cs 1.0: WEP supported, 104-bit key
Dec  6 20:03:09 localhost kernel: [   78.148773] orinoco_cs 1.0: WPA-PSK supported
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): driver supports SSID scans (scan_capa 0x01).
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): new 802.11 WiFi device (driver: 'orinoco_cs' ifindex: 3)
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): exported as /org/freedesktop/NetworkManager/Devices/1
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): now managed
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): device state change: 1 -> 2 (reason 2)
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): bringing up device.
Dec  6 20:03:10 localhost kernel: [   78.517436] ADDRCONF(NETDEV_UP): eth1: link is not ready
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): preparing device.
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): deactivating device (reason: 2).
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): supplicant interface state:  starting -> ready
Dec  6 20:03:10 localhost NetworkManager[900]: <info> (eth1): device state change: 2 -> 3 (reason 42)
Dec  6 20:03:10 localhost NetworkManager[900]: <info> Updating /etc/hosts with new system hostname
Dec  6 20:03:10 localhost kernel: [   78.749835] cfg80211: Calling CRDA for country: US
Dec  6 20:03:10 localhost NetworkManager[900]: <info> found WiFi radio killswitch rfkill0 (at /sys/devices/pci0000:00/0000:00:03.1/1.0/ieee80211/phy0/rfkill0) (driver <unknown>)
Dec  6 20:03:10 localhost kernel: [   78.914947] cfg80211: Regulatory domain changed to country: US
Dec  6 20:03:10 localhost kernel: [   78.915194]     (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
Dec  6 20:03:10 localhost kernel: [   78.917705]     (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2700 mBm)
Dec  6 20:03:10 localhost kernel: [   78.917854]     (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 1700 mBm)
Dec  6 20:03:10 localhost kernel: [   78.918001]     (5250000 KHz - 5330000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:10 localhost kernel: [   78.918177]     (5490000 KHz - 5600000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:10 localhost kernel: [   78.918326]     (5650000 KHz - 5710000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
Dec  6 20:03:10 localhost kernel: [   78.918472]     (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 3000 mBm)
Dec  6 20:03:10 localhost nm-dispatcher.action: nm_dispatcher_action: Invalid connection: '(null)' / 'connection setting not found' invalid: 1
Comment 10 kilroyd 2010-12-07 19:03:59 UTC
Created attachment 39222 [details]
Fix for v2.6.35

Attached full fix for orinoco_cs and spectrum_cs against v2.6.35.

A patch has been sent to linux-wireless, prepared against current wireless-testing (master-2010-12-02)
Comment 11 Florian Mickler 2011-01-23 16:24:25 UTC
fixed in .37-rc6 by 
commit 229bd792be0bad245b78ed8f119952733a4752e5
Author: David Kilroy <kilroyd@googlemail.com>
Date:   Tue Dec 7 18:50:42 2010 +0000

    orinoco: initialise priv->hw before assigning the interrupt