Bug 7657

Summary: BUG: NULL pointer dereference in ieee80211softmac_get_network_by_bssid_locked
Product: Networking
Reporter: Michael Bommarito (mjbommar)
Component: Wireless
Assignee: networking_wireless (networking_wireless)
Status: CLOSED CODE_FIX    
Severity: high    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.19-git13
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: kernel log, .config, OOPS dmesg, Patch

Description Michael Bommarito 2006-12-09 12:11:08 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.18.x
Distribution: Ubuntu Dapper

Hardware Environment: 
Model: Compaq r4000
CPU: AMD Athlon64 3200+
Wireless Controller: BCM4306

lspci:
0000:00:00.0 Host bridge: ATI Technologies Inc RS480 Host Bridge
0000:00:01.0 PCI bridge: ATI Technologies Inc: Unknown device 5a3f
0000:00:04.0 PCI bridge: ATI Technologies Inc: Unknown device 5a36
0000:00:13.0 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
0000:00:13.1 USB Controller: ATI Technologies Inc IXP SB400 USB Host Controller
0000:00:13.2 USB Controller: ATI Technologies Inc IXP SB400 USB2 Host
Controller
0000:00:14.0 SMBus: ATI Technologies Inc IXP SB400 SMBus Controller
(rev 10)
0000:00:14.1 IDE interface: ATI Technologies Inc Standard Dual Channel PCI IDE
Controller ATI
0000:00:14.3 ISA bridge: ATI Technologies Inc IXP SB400 PCI-ISA Bridge
0000:00:14.4 PCI bridge: ATI Technologies Inc IXP SB400 PCI-PCI Bridge
0000:00:14.5 Multimedia audio controller: ATI Technologies Inc IXP SB400 AC'97
Audio Controller (rev 01)
0000:00:14.6 Modem: ATI Technologies Inc ATI SB400 - AC'97 Modem Controller (rev 01)
0000:00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
HyperTransport Technology Configuration
0000:00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Address Map
0000:00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
DRAM Controller
0000:00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron]
Miscellaneous Control
0000:01:05.0 VGA compatible controller: ATI Technologies Inc ATI Radeon XPRESS
200M 5955 (PCIE)
0000:03:00.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000
Controller (PHY/Link)
0000:03:02.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless
LAN Controller (rev 03)
0000:03:04.0 CardBus bridge: Texas Instruments PCIxx21/x515 Cardbus Controller
0000:03:04.3 Mass storage controller: Texas Instruments PCIxx21 Integrated
FlashMedia Controller
0000:03:04.4 0805: Texas Instruments PCI6411, PCI6421, PCI6611, PCI6621,
PCI7411, PCI7421, PCI7611, PCI7621 Secure Digital (SD) Controller
0000:03:06.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)

Software Environment:
ver_linux:
Gnu C                  4.0.3
Gnu make               3.81beta4
binutils               2.16.91
util-linux             2.12r
mount                  2.12r
module-init-tools      3.2.2
e2fsprogs              1.38
jfsutils               1.1.8
reiserfsprogs          3.6.19
reiser4progs           1.0.5
xfsprogs               2.7.7
pcmciautils            012
pcmcia-cs              3.2.8
PPP                    2.4.4b1
Linux C Library        2.3.6
Dynamic linker (ldd)   2.3.6
Procps                 3.2.6
Net-tools              1.60
Console-tools          0.2.3
Sh-utils               5.93
udev                   079
wireless-tools         28

Problem Description:
Dec  9 14:22:05 metagnosis kernel: [  109.616000] BUG: unable to handle kernel
NULL pointer dereference at virtual address 00000000
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  printing eip:
Dec  9 14:22:05 metagnosis kernel: [  109.616000] c0310113
Dec  9 14:22:05 metagnosis kernel: [  109.616000] *pde = 00000000
Dec  9 14:22:05 metagnosis kernel: [  109.616000] Oops: 0000 [#1]
Dec  9 14:22:05 metagnosis kernel: [  109.616000] Modules linked in: xfs
nls_utf8 ntfs sr_mod scsi_mod usbhid snd_atiixp snd_atiixp_modem snd_ac97_codec
snd_ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore pcspkr
psmouse serio_raw snd_page_alloc ehci_hcd ohci_hcd usbcore
Dec  9 14:22:05 metagnosis kernel: [  109.616000] CPU:    0
Dec  9 14:22:05 metagnosis kernel: [  109.616000] EIP:   
0060:[ieee80211softmac_get_network_by_bssid_locked+35/96]    Not tainted VLI
Dec  9 14:22:05 metagnosis kernel: [  109.616000] EFLAGS: 00010086  
(2.6.19-git13 #2)
Dec  9 14:22:05 metagnosis kernel: [  109.616000] EIP is at
ieee80211softmac_get_network_by_bssid_locked+0x23/0x60
Dec  9 14:22:05 metagnosis kernel: [  109.616000] eax: edce8980   ebx: 00000206
  ecx: 00000000   edx: 00000000
Dec  9 14:22:05 metagnosis kernel: [  109.616000] esi: 00000008   edi: edce8a20
  ebp: edce8ca4   esp: c16a7e84
Dec  9 14:22:05 metagnosis kernel: [  109.616000] ds: 007b   es: 007b   ss: 0068
Dec  9 14:22:05 metagnosis kernel: [  109.616000] Process events/0 (pid: 4,
ti=c16a6000 task=edecba90 task.ti=c16a6000)
Dec  9 14:22:05 metagnosis kernel: [  109.616000] Stack: c16a7e94 edce8a0c
00000206 00000008 edce8a20 edce8980 c0310159 edce8980 
Dec  9 14:22:05 metagnosis kernel: [  109.616000]        c031179e edce8980
000000c0 edce8a20 c0311b7f 00000001 00000003 edee4450 
Dec  9 14:22:05 metagnosis kernel: [  109.616000]        00000000 00000086
00000086 c16a7ee8 edce89bc 00000000 00000000 e2bff0d0 
Dec  9 14:22:05 metagnosis kernel: [  109.616000] Call Trace:
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_get_network_by_bssid+9/16]
ieee80211softmac_get_network_by_bssid+0x9/0x10
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_send_disassoc_req+62/112]
ieee80211softmac_send_disassoc_req+0x3e/0x70
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_assoc_work+271/1152] ieee80211softmac_assoc_work+0x10f/0x480
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_assoc_notify_scan+0/16]
ieee80211softmac_assoc_notify_scan+0x0/0x10
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_notify_callback+76/96] ieee80211softmac_notify_callback+0x4c/0x60
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_notify_callback+0/96] ieee80211softmac_notify_callback+0x0/0x60
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_assoc_notify_scan+0/16]
ieee80211softmac_assoc_notify_scan+0x0/0x10
Dec  9 14:22:05 metagnosis kernel: [  109.616000] 
[ieee80211softmac_notify_callback+0/96] ieee80211softmac_notify_callback+0x0/0x60
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [run_workqueue+99/288]
run_workqueue+0x63/0x120
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [worker_thread+254/352]
worker_thread+0xfe/0x160
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [default_wake_function+0/16]
default_wake_function+0x0/0x10
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [worker_thread+0/352]
worker_thread+0x0/0x160
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [kthread+215/224]
kthread+0xd7/0xe0
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [kthread+0/224] kthread+0x0/0xe0
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  [kernel_thread_helper+7/24]
kernel_thread_helper+0x7/0x18
Dec  9 14:22:05 metagnosis kernel: [  109.616000]  =======================
Dec  9 14:22:05 metagnosis kernel: [  109.616000] Code: ff 53 9d 5b c3 8d 76 00
55 8d a8 24 03 00 00 57 56 53 83 ec 08 89 54 24 04 8b 90 24 03 00 00 eb 0a 8d b4
26 00 00 00 00 8b 14 24 <8b> 02 89 04 24 0f 18 00 90 39 ea 74 1f 8b 7c 24 04 8d
72 08 b8 
Dec  9 14:22:05 metagnosis kernel: [  109.616000] EIP:
[ieee80211softmac_get_network_by_bssid_locked+35/96]
ieee80211softmac_get_network_by_bssid_locked+0x23/0x60 SS:ESP 0068:c16a7e84

Steps to reproduce:
$ iwconfig eth1 essid ...

This has occurred since at least -git7 and I've tested up to -git13 (didn't see
any relevant changes in -git14, although if someone thinks otherwise I can try
again).

After the OOPS, the keyboard goes dead, although other non-network operation is
fine.  Obviously the machine is also unable to shut down, since init hangs
trying to shut the network interfaces off.

Here are the configurations I've tried:
* bcm43xx data transfer:
** DMA
** DMA + PIO
* MSI
** on
** off
* Preemption 
** voluntary
** full
* ieee80211 and bcm43xx
** as module
** as static

Have been busy as of late and not watching mailing list, so other than the dmesg
and giving the list of flags I've flipped, I can't offer much other direction.
Comment 1 Michael Bommarito 2006-12-09 12:13:36 UTC
Created attachment 9769 [details]
kernel log

Kernel log (debugging enabled)
Comment 2 Michael Bommarito 2006-12-09 12:14:39 UTC
Created attachment 9770 [details]
.config

The last of many .config attempts
Comment 3 Michael Bommarito 2006-12-10 08:36:42 UTC
I'd jumped from 2.6.18.2 to 2.6.19-git7 and missed 2.6.19.

Tested 2.6.19 this morning and it appears to work without hitch.

Based on the changelog since 2.6.19 and symbols in the calltrace, I'll posit
that one of these commits is the culprit:
* 359f2d17e32b32f53577375f83fb06d34e31bfe8 
* cc8ce997d2a4e524b1acea44beaf5bcfefdb1bfe
* 2b50c24554d31c2db2f93b1151b5991e62f96594
* 571d6eee9b5bce28fcbeb7588890ad5ca3f8c718
* b6d2b1db0637ff35127f3cc38c04f289a0ee0579

Have some RL problems to work on but I'll try reversing these commits later this
afternoon and determining which (if any) is the problem (assuming no one beats
me to it).
Comment 4 Michael Bommarito 2006-12-11 08:03:12 UTC
Quick reversal of previously mentioned commits led to too many conflicts in the
mm/scheduler code for me to deal with.  The responsible parties should probably
be the ones to do the merging, although I'm relatively confident the bug is the
result of an INIT_WORK call on an unchecked data argument.

That said though, just to be safe, I applied -git17 with its single ieee80211
patch and, as was to be assumed, the OOPS remained.
Comment 5 Michael Bommarito 2006-12-11 08:29:42 UTC
Created attachment 9783 [details]
OOPS dmesg

Reversed prior -git, applied -git17, did `make mrproper`, set all DEBUG flags
to y, and did a full rebuild.

This is the dmesg from the first WARNING through the OOPS and my SysRQ dump.
Comment 6 Michael Bommarito 2006-12-13 10:14:08 UTC
Created attachment 9803 [details]
Patch

Patch to pass association work instead of mac device structure
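
For readers unfamiliar with the pattern the patch description refers to, here is a minimal userspace sketch. It is not the attached patch. It assumes the work handler follows the convention of receiving a pointer to the work item and recovering the enclosing device with container_of(), in which case a direct caller has to pass the embedded work item rather than the device itself. The names work_item, mac_device, assoc_work_handler and notify_scan are hypothetical stand-ins, not the real softmac definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for a work item; not the kernel's work_struct. */
struct work_item {
	void (*func)(struct work_item *work);
};

/* Hypothetical stand-in for the mac device; not the real softmac struct. */
struct mac_device {
	int associated;              /* placeholder state */
	struct work_item assoc_work; /* work item embedded in the device */
};

/* The handler is given the work item and recovers its enclosing device. */
void assoc_work_handler(struct work_item *work)
{
	struct mac_device *mac;

	assert(work != NULL);
	mac = container_of(work, struct mac_device, assoc_work);
	mac->associated = 1;
}

/*
 * A caller that invokes the handler directly must pass the embedded work
 * item. Passing the device pointer cast to the handler's argument type
 * would make container_of() yield an address before the start of the
 * device, so every later field access would read unrelated memory.
 */
void notify_scan(struct mac_device *mac)
{
	assoc_work_handler(&mac->assoc_work);
}
```

Calling notify_scan() on a zero-initialized mac_device sets its associated field to 1. Under the assumption above, the broken variant would be the same call with the device pointer passed in place of &mac->assoc_work.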