Bug 10821

Summary: rt25xx: lock dependency warning, association failure, and kmalloc corruption
Product: Networking Reporter: Christian Casteyde (casteyde.christian)
Component: WirelessAssignee: Ivo van Doorn (IvDoorn)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: blocking CC: bunk, IvDoorn, linville
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.26-rc4 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 10492    
Attachments: dmesg log just after inserting
next logs when starting wpa_supplicant
QOS queue initialization
Don't use mac80211 workqueue
Make entry->flags and rt2x00->flags atomic
free pointer corrupt with -rc6+bits.diff patch+first patch for lock
Make rt2x00->flags atomic and fix race in remove_dev()
Make rt2x00->flags atomic and fix race in remove_dev() - v2
dmesg output for 2.6.26-rc8+bits2.diff+work.diff
Make rt2x00->flags atomic and fix race in remove_dev() - v3
dmesg output for 2.6.26-rc8+bits3.diff+work.diff
dmesg output for 2.6.26 final

Description Christian Casteyde 2008-05-29 14:30:43 UTC
Latest working kernel version: 2.6.25.4
Earliest failing kernel version: 2.6.26-rc4
Previous rc not tested
Distribution: Bluewhite64 (64bit slackware)
Hardware Environment: rt2570 USB dongle
Software Environment: wpa_supplicant
Problem Description:
rt25xx USB wireless driver doesn't seem to work anymore.
First, when I plug it in, I get a lock dependency warning.
Second, it doesn't associate anymore (always falling in timeout).
And last, it corrupts the kernel memory when I unplug it.

The first dmesg output is:
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.26-rc4 #4
-------------------------------------------------------
rt2500usb/3123 is trying to acquire lock:
 (rtnl_mutex){--..}, at: [<ffffffff80548562>] rtnl_lock+0x12/0x20

but task is already holding lock:
 (&rt2x00dev->intf_work){--..}, at: [<ffffffff80243c32>] run_workqueue+0x122/0x220

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&rt2x00dev->intf_work){--..}:
       [<ffffffff80255d72>] __lock_acquire+0xcc2/0x1120
       [<ffffffff80256227>] lock_acquire+0x57/0x80
       [<ffffffff80243c77>] run_workqueue+0x167/0x220
       [<ffffffff802447d6>] worker_thread+0xa6/0x110
       [<ffffffff8024778d>] kthread+0x4d/0x80
       [<ffffffff8020c238>] child_rip+0xa/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 ((name)){--..}:
       [<ffffffff80255d72>] __lock_acquire+0xcc2/0x1120
       [<ffffffff80256227>] lock_acquire+0x57/0x80
       [<ffffffff80244435>] flush_workqueue+0x45/0x70
       [<ffffffff806143c1>] ieee80211_stop+0x1b1/0x3c0
       [<ffffffff8053dfd3>] dev_close+0x63/0xb0
       [<ffffffff8053dc01>] dev_change_flags+0xa1/0x1d0
       [<ffffffff8059098c>] devinet_ioctl+0x5bc/0x760
       [<ffffffff80590fed>] inet_ioctl+0x6d/0x90
       [<ffffffff8052fdc7>] sock_ioctl+0xc7/0x250
       [<ffffffff8029e6a1>] vfs_ioctl+0x31/0x90
       [<ffffffff8029e960>] do_vfs_ioctl+0x260/0x2e0
       [<ffffffff8029ea2a>] sys_ioctl+0x4a/0x80
       [<ffffffff8020b79b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (rtnl_mutex){--..}:
       [<ffffffff80255ba0>] __lock_acquire+0xaf0/0x1120
       [<ffffffff80256227>] lock_acquire+0x57/0x80
       [<ffffffff8063b37f>] mutex_lock_nested+0x8f/0x2c0
       [<ffffffff80548562>] rtnl_lock+0x12/0x20
       [<ffffffff80629887>] ieee80211_iterate_active_interfaces+0x27/0xa0
       [<ffffffff8042f59e>] rt2x00lib_intf_scheduled+0x1e/0x20
       [<ffffffff80243c7d>] run_workqueue+0x16d/0x220
       [<ffffffff802447d6>] worker_thread+0xa6/0x110
       [<ffffffff8024778d>] kthread+0x4d/0x80
       [<ffffffff8020c238>] child_rip+0xa/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by rt2500usb/3123:
 #0:  ((name)){--..}, at: [<ffffffff80243c32>] run_workqueue+0x122/0x220
 #1:  (&rt2x00dev->intf_work){--..}, at: [<ffffffff80243c32>] run_workqueue+0x122/0x220

stack backtrace:
Pid: 3123, comm: rt2500usb Not tainted 2.6.26-rc4 #4

Call Trace:
 [<ffffffff80253ac3>] print_circular_bug_tail+0x83/0x90
 [<ffffffff80253159>] ? print_circular_bug_entry+0x49/0x60
 [<ffffffff80255ba0>] __lock_acquire+0xaf0/0x1120
 [<ffffffff80548562>] ? rtnl_lock+0x12/0x20
 [<ffffffff80256227>] lock_acquire+0x57/0x80
 [<ffffffff80548562>] ? rtnl_lock+0x12/0x20
 [<ffffffff8063b37f>] mutex_lock_nested+0x8f/0x2c0
 [<ffffffff80430030>] ? rt2x00lib_intf_scheduled_iter+0x0/0x100
 [<ffffffff80548562>] rtnl_lock+0x12/0x20
 [<ffffffff80629887>] ieee80211_iterate_active_interfaces+0x27/0xa0
 [<ffffffff8042f580>] ? rt2x00lib_intf_scheduled+0x0/0x20
 [<ffffffff8042f59e>] rt2x00lib_intf_scheduled+0x1e/0x20
 [<ffffffff80243c7d>] run_workqueue+0x16d/0x220
 [<ffffffff802447d6>] worker_thread+0xa6/0x110
 [<ffffffff80247ba0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80244730>] ? worker_thread+0x0/0x110
 [<ffffffff8024778d>] kthread+0x4d/0x80
 [<ffffffff8020c238>] child_rip+0xa/0x12
 [<ffffffff8020bdf3>] ? restore_args+0x0/0x30
 [<ffffffff80247740>] ? kthread+0x0/0x80
 [<ffffffff8020c22e>] ? child_rip+0x0/0x12


2./When I start wpa_supplicant, I get:
ADDRCONF(NETDEV_UP): wlan0: link is not ready
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authentication with AP 00:07:cb:55:12:b0 timed out
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: RX authentication from 00:07:cb:55:12:b0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: association with AP 00:07:cb:55:12:b0 timed out
ADDRCONF(NETDEV_UP): wlan0: link is not ready
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authentication with AP 00:07:cb:55:12:b0 timed out
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: RX authentication from 00:07:cb:55:12:b0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: association with AP 00:07:cb:55:12:b0 timed out


3./ When I unplug it, the kernel barks:
usb 1-4: USB disconnect, address 3
=============================================================================
BUG kmalloc-96: Freepointer corrupt
-----------------------------------------------------------------------------

INFO: Allocated in rt2x00queue_alloc_entries+0x49/0xc0 age=51192 cpu=0 pid=3152
INFO: Freed in sysfs_release+0x62/0xa0 age=51227 cpu=0 pid=3131
INFO: Slab 0xffffe200011383c8 objects=24 used=10 fp=0xffff81004eb7f9d8 flags=0x40000000000000c3
INFO: Object 0xffff81004eb7f150 @offset=336 fp=0xffff81004a4c5210

Bytes b4 0xffff81004eb7f140:  00 ae ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a .<AE><FF><FF>....ZZZZZZZ
Z
  Object 0xffff81004eb7f150:  00 00 00 00 00 00 00 00 28 9f 0c 4d 00 81 ff ff ........(..M..<FF><FF>
  Object 0xffff81004eb7f160:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81004eb7f170:  78 f1 b7 4e 00 81 ff ff 08 51 4c 4a 00 81 ff ff x<F1><B7>N..<FF><FF>.QLJ
..<FF><FF>
  Object 0xffff81004eb7f180:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81004eb7f190:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81004eb7f1a0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
 Redzone 0xffff81004eb7f1b0:  cc cc cc cc cc cc cc cc                         <CC><CC><CC><CC><CC><CC>
<CC><CC>
 Padding 0xffff81004eb7f1f0:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Pid: 191, comm: khubd Not tainted 2.6.26-rc4 #4
Call Trace:
 [<ffffffff8028ac39>] print_trailer+0xf9/0x160
 [<ffffffff8028adbc>] object_err+0x3c/0x50
 [<ffffffff8028b15a>] check_object+0x15a/0x250
 [<ffffffff8028c695>] __slab_free+0x265/0x370
 [<ffffffff8043181f>] ? rt2x00queue_uninitialize+0x1f/0x60
 [<ffffffff8043181f>] ? rt2x00queue_uninitialize+0x1f/0x60
 [<ffffffff8028c91a>] kfree+0xaa/0x120
 [<ffffffff8043181f>] rt2x00queue_uninitialize+0x1f/0x60
 [<ffffffff8042f5cd>] rt2x00lib_uninitialize+0x2d/0x40
 [<ffffffff8042f881>] rt2x00lib_remove_dev+0x21/0x40
 [<ffffffff80432a6a>] rt2x00usb_disconnect+0x2a/0x70
 [<ffffffff804983d0>] usb_unbind_interface+0x70/0xd0
 [<ffffffff803eb02c>] __device_release_driver+0x6c/0xa0
 [<ffffffff803eb15b>] device_release_driver+0x2b/0x40
 [<ffffffff803ea2ad>] bus_remove_device+0x8d/0xb0
 [<ffffffff803e8b3b>] device_del+0x10b/0x190
 [<ffffffff804950d1>] usb_disable_device+0x91/0x120
 [<ffffffff80490413>] usb_disconnect+0xa3/0x140
 [<ffffffff80491866>] hub_thread+0x426/0xf10
 [<ffffffff80247ba0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80491440>] ? hub_thread+0x0/0xf10
 [<ffffffff8024778d>] kthread+0x4d/0x80
 [<ffffffff8020c238>] child_rip+0xa/0x12
 [<ffffffff8020bdf3>] ? restore_args+0x0/0x30
 [<ffffffff80247740>] ? kthread+0x0/0x80
 [<ffffffff8020c22e>] ? child_rip+0x0/0x12

In the end the device is not usable anymore. It indeed associates with kernel 2.6.25.4.

Steps to reproduce:
I guess try a Ralink USB dongle with this chipset.
Comment 1 Ivo van Doorn 2008-06-13 10:55:22 UTC
Please retest with 2.6.26-rc5 or 2.6.26-rc6
Comment 2 Christian Casteyde 2008-06-13 11:11:43 UTC
Both still fail with -rc6.
At device plug in, I get:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.26-rc6 #10
-------------------------------------------------------
rt2500usb/3488 is trying to acquire lock:
 (rtnl_mutex){--..}, at: [<ffffffff805479d2>] rtnl_lock+0x12/0x20

but task is already holding lock:
 (&rt2x00dev->intf_work){--..}, at: [<ffffffff80243d52>] run_workqueue+0x122/0x220

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&rt2x00dev->intf_work){--..}:
       [<ffffffff80255ec2>] __lock_acquire+0xcc2/0x1120
       [<ffffffff80256377>] lock_acquire+0x57/0x80
       [<ffffffff80243d97>] run_workqueue+0x167/0x220
       [<ffffffff802448f6>] worker_thread+0xa6/0x110
       [<ffffffff802478ad>] kthread+0x4d/0x80
       [<ffffffff8020c238>] child_rip+0xa/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 ((name)){--..}:
       [<ffffffff80255ec2>] __lock_acquire+0xcc2/0x1120
       [<ffffffff80256377>] lock_acquire+0x57/0x80
       [<ffffffff80244555>] flush_workqueue+0x45/0x70
       [<ffffffff80613936>] ieee80211_stop+0x1c6/0x3e0
       [<ffffffff8053d433>] dev_close+0x63/0xb0
       [<ffffffff8053d061>] dev_change_flags+0xa1/0x1d0
       [<ffffffff8058fe8b>] devinet_ioctl+0x5ab/0x750
       [<ffffffff805904ed>] inet_ioctl+0x6d/0x90
       [<ffffffff8052f217>] sock_ioctl+0xc7/0x250
       [<ffffffff8029d731>] vfs_ioctl+0x31/0x90
       [<ffffffff8029d9f0>] do_vfs_ioctl+0x260/0x2e0
       [<ffffffff8029daba>] sys_ioctl+0x4a/0x80
       [<ffffffff8020b79b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (rtnl_mutex){--..}:
       [<ffffffff80255cf0>] __lock_acquire+0xaf0/0x1120
       [<ffffffff80256377>] lock_acquire+0x57/0x80
       [<ffffffff8063a7bf>] mutex_lock_nested+0x8f/0x2c0
       [<ffffffff805479d2>] rtnl_lock+0x12/0x20
       [<ffffffff80628f27>] ieee80211_iterate_active_interfaces+0x27/0xa0
       [<ffffffff8042e7ae>] rt2x00lib_intf_scheduled+0x1e/0x20
       [<ffffffff80243d9d>] run_workqueue+0x16d/0x220
       [<ffffffff802448f6>] worker_thread+0xa6/0x110
       [<ffffffff802478ad>] kthread+0x4d/0x80
       [<ffffffff8020c238>] child_rip+0xa/0x12
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by rt2500usb/3488:
 #0:  ((name)){--..}, at: [<ffffffff80243d52>] run_workqueue+0x122/0x220
 #1:  (&rt2x00dev->intf_work){--..}, at: [<ffffffff80243d52>] run_workqueue+0x122/0x220

stack backtrace:
Pid: 3488, comm: rt2500usb Not tainted 2.6.26-rc6 #10

Call Trace:
 [<ffffffff80253c13>] print_circular_bug_tail+0x83/0x90
 [<ffffffff802532a9>] ? print_circular_bug_entry+0x49/0x60
 [<ffffffff80255cf0>] __lock_acquire+0xaf0/0x1120
 [<ffffffff805479d2>] ? rtnl_lock+0x12/0x20
 [<ffffffff80256377>] lock_acquire+0x57/0x80
 [<ffffffff805479d2>] ? rtnl_lock+0x12/0x20
 [<ffffffff8063a7bf>] mutex_lock_nested+0x8f/0x2c0
 [<ffffffff8042f240>] ? rt2x00lib_intf_scheduled_iter+0x0/0x100
 [<ffffffff805479d2>] rtnl_lock+0x12/0x20
 [<ffffffff80628f27>] ieee80211_iterate_active_interfaces+0x27/0xa0
 [<ffffffff8042e790>] ? rt2x00lib_intf_scheduled+0x0/0x20
 [<ffffffff8042e7ae>] rt2x00lib_intf_scheduled+0x1e/0x20
 [<ffffffff80243d9d>] run_workqueue+0x16d/0x220
 [<ffffffff802448f6>] worker_thread+0xa6/0x110
 [<ffffffff80247cc0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80244850>] ? worker_thread+0x0/0x110
 [<ffffffff802478ad>] kthread+0x4d/0x80
 [<ffffffff8020c238>] child_rip+0xa/0x12
 [<ffffffff8020bdf3>] ? restore_args+0x0/0x30
 [<ffffffff80247860>] ? kthread+0x0/0x80
 [<ffffffff8020c22e>] ? child_rip+0x0/0x12

ADDRCONF(NETDEV_UP): wlan0: link is not ready

After that, I can do iwlist scan now. However, association still fails with:

/usr/sbin/wpa_supplicant -Bw -c/etc/wpa_supplicant.conf -Dwext -ieth1
->
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authentication with AP 00:07:cb:55:12:b0 timed out
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: RX authentication from 00:07:cb:55:12:b0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: RX authentication from 00:07:cb:55:12:b0 (alg=0 transaction=2 status=0)
wlan0: authenticated
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: authentication frame received from 00:07:cb:55:12:b0, but not in authenticate state - ignored
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: associate with AP 00:07:cb:55:12:b0
wlan0: association with AP 00:07:cb:55:12:b0 timed out
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: Initial auth_alg=0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: association frame received from 00:07:cb:55:12:b0, but not in associate state - ignored
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authenticate with AP 00:07:cb:55:12:b0
wlan0: authentication with AP 00:07:cb:55:12:b0 timed out
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)
wlan0: deauthenticated
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)
wlan0: RX deauthentication from 00:07:cb:55:12:b0 (reason=15)

Then at device removal:
usb 1-3: USB disconnect, address 4
=============================================================================
BUG kmalloc-96: Freepointer corrupt
-----------------------------------------------------------------------------

INFO: Allocated in rt2x00queue_alloc_entries+0x49/0xc0 age=82828 cpu=0 pid=3519
INFO: Freed in sysfs_release+0x62/0xa0 age=82869 cpu=0 pid=3496
INFO: Slab 0xffffe20001118688 objects=24 used=13 fp=0xffff81004e2679d8 flags=0x40000000000000c3
INFO: Object 0xffff81004e267d20 @offset=3360 fp=0xffff81003dd37000

Bytes b4 0xffff81004e267d10:  95 4e 01 00 01 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a .N......ZZZZZZZZ
  Object 0xffff81004e267d20:  00 00 00 00 00 00 00 00 08 56 3b 4d 00 81 ff ff .........V;M..��
  Object 0xffff81004e267d30:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81004e267d40:  48 7d 26 4e 00 81 ff ff 70 ae d8 3d 00 81 ff ff H}&N..��p��=..��
  Object 0xffff81004e267d50:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81004e267d60:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  Object 0xffff81004e267d70:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
 Redzone 0xffff81004e267d80:  cc cc cc cc cc cc cc cc                         ��������
 Padding 0xffff81004e267dc0:  5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ
Pid: 191, comm: khubd Not tainted 2.6.26-rc6 #10

Call Trace:
 [<ffffffff80289cc9>] print_trailer+0xf9/0x160
 [<ffffffff80289e4c>] object_err+0x3c/0x50
 [<ffffffff8028a1ea>] check_object+0x15a/0x250
 [<ffffffff8028b725>] __slab_free+0x265/0x370
 [<ffffffff80430a0f>] ? rt2x00queue_uninitialize+0x1f/0x60
 [<ffffffff80430a0f>] ? rt2x00queue_uninitialize+0x1f/0x60
 [<ffffffff8028b9aa>] kfree+0xaa/0x120
 [<ffffffff80430a0f>] rt2x00queue_uninitialize+0x1f/0x60
 [<ffffffff8042e7dd>] rt2x00lib_uninitialize+0x2d/0x40
 [<ffffffff8042ea91>] rt2x00lib_remove_dev+0x21/0x40
 [<ffffffff80431c5a>] rt2x00usb_disconnect+0x2a/0x70
 [<ffffffff80497660>] usb_unbind_interface+0x70/0xd0
 [<ffffffff803ea19c>] __device_release_driver+0x6c/0xa0
 [<ffffffff803ea2cb>] device_release_driver+0x2b/0x40
 [<ffffffff803e941d>] bus_remove_device+0x8d/0xb0
 [<ffffffff803e7cab>] device_del+0x10b/0x190
 [<ffffffff80494361>] usb_disable_device+0x91/0x120
 [<ffffffff8048f663>] usb_disconnect+0xa3/0x140
 [<ffffffff80490ab6>] hub_thread+0x426/0xf50
 [<ffffffff80247cc0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff80490690>] ? hub_thread+0x0/0xf50
 [<ffffffff802478ad>] kthread+0x4d/0x80
 [<ffffffff8020c238>] child_rip+0xa/0x12
 [<ffffffff8020bdf3>] ? restore_args+0x0/0x30
 [<ffffffff80247860>] ? kthread+0x0/0x80
 [<ffffffff8020c22e>] ? child_rip+0x0/0x12

so the problems are still there.
Comment 3 Ivo van Doorn 2008-06-16 08:36:57 UTC
Could you test wireless-testing.git to see if the problem persists in there?
Comment 4 Christian Casteyde 2008-06-22 14:28:12 UTC
Yes, I did the tests:
-rc7 : unchanged, same results.

wireless-testing from this morning (22/06 10h GMT+1):
- the circular lock dependency at plug in is still there, but doesn't always trigger (therefore this is a race condition that triggers it). I appended the dmesg log for it.
- the scan is possible, but it's still impossible to associate. Also appended.
- at least, the free corruption at unplug is fixed (the most important I think for data integrity).
Comment 5 Christian Casteyde 2008-06-22 14:29:28 UTC
Created attachment 16581 [details]
dmesg log just after inserting

not reproduceable at each insertion
Comment 6 Christian Casteyde 2008-06-22 14:30:52 UTC
Created attachment 16582 [details]
next logs when starting wpa_supplicant

note that the "link" led is not active, whereas it is in 2.6.25. I guess the driver is not functionnal, however scanning works.
Comment 7 Ivo van Doorn 2008-06-22 15:03:53 UTC
Thanks for the test, I'll check which patches are between 2.6.26-rc6 and wireless-testing which might have solved this free corruption.

The locking issue seems to be triggered by a 'ifdown' event, but you get this warning during start which is odd. You *should* be okay as long as no 'ifdown' command is given. But I have spotted a possible problem when 'ifdown' is called while rt2x00 is still busy configuring the hardware. I'll see what I can do about that. 

The leds are at this moment non-informative, if they don't blink it doesn't mean the driver doesn't work. I still have to investigate that issue some more, but leds aren't an high priority for me. ;)
Comment 8 Christian Casteyde 2008-06-22 15:10:22 UTC
From experience, I'm not sure udev doesn't trigger some configuration scripts that could do ifdown before reconfiguring the interface. In fact it is very possible a ifdown is executed when I plug it, because to do test manually, I didn't activated the wlan0 if in my rc scripts. It is slackware, the scripts are easy to understand, I can look at it if necessary.
OK for the leds :-)
Comment 9 Ivo van Doorn 2008-06-23 12:06:51 UTC
Created attachment 16587 [details]
QOS queue initialization

Please test attached patch to see if the segmentation fault still persists.

I don't have high hopes on this patch, but it seems to be the only patch in wireless-testing that fixed a similar crash, and we might be lucky and this works for your crash as well. :)
Comment 10 Adrian Bunk 2008-06-24 07:22:15 UTC
Handled-By      : Ivo van Doorn <IvDoorn@gmail.com>
Comment 11 Ivo van Doorn 2008-06-24 09:36:24 UTC
Created attachment 16600 [details]
Don't use mac80211 workqueue

Above the previous patch, please try this patch as well.
This should fix the locking dependency problem
Comment 12 Christian Casteyde 2008-06-24 14:56:32 UTC
With the two patches on -rc7, I didn't managed to reproduce the lock error at plug in, but it doesn't occur very often anyway. So it may not be significant.
The free pointer corruption at removal is still there however (and very reproduceable).
It's still impossible to associate - and the link led is still off (whereas it is on on 2.6.25 and I associate, so I guess even if it is not used in the driver, it is a clue that the hardware indeed works for my key).
Comment 13 Ivo van Doorn 2008-06-24 15:08:45 UTC
Ok, I'll push the second patch as bugfix for the lockdep issue tomorrow (I'm quite confident that this patch is indeed the bugfix). But let me know if it reappears :)

I'll see what I can discover on that free pointer
Comment 14 Ivo van Doorn 2008-06-26 11:09:52 UTC
Created attachment 16640 [details]
Make entry->flags and rt2x00->flags atomic

Could you try this patch to see if this helps in fixing the freepointer problem?
Currently my best guess is a race condition where the queues are freed twice, so far you are the only one reporting this, so it would seem rare enough for a race condition. ;)
Comment 15 Christian Casteyde 2008-06-28 00:42:25 UTC
I removed the queue patch and added our last patch (bits.diff) on -rc7, it still crashes. I append the log.
Sorry ;-)
Comment 16 Christian Casteyde 2008-06-28 00:44:21 UTC
Created attachment 16649 [details]
free pointer corrupt with -rc6+bits.diff patch+first patch for lock

may be this one will speak a little more?
Comment 17 Ivo van Doorn 2008-06-29 10:43:02 UTC
Created attachment 16650 [details]
Make rt2x00->flags atomic and fix race in remove_dev()

Ok updated patch attached. I missed one if-statement which could perhaps lead to a race condition. Let's hope for the best. :)
Comment 18 Christian Casteyde 2008-07-01 22:40:09 UTC
with -rc8, the last patch does'nt build. I got:

drivers/net/wireless/rt2x00/rt2x00dev.c: In function ‘rt2x00lib_intf_scheduled_iter’:
drivers/net/wireless/rt2x00/rt2x00dev.c:434: erreur: ‘DEVICE_ENABLED_RADIO’ undeclared (first use in this function)
drivers/net/wireless/rt2x00/rt2x00dev.c:434: erreur: (Each undeclared identifier is reported only once
drivers/net/wireless/rt2x00/rt2x00dev.c:434: erreur: for each function it appears in.)
make[4]: *** [drivers/net/wireless/rt2x00/rt2x00dev.o] Erreur 1
make[3]: *** [drivers/net/wireless/rt2x00] Erreur 2
make[2]: *** [drivers/net/wireless] Erreur 2
make[1]: *** [drivers/net] Erreur 2
make: *** [drivers] Erreur 2
  CC      net/ipv6/af_inet6.o
make: *** Attente des tâches non terminées....
Comment 19 Ivo van Doorn 2008-07-02 02:48:26 UTC
Created attachment 16691 [details]
Make rt2x00->flags atomic and fix race in remove_dev() - v2

Sorry about that, I hadn't created the patch on top of 16600 :S
This new patch should do the trick.
Comment 20 Christian Casteyde 2008-07-02 14:25:29 UTC
Created attachment 16703 [details]
dmesg output for 2.6.26-rc8+bits2.diff+work.diff

Well, sorry, it still does not work.
I appended another dmesg, it's quite different, maybe usefull.
Comment 21 Ivo van Doorn 2008-07-02 15:04:59 UTC
Created attachment 16704 [details]
Make rt2x00->flags atomic and fix race in remove_dev() - v3

Argh, forgot a negation somewhere. :S
Please try updated patch.
Comment 22 Christian Casteyde 2008-07-03 10:22:34 UTC
Created attachment 16721 [details]
dmesg output for 2.6.26-rc8+bits3.diff+work.diff

sorry, still not there :-(
Comment 23 Christian Casteyde 2008-07-14 13:19:02 UTC
Created attachment 16816 [details]
dmesg output for 2.6.26 final

For the record, here is the dmesg output for final 2.6.26.
It shows the logs first when executing wpa_supplicant, which doesn't manage to associate to my AP, then the free pointer when I remove the device (after killing wpa_supplicant btw).
I managed to get the link led working fot the device, it was a change in kernel configuration (had to add support for led devices, then add led driver in rt driver). However, it effectively doesn't help.
Comment 24 Christian Casteyde 2008-08-19 13:28:42 UTC
Everything seems to work with 2.6.27-rc3 now.

I managed to plug it in, associate, get an IP address, access to the Internet, and unplug the device event without ifconfig it down, without any problem or any warning from the kernel compiled with hacking options.

Thanks