Bug 16099
Summary: | joining a mesh causes kernel fault with rt73 | ||
---|---|---|---|
Product: | Networking | Reporter: | Christian Mehlis (mehlis) |
Component: | Wireless | Assignee: | John W. Linville (linville) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | andrey, gwingerde, IvDoorn, javier, linville, mehlis |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.34-generic | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | kernel.log, lsusb, lspci -v, full syslog, full kern.log, full messages, 0001-mac80211-avoid-scheduling-while-atomic-in-mesh_rx_pl.patch, 0001-mac80211-avoid-scheduling-while-atomic-in-mesh_rx_pl.patch |
Created attachment 26605 [details]
lsusb
Created attachment 26606 [details]
lspci -v
Hmmm, the kernel.log file looks a bit odd, as the information about why the stack trace was generated seems to be missing. Is this really all there is in the kernel log, or did you only include parts of the full log?
Created attachment 26623 [details]
full syslog
Created attachment 26624 [details]
full kern.log
3.5 MB
Created attachment 26625 [details]
full messages
3.3 MB
steps to reproduce:
create a mesh device on host a (rt73)
create a mesh device on host b
iw event -t on a shows:
1275510283.394517: mesh0: new station 00:16:e3:97:2c:82
then the fault happens on a

OK. It seems to be a "Scheduling in atomic" error, indicating we are trying to schedule with a spinlock held. John, looking at this, it seems that the mesh code is calling the mac80211 bss_info_changed callback function of the driver with a spinlock held, where the rest of mac80211 will never do that. Therefore I believe this to be a bug in the mesh code, rather than in rt2x00.

Should we consider marking rt2x00 as not supporting mesh mode?

I've already submitted a patch which disables Mesh mode upstream. I'll recheck when I get back from Berlin later this week.

Apparently I forgot that patch earlier. I sent it upstream a few minutes ago:
[PATCH 1/2] mac80211: Fix bss_info_changed comment regarding sleeping
[PATCH 2/2] rt2x00: Disable Mesh mode for USB drivers
The first one is just a documentation fix for mac80211, while the second patch solves the actual problem (although it simply removes the Mesh feature for USB devices).

Alright, sorry for my delay... It looks to me like this is triggered by calling mesh_plink_inc_estab_count (which calls ieee80211_bss_info_change_notify) from inside mesh_rx_plink_frame while holding sta->lock. I don't really see why we need to hold sta->lock while incrementing that count. Am I on crack? :-)
*time passes*
OK, so the bits related to mesh_plink_dec_estab_count were slightly more complicated. Hopefully I'm not missing anything -- patch to follow...

Created attachment 26888 [details]
0001-mac80211-avoid-scheduling-while-atomic-in-mesh_rx_pl.patch
Created attachment 26889 [details]
0001-mac80211-avoid-scheduling-while-atomic-in-mesh_rx_pl.patch
commit c937019761a758f2749b1f3a032b7a91fb044753
Author: John W. Linville <linville@tuxdriver.com>
Date: Mon Jun 21 17:14:07 2010 -0400

    mac80211: avoid scheduling while atomic in mesh_rx_plink_frame

    While mesh_rx_plink_frame holds sta->lock...

    mesh_rx_plink_frame ->
        mesh_plink_inc_estab_count ->
            ieee80211_bss_info_change_notify

    ...but ieee80211_bss_info_change_notify is allowed to sleep. A driver
    taking advantage of that allowance can cause a scheduling while
    atomic bug. Similar paths exist for mesh_plink_dec_estab_count,
    so work around those as well.

    http://bugzilla.kernel.org/show_bug.cgi?id=16099

    Also, correct a minor kerneldoc comment error (mismatched function
    names).

    Signed-off-by: John W. Linville <linville@tuxdriver.com>
    Cc: stable@kernel.org
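The locking problem described in the commit message can be illustrated outside the kernel. What follows is a minimal userspace sketch in C, not the actual patch. Every `toy_` name is hypothetical, and a plain boolean flag stands in for `sta->lock`. The state change happens while the lock is held, and the notifier that is allowed to sleep runs only after the lock has been released.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical stand-ins, not mac80211 symbols. */
struct toy_sta {
    bool lock_held;     /* stands in for sta->lock (a spinlock in the kernel) */
    int estab_plinks;   /* stands in for the established peer link counter */
    int notifications;  /* number of times the notifier has run */
};

static void toy_lock(struct toy_sta *sta)
{
    assert(!sta->lock_held);
    sta->lock_held = true;
}

static void toy_unlock(struct toy_sta *sta)
{
    assert(sta->lock_held);
    sta->lock_held = false;
}

/* Stands in for a callback that may sleep: it must never run under the lock. */
static void toy_notify_may_sleep(struct toy_sta *sta)
{
    assert(!sta->lock_held);
    sta->notifications++;
}

/* Update state only; tell the caller whether a notification is due later. */
static bool toy_inc_estab_count(struct toy_sta *sta)
{
    assert(sta->lock_held);
    sta->estab_plinks++;
    return true;
}

/* Receive path: mutate under the lock, notify after dropping it. */
static void toy_rx_plink_frame(struct toy_sta *sta)
{
    bool changed;

    toy_lock(sta);
    changed = toy_inc_estab_count(sta);
    toy_unlock(sta);

    if (changed)
        toy_notify_may_sleep(sta);
}
```

Calling `toy_rx_plink_frame()` on a zero-initialised `struct toy_sta` increments the counter and runs the notifier once per call. Moving the `toy_notify_may_sleep()` call above `toy_unlock()` would trip the assertion in the notifier, which is the userspace analogue of the "scheduling while atomic" report in this bug.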
Created attachment 26604 [details]
kernel.log
When two mesh devices merge into one mesh, the rt73 module fails; see attachment.