Latest working kernel version: 2.6.19.2
Earliest failing kernel version: 2.6.24.2
Distribution: Debian Lenny/Sid
Hardware Environment: Athlon XP 2400+ using a zd1211 device (driver zd1211rw)
Software Environment: X11 with GNOME; crashed while using Firefox (Iceweasel)

Problem Description:
The system crashes completely. It seems related to wireless network usage: I've used my system several times without connecting the wifi device (and without any other network interface enabled) and it didn't crash.
I haven't seen the problem on the 2.6.19.2 kernel, but I think that's only because the zd1211rw driver didn't work with my card there.
Here's the log (it was not flushed to disk!):

------------------------------

Kernel BUG at kernel/timer.c: 607!
Invalid opcode: 0000 [#1]
Modules linked in: cpufreq_stats nls_cp437 sbp2 scsi_mod loop zd1211rw ieee80211softmac parport_pc parport ohci1394 snd_intel8x0 ieee1394 sis900 ehci_hcd ide_cd cdrom fan asus_acpi backlight battery ac

Pid 3239, comm: firefox-bin Not tainted (2.6.24.2 #1)
EIP:0060 :[<c011e54b>] EFLAGS:00210007 CPU:0
EIP is at cascade+0x3b/0x57
EAX:0 EBX:0 ECX:5 EDX:d9eb3ca4
ESI:5 EDI:c0485640 EBP:d9ecdf30 ESP:d9ecdf30
DS:007b ES:007b FS:0000 GS:0033 SS:0068

...

Call trace
[<c011e6ad>] run_timer_softirq+0x55/0x141
[<c012b8e3>] tick_handle_periodic+0xf/0x54
[<c011bdcc>] __do_softirq+0x35/0x75
[<c011be2e>] do_softirq+022/0x26
[<c01055b0>] do_IRQ+0x58/0x6b
[<c033b1a7>] schedule+0x1f0/0x20a
[<c01045e7>] common_interrupt+0x23/0x28

Kernel Panic - not syncing: Fatal exception in interrupt

Steps to reproduce:
Stress the network.
Doh, some stupid code is calling init_timer() on an enqueued timer. I'll whip up a patch that allows us to debug this.
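To make the failure mode concrete, here is a minimal userspace sketch (not kernel code) of why re-initialising a timer that is still queued is fatal. It assumes a simplified circular list modelled loosely on the kernel's `struct list_head`, and models the re-initialisation, as an assumption, as resetting only the node's own `next` pointer without unlinking it. The names `struct node`, `reinit_node()` and `count_entries()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular doubly linked list node, modelled loosely on the
 * kernel's struct list_head (this is not the kernel header). */
struct node {
	struct node *next, *prev;
};

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Append n at the tail of the list, i.e. just before head. */
static void list_add_tail(struct node *n, struct node *head)
{
	assert(head->prev != NULL);	/* head must have been initialised */
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-in for re-initialising a queued timer: the node's
 * own link is reset, but nobody unlinks it, so its neighbours still
 * point at it. */
static void reinit_node(struct node *n)
{
	n->next = NULL;
}

/* Count the entries reachable from head. Returns -1 if the walk runs
 * into a NULL link, roughly where a real list walk would fault. */
static int count_entries(const struct node *head)
{
	int count = 0;
	const struct node *p = head->next;

	while (p != head) {
		if (p == NULL)
			return -1;
		count++;
		p = p->next;
	}
	return count;
}
```

Queue three nodes, call `reinit_node()` on the middle one, and `count_entries()` returns -1: the first node still links to the re-initialised one, but the walk can no longer get back to the head.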
Created attachment 14974
debug patch

Marco, can you please apply the attached patch and provide the debug output?

Thanks,

tglx
Reply-To: akpm@linux-foundation.org

On Fri, 22 Feb 2008 11:16:40 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10068
>
> Summary: timer.c crash using WI-FI (current process: firefox)
> Product: Timers
> Version: 2.5
> KernelVersion: 2.6.24.2
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: blocking
> Priority: P1
> Component: Other
> AssignedTo: johnstul@us.ibm.com
> ReportedBy: zacmarco@yahoo.it
>
> [...]
urgh.

Yes, it's probably a wireless driver bug. But look at the BUG_ON():

static int cascade(tvec_base_t *base, tvec_t *tv, int index)
{
	/* cascade all the timers from tv up one level */
	struct timer_list *timer, *tmp;
	struct list_head tv_list;

	list_replace_init(tv->vec + index, &tv_list);

	/*
	 * We are removing _all_ timers from the list, so we
	 * don't have to detach them individually.
	 */
	list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
		BUG_ON(tbase_get_base(timer->base) != base);
		internal_add_timer(base, timer);
	}

	return index;
}

If we're going to detect some bug, we should provide _some_ information telling the poor programmer what he did wrong! This one is very obscure.

Seems we found a timer on CPU A's list, but the timer thinks it's on CPU B's list. Or not on a list at all.

Question is: what sequence of timer interface calls could have caused this to occur? And can we add a check for that bug at the time where it occurs, rather than later on in the timer interrupt handler?
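Andrew's complaint is that a bare BUG_ON() says nothing about which timer was bad. A minimal userspace sketch of a more talkative check, under stated assumptions: the timer model, the `fn_name` string (standing in for resolving the callback address to a symbol name) and `check_bucket()` are all hypothetical, and, as in the BUG_ON() quoted above, only the base pointer is compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct node {
	struct node *next, *prev;
};

/* Hypothetical stand-ins for a per-CPU timer base and a queued timer. */
struct base {
	int cpu;
};

struct mtimer {
	struct node entry;	/* linkage into a bucket */
	struct base *base;	/* base the timer believes it is queued on */
	const char *fn_name;	/* name of its callback, for reporting */
};

#define to_mtimer(p) \
	((struct mtimer *)((char *)(p) - offsetof(struct mtimer, entry)))

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	assert(head->prev != NULL);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Walk one bucket as cascade() does, but instead of BUG_ON() describe
 * the first timer whose base does not match the bucket's base. Returns
 * that timer (and fills msg), or NULL if every timer agrees. The walk
 * also stops at a NULL link instead of dereferencing it. */
static struct mtimer *check_bucket(struct node *head,
				   const struct base *expected,
				   char *msg, size_t len)
{
	memset(msg, 0, len);
	for (struct node *p = head->next; p != NULL && p != head; p = p->next) {
		struct mtimer *t = to_mtimer(p);

		if (t->base != expected) {
			snprintf(msg, len,
				 "timer %s: queued on cpu%d but base says cpu%d",
				 t->fn_name, expected->cpu,
				 t->base ? t->base->cpu : -1);
			return t;
		}
	}
	return NULL;
}
```

The design choice is simply to turn "something is wrong" into "this callback, this base, that base", which is the kind of information the debug patches in this thread try to print before the machine dies.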
On 02/25, Andrew Morton wrote:
>
> On Fri, 22 Feb 2008 11:16:40 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
>
> > [...]
> [...]
>
> Seems we found a timer on CPU A's list, but the timer thinks it's on CPU B's
> list. Or not on a list at all.
>
> Question is: what sequence of timer interface calls could have caused this
> to occur? And can we add a check for that bug at the time where it occurs,
> rather than later on in the timer interrupt handler?

Most probably the pending timer was corrupted. Say, it was freed/reused without del_timer(), or re-initialized.

Marco, could you try this patch
http://bugzilla.kernel.org/attachment.cgi?id=14183
?

see also http://bugzilla.kernel.org/attachment.cgi?id=14183

Thomas's patch can also help, but if the pending timer was overwritten, ->init_site could be dirtied too.

Oleg.
On 02/26, Oleg Nesterov wrote:
>
> On 02/25, Andrew Morton wrote:
> >
> > [...]
> [...]
>
> Marco, could you try this patch
> http://bugzilla.kernel.org/attachment.cgi?id=14183
> ?
>
> see also http://bugzilla.kernel.org/attachment.cgi?id=14183

Argh. It can't be applied because of the "time: clean hungarian notation from timers" commit a6fa8e5a6172a5a5bc06ed04f34e50b36c978127.

Please find the re-diff below. Hopefully it still works, but it doesn't like CONFIG_HOTPLUG_CPU.

Oleg.

--- MM/include/linux/timer.h~TMR_DBG	2008-02-17 23:40:09.000000000 +0300
+++ MM/include/linux/timer.h	2008-02-26 04:07:15.000000000 +0300
@@ -8,6 +8,7 @@
 struct tvec_base;
 
 struct timer_list {
+	void (*next_func)(unsigned long);
 	struct list_head entry;
 	unsigned long expires;
 
--- MM/kernel/timer.c~TMR_DBG	2008-02-17 23:41:28.000000000 +0300
+++ MM/kernel/timer.c	2008-02-26 04:14:15.000000000 +0300
@@ -58,12 +58,19 @@ EXPORT_SYMBOL(jiffies_64);
 #define TVN_MASK (TVN_SIZE - 1)
 #define TVR_MASK (TVR_SIZE - 1)
 
+struct xxx {
+	void (*next_func)(unsigned long);
+	struct list_head list;
+};
+
+#define tox(p) list_entry((p), struct timer_list, entry)
+
 struct tvec {
-	struct list_head vec[TVN_SIZE];
+	struct xxx vec[TVN_SIZE];
 };
 
 struct tvec_root {
-	struct list_head vec[TVR_SIZE];
+	struct xxx vec[TVR_SIZE];
 };
 
 struct tvec_base {
@@ -256,7 +263,7 @@ static void internal_add_timer(struct tv
 {
 	unsigned long expires = timer->expires;
 	unsigned long idx = expires - base->timer_jiffies;
-	struct list_head *vec;
+	struct xxx *vec;
 
 	if (idx < TVR_SIZE) {
 		int i = expires & TVR_MASK;
@@ -291,7 +298,9 @@ static void internal_add_timer(struct tv
 	/*
 	 * Timers are FIFO:
 	 */
-	list_add_tail(&timer->entry, vec);
+	list_add_tail(&timer->entry, &vec->list);
+	timer->next_func = tox(timer->entry.next)->function;
+	tox(timer->entry.prev)->next_func = timer->function;
 }
 
 #ifdef CONFIG_TIMER_STATS
@@ -351,6 +360,7 @@ static inline void detach_timer(struct t
 {
 	struct list_head *entry = &timer->entry;
 
+	tox(entry->prev)->next_func = timer->next_func;
 	__list_del(entry->prev, entry->next);
 	if (clear_pending)
 		entry->next = NULL;
@@ -594,15 +604,22 @@ static int cascade(struct tvec_base *bas
 	/* cascade all the timers from tv up one level */
 	struct timer_list *timer, *tmp;
 	struct list_head tv_list;
+	void (*func)(unsigned long) = tv->vec[index].next_func;
 
-	list_replace_init(tv->vec + index, &tv_list);
+	list_replace_init(&tv->vec[index].list, &tv_list);
 
 	/*
 	 * We are removing _all_ timers from the list, so we
 	 * don't have to detach them individually.
 	 */
 	list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
-		BUG_ON(tbase_get_base(timer->base) != base);
+		if (tbase_get_base(timer->base) != base || timer->function != func) {
+			print_symbol(KERN_CRIT "ERR!! 1 %s\n", (unsigned long)func);
+			print_symbol(KERN_CRIT "ERR!! 2 %s\n", (unsigned long)timer->function);
+			printk(KERN_CRIT "ERR!! 3 %p %p\n", base, timer->base);
+			break;
+		}
+		func = timer->next_func;
 		internal_add_timer(base, timer);
 	}
 
@@ -624,8 +641,8 @@ static inline void __run_timers(struct t
 
 	spin_lock_irq(&base->lock);
 	while (time_after_eq(jiffies, base->timer_jiffies)) {
-		struct list_head work_list;
-		struct list_head *head = &work_list;
+		struct xxx work_list;
+		struct list_head *head = &work_list.list;
 		int index = base->timer_jiffies & TVR_MASK;
 
 		/*
@@ -637,7 +654,7 @@ static inline void __run_timers(struct t
 					!cascade(base, &base->tv4, INDEX(2)))
 			cascade(base, &base->tv5, INDEX(3));
 		++base->timer_jiffies;
-		list_replace_init(base->tv1.vec + index, &work_list);
+		list_replace_init(&base->tv1.vec[index].list, &work_list.list);
 		while (!list_empty(head)) {
 			void (*fn)(unsigned long);
 			unsigned long data;
@@ -1264,13 +1281,13 @@ static int __cpuinit init_timers_cpu(int
 	spin_lock_init(&base->lock);
 
 	for (j = 0; j < TVN_SIZE; j++) {
-		INIT_LIST_HEAD(base->tv5.vec + j);
-		INIT_LIST_HEAD(base->tv4.vec + j);
-		INIT_LIST_HEAD(base->tv3.vec + j);
-		INIT_LIST_HEAD(base->tv2.vec + j);
+		INIT_LIST_HEAD(&base->tv5.vec[j].list);
+		INIT_LIST_HEAD(&base->tv4.vec[j].list);
+		INIT_LIST_HEAD(&base->tv3.vec[j].list);
+		INIT_LIST_HEAD(&base->tv2.vec[j].list);
 	}
 	for (j = 0; j < TVR_SIZE; j++)
-		INIT_LIST_HEAD(base->tv1.vec + j);
+		INIT_LIST_HEAD(&base->tv1.vec[j].list);
 
 	base->timer_jiffies = jiffies;
 	return 0;
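The idea behind the patch above is to keep a redundant copy of each queued timer's callback in its predecessor (`next_func`), so that a walk can tell whether the element it reaches is really the one that was queued there. The patch gets the bucket head to look like a timer by giving `struct xxx` the same leading fields as `struct timer_list`, so `tox()` works on both. The minimal userspace sketch below assumes a single bucket and uses an explicit helper (`shadow_of()`) instead of that layout trick; it models only the callback half of the check, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*timer_fn)(unsigned long);

struct node {
	struct node *next, *prev;
};

/* The bucket head and each timer carry next_func: a shadow copy of the
 * callback of whatever element follows them in the list. */
struct bucket {
	timer_fn next_func;
	struct node list;
};

struct stimer {
	timer_fn next_func;
	struct node entry;
	timer_fn function;
};

#define to_stimer(p) \
	((struct stimer *)((char *)(p) - offsetof(struct stimer, entry)))

static void bucket_init(struct bucket *b)
{
	b->next_func = NULL;
	b->list.next = &b->list;
	b->list.prev = &b->list;
}

/* Where the shadow pointer for list element n lives. */
static timer_fn *shadow_of(struct bucket *b, struct node *n)
{
	return n == &b->list ? &b->next_func : &to_stimer(n)->next_func;
}

static void shadow_add_tail(struct bucket *b, struct stimer *t)
{
	struct node *prev = b->list.prev;

	assert(t->function != NULL);		/* a queued timer needs a callback */
	*shadow_of(b, prev) = t->function;	/* predecessor now expects t */
	t->next_func = NULL;			/* t is last: nothing follows */
	t->entry.prev = prev;
	t->entry.next = &b->list;
	prev->next = &t->entry;
	b->list.prev = &t->entry;
}

static void shadow_detach(struct bucket *b, struct stimer *t)
{
	/* The predecessor now expects whatever followed t. */
	*shadow_of(b, t->entry.prev) = t->next_func;
	t->entry.prev->next = t->entry.next;
	t->entry.next->prev = t->entry.prev;
	t->entry.next = NULL;
	t->entry.prev = NULL;
}

/* Return the first timer whose callback differs from the shadow copy
 * held by its predecessor, or NULL if the whole bucket is consistent. */
static struct stimer *shadow_verify(struct bucket *b)
{
	timer_fn expected = b->next_func;

	for (struct node *p = b->list.next; p != &b->list; p = p->next) {
		struct stimer *t = to_stimer(p);

		if (t->function != expected)
			return t;
		expected = t->next_func;
	}
	return NULL;
}

/* Three example callbacks; only their addresses matter here. */
static volatile unsigned long last_fired;
static void fn_a(unsigned long data) { last_fired = data + 1; }
static void fn_b(unsigned long data) { last_fired = data + 2; }
static void fn_c(unsigned long data) { last_fired = data + 3; }
```

Like the original hack, this only notices corruption that changes the callback: a timer that is overwritten or re-initialised with the same function still passes.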
On Mon, 25 Feb 2008, bugme-daemon@bugzilla.kernel.org wrote:
>
> if we're going to detect some bug, we should provide _some_ information
> telling the poor programmer what he did wrong! This one is very obscure.
>
> Seems we found a timer on CPU A's list, but the timer thinks it's on CPU
> B's list. Or not on a list at all.

The timer was still enqueued when some stupid code called init_timer() on it.

> Question is: what sequence of timer interface calls could have caused this
> to occur? And can we add a check for that bug at the time where it occurs,
> rather than later on in the timer interrupt handler?

I'm looking into that, but it's pretty hard to detect that reliably in init_timer().

We had this other problem with bluetooth as well, where a timer was not deleted before the data structure which contained the timer was freed.

The problem in both cases is that the timer list is corrupted and we have no chance to detect it _before_ the shit hits the fan.

Thanks,

tglx
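The bluetooth case (freeing a structure that still contains a queued timer) can be caught at free time if something remembers which objects are live. That is presumably what the optional CONFIG_DEBUG_OBJECT_FREE check in tglx's debug-objects patch is for, judging by its name. A minimal userspace sketch of the idea, assuming a small fixed-size table instead of a real hash; `obj_track()`, `obj_check_free()` and the example `struct conn` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum obj_state { OBJ_INACTIVE, OBJ_ACTIVE };

/* Hypothetical fixed-size tracking table; a real facility would hash. */
#define MAX_TRACKED 16

struct tracked_obj {
	const void *addr;
	enum obj_state state;
};

static struct tracked_obj table[MAX_TRACKED];

/* Record or update the state of the object at addr. */
static void obj_track(const void *addr, enum obj_state state)
{
	struct tracked_obj *empty = NULL;

	for (int i = 0; i < MAX_TRACKED; i++) {
		if (table[i].addr == addr) {
			table[i].state = state;
			return;
		}
		if (empty == NULL && table[i].addr == NULL)
			empty = &table[i];
	}
	assert(empty != NULL);	/* table full: enlarge MAX_TRACKED */
	empty->addr = addr;
	empty->state = state;
}

/* Call before freeing [start, start + size): reports and counts every
 * tracked object inside that range which is still active. */
static int obj_check_free(const void *start, size_t size)
{
	uintptr_t lo = (uintptr_t)start;
	uintptr_t hi = lo + size;
	int active = 0;

	for (int i = 0; i < MAX_TRACKED; i++) {
		uintptr_t a = (uintptr_t)table[i].addr;

		if (table[i].addr != NULL && table[i].state == OBJ_ACTIVE &&
		    a >= lo && a < hi) {
			fprintf(stderr, "freeing active object %p\n", table[i].addr);
			active++;
		}
	}
	return active;
}
```

With a hypothetical `struct conn` that embeds a timer, calling `obj_check_free(&c, sizeof c)` while the embedded timer is still marked active reports it, which is exactly the point where the bluetooth bug could have been caught instead of surfacing later in the timer interrupt.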
Hi all!

Yesterday evening I ran the kernel with Thomas' patch for about 5 hours, but, sorry, it didn't crash!
This evening (I can't try it here at work because I don't have my PC) I'll try your latest patch and give you feedback.

Thanks
I've reproduced the bug (this time in a workqueue function called during a software interrupt), but this time I have no debug output, because the log wasn't flushed to disk and, this time, after a few minutes it started continuously printing this message on screen:

zd1211rw 4-5:1.0: Could not allocate skb.

Now I'm going to try your latest patch.

Another question: is it normal that the wireless driver periodically (about every 30 seconds) prints the message

SoftMAC: Open Authentication completed with 00:01:38:8e:5f:43

where the printed MAC is the AP's?

THX
> I've reproduced the bug (this time in a workqueue method called during a sw
> interrupt), but this time I have no debug output, because log wasn't flushed
> to disk and, this time, after some minutes, it started printing continuously
> on screen this message:

Hang on for a couple of minutes. I created a debug patch which should allow the box to survive and tell us exactly where the wreckage happens. I'm testing it myself right now with a couple of known timer wreckage variants to make sure that it works.

Thanks,

tglx
Created attachment 15017
(timer) objects debug facility

Please remove the previous debug patches and apply this one.

Enable CONFIG_DEBUG_OBJECT_OPS and CONFIG_DEBUG_OBJECT_TIMERS.
CONFIG_DEBUG_OBJECT_FREE is optional (I guess your problem is already covered by the two options above).
Created attachment 15018
(timer) objects debug facility v2

Doh, forgot to refresh the patch before uploading.
Created attachment 15019
(timer) objects debug facility v3 (aka picked-the-right-file-this-time)

/me feels really stupid

I really should stay away from GUI tools that require selecting a file with a mouse click when I'm tired. I have not found a sane way to create a Bugzilla attachment via mail :( Pointers are welcome!

Sorry for the noise.

tglx
Hi!
I tried your patch, but it crashed again!!! Sorry, I don't have any debug output because yesterday evening I didn't have much time to reproduce the bug again. I hope I can give you more information this evening.
Can you tell me what else I can do to help? (apart from writing in better English ;) )
On 02/28, bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10068
>
> ------- Comment #13 from zacmarco@yahoo.it  2008-02-28 01:36 -------
> Hi!
> I've tried your patch but it crashed again!!!

Do you mean you still see the same BUG_ON() with Thomas's patch applied?

In that case, perhaps you can try the patch I sent. It is not as generic as Thomas's; it is just a quick and dirty hack to catch this particular BUG().

BTW, thanks a lot for your efforts ;)

Oleg.
Ok.
Here's the BUG_ON() output (with Thomas' patch):

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000014
printing eip: c0123781 *pde = 00000000

The call stack seems the same as in the other tests.

Yesterday I tried to patch the sources with Oleg's patch, but the "patch" command gave me an error (as if the starting file was not the same). Can I apply your patch to the official 2.6.24.2 files?
By the way, in a few minutes I'll try to re-patch it (I'm VERY VERY VERY sorry to be working on Windows right now; it's the only way I can stay connected).
On Thu, 28 Feb 2008, bugme-daemon@bugzilla.kernel.org wrote:
> ------- Comment #15 from zacmarco@yahoo.it  2008-02-28 12:55 -------
> Ok.
> Here's the BUG_ON() output (with Thomas' patch):
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> 00000014
> printing eip: c0123781 *pde = 00000000

Which CONFIG options did you enable?

Thanks,

tglx
On 02/28, bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10068
>
> ------- Comment #15 from zacmarco@yahoo.it  2008-02-28 12:55 -------
>
> Yesterday I tried to patch sources with Oleg's patch but the "patch" command
> gave me an error (like if starting file was not the same). May I patch the
> official 2.6.24.2 file with your patch?

Ah, so you are using 2.6.24. In that case please use the first patch,
http://bugzilla.kernel.org/attachment.cgi?id=14183

Sorry for the confusion!

Oleg.
Patch applied, but... it crashed instantaneously!

For Thomas: I tried to add the options you told me about (with your patch applied), but they seem to disappear when I run "make". I added these options directly to .config. Is that right?
Hmm, the disappearing probably happens because those options depend on CONFIG_DEBUG_KERNEL, which is probably not set in your .config.

Please use either "make menuconfig" or try to add:

CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_OBJECT_OPS=y
CONFIG_DEBUG_OBJECT_TIMERS=y
CONFIG_DEBUG_OBJECT_FREE=y

If that does not help, please attach your .config. I'll fix it for you.

Thanks,

tglx
On 03/05, bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10068
>
> ------- Comment #18 from zacmarco@yahoo.it  2008-03-05 11:42 -------
> Patch applied but... it crashed instantaneously!

Not that I am really surprised, but it was tested (not by me).

Could you be more specific about what exactly happens?

Please send me, privately, include/linux/timer.h + kernel/timer.c with this patch applied. And the .config, please.

(To avoid possible confusion: Thomas's patch is better, but in case it can't catch this bug...)

Oleg.
Sorry for the delay, I was busy last week.
Today I tried Thomas' patch (I had a problem applying it to the debugobjects.c file, but I solved it by copying and pasting all the code lines from the patch file). The system stays up, but I found the following trace in the system logs:

ODEBUG: init active object: db112e94 timer_list
WARNING: at lib/debugobjects.c:63 debug_print_object()
Pid: 2023, comm: softmac Not tainted 2.6.24.2 #10
[<c01c5181>] debug_object_op+0x89/0xe0
[<c0120168>] init_timer+0x18/0x40
[<e098f813>] ieee80211softmac_auth_req+0x6b/0x9c [ieee80211softmac]
[<e0991543>] ieee80211softmac_assoc_work+0x292/0x392 [ieee80211softmac]
[<e0991643>] ieee80211softmac_assoc_notify_scan+0x0/0x10 [ieee80211softmac]
[<e0991ab6>] ieee80211softmac_notify_callback+0x40/0x48 [ieee80211softmac]
[<e0991a76>] ieee80211softmac_notify_callback+0x0/0x48 [ieee80211softmac]
[<e0991978>] ieee80211softmac_call_events_locked+0xdc/0xee [ieee80211softmac]
[<e0991643>] ieee80211softmac_assoc_notify_scan+0x0/0x10 [ieee80211softmac]
[<e0991a76>] ieee80211softmac_notify_callback+0x0/0x48 [ieee80211softmac]
[<c01250bf>] run_workqueue+0x6b/0xdf
[<c0335f0f>] schedule+0x1f0/0x20a
[<c01256b2>] worker_thread+0x0/0xc2
[<c0125766>] worker_thread+0xb4/0xc2
[<c0127baa>] autoremove_wake_function+0x0/0x33
[<c01256b2>] worker_thread+0x0/0xc2
[<c0127a4a>] kthread+0x36/0x5c
[<c0127a14>] kthread+0x0/0x5c
[<c0104757>] kernel_thread_helper+0x7/0x10
=======================

I hope I could help you with this trace.
On 03/09, bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=10068
>
> ------- Comment #21 from zacmarco@yahoo.it  2008-03-09 15:53 -------
> Sorry for the delay, I was busy last week.
> Today I've tried Thomas' patch (I had a problem on the debugobjects.c file
> applying the patch, but I've solved the problem cutting and pasting all code
> lines from the patch file). The system remains up, but I've found the
> following trace on system logs:
>
> ODEBUG: init active object: db112e94 timer_list
> WARNING: at lib/debugobjects.c:63 debug_print_object()
> [...]
>
> I hope I could help you with this trace

Thanks a lot! This does help.

It might be related to
[Bug 8937] BUG prempt in workqueue.c
http://bugzilla.kernel.org/show_bug.cgi?id=8937

Oleg.
(In reply to comment #21)
> Sorry for the delay, I was busy last week.
> Today I've tried Thomas' patch (I had a problem on the debugobjects.c file
> applying the patch, but I've solved the problem cutting and pasting all code
> lines from the patch file). The system remains up, but I've found the
> following trace on system logs:

Yep, that's the intention of the patch: keep the system alive and, at the same time, point out the place where the problem happens.

> ODEBUG: init active object: db112e94 timer_list

That's what I suspected in http://bugzilla.kernel.org/show_bug.cgi?id=10068#c1

> WARNING: at lib/debugobjects.c:63 debug_print_object()
> I hope I could help you with this trace

Yes, it should give the ieee80211 developers enough information to fix it.

Thanks,

tglx
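The "init active object" warning above comes from a per-object state check: the debug layer remembers whether each timer is initialised, active (queued) or inactive, complains when init_timer() is called on one that is still active, and repairs things instead of letting the list get corrupted. A minimal userspace sketch of that state machine follows; it assumes, as a simplification, that "repair" means dequeuing the timer before re-initialising it, the `dbg_*` names are hypothetical, and the real lib/debugobjects.c is considerably more involved.

```c
#include <assert.h>
#include <stdio.h>

/* States tracked per object, in the spirit of the ODEBUG output above. */
enum dbg_state { DBG_NONE, DBG_INIT, DBG_ACTIVE, DBG_INACTIVE };

/* Hypothetical tracked timer: what the debug layer believes, whether
 * the timer is really on a list, and how many warnings it triggered. */
struct dbg_timer {
	enum dbg_state state;
	int queued;
	int warnings;
};

static void dbg_timer_init(struct dbg_timer *t)
{
	if (t->state == DBG_ACTIVE) {
		/* Caller bug: warn, then fix up by dequeuing first so the
		 * list stays consistent and the system keeps running. */
		fprintf(stderr, "init active object: %p\n", (void *)t);
		t->warnings++;
		t->queued = 0;
	}
	t->state = DBG_INIT;
}

static void dbg_timer_add(struct dbg_timer *t)
{
	assert(t->state != DBG_NONE);	/* must be initialised first */
	t->state = DBG_ACTIVE;
	t->queued = 1;
}

static void dbg_timer_del(struct dbg_timer *t)
{
	t->state = DBG_INACTIVE;
	t->queued = 0;
}
```

The second `dbg_timer_init()` on an active timer models the pattern in the trace (init_timer() reached from ieee80211softmac_auth_req() while the timer was still pending). On the caller side the usual cure is to delete the pending timer first, or simply re-arm it with mod_timer(), rather than re-initialising it.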
Hi all.
Is there any news on this bug? Since it seems related to the ieee80211 driver, is there a related bug filed for that area?
Thanks a lot
I'm sorry, but I'm personally drowning in work (and in kernel stuff; mac80211 is really keeping me busy enough) and don't have time to fix bugs in ieee80211 right now, especially considering that ieee80211 has been removed in 2.6.25. I apologise.
Marco, does 2.6.25-rc7 work for you?
I tried the 2.6.25-rc8 kernel. In .config I enabled MAC80211 and disabled softmac. Judging from the kernel log, it seems OK:

...
usb 4-5: new high speed USB device using ehci_hcd and address 3
usb 4-5: configuration #1 chosen from 1 choice
usb 4-5: reset high speed USB device using ehci_hcd and address 3
zd1211rw 4-5:1.0: phy1
Apr 10 19:23:22 ZacMobile kernel: usb 4-5: new high speed USB device using ehci_hcd and address 3
Apr 10 19:23:22 ZacMobile kernel: usb 4-5: configuration #1 chosen from 1 choice
Apr 10 19:23:22 ZacMobile kernel: usb 4-5: reset high speed USB device using ehci_hcd and address 3
Apr 10 19:23:22 ZacMobile kernel: zd1211rw 4-5:1.0: phy1
udev: renamed network interface wmaster0 to eth2
Apr 10 19:23:22 ZacMobile kernel: udev: renamed network interface wmaster0 to eth2
...

but if I run iwconfig, it reports that eth2 has no wireless extensions, while there is another interface named wmaster0_renamed.
I can't bring up either of the two interfaces.
Udev is misnaming your interfaces. Often this can be fixed by simply deleting /etc/udev/rules.d/70-persistent-net.rules and letting udev recreate it after a reboot.
OK, I did that and everything seems to work (now I have to verify that it doesn't crash!).
Just one question: is there a way to rename the interface? It doesn't seem to work after a rename (through udev).
Thanks a lot!
System stays up! EUREKA! :) Thank you so much!!!
So... what about the status of this BUG?
This is my interpretation of the status...correct me if I'm wrong... :-)