Bug 8937

Summary:	BUG prempt in workqueue.c
Product:	Process Management	Reporter:	Christian Casteyde (casteyde.christian)
Component:	Preemption	Assignee:	Larry Finger (Larry.Finger)
Status:	CLOSED CODE_FIX
Severity:	normal	CC:	Larry.Finger, linville, mb, oleg
Priority:	P1
Hardware:	All
OS:	Linux
Kernel Version:	2.6.22.5 + cfs20	Subsystem:
Regression:	---	Bisected commit-id:
Attachments:	test patch to detect corrupted work_structs Final version of patch to handle cancellation of work queues Proposed patch for problem (new revision) Trial patch for network freeze console photo at boot crash

Description Christian Casteyde 2007-08-25 03:38:06 UTC

Most recent kernel where this bug did not occur: unknown
Distribution: Slackware 12.0
Hardware Environment: Athlon64 3000, nForce3, Aspire 1511LMi
Software Environment: linux/crash at boot
Problem Description:

I got once a kernel BUG at boot in workqueue.c, which blocked the computer. However, syslog had the time to log the following dump:

-- parts of dmesg --
Aug 25 12:08:43 athor kernel:  BIOS-e820: 0000000000000000 - 000000000009b800 (usable)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 000000000009b800 - 00000000000a0000 (reserved)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 00000000000d2000 - 0000000000100000 (reserved)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 0000000000100000 - 000000004ff70000 (usable)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 000000004ff70000 - 000000004ff7f000 (ACPI data)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 000000004ff7f000 - 000000004ff80000 (ACPI NVS)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 000000004ff80000 - 0000000050000000 (reserved)
Aug 25 12:08:43 athor kernel:  BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
Aug 25 12:08:43 athor kernel: end_pfn_map = 1048576
Aug 25 12:08:43 athor kernel: ACPI: RSDP 000F6EA0, 0014 (r0 PTLTD )
Aug 25 12:08:43 athor kernel: ACPI: RSDT 4FF792FD, 0034 (r1 PTLTD    RSDT    6040000  LTP        0)
Aug 25 12:08:43 athor kernel: ACPI: FACP 4FF7EE3E, 0074 (r1 NVIDIA CK8       6040000 PTL_    F4240)
Aug 25 12:08:43 athor kernel: ACPI: DSDT 4FF79331, 5B0D (r1 NVIDIA      CK8  6040000 MSFT  100000E)
Aug 25 12:08:43 athor kernel: ACPI: FACS 4FF7FFC0, 0040
Aug 25 12:08:43 athor kernel: ACPI: APIC 4FF7EEB2, 0050 (r1 NVIDIA NV_APIC_  6040000  LTP        0)
Aug 25 12:08:43 athor kernel: ACPI: SSDT 4FF7EF02, 00D6 (r1 PTLTD  POWERNOW  6040000  LTP        1)
Aug 25 12:08:43 athor kernel: ACPI: BOOT 4FF7EFD8, 0028 (r1 PTLTD  $SBFTBL$  6040000  LTP        1)
Aug 25 12:08:43 athor kernel: Zone PFN ranges:
Aug 25 12:08:43 athor kernel:   DMA             0 ->     4096
Aug 25 12:08:43 athor kernel:   DMA32        4096 ->  1048576
Aug 25 12:08:43 athor kernel:   Normal    1048576 ->  1048576
Aug 25 12:08:43 athor kernel: early_node_map[2] active PFN ranges
Aug 25 12:08:43 athor kernel:     0:        0 ->      155
Aug 25 12:08:43 athor kernel:     0:      256 ->   327536
Aug 25 12:08:43 athor kernel: Built 1 zonelists.  Total pages: 321465
Aug 25 12:08:43 athor kernel: Initializing CPU#0
Aug 25 12:08:43 athor kernel: PID hash table entries: 4096 (order: 12, 32768 bytes)
Aug 25 12:08:43 athor kernel: Console: colour VGA+ 80x25
Aug 25 12:08:43 athor kernel: Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Aug 25 12:08:43 athor kernel: Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
Aug 25 12:08:43 athor kernel: Checking aperture...
Aug 25 12:08:43 athor kernel: CPU 0: aperture @ d8000000 size 128 MB
Aug 25 12:08:43 athor kernel: Memory: 1282320k/1310144k available (3739k kernel code, 27060k reserved,
 1679k data, 200k init)
Aug 25 12:08:43 athor kernel: Calibrating delay using timer specific routine.. 3593.81 BogoMIPS (lpj=5
986903)
Aug 25 12:08:43 athor kernel: Mount-cache hash table entries: 256
Aug 25 12:08:43 athor kernel: CPU: AMD Athlon 64 Processor 3000+ stepping 0a
Aug 25 12:08:43 athor kernel: result 12467786
Aug 25 12:08:43 athor kernel: PCI: Bus #03 (-#06) is hidden behind  bridge #02 (-#02) (try 'pci=assign
-busses')
Aug 25 12:08:43 athor kernel: Please report the result to linux-kernel to fix this permanently
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK1] (IRQs 16 17 18 19) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK2] (IRQs 16 17 18 19) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK3] (IRQs 16 17 18 19) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK4] (IRQs 16 17 18 19) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK5] (IRQs 16 17 18 19) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LSMB] (IRQs 20 21) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LUS0] (IRQs 20 21) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LUS1] (IRQs 20 21) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LUS2] (IRQs 20 21) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LMAC] (IRQs 20 21) *0, disabled.
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LACI] (IRQs 22) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LMCI] (IRQs 20 21) *0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LPID] (IRQs 20 21) *0, disabled.
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LTID] (IRQs 20 21) *0, disabled.
Aug 25 12:08:43 athor kernel: PCI: Cannot allocate resource region 0 of device 0000:00:00.0
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK5] enabled at IRQ 19
Aug 25 12:08:43 athor kernel: PCI: Failed to allocate mem resource #10:4000000@d4000000 for 0000:02:07
.0
Aug 25 12:08:43 athor kernel: PCI: Bus 3, cardbus bridge: 0000:02:07.0
Aug 25 12:08:43 athor kernel:   IO window: 00003000-000030ff
Aug 25 12:08:43 athor kernel:   IO window: 00003400-000034ff
Aug 25 12:08:43 athor kernel:   PREFETCH window: 60000000-63ffffff
Aug 25 12:08:43 athor kernel: PCI: Failed to allocate mem resource #6:20000@f0000000 for 0000:01:00.0
Aug 25 12:08:43 athor kernel: PCI: Enabling device 0000:02:07.0 (0106 -> 0107)
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK3] enabled at IRQ 18
Aug 25 12:08:43 athor kernel: IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
Aug 25 12:08:43 athor kernel: TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
Aug 25 12:08:43 athor kernel: TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
Aug 25 12:08:43 athor kernel: fuse init (API version 7.8)
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LMCI] enabled at IRQ 21
Aug 25 12:08:43 athor kernel: Marking TSC unstable due to possible TSC halt in C2
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK2] enabled at IRQ 17
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LNK4] enabled at IRQ 16
Aug 25 12:08:43 athor kernel: NFORCE3-150: BIOS didn't set cable bits correctly. Enabling workaround.
Aug 25 12:08:43 athor kernel: hda: IC25N060ATMR04-0, ATA DISK drive
Aug 25 12:08:43 athor kernel: ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Aug 25 12:08:43 athor kernel: hdc: MATSHITADVD-RAM UJ-820S, ATAPI CD/DVD-ROM drive
Aug 25 12:08:43 athor kernel: ide1 at 0x170-0x177,0x376 on irq 15
Aug 25 12:08:43 athor kernel: NOTE: The dv1394 driver is unsupported and may be removed in a future Li
nux release. Use raw1394 instead.
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LUS2] enabled at IRQ 20
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LUS0] enabled at IRQ 21
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LUS1] enabled at IRQ 20
Aug 25 12:08:43 athor kernel: psmouse.c: Failed to enable mouse on isa0060/serio1
Aug 25 12:08:43 athor kernel: ACPI: PCI Interrupt Link [LACI] enabled at IRQ 22
Aug 25 12:08:43 athor kernel: nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Aug 25 12:08:43 athor kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Aug 25 12:08:43 athor kernel: VFS: Mounted root (ext3 filesystem) readonly.

-- fsck has not be done yet, since the fs is RO --
-- here is the bug, ho là là ! --

Aug 25 12:08:49 athor kernel: ------------[ cut here ]------------
Aug 25 12:08:49 athor kernel: kernel BUG at kernel/workqueue.c:258!
Aug 25 12:08:49 athor kernel: invalid opcode: 0000 [1] PREEMPT
Aug 25 12:08:49 athor kernel: CPU 0
Aug 25 12:08:49 athor kernel: Modules linked in:
Aug 25 12:08:49 athor kernel: Pid: 4, comm: events/0 Not tainted 2.6.22.5-cfs-v20.3 #4
Aug 25 12:08:49 athor kernel: RIP: 0010:[<ffffffff80240bea>]  [<ffffffff80240bea>] run_workqueue+0x13a
/0x1a0
Aug 25 12:08:49 athor kernel: RSP: 0018:ffff81004ff0deb0  EFLAGS: 00010286
Aug 25 12:08:49 athor kernel: RAX: 0000000000000000 RBX: ffff81004eb776f0 RCX: 0000000000110c8c
Aug 25 12:08:49 athor kernel: RDX: ffff81004c23dd70 RSI: ffff81004ff00040 RDI: ffff81004f50f790
Aug 25 12:08:49 athor kernel: RBP: ffff81004eb776e8 R08: ffff81004ff0c000 R09: 00000000ffffffff
Aug 25 12:08:49 athor kernel: R10: 0000000000000093 R11: 00000000ffffffff R12: ffff81004ff671c0
Aug 25 12:08:49 athor kernel: R13: ffff81004eb776f8 R14: 0000000000000000 R15: 0000000000000000
Aug 25 12:08:49 athor kernel: FS:  00002aac2a0d0d30(0000) GS:ffffffff8074b000(0000) knlGS:000000000000
0000
Aug 25 12:08:49 athor kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 25 12:08:49 athor kernel: CR2: 00007fff8a62aef8 CR3: 000000004c7ab000 CR4: 00000000000006e0
Aug 25 12:08:49 athor kernel: Process events/0 (pid: 4, threadinfo ffff81004ff0c000, task ffff81004ff0
0040)
Aug 25 12:08:49 athor kernel: Stack:  ffffffff80241260 ffff81004ff671d0 ffff81004ff671c0 ffffffff80241
260
Aug 25 12:08:49 athor kernel:  0000000000000000 ffffffff80241323 0000000000000000 ffff81004ff00040
Aug 25 12:08:49 athor kernel:  ffffffff80244cc0 ffff81004ff0def8 ffff81004ff0def8 0000000000000000
Aug 25 12:08:49 athor kernel: Call Trace:
Aug 25 12:08:49 athor kernel:  [<ffffffff80241260>] worker_thread+0x0/0x130
Aug 25 12:08:49 athor kernel:  [<ffffffff80241260>] worker_thread+0x0/0x130
Aug 25 12:08:49 athor kernel:  [<ffffffff80241323>] worker_thread+0xc3/0x130
Aug 25 12:08:49 athor kernel:  [<ffffffff80244cc0>] autoremove_wake_function+0x0/0x30
Aug 25 12:08:49 athor kernel:  [<ffffffff80241260>] worker_thread+0x0/0x130
Aug 25 12:08:49 athor kernel:  [<ffffffff8024485b>] kthread+0x4b/0x80
Aug 25 12:08:49 athor kernel:  [<ffffffff8020a388>] child_rip+0xa/0x12
Aug 25 12:08:49 athor kernel:  [<ffffffff80244810>] kthread+0x0/0x80
Aug 25 12:08:49 athor kernel:  [<ffffffff8020a37e>] child_rip+0x0/0x12
Aug 25 12:08:49 athor kernel:
Aug 25 12:08:49 athor kernel:
Aug 25 12:08:49 athor kernel: Code: 0f 0b eb fe 66 90 65 48 8b 34 25 00 00 00 00 8b 8e 38 01 00
Aug 25 12:08:49 athor kernel: RIP  [<ffffffff80240bea>] run_workqueue+0x13a/0x1a0
Aug 25 12:08:49 athor kernel:  RSP <ffff81004ff0deb0>

-- forced to reboot here --

I only saw this bug once, while regtesting another bug (8934), with 2.6.22.5-cfs. 2.6.22 was used quite a lot before. 2.6.23-rc was booted several times also, without the problem, but I cannot quarantee anything.

I still have the kernel image and System.map if you want.

Steps to reproduce:
Well, I got it only once, but it is in core components...

Comment 1 Oleg Nesterov 2007-08-25 06:12:08 UTC

Perhaps someone does INIT_WORK(pending_work).

Could you look at kernel/workqueue.s? perhaps some dumped register
holds the adress of work->func.

Comment 2 Christian Casteyde 2007-08-25 07:10:55 UTC

You want to say someone does NOT call INIT_WORK ?

Sorry, I don't have any workqueue.s file (only workqueue.o).
If I understand well, you want me to look in the kernel map
if I can find the closest function that could be work->func,
assuming this value is in a register?

I don't think I can, since all registers are in ffffXXXX and all symbols
in System.map are in the form ffffffff8XXX. The only pointers like
that are in the call stack, and already dumped in the log
(worker_thread and co).

The kernel was built with debug info, but ksymoops seems to
tell that no symbols from System.map can be found in vmlinux.
I won't be able to get work->anything, even if I could get the
value of "work", since I do not have a kernel dump and have no
dump of the memory referenced by work.

Seems to be difficult to tell who owns that workqueue (except
the process in the log: events/0).

Btw, it is not a preempt bug, sorry, I didn't read the oops
carefully enough. This bug should be reclassified. Moreover,
it is certainly not so early in the boot process, because
a. syslog was started (I found the oops in it).
b. when the system boots OK, there is not so much after the log of root mounted RO (in fact, the next log is a kdm message, so it may be anything in between).

Comment 3 Christian Casteyde 2007-08-25 07:17:52 UTC

I've just found interesting stuff: If I correlate the timestamps from syslog and messages log file, I can see that just before crashing, the system was doing the wireless configuration:

Aug 25 12:08:48 athor logger: /etc/rc.d/rc.inet1:  /sbin/iwconfig eth1 nick athor
Aug 25 12:08:48 athor logger: /etc/rc.d/rc.inet1:  /sbin/iwconfig eth1 essid "any"
Aug 25 12:08:48 athor logger: /etc/rc.d/rc.inet1: /sbin/ifconfig eth1 down
-- last info added to hunt a broadcom driver bug I was regtesting --
-- reboot --
Aug 25 12:10:19 athor syslogd 1.4.1: restart.

I have many (integration) problems with the wireless broadcom driver (see 8790 - SLUB pb, 8934 - console killed in a race condition when if is done while connecting to an "any" AP, only present in -rc3 and therefore maybe not the fault of the broadcom wireless driver neither).
So I was trying to see if 8934 was present in 2.6.22.5 (and 23-rc?) and just concluded that no.
But as ifconfig down kills the console as said in 8934, and here simply is the last thing done (and not terminated, I normally have another log after) before a complete panic, I'm enclined to say those bugs are related (if not he same, but the behaviour is different here: complete panic).

So if anything is wrong anywhere with workqueues, I would look at broadcom wireless driver (I used to use ndiswrapper and obviously, the native driver is not iron yet/has integration pb, that's certainly the reason I didn't had all these problems before. Moreover the card only associate to any AP at my home since this is the only place I can reach one...).

Comment 4 Oleg Nesterov 2007-08-25 08:33:28 UTC

Christian Casteyde wrote:
>
> You want to say someone does NOT call INIT_WORK ?

No, I meant (but this is just shot in the dark) something like

        INIT_WORK(&work, func);
        queue_work(&work);

        ...

        // it is still pending
        INIT_WORK(&work, func);

in that case we clear work->data. This has other bad implications of course,
but the first bug we hit is kernel/workqueue.c:258.

> I don't think I can, since all registers are in ffffXXXX and all symbols
> in System.map are in the form ffffffff8XXX.

OK, thanks.

> The only pointers like that are in the call stack, and already dumped in
> the log (worker_thread and co)

I think this trace is not accurate, and there is no anything which looks
like work_func_t here. However, the adress of work->func should be saved
on stack if none of registers has it.

> Seems to be difficult to tell who owns that workqueue (except
> the process in the log: events/0).

Nobody, this is keventd_wq.

Still, could you do "make kernel/workqueue.s" or at least
"objdump -d kernel/workqueue.o" ? Please send me privately. I doubt
very much I'll be able to understand it, but just in case.

Comment 5 Oleg Nesterov 2007-08-26 04:23:54 UTC

Thanks Christian!

Now this really looks like work_struct's memory was overwritten while the work
was pending. (of course, there are other possible reasons).

work->data == 0 (RAX), this was the source of BUG_ON(get_wq_data(work) != cwq).
I don't think this is possible due to some bug in workqueue.c, insert_work() is
very simple and its caller should crash if cwq was == NULL.

work->func == R13, but this is not the address of function. RBP == &work, so
work->func == &work->entry.prev (R13 - RBP == 16).

This means that it wasn't INIT_WORK() who corrupted the pending work, probably
the memory was freed/reused.

> So if anything is wrong anywhere with workqueues, I would look at broadcom
> wireless driver

Do you mean drivers/net/wireless/bcm43xx/ ? Of course, I don't understand the
code, but after the very brief look...

        /* This is the opposite of bcm43xx_init_board() */
        static void bcm43xx_free_board(struct bcm43xx_private *bcm)
        {
                bcm43xx_rng_exit(bcm);
                bcm43xx_sysfs_unregister(bcm);
                bcm43xx_periodic_tasks_delete(bcm);

So we stopped bcm->periodic_work. But this delayed_work could be restarted
by bcm->restart_work if it is pending.

                mutex_lock(&(bcm)->mutex);
                bcm43xx_shutdown_all_wireless_cores(bcm);

This sets bcm43xx_set_status(BCM43xx_STAT_SHUTTINGDOWN) which is checked
by bcm43xx_chip_reset() (bcm->restart_work->func), but it is too late.

                bcm43xx_pctl_set_crystal(bcm, 0);
                mutex_unlock(&(bcm)->mutex);
        }

No?

The usage bcm->restart_work looks a bit suspicious to me. The driver never
cancels this work explicitely, probably relying on bcm43xx_net_stop() doing
flush_scheduled_work(), but in that case the race above looks really possible.

This work is started by

        void bcm43xx_controller_restart(struct bcm43xx_private *bcm, const char *reason)
        {
                if (bcm43xx_status(bcm) != BCM43xx_STAT_INITIALIZED)
                        return;
                printk(KERN_ERR PFX "Controller RESET (%s) ...\n", reason);
                INIT_WORK(&bcm->restart_work, bcm43xx_chip_reset);
                schedule_work(&bcm->restart_work);
        }

Note that it does INIT_WORK(&bcm->restart_work). This means that the caller
_must_ ensure this work is not pending, but I can't see how this is guaranteed.

(I am not sure I understand how to add maintainers to CC: list, trying...)

Comment 6 Christian Casteyde 2007-08-26 06:14:13 UTC

From my personnal experience (not from Linux, but it's the same kind of thing), problems in generic list of tasks to execute asynchronously happen generally from a misuse, especially:
a. callbacked objects destroyed while pending task are running;
b. callbacked objects not totally initialized yet;
c. reentrancy;
d. deadlock at termination (when removing tasks already running in the workqueue).
So I'm not astonished, and effectively I don't think workqueue.c is at stage here. The BUG is really an assert here, and if work is NULL, there is a problem in the code that posted the task - or removed it / deallocated it. Saddly, this kernel was just rebuilt without runtime checks because it proved to work for a while... When it was ran with checks, there was strictly no bug trigger, but the problem only happen once after all.

I was effectively referring to driver/net/wireless/bcm43xx. It's the driver I have some problem to get it working (up to now, it was more a victim than a the source of the problem, and indeed it works well up to 2.6.22, but it's certainly not perfect and the current bug seems related to it, if I consider my logs: the last two things done were iwlist essid any and ifconfig down).

The slackware boot process is quite linear, the only thing occuring in the background may be disk I/O (mainly due to ldconfig & and fc-cache & started before). This causes many preempt in the kernel and indeed I got some bugs (with SLUB, but now it is fixed). Actually, this workload makes the other boot commands slower and are therefore more a stabilizing factor, because it removes the race conditions I encountered in the network scripts (I suspect I would get pb 8934 at each boot if it was strictly synchronous). So, even if anything could occur meanwhile (ACPI events, hardware IRQ, etc.), I'm enclined to think that the problem is indeed in the net/wireless layers and that this bug occurrence is really exceptionnal and due to a special workload I got once.

For your analysis, I didn't dived into the bcm driver code since I'm not used to Linux kernel programmation. However, if this is the ifconfig eth1 down that triggers the bug, this may really be the cause of a call to bcm43xx_net_stop.

In such a case, the scheduled work should be flushed, and everything should be nice since it wouldn't be erased / remove while running.
But of course, it is also a matter of lock:
- either there is no bug there;
- either bcm takes a lock used by the job thread to call flush_scheduled_work, and can deadlock with the working thread (that may explain 8934);
- either there is data corruption (this bug?).

By the way, I never saw logs such as "Controller RESET" in my logs, so the periodic job seems to be quite fixed. This does not help. Maybe syslog didn't get the time to flush that. But, apart from ifconfig down, the driver is more initializing / trying to associate here. I think the job was started in the normal init process (not a restart), and never terminated. free_board name make me more think to a function called at module removal for instance than a classic job call. The driver is built statically in the kernel. So I'm not sure this is there where the pb resides (my 2 cents of course).

But I agree, if a reset occurs, I don't know what would ensure that the previous work is over (maybe the state of the driver: the reset may only be triggered when we are sure the job is over or not runnable).

I'm not sure that we will find what happened. But since the problem is in the same conditions than the bug 8934, it really may be fixed in the same time when this one will be killed (repoduced at 90% on my laptop and quite dichotomized => it is half dead I think). If I were you, I would postpone this one. Existing locking problems can be fixed of course.

Comment 7 Larry Finger 2007-08-26 10:26:21 UTC

I recently became the maintainer of bcm43xx to supervise its EOL while the main development switched to an implementation using mac80211 rather than SoftMAC. As a result, I'm not very familiar with the usage of work queues, and where this code might violate the rules. One thing to note is that very little in the driver's interactions with the kernel has changed since 2.6.18, and this problem has not been reported by others. I suspect that the SLUB error that led to bug 8934 may also have been a contributor here.

As the possibility exists that this bug really is a problem with bcm43xx work queue handling, and as the driver will probably stay in mainline until 2.6.25, I would like to suggest some potential fixes.

If I understood the comment about delayed work being restarted, would this patch help?

 void bcm43xx_periodic_tasks_delete(struct bcm43xx_private *bcm)
 {
        cancel_rearming_delayed_work(&bcm->periodic_work);
+       flush_scheduled_work();
 }

This change has been compile tested only.

IMHO, we do not need to worry about the code in bcm43xx_controller_restart. That is called only when the basic programming rules of the internal microprocessor are violated, not when bringing the interface up or down. I have not seen one of those errors in many months.

Comment 8 Oleg Nesterov 2007-08-26 11:43:32 UTC

On 08/26, bugme-daemon@bugzilla.kernel.org wrote:
> 
> If I understood the comment about delayed work being restarted, would this
> patch help?
> 
>  void bcm43xx_periodic_tasks_delete(struct bcm43xx_private *bcm)
>  {
>         cancel_rearming_delayed_work(&bcm->periodic_work);
> +       flush_scheduled_work();
>  }

No. Afaics, unless we make sure that ->restart_work is not pending

	1. bcm43xx_periodic_tasks_delete() is meaningless (with or without the
	   patch above), because ->restart_work re-scheduless ->periodic_work
	   again.

	2. bcm43xx_controller_restart() is not safe, because it overwrites
	   ->restart_work via INIT_WORK().

	   (bcm43xx_periodic_tasks_setup() has a similar problem, btw).

> IMHO, we do not need to worry about the code in bcm43xx_controller_restart.
> That is called only when the basic programming rules of the internal
> microprocessor are violated, not when bringing the interface up or down. I
> have
> not seen one of those errors in many months.

Yes, I don't claim the bug is real, I don't understand the code. And, Christian
said he doesn't see "Controller RESET" in logs, so there is something else.

Oleg.

Comment 9 Larry Finger 2007-08-26 20:26:28 UTC

I am adding Michael Buesch to the CC list. He was the maintainer of bcm43xx before I got the job, and he understands the whole work queue structure much better than I do.

Larry

Comment 10 Michael Buesch 2007-08-27 06:03:43 UTC

Simple: The race with restart work does indeed seem to exist, but it does _not_ trigger here, as it would cause lots of printk messages.
In fact, restart_work is only poked when the device is under _real_ trouble. (That's basically never).
And for periodic work, we call cancel_rearming_delayed_workqueue, so we kill it on shutdown.
So I don't see how this could be caused by bcm43xx.
The stacktrace above looks pretty corrupted to me, btw. To me it looks more like a memory problem.

Comment 11 Larry Finger 2007-08-27 10:52:08 UTC

This bug looks just like the one reported in http://www.ussg.iu.edu/hypermail/linux/kernel/0708.3/0635.html. In that case, the wireless driver was iww3945, not bcm43xx. That was using 2.6.20-16-generic SMP from Ubuntu Feisty 7.04.

Whatever triggers this bug, it is clearly an improbable event and probably not related to any particular wireless driver.

Larry

Comment 12 John W. Linville 2007-08-28 06:32:23 UTC

FWIW, ipw3945 also uses softmac.  That might suggest a problem in the softmac component.

Comment 13 John W. Linville 2007-08-28 12:20:37 UTC

Christian, given that bug 8934 resolved itself w/ 2.6.23-rc4, can you try to recreate this one as well using the new kernel?

Comment 14 Christian Casteyde 2007-08-29 11:16:35 UTC

Did got two crashes at boot, but not with the same call stack. I added list debugging option, see 8934 for more details. Keep trying to reproduce and to get a full callstack, hopefully I'll get one in the next days.

... (lost due to the scroll, the second time I got it I saw bcmxxx_close or
something like that, and list_add)...
inet_ioctl+05d/0x80
sock_ioctl+0xbf/0x240
do_ioctl+0x31/0x90
vfs_ioctl+0x73/0x240
sys_ioctl+0x4a/0x80
system_call+0x7e/0x83

--> Now the second BUG()/spinlock:
BUG: spinlock lockup on CPU#0, events/0/5, ffff81004ff13000
Call Trace:
_raw_spin_lock+0x106/0x130
wq_barrier_func+0x0/0x10
_spin_lock_irq+0x3d/0x50
run_workqueue+0x80/0x190
worker_thread+0xca/0x130
autoremove_wake_function+0x0/0x40
worker_thread+0x0/0x130
kthread+0x4d/0x80
child_rip+0xa/0x12
restore_args+0x0/0x30
kthreadd+0xd4/0x160
kthread+0x0/0x80
child_rip+0x0/0x12
-- end of dump

Comment 15 Christian Casteyde 2007-08-29 13:22:15 UTC

Yeah, I got it!

Here is the full stack dump (the most I can get).

 [<ffffffff80244fba>] insert_work+0x3a/0x70
 [<ffffffff80245052>] insert_wq_barrier+0x62/0x70
 [<ffffffff802450ad>] flush_cpu_workqueue+0x4d/0x90
 [<ffffffff80245170>] wq_barrier_func+0x0/0x10
 [<ffffffff802455c5>] flush_workqueue+0x25/0x30
 [<ffffffff802455e0>] flush_scheduled_work+0x10/0x20
 [<ffffffff80402da1>] bcm43xx_net_stop+0x31/0x40
 [<ffffffff8050a7da>] dev_close+0x6a/0x90
 [<ffffffff80509789>] dev_change_flags+0x99/0x1c0
 [<ffffffff80558352>] devinet_ioctl+0x582/0x740
 [<ffffffff8050babe>] dev_ioctl+0x24e/0x380
 [<ffffffff80558cdd>] inet_ioctl+0x5d/0x80
 [<ffffffff804fcfcf>] sock_ioctl+0xbf/0x240
 [<ffffffff80295d51>] do_ioctl+0x31/0x90
 [<ffffffff80295e23>] vfs_ioctl+0x73/0x2d0
 [<ffffffff802960ca>] sys_ioctl+0x4a/0x80
 [<ffffffff8020bbee>] system_call+0x7e/0x83

BUG: scheduling while atomic: ifconfig/0x10000002/2977
INFO: lockdep is turned off.

Call Trace:
 [<ffffffff8024ffe3>] debug_show_held_locks+0x23/0x30
 [<ffffffff8022ffa2>] __schedule_bug+0x52/0x60
 [<ffffffff805afff8>] thread_return+0x165/0x29d
 [<ffffffff8022ffd4>] __cond_resched+0x24/0x50
 [<ffffffff805b06a2>] cond_resched+0x32/0x40
 [<ffffffff8027196c>] unmap_vmas+0x72c/0x770
 [<ffffffff80273fcc>] exit_mmap+0x6c/0x140
 [<ffffffff802314f7>] mmput+0x57/0xc0
 [<ffffffff802358f8>] exit_mm+0x88/0xf0
 [<ffffffff802371c5>] do_exit+0x155/0x9c0
 [<ffffffff8020d435>] die+0x55/0x70
 [<ffffffff8020d612>] do_trap+0x132/0x140
 [<ffffffff8020dbb0>] do_invalid_op+0x90/0xb0
 [<ffffffff80367e37>] __list_add+0x57/0x60
 [<ffffffff80234997>] printk+0x67/0x70
 [<ffffffff805b3b7d>] error_exit+0x0/0x96
 [<ffffffff8022eeab>] task_rq_lock+0x1b/0x30
 [<ffffffff80367e37>] __list_add+0x57/0x60
 [<ffffffff80367e37>] __list_add+0x57/0x60
 [<ffffffff80244fba>] insert_work+0x3a/0x70
 [<ffffffff80245052>] insert_wq_barrier+0x62/0x70
 [<ffffffff802450ad>] flush_cpu_workqueue+0x4d/0x90
 [<ffffffff80245170>] wq_barrier_func+0x0/0x10
 [<ffffffff802455c5>] flush_workqueue+0x25/0x30
 [<ffffffff802455e0>] flush_scheduled_work+0x10/0x20
 [<ffffffff80402da1>] bcm43xx_net_stop+0x31/0x40
 [<ffffffff8050a7da>] dev_close+0x6a/0x90
 [<ffffffff80509789>] dev_change_flags+0x99/0x1c0
 [<ffffffff80558352>] devinet_ioctl+0x582/0x740
 [<ffffffff8050babe>] dev_ioctl+0x24e/0x380
 [<ffffffff80558cdd>] inet_ioctl+0x5d/0x80
 [<ffffffff804fcfcf>] sock_ioctl+0xbf/0x240
 [<ffffffff80295d51>] do_ioctl+0x31/0x90
 [<ffffffff80295e23>] vfs_ioctl+0x73/0x2d0
 [<ffffffff802960ca>] sys_ioctl+0x4a/0x80
 [<ffffffff8020bbee>] system_call+0x7e/0x83

BUG: spinlock lockup on CPU#0, events/0/5, ffff81004ff14000

Call Trace:
 [<ffffffff80367836>] _raw_spin_lock+0x106/0x130
 [<ffffffff80245170>] wq_barrier_func+0x0/0x10
 [<ffffffff805b336d>] _spin_lock_irq+0x3d/0x50
 [<ffffffff80244e70>] run_workqueue+0x80/0x190
 [<ffffffff8024595a>] worker_thread+0xca/0x130
 [<ffffffff80249080>] autoremove_wake_function+0x0/0x40
 [<ffffffff80245890>] worker_thread+0x0/0x130
 [<ffffffff80248cad>] kthread+0x4d/0x80
 [<ffffffff8020c5d8>] child_rip+0xa/0x12
 [<ffffffff8020c190>] restore_args+0x0/0x30
 [<ffffffff80248db4>] kthreadd+0xd4/0x160
 [<ffffffff80248c60>] kthread+0x0/0x80
 [<ffffffff8020c5ce>] child_rip+0x0/0x12

I have no clue how to get the beginning of the log (before the scheduling while atomic).

Comment 16 Larry Finger 2007-08-29 18:09:58 UTC

If you can implement a serial console, you might capture that stuff. I doubt that a netconsole would help as the network, or at least the wireless part, is going down when the problem happens.

These results are highly suggestive of a race problem while shutting bcm43xx down. Michael Buesch has a patch for that section of b43, which is similar in structure, but not identical. I'm trying to adapt to bcm43xx. I'll post it here and mail it to Christian for testing. Unfortunately, I cannot duplicate it here. While I was out this afternoon, my system completed 1000 ifdown/ifup cycles. The firmware got confused about halfway through, and that led to IRQ READY timeouts, but never an oops.

Larry

Comment 17 Oleg Nesterov 2007-08-29 23:06:13 UTC

On 08/29, bugme-daemon@bugzilla.kernel.org wrote:
>
> BUG: scheduling while atomic: ifconfig/0x10000002/2977
> INFO: lockdep is turned off.
> 
> Call Trace:
>  [<ffffffff8024ffe3>] debug_show_held_locks+0x23/0x30
>  [<ffffffff8022ffa2>] __schedule_bug+0x52/0x60
>  [<ffffffff805afff8>] thread_return+0x165/0x29d
>  [<ffffffff8022ffd4>] __cond_resched+0x24/0x50
>  [<ffffffff805b06a2>] cond_resched+0x32/0x40
>  [<ffffffff8027196c>] unmap_vmas+0x72c/0x770
>  [<ffffffff80273fcc>] exit_mmap+0x6c/0x140
>  [<ffffffff802314f7>] mmput+0x57/0xc0
>  [<ffffffff802358f8>] exit_mm+0x88/0xf0
>  [<ffffffff802371c5>] do_exit+0x155/0x9c0
>  [<ffffffff8020d435>] die+0x55/0x70
>  [<ffffffff8020d612>] do_trap+0x132/0x140
>  [<ffffffff8020dbb0>] do_invalid_op+0x90/0xb0
>  [<ffffffff80367e37>] __list_add+0x57/0x60
>  [<ffffffff80234997>] printk+0x67/0x70
>  [<ffffffff805b3b7d>] error_exit+0x0/0x96
>  [<ffffffff8022eeab>] task_rq_lock+0x1b/0x30
>  [<ffffffff80367e37>] __list_add+0x57/0x60
>  [<ffffffff80367e37>] __list_add+0x57/0x60

Do you see something like "list_add corruption" in syslog?

Again, looks like the last pending work on cwq->worklist was corrupted.
Say, its memory was freed/reused without cancel/flush.

The task dies holding cwq->lock, so the next trace below (spinlock lockup
on CPU#0) is clear.

>  [<ffffffff80244fba>] insert_work+0x3a/0x70
>  [<ffffffff80245052>] insert_wq_barrier+0x62/0x70
>  [<ffffffff802450ad>] flush_cpu_workqueue+0x4d/0x90
>  [<ffffffff80245170>] wq_barrier_func+0x0/0x10
>  [<ffffffff802455c5>] flush_workqueue+0x25/0x30
>  [<ffffffff802455e0>] flush_scheduled_work+0x10/0x20
>  [<ffffffff80402da1>] bcm43xx_net_stop+0x31/0x40
>  [<ffffffff8050a7da>] dev_close+0x6a/0x90
>  [<ffffffff80509789>] dev_change_flags+0x99/0x1c0
>  [<ffffffff80558352>] devinet_ioctl+0x582/0x740
>  [<ffffffff8050babe>] dev_ioctl+0x24e/0x380
>  [<ffffffff80558cdd>] inet_ioctl+0x5d/0x80
>  [<ffffffff804fcfcf>] sock_ioctl+0xbf/0x240
>  [<ffffffff80295d51>] do_ioctl+0x31/0x90
>  [<ffffffff80295e23>] vfs_ioctl+0x73/0x2d0
>  [<ffffffff802960ca>] sys_ioctl+0x4a/0x80
>  [<ffffffff8020bbee>] system_call+0x7e/0x83
> 
> BUG: spinlock lockup on CPU#0, events/0/5, ffff81004ff14000
> 
> Call Trace:
>  [<ffffffff80367836>] _raw_spin_lock+0x106/0x130
>  [<ffffffff80245170>] wq_barrier_func+0x0/0x10
>  [<ffffffff805b336d>] _spin_lock_irq+0x3d/0x50
>  [<ffffffff80244e70>] run_workqueue+0x80/0x190
>  [<ffffffff8024595a>] worker_thread+0xca/0x130
>  [<ffffffff80249080>] autoremove_wake_function+0x0/0x40
>  [<ffffffff80245890>] worker_thread+0x0/0x130
>  [<ffffffff80248cad>] kthread+0x4d/0x80
>  [<ffffffff8020c5d8>] child_rip+0xa/0x12
>  [<ffffffff8020c190>] restore_args+0x0/0x30
>  [<ffffffff80248db4>] kthreadd+0xd4/0x160
>  [<ffffffff80248c60>] kthread+0x0/0x80
>  [<ffffffff8020c5ce>] child_rip+0x0/0x12

cwq->lock is locked by the died task above.

Oleg.

Comment 18 Oleg Nesterov 2007-08-30 10:08:51 UTC

Created attachment 12635 [details]
test patch to detect corrupted work_structs

Christian, could you try to reproduce the problem with
this patch?

It tries to detect corrupted work_structs, print, and
survive after that.

Sorry, can't guarantee this patch itself is not buggy :)

Comment 19 Michael Buesch 2007-08-30 10:31:05 UTC

are we under rtnl lock in the net_stop callback?
I'm not sure. flush_scheduled_work mustn't be called under rtnl lock.

Anyway, this is fixed in b43 and b43legacy. Please close this bug.

Comment 20 Christian Casteyde 2007-08-30 11:38:27 UTC

In response to #18:
The patch applied correctly, as well as the patch Larry gave me privatly in #16 (which so far seems to fix the schedule while atomic bug, but I have to reboot 50 times at least to get confidence, and I'm only at 10).

I finally got the following critical log in dmesg, which is indeed interesting I guess for you (since this is what you printk'd):

EXT3 FS on hda2, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Adding 506008k swap on /dev/mapper/sswap0.  Priority:-1 extents:1 across:506008k
rc.S used greatest stack depth: 3152 bytes left
!!! ieee80211softmac_auth_queue+0x0/0x170
SoftMAC: Open Authentication completed with XX:XX:XX:XX:XX:XX
CCMP: received packet without ExtIV flag from 00:1a:6b:1c:7c:d3
CCMP: received packet without ExtIV flag from 00:1a:6b:1c:7c:d3

So ieee80211softmac_auth_queue is the answer (I hope you have the question :-)

Effectively, if bcm is being ported to device scape stack, this bug may be futile to fix. However, if it shows some bug in other components, maybe it's wise to investigate.

Comment 21 Christian Casteyde 2007-08-30 11:50:32 UTC

Just more info, from the console (not dmesg):
Starting sysklogd daemons:  /usr/sbin/syslogd /usr/sbin/klogd -c 3 -x
Updating X font indexes:  /usr/bin/fc-cache -f &
Triggering udev events:  /sbin/udevtrigger --retry-failed
!!! ieee80211softmac_auth_queue+0x0/0x170
Starting OpenSSH SSH daemon:  /usr/sbin/sshd
Starting ACPI daemon:  /usr/sbin/acpid

So I just (re)discovered that there seem to be a bug in the slackware network script: it is started twice.
The first time, this is from the boot script, that calls /etc/rc.d/rc.inet1 start directly. The second time, this is from udevd, with the specific /etc/rc.d/rc.inet1 XXX_start, with XXX the device udev just discovered.

This gives some more info from the context of the network init. However, these two calls are sequentials if I understand well.

I got !!! ieee80211softmac_auth_queue+0x0/0x170 quite easily these times (5 boots later ;-). Still trying to crash with Larry's patch.

Comment 22 Larry Finger 2007-08-30 12:00:03 UTC

When you are satisfied that my patch fixes the crashes at boot, please try at least 100 /etc/rc.d/rc.inet1 stop/start cycles. The bug I tried to fix should show up here as well. At least those will go faster.

Comment 23 Christian Casteyde 2007-08-30 12:20:58 UTC

As I've just said in 8934, the loop didn't complete (still console freeze without any log), so even if the fix makes things better even in this case (or not happen), this is definitively not the same bug. I'm even wondering if the console freeze wouldn't occur with another test or in other conditions, since it is really only the virtual terminals that are broken and the system still works, except ifconfig down of course. I'm quite perplex with this console problem, I do not see the relation between ifconfig down and /dev/console...
I expected 8934 would fix 8937, I'm now considering that it will be the reverse: if the workqueue problem that seems to still be present in ieee80211softmac_auth_queue is fixed, maybe it would unlock the console (who knows). But I do not believe one minute that the disk subsystems do not use workqueues...

Returning to reboot sequence now... (still not at a great number - I'm doing a contest for the smallest uptime ;-). I'll try tomorrow too, because the problem doesn't occur some days. I have to remove your patch too to see if I could get it today.

Comment 24 Oleg Nesterov 2007-08-30 12:32:25 UTC

On 08/30, bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=8937
> 
> 
> 
> 
> 
> ------- Comment #20 from casteyde.christian@free.fr  2007-08-30 11:38 -------
> In response to #18:
> The patch applied correctly, as well as the patch Larry gave me privatly in
> #16
> (which so far seems to fix the schedule while atomic bug, but I have to
> reboot
> 50 times at least to get confidence, and I'm only at 10).
> 
> I finally got the following critical log in dmesg, which is indeed
> interesting
> I guess for you (since this is what you printk'd):
> 
> EXT3 FS on hda2, internal journal
> kjournald starting.  Commit interval 5 seconds
> EXT3 FS on hda3, internal journal
> EXT3-fs: mounted filesystem with ordered data mode.
> Adding 506008k swap on /dev/mapper/sswap0.  Priority:-1 extents:1
> across:506008k
> rc.S used greatest stack depth: 3152 bytes left
> !!! ieee80211softmac_auth_queue+0x0/0x170
> SoftMAC: Open Authentication completed with XX:XX:XX:XX:XX:XX
> CCMP: received packet without ExtIV flag from 00:1a:6b:1c:7c:d3
> CCMP: received packet without ExtIV flag from 00:1a:6b:1c:7c:d3
> 
> So ieee80211softmac_auth_queue is the answer (I hope you have the question
> :-)

Thanks a lot Christian!

Horror, horror, it is named "->work", grep can't help.

I'll try to look futher tomorrow. Right now, ieee80211softmac_clear_pending_work()
is supposed to stop ieee80211softmac_auth_queue_item->work, yes/no? Looks wrong,
this work re-schedules itself, cancel_delayed_work() is not enough.

(Of course, the code _can_ be correct, but at first glance it is not).

Btw, the patch I sent was not correct (surprised ?). I  _hope_ the "detect" part
is OK, but "try to survive" isn't. Please see the new version below. I don't think
we need a re-test though.

Oleg.

The same patch + "set correct cwq->next_func() when ck_work() fails".

--- q/include/linux/workqueue.h~WQ_DBG	2007-07-28 16:58:17.000000000 +0400
+++ q/include/linux/workqueue.h	2007-08-30 19:03:08.000000000 +0400
@@ -26,6 +26,7 @@ struct work_struct {
 #define WORK_STRUCT_PENDING 0		/* T if work item pending execution */
 #define WORK_STRUCT_FLAG_MASK (3UL)
 #define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)
+	work_func_t next_func;
 	struct list_head entry;
 	work_func_t func;
 };
@@ -44,7 +45,7 @@ struct execute_work {
 #define __WORK_INITIALIZER(n, f) {				\
 	.data = WORK_DATA_INIT(),				\
 	.entry	= { &(n).entry, &(n).entry },			\
-	.func = (f),						\
+	.func = (f), .next_func = NULL,				\
 	}
 
 #define __DELAYED_WORK_INITIALIZER(n, f) {			\
--- q/kernel/workqueue.c~WQ_DBG	2007-07-28 16:58:17.000000000 +0400
+++ q/kernel/workqueue.c	2007-08-30 21:25:12.000000000 +0400
@@ -33,6 +33,7 @@
 #include <linux/kallsyms.h>
 #include <linux/debug_locks.h>
 
+#define tow(p) list_entry((p), struct work_struct, entry)
 /*
  * The per-CPU workqueue (if single thread, we always use the first
  * possible cpu).
@@ -41,7 +42,9 @@ struct cpu_workqueue_struct {
 
 	spinlock_t lock;
 
+	work_func_t next_func;
 	struct list_head worklist;
+
 	wait_queue_head_t more_work;
 	struct work_struct *current_work;
 
@@ -134,6 +137,10 @@ static void insert_work(struct cpu_workq
 		list_add_tail(&work->entry, &cwq->worklist);
 	else
 		list_add(&work->entry, &cwq->worklist);
+
+	work->next_func = tow(work->entry.next)->func;
+	tow(work->entry.prev)->next_func = work->func;
+
 	wake_up(&cwq->more_work);
 }
 
@@ -236,6 +243,17 @@ int queue_delayed_work_on(int cpu, struc
 }
 EXPORT_SYMBOL_GPL(queue_delayed_work_on);
 
+static int ck_work(struct cpu_workqueue_struct *cwq, struct work_struct *work)
+{
+	if (cwq->next_func == work->func && cwq == get_wq_data(work) &&
+	    test_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
+		return 0;
+
+	printk(KERN_CRIT "!!! ");
+	print_symbol("%s\n", (unsigned long)cwq->next_func);
+	return 1;
+}
+
 static void run_workqueue(struct cpu_workqueue_struct *cwq)
 {
 	spin_lock_irq(&cwq->lock);
@@ -246,12 +264,26 @@ static void run_workqueue(struct cpu_wor
 			__FUNCTION__, cwq->run_depth);
 		dump_stack();
 	}
+again:
 	while (!list_empty(&cwq->worklist)) {
 		struct work_struct *work = list_entry(cwq->worklist.next,
 						struct work_struct, entry);
+
+		if (ck_work(cwq, work)) {
+			struct list_head *pos = &cwq->worklist;
+			while (pos->prev != &work->entry)
+				pos = pos->prev;
+
+			cwq->next_func = tow(pos)->func;
+			cwq->worklist.next = pos;
+			pos->prev = &cwq->worklist;
+			goto again;
+		}
+
 		work_func_t f = work->func;
 
 		cwq->current_work = work;
+		cwq->next_func = work->next_func;
 		list_del_init(cwq->worklist.next);
 		spin_unlock_irq(&cwq->lock);
 
@@ -411,6 +443,7 @@ static int try_to_grab_pending(struct wo
 		 */
 		smp_rmb();
 		if (cwq == get_wq_data(work)) {
+			tow(work->entry.prev)->next_func = work->next_func;
 			list_del_init(&work->entry);
 			ret = 1;
 		}

Comment 25 Oleg Nesterov 2007-08-31 01:36:29 UTC

On 08/30, Oleg Nesterov wrote:
>
> I'll try to look futher tomorrow. Right now,
> ieee80211softmac_clear_pending_work()
> is supposed to stop ieee80211softmac_auth_queue_item->work, yes/no? Looks
> wrong,
> this work re-schedules itself, cancel_delayed_work() is not enough.
> 
> (Of course, the code _can_ be correct, but at first glance it is not).

I was wrong, the code looks correct. ieee80211softmac_clear_pending_work()
sets ->running = 0 under sm->lock, ->running is checked under the same lock
before rescheduling, and we are doing flush_scheduled_work() after unlock.

I don't understand this code at all. But, could maintainers confirm that
ieee80211softmac_auth_req() can't happen/succeed when
ieee80211softmac_clear_pending_work() unlocks sm->lock?
Otherwise we seem to have a bug.

ieee80211softmac_clear_pending_work() cancells all ieee80211softmac_auth_queue_item's,
unlocks sm->lock and does flush_scheduled_work()

ieee80211softmac_auth_req() allocates and inserts a new pending work on ->auth_queue

ieee80211softmac_clear_pending_work() takes sm->lock again, and frees all items on
->auth_queue list, including the recently added ieee80211softmac_auth_queue_item.

The window is not that small due to flush_scheduled_work().

Oleg.

Comment 26 Christian Casteyde 2007-08-31 14:33:31 UTC

Larry, as far as the scheduling while atomic/spinlock lockup  is concerned, I didn't managed to reproduce with you patch, after dozens of reboots yesterday and tonight. This part is fixed, it remains only the workqueue assert I opened this bugzilla first.

So your patch is OK for me.

Thanks a lot, it was not so rare to get at boot.

Comment 27 Larry Finger 2007-08-31 15:05:47 UTC

Created attachment 12642 [details]
Final version of patch to handle cancellation of work queues

Comment 28 Larry Finger 2007-08-31 15:11:31 UTC

Since I made the patch you were testing, a posting on the bcm43xx mailing list led to some suggestions for other modifications. These should handle other corner cases. That patch is being posted here.

Please try it to make sure that your workqueue asserts are not one of those corner cases. The new patch shouldn't affect your boot problem.

I also installed the patch for softmac, but the printout has not been triggered.

Comment 29 Christian Casteyde 2007-09-01 00:42:57 UTC

The new patch still asserts with Oleg's workqueue patch (first boot, I was lucky).
This may be a softmac problem only now?

Comment 30 Christian Casteyde 2007-09-01 01:16:26 UTC

Just as I added printks in the device downing process, I can tell exactly when the workqueue asserts:

dev_close: enter
dev_close: deactivating device
dev__close: scheduled poll synchronization
dev__close: calling dev->stop
bcm43xx_net_stop: enter
bcm43xx_net_stop: stopping softmac
bcm43xx_net_stop: disabling interrupts sync
bcm43xx_net_stop: freeing board
bcm43xx_net_stop: exit
dev_close: notifying down
!!! ieee80211softmac_auth_queue+0x0/0x170
dev_close: exit

This seems to be asynchronously, after softmac is stopped. Maybe there is a pending job that reschedules itself or something like that.
I'm adding prinks in softmac now...

Comment 31 Oleg Nesterov 2007-09-01 02:10:36 UTC

On 09/01, bugme-daemon@bugzilla.kernel.org wrote:
>
> ------- Comment #29 from casteyde.christian@free.fr  2007-09-01 00:42 -------
> The new patch still asserts with Oleg's workqueue patch (first boot, I was
> lucky).
> This may be a softmac problem only now?

Regardless of any other changes, it would be nice to know what really happens.

Christian, any chance you can do yet another test with the trivial patch below?
It just tries to verify the wild guess from "Comment #25". Please don't revert
the workqueue instrumentation patch.

Thanks!

Oleg.

--- t/net/ieee80211/softmac/ieee80211softmac_module.c~	2007-09-01 12:55:28.000000000 +0400
+++ t/net/ieee80211/softmac/ieee80211softmac_module.c	2007-09-01 13:03:26.000000000 +0400
@@ -111,6 +111,8 @@ ieee80211softmac_clear_pending_work(stru
 	spin_lock_irqsave(&sm->lock, flags);
 	/* Free all pending auth work items */
 	list_for_each_entry_safe(authptr, authtmp, &sm->auth_queue, list) {
+		if (delayed_work_pending(&authptr->work))
+			printk(KERN_CRIT "!!! Yes, it happens\n");
 		list_del(&authptr->list);
 		kfree(authptr);
 	}

Comment 32 Christian Casteyde 2007-09-01 03:34:24 UTC

I had to remove many of my printks for diagnostic of bug 8934 since it slowed down the system and couldn't reproduce your assert. But the context is clearly the same, and I've finished with 8934.

So I reproduced yout assert with less logs, and finally got:
bcm43xx_net_stop: enter
bcm43xx_net_stop: stopping softmac
!!! Yes, it happens
bcm43xx_net_stop: disabling interrupts sync
bcm43xx_net_stop: freeing board
bcm43xx_net_stop: exit
!!! ieee80211softmac_auth_queue+0x0/0x170

It clearly happens (all is in softmac now, bcm seems to be fixed at first glance).

Comment 33 Oleg Nesterov 2007-09-01 03:46:38 UTC

On 09/01, bugme-daemon@bugzilla.kernel.org wrote:
> 
> ------- Comment #32 from casteyde.christian@free.fr  2007-09-01 03:34 -------
> I had to remove many of my printks for diagnostic of bug 8934 since it slowed
> down the system and couldn't reproduce your assert. But the context is
> clearly
> the same, and I've finished with 8934.
> 
> So I reproduced yout assert with less logs, and finally got:
> bcm43xx_net_stop: enter
> bcm43xx_net_stop: stopping softmac
> !!! Yes, it happens
> bcm43xx_net_stop: disabling interrupts sync
> bcm43xx_net_stop: freeing board
> bcm43xx_net_stop: exit
> !!! ieee80211softmac_auth_queue+0x0/0x170
> 
> It clearly happens (all is in softmac now, bcm seems to be fixed at first
> glance).

Thanks again Christian. This really proves that softmac/ is buggy.

As I said, I don't understand net/ at all, and I already lost the track
of what's going on here with other problems/fixes.

However, now I am very sure that the wild guess was correct, and
ieee80211softmac_module.c needs a fix regardless of any other changes.
Perhaps all we need is to cancel sm->associnfo.work properly, before
cancelling authptr->work.

I'll try to make a patch, unless maintainers take care.

Thanks!

Oleg.

Comment 34 Larry Finger 2007-09-04 08:56:40 UTC

Created attachment 12695 [details]
Proposed patch for problem (new revision)

Although I do not get the crashes on boot with my system, I was getting two different kinds of errors while running a large number of "ifdown ; ifup ; sleep 20" sequences while continuing normal work. The first hangs the machine with the caps lock light blinking at roughly a 1 Hz rate. This failure has never occurred with this patch. In the second type of failure, the firmware fails to execute correctly resulting in IRQ READY TIMEOUT messages in the log. This patch reduces the incidence of these failures by at least a factor of 5.

Comment 35 Christian Casteyde 2007-09-04 15:38:55 UTC

I guess fixing softmac for ieee80211softmac_module is useless as it would be dropped in a near future (there are at least Oleg's remark and the autolock I found in 8934, where it waits for completion of a disappeared task).

Unless some drivers are not switched over the new stack, in which case I guess the fact it is not lock correct is also known and do not deserve a bug file, right? But it wouldn't be broadcom related then.

Anyway not blocker since I can prevent freeze at startup with a sleep in my network scripts to get scan completed before ifconfig down the interface.

Comment 36 Larry Finger 2007-09-05 14:31:09 UTC

Created attachment 12720 [details]
Trial patch for network freeze

Christian,

Please try this 1-line patch with your sleep statement removed. It removed the attempt to abort a scan when the network is coming down, and waits for the scan to complete.

Thanks,

Larry

Comment 37 Christian Casteyde 2007-09-05 15:07:43 UTC

Hmm. Not really convincing.
I guess your one line patch is supposed to prevent 8934, since it was waiting for a potentially canceled job. But it still blocks, I guess in the same place, perhaps because now it waits for a finished job (in fact, wait_for_completion may work only when the job is pending when the function is called, as mentionned it is called after the check outside the lock and I do not know how it was intended to work. If it expects the job to exist, it may also fails even if it is not cancelled, that is if the completion routine removes it).

However, this one line patch does not work : I did the following tests with kernel 2.6.23-rc5-hrt1 + your last patch + your one line patch, but without Oleg's diagnose and survive patch, and I got :
- at boot: one crash, however the call stack doesn't seem to be the same as reported above, now it is in ifconfig up ! (grrr) ;
- with the test of 8934 (if up/down, beware it also does iwconfig any), still the console freeze.
Overall, that is still not really stable, sadly... your fix fixed only one aspect of the problem, I'm still thinking the whole mess is in softmac now.

The call stack for the crash at startup is attached (as an image, I'm not willing to copy all these hexa numbers :-( ).

Comment 38 Christian Casteyde 2007-09-05 15:14:35 UTC

Created attachment 12721 [details]
console photo at boot crash

To be sure it's still what Oleg's has seen, I have to repatch with his patches and reboot several times. This is a pain, should I do it?
The screen had scrolled when I shot it, I lost the first info. We can see here that there is a device open (but maybe wpa_supplicant is late there and is stucked by a precedent dev close).

Comment 39 Larry Finger 2007-09-05 15:44:15 UTC

No, I'm not sure what should be done. My one-liner was just grasping at straws. It looks as if the softmac code in that section should work, as is.

I think you have devoted enough effort to this (lost) cause. If I understand correctly, my bcm43xx patch has fixed the crash on bootup, and the hang can be worked around by adding the sleep. Is that correct? Do you have any other issues at the moment?

I certainly hope that all these problems disappear when you start using b43 with mac80211. SoftMAC has had problems from the start. Johannes Berg, the original author, has told me that it was his first major project in C, and that he doesn't understand it anymore. It is fine for routine situations, but it has a lot of corner cases. Mac80211 is not perfect, but there are a lot of people working on it, and its design is better understood.

Thanks for all your help. The bug that you found in bcm43xx was also there in b43 and b43legacy, but it is now fixed as well. You saved someone else from hitting it in the future.

Larry

Comment 40 Christian Casteyde 2007-09-06 10:36:13 UTC

Yes, adding the sleep in the script removes the remaining problems.
So for me it's workable, and as you say, I will switch to b43/mac80211 as soon as it gets ready for general use.

Thanks a lot for your patch too.