Most recent kernel where this bug did not occur: 2.6.22.x Distribution: Gentoo Hardware Environment: i386 and x86_64 Software Environment: GNU Problem Description: An oops occurs on switching I/O scheduler during active fork()ing and disk usage. Here is x86_64 ARCH dmesg part: [ 620.257633] ------------[ cut here ]------------ [ 620.257641] kernel BUG at kernel/workqueue.c:258! [ 620.257643] invalid opcode: 0000 [1] SMP [ 620.257645] CPU 0 [ 620.257647] Modules linked in: tcp_westwood ipt_REJECT xt_state iptable_filter ipt_REDIRECT ipt_owner xt_tcpudp xt_multiport iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables sdhci mmc_core sr_mod cdrom [ 620.257662] Pid: 77, comm: kblockd/0 Not tainted 2.6.23 #19 [ 620.257664] RIP: 0010:[<ffffffff8024a4f6>] [<ffffffff8024a4f6>] run_workqueue+0x116/0x170 [ 620.257672] RSP: 0018:ffff81003f4a3ea0 EFLAGS: 00010282 [ 620.257673] RAX: ffff81002fe67880 RBX: 0000000000000000 RCX: ffff81002fe67880 [ 620.257676] RDX: ffff81002fe67880 RSI: ffff81003f4a3ed0 RDI: ffff81002fe67878 [ 620.257677] RBP: ffffffff8031bd10 R08: ffff81003f4a2000 R09: ffff81003e31cdc0 [ 620.257679] R10: 0000000000000000 R11: 0000000000000000 R12: ffff81003f675b40 [ 620.257681] R13: ffff81003f675b48 R14: 0000000000000000 R15: 0000000000000000 [ 620.257684] FS: 0000000000000000(0000) GS:ffffffff806a4000(0000) knlGS:0000000000000000 [ 620.257686] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 620.257688] CR2: 00002b68aabb41de CR3: 000000002eb0a000 CR4: 00000000000006e0 [ 620.257690] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 620.257692] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 620.257694] Process kblockd/0 (pid: 77, threadinfo ffff81003f4a2000, task ffff81003f6b8000) [ 620.257696] Stack: ffffffff8024b040 ffff81003f675b58 ffff81003f675b40 ffffffff8024b040 [ 620.257700] ffff81003f675b48 ffffffff8024b0e3 0000000000000000 ffff81003f6b8000 [ 620.257703] ffffffff8024e900 ffff81003f4a3ee8 ffff81003f4a3ee8 00000000fffffffc [ 620.257706] Call Trace: [ 620.257709] [<ffffffff8024b040>] worker_thread+0x0/0x110 [ 620.257712] [<ffffffff8024b040>] worker_thread+0x0/0x110 [ 620.257714] [<ffffffff8024b0e3>] worker_thread+0xa3/0x110 [ 620.257718] [<ffffffff8024e900>] autoremove_wake_function+0x0/0x30 [ 620.257721] [<ffffffff8024b040>] worker_thread+0x0/0x110 [ 620.257723] [<ffffffff8024b040>] worker_thread+0x0/0x110 [ 620.257726] [<ffffffff8024e53b>] kthread+0x4b/0x80 [ 620.257729] [<ffffffff8020cab8>] child_rip+0xa/0x12 [ 620.257732] [<ffffffff8024e4f0>] kthread+0x0/0x80 [ 620.257734] [<ffffffff8020caae>] child_rip+0x0/0x12 [ 620.257736] [ 620.257737] [ 620.257737] Code: 0f 0b eb fe 66 0f 1f 44 00 00 65 48 8b 34 25 00 00 00 00 8b [ 620.257746] RIP [<ffffffff8024a4f6>] run_workqueue+0x116/0x170 [ 620.257749] RSP <ffff81003f4a3ea0> the i386 oops looks like this one (can provide later). Please note the problem is always occurs in "kblockd/X" process (X = 0 ~~ SMP CPUs) Steps to reproduce: Need 2.6.23.1 with CFQ and deadline I/O schedulers loaded. and these 3 simultaneous tasks trigger the problem: 1. something like this: # while :; do > echo deadline > /sys/block/sda/queue/scheduler > echo cfq > /sys/block/sda/queue/scheduler >done 2. a fork()/clone() torture (possible, you can apply any other way - should not matter). I'm running a simple perl script: ------------CUT---------- #!/usr/bin/perl $SIG{CHLD}=IGNORE; while (1) { $r=fork; exit if (defined($r) && $r == 0); } ------------/CUT---------- it's not a fork bomb. It's forking in 1 thread. Children are just exiting. 3. some high disk usage. Like this: # while :; do tar xjf ~/linux-2.6.23.1.tar.bz2; rm -rf linux-2.6.23.1; done It could take some time to occur, like 10-30 minutes. Almost always system is fully hanging when this oops occurs.
Looks like we have an additional failure place on this bug at block/cfq-iosched.c:1147 It's on i386 arch. Unfortunately, I can't recreate the problem at this place of code. But pretty sure that it's the same problem. Here is dmesg: Oct 18 10:14:59 beta [107202.940998] ------------[ cut here ]------------ Oct 18 10:14:59 beta [107202.941193] kernel BUG at /usr/src/linux-2.6.23/block/cfq-iosched.c:1147! Oct 18 10:14:59 beta [107202.941487] invalid opcode: 0000 [#1] SMP Oct 18 10:14:59 beta [107202.941926] Modules linked in: ohci_hcd usbcore floppy sr_mod cdrom Oct 18 10:14:59 beta [107202.942480] CPU: 0 Oct 18 10:14:59 beta [107202.942481] EIP: 0060:[<c02d7dfb>] Not tainted VLI Oct 18 10:14:59 beta [107202.942482] EFLAGS: 00010086 (2.6.23 #1) Oct 18 10:14:59 beta [107202.943015] EIP is at cfq_put_queue+0x4b/0x80 Oct 18 10:14:59 beta [107202.943195] eax: eb10c210 ebx: eb10c210 ecx: eb10c1b8 edx: eb10c1b8 Oct 18 10:14:59 beta [107202.943406] esi: c85a5300 edi: c22b9000 ebp: eb057f18 esp: eb057ee0 Oct 18 10:14:59 beta [107202.943615] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068 Oct 18 10:14:59 beta [107202.943813] Process bash (pid: 19625, ti=eb056000 task=d4c80bc0 task.ti=eb056000) Oct 18 10:14:59 beta [107202.944030] Stack: 00000001 c85a5300 c02d7fb9 dcff3cc0 dcff3d0c ca939360 c02cbbf3 c22b9000 Oct 18 10:14:59 beta [107202.944850] dcff3cc0 c02cbfda 00000044 00000001 00000009 ca939900 64616564 656e696c Oct 18 10:14:59 beta [107202.945668] 00000000 00000000 c22b90d0 c22b93a8 c22b9000 c0871b3c c02cf64c eafa9000 Oct 18 10:14:59 beta [107202.946487] Call Trace: Oct 18 10:14:59 beta [107202.946774] [<c02d7fb9>] cfq_exit_queue+0x89/0xb0 Oct 18 10:14:59 beta [107202.947018] [<c02cbbf3>] elevator_exit+0x23/0x50 Oct 18 10:14:59 beta [107202.947263] [<c02cbfda>] elv_iosched_store+0x16a/0x1a0 Oct 18 10:14:59 beta [107202.947513] [<c02cf64c>] queue_attr_store+0x4c/0x80 Oct 18 10:14:59 beta [107202.947759] [<c01a8bda>] sysfs_write_file+0xaa/0x120 Oct 18 10:14:59 beta [107202.948008] [<c0168586>] vfs_write+0xa6/0x160 Oct 18 10:14:59 beta [107202.948250] [<c01a8b30>] sysfs_write_file+0x0/0x120 Oct 18 10:14:59 beta [107202.948496] [<c0168cc1>] sys_write+0x41/0x70 Oct 18 10:14:59 beta [107202.948736] [<c0102b1e>] sysenter_past_esp+0x5f/0x85 Oct 18 10:14:59 beta [107202.948985] ======================= Oct 18 10:14:59 beta [107202.949155] Code: 01 00 85 c0 75 29 8b 43 2c 8b 53 28 01 d0 85 c0 75 21 f6 43 50 01 75 1f 3b 5e 44 74 1e 89 da a1 10 f2 cf c0 5b 5e e9 d5 c6 e8 ff f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 31 c9 89 da 89 Oct 18 10:14:59 beta [107202.953398] EIP: [<c02d7dfb>] cfq_put_queue+0x4b/0x80 SS:ESP 0068:eb057ee0
Changed severity to high as the second bug occured without any unusual actions. Just a running system.
Created attachment 13212 [details] debug patch to detect corrupted work_structs > [ 620.257641] kernel BUG at kernel/workqueue.c:258! > ... > Steps to reproduce: Could you try to reproduce with this patch? Hopefully it can detect the bad work. Not sure this is really useful, but still.
> Could you try to reproduce with this patch? Unfortunately, no. The second oops occured on a system without any abnormal activities. It was running fine almost a week before the crash and is running fine now. several other systems on the same vmlinuz binary are running ok. So, I'm afraid it's just a _very_ unlikely failure to be reproduced.
> Could you try to reproduce with this patch? Sorry, just misread your post... The patch is for the first recreatable problem - so, will try it and post here my results.
After a brief look, I suspect we have 2 bugs in block/ related to workqueus. 1. cfq_exit_queue() forgets to cancel cfq_data->unplug_work, I think this _might_ explain the first trace. 2. blk_sync_queue() doesn't cancel request_queue->unplug_work, this is not related.
Created attachment 13215 [details] perhaps a fix for the first trace Nick, any chance you can check if this patch makes any difference? Would be nice to know Jens's opinion, of course :)
Good news! Thanks to Oleg for the patch (id=13215). It really does fix the problem :) My test system was running 2 hours with the torture tasks without any problems. And I have 2 additional notices about the initial problem: - It's SMP-only problem. I wasn't able to reproduce it on an UP i386 machine. - The problem doesn't appear (or possible appears much less likely) once RAM is big enough to store in cache whole FS activity without forced cache submitting. I wasn't able to reproduce the problem on a 4Gb RAM system unpacking/removing the 2.6.23 package (is it too small for now?? ;))) 2Gb RAM system effectively shows the problem though. So, I'd like to see the patch submitted in .23 branch :)
Just found now a way to recreate the second dump problem. Pretty easy. There are several notices on this: - First notice: the problem appears only if a disk-active task is running in IDLE I/O class. - Second notice: it is _LIKELY_ the problem came with this patch: "cfq: async queue allocation per priority" its URL: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c2dea2d1fdbce86942dba0a968c523d8b7858bb5 - Third notice: the problem is gone (just can't recreate it in the described way) once the patch above ^^^ is rolled back So, to recreate you are running 2 tasks: 1. something like this (the sleep()s _ARE_ important!): # while :; do echo cfq > /sys/block/sda/queue/scheduler; sleep 1; echo deadline > /sys/block/sda/queue/scheduler; sleep 1; done 2. something IDLE-I/O-niced (!!important) disk-active (the system cache should be small enough to force disk flushing) # ionice -c3 -p $$ # while :; do cp -a linux-2.6.23 linux-2.6.23_1; rm -rf linux-2.6.23_1; done and here, we need some magick ;) Just start to interrupt the task (with ctrl+c) and run it again in 2-3 seconds. The second posted above BUG appears after just several runnings/stops.... Trying to recreate the problem on the latest 2.6.24-rc1-git13. Will update later.
Just easily recreated the problem in the latest 2.6.24-rc1-git13 [ 364.789660] ------------[ cut here ]------------ [ 364.789677] kernel BUG at block/cfq-iosched.c:1147! [ 364.789683] invalid opcode: 0000 [#1] PREEMPT [ 364.789693] Modules linked in: nfs iptable_raw ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle xt_TCPMSS xt_tcpudp iptable_filter ip_tables x_tables [ 364.789756] [ 364.789762] Pid: 5900, comm: bash Not tainted (2.6.24-rc1-git13 #1) [ 364.789768] EIP: 0060:[<c021fd4f>] EFLAGS: 00010046 CPU: 0 [ 364.789779] EIP is at cfq_put_queue+0x48/0x74 [ 364.789784] EAX: 00000000 EBX: c5d73128 ECX: c18d93c0 EDX: c10bae60 [ 364.789790] ESI: c2ceb380 EDI: c2706200 EBP: c5d21f1c ESP: c5d21ee8 [ 364.789796] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 364.789802] Process bash (pid: 5900, ti=c5d20000 task=c2c9f350 task.ti=c5d20000) [ 364.789810] Stack: c2ceb380 00000001 c02208b1 c28bd600 c28bd62c c0216dbf c28bd600 c2766ad0 [ 364.789854] c021714c 00000044 b7f1c000 00000009 c28bd9c0 64616564 656e696c 00000000 [ 364.789893] 00000000 c2766bb0 c2766e50 c2766ad0 c042bcf0 c0219eb3 c5d31000 c042bc68 [ 364.789926] Call Trace: [ 364.789937] [<c02208b1>] cfq_exit_queue+0x79/0xab [ 364.789947] [<c0216dbf>] elevator_exit+0x1b/0x32 [ 364.789957] [<c021714c>] elv_iosched_store+0x177/0x1c6 [ 364.789968] [<c0219eb3>] queue_attr_store+0x40/0x61 [ 364.789978] [<c019645d>] sysfs_write_file+0xa8/0xe9 [ 364.789990] [<c01963b5>] sysfs_write_file+0x0/0xe9 [ 364.789999] [<c015adc2>] vfs_write+0x87/0x140 [ 364.790011] [<c015b33d>] sys_write+0x41/0x6a [ 364.790022] [<c01027ee>] sysenter_past_esp+0x5f/0x85 [ 364.790033] ======================= [ 364.790037] Code: 06 5e 00 00 85 c0 75 27 8b 43 28 03 43 2c 85 c0 75 21 f6 43 50 01 75 1f 39 5e 44 74 1e 89 da a1 a8 91 48 c0 5b 5e e9 73 75 f3 ff <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 31 c9 89 da 89 [ 364.790357] EIP: [<c021fd4f>] cfq_put_queue+0x48/0x74 SS:ESP 0068:c5d21ee8 [ 364.790377] note: bash[5900] exited with preempt_count 1
Created attachment 13393 [details] (may be) fix IOPRIO_CLASS_IDLE accounting On 11/04, bugme-daemon@bugzilla.kernel.org wrote: > > ------- Comment #9 from gentuu@gmail.com 2007-11-04 01:00 ------- > > - First notice: the problem appears only if a disk-active task is running in > IDLE I/O class. Great work, Nickolay. Looks like cfq_put_async_queues() is completely wrong wrt IOPRIO_CLASS_IDLE. Could you try the patch?
Yes, the patch does fix the bug! Thank you Oleg. I saw that Jens already accepted the patch. So, closing this bug now :) Congrats!
*** Bug 9633 has been marked as a duplicate of this bug. ***