Distribution: RHEL 4 Update 1 --------------------------- Hardware Environment: SMP, 2 way Intel(R) Xeon(TM) CPU 2.40GHz, 1.5 GB RAM Network Interface (Tigon3) Disk I/O SCSI storage controller: usion-MPT Dual Ultra320 --------------------------- Software Environment: Linux x225b.in.ibm.com 2.6.13-rc3-mm2 #1 Wed Jul 27 17:43:05 IST 2005 i686 i686 i386 GNU/Linux ------------------------------ Problem Description: While executing the BASHMEMORY test encountered this OOPs message ------------[ cut here ]------------ kernel BUG at <bad filename>:48361! invalid operand: 0000 [#1] SMP last sysfs file: Modules linked in: ipv6 ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables button battery ac i2e CPU: 3 EIP: 0060:[<c016f52a>] Not tainted VLI EFLAGS: 00010246 (2.6.13-rc3-mm2-I) EIP is at submit_bh+0x16a/0x180 eax: 00004005 ebx: e3bc8754 ecx: 00000000 edx: e3bc8754 esi: 00000003 edi: 000000f6 ebp: 00000200 esp: f7ddbe7c ds: 007b es: 007b ss: 0068 Process kjournald (pid: 1122, threadinfo=f7ddb000 task=f7c35030) Stack: 00d81040 00000000 00000008 00000000 00000000 e3bc8754 00000001 e3bc8754 00000003 000000f6 00000200 c016f5fc c5fe8000 00000000 c5f92200 e69028c0 f78e2880 c01e43cf 00000000 c5eed030 00000000 c0515855 c0513d36 f78e28bc Call Trace: [<c016f5fc>] ll_rw_block+0xbc/0xc0 [<c01e43cf>] journal_commit_transaction+0x34f/0xff0 [<c0515855>] _spin_unlock_irq+0x5/0x10 [<c0513d36>] schedule+0x656/0xd40 [<c013aff0>] autoremove_wake_function+0x0/0x50 [<c0515739>] _spin_lock_irqsave+0x9/0x10 [<c012ee56>] try_to_del_timer_sync+0x46/0x50 [<c01e72b0>] kjournald+0xa0/0x220 [<c013aff0>] autoremove_wake_function+0x0/0x50 [<c013aff0>] autoremove_wake_function+0x0/0x50 [<c0104022>] ret_from_fork+0x6/0x14 [<c01e7200>] commit_timeout+0x0/0x10 [<c01e7210>] kjournald+0x0/0x220 [<c0102465>] kernel_thread_helper+0x5/0x10 Code: 7d 10 00 b8 a1 ff ff ff 0f 49 44 24 10 89 44 24 10 89 e8 e8 29 09 00 00 8b 44 24 10 83 c4 1c 5b 5 Linux version 2.6.13-rc3-mm2-II (root@x225b.in.ibm.com) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22.5 ----------------
Jan, this might be due to the changes you made? It's abummer that the file-n-line got mangled. I wonder why that happened?
Huh...I'd be really interested which of those three BUGs in submit_bh() has been triggered. The only one which does not seem obviously satisfied to me is the !mapped test and I did not change anything in this direction. I'm leaving now on holiday for a week. If you manage to find out in the mean time which of those three BUGs was triggered it'd be really nice. I'll have a deeper look on the problem when I'll return.
Jan I was not able to replicate the bug. I am trying to take dump so that we can analyze it, will update as soon as I can do so Regards
I tested the 2.6.13-rc4-mm1 kernel with same test suite. I got the same OOps message. I have captured the dump with kdump, i will attach preliminary analysis of the dump ..
Created attachment 5485 [details] Preliminary Analysis of dump taken through kdump I am attaching the preliminary analysis on the dump, Let me know if you need any specific information
Wrong thread. It was thread 4 which oopsed. Really all we need to know is which BUG triggered in suybmit_bh().
Created attachment 5502 [details] Log buffer out put got through analyzing the kernel dump This is the log buff out put, got from a dump captured when the same Oops occured while testing 2.6.13-rc4-mm1 kernel. Here submit_bh is in 2nd thread.
OK, so as Jan surmised, the buffer isn't buffer_mapped().
Hmm.. So we have !mapped buffer in t_sync_datalist. Which mount options are you using on that filesystem? As every buffer is mapped in prepare_write() before it is attached to t_sync_datalist in ext3_ordered_commit_write() we must race with something that unmaps the buffer (but leaves it dirty?) before the commit happens. I'm not sure how this can happen - probably some ugly race with truncate(). I'll try to find out more...
BTW are you able to reproduce the bug with vanilla kernel (i.e. without -mm patches)?
Jan It is mounted during bootup and fstab entry for it /dev/sda2 / ext3 defaults 1 1 and I haven't tested this on Vanilla kernel.
Yes, it happens with vanilla kernel (SMP x86_64 with preempt). Seems fairly easy to reproduce, both times i've left the bash-shared-memory on and come back an hour later it has been crashed buffer.c:2752 is BUG_ON(!buffer_mapped(bh)); ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at "/home/alex/x86/linux-2.6/fs/buffer.c":2752 invalid operand: 0000 [1] PREEMPT SMP CPU 1 Modules linked in: nfsd exportfs lockd sunrpc snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc Pid: 249, comm: kjournald Not tainted 2.6.13-rc6 RIP: 0010:[<ffffffff8018822b>] <ffffffff8018822b>{submit_bh+59} RSP: 0018:ffff810002247d28 EFLAGS: 00010246 RAX: 0000000000000a00 RBX: ffff8100213e6938 RCX: 8000000000000000 RDX: ffff81001a476b98 RSI: ffff8100213e6938 RDI: 0000000000000001 RBP: ffff8100213e6938 R08: ffff810001000000 R09: ffff81000193f030 R10: 0000000000001000 R11: ffff810001000000 R12: 0000000000000001 R13: 0000000000000000 R14: ffff81003ff46000 R15: ffff81003fdb569c FS: 00002aaaaaf15090(0000) GS:ffffffff804bc880(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00002aaab6cbd000 CR3: 000000002e5f7000 CR4: 00000000000006e0 Process kjournald (pid: 249, threadinfo ffff810002246000, task ffff81003fd7a130) Stack: 0000000000000008 ffff8100213e6938 0000000000000172 0000000000000001 0000000000000200 ffffffff80188399 ffff81001cbae468 ffff810032b42e10 0000000000000200 ffff81003bc5fe18 Call Trace:<ffffffff80188399>{ll_rw_block+105} <ffffffff801d5d43>{journal_commit_transaction+1299} <ffffffff802f4083>{_spin_unlock_irq+67} <ffffffff802f261a>{thread_return+186} <ffffffff80142a09>{lock_timer_base+41} <ffffffff802f4014>{_spin_unlock_irqrestore+68} <ffffffff801d8f5c>{kjournald+252} <ffffffff80150830>{autoremove_wake_function+0} <ffffffff80150830>{autoremove_wake_function+0} <ffffffff801d8e40>{commit_timeout+0} <ffffffff802f4083>{_spin_unlock_irq+67} <ffffffff8010f9be>{child_rip+8} <ffffffff801d8e60>{kjournald+0} <ffffffff8010f9b6>{child_rip+0}
Ok, thanks a lot for testing. So I can reduce my search just to vanilla kernel.. Can you please describe (or point me to a description) what does BASHMEMORY test do?
I can never remember where akpm has hidden it so I've also put it at serkiaden.mine.nu/ext3-tools.tar.gz And regarding what it does: /* * bash-shared-mapping.c - Andrew Morton <andrewm@uow.edu.au> * * Create a huge and holey shared mapping, then conduct multithreaded * write() I/O on some of it, while truncating and expanding it. General * idea is to try to force pageout activity into the file while the kernel * is writing to and truncating the file. We also perform pageout of areas * which are subject to write() and vice versa. All sorts of stuff. * * It is good to run a concurrent task which uses heaps of memory, to force * pageouts. * * A good combination on a 1gigabyte machine is: * * bash-shared-mapping -t5 foo 1000000000 & * while true * do * usemem 1000 * done */
Thanks for the description. So it's probably some race between write and truncate. I'm now leaving for a combinatorics conference for one week but afterwards I'll have a look at a problem if it's open so far.
OK. I think I've found a culprit of the problem. Briefly said the problem is that block_write_full_page() calls block_invalidatepage() in case it raced with truncate and not filesystem's private a_ops->invalidatepage(). block_invalidatepage() simply invalidates buffers without removing it from transactions lists and this leads to problems. Can you please try attached patch if it fixes the problem (I did not get to testing it seriously yet, so don't run it on any machine with valuable data ;)?
Created attachment 5758 [details] Patch making block_write_full_page() call a_ops->invalidatepage()
Sorry about the delay. Jan, it still hits the same BUG_ON() with your patch applied...
Thanks for testing. I'll try to find some other cause..
I tried to reproduce the problems here but I failed :( BTW: Are you running the bash-memory-mapping test with additional memory load (as suggested in the comments) or without it? I also analyzed the code once more but I did not find any other possibility how BOG_ON can be triggered. Can you try to run the test with the attached debugging patch (and also with my fix)? Thanks.
Created attachment 5824 [details] Debugging patch to detect !mapped buffers
Hmm, I still cannot reproduce the problem - Alexander, can you please give a spin to my debugging patch (together with by fix) so that I know more about the problem? Thanks.
I left it on for a while the other day but it hadn't crashed when I came back so I'm starting to get curious if I had somewhat screwed up the patch handling earlier and your first fix might really fix it. Sorry for the confusion, I'll leave it on when I leave today and maybe we can reach a conclusion. Haven't been home all too much lately that's why it's taking so long.
It was on for a bit more than 3h today and survived. It has so far always bugged out within an hour. I'll be able to leave it on for at least 8h tomorrow which should be pretty definate. Sorry if this has caused you grief.
Linus latest tree plus your invalidate fix survived 10h of bash-shared-mapping today.
(wakes up) Yeah, that makes sense. Jan, could you please send me a final patch? Perhaps it'd be better to use do_invalidatepage() rather than open-coding it...
I Noticed the same bug with 2.6.13-mm2 kernel, I have taken memory dump throug kdump let me know if you need any information at this stage.
Thanks everyone for testing and traces. I'm attaching current version of the fix for completeness (the fix is the same, just the "style" is changed a bit).
Created attachment 5991 [details] Patch fixing problems with unmapped buffers in transactions lists
The patch from this bug is already included in 2.6.16.