Bug 34642
Summary: | kjournald failed | ||
---|---|---|---|
Product: | File System | Reporter: | O01eg (o01eg) |
Component: | Other | Assignee: | Jan Kara (jack) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | akpm, alan, kernel, olaf |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.36-3.2 | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: |
/proc/config.gz
Debugging patch for assertion failure in journal_commit_transaction Patch fixing assertion failure in JBD |
With new kernel: [ 430.603855] ------------[ cut here ]------------ [ 430.603865] kernel BUG at fs/jbd/commit.c:505! [ 430.603869] invalid opcode: 0000 [#1] PREEMPT SMP [ 430.603878] last sysfs file: /sys/devices/platform/it87.656/temp3_alarm [ 430.603882] CPU 0 [ 430.603885] Modules linked in: ppp_deflate zlib_deflate bsd_comp ppp_async option usb_wwan netconsole oss_hdaudio osscore tun ppdev parport_pc parport [ 430.603913] [ 430.603918] Pid: 1634, comm: kjournald Tainted: G W 2.6.38.3 #1 Gigabyte Technology Co., Ltd. P35T-DS3P/P35T-DS3P [ 430.603931] RIP: 0010:[<ffffffff8118435e>] [<ffffffff8118435e>] journal_commit_transaction+0x94d/0x1213 [ 430.603944] RSP: 0018:ffff88007ba57d30 EFLAGS: 00010202 [ 430.603948] RAX: 0000000000000012 RBX: ffff880063b4d200 RCX: ffff88007ba56010 [ 430.603952] RDX: 0000000000000100 RSI: ffff88007cc28458 RDI: ffff88007bb27024 [ 430.603956] RBP: ffff88007ba57e50 R08: 0000000000000000 R09: 0000000000000000 [ 430.603960] R10: 0000000000000000 R11: ffff88007fc0c5d0 R12: ffff88007bb27000 [ 430.603964] R13: ffff88007bb2714c R14: ffff88007bf71800 R15: 0000000000000000 [ 430.603969] FS: 0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 430.603974] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 430.603978] CR2: 00007fc6cdd51000 CR3: 0000000063900000 CR4: 00000000000006f0 [ 430.603982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 430.603986] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 430.603990] Process kjournald (pid: 1634, threadinfo ffff88007ba56000, task ffff88007d232da0) [ 430.603994] Stack: [ 430.603997] ffff88007ba56000 ffff88007ba57fd8 ffff88007ba56010 0000006441fd3500 [ 430.604072] ffff88007ba56000 ffff88007ba56000 ffff88007bf71800 ffff88007bb2714c [ 430.604082] ffff88007bb27000 ffff88007bb27024 ffff880000000091 ffff88007ba56010 [ 430.604092] Call Trace: [ 430.604099] [<ffffffff81073405>] ? autoremove_wake_function+0x0/0x34 [ 430.604105] [<ffffffff81066a5f>] ? lock_timer_base+0x27/0x4d [ 430.604110] [<ffffffff81066d2e>] ? try_to_del_timer_sync+0x7b/0x85 [ 430.604116] [<ffffffff81187559>] kjournald+0xe6/0x22c [ 430.604121] [<ffffffff81073405>] ? autoremove_wake_function+0x0/0x34 [ 430.604127] [<ffffffff81187473>] ? kjournald+0x0/0x22c [ 430.604132] [<ffffffff81073006>] kthread+0x7d/0x85 [ 430.604139] [<ffffffff8102b754>] kernel_thread_helper+0x4/0x10 [ 430.604201] [<ffffffff81072f89>] ? kthread+0x0/0x85 [ 430.604207] [<ffffffff8102b750>] ? kernel_thread_helper+0x0/0x10 [ 430.604210] Code: 48 c7 85 58 ff ff ff 00 00 00 00 45 31 f6 c7 85 48 ff ff ff 00 00 00 00 48 c7 85 50 ff ff ff 00 00 00 00 4d 89 ef e9 8d 02 00 00 <0f> 0b eb fe 41 f6 04 24 02 74 24 49 8b 45 00 f0 80 60 02 f7 4c [ 430.604689] RIP [<ffffffff8118435e>] journal_commit_transaction+0x94d/0x1213 [ 430.604696] RSP <ffff88007ba57d30> [ 430.604721] ---[ end trace 9a427e3322e84bd6 ]--- Thanks for report! OK, so we wrongly estimated number of needed credits (the assertion is that number of buffers in BJ_Metadata list is larger than the number of remaining transaction credits). I guess the commit 523334ba508a8baaf5fc9f15fbad9ed04f334f48 should have fixed this (merged in 2.6.39-rc3). So can you possibly try latest 2.6.39-rc kernel? Thanks. Thanks. It's look like fixed. No, it isn't. Bug wasn't fixed but it is caused more rarely. [91087.583431] ------------[ cut here ]------------ [91087.583452] kernel BUG at fs/jbd/commit.c:503! [91087.583464] invalid opcode: 0000 [#1] PREEMPT SMP [91087.583482] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/sda10/uevent [91087.583500] CPU 3 [91087.583507] Modules linked in: ppp_deflate zlib_deflate bsd_comp ppp_async option usb_wwan ipt_LOG xt_state iptable_filter ipt_MASQUERADE ipt_REDIRECT iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_DSCP xt_dscp xt_string xt_NFQUEUE xt_multiport xt_mark xt_hashlimit xt_conntrack xt_connmark nf_conntrack oss_hdaudio osscore tun ppdev parport_pc parport [91087.583644] [91087.583651] Pid: 1643, comm: kjournald Tainted: G M W 2.6.39-rc6 #1 Gigabyte Technology Co., Ltd. P35T-DS3P/P35T-DS3P [91087.583678] RIP: 0010:[<ffffffff8118497b>] [<ffffffff8118497b>] journal_commit_transaction+0x972/0x1267 [91087.583703] RSP: 0018:ffff88007b06dcf0 EFLAGS: 00010202 [91087.583714] RAX: 0000000000000012 RBX: ffff880035334380 RCX: ffff88007b06c010 [91087.583728] RDX: 0000000000000100 RSI: ffff88007cc28468 RDI: ffff88007b8d8824 [91087.583742] RBP: ffff88007b06de50 R08: ffff880035334380 R09: ffff880018a2bba8 [91087.583757] R10: 0000000000011640 R11: 0007ffffffffffff R12: ffff88007b8d8800 [91087.583770] R13: ffff88007b8d894c R14: ffff88007b82b000 R15: 0000000000000000 [91087.583784] FS: 0000000000000000(0000) GS:ffff88007fd80000(0000) knlGS:0000000000000000 [91087.583799] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [91087.583813] CR2: 00007f2f59680fc0 CR3: 0000000006c64000 CR4: 00000000000006e0 [91087.583828] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [91087.583841] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [91087.583855] Process kjournald (pid: 1643, threadinfo ffff88007b06c000, task ffff88007d160f20) [91087.583870] Stack: [91087.583876] 0000000000000003 0000000000000000 ffff88007b06de50 ffff88007b06c010 [91087.583898] ffff88007b06c000 ffff88007b06c010 ffff88007b06c010 000052d7fb0d8a8c [91087.583921] ffff88007b06c000 ffff88007b8d894c ffff88007b82b000 ffff88007b8d8800 [91087.583943] Call Trace: [91087.583955] [<ffffffff81072071>] ? wake_up_bit+0x25/0x25 [91087.583968] [<ffffffff810653e3>] ? lock_timer_base+0x27/0x4d [91087.583982] [<ffffffff810656b2>] ? try_to_del_timer_sync+0x7b/0x85 [91087.583996] [<ffffffff81187bb0>] kjournald+0xe6/0x22c [91087.584003] [<ffffffff81072071>] ? wake_up_bit+0x25/0x25 [91087.584003] [<ffffffff81187aca>] ? commit_timeout+0xb/0xb [91087.584003] [<ffffffff81071c72>] kthread+0x7d/0x85 [91087.584003] [<ffffffff815cc954>] kernel_thread_helper+0x4/0x10 [91087.584003] [<ffffffff81071bf5>] ? kthread_worker_fn+0x14c/0x14c [91087.584003] [<ffffffff815cc950>] ? gs_change+0xb/0xb [91087.584003] Code: 48 c7 85 28 ff ff ff 00 00 00 00 45 31 f6 c7 85 20 ff ff ff 00 00 00 00 48 c7 85 18 ff ff ff 00 00 00 00 4d 89 ef e9 8c 02 00 00 <0f> 0b eb fe 41 f6 04 24 02 74 24 49 8b 45 00 f0 80 60 02 f7 4c [91087.584003] RIP [<ffffffff8118497b>] journal_commit_transaction+0x972/0x1267 [91087.584003] RSP <ffff88007b06dcf0> [91087.599334] ---[ end trace a30b2d7091e47ff4 ]--- Drat, so probably there's a similar problem elsewhere. What kind of load do you use (because I haven't seen a report like this in ages, so you must do something special). It started unexpected and I also didn't get this bug before. In first causes it was linked with thunderbird activity. I try to check volume (to find badblocks), try to backup ~/.thunderbird/ and restore them (to delete wrong inodes) but it wasn't help. After you fix thunderbird could work entire day. P.S. I missed, I use "data=journal" option for this volume. (In reply to comment #6) > P.S. I missed, I use "data=journal" option for this volume. Ah, that explains a lot! data=journal is not used/tested very much and uses rather different code paths. I'll have a look. This report indicates that data=journal has still issues with 3.2 https://bugzilla.novell.com/show_bug.cgi?id=747122 Umm, I forgot there's this bug open and updated only https://bugzilla.novell.com/show_bug.cgi?id=747122. So there's commit b22570d9abb3d844e65c15c8bc0d57a78129e3b4 which fixed bug in data=journal mode handling causing exactly the reported assertion failure (Olaf's failures with older kernels were exactly because of this problem). OTOH 3.2 should already have that commit so it may not be all. I'm just checking with Olaf whether he is really able to reproduce with 3.2 and can gather some debug data. I'll attach the debugging patch here so if there's somebody who can reproduce the issue, please run with the debugging patch and report. Thanks. Created attachment 73288 [details]
Debugging patch for assertion failure in journal_commit_transaction
The patch should work with pretty much any kernel version modulo some minor context changes...
Created attachment 80581 [details]
Patch fixing assertion failure in JBD
This patch was confirmed to fix the issue. I'll push it to Linus in the next merge window.
Closing the bug. |
Created attachment 56882 [details] /proc/config.gz Sometimes I get such messages in kernel: May 6 23:43:55 o01eg kernel: [ 495.167733] ------------[ cut here ]------------ May 6 23:43:55 o01eg kernel: [ 495.167738] kernel BUG at fs/jbd/commit.c:529! May 6 23:43:55 o01eg kernel: [ 495.167741] invalid opcode: 0000 [#1] PREEMPT SMP May 6 23:43:55 o01eg kernel: [ 495.167746] last sysfs file: /sys/devices/platform/it87.656/temp2_input May 6 23:43:55 o01eg kernel: [ 495.167749] CPU 2 May 6 23:43:55 o01eg kernel: [ 495.167750] Modules linked in: ppp_deflate zlib_deflate bsd_comp ppp_async option usb_wwan oss_hdaudio osscore vhba vboxnetadp vboxnetflt vboxdrv tun nvidiafb ppdev parport_pc fb_ddc parport vgastate May 6 23:43:55 o01eg kernel: [ 495.167770] May 6 23:43:55 o01eg kernel: [ 495.167774] Pid: 1909, comm: kjournald Not tainted 2.6.36-gentoo-r5 #1 P35T-DS3P/P35T-DS3P May 6 23:43:55 o01eg kernel: [ 495.167777] RIP: 0010:[<ffffffff811788a3>] [<ffffffff811788a3>] journal_commit_transaction+0x94d/0x1262 May 6 23:43:55 o01eg kernel: [ 495.167787] RSP: 0018:ffff88007c57fd30 EFLAGS: 00010202 May 6 23:43:55 o01eg kernel: [ 495.167790] RAX: 0000000000000023 RBX: ffff880071ac5bc0 RCX: ffff88007c57e010 May 6 23:43:55 o01eg kernel: [ 495.167794] RDX: 0000000000000100 RSI: ffff88007f46c468 RDI: ffff88007e6a6c24 May 6 23:43:55 o01eg kernel: [ 495.167797] RBP: ffff88007c57fe50 R08: 0000000000000000 R09: ffff880001b0d560 May 6 23:43:55 o01eg kernel: [ 495.167800] R10: 0000000000000002 R11: 0000000000000002 R12: ffff88007e6a6c00 May 6 23:43:55 o01eg kernel: [ 495.167803] R13: ffff88007e6a6d4c R14: ffff88007eac3000 R15: 0000000000000000 May 6 23:43:55 o01eg kernel: [ 495.167807] FS: 0000000000000000(0000) GS:ffff880001b00000(0000) knlGS:0000000000000000 May 6 23:43:55 o01eg kernel: [ 495.167811] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b May 6 23:43:55 o01eg kernel: [ 495.167814] CR2: 00000000015c9660 CR3: 000000006549b000 CR4: 00000000000006e0 May 6 23:43:55 o01eg kernel: [ 495.167817] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 6 23:43:55 o01eg kernel: [ 495.167820] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 6 23:43:55 o01eg kernel: [ 495.167824] Process kjournald (pid: 1909, threadinfo ffff88007c57e000, task ffff88007e7fc7b0) May 6 23:43:55 o01eg kernel: [ 495.167827] Stack: May 6 23:43:55 o01eg kernel: [ 495.167829] ffff880001b12540 ffff88007f8c47b0 ffff88007c57e010 000000734a4bcd0b May 6 23:43:55 o01eg kernel: [ 495.167833] <0> ffff88007c57e000 ffff88007c57e000 ffff88007eac3000 ffff88007e6a6d4c May 6 23:43:55 o01eg kernel: [ 495.167839] <0> ffff88007e6a6c00 0000000000000121 ffff88007e6a6c24 ffff88007c57e010 May 6 23:43:55 o01eg kernel: [ 495.167845] Call Trace: May 6 23:43:55 o01eg kernel: [ 495.167852] [<ffffffff81061551>] ? lock_timer_base+0x27/0x4d May 6 23:43:55 o01eg kernel: [ 495.167857] [<ffffffff81061820>] ? try_to_del_timer_sync+0x7b/0x85 May 6 23:43:55 o01eg kernel: [ 495.167862] [<ffffffff8117bb05>] kjournald+0xe6/0x22c May 6 23:43:55 o01eg kernel: [ 495.167867] [<ffffffff8106d995>] ? autoremove_wake_function+0x0/0x34 May 6 23:43:55 o01eg kernel: [ 495.167872] [<ffffffff8117ba1f>] ? kjournald+0x0/0x22c May 6 23:43:55 o01eg kernel: [ 495.167875] [<ffffffff8106d58e>] kthread+0x7d/0x85 May 6 23:43:55 o01eg kernel: [ 495.167881] [<ffffffff81026754>] kernel_thread_helper+0x4/0x10 May 6 23:43:55 o01eg kernel: [ 495.167885] [<ffffffff8106d511>] ? kthread+0x0/0x85 May 6 23:43:55 o01eg kernel: [ 495.167889] [<ffffffff81026750>] ? kernel_thread_helper+0x0/0x10 May 6 23:43:55 o01eg kernel: [ 495.167892] Code: 48 c7 85 58 ff ff ff 00 00 00 00 45 31 f6 c7 85 48 ff ff ff 00 00 00 00 48 c7 85 50 ff ff ff 00 00 00 00 4d 89 ef e9 8d 02 00 00 <0f> 0b eb fe 41 f6 04 24 02 74 24 49 8b 45 00 f0 80 60 02 ef 4c May 6 23:43:55 o01eg kernel: [ 495.167934] RIP [<ffffffff811788a3>] journal_commit_transaction+0x94d/0x1262 May 6 23:43:55 o01eg kernel: [ 495.167939] RSP <ffff88007c57fd30> May 6 23:43:55 o01eg kernel: [ 495.167943] ---[ end trace 51ef317e59fa350a ]--- After this any proccess freezes (state D) which has wrote to the one ext3 partition mounted with option "defaults,data=ordered". Other partition are not affected. Read operation is possible.