Subject : [BUG] Linux 2.6.25-rc6 - kernel BUG at fs/mpage.c:476! on powerpc
Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date : 2008-03-20 13:13
References : http://lkml.org/lkml/2008/3/20/39
Handled-By :
Patch :

This entry is being used for tracking a regression from 2.6.24. Please don't close it until the problem is fixed in the mainline.
Kamalesh,

Is this still an issue with 2.6.25-rc8? Has anyone looked at this issue? Let me know.

Thanks,
Badari
I was able to recreate this on 2.6.25-rc8. I saw soft lockups on jfs rather than a trap. I'll try to debug it further tomorrow.
(In reply to comment #2)
> I was able to recreate this on 2.6.25-rc8. I saw soft lockups on jfs rather
> than a trap. I'll try to debug it further tomorrow.
>

Dave,

I tried reproducing the bug with 2.6.25-rc8-git7 and 2.6.25-rc8 kernels, but the kernel gets the following lock trace more than 1000 times

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000dea6b790] [c0000000008a99d0] net_sysctl_root+0xb1e0/0x25d78 (unreliable)
[c0000000dea6b960] [c0000000000111d0] .__switch_to+0x100/0x180
[c0000000dea6b9f0] [c0000000005b7fec] .schedule+0x26c/0x770
[c0000000dea6bb10] [c0000000005b9168] .__mutex_lock_slowpath+0xe8/0x1a0
[c0000000dea6bbe0] [c00000000010114c] .lookup_create+0x2c/0xd0
[c0000000dea6bc70] [c000000000105134] .sys_mkdirat+0xb4/0x140
[c0000000dea6bdb0] [c000000000013ff4] .compat_sys_mkdir+0x14/0x30
[c0000000dea6be30] [c0000000000086ac] syscall_exit+0x0/0x40
INFO: task fsstress:11118 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000dea7b960] [c0000000000111d0] .__switch_to+0x100/0x180
[c0000000dea7b9f0] [c0000000005b7fec] .schedule+0x26c/0x770
[c0000000dea7bb10] [c0000000005b9168] .__mutex_lock_slowpath+0xe8/0x1a0
[c0000000dea7bbe0] [c00000000010114c] .lookup_create+0x2c/0xd0
[c0000000dea7bc70] [c000000000105134] .sys_mkdirat+0xb4/0x140
[c0000000dea7bdb0] [c000000000013ff4] .compat_sys_mkdir+0x14/0x30
[c0000000dea7be30] [c0000000000086ac] syscall_exit+0x0/0x40
INFO: task fsstress:11119 blocked for more than 120 seconds.
That's similar to what I've seen on jfs. It doesn't look like the same problem originally reported, but I can get it consistently. I was able to reproduce it on 2.6.25-rc1, but not 2.6.24.
I'm convinced that the "blocked for more than 120 seconds" warnings are not related to the originally reported problem, which I can't recreate. I am able to get rid of the warnings and stack traces as suggested: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
Dave,

I checked with the 2.6.25 kernel, the bug is reproducible after 99763 warnings of
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

kernel BUG at fs/mpage.c:476!
cpu 0x0: Vector: 700 (Program Check) at [c0000000bf976bd0]
    pc: c00000000012faac: .__mpage_writepage+0xd0/0x618
    lr: c0000000000c7d90: .write_cache_pages+0x228/0x3e8
    sp: c0000000bf976e50
   msr: 8000000000029032
  current = 0xc0000000b1bd7f60
  paca    = 0xc000000000663b00
    pid   = 16625, comm = fsstress
kernel BUG at fs/mpage.c:476!
enter ? for help
[c0000000bf9773d0] c0000000000c7d90 .write_cache_pages+0x228/0x3e8
[c0000000bf977540] c0000000001300a8 .mpage_writepages+0x54/0x8c
[c0000000bf9775e0] c0000000001fedc8 .jfs_writepages+0x1c/0x34
[c0000000bf977660] c0000000000c7ff0 .do_writepages+0x68/0xa4
[c0000000bf9776e0] c0000000000bff6c .__filemap_fdatawrite_range+0x88/0xb8
[c0000000bf9777d0] c0000000000c0248 .filemap_write_and_wait+0x2c/0x68
[c0000000bf977860] c0000000000c0bf4 .generic_file_buffered_write+0x65c/0x6c8
[c0000000bf9779a0] c0000000000c0f60 .__generic_file_aio_write_nolock+0x300/0x3ec
[c0000000bf977aa0] c0000000000c10cc .generic_file_aio_write+0x80/0x114
[c0000000bf977b60] c0000000000f8204 .do_sync_write+0xc4/0x124
[c0000000bf977cf0] c0000000000f8a38 .vfs_write+0xd8/0x1a4
[c0000000bf977d90] c0000000000f93c4 .sys_write+0x4c/0x8c
[c0000000bf977e30] c000000000008734 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff0d8c8
SP (ffb3c720) is in userspace
Is this still present in more recent kernels?
I'm closing this now. Please reopen if it is still reproducible in recent kernels!