Bug 10290 - [BUG] Linux 2.6.25-rc6 - kernel BUG at fs/mpage.c:476! on powerpc
Status: CLOSED INSUFFICIENT_DATA
Product: Platform Specific/Hardware
Classification: Unclassified
Component: PPC-64
Hardware: All Linux
Importance: P1 normal
Assigned To: Anton Blanchard
Depends on:
Blocks: 9832
Reported: 2008-03-20 03:04 UTC by Rafael J. Wysocki
Modified: 2011-03-30 23:53 UTC

See Also:
Kernel Version: 2.6.25-rc6
Tree: Mainline
Regression: Yes


Description Rafael J. Wysocki 2008-03-20 03:04:56 UTC
Subject    : [BUG] Linux 2.6.25-rc6 - kernel BUG at fs/mpage.c:476! on powerpc
Submitter  : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date       : 2008-03-20 13:13
References : http://lkml.org/lkml/2008/3/20/39
Handled-By :
Patch      :

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Badari Pulavarty 2008-04-07 16:42:52 UTC
Kamalesh,

Is this still an issue with 2.6.25-rc8? Has anyone looked at this issue?
Let me know.

Thanks,
Badari
Comment 2 Dave Kleikamp 2008-04-09 13:55:43 UTC
I was able to recreate this on 2.6.25-rc8. I saw soft lockups on jfs rather than a trap.  I'll try to debug it further tomorrow.
Comment 3 Kamalesh Babulal 2008-04-09 23:27:50 UTC
(In reply to comment #2)
> I was able to recreate this on 2.6.25-rc8. I saw soft lockups on jfs rather
> than a trap.  I'll try to debug it further tomorrow.
> 

Dave,

I tried reproducing the bug with the 2.6.25-rc8-git7 and 2.6.25-rc8 kernels, but the kernel prints the following hung-task trace more than 1000 times:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000dea6b790] [c0000000008a99d0] net_sysctl_root+0xb1e0/0x25d78 (unreliable)
[c0000000dea6b960] [c0000000000111d0] .__switch_to+0x100/0x180
[c0000000dea6b9f0] [c0000000005b7fec] .schedule+0x26c/0x770
[c0000000dea6bb10] [c0000000005b9168] .__mutex_lock_slowpath+0xe8/0x1a0
[c0000000dea6bbe0] [c00000000010114c] .lookup_create+0x2c/0xd0
[c0000000dea6bc70] [c000000000105134] .sys_mkdirat+0xb4/0x140
[c0000000dea6bdb0] [c000000000013ff4] .compat_sys_mkdir+0x14/0x30
[c0000000dea6be30] [c0000000000086ac] syscall_exit+0x0/0x40
INFO: task fsstress:11118 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000dea7b960] [c0000000000111d0] .__switch_to+0x100/0x180
[c0000000dea7b9f0] [c0000000005b7fec] .schedule+0x26c/0x770
[c0000000dea7bb10] [c0000000005b9168] .__mutex_lock_slowpath+0xe8/0x1a0
[c0000000dea7bbe0] [c00000000010114c] .lookup_create+0x2c/0xd0
[c0000000dea7bc70] [c000000000105134] .sys_mkdirat+0xb4/0x140
[c0000000dea7bdb0] [c000000000013ff4] .compat_sys_mkdir+0x14/0x30
[c0000000dea7be30] [c0000000000086ac] syscall_exit+0x0/0x40
INFO: task fsstress:11119 blocked for more than 120 seconds.
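
When a log repeats the hung-task report this many times, it can help to tally the reports per task rather than read them one by one. The following is only a minimal sketch, not something used in this report: the helper name `count_hung_reports` is hypothetical, and the regular expression simply assumes the "INFO: task ... blocked for more than ... seconds." wording shown in the trace above.

```python
import re
from collections import Counter

# Assumed message format, copied from the trace above:
#   INFO: task fsstress:11118 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds\."
)


def count_hung_reports(log_text):
    """Return a Counter mapping (command, pid) to the number of reports seen."""
    counts = Counter()
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Small sample built from lines in this report; a real run would read
# the saved console or dmesg output instead.
sample = """\
INFO: task fsstress:11118 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task fsstress:11119 blocked for more than 120 seconds.
"""

report_counts = count_hung_reports(sample)
print(sum(report_counts.values()), "reports from", len(report_counts), "tasks")
```

A per-task count like this makes it easier to see whether a few fsstress processes are stuck for a long time or many different ones are blocking briefly.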
Comment 4 Dave Kleikamp 2008-04-14 14:34:38 UTC
That's similar to what I've seen on jfs.  It doesn't look like the same problem originally reported, but I can get it consistently.  I was able to reproduce it on 2.6.25-rc1, but not 2.6.24.
Comment 5 Dave Kleikamp 2008-04-18 12:30:09 UTC
I'm convinced that the "blocked for more than 120 seconds" warnings are not related to the originally reported problem, which I can't recreate.  I am able to get rid of the warnings and stack traces as suggested:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
Comment 6 Kamalesh Babulal 2008-05-02 01:26:41 UTC
Dave,

I checked with the 2.6.25 kernel. The bug is reproducible after 99763 hung-task warnings (the ones that end with: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.)

kernel BUG at fs/mpage.c:476!
cpu 0x0: Vector: 700 (Program Check) at [c0000000bf976bd0]
    pc: c00000000012faac: .__mpage_writepage+0xd0/0x618
    lr: c0000000000c7d90: .write_cache_pages+0x228/0x3e8
    sp: c0000000bf976e50
   msr: 8000000000029032
  current = 0xc0000000b1bd7f60
  paca    = 0xc000000000663b00
    pid   = 16625, comm = fsstress
kernel BUG at fs/mpage.c:476!
enter ? for help
[c0000000bf9773d0] c0000000000c7d90 .write_cache_pages+0x228/0x3e8
[c0000000bf977540] c0000000001300a8 .mpage_writepages+0x54/0x8c
[c0000000bf9775e0] c0000000001fedc8 .jfs_writepages+0x1c/0x34
[c0000000bf977660] c0000000000c7ff0 .do_writepages+0x68/0xa4
[c0000000bf9776e0] c0000000000bff6c .__filemap_fdatawrite_range+0x88/0xb8
[c0000000bf9777d0] c0000000000c0248 .filemap_write_and_wait+0x2c/0x68
[c0000000bf977860] c0000000000c0bf4 .generic_file_buffered_write+0x65c/0x6c8
[c0000000bf9779a0] c0000000000c0f60 .__generic_file_aio_write_nolock+0x300/0x3ec
[c0000000bf977aa0] c0000000000c10cc .generic_file_aio_write+0x80/0x114
[c0000000bf977b60] c0000000000f8204 .do_sync_write+0xc4/0x124
[c0000000bf977cf0] c0000000000f8a38 .vfs_write+0xd8/0x1a4
[c0000000bf977d90] c0000000000f93c4 .sys_write+0x4c/0x8c
[c0000000bf977e30] c000000000008734 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff0d8c8
SP (ffb3c720) is in userspace
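
For comparing this backtrace with the earlier hung-task traces, the frames can be split into their parts mechanically. This is a rough sketch under assumptions: the function `parse_frames` and the field names `stack_addr` and `code_addr` are made up for illustration, and the pattern only targets the two frame layouts that appear in this report (with and without brackets around the second address). It also drops the leading dot that the symbol names carry in these traces.

```python
import re

# Matches both layouts seen in this report:
#   [c0000000bf9773d0] c0000000000c7d90 .write_cache_pages+0x228/0x3e8
#   [c0000000dea6b960] [c0000000000111d0] .__switch_to+0x100/0x180
FRAME_RE = re.compile(
    r"\[(?P<stack_addr>[0-9a-f]+)\]\s+\[?(?P<code_addr>[0-9a-f]+)\]?\s+"
    r"\.?(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return one dict per recognised frame line, in the order they appear."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "stack_addr": int(match.group("stack_addr"), 16),
                "code_addr": int(match.group("code_addr"), 16),
                "symbol": match.group("symbol"),
                "offset": int(match.group("offset"), 16),  # bytes into the function
                "size": int(match.group("size"), 16),      # total function size
            })
    return frames


# First three frames of the backtrace above.
sample_trace = """\
[c0000000bf9773d0] c0000000000c7d90 .write_cache_pages+0x228/0x3e8
[c0000000bf977540] c0000000001300a8 .mpage_writepages+0x54/0x8c
[c0000000bf9775e0] c0000000001fedc8 .jfs_writepages+0x1c/0x34
"""

for frame in parse_frames(sample_trace):
    print(f"{frame['symbol']} +{frame['offset']:#x} of {frame['size']:#x}")
```

Listing just the symbol names this way shows at a glance that the BUG is hit on the write path (sys_write down to jfs_writepages and __mpage_writepage), while the hung-task traces are in the mkdir path.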
Comment 7 Alan 2010-01-19 22:09:02 UTC
Is this still present in more recent kernels?
Comment 8 Florian Mickler 2010-08-17 18:18:30 UTC
I'm closing this now. 

Please reopen if it is still reproducible in recent kernels!
