Bug 10290

Summary: [BUG] Linux 2.6.25-rc6 - kernel BUG at fs/mpage.c:476! on powerpc
Product: Platform Specific/Hardware
Reporter: Rafael J. Wysocki (rjw)
Component: PPC-64
Assignee: Anton Blanchard (anton)
Status: CLOSED INSUFFICIENT_DATA
Severity: normal
CC: alan, florian, jmoyer, kamalesh, pbadari, shaggy, yugzhang
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc6
Subsystem:
Regression: Yes
Bisected commit-id:
Bug Depends on:
Bug Blocks: 9832

Description Rafael J. Wysocki 2008-03-20 03:04:56 UTC
Subject    : [BUG] Linux 2.6.25-rc6 - kernel BUG at fs/mpage.c:476! on powerpc
Submitter  : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Date       : 2008-03-20 13:13
References : http://lkml.org/lkml/2008/3/20/39
Handled-By :
Patch      :

This entry is being used for tracking a regression from 2.6.24.  Please don't
close it until the problem is fixed in the mainline.
Comment 1 Badari Pulavarty 2008-04-07 16:42:52 UTC
Kamalesh,

Is this still an issue with 2.6.25-rc8? Has anyone looked at this issue?
Let me know.

Thanks,
Badari
Comment 2 Dave Kleikamp 2008-04-09 13:55:43 UTC
I was able to recreate this on 2.6.25-rc8. I saw soft lockups on jfs rather than a trap.  I'll try to debug it further tomorrow.
Comment 3 Kamalesh Babulal 2008-04-09 23:27:50 UTC
(In reply to comment #2)
> I was able to recreate this on 2.6.25-rc8. I saw soft lockups on jfs rather
> than a trap.  I'll try to debug it further tomorrow.
> 

Dave,

I tried reproducing the bug with the 2.6.25-rc8-git7 and 2.6.25-rc8 kernels, but the kernel prints the following lock trace more than 1000 times:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000dea6b790] [c0000000008a99d0] net_sysctl_root+0xb1e0/0x25d78 (unreliable)
[c0000000dea6b960] [c0000000000111d0] .__switch_to+0x100/0x180
[c0000000dea6b9f0] [c0000000005b7fec] .schedule+0x26c/0x770
[c0000000dea6bb10] [c0000000005b9168] .__mutex_lock_slowpath+0xe8/0x1a0
[c0000000dea6bbe0] [c00000000010114c] .lookup_create+0x2c/0xd0
[c0000000dea6bc70] [c000000000105134] .sys_mkdirat+0xb4/0x140
[c0000000dea6bdb0] [c000000000013ff4] .compat_sys_mkdir+0x14/0x30
[c0000000dea6be30] [c0000000000086ac] syscall_exit+0x0/0x40
INFO: task fsstress:11118 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call Trace:
[c0000000dea7b960] [c0000000000111d0] .__switch_to+0x100/0x180
[c0000000dea7b9f0] [c0000000005b7fec] .schedule+0x26c/0x770
[c0000000dea7bb10] [c0000000005b9168] .__mutex_lock_slowpath+0xe8/0x1a0
[c0000000dea7bbe0] [c00000000010114c] .lookup_create+0x2c/0xd0
[c0000000dea7bc70] [c000000000105134] .sys_mkdirat+0xb4/0x140
[c0000000dea7bdb0] [c000000000013ff4] .compat_sys_mkdir+0x14/0x30
[c0000000dea7be30] [c0000000000086ac] syscall_exit+0x0/0x40
INFO: task fsstress:11119 blocked for more than 120 seconds.
Comment 4 Dave Kleikamp 2008-04-14 14:34:38 UTC
That's similar to what I've seen on jfs.  It doesn't look like the same problem originally reported, but I can get it consistently.  I was able to reproduce it on 2.6.25-rc1, but not 2.6.24.
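
A minimal sketch, in Python, of one way to tell the two symptoms apart when reading a captured console log: it counts the "blocked for more than N seconds" warnings per task name and lists any "kernel BUG at" sites separately. The function name summarize and both regular expressions are assumptions, based only on the message formats quoted in this report; other kernels may format these lines differently.

```python
import re
from collections import Counter

# Patterns inferred from the messages quoted in this report (an assumption).
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")


def summarize(log_text):
    """Return (hung-task warning count per task name, distinct BUG sites)."""
    hung = Counter()
    bugs = []
    for line in log_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            hung[match.group(1)] += 1
            continue
        match = BUG_RE.search(line)
        if match:
            site = (match.group(1), int(match.group(2)))
            # xmon prints the BUG line more than once; keep each site once.
            if site not in bugs:
                bugs.append(site)
    return hung, bugs


if __name__ == "__main__":
    import sys

    counts, bug_sites = summarize(sys.stdin.read())
    for task, count in counts.most_common():
        print(f"{task}: {count} hung-task warnings")
    for path, lineno in bug_sites:
        print(f"BUG at {path}:{lineno}")
```

Saved under a hypothetical name such as hung_summary.py, it could be fed a saved console log on standard input. A log that shows only hung-task warnings and no BUG site would match the soft-lockup symptom rather than the originally reported trap.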
Comment 5 Dave Kleikamp 2008-04-18 12:30:09 UTC
I'm convinced that the "blocked for more than 120 seconds" warnings are not related to the originally reported problem, which I can't recreate.  I am able to get rid of the warnings and stack traces as suggested:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
Comment 6 Kamalesh Babulal 2008-05-02 01:26:41 UTC
Dave,

I checked with the 2.6.25 kernel. The bug is reproducible after 99763 occurrences of the warning: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

kernel BUG at fs/mpage.c:476!
cpu 0x0: Vector: 700 (Program Check) at [c0000000bf976bd0]
    pc: c00000000012faac: .__mpage_writepage+0xd0/0x618
    lr: c0000000000c7d90: .write_cache_pages+0x228/0x3e8
    sp: c0000000bf976e50
   msr: 8000000000029032
  current = 0xc0000000b1bd7f60
  paca    = 0xc000000000663b00
    pid   = 16625, comm = fsstress
kernel BUG at fs/mpage.c:476!
enter ? for help
[c0000000bf9773d0] c0000000000c7d90 .write_cache_pages+0x228/0x3e8
[c0000000bf977540] c0000000001300a8 .mpage_writepages+0x54/0x8c
[c0000000bf9775e0] c0000000001fedc8 .jfs_writepages+0x1c/0x34
[c0000000bf977660] c0000000000c7ff0 .do_writepages+0x68/0xa4
[c0000000bf9776e0] c0000000000bff6c .__filemap_fdatawrite_range+0x88/0xb8
[c0000000bf9777d0] c0000000000c0248 .filemap_write_and_wait+0x2c/0x68
[c0000000bf977860] c0000000000c0bf4 .generic_file_buffered_write+0x65c/0x6c8
[c0000000bf9779a0] c0000000000c0f60 .__generic_file_aio_write_nolock+0x300/0x3ec
[c0000000bf977aa0] c0000000000c10cc .generic_file_aio_write+0x80/0x114
[c0000000bf977b60] c0000000000f8204 .do_sync_write+0xc4/0x124
[c0000000bf977cf0] c0000000000f8a38 .vfs_write+0xd8/0x1a4
[c0000000bf977d90] c0000000000f93c4 .sys_write+0x4c/0x8c
[c0000000bf977e30] c000000000008734 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff0d8c8
SP (ffb3c720) is in userspace
Comment 7 Alan 2010-01-19 22:09:02 UTC
Is this still present in more recent kernels?
Comment 8 Florian Mickler 2010-08-17 18:18:30 UTC
I'm closing this now. 

Please reopen if it is still reproducible in recent kernels!