I have a server with JFS over SW RAID-5 which serves as a public FTP archive (ftp.linux.cz). For about half a year now I have been experiencing random lockups after about 3 to 10 days of uptime. Occasionally the server panics and reboots, but during most of the lockups the server is totally dead and I have to reset it using the master switch - even a sysrq/break over the serial console does not print anything. I have tried running various stress tests (like a parallel kernel compile in a loop, or memtest86), but it does not seem that the hardware is the problem here. Even during higher loads (like a recent Fedora release) the lockups are not more frequent.

Yesterday I compiled 2.6.30 and decided to enable various debugging options. Today the server reported the following error:

=================================
[ INFO: inconsistent lock state ]
2.6.30 #1
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd1/250 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&jfs_ip->rdwrlock#2){++++?+}, at: [<ffffffff8038accf>] jfs_get_block+0x4b/0x236
{RECLAIM_FS-ON-W} state was registered at:
  [<ffffffff8025833f>] mark_held_locks+0x4d/0x6b
  [<ffffffff80258405>] lockdep_trace_alloc+0xa8/0xc3
  [<ffffffff8029407d>] kmem_cache_alloc+0x22/0x1a4
  [<ffffffff80420bea>] radix_tree_preload+0x38/0x86
  [<ffffffff8026b5a7>] add_to_page_cache_locked+0x24/0xaf
  [<ffffffff8026b65e>] add_to_page_cache_lru+0x2c/0x5f
  [<ffffffff8026b7a5>] read_cache_page_async+0x72/0x13e
  [<ffffffff8026b87f>] read_cache_page+0xe/0x45
  [<ffffffff8039f8a7>] __get_metapage+0xd3/0x42f
  [<ffffffff80397667>] dbAllocAG+0x139/0x33c
  [<ffffffff80397f0c>] dbAlloc+0x3b0/0x441
  [<ffffffff8039e748>] extAlloc+0x1d2/0x47a
  [<ffffffff8038ae2e>] jfs_get_block+0x1aa/0x236
  [<ffffffff802babf2>] nobh_write_begin+0x1d1/0x3d6
  [<ffffffff8038a9d2>] jfs_write_begin+0x1e/0x20
  [<ffffffff8026c013>] generic_file_buffered_write+0x12f/0x2e5
  [<ffffffff8026c6c1>] __generic_file_aio_write_nolock+0x358/0x38c
  [<ffffffff8026cf6d>] generic_file_aio_write+0x69/0xc5
  [<ffffffff80298f8a>] do_sync_write+0xe7/0x12d
  [<ffffffff8029991c>] vfs_write+0xae/0x137
  [<ffffffff80299a69>] sys_write+0x47/0x6e
  [<ffffffff8020af6b>] system_call_fastpath+0x16/0x1b
  [<ffffffffffffffff>] 0xffffffffffffffff
irq event stamp: 49274967
hardirqs last enabled at (49274967): [<ffffffff802941bb>] kmem_cache_alloc+0x160/0x1a4
hardirqs last disabled at (49274966): [<ffffffff802940bb>] kmem_cache_alloc+0x60/0x1a4
softirqs last enabled at (49274890): [<ffffffff8023c61a>] __do_softirq+0x116/0x124
softirqs last disabled at (49274867): [<ffffffff8020c0bc>] call_softirq+0x1c/0x28

other info that might help us debug this:
no locks held by kswapd1/250.

stack backtrace:
Pid: 250, comm: kswapd1 Not tainted 2.6.30 #1
Call Trace:
 [<ffffffff80257d9d>] print_usage_bug+0x1b3/0x1c4
 [<ffffffff80258999>] ? check_usage_forwards+0x0/0x9c
 [<ffffffff802580b1>] mark_lock+0x303/0x544
 [<ffffffff80259a7e>] __lock_acquire+0x7fd/0x16a6
 [<ffffffff8025a90e>] ? __lock_acquire+0x168d/0x16a6
 [<ffffffff8025a9ac>] lock_acquire+0x85/0xa9
 [<ffffffff8038accf>] ? jfs_get_block+0x4b/0x236
 [<ffffffff8024e4ca>] down_write_nested+0x43/0x52
 [<ffffffff8038accf>] ? jfs_get_block+0x4b/0x236
 [<ffffffff802b8153>] ? create_empty_buffers+0x9a/0xa9
 [<ffffffff8038accf>] jfs_get_block+0x4b/0x236
 [<ffffffff805d1f36>] ? _spin_unlock+0x26/0x2a
 [<ffffffff802b80fc>] ? create_empty_buffers+0x43/0xa9
 [<ffffffff802ba65e>] __block_write_full_page+0x101/0x2f1
 [<ffffffff802b90ed>] ? end_buffer_async_write+0x0/0x156
 [<ffffffff8038ac84>] ? jfs_get_block+0x0/0x236
 [<ffffffff802ba91d>] block_write_full_page_endio+0xcf/0xd3
 [<ffffffff802ba931>] block_write_full_page+0x10/0x12
 [<ffffffff8038aa2a>] jfs_writepage+0x13/0x15
 [<ffffffff80275e9d>] shrink_page_list+0x3b0/0x69a
 [<ffffffff802767c4>] shrink_list+0x2a1/0x5b6
 [<ffffffff80258610>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff80276d61>] shrink_zone+0x288/0x332
 [<ffffffff8027768c>] kswapd+0x47c/0x641
 [<ffffffff80233f56>] ? finish_task_switch+0x3b/0xdc
 [<ffffffff80274f69>] ? isolate_pages_global+0x0/0x22d
 [<ffffffff8024b0d4>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff80258610>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff80277210>] ? kswapd+0x0/0x641
 [<ffffffff80277210>] ? kswapd+0x0/0x641
 [<ffffffff8024ad69>] kthread+0x56/0x83
 [<ffffffff8020bfba>] child_rip+0xa/0x20
 [<ffffffff8020b9bc>] ? restore_args+0x0/0x30
 [<ffffffff8024ad13>] ? kthread+0x0/0x83
 [<ffffffff8020bfb0>] ? child_rip+0x0/0x20

I don't know whether this is actually the cause of the lockups, but it nevertheless looks like a bug somewhere in JFS. I usually run the latest kernel on that server, so the lockups have definitely been there at least since 2.6.28.

My HW configuration is:

2x Opteron 244
Tyan S2882 mainboard (AMD 8111 chipset)
12 GB RAM
3ware 9550SX
Broadcom BCM5704 NIC

The userland is Fedora 10 x86_64.
This was brought up in the jfs-discussion mailing list a while back, but it hasn't been resolved. I'm on vacation now, but will try to find some time to dig into the lockdep code to see if I can understand what "inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage" means. http://sourceforge.net/mailarchive/forum.php?thread_name=20090517174828.7777a54c.krzysztof.h1%40wp.pl&forum_name=jfs-discussion
Apparently, not a regression from 2.6.29, so dropping from the list of recent regressions.
Created attachment 23045 [details]
Fix potential deadlock in __read_cache_page()

The problem is that __read_cache_page() is calling add_to_page_cache_lru() with a hard-coded GFP_KERNEL. Passing mapping_gfp_mask(mapping) instead eliminates the warning (at least in my testing).
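For reference, the described change amounts to a one-line substitution in __read_cache_page() in mm/filemap.c. This is only a sketch reconstructed from the description above, not the actual contents of attachment 23045, and the context may differ from the real 2.6.3x source:

```diff
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ static struct page *__read_cache_page(...)
-		err = add_to_page_cache_lru(page, mapping, index, GFP_KERNEL);
+		err = add_to_page_cache_lru(page, mapping, index,
+					    mapping_gfp_mask(mapping));
```

The idea is that a GFP_KERNEL allocation made while JFS holds jfs_ip->rdwrlock may enter __GFP_FS reclaim and recurse back into JFS writeback, which is the inconsistency lockdep reports; honouring mapping_gfp_mask() lets a filesystem restrict allocations for its own mappings (e.g. to a mask without __GFP_FS).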
I could reproduce the warning pretty easily on an Ultra5 (loop-mounted 4 GB disk image; a very slow box in general, so it is easy to generate I/O load). After applying the patch to 2.6.33-rc6, it does NOT print the "inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage" warning any more. However, if I increase the load even further, another, slightly different warning is printed.

Steps to reproduce, without the patch:

# ls -lgos /mnt/test.img
4198404 -rw-r--r-- 1 4294967296 2010-02-03 11:09 /mnt/test.img
# losetup -f /mnt/test.img
# mkfs.jfs /dev/loop0
# mount -t jfs /dev/loop0 /mnt/test
# dbench -t 600 -D /mnt/test -x 50
[...2 minutes later...]
[  581.301947] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

Steps to reproduce, with the patch:

# dbench -t 600 -D /mnt/test -x 50
-> no warning.
# dbench -t 600 -D /mnt/test -x 70
[...a few seconds later...]
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #1
---------------------------------------------------------
dbench/8886 just changed the state of lock:
 (&jfs_ip->commit_mutex){+.+.+.}, at: [<00000000100b4d58>] jfs_mkdir+0x98/0x380 [jfs]
but this lock was taken by another, RECLAIM_FS-safe lock in the past:
 (&jfs_ip->rdwrlock#3){+.+.-.}

I'll attach the full dmesg to this bug.
Created attachment 24984 [details]
dmesg after running "dbench -t 600 -D /mnt/test -x 70" with Dave's patch applied
This still happens with 2.6.34-rc3 on x86-64, this time it was easily triggered by "tar -cf - /usr | tar -C /mnt/disk/tar -xf -" with /mnt/disk being a JFS partition: http://nerdbynature.de/bits/2.6.34-rc3/jfs/
I get this on 3.0.38 on ARM. Is this the same bug?

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
3.0.38-dg+ #44
---------------------------------------------------------
kswapd0/17 just changed the state of lock:
 (&jfs_ip->rdwrlock#2){++++-.}, at: [<c0181be8>] jfs_get_block+0x44/0x290
but this lock took another, RECLAIM_FS-unsafe lock in the past:
 (&jfs_ip->commit_mutex){+.+.+.}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&jfs_ip->commit_mutex);
                               local_irq_disable();
                               lock(&jfs_ip->rdwrlock);
                               lock(&jfs_ip->commit_mutex);
  <Interrupt>
    lock(&jfs_ip->rdwrlock);

[...rest of trace attached...]
Created attachment 76181 [details]
Full trace from ARM 3.0.38 kernel
This bug relates to a very old kernel. Closing as obsolete.
Yep. I moved ftp.linux.cz to XFS a long time ago.