Bug 13613
Summary: | lockups with JFS (inconsistent lock state) | ||
---|---|---|---|
Product: | File System | Reporter: | Jan "Yenya" Kasprzak (kas) |
Component: | JFS | Assignee: | Dave Kleikamp (shaggy) |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | alan, dg, kernel, rjw, sega01 |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 3.0.38 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Fix potential deadlock in __read_cache_page(); dmesg after running "dbench -t 600 -D /mnt/test -x 70" with Dave's patch applied; Full trace from ARM 3.0.38 kernel | ||
Description
Jan "Yenya" Kasprzak
2009-06-24 09:35:42 UTC
This was brought up in the jfs-discussion mailing list a while back, but it hasn't been resolved. I'm on vacation now, but I will try to find some time to dig into the lockdep code to see if I can understand what "inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage" means.

http://sourceforge.net/mailarchive/forum.php?thread_name=20090517174828.7777a54c.krzysztof.h1%40wp.pl&forum_name=jfs-discussion

Apparently this is not a regression from 2.6.29, so I'm dropping it from the list of recent regressions.

Created attachment 23045
Fix potential deadlock in __read_cache_page()
The problem is that __read_cache_page() calls add_to_page_cache_lru() with a hard-coded GFP_KERNEL. Passing mapping_gfp_mask(mapping) instead eliminates the warning (at least in my testing).
I could reproduce the warning pretty easily on an Ultra5 (loop-mounted 4 GB disk image; a very slow box in general, so it is easy to generate I/O load). After applying the patch to 2.6.33-rc6, it no longer prints the "inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage" warning. However, if I increase the load even further, another, slightly different warning is printed.

Steps to reproduce, without the patch:

    # ls -lgos /mnt/test.img
    4198404 -rw-r--r-- 1 4294967296 2010-02-03 11:09 /mnt/test.img
    # losetup -f /mnt/test.img
    # mkfs.jfs /dev/loop0
    # mount -t jfs /dev/loop0 /mnt/test
    # dbench -t 600 -D /mnt/test -x 50
    [...2 minutes later...]
    [ 581.301947] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

Steps to reproduce, with the patch:

    # dbench -t 600 -D /mnt/test -x 50
    -> no warning.
    # dbench -t 600 -D /mnt/test -x 70
    [...a few seconds later...]
    [ INFO: possible irq lock inversion dependency detected ]
    2.6.33-rc6 #1
    ---------------------------------------------------------
    dbench/8886 just changed the state of lock:
     (&jfs_ip->commit_mutex){+.+.+.}, at: [<00000000100b4d58>] jfs_mkdir+0x98/0x380 [jfs]
    but this lock was taken by another, RECLAIM_FS-safe lock in the past:
     (&jfs_ip->rdwrlock#3){+.+.-.}

I'll attach the full dmesg to this bug.

Created attachment 24984
dmesg after running "dbench -t 600 -D /mnt/test -x 70" with Dave's patch applied
This still happens with 2.6.34-rc3 on x86-64; this time it was easily triggered by "tar -cf - /usr | tar -C /mnt/disk/tar -xf -" with /mnt/disk being a JFS partition: http://nerdbynature.de/bits/2.6.34-rc3/jfs/

I get this on 3.0.38 on ARM. Is this the same bug?

    =========================================================
    [ INFO: possible irq lock inversion dependency detected ]
    3.0.38-dg+ #44
    ---------------------------------------------------------
    kswapd0/17 just changed the state of lock:
     (&jfs_ip->rdwrlock#2){++++-.}, at: [<c0181be8>] jfs_get_block+0x44/0x290
    but this lock took another, RECLAIM_FS-unsafe lock in the past:
     (&jfs_ip->commit_mutex){+.+.+.}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
     Possible interrupt unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&jfs_ip->commit_mutex);
                                   local_irq_disable();
                                   lock(&jfs_ip->rdwrlock);
                                   lock(&jfs_ip->commit_mutex);
      <Interrupt>
        lock(&jfs_ip->rdwrlock);

    [...rest of trace attached...]

Created attachment 76181
Full trace from ARM 3.0.38 kernel
This bug relates to a very old kernel. Closing as obsolete.

Yep. I moved ftp.linux.cz to XFS a long time ago.