Latest working kernel version: (I think at least 2.6.25.4 works) Earliest failing kernel version: Distribution: Minimal Debian sid (unstable) Hardware Environment: qemu x86 Software Environment: Problem Description: Mere rm -rf after mounting on an intentionally corrupted partition occasionally causes "BUG: unable to handle kernel paging request" in ext2_free_blocks. Unfortunately the issue seems to be timing sensitive (or something), doing it on the same filesystem only sometimes results in the crash :( But I have reproduced it something like 6 times now with brief testing. If you wish, I can attach some filesystems with which I have been able to reproduce this at least once. Another thing I could do is take a look at it with the new kernel debugger (which I haven't tried yet) if none of you are able to figure out this from the traces. Is there something you would specifically want me to take a look at? The local and referenced variables at ext2_free_blocks(), I guess? I think I ran quite extensive tests on 2.6.25.4 & ext2, so I suspect (but am not sure, I've made some changes to the way I test) this bug is newer than 2.6.25.4. I could do some bisecting too, but I haven't managed to automate the thing yet. Here's a script I run under qemu, google for zzuf (it's a fuzzer), and timeout is from the Debian package `timeout': ---------- #!/bin/sh if [ "`hostname`" != "fstest" ]; then echo "This is a dangerous script." echo "Set your hostname to \`fstest\' if you want to use it." exit 1 fi umount /dev/hdb umount /dev/hdc /etc/init.d/sysklogd stop /etc/init.d/klogd stop /etc/init.d/cron stop mount /dev/hda / -t ext3 -o remount,ro || exit 1 ulimit -t 20 for ((s=$1; s<1000000000; s++)); do umount /mnt echo '***** zzuffing *****' seed $s zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit mount /dev/hdb /mnt -o errors=continue || continue cd /mnt || continue cp -r doc doc2 >&/dev/null find -xdev >&/dev/null find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null mkdir tmp >&/dev/null echo whoah >tmp/filu 2>/dev/null rm -rf /mnt/* >&/dev/null cd / done ---------- The attached backtraces all start from the time of mounting the filesystem.
Backtrace, filesystem hdb.849: [ 3796.272178] attempt to access beyond end of device [ 3796.272839] hdb: rw=0, want=134237336, limit=20480 [ 3796.273106] Buffer I/O error on device hdb, logical block 67118667 [ 3796.276009] attempt to access beyond end of device [ 3796.277156] hdb: rw=0, want=134237336, limit=20480 [ 3796.278263] Buffer I/O error on device hdb, logical block 67118667 [ 3796.391961] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #1281: directory entry across blocks - offset=1536, inode=50659344, rec_len=29812, name_len=121 [ 3796.395802] EXT2-fs error (device hdb): ext2_readdir: bad page in #1281 [ 3796.458033] EXT2-fs error (device hdb): ext2_readdir: bad page in #1281 [ 3796.596656] EXT2-fs error (device hdb): ext2_readdir: bad page in #1281 [ 3796.601485] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 256 [ 3796.608617] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 67118667, count = 1 [ 3796.611922] BUG: unable to handle kernel paging request at c7e95ffc [ 3796.613327] IP: [<c031f151>] read_block_bitmap+0xa7/0x14b [ 3796.614573] *pde = 00007067 *pte = 07e95160 [ 3796.615585] Oops: 0000 [#1] DEBUG_PAGEALLOC [ 3796.616533] [ 3796.616905] Pid: 5933, comm: rm Not tainted (2.6.27-rc2 #4) [ 3796.618081] EIP: 0060:[<c031f151>] EFLAGS: 00000246 CPU: 0 [ 3796.619315] EIP is at read_block_bitmap+0xa7/0x14b [ 3796.619417] EAX: ffffffff EBX: c7e96800 ECX: c7e96000 EDX: 00000001 [ 3796.619417] ESI: c2f40428 EDI: c7aba400 EBP: c7bd5d7c ESP: c7bd5d4c [ 3796.619417] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 3796.619417] Process rm (pid: 5933, ti=c7bd5000 task=c58aa5e0 task.ti=c7bd5000) [ 3796.619417] Stack: c620d024 c620d028 c620d024 c620d098 00000002 c78f8500 00000000 00000000 [ 3796.619417] c7e96000 c7e96400 014f1c6d 00000003 c7bd5de0 c031f2ef c061da3c c7bd5da0 [ 3796.619417] c024982b c10ff2d8 c10ff2c0 c7bd5dd8 c0948780 00000029 00000000 0000002a [ 3796.619417] Call Trace: [ 3796.619417] [<c031f2ef>] ? ext2_free_blocks+0xfa/0x2b7 [ 3796.619417] [<c024982b>] ? activate_page+0x9a/0xa2 [ 3796.619417] [<c0322c5f>] ? ext2_free_branches+0x12d/0x190 [ 3796.619417] [<c02820ef>] ? __getblk+0x27/0x294 [ 3796.619417] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 3796.619417] [<c032320c>] ? ext2_truncate+0x54a/0x5a7 [ 3796.619417] [<c0322431>] ? ext2_update_inode+0x1a0/0x31f [ 3796.619417] [<c0543169>] ? _spin_unlock+0x1d/0x20 [ 3796.619417] [<c0323ae8>] ? ext2_delete_inode+0x7e/0x86 [ 3796.619417] [<c0323a6a>] ? ext2_delete_inode+0x0/0x86 [ 3796.619417] [<c027558b>] ? generic_delete_inode+0x62/0xd5 [ 3796.619417] [<c0275711>] ? generic_drop_inode+0x113/0x162 [ 3796.619417] [<c027469c>] ? iput+0x47/0x4e [ 3796.619417] [<c026bb14>] ? do_unlinkat+0xc3/0x13d [ 3796.619417] [<c054225f>] ? mutex_unlock+0x8/0xa [ 3796.619417] [<c026dc08>] ? vfs_readdir+0x60/0x85 [ 3796.619417] [<c026d94c>] ? filldir64+0x0/0xd7 [ 3796.619417] [<c026dcc4>] ? sys_getdents64+0x97/0xa1 [ 3796.619417] [<c026bc86>] ? sys_unlinkat+0x23/0x36 [ 3796.619417] [<c0202dfe>] ? syscall_call+0x7/0xb [ 3796.619417] ======================= [ 3796.619417] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 [ 3796.619417] EIP: [<c031f151>] read_block_bitmap+0xa7/0x14b SS:ESP 0068:c7bd5d4c [ 3796.619417] ---[ end trace 827753bbb10816b5 ]---
Backtrace, filesystem hdb.10000057, with some of the messages removed to fit in a 64k comment: ***** zzuffing ***** seed 10000057 ./runtest: line 31: 10409 Killed timeout 30 cp -r doc doc2 >&/dev/null [ 289.403150] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #11: inode out of bounds - offset=10240, inode=8388608, rec_len=1024, name_len=0 [ 289.404505] EXT2-fs error (device hdb): ext2_readdir: bad page in #11 [ 289.407140] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #1281: unaligned directory entry - offset=176, inode=33751052, rec_len=25966, name_len=116 ... ... ... ... ... ... ... [ 291.368239] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 6816452, count = 1 [ 291.370703] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1213178368, count = 1 [ 291.373268] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 7078488, count = 1 [ 291.375763] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1420214750, count = 1 [ 291.378346] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 6554412, count = 1 [ 291.380783] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1479317508, count = 1 [ 291.383262] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4718764, count = 1 [ 291.385706] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 1024 [ 291.387722] attempt to access beyond end of device [ 291.388782] hdb: rw=0, want=2425549960, limit=20480 [ 291.390063] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=1519, block=1212774979 [ 291.392175] attempt to access beyond end of device [ 291.393250] hdb: rw=0, want=2425704338, limit=20480 [ 291.394536] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=1519, block=1212852168 [ 291.396873] attempt to access beyond end of device [ 291.398062] hdb: rw=0, want=2425051470, limit=20480 [ 291.399468] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=1519, block=1212525734 [ 291.402177] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 16 [ 291.404377] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 131072, count = 1 [ 291.407194] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 1 [ 291.409154] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 2048 [ 291.411169] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 16384, count = 1 [ 291.413593] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 4096 [ 291.415624] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 33816576, count = 1 [ 291.418238] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks in system zones - Block = 128, count = 1 [ 291.420690] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 67108864, count = 1 [ 291.423224] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 1024 [ 291.425259] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 8208 [ 291.427317] attempt to access beyond end of device [ 291.428366] hdb: rw=0, want=264146, limit=20480 [ 291.429548] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=1519, block=132072 [ 291.431780] BUG: unable to handle kernel paging request at c11e3ffc [ 291.433237] IP: [<c031f151>] read_block_bitmap+0xa7/0x14b [ 291.434486] *pde = 078b8163 *pte = 011e3160 [ 291.435512] Oops: 0000 [#1] DEBUG_PAGEALLOC [ 291.435763] [ 291.435763] Pid: 10422, comm: rm Not tainted (2.6.27-rc2 #4) [ 291.435763] EIP: 0060:[<c031f151>] EFLAGS: 00000246 CPU: 0 [ 291.435763] EIP is at read_block_bitmap+0xa7/0x14b [ 291.435763] EAX: ffffffff EBX: c11e4800 ECX: c11e4000 EDX: 00000001 [ 291.435763] ESI: c74e35e8 EDI: c7ac8400 EBP: c7b38cfc ESP: c7b38ccc [ 291.435763] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 291.435763] Process rm (pid: 10422, ti=c7b38000 task=c78acbc0 task.ti=c7b38000) [ 291.435763] Stack: c740aa64 c740aa68 c740aa64 c740aad8 00000002 c7a05500 00000000 00000000 [ 291.435763] c11e4000 c11e4400 01b01e00 00000003 c7b38d60 c031f2ef c7b38d74 00000296 [ 291.435763] c7b38d90 c0878546 c7b38d84 c740a954 c1025c80 00000029 00000000 0000002a [ 291.435763] Call Trace: [ 291.435763] [<c031f2ef>] ? ext2_free_blocks+0xfa/0x2b7 [ 291.435763] [<c0322c5f>] ? ext2_free_branches+0x12d/0x190 [ 291.435763] [<c02820ef>] ? __getblk+0x27/0x294 [ 291.435763] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 291.435763] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 291.435763] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 291.435763] [<c03231b4>] ? ext2_truncate+0x4f2/0x5a7 [ 291.435763] [<c02441f2>] ? mempool_free+0x6b/0x73 [ 291.435763] [<c0281ad0>] ? submit_bh+0xd2/0xf1 [ 291.435763] [<c0282f82>] ? sync_dirty_buffer+0x60/0xd5 [ 291.435763] [<c0322431>] ? ext2_update_inode+0x1a0/0x31f [ 291.435763] [<c0543169>] ? _spin_unlock+0x1d/0x20 [ 291.435763] [<c0323ae8>] ? ext2_delete_inode+0x7e/0x86 [ 291.435763] [<c0323a6a>] ? ext2_delete_inode+0x0/0x86 [ 291.435763] [<c027558b>] ? generic_delete_inode+0x62/0xd5 [ 291.435763] [<c0275711>] ? generic_drop_inode+0x113/0x162 [ 291.435763] [<c027469c>] ? iput+0x47/0x4e [ 291.435763] [<c026bb14>] ? do_unlinkat+0xc3/0x13d [ 291.435763] [<c054225f>] ? mutex_unlock+0x8/0xa [ 291.435763] [<c026dc08>] ? vfs_readdir+0x60/0x85 [ 291.435763] [<c026d94c>] ? filldir64+0x0/0xd7 [ 291.435763] [<c026dcc4>] ? sys_getdents64+0x97/0xa1 [ 291.435763] [<c026bc86>] ? sys_unlinkat+0x23/0x36 [ 291.435763] [<c0202dfe>] ? syscall_call+0x7/0xb [ 291.435763] ======================= [ 291.435763] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 [ 291.435763] EIP: [<c031f151>] read_block_bitmap+0xa7/0x14b SS:ESP 0068:c7b38ccc [ 291.435763] ---[ end trace 23466f100f17dfb7 ]---
Backtrace for filesystem hdb.20000113.txt: ***** zzuffing ***** seed 20000113 [ 481.954567] attempt to access beyond end of device [ 481.955149] hdb: rw=0, want=4213772, limit=20480 [ 481.955453] attempt to access beyond end of device [ 481.955686] hdb: rw=0, want=4213772, limit=20480 [ 482.008464] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #1573: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0 [ 482.009420] EXT2-fs error (device hdb): ext2_readdir: bad page in #1573 [ 482.016734] attempt to access beyond end of device [ 482.016734] hdb: rw=0, want=8408544, limit=20480 [ 482.017081] Buffer I/O error on device hdb, logical block 4204271 [ 482.018544] attempt to access beyond end of device [ 482.018808] hdb: rw=0, want=8408544, limit=20480 [ 482.019047] Buffer I/O error on device hdb, logical block 4204271 [ 482.035546] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #1601: directory entry across blocks - offset=96, inode=1605, rec_len=32792, name_len=16 [ 482.036456] EXT2-fs error (device hdb): ext2_readdir: bad page in #1601 [ 482.085122] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #1281: rec_len is smaller than minimal - offset=1116, inode=1658, rec_len=4, name_len=3 [ 482.086043] EXT2-fs error (device hdb): ext2_readdir: bad page in #1281 [ 482.091082] EXT2-fs error (device hdb): ext2_readdir: bad page in #1573 [ 482.093049] EXT2-fs error (device hdb): ext2_readdir: bad page in #1601 [ 482.128294] EXT2-fs error (device hdb): ext2_readdir: bad page in #1281 [ 482.132962] EXT2-fs error (device hdb): ext2_readdir: bad page in #1573 [ 482.134867] EXT2-fs error (device hdb): ext2_readdir: bad page in #1601 [ 482.263682] EXT2-fs error (device hdb): ext2_readdir: bad page in #1281 [ 482.268684] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 2106885, count = 1 [ 482.271479] EXT2-fs error (device hdb): ext2_xattr_delete_inode: inode 1527: bad block -2147483648 [ 482.274609] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 4096 [ 482.275275] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 524288, count = 1 [ 482.276551] attempt to access beyond end of device [ 482.276994] hdb: rw=0, want=131074, limit=20480 [ 482.277508] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=1540, block=65536 [ 482.278579] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 131072, count = 1 [ 482.279277] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1073741824, count = 1 [ 482.279998] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.280691] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.281382] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.282068] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.282751] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.283433] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.284101] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 134217727, count = 1 [ 482.284769] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294959103, count = 1 [ 482.285465] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.286147] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.286977] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.287682] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.288343] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 [ 482.289033] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 2449473535, count = 1 [ 482.289731] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 3565830418, count = 1 [ 482.290419] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4294965384, count = 1 [ 482.291290] BUG: unable to handle kernel paging request at c4b97ffc [ 482.292742] IP: [<c031f151>] read_block_bitmap+0xa7/0x14b [ 482.294030] *pde = 06d6a163 *pte = 04b97160 [ 482.295075] Oops: 0000 [#1] DEBUG_PAGEALLOC [ 482.296058] [ 482.296458] Pid: 21153, comm: rm Not tainted (2.6.27-rc2 #4) [ 482.296734] EIP: 0060:[<c031f151>] EFLAGS: 00000246 CPU: 0 [ 482.296734] EIP is at read_block_bitmap+0xa7/0x14b [ 482.296734] EAX: ffffffff EBX: c4b98800 ECX: c4b98000 EDX: 00000001 [ 482.296734] ESI: c2c24700 EDI: c7ac7800 EBP: c1ac0d3c ESP: c1ac0d0c [ 482.296734] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 482.296734] Process rm (pid: 21153, ti=c1ac0000 task=c7890000 task.ti=c1ac0000) [ 482.296734] Stack: c1ac0d24 c0543211 c2c23dc8 c1ac0d24 c0543169 c79e7500 00000000 00000000 [ 482.296734] c4b98000 c4b98400 ffffffff 00000001 c1ac0da0 c031f2ef c7ac7800 c05cc70f [ 482.296734] c05d8d94 fffff888 00000001 c740aad8 00000002 000003fe 00000000 000003ff [ 482.296734] Call Trace: [ 482.296734] [<c0543211>] ? _spin_lock+0x32/0x38 [ 482.296734] [<c0543169>] ? _spin_unlock+0x1d/0x20 [ 482.296734] [<c031f2ef>] ? ext2_free_blocks+0xfa/0x2b7 [ 482.296734] [<c0543122>] ? _spin_unlock_irq+0x1d/0x21 [ 482.296734] [<c024982b>] ? activate_page+0x9a/0xa2 [ 482.296734] [<c0322c5f>] ? ext2_free_branches+0x12d/0x190 [ 482.296734] [<c02820ef>] ? __getblk+0x27/0x294 [ 482.296734] [<c0281ffa>] ? __find_get_block+0x6d/0x13b [ 482.296734] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 482.296734] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 482.296734] [<c03231e0>] ? ext2_truncate+0x51e/0x5a7 [ 482.296734] [<c0322431>] ? ext2_update_inode+0x1a0/0x31f [ 482.296734] [<c0543169>] ? _spin_unlock+0x1d/0x20 [ 482.296734] [<c0323ae8>] ? ext2_delete_inode+0x7e/0x86 [ 482.296734] [<c0323a6a>] ? ext2_delete_inode+0x0/0x86 [ 482.296734] [<c027558b>] ? generic_delete_inode+0x62/0xd5 [ 482.296734] [<c0275711>] ? generic_drop_inode+0x113/0x162 [ 482.296734] [<c027469c>] ? iput+0x47/0x4e [ 482.296734] [<c026bb14>] ? do_unlinkat+0xc3/0x13d [ 482.296734] [<c054225f>] ? mutex_unlock+0x8/0xa [ 482.296734] [<c026dc08>] ? vfs_readdir+0x60/0x85 [ 482.296734] [<c026d94c>] ? filldir64+0x0/0xd7 [ 482.296734] [<c026dcc4>] ? sys_getdents64+0x97/0xa1 [ 482.296734] [<c026bc86>] ? sys_unlinkat+0x23/0x36 [ 482.296734] [<c0202dfe>] ? syscall_call+0x7/0xb [ 482.296734] ======================= [ 482.296734] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 [ 482.296734] EIP: [<c031f151>] read_block_bitmap+0xa7/0x14b SS:ESP 0068:c1ac0d0c [ 482.296734] ---[ end trace b455507981ff20f4 ]---
Backtrace for filesystem hdb.30000019, with some error messages removed: ***** zzuffing ***** seed 30000019 [ 107.103882] attempt to access beyond end of device [ 107.105397] hdb: rw=0, want=536890588, limit=20480 [ 107.106539] Buffer I/O error on device hdb, logical block 268445293 [ 107.109260] attempt to access beyond end of device [ 107.110498] hdb: rw=0, want=536890588, limit=20480 [ 107.111538] Buffer I/O error on device hdb, logical block 268445293 [ 107.185925] EXT2-fs error (device hdb): ext2_check_page: bad entry in directory #1609: directory entry across blocks - offset=44, inode=1611, rec_len=1044, name_len=9 [ 107.189303] EXT2-fs error (device hdb): ext2_readdir: bad page in #1609 [ 107.505070] EXT2-fs error (device hdb): ext2_readdir: bad page in #1609 [ 107.699813] EXT2-fs error (device hdb): ext2_readdir: bad page in #1609 [ 108.273863] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 4096 [ 108.293188] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 33619980, count = 1 [ 108.296045] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks in system zones - Block = 46, count = 1 ... ... ... ... ... ... [ 109.129738] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 485613776, count = 1 [ 109.130288] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 7602660, count = 1 [ 109.130834] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 930139778, count = 1 [ 109.131388] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 8126708, count = 1 [ 109.131936] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1042418564, count = 1 [ 109.132494] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 6816452, count = 1 [ 109.133039] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1213178368, count = 1 [ 109.133593] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 7078488, count = 1 [ 109.134137] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1420214750, count = 1 [ 109.134693] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 6554412, count = 1 [ 109.135237] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 1479317508, count = 1 [ 109.135791] EXT2-fs error (device hdb): ext2_free_blocks: Freeing blocks not in datazone - block = 4718764, count = 1 [ 109.136341] EXT2-fs error (device hdb): ext2_free_blocks: bit already cleared for block 1024 [ 109.136743] attempt to access beyond end of device [ 109.136935] hdb: rw=0, want=2425549960, limit=20480 [ 109.137288] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=126, block=1212774979 [ 109.137744] attempt to access beyond end of device [ 109.137935] hdb: rw=0, want=2425704338, limit=20480 [ 109.138282] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=126, block=1212852168 [ 109.138691] attempt to access beyond end of device [ 109.138884] hdb: rw=0, want=2425051470, limit=20480 [ 109.139235] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=126, block=1212525734 [ 109.139712] attempt to access beyond end of device [ 109.139901] hdb: rw=0, want=264146, limit=20480 [ 109.140235] EXT2-fs error (device hdb): ext2_free_branches: Read failure, inode=126, block=132072 [ 109.140845] BUG: unable to handle kernel paging request at c71a3ffc [ 109.141143] IP: [<c031f151>] read_block_bitmap+0xa7/0x14b [ 109.141433] *pde = 078f3163 *pte = 071a3160 [ 109.141685] Oops: 0000 [#1] DEBUG_PAGEALLOC [ 109.141899] [ 109.142009] Pid: 3972, comm: rm Not tainted (2.6.27-rc2 #4) [ 109.142217] EIP: 0060:[<c031f151>] EFLAGS: 00000246 CPU: 0 [ 109.142458] EIP is at read_block_bitmap+0xa7/0x14b [ 109.142643] EAX: ffffffff EBX: c71a4800 ECX: c71a4000 EDX: 00000001 [ 109.142875] ESI: c7595a10 EDI: c7ac3400 EBP: c6bd9cc8 ESP: c6bd9c98 [ 109.143105] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 109.143312] Process rm (pid: 3972, ti=c6bd9000 task=c78f9940 task.ti=c6bd9000) [ 109.143568] Stack: c740aa64 c740aa68 c740aa64 c740aad8 00000002 c79e1500 00000000 00000000 [ 109.143995] c71a4000 c71a4400 01281ca7 00000003 c6bd9d2c c031f2ef c061da3c c6bd9cec [ 109.144407] c024982b c10e5498 c10e5480 c6bd9d24 c0948780 00000029 00000000 0000002a [ 109.144822] Call Trace: [ 109.144966] [<c031f2ef>] ? ext2_free_blocks+0xfa/0x2b7 [ 109.148071] [<c024982b>] ? activate_page+0x9a/0xa2 [ 109.149303] [<c0322c5f>] ? ext2_free_branches+0x12d/0x190 [ 109.149303] [<c02820ef>] ? __getblk+0x27/0x294 [ 109.149303] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 109.149303] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 109.149303] [<c0322c20>] ? ext2_free_branches+0xee/0x190 [ 109.149303] [<c03231b4>] ? ext2_truncate+0x4f2/0x5a7 [ 109.149303] [<c0322431>] ? ext2_update_inode+0x1a0/0x31f [ 109.149303] [<c0323ae8>] ? ext2_delete_inode+0x7e/0x86 [ 109.149303] [<c0323a6a>] ? ext2_delete_inode+0x0/0x86 [ 109.149303] [<c027558b>] ? generic_delete_inode+0x62/0xd5 [ 109.149303] [<c0275711>] ? generic_drop_inode+0x113/0x162 [ 109.149303] [<c027469c>] ? iput+0x47/0x4e [ 109.149303] [<c0272117>] ? dentry_iput+0x6b/0xb1 [ 109.149303] [<c02721c9>] ? d_kill+0x1d/0x37 [ 109.149303] [<c0273afb>] ? dput+0x58/0x10a [ 109.149303] [<c026bc44>] ? do_rmdir+0xa4/0xc3 [ 109.149303] [<c026bc94>] ? sys_unlinkat+0x31/0x36 [ 109.149303] [<c0202dfe>] ? syscall_call+0x7/0xb [ 109.149303] ======================= [ 109.149303] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 [ 109.149303] EIP: [<c031f151>] read_block_bitmap+0xa7/0x14b SS:ESP 0068:c6bd9c98 [ 109.149303] ---[ end trace a0739d57ca47c4e4 ]---
Reply-To: akpm@linux-foundation.org (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 7 Aug 2008 05:53:37 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11266 > > Summary: unable to handle kernel paging request in > ext2_free_blocks > Product: File System > Version: 2.5 > KernelVersion: 2.6.27-rc2 + patch for #10976 (now in -mm) > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: ext2 > AssignedTo: akpm@osdl.org > ReportedBy: sliedes@cc.hut.fi > > > Latest working kernel version: (I think at least 2.6.25.4 works) > Earliest failing kernel version: > Distribution: Minimal Debian sid (unstable) > Hardware Environment: qemu x86 > Software Environment: > Problem Description: > > Mere rm -rf after mounting on an intentionally corrupted partition > occasionally > causes "BUG: unable to handle kernel paging request" in ext2_free_blocks. > > Unfortunately the issue seems to be timing sensitive (or something), doing it > on the same filesystem only sometimes results in the crash :( But I have > reproduced it something like 6 times now with brief testing. > > If you wish, I can attach some filesystems with which I have been able to > reproduce this at least once. > > Another thing I could do is take a look at it with the new kernel debugger > (which I haven't tried yet) if none of you are able to figure out this from > the > traces. Is there something you would specifically want me to take a look at? > The local and referenced variables at ext2_free_blocks(), I guess? > > I think I ran quite extensive tests on 2.6.25.4 & ext2, so I suspect (but am > not sure, I've made some changes to the way I test) this bug is newer than > 2.6.25.4. I could do some bisecting too, but I haven't managed to automate > the > thing yet. > > Here's a script I run under qemu, google for zzuf (it's a fuzzer), and > timeout > is from the Debian package `timeout': > > ---------- > #!/bin/sh > > if [ "`hostname`" != "fstest" ]; then > echo "This is a dangerous script." > echo "Set your hostname to \`fstest\' if you want to use it." > exit 1 > fi > > umount /dev/hdb > umount /dev/hdc > /etc/init.d/sysklogd stop > /etc/init.d/klogd stop > /etc/init.d/cron stop > mount /dev/hda / -t ext3 -o remount,ro || exit 1 > > ulimit -t 20 > > for ((s=$1; s<1000000000; s++)); do > umount /mnt > echo '***** zzuffing *****' seed $s > zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit > mount /dev/hdb /mnt -o errors=continue || continue > cd /mnt || continue > cp -r doc doc2 >&/dev/null > find -xdev >&/dev/null > find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null > mkdir tmp >&/dev/null > echo whoah >tmp/filu 2>/dev/null > rm -rf /mnt/* >&/dev/null > cd / > done > ---------- > > The attached backtraces all start from the time of mounting the filesystem. > Yes, please do test 2.6.26. Aneesh, your recent changes to the ext2 block allocator would have to be prime suspects here.
On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote: > Yes, please do test 2.6.26. Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2. Sami
On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote: > On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote: > > Yes, please do test 2.6.26. > > Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2. 2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4 working (unless something changed between those two versions). Sami
> On Thu, Aug 07, 2008 at 11:07:17PM +0300, Sami Liedes wrote: > > On Thu, Aug 07, 2008 at 10:52:51AM -0700, Andrew Morton wrote: > > > Yes, please do test 2.6.26. > > > > Did that. I can reproduce the same crash on 2.6.26 and 2.6.26.2. > > 2.6.25.15 crashes too, so I might have been wrong about 2.6.25.4 > working (unless something changed between those two versions). I think this is the same problem Vegard reported in http://marc.info/?l=linux-ext4&m=121637999611618&w=2. The problem seems to be in ext2_valid_block_bitmap() which does bitmap_blk = le32_to_cpu(desc->bg_block_bitmap); offset = bitmap_blk - group_first_block; if (!ext2_test_bit(offset, bh->b_data)) (and similarly for inode bitmap). Now when the group descriptor is corrupted, this simply accesses beyond the bh->b_data... The patch below should hopefully fix the issue. Can you test it please? Honza -- Jan Kara <jack@suse.cz> SuSE CR Labs --- From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001 From: Jan Kara <jack@suse.cz> Date: Mon, 18 Aug 2008 16:45:11 +0200 Subject: [PATCH] ext2: Check for corrupted group descriptor before using data in it We have to check whether a group descriptor isn't corrupted in read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try to access bits outside of bitmap and Oops happens. CC: Vegard Nossum <vegard.nossum@gmail.com> CC: Sami Liedes <sliedes@cc.hut.fi> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext2/balloc.c | 29 +++++++++++++++++++++++++++++ 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c index 10bb02c..9104712 100644 --- a/fs/ext2/balloc.c +++ b/fs/ext2/balloc.c @@ -113,6 +113,17 @@ err_out: return 0; } +static int ext2_block_in_group(struct super_block *sb, + unsigned int block_group, ext2_fsblk_t block) +{ + if (block < ext2_group_first_block_no(sb, block_group)) + return 0; + if (block >= ext2_group_first_block_no(sb, block_group) + + EXT2_BLOCKS_PER_GROUP(sb)) + return 0; + return 1; +} + /* * Read the bitmap for a given block_group,and validate the * bits for block/inode/inode tables are set in the bitmaps @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int block_group) desc = ext2_get_group_desc(sb, block_group, NULL); if (!desc) return NULL; + if (!ext2_block_in_group(sb, block_group, + le32_to_cpu(desc->bg_block_bitmap)) || + !ext2_block_in_group(sb, block_group, + le32_to_cpu(desc->bg_inode_bitmap)) || + !ext2_block_in_group(sb, block_group, + le32_to_cpu(desc->bg_inode_table)) || + !ext2_block_in_group(sb, block_group, + le32_to_cpu(desc->bg_inode_table) + + EXT2_SB(sb)->s_itb_per_group - 1)) { + ext2_error(sb, __func__, "Corrupted group descriptor - " + "block_group = %u, block_bitmap = %u, " + "inode_bitmap = %u, inode_table = %u", + block_group, + le32_to_cpu(desc->bg_block_bitmap), + le32_to_cpu(desc->bg_inode_bitmap), + le32_to_cpu(desc->bg_inode_table)); + return NULL; + } bitmap_blk = le32_to_cpu(desc->bg_block_bitmap); bh = sb_getblk(sb, bitmap_blk); if (unlikely(!bh)) {
On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote: > > From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001 > From: Jan Kara <jack@suse.cz> > Date: Mon, 18 Aug 2008 16:45:11 +0200 > Subject: [PATCH] ext2: Check for corrupted group descriptor before using data > in it > > We have to check whether a group descriptor isn't corrupted in > read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try > to access bits outside of bitmap and Oops happens. > > CC: Vegard Nossum <vegard.nossum@gmail.com> > CC: Sami Liedes <sliedes@cc.hut.fi> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/ext2/balloc.c | 29 +++++++++++++++++++++++++++++ > 1 files changed, 29 insertions(+), 0 deletions(-) > > diff --git a/fs/ext2/balloc.c b/fs/ext2/balloc.c > index 10bb02c..9104712 100644 > --- a/fs/ext2/balloc.c > +++ b/fs/ext2/balloc.c > @@ -113,6 +113,17 @@ err_out: > return 0; > } > > +static int ext2_block_in_group(struct super_block *sb, > + unsigned int block_group, ext2_fsblk_t block) > +{ > + if (block < ext2_group_first_block_no(sb, block_group)) > + return 0; > + if (block >= ext2_group_first_block_no(sb, block_group) + > + EXT2_BLOCKS_PER_GROUP(sb)) > + return 0; > + return 1; > +} > + > /* > * Read the bitmap for a given block_group,and validate the > * bits for block/inode/inode tables are set in the bitmaps > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int > block_group) > desc = ext2_get_group_desc(sb, block_group, NULL); > if (!desc) > return NULL; > + if (!ext2_block_in_group(sb, block_group, > + le32_to_cpu(desc->bg_block_bitmap)) || > + !ext2_block_in_group(sb, block_group, > + le32_to_cpu(desc->bg_inode_bitmap)) || > + !ext2_block_in_group(sb, block_group, > + le32_to_cpu(desc->bg_inode_table)) || > + !ext2_block_in_group(sb, block_group, > + le32_to_cpu(desc->bg_inode_table) + > + EXT2_SB(sb)->s_itb_per_group - 1)) { > + ext2_error(sb, __func__, "Corrupted group descriptor - " > + "block_group = %u, block_bitmap = %u, " > + "inode_bitmap = %u, inode_table = %u", > + block_group, > + le32_to_cpu(desc->bg_block_bitmap), > + le32_to_cpu(desc->bg_inode_bitmap), > + le32_to_cpu(desc->bg_inode_table)); > + return NULL; > + } > bitmap_blk = le32_to_cpu(desc->bg_block_bitmap); > bh = sb_getblk(sb, bitmap_blk); > if (unlikely(!bh)) { Do we need to do this validation every time we do a read_block_bitmap ? I guess we need to move the validation where we read the desc blocks from the disk. -aneesh
On Mon, Aug 18, 2008 at 07:58:46AM -0700, bugme-daemon@bugzilla.kernel.org wrote: > I think this is the same problem Vegard reported in > http://marc.info/?l=linux-ext4&m=121637999611618&w=2. > The problem seems to be in ext2_valid_block_bitmap() which does > > bitmap_blk = le32_to_cpu(desc->bg_block_bitmap); > offset = bitmap_blk - group_first_block; > if (!ext2_test_bit(offset, bh->b_data)) > > (and similarly for inode bitmap). Now when the group descriptor is > corrupted, this simply accesses beyond the bh->b_data... > The patch below should hopefully fix the issue. Can you test it > please? Yes, I've ran it for a few hours * 4 qemus now and have not seen the bug again, so it's probably fixed by this patch. Sami
Reply-To: adilger@sun.com On Aug 18, 2008 22:21 +0530, Aneesh Kumar wrote: > > +static int ext2_block_in_group(struct super_block *sb, > > + unsigned int block_group, ext2_fsblk_t block) > > +{ > > + if (block < ext2_group_first_block_no(sb, block_group)) > > + return 0; > > + if (block >= ext2_group_first_block_no(sb, block_group) + > > + EXT2_BLOCKS_PER_GROUP(sb)) > > + return 0; > > + return 1; > > +} > > + > > /* > > * Read the bitmap for a given block_group,and validate the > > * bits for block/inode/inode tables are set in the bitmaps > > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned int > block_group) > > desc = ext2_get_group_desc(sb, block_group, NULL); > > if (!desc) > > return NULL; > > + if (!ext2_block_in_group(sb, block_group, > > + le32_to_cpu(desc->bg_block_bitmap)) || > > + !ext2_block_in_group(sb, block_group, > > + le32_to_cpu(desc->bg_inode_bitmap)) || > > + !ext2_block_in_group(sb, block_group, > > + le32_to_cpu(desc->bg_inode_table)) || > > + !ext2_block_in_group(sb, block_group, > > + le32_to_cpu(desc->bg_inode_table) + > > + EXT2_SB(sb)->s_itb_per_group - 1)) { Isn't equivalent checking done in ext2_check_descriptors()? It would make sense to abstract out the "check one group and return error" code and use it in both places. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
On Mon 18-08-08 21:24:10, Andreas Dilger wrote: > On Aug 18, 2008 22:21 +0530, Aneesh Kumar wrote: > > > +static int ext2_block_in_group(struct super_block *sb, > > > + unsigned int block_group, ext2_fsblk_t block) > > > +{ > > > + if (block < ext2_group_first_block_no(sb, block_group)) > > > + return 0; > > > + if (block >= ext2_group_first_block_no(sb, block_group) + > > > + EXT2_BLOCKS_PER_GROUP(sb)) > > > + return 0; > > > + return 1; > > > +} > > > + > > > /* > > > * Read the bitmap for a given block_group,and validate the > > > * bits for block/inode/inode tables are set in the bitmaps > > > @@ -129,6 +140,24 @@ read_block_bitmap(struct super_block *sb, unsigned > int block_group) > > > desc = ext2_get_group_desc(sb, block_group, NULL); > > > if (!desc) > > > return NULL; > > > + if (!ext2_block_in_group(sb, block_group, > > > + le32_to_cpu(desc->bg_block_bitmap)) || > > > + !ext2_block_in_group(sb, block_group, > > > + le32_to_cpu(desc->bg_inode_bitmap)) || > > > + !ext2_block_in_group(sb, block_group, > > > + le32_to_cpu(desc->bg_inode_table)) || > > > + !ext2_block_in_group(sb, block_group, > > > + le32_to_cpu(desc->bg_inode_table) + > > > + EXT2_SB(sb)->s_itb_per_group - 1)) { > > Isn't equivalent checking done in ext2_check_descriptors()? It would make > sense to abstract out the "check one group and return error" code and use > it in both places. Actually yes, it is. Good point. Sami, is it the case that you have mounted the filesystem, then intentionally corrupted it and after that the kernel oopsed (as opposed to first corrupting the filesystem image and mounting it after that)? That would explain how corrupted values could get to read_block_bitmap() even though ext2_check_descriptors() checked them. Honza
On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote: > > Isn't equivalent checking done in ext2_check_descriptors()? It would make > > sense to abstract out the "check one group and return error" code and use > > it in both places. > Actually yes, it is. Good point. Sami, is it the case that you have > mounted the filesystem, then intentionally corrupted it and after that > the kernel oopsed (as opposed to first corrupting the filesystem image and > mounting it after that)? That would explain how corrupted values could get > to read_block_bitmap() even though ext2_check_descriptors() checked them. No, that's not what I do. I corrupt the fs before mounting it, then mount it, perform normal filesystem operations on it and unmount it. Here's the most current script I use (zzuf is the fuzzer): ------------------------------------------------------------ #!/bin/sh if [ "`hostname`" != "fstest" ]; then echo "This is a dangerous script." echo "Set your hostname to \`fstest\' if you want to use it." exit 1 fi umount /dev/hdb umount /dev/hdc /etc/init.d/sysklogd stop /etc/init.d/klogd stop /etc/init.d/cron stop mount /dev/hda / -t ext3 -o remount,ro || exit 1 #ulimit -t 20 for ((s=$1; s<1000000000; s++)); do umount /mnt echo '***** zzuffing *****' seed $s zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit mount /dev/hdb /mnt -t ext2 -o errors=continue || continue cd /mnt || continue timeout 30 cp -r doc doc2 >&/dev/null timeout 30 find -xdev >&/dev/null timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null timeout 30 mkdir tmp >&/dev/null timeout 30 echo whoah >tmp/filu 2>/dev/null timeout 30 rm -rf /mnt/* >&/dev/null cd / done ------------------------------------------------------------ Sami
On Mon, Aug 18, 2008 at 04:58:41PM +0200, Jan Kara wrote: > From 06953717138efe3ad535e78343beb7204ac0d274 Mon Sep 17 00:00:00 2001 > From: Jan Kara <jack@suse.cz> > Date: Mon, 18 Aug 2008 16:45:11 +0200 > Subject: [PATCH] ext2: Check for corrupted group descriptor before using data > in it > > We have to check whether a group descriptor isn't corrupted in > read_block_bitmap(). Otherwise ext2_valid_block_bitmap() will try > to access bits outside of bitmap and Oops happens. I think something similar is needed for ext3, or at least the backtrace looks similar to me (tell me if you want me to file a separate bug for it): ------------------------------------------------------------ [ 1303.485714] EXT3-fs unexpected failure: !jh->b_committed_data; [ 1303.485714] inconsistent data on disk [ 1303.485714] BUG: unable to handle kernel paging request at c7edfffc [ 1303.485714] IP: [<c02ddca9>] read_block_bitmap+0xa3/0x147 [ 1303.485714] *pde = 00007067 *pte = 07edf160 [ 1303.485714] Oops: 0000 [#1] DEBUG_PAGEALLOC [ 1303.485714] [ 1303.485714] Pid: 17001, comm: rm Not tainted (2.6.27-rc3 #2) [ 1303.485714] EIP: 0060:[<c02ddca9>] EFLAGS: 00000246 CPU: 0 [ 1303.485714] EIP is at read_block_bitmap+0xa3/0x147 [ 1303.485714] EAX: ffffffff EBX: c7ee0800 ECX: c7ee0000 EDX: 00000001 [ 1303.485714] ESI: c3c40690 EDI: c7abd000 EBP: c79c4c9c ESP: c79c4c6c [ 1303.485714] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 1303.485714] Process rm (pid: 17001, ti=c79c4000 task=c79189a0 task.ti=c79c4000) [ 1303.485714] Stack: 00000246 00000001 00000246 c7abda3c c7413aa0 c5d7f800 00000000 00000000 [ 1303.485714] c7ee0000 00000000 00000000 c3c25064 c79c4cf4 c02dde1f c3c405b0 c79c4ccc [ 1303.485714] c0317987 00000001 c0314a9b 00000029 0000002a c7abd000 c7440000 c5d7f8ac [ 1303.485714] Call Trace: [ 1303.485714] [<c02dde1f>] ? ext3_free_blocks_sb+0x93/0x3d6 [ 1303.485714] [<c0317987>] ? journal_revoke+0x81/0xe3 [ 1303.485714] [<c0314a9b>] ? do_get_write_access+0x381/0x49c [ 1303.485714] [<c02ed428>] ? __ext3_journal_revoke+0x1e/0x44 [ 1303.485714] [<c02de18d>] ? ext3_free_blocks+0x2b/0x7f [ 1303.485714] [<c02e3694>] ? ext3_clear_blocks+0x11f/0x141 [ 1303.485714] [<c02e377a>] ? ext3_free_data+0xc4/0x133 [ 1303.485714] [<c02e3a0e>] ? ext3_free_branches+0x225/0x22d [ 1303.485714] [<c02e3891>] ? ext3_free_branches+0xa8/0x22d [ 1303.485714] [<c02e3891>] ? ext3_free_branches+0xa8/0x22d [ 1303.485714] [<c02e407d>] ? ext3_truncate+0x667/0x8af [ 1303.485714] [<c03153e2>] ? journal_start+0xb2/0x112 [ 1303.485714] [<c031540d>] ? journal_start+0xdd/0x112 [ 1303.485714] [<c03153e2>] ? journal_start+0xb2/0x112 [ 1303.485714] [<c02eb243>] ? ext3_journal_start_sb+0x29/0x4a [ 1303.485714] [<c02e4389>] ? ext3_delete_inode+0xc4/0xdb [ 1303.485714] [<c02e42c5>] ? ext3_delete_inode+0x0/0xdb [ 1303.485714] [<c0276c2b>] ? generic_delete_inode+0x62/0xd5 [ 1303.485714] [<c0276db1>] ? generic_drop_inode+0x113/0x162 [ 1303.485714] [<c0275d3c>] ? iput+0x47/0x4e [ 1303.485714] [<c02737a7>] ? dentry_iput+0x6b/0xb1 [ 1303.485714] [<c0273859>] ? d_kill+0x1d/0x37 [ 1303.485714] [<c027519b>] ? dput+0x58/0x10a [ 1303.485714] [<c026d2a4>] ? do_rmdir+0xa4/0xc3 [ 1303.485714] [<c026d2f4>] ? sys_unlinkat+0x31/0x36 [ 1303.485714] [<c0202f3e>] ? syscall_call+0x7/0xb [ 1303.485714] ======================= [ 1303.485714] Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 0 1 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 [ 1303.485714] EIP: [<c02ddca9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c79c4c6c [ 1303.485714] ---[ end trace ba199677255b7e73 ]--- ------------------------------------------------------------ $ addr2line -e vmlinux -i 0xc02ddca9 include/asm/bitops.h:305 fs/ext3/balloc.c:98 fs/ext3/balloc.c:167 98 if (!ext3_test_bit(offset, bh->b_data)) 99 /* bad block bitmap */ 100 goto err_out; ------------------------------------------------------------ Sami
> On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote: > > > Isn't equivalent checking done in ext2_check_descriptors()? It would > make > > > sense to abstract out the "check one group and return error" code and use > > > it in both places. > > Actually yes, it is. Good point. Sami, is it the case that you have > > mounted the filesystem, then intentionally corrupted it and after that > > the kernel oopsed (as opposed to first corrupting the filesystem image and > > mounting it after that)? That would explain how corrupted values could get > > to read_block_bitmap() even though ext2_check_descriptors() checked them. > > No, that's not what I do. I corrupt the fs before mounting it, then > mount it, perform normal filesystem operations on it and unmount it. OK, thanks. Then we must somehow corrupt group descriptor block during the operation. Because I'm pretty sure it *is* corrupted - the oops is: unable to handle kernel paging request at c7e95ffc. If we look into registers, we see ECX has c7e96000 (which is probably bh->b_data). In the second oops it's exactly the same - ECX has c11e4000, the oops is at address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to pass negative offset into ext2_test_bit(). But as Andreas pointed out, when we load descriptors into memory, we check that both bitmaps and inode table is in ext2_check_descriptors()... The other possibility would be that we managed to corrupts s_first_data_block in the superblock. Anyway, both possibilities don't look very likely. I'll try to reproduce the problem and maybe get more insight... How large is your filesystem BTW? > Here's the most current script I use (zzuf is the fuzzer): > > ------------------------------------------------------------ > #!/bin/sh > > if [ "`hostname`" != "fstest" ]; then > echo "This is a dangerous script." > echo "Set your hostname to \`fstest\' if you want to use it." > exit 1 > fi > > umount /dev/hdb > umount /dev/hdc > /etc/init.d/sysklogd stop > /etc/init.d/klogd stop > /etc/init.d/cron stop > mount /dev/hda / -t ext3 -o remount,ro || exit 1 > > #ulimit -t 20 > > for ((s=$1; s<1000000000; s++)); do > umount /mnt > echo '***** zzuffing *****' seed $s > zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit > mount /dev/hdb /mnt -t ext2 -o errors=continue || continue > cd /mnt || continue > timeout 30 cp -r doc doc2 >&/dev/null > timeout 30 find -xdev >&/dev/null > timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null > timeout 30 mkdir tmp >&/dev/null > timeout 30 echo whoah >tmp/filu 2>/dev/null > timeout 30 rm -rf /mnt/* >&/dev/null > cd / > done > ------------------------------------------------------------ Honza
On Wed, Aug 20, 2008 at 12:25:33PM +0200, Jan Kara wrote: > OK, thanks. Then we must somehow corrupt group descriptor block during > the operation. Because I'm pretty sure it *is* corrupted - the oops > is: unable to handle kernel paging request at c7e95ffc. If we look into > registers, we see ECX has c7e96000 (which is probably bh->b_data). In > the second oops it's exactly the same - ECX has c11e4000, the oops is at > address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to > pass negative offset into ext2_test_bit(). But as Andreas pointed out, > when we load descriptors into memory, we check that both bitmaps and > inode table is in ext2_check_descriptors()... The other possibility > would be that we managed to corrupts s_first_data_block in the > superblock. Anyway, both possibilities don't look very likely. I'll try > to reproduce the problem and maybe get more insight... How large is your > filesystem BTW? My FS is 10 MiB and tries to be diverse in its contents. It has a copy of my /dev and a small partial copy of /usr/share/doc. I put the pristine (non-corrupted) filesystem at http://www.hut.fi/~sliedes/fsdebug-hdc-ext2.bz2 (520k compressed). I've been thinking I should write a script to prepare the root filesystem for the tests, but haven't got that far yet. Basically (unless I forget some step) I use debootstrap to bootstrap a minimal Debian system, create some needed devices in it (hd[abc], ttyS0 at least), set the hostname to fstest, configure getty to listen to ttyS0, copy the script to /root/runtest (the script's first parameter is the seed) and install some Debian packages (zzuf and timeout at least). Then I make four copies of the images and run four qemus in parallel since I have four cpus, modifying the first parameter (initial seed) of the runtest script, e.g. 0, 10M, 20M, 30M. I guess the approach might be useful for those who write the code too (or people closer to them than me), since I've already found a fair number of bugs with it in a fairly short period of time (#10871, #10882, #10976, #11250, #11253, #11266 for ext[23] bugs, also one ext4 bug I hit when an ext3 fs was detected as ext4; search bugzilla for my email to see the rest of the bugs). The current root filesystem is 144M compressed (yeah, there's a lot of stuff irrelevant to the tests there), I could upload it somewhere if that helps. After that running the tests is a matter of running something like qemu -kernel bzImage -append 'root=/dev/hda console=ttyS0,115200n8' \ -hda hda -hdb hdb -hdc hdc -nographic -serial pty , attaching a screen session to the allocated pty, logging in as root and running ./runtest $seed. Also the tests are not as comprehensive as I'd like. As an example, some years ago I stress tested reiser4 (it was already "ready") with pretty mundane operations (without corrupting the fs) and it worked, but I've got it to break badly at three separate times in separate ways just by normally using Debian's aptitude - the breakage was in flock(), and the current tests don't test flock()). Other things to test would be at least hard links and fifos... The level of automation isn't quite what I'd like either, optimally there would just be a single script that takes the kernel image, filesystem type and number of parallel instances as arguments and runs the tests. Sami
Reply-To: adilger@sun.com On Aug 20, 2008 12:25 +0200, Jan Kara wrote: > > On Tue, Aug 19, 2008 at 11:13:39AM +0200, Jan Kara wrote: > > > > Isn't equivalent checking done in ext2_check_descriptors()? It would > make > > > > sense to abstract out the "check one group and return error" code and > use > > > > it in both places. > > > Actually yes, it is. Good point. Sami, is it the case that you have > > > mounted the filesystem, then intentionally corrupted it and after that > > > the kernel oopsed (as opposed to first corrupting the filesystem image > and > > > mounting it after that)? That would explain how corrupted values could > get > > > to read_block_bitmap() even though ext2_check_descriptors() checked them. > > > > No, that's not what I do. I corrupt the fs before mounting it, then > > mount it, perform normal filesystem operations on it and unmount it. > OK, thanks. Then we must somehow corrupt group descriptor block during > the operation. Oh, interesting... The data in the journal is probably corrupt, but all of the superblock/gdt sanity checks are done BEFORE the journal is replayed. It would seem that the ext*_fill_super() code should do the sanity checks, and then recheck the superblock and group descriptors after the journal is replayed. The superblock checking code can be moved out of ext*_fill_super() into a helper function like ext*_check_super()) and then calling ext*_check_super() and ext*_check_descriptors() again after journal replay. Having journal checksums enabled (ext4) would also detect this problem before the journal replay corrupts the filesystem metadata. It doesn't look possible that we can do journal recovery before loading the GDT because ext*_load_journal()->ext*_get_journal() is doing iget() and this needs the GDT to read the journal inode. It might also make sense to just clean up the superblock and group descriptor table and goto the beginning of fill_super() because in some cases the superblock contents may have changed in important ways (e.g. crash after resize of the filesystem which is only in the journal). > Because I'm pretty sure it *is* corrupted - the oops > is: unable to handle kernel paging request at c7e95ffc. If we look into > registers, we see ECX has c7e96000 (which is probably bh->b_data). In > the second oops it's exactly the same - ECX has c11e4000, the oops is at > address c11e3ffc. So in both cases it is ECX-4. So somehow we managed to > pass negative offset into ext2_test_bit(). But as Andreas pointed out, > when we load descriptors into memory, we check that both bitmaps and > inode table is in ext2_check_descriptors()... The other possibility > would be that we managed to corrupts s_first_data_block in the > superblock. Anyway, both possibilities don't look very likely. I'll try > to reproduce the problem and maybe get more insight... How large is your > filesystem BTW? > > > Here's the most current script I use (zzuf is the fuzzer): > > > > ------------------------------------------------------------ > > #!/bin/sh > > > > if [ "`hostname`" != "fstest" ]; then > > echo "This is a dangerous script." > > echo "Set your hostname to \`fstest\' if you want to use it." > > exit 1 > > fi > > > > umount /dev/hdb > > umount /dev/hdc > > /etc/init.d/sysklogd stop > > /etc/init.d/klogd stop > > /etc/init.d/cron stop > > mount /dev/hda / -t ext3 -o remount,ro || exit 1 > > > > #ulimit -t 20 > > > > for ((s=$1; s<1000000000; s++)); do > > umount /mnt > > echo '***** zzuffing *****' seed $s > > zzuf -r 0:0.03 -s $s </dev/hdc >/dev/hdb || exit > > mount /dev/hdb /mnt -t ext2 -o errors=continue || continue > > cd /mnt || continue > > timeout 30 cp -r doc doc2 >&/dev/null > > timeout 30 find -xdev >&/dev/null > > timeout 30 find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null > > timeout 30 mkdir tmp >&/dev/null > > timeout 30 echo whoah >tmp/filu 2>/dev/null > > timeout 30 rm -rf /mnt/* >&/dev/null > > cd / > > done > > ------------------------------------------------------------ Oh, hmm, this is ext2 and not ext3, so no journal... I guess my bug is still valid, but just not this one? Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.
I now have found an ext3 filesystem for which this bug happens pretty reproducibly on 2.6.27.4. Increasing commit interval seems to help it happen, otherwise the journal can be aborted and then the bug no longer happens. I do realize that this report is for the ext2 bug, but I hope finding a similar bug on ext3 might help (and even if this is a separate bug, this information should help resolve it). Here's how to do it: 1. bunzip2 the attached filesystem image hdb.10000097.bz2 (I did the following inside qemu, hence /dev/hdb) 2. mount /dev/hdb /mnt -t ext3 -o errors=continue,commit=300 3. cd /mnt 4. timeout 30 cp -r doc doc2 >&/dev/null (or manually break cp after 30 seconds, it's jammed anyway) 6. find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null 7. mkdir tmp >&/dev/null 8. echo whoah >tmp/filu 2>/dev/null 9. rm -rf /mnt/* >&/dev/null 10. while completing rm -rf, the following oops occurs: ------------------------------------------------------------ EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 8234, count = 1 EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk ext3_forget: aborting transaction: IO failure in __ext3_journal_forget EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 42, count = 3 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 25630524, count = 1 EXT3-fs error (device hdb) in ext3_free_blocks_sb: Readonly filesystem EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk BUG: unable to handle kernel paging request at c13fbbfc IP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 *pde = 07886163 *pte = 013fb160 Oops: 0000 [#1] DEBUG_PAGEALLOC Pid: 817, comm: rm Not tainted (2.6.27.4 #1) EIP: 0060:[<c02de4f9>] EFLAGS: 00000206 CPU: 0 EIP is at read_block_bitmap+0xa3/0x147 EAX: ffffdfff EBX: c13fc820 ECX: c13fc000 EDX: 00002001 ESI: c74b15b0 EDI: c7aae400 EBP: c7b7acd0 ESP: c7b7aca0 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process rm (pid: 817, ti=c7b7a000 task=c78a1ce0 task.ti=c7b7a000) Stack: 00000001 00000000 00000000 c7aaf1c0 00000246 c79cdc00 00000001 00000000 c13fc000 00000000 00000001 c163b37c c7b7ad28 c02de66f c0315003 c740aadc c7b7ad10 c7440000 c7aaf1c0 00000029 0000202a c7aae400 c7440000 c79cdcac Call Trace: [<c02de66f>] ? ext3_free_blocks_sb+0x93/0x3d6 [<c0315003>] ? journal_forget+0xff/0x1aa [<c02edd83>] ? __ext3_journal_forget+0x19/0x3f [<c02de9dd>] ? ext3_free_blocks+0x2b/0x7f [<c02e3f8c>] ? ext3_clear_blocks+0x137/0x159 [<c02e4072>] ? ext3_free_data+0xc4/0x133 [<c02e4320>] ? ext3_free_branches+0x23f/0x247 [<c02e4189>] ? ext3_free_branches+0xa8/0x247 [<c02e4189>] ? ext3_free_branches+0xa8/0x247 [<c02e498d>] ? ext3_truncate+0x665/0x8ad [<c0316062>] ? journal_start+0xb2/0x112 [<c031608d>] ? journal_start+0xdd/0x112 [<c0316062>] ? journal_start+0xb2/0x112 [<c02ebb53>] ? ext3_journal_start_sb+0x29/0x4a [<c02e4ca4>] ? ext3_delete_inode+0xcf/0xdb [<c02e4bd5>] ? ext3_delete_inode+0x0/0xdb [<c02774b3>] ? generic_delete_inode+0x62/0xd5 [<c0277639>] ? generic_drop_inode+0x113/0x16a [<c02765ac>] ? iput+0x47/0x4e [<c026d9f4>] ? do_unlinkat+0xc3/0x13d [<c054484f>] ? mutex_unlock+0x8/0xa [<c026fb0b>] ? vfs_readdir+0x60/0x85 [<c026f84c>] ? filldir64+0x0/0xd7 [<c026fbc7>] ? sys_getdents64+0x97/0xa1 [<c026db66>] ? sys_unlinkat+0x23/0x36 [<c0202f1e>] ? syscall_call+0x7/0xb ======================= Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 EIP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c7b7aca0 ---[ end trace 780108b88e07a03e ]--- ------------------------------------------------------------
Created attachment 18592 [details] Test case, corrupted ext3 file system, bzip2 compressed
Same, please don't update this bug via the bugzilla UI - nobody is looking at it. Please stick exclusively to email, vie emailed reply-to-all in the same thread.
[Sorry for duplicates, forgot to use email instead of bugzilla web interface.] I now have found an ext3 filesystem for which this bug happens pretty reproducibly on 2.6.27.4. Increasing commit interval seems to help it happen, otherwise the journal can be aborted and then the bug no longer happens. I do realize that this report is for the ext2 bug, but I hope finding a similar bug on ext3 might help (and even if this is a separate bug, this information should help resolve it). Here's how to do it: 1. bunzip2 the attached filesystem image hdb.10000097.bz2 (I did the following inside qemu, hence /dev/hdb) 2. mount /dev/hdb /mnt -t ext3 -o errors=continue,commit=300 3. cd /mnt 4. timeout 30 cp -r doc doc2 >&/dev/null (or manually break cp after 30 seconds, it's jammed anyway) 6. find -xdev -print0 2>/dev/null |xargs -0 touch -- 2>/dev/null 7. mkdir tmp >&/dev/null 8. echo whoah >tmp/filu 2>/dev/null 9. rm -rf /mnt/* >&/dev/null 10. while completing rm -rf, the following oops occurs: ------------------------------------------------------------ EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 4294967295, count = 1 EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 8234, count = 1 EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk ext3_forget: aborting transaction: IO failure in __ext3_journal_forget EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks in system zones - Block = 42, count = 3 EXT3-fs error (device hdb): ext3_free_blocks: Freeing blocks not in datazone - block = 25630524, count = 1 EXT3-fs error (device hdb) in ext3_free_blocks_sb: Readonly filesystem EXT3-fs unexpected failure: !jh->b_committed_data; inconsistent data on disk BUG: unable to handle kernel paging request at c13fbbfc IP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 *pde = 07886163 *pte = 013fb160 Oops: 0000 [#1] DEBUG_PAGEALLOC Pid: 817, comm: rm Not tainted (2.6.27.4 #1) EIP: 0060:[<c02de4f9>] EFLAGS: 00000206 CPU: 0 EIP is at read_block_bitmap+0xa3/0x147 EAX: ffffdfff EBX: c13fc820 ECX: c13fc000 EDX: 00002001 ESI: c74b15b0 EDI: c7aae400 EBP: c7b7acd0 ESP: c7b7aca0 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process rm (pid: 817, ti=c7b7a000 task=c78a1ce0 task.ti=c7b7a000) Stack: 00000001 00000000 00000000 c7aaf1c0 00000246 c79cdc00 00000001 00000000 c13fc000 00000000 00000001 c163b37c c7b7ad28 c02de66f c0315003 c740aadc c7b7ad10 c7440000 c7aaf1c0 00000029 0000202a c7aae400 c7440000 c79cdcac Call Trace: [<c02de66f>] ? ext3_free_blocks_sb+0x93/0x3d6 [<c0315003>] ? journal_forget+0xff/0x1aa [<c02edd83>] ? __ext3_journal_forget+0x19/0x3f [<c02de9dd>] ? ext3_free_blocks+0x2b/0x7f [<c02e3f8c>] ? ext3_clear_blocks+0x137/0x159 [<c02e4072>] ? ext3_free_data+0xc4/0x133 [<c02e4320>] ? ext3_free_branches+0x23f/0x247 [<c02e4189>] ? ext3_free_branches+0xa8/0x247 [<c02e4189>] ? ext3_free_branches+0xa8/0x247 [<c02e498d>] ? ext3_truncate+0x665/0x8ad [<c0316062>] ? journal_start+0xb2/0x112 [<c031608d>] ? journal_start+0xdd/0x112 [<c0316062>] ? journal_start+0xb2/0x112 [<c02ebb53>] ? ext3_journal_start_sb+0x29/0x4a [<c02e4ca4>] ? ext3_delete_inode+0xcf/0xdb [<c02e4bd5>] ? ext3_delete_inode+0x0/0xdb [<c02774b3>] ? generic_delete_inode+0x62/0xd5 [<c0277639>] ? generic_drop_inode+0x113/0x16a [<c02765ac>] ? iput+0x47/0x4e [<c026d9f4>] ? do_unlinkat+0xc3/0x13d [<c054484f>] ? mutex_unlock+0x8/0xa [<c026fb0b>] ? vfs_readdir+0x60/0x85 [<c026f84c>] ? filldir64+0x0/0xd7 [<c026fbc7>] ? sys_getdents64+0x97/0xa1 [<c026db66>] ? sys_unlinkat+0x23/0x36 [<c0202f1e>] ? syscall_call+0x7/0xb ======================= Code: 26 00 0f 88 94 00 00 00 8b 87 8c 02 00 00 89 45 e4 8b 55 e8 0f af 50 10 8b 40 34 03 50 14 8b 03 89 45 ec 8b 4e 14 89 4d f0 29 d0 <0f> a3 01 19 c0 85 c0 74 11 8b 43 04 89 45 ec 29 d0 0f a3 01 19 EIP: [<c02de4f9>] read_block_bitmap+0xa3/0x147 SS:ESP 0068:c7b7aca0 ---[ end trace 780108b88e07a03e ]--- ------------------------------------------------------------ Sami
If this is still seen on modern kernels then please re-open/update