Bug 10421

Summary: Filesystem failing without any warning
Product: File System
Reporter: Denys Fedoryshchenko (nuclearcat)
Component: XFS
Assignee: XFS Guru (xfs-masters)
Status: CLOSED CODE_FIX
Severity: high
CC: bunk, jeffrey.hundstad
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.25-rc8-git1
Regression: Yes
Bug Blocks: 9832

Description Denys Fedoryshchenko 2008-04-07 23:11:22 UTC
Latest working kernel version: 2.6.24.3
Earliest failing kernel version: 2.6.25-rc7-git3
Distribution: Custom
Hardware Environment: Sun Fire X4100
Software Environment: Customer distro with busybox tools, mount-2.12r
Problem Description:
Each night my proxy crashes.
I noticed that it cannot open some cache directory. At first it was /cache4, but after I removed it from the squid configuration and even unmounted it, /cache3 started to fail. All hardware is in good condition, and I have a similar problem on another proxy server, but there it is rarer (not every night).

After analysing the situation, here is the result:
ls /cache3 shows it as empty, while it should contain one directory, squid.

SUPERPROXY /cache3 # echo "xxx" >x
-ash: cannot create x: Input/output error

/dev/sde1 on /cache3 type xfs (rw,noatime,nodiratime,logbufs=8,noquota)

SUPERPROXY /cache3 # xfs_info /cache3
xfs_info: /cache3 is not a mounted XFS filesystem

SUPERPROXY /cache3 # cat /proc/mounts
rootfs / rootfs rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
none /dev/pts devpts rw,mode=600 0 0
none /proc/bus/usb usbfs rw 0 0
/dev/mapper/root /mnt/flash ext3 rw,noatime,errors=continue,data=ordered 0 0
/dev/sdc2 /cache1 xfs rw,noatime,nodiratime,logbufs=8,noquota 0 0
/dev/sdd1 /cache2 xfs rw,noatime,nodiratime,logbufs=8,noquota 0 0
/dev/sde1 /cache3 xfs rw,noatime,nodiratime,logbufs=8,noquota 0 0

SUPERPROXY ~ # umount /cache3
In dmesg, this appeared:
[261943.707659] xfs_force_shutdown(sde1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xf8b79dd2

SUPERPROXY ~ # xfs_repair /dev/sde1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 471077370, moving to lost+found
Phase 7 - verify and correct link counts...
done

SUPERPROXY /cache3/lost+found # ls -la
drwxr-xr-x    2 root     root           38 Apr  3 08:49 .
drwxr-xr-x    4 squid    root           35 Apr  6  2007 ..
-rw-r-----    1 squid    squid           0 Apr  1 00:38 336079762
-rw-r-----    1 squid    squid           0 Mar 19 07:11 471077370

Steps to reproduce: Difficult; I have not found any reason why it is crashing.
Comment 1 Dave Chinner 2008-04-07 23:47:57 UTC
What are the error messages and stack trace prior to the xfs_force_shutdown
message? I.e. why did the filesystem shut down? Were there any I/O errors?

If there is no message, please set /proc/sys/fs/xfs/error_level to 11
so the next time it occurs there is a stack trace and error message
prior to the shutdown.

Also, can you post the output of 'xfs_info <mntpt>' for the three filesystems
in question (this needs to be done while they are mounted and not shut down)? The
output of lspci would also be handy so we know what hardware you are actually
using....

Cheers,

Dave.
Comment 2 Denys Fedoryshchenko 2008-04-08 00:10:24 UTC
I will set error_level and wait until tonight.

SUPERPROXY ~ # xfs_info /cache1
meta-data=/dev/sdc2              isize=256    agcount=16, agsize=869894 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=13918304, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # xfs_info /cache2
meta-data=/dev/sdd1              isize=256    agcount=16, agsize=1120031 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=17920496, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # xfs_info /cache3
meta-data=/dev/sde1              isize=256    agcount=16, agsize=1120031 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=17920496, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0
SUPERPROXY ~ # xfs_info /cache4
meta-data=/dev/sdf1              isize=256    agcount=16, agsize=1120031 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=17920496, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0


SUPERPROXY ~ # lspci
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
00:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
00:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:01.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
01:01.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
01:02.0 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
01:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet Controller (Copper) (rev 03)
02:03.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1064 PCI-X Fusion-MPT SAS (rev 02)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
04:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
04:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
04:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 13)
04:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
Comment 3 Denys Fedoryshchenko 2008-04-08 13:54:15 UTC
The FS crashed again, and no new information appeared in the log, even with the debug level set to 11.

[305409.607467] xfs_force_shutdown(sdd1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xf8b79dd2
That's all I have.
SUPERPROXY /proc/sys/fs/xfs # xfs_repair -v /dev/sdd1
Phase 1 - find and verify superblock...
        - block cache size set to 376368 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 60243 tail block 60243
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 134763204, moving to lost+found
Phase 7 - verify and correct link counts...

        XFS_REPAIR Summary    Tue Apr  8 23:53:00 2008

Phase           Start           End             Duration
Phase 1:        04/08 23:52:23  04/08 23:52:24  1 second
Phase 2:        04/08 23:52:24  04/08 23:52:31  7 seconds
Phase 3:        04/08 23:52:31  04/08 23:52:57  26 seconds
Phase 4:        04/08 23:52:57  04/08 23:52:57
Phase 5:        04/08 23:52:57  04/08 23:52:57
Phase 6:        04/08 23:52:57  04/08 23:52:57
Phase 7:        04/08 23:52:57  04/08 23:52:57

Total run time: 34 seconds
done
SUPERPROXY /proc/sys/fs/xfs # /bin/mount -o noatime,nodiratime,logbufs=8 -L CACHE2 /cache2
SUPERPROXY /cache2/lost+found # ls -la
drwxr-xr-x    2 root     root           22 Apr  8 23:52 .
drwxr-xr-x    4 squid    root           35 Apr  6  2007 ..
-rw-r-----    1 squid    squid           0 Apr  2 23:40 134763204



SUPERPROXY /proc/sys/fs/xfs # grep "" *
age_buffer_centisecs:1500
error_level:11
filestream_centisecs:3000
inherit_noatime:1
inherit_nodefrag:1
inherit_nodump:1
inherit_nosymlinks:0
inherit_sync:1
irix_sgid_inherit:0
irix_symlink_mode:0
panic_mask:0
restrict_chown:1
rotorstep:1
stats_clear:0
xfsbufd_centisecs:100
xfssyncd_centisecs:3000
Comment 4 Dave Chinner 2008-04-08 17:45:15 UTC
> xfs_force_shutdown(sdd1,0x1) called from line 420 of file
> fs/xfs/xfs_rw.c.  Return address = 0xf8b79dd2

(sdd1,0x1) means that the shutdown flags were 0x1:

#define SHUTDOWN_META_IO_ERROR  0x0001  /* write attempt to metadata failed */

There are a total of 5 places this can come from:

  - xfs_trans_read_buf(): we'd see I/O error alerts in the log prior
    to the shutdown
  - xfs_inactive(): we'd see an error saying 'xfs_ifree returned an error'
  - xfs_bwrite(): won't log an extra error
  - xfs_buf_error_relse(): won't log an extra error
  - xfs_bmap_finish(): won't log an extra error

The three that don't log extra errors are a result of I/O errors or
object validation errors. Object validation errors typically log lots
more errors and stack dumps, though, so I'm tending towards a new
set of I/O errors coming from below XFS at this point....

What I really need to know is what function is at the return address
0xf8b79dd2. 'ksymoops -A 0xf8b79dd2' or something similar should give us
the symbol....

Also, can you run xfs_check on the broken filesystem before you run
xfs_repair so we can see all the errors on disk as a result of the shutdown.
Comment 5 Denys Fedoryshchenko 2008-04-08 23:38:36 UTC
SUPERPROXY /proc # cat /proc/kallsyms |grep f8b79d
f8b79ddd t xfs_write_sync_logforce      [xfs]
f8b79d37 t xfs_bwrite   [xfs]

So it looks like xfs_bwrite.

I got a little bit more last night.

Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_inotobp: xfs_imap()  returned an error 22 on sde1.  Returning error.
Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on sde1.  Returning error.
Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_inactive:    xfs_ifree() returned an error = 22 on sde1
Apr  9 03:50:53 194.146.153.28 kernel: [340245.809495] xfs_force_shutdown(sde1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xf8b797d7
Apr  9 03:50:53 194.146.153.28 kernel: [340245.845721] Filesystem "sde1": I/O Error Detected.  Shutting down filesystem: sde1
Apr  9 03:50:53 194.146.153.28 kernel: [340245.845721] Please umount the filesystem, and rectify the problem(s)

Then, when I noticed that and unmounted it:
Apr  9 08:54:32 194.146.153.28 kernel: [334261.479051] xfs_force_shutdown(sde1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xf8b79dd2
Comment 6 Eric Sandeen 2008-04-09 06:16:35 UTC
I wonder if turning on the debugging (all the #ifdef DEBUG) in xfs_dilocate would be useful to get more info about the apparent xfs_inotobp->xfs_imap->xfs_dilocate failure?  (if it's possible for you to do this...)
Comment 7 Denys Fedoryshchenko 2008-04-10 13:21:45 UTC
During the last 2 days I have not had such crashes on either host.
One (with only 2 XFS mounts) crashed 2-3 times and then stopped crashing. The other, with 4 mounted XFS filesystems, crashed more times and is now not crashing either. I will wait longer; if it does not reappear within 1-2 weeks on the latest git, I will close the bug.
Comment 8 Dave Chinner 2008-04-10 18:03:28 UTC
From comment #5, that error path indicates an invalid inode number in
the unlinked list. i.e. the inode being freed is being inserted into
the unlinked list, but the current inode the list points to is non-existent.

Given that the problem is not consistent and there is no corruption on
disk, this smells of a memory corruption of some kind. I think that turning
on all the memory debugging and poisoning would be a good idea.

Also if it happens again and xfs_repair puts a disconnected inode into
lost+found, can you dump the inode contents for us using:

# xfs_db -r -c 'inode <inode #>' -c p <dev>

The inode # is the name of the inode in lost+found.
Comment 9 Denys Fedoryshchenko 2008-04-11 00:16:50 UTC
Most probably it is not hardware corruption: I have 6 Sun servers, and only the 2 of them that were upgraded to 2.6.25-rcX are having these issues.
I will wait and see if the issue happens again.
Comment 10 Denys Fedoryshchenko 2008-04-13 11:32:05 UTC
I just got more dmesg messages while upgrading to RC9, but without debug enabled, and I didn't have time to run xfs_db. Maybe it will help somehow. I have now enabled debug, and if I catch the issue in time I will also run xfs_db.

Here dmesg:
[  317.410931] xfs_inotobp: xfs_imap()  returned an error 22 on sde1.  Returning error.
[  317.410931] xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on sde1.  Returning error.
[  317.410931] xfs_inactive:    xfs_ifree() returned an error = 22 on sde1
[  317.410931] xfs_force_shutdown(sde1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xf8b7985f
[  317.555550] Filesystem "sde1": I/O Error Detected.  Shutting down filesystem: sde1
[  317.555731] Please umount the filesystem, and rectify the problem(s)
[  318.724694] xfs_inotobp: xfs_imap()  returned an error 22 on sdd1.  Returning error.
[  318.724702] xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on sdd1.  Returning error.
[  318.724706] xfs_inactive:    xfs_ifree() returned an error = 22 on sdd1
[  318.724712] xfs_force_shutdown(sdd1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xf8b7985f
[  318.791361] xfs_difree: xfs_inobt_lookup_le returned()  an error 5 on sdd1.  Returning error.
[  318.896324] xfs_difree: xfs_inobt_lookup_le returned()  an error 5 on sdd1.  Returning error.
[  319.183699] xfs_iunlink_remove: xfs_itobp()  returned an error 5 on sdd1.  Returning error.
[  319.183743] xfs_iunlink_remove: xfs_trans_read_buf()  returned an error 5 on sdd1.  Returning error.
[  319.228470] Filesystem "sdd1": I/O Error Detected.  Shutting down filesystem: sdd1
[  319.228690] Please umount the filesystem, and rectify the problem(s)
[ 1110.111013] xfs_force_shutdown(sde1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xf8b79e5a
[ 1120.966164] xfs_force_shutdown(sdd1,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xf8b79e5a
Comment 11 Denys Fedoryshchenko 2008-04-14 12:19:54 UTC
I caught it. Some memory debugging is enabled (R/O protection and VM debug); the others I cannot enable because they would cause a major slowdown.
Here is the output with debugging enabled in the specified function:
[18882.132386] Filesystem "sde1": xfs_dilocate: agno (127) >= mp->m_sb.sb_agcount (16)
[18882.132561] Filesystem "sde1": xfs_dilocate: agbno (0x1fffff) >= mp->m_sb.sb_agblocks (0x11171f)
[18882.132900] Pid: 1987, comm: squid Not tainted 2.6.25-rc9-build-0027 #6
[18882.132924]  [<f8b61487>] xfs_dilocate+0x163/0x400 [xfs]
[18882.133139]  [<f8b673b9>] xfs_imap+0x12c/0x1d9 [xfs]
[18882.133230]  [<c013b358>] ? mark_held_locks+0x41/0x5c
[18882.133501]  [<c013b4d3>] ? trace_hardirqs_on+0xe1/0x102
[18882.133541]  [<f8b67bf6>] xfs_inotobp+0x31/0x1b1 [xfs]
[18882.133611]  [<f8b85f0c>] ? xfs_buf_read_flags+0x14/0x78 [xfs]
[18882.133728]  [<f8b7b8b2>] ? xfs_trans_read_buf+0x25f/0x29d [xfs]
[18882.133774]  [<f8b68382>] xfs_iunlink_remove+0x272/0x350 [xfs]
[18882.133966]  [<f8b69831>] xfs_ifree+0x15/0xe5 [xfs]
[18882.134016]  [<f8b823a4>] xfs_inactive+0x312/0x3cb [xfs]
[18882.134068]  [<f8b8a48a>] xfs_fs_clear_inode+0x7d/0xbe [xfs]
[18882.134109]  [<c0170e6d>] clear_inode+0x63/0xb2
[18882.134121]  [<c0170f35>] generic_delete_inode+0x79/0xcb
[18882.134133]  [<c0170f99>] generic_drop_inode+0x12/0x137
[18882.134147]  [<c0170661>] iput+0x63/0x66
[18882.134159]  [<c016907d>] do_unlinkat+0xb9/0x12b
[18882.134192]  [<c013ea70>] ? sys_futex+0xb8/0xce
[18882.134202]  [<c01037fd>] ? sysenter_past_esp+0x9a/0xa5
[18882.134210]  [<c013b4d3>] ? trace_hardirqs_on+0xe1/0x102
[18882.134222]  [<c01690ff>] sys_unlink+0x10/0x12
[18882.134227]  [<c01037c2>] sysenter_past_esp+0x5f/0xa5
[18882.134265]  =======================
[18882.134271] xfs_inotobp: xfs_imap()  returned an error 22 on sde1.  Returning error.
[18882.134276] xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on sde1.  Returning error.
[18882.134282] xfs_inactive:    xfs_ifree() returned an error = 22 on sde1
[18882.134287] xfs_force_shutdown(sde1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xf8b823eb
[18882.203955] Filesystem "sde1": I/O Error Detected.  Shutting down filesystem: sde1
[18882.204206] Please umount the filesystem, and rectify the problem(s)
Comment 12 Eric Sandeen 2008-04-14 19:15:01 UTC
So, this is an invalid inode number; Dave thinks it's a bad number on the unlinked list, I think?  This is where slab poisoning would be good, to catch use-after-frees and the like, just in case.

If repair turns up another inode, can you do as Dave suggested in comment #8?
Comment 13 Eric Sandeen 2008-04-14 19:23:15 UTC
The agno and agbno values (both derived from the (bad) inode number itself) are a bit interesting: 127 and 0x1fffff, which are both all 1's in binary.

It's too bad it didn't print the actual original inode nr; I wonder if this could possibly be a negative error code that somehow snuck in as an inode number?
Comment 14 Denys Fedoryshchenko 2008-04-15 01:56:22 UTC
No, repair found no lost inodes this time, so there is nothing to examine with xfs_db :-(
Comment 15 Eric Sandeen 2008-04-15 09:28:50 UTC
Dave may have better ideas, but I'd be interested in something like
this in xfs_dilocate just before the xfs_stack_trace() and EINVAL return:

xfs_fs_cmn_err(CE_ALERT, mp, "inode number 0x%llx\n", ino);

to see what actual inode number we got, just to see if it offers any clues.
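
Roughly, that line would sit in xfs_dilocate()'s validation branch like this (a from-memory sketch of the 2.6.25-era code, not an exact diff; only the xfs_fs_cmn_err() line is new):

        /* ... existing DEBUG-only diagnostics for the bad agno/agbno ... */
        xfs_fs_cmn_err(CE_ALERT, mp, "inode number 0x%llx\n", ino);
        xfs_stack_trace();
        return XFS_ERROR(EINVAL);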
Comment 16 Eric Sandeen 2008-04-15 11:06:22 UTC
sgi guys, do you think there's any chance that
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c319b58b13bb22f9a2478825b06c641c825f51ec
is to blame here?
Comment 17 Denys Fedoryshchenko 2008-04-15 12:32:13 UTC
Maybe I can just revert this patch and try without it? But it seems that for now it is getting more and more difficult to trigger the bug :-(
Comment 18 Dave Chinner 2008-04-17 18:36:57 UTC
Eric, I don't think that patch has anything to do with it.

Denys, it's still worth trying a revert, though I think that
adding more printk()s in the error paths to identify the inode
and the failure path would be more useful....
Comment 19 Eric Sandeen 2008-04-17 19:22:05 UTC
Can you reproduce the bug w/o aufs?
Comment 20 Denys Fedoryshchenko 2008-04-18 03:21:09 UTC
The issue is that it is happening less and less. Since the last report, nothing.

I cannot disable aufs; it most probably will not handle my load. But I will try. Also, I will upgrade to the release today.
Comment 21 Denys Fedoryshchenko 2008-04-18 03:32:34 UTC
aufs is squid's way of handling files (threads, instead of linear file-by-file processing as in UFS). The kernel is not patched; it is fully vanilla.
Comment 22 Eric Sandeen 2008-04-18 06:41:51 UTC
Never mind on disabling aufs; I thought you were using the kernel module.

Thanks,
-Eric
Comment 23 Denys Fedoryshchenko 2008-04-21 23:16:29 UTC
Here it is with the inode number:
[  342.535990] Filesystem "sdd1": xfs_dilocate: agno (127) >= mp->m_sb.sb_agcount (16)
[  342.535996] Filesystem "sdd1": xfs_dilocate: agbno (0x1fffff) >= mp->m_sb.sb_agblocks (0x11171f)
[  342.536995] Filesystem "sdd1": inode number 0xffffffff
[  342.537178] Pid: 1690, comm: squid Not tainted 2.6.25-build-0028 #16
[  342.537181]  [<f8b0b0d2>] ? xfs_fs_cmn_err+0x1b/0x21 [xfs]
[  342.537223]  [<f8b0d4c9>] xfs_dilocate+0x17d/0x415 [xfs]
[  342.537324]  [<f8b133f5>] xfs_imap+0x12c/0x1d9 [xfs]
[  342.537364]  [<c013a4b4>] ? mark_held_locks+0x41/0x5c
[  342.537384]  [<c013a62f>] ? trace_hardirqs_on+0xe1/0x102
[  342.537425]  [<f8b13c32>] xfs_inotobp+0x31/0x1b1 [xfs]
[  342.537459]  [<f8b31f48>] ? xfs_buf_read_flags+0x14/0x78 [xfs]
[  342.537505]  [<f8b278ee>] ? xfs_trans_read_buf+0x25f/0x29d [xfs]
[  342.537547]  [<f8b143be>] xfs_iunlink_remove+0x272/0x350 [xfs]
[  342.537636]  [<f8b1586d>] xfs_ifree+0x15/0xe5 [xfs]
[  342.537687]  [<f8b2e3e0>] xfs_inactive+0x312/0x3cb [xfs]
[  342.537741]  [<f8b364c6>] xfs_fs_clear_inode+0x7d/0xbe [xfs]
[  342.537778]  [<c0170000>] clear_inode+0x63/0xb2
[  342.537788]  [<c01700c8>] generic_delete_inode+0x79/0xcb
[  342.537800]  [<c017012c>] generic_drop_inode+0x12/0x137
[  342.537812]  [<c016f7f4>] iput+0x63/0x66
[  342.537821]  [<c01681d1>] do_unlinkat+0xb9/0x12b
[  342.537861]  [<c013dbcc>] ? sys_futex+0xb8/0xce
[  342.537873]  [<c01037f9>] ? sysenter_past_esp+0x9a/0xa5
[  342.537883]  [<c013a62f>] ? trace_hardirqs_on+0xe1/0x102
[  342.537898]  [<c0168253>] sys_unlink+0x10/0x12
[  342.537904]  [<c01037be>] sysenter_past_esp+0x5f/0xa5
[  342.537957]  =======================
[  342.537994] xfs_inotobp: xfs_imap()  returned an error 22 on sdd1.  Returning error.
[  342.537998] xfs_iunlink_remove: xfs_inotobp()  returned an error 22 on sdd1.  Returning error.
[  342.538002] xfs_inactive:    xfs_ifree() returned an error = 22 on sdd1
[  342.538006] xfs_force_shutdown(sdd1,0x1) called from line 1737 of file fs/xfs/xfs_vnodeops.c.  Return address = 0xf8b2e427
[  342.541407] xfs_iunlink_remove: xfs_trans_read_buf()  returned an error 5 on sdd1.  Returning error.
[  342.868966] Filesystem "sdd1": I/O Error Detected.  Shutting down filesystem: sdd1
[  342.869066] Please umount the filesystem, and rectify the problem(s)
Comment 24 Denys Fedoryshchenko 2008-04-22 05:14:02 UTC
Last dmesg from the 2.6.25 release.
Comment 25 Denys Fedoryshchenko 2008-04-22 05:23:27 UTC
One more, and I ran xfs_db:
       - moving disconnected inodes to lost+found ...
disconnected inode 201364285, moving to lost+found
Phase 7 - verify and correct link counts...
done
Proxy-Karam114 ~ # xfs_db -r -c 'inode 201364285' -c p /dev/sdd1
core.magic = 0x494e
core.mode = 0100640
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 1
core.uid = 101
core.gid = 101
core.flushiter = 0
core.atime.sec = Tue Apr 22 09:39:11 2008
core.atime.nsec = 497758374
core.mtime.sec = Tue Apr 22 15:18:21 2008
core.mtime.nsec = 843550184
core.ctime.sec = Tue Apr 22 15:19:16 2008
core.ctime.nsec = 945175699
core.size = 0
core.nblocks = 0
core.extsize = 0
core.nextents = 0
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 151
next_unlinked = null
u = (empty)
Comment 26 Eric Sandeen 2008-04-22 21:31:16 UTC
OK, the inode number is -1...

Did you use %llx to print it, i.e. is it really 32 bits of F's, not 64?

Hmm, are we getting hold of a NULLAGINO somehow...

I wonder if it would be crazy to set NULLAGINO to ((xfs_agino_t)-2) to test this theory and see if you get 0xfffffffe.

I wouldn't do that yet unless Dave chimes in on whether or not I'm crazy :)

I haven't really had time to look at the code to see if this makes sense.
Comment 27 Denys Fedoryshchenko 2008-04-23 02:02:11 UTC
I just copied and pasted the request from comment #15.
I cannot apply any risky patches on this server; also, I have a trip to another country soon, so I will probably have to switch to another filesystem, because I cannot handle such surprises while abroad. I will try to write a script to handle that.
Comment 28 Eric Sandeen 2008-04-23 05:50:11 UTC
It probably is not risky, but I was hoping for a 2nd opinion.

I do understand being hesitant to keep changing the code on your server, and I appreciate the debugging you've done so far.  It may be that this bug can still be found by code inspection alone.

When the filesystems have these problems, do you know if they went through log recovery during mount, prior to the error?
Comment 29 Denys Fedoryshchenko 2008-04-23 05:58:55 UTC
If I just unmount the FS and mount it back, it will crash after a while with other kinds of errors. I checked that. So I have to unmount, mount (so the log will be cleared), unmount again, run xfs_repair, then mount. Then it works fine for 1-2 days.
Comment 30 Dave Chinner 2008-04-23 18:36:59 UTC
An incore inode saying its inode number is -1 (NULLFSINO)? That's a
new one. That would also explain the:

"xfs_dilocate: agno (127) >= mp->m_sb.sb_agcount (16)"

error message as well.

The reason we are trying to look up the inode is that the inode number matches
with the base unlinked inode list value (NULLAGINO). This actually means
"this unlinked list is empty", so we are right to be shutting down here as
there's definitely in-memory corruption of some kind.

FWIW, this is definitely an in-memory problem as the inode number does not
exist on disk at all...

What I don't understand yet is why, if the inode number is -1, we didn't
abort earlier in xfs_iunlink_remove when we tried to read way beyond the
end of the device to pull in the AGI buffer....

Hmmmm. I suspect that this is not a full 64-bit word corruption. I think
the lower 32 bits of the inode number have been overwritten with 0xffffffff.
The agno printed in the above error message would have been much larger
than 127 (0x7f) if it was a full 64-bit word corruption.

If we look at the agino - 0x1ffffff, i.e. the 0x1fffff agbno above with the
4 inode-index bits below it - that's 25 bits of 1s, and the agno is 7 bits
of 1s packed tightly against the agino. Hence we have 32 bits of 1s in a
64-bit word.
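
For illustration, here is a minimal userspace sketch of that decomposition. The field widths are assumptions read off the xfs_info output above (agsize=1120031 blocks gives a 21-bit sb_agblklog; 4096-byte blocks with 256-byte inodes give a 4-bit inopblog); the kernel does this with the XFS_INO_TO_AGNO()/XFS_INO_TO_AGBNO() macros:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t ino      = 0xffffffffULL;  /* the corrupted inode number    */
    int      inopblog = 4;              /* log2(inodes per 4k block)     */
    int      agblklog = 21;             /* log2(agsize), rounded up      */
    int      aginolog = agblklog + inopblog;          /* 25 agino bits   */

    uint64_t agino = ino & ((1ULL << aginolog) - 1);  /* 0x1ffffff       */
    uint64_t agno  = ino >> aginolog;                 /* 0x7f = 127      */
    uint64_t agbno = agino >> inopblog;               /* 0x1fffff        */

    printf("agno %llu agbno 0x%llx\n",
           (unsigned long long)agno, (unsigned long long)agbno);
    return 0;
}

Run, it prints agno 127 and agbno 0x1fffff, matching the xfs_dilocate complaints above.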

On top of that, the corruption had to happen between calculating the agno
from ip->i_ino and calculating the agino from ip->i_ino. In between we
read the AGI buffer, so there's a big window for the corruption to occur
there (possible disk I/O).

I think we are definitely looking at some kind of memory corruption occurring
here, and we are no closer to isolating the cause at all. It's likely to be
a use-after-free or some such issue. Can you please turn on every single
memory debugging option (slab poisoning, etc.) in the kernel to try to
catch whatever is corrupting memory?

Ideally, I'd like you to run with KDB or another debugger so you can dump the
memory around the inode that has been corrupted so we can see whether or not
it's just this 32 bit word that has been corrupted.
Comment 31 Dave Chinner 2008-04-23 19:48:27 UTC
<sigh>

I'm just not with it today. Disregard my last add - I obviously can't
tell the difference between xfs_inotobp and xfs_itobp in error
messages.
Comment 32 Eric Sandeen 2008-04-24 15:18:36 UTC
Hum.  This looks similar.

http://pld.pastebin.com/f56c4c24a

Denys, how much time do you have before you bail out and can't run any more modified xfs code...

Thanks,
-Eric
Comment 33 Eric Sandeen 2008-04-24 15:23:19 UTC
Just for reference, repair output for that previous pastebin: http://pld.pastebin.com/f410a4f93
Comment 34 Denys Fedoryshchenko 2008-04-27 13:12:44 UTC
I can run it for now, even if I switch to another FS; if there is anything I can do in one crash, I will do it. But if I have to wait a few days for an answer, I cannot keep waking up every day at 3 in the morning to fix the crashed proxy.
Comment 35 Denys Fedoryshchenko 2008-04-29 02:59:25 UTC
http://marc.info/?l=linux-kernel&m=120942140024743&w=2
Is it related?
Comment 36 Dave Chinner 2008-04-29 16:12:00 UTC
> http://marc.info/?l=linux-kernel&m=120942140024743&w=2
> is it related?

No.
Comment 37 Marco Berizzi 2008-05-05 06:57:18 UTC
Maybe related?

http://www.ussg.iu.edu/hypermail/linux/kernel/0805.0/1969.html
Comment 38 Dave Chinner 2008-05-05 16:35:41 UTC
Marco, yes, it probably is related. Any information you can supply on
how you tripped over this would be appreciated...
Comment 39 Eric Sandeen 2008-05-08 06:58:05 UTC
I might be hitting something similar here.  It's an odd setup, 2.6.25 xfs backported to an older kernel; I need to try upstream. But if I run this test overnight...

#!/bin/bash

mkdir /mnt/test/dir1
mkdir /mnt/test/dir2
mkdir /mnt/test/dir3
mkdir /mnt/test/dir4

dd if=/dev/zero of=/mnt/test/dir1/file1 bs=1M count=20480 &
dd if=/dev/zero of=/mnt/test/dir2/file2 bs=1M count=20480 &
dd if=/dev/zero of=/mnt/test/dir3/file3 bs=1M count=20480 &
dd if=/dev/zero of=/mnt/test/dir4/file4 bs=1M count=20480 &
sync
sync
rm -f /mnt/test/dir1/file1 &
rm -f /mnt/test/dir2/file2 &
rm -f /mnt/test/dir3/file3 &
rm -f /mnt/test/dir4/file4 &
wait

I eventually hit something similar.  I changed NULLAGINO and NULLFSINO:

-#define        NULLFSINO       ((xfs_ino_t)-1)
-#define        NULLAGINO       ((xfs_agino_t)-1)
+#define        NULLFSINO       ((xfs_ino_t)-2)
+#define        NULLAGINO       ((xfs_agino_t)-3)

in both userspace and kernelspace, and when I trip it I get:

xfs_inotobp: xfs_imap() returned an error 22 on sdb5 for inode
4294967293.  Returning error.

so that is a NULLAGINO

-Eric
Comment 40 Eric Sandeen 2008-05-09 07:23:36 UTC
Further: I actually ran the above test under restricted memory (mem=512m) on a 4-CPU machine, in a mkfs; mount; run test; umount loop.

I added more printks and a dump_stack() and ran overnight; the stack dump is probably not that interesting, but:

Call Trace:
 [<ffffffff883f56e9>] :xfs:xfs_iunlink_remove+0x2c7/0x3df
 [<ffffffff884058aa>] :xfs:xfs_trans_reserve+0xea/0x1cc
 [<ffffffff883f6421>] :xfs:xfs_ifree+0x18/0xf3
 [<ffffffff8840df71>] :xfs:xfs_inactive+0x313/0x3d7
 [<ffffffff88416654>] :xfs:xfs_fs_clear_inode+0xa4/0xec
 [<ffffffff80022c2d>] clear_inode+0xd2/0x123
 [<ffffffff8002f340>] generic_delete_inode+0xde/0x143
 [<ffffffff8003c2f6>] do_unlinkat+0xd5/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

I also added a printk in xfs_iunlink_remove()

...
 				xfs_trans_brelse(tp, last_ibp);
 			}
 			next_ino = XFS_AGINO_TO_INO(mp, agno, next_agino);
+			if (next_ino == 4294967293) {
+				printk("ip->i_ino %llu agino %u next_ino %llu next_agino %u agno %u bucket_index %d\n", ip->i_ino, agino, next_ino, next_agino, agno, bucket_index);
+				dump_stack();
+			}
 			error = xfs_inotobp(mp, tp, next_ino, &last_dip,
 					    &last_ibp, &last_offset);
 			if (error) {

and got, in several runs:

(remember NULLAGINO is now "-3" in my kernel)

ip->i_ino 268640385 agino 204929 next_ino 4294967293 next_agino 4294967293 agno 2 bucket_index 1
ip->i_ino 134217857 agino 129 next_ino 4294967293 next_agino 4294967293 agno 1 bucket_index 1
ip->i_ino 402653313 agino 129 next_ino 4294967293 next_agino 4294967293 agno 3 bucket_index 1
ip->i_ino 134217857 agino 129 next_ino 4294967293 next_agino 4294967293 agno 1 bucket_index 1
ip->i_ino 402653313 agino 129 next_ino 4294967293 next_agino 4294967293 agno 3 bucket_index 1
Comment 41 Eric Sandeen 2008-05-09 08:46:56 UTC
The inode numbers are correct FWIW:
      132 /mnt/test/dir1/file1	268640385 /mnt/test/dir3/file3
134217857 /mnt/test/dir2/file2	402653313 /mnt/test/dir4/file4
Comment 42 Eric Sandeen 2008-05-10 17:55:07 UTC
I can hit it with CONFIG_XFS_DEBUG too:

/redhat/BUILD/xfs-kmod-0.5/xfs/xfs_inode.c, line: 2068
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at .../src/redhat/BUILD/xfs-kmod-0.5/xfs/support/debug.c:81
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:0d.0/0000:03:00.0/irq
CPU 1 
Modules linked in: xfs(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 xfrm_nalgo crypto_api cpufreq_ondemand dm_multipath video sbs backlight i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg tg3 k8_edac edac_mc i2c_nforce2 k8temp hwmon i2c_core shpchp serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc mptspi mptscsih scsi_transport_spi sd_mod scsi_mod mptbase ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 4540, comm: rm Tainted: G      2.6.18-88.el5 #1
RIP: 0010:[<ffffffff885dd23d>]  [<ffffffff885dd23d>] :xfs:assfail+0x1a/0x26
RSP: 0018:ffff8100025ddd08  EFLAGS: 00010286
RAX: 0000000000000095 RBX: 0000000000000001 RCX: ffffffff802ec9a8
RDX: ffffffff802ec9a8 RSI: 0000000000000000 RDI: ffffffff802ec9a0
RBP: ffff81000fdc4000 R08: ffffffff802ec9a8 R09: 0000000000000046
R10: ffff8100025dd9a8 R11: 0000000000000080 R12: ffff81001a2afd00
R13: ffff81000fdc4000 R14: ffff81000f2c1840 R15: 0000000000000000
FS:  00002b40c784d260(0000) GS:ffff81000116f7c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003f552c2080 CR3: 000000000249e000 CR4: 00000000000006e0
Process rm (pid: 4540, threadinfo ffff8100025dc000, task ffff81000130d860)
Stack:  0000000000000000 ffffffff885b2c2d 0000000000000001 00000001025ddd48
 ffff8100025ddd94 ffff81001d8b4400 0000000000000001 000000000000000c
 0000000000000000 00000081fffffffe 0008000000010000 0000000000000000
Call Trace:
 [<ffffffff885b2c2d>] :xfs:xfs_iunlink_remove+0x1f2/0x586
 [<ffffffff885ca88a>] :xfs:xfs_trans_log_inode+0x86/0xd4
 [<ffffffff885b5510>] :xfs:xfs_ifree+0x12d/0x209
 [<ffffffff885d28ed>] :xfs:xfs_inactive+0x3b5/0x47c
 [<ffffffff885dba39>] :xfs:xfs_fs_clear_inode+0xa4/0x111
 [<ffffffff80022c2d>] clear_inode+0xd2/0x123
 [<ffffffff8002f340>] generic_delete_inode+0xde/0x143
 [<ffffffff8003c2f6>] do_unlinkat+0xd5/0x141
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 59 0b 5f 88 c2 51 00 58 c3 41 55 83 e7 07 41 54 49 
RIP  [<ffffffff885dd23d>] :xfs:assfail+0x1a/0x26
 RSP <ffff8100025ddd08>
 <0>Kernel panic - not syncing: Fatal exception

OK, time to go actually read the code ;)  It does seem to be some sort of race.
Comment 43 Timothy Shimmin 2008-05-11 18:38:14 UTC
Looking at comment #40,
I presume this means that we are calling xfs_ifree on the same
inode on multiple occasions.
So, xfs_ifree() gets called on the last reference of an unlinked inode,
and so it must call xfs_iunlink_remove to take the inode out
of the unlinked list.
The code checks the head of the list for a match; otherwise it
traverses the list (found from the hash bucket in the agi) looking
for the inode. And in our cases it doesn't find the inode that
it was expecting to be there, as it manages to get to the end
of the list.
Is it a case of us trying to free the inode multiple times, and
so we've already removed the inode from the list previously?

Just trying to get up to speed on what this is about...

--Tim
Comment 44 Eric Sandeen 2008-05-11 18:47:38 UTC
Tes, no, sorry: the printks in comment #40 were from separate runs.

If you look at the script in comment #39, each inode only gets one unlink call.

-Eric
Comment 45 Timothy Shimmin 2008-05-11 20:46:23 UTC
:-)
Okay, that makes more sense.
I imagined that there was only one unlink system call per file, but I was
wondering if we were getting to ifree multiple times somehow.
I didn't realise the output was from multiple runs ... oops.
--Tim
Comment 46 Eric Sandeen 2008-05-13 12:06:50 UTC
From doing some bisection, I'm pretty sure this:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a9759f2de38a3443d5107bddde03b4f3f550060e

is the culprit.
Comment 47 Marco Berizzi 2008-05-14 08:08:17 UTC
Don't know if this is related, but after 20 days of uptime Linux 2.6.25 has started logging these messages:

May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=134, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (292) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:23 Pleiadi kernel: SKB BUG: Invalid truesize (324) len=166, sizeof(sk_buff)=164
May 14 17:00:24 Pleiadi kernel: SKB BUG: Invalid truesize (260) len=118, sizeof(sk_buff)=164
Comment 48 Eric Sandeen 2008-05-14 10:26:29 UTC
so my hunch is that this part of the patch:

@@ -1191,7 +1164,7 @@ _xfs_buf_ioapply(
        /* Special code path for reading a sub page size buffer in --
         * we populate up the whole page, and hence the other metadata
         * in the same page.  This optimization is only valid when the
         * filesystem block size is not smaller than the page size.
         */

        if ((bp->b_buffer_length < PAGE_CACHE_SIZE) &&
-           (bp->b_flags & XBF_READ) && locking &&
+           (bp->b_flags & XBF_READ) &&
            (blocksize >= PAGE_CACHE_SIZE)) {
                bio = bio_alloc(GFP_NOIO, 1);
 
is allowing another metadata read to clobber our agi with on-disk data after it's been updated.

Some tracing showed:

<7>3438/2 xfs_iunlink ino 268640385 agibp ffff81001c1533c0
<7>3438/2 xfs_iunlink:1995: agi bp ffff81001c1533c0 ffff81000f19b400->agi_unlinked[1] set to agino 204929
<7>3437/2 xfs_iunlink_remove ino 134217857
[0]more>
<7>3437/2 xfs_iunlink_remove:2120: agi bp ffff81000e8a9840 ffff81001e1c1400->agi_unlinked[1] set to next_agino  4294967295
<7>3438/0 xfs_iunlink_remove ino 268640385
<4>3438/0 xfs_iunlink_remove:2078: agi bp ffff81001c1533c0 ffff81000f19b400->agi_unlinked[1] == NULLAGINO

... without any code assignments in between, we went from the proper value back to NULLAGINO (disk value)

I think this optimization is not honoring the locking for the other xfs_bufs around it, since it used to be page-based but is now per-bp.
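
As a toy illustration of what that trace implies (simplified names and layout, not the real xfs_iunlink_remove()): an AGI bucket holds the agino at the head of an unlinked chain, and each inode's di_next_unlinked points at the next entry. If a stray read clobbers the in-memory bucket back to its on-disk NULLAGINO while an inode still thinks it is on the chain, the removal walk immediately tries to look up "inode" 0xffffffff, which is exactly the number xfs_dilocate rejected in comment #23:

#include <stdint.h>
#include <stdio.h>

#define NULLAGINO 0xffffffffu    /* "end of chain" marker, as in XFS */
#define NINODES   16

static uint32_t bucket;                  /* models agi_unlinked[bucket_index]    */
static uint32_t next_unlinked[NINODES];  /* models each inode's di_next_unlinked */

/* simplified analogue of xfs_iunlink_remove(): take agino off the chain */
static int iunlink_remove(uint32_t agino)
{
    uint32_t prev = NULLAGINO, next;

    if (bucket == agino) {               /* head of the chain: just unhook it */
        bucket = next_unlinked[agino];
        return 0;
    }
    next = bucket;                       /* otherwise walk the chain */
    while (next != agino) {
        if (next == NULLAGINO) {         /* bogus lookup; EINVAL/shutdown in real life */
            printf("lookup of inode 0x%x fails\n", (unsigned)next);
            return -1;
        }
        prev = next;
        next = next_unlinked[prev];
    }
    next_unlinked[prev] = next_unlinked[agino];   /* splice it out */
    return 0;
}

int main(void)
{
    uint32_t agino = 3;

    next_unlinked[agino] = NULLAGINO;    /* xfs_iunlink() puts inode 3 on the list ... */
    bucket = agino;
    bucket = NULLAGINO;                  /* ... then a stray read clobbers the AGI back
                                            to its on-disk "empty" value */
    return iunlink_remove(agino);        /* walks straight into NULLAGINO */
}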
Comment 49 Eric Sandeen 2008-05-15 07:14:30 UTC
hch's patch at http://verein.lst.de/~hch/xfs-buffer-locking (modulo a missing brace) lets my testcase pass.

Thanks,
-Eric
Comment 50 Denys Fedoryshchenko 2008-05-19 05:21:11 UTC
While compiling under 2.6.25 I got errors after applying the patch:

  CC [M]  fs/xfs/linux-2.6/xfs_buf.o
fs/xfs/linux-2.6/xfs_buf.c: In function '_xfs_buf_lookup_pages':
fs/xfs/linux-2.6/xfs_buf.c:453: error: invalid storage class for function '_xfs_buf_map_pages'
fs/xfs/linux-2.6/xfs_buf.c:449: warning: ISO C90 forbids mixed declarations and code
fs/xfs/linux-2.6/xfs_buf.c:711: error: invalid storage class for function 'mem_to_page'
fs/xfs/linux-2.6/xfs_buf.c:975: error: invalid storage class for function 'xfs_buf_wait_unpin'
fs/xfs/linux-2.6/xfs_buf.c:1001: error: invalid storage class for function 'xfs_buf_iodone_work'
fs/xfs/linux-2.6/xfs_buf.c:1105: error: invalid storage class for function '_xfs_buf_ioend'
fs/xfs/linux-2.6/xfs_buf.c:1116: error: invalid storage class for function 'xfs_buf_bio_end_io'
fs/xfs/linux-2.6/xfs_buf.c:1152: error: invalid storage class for function '_xfs_buf_ioapply'
fs/xfs/linux-2.6/xfs_buf.c:1384: error: invalid storage class for function 'xfs_alloc_bufhash'
fs/xfs/linux-2.6/xfs_buf.c:1400: error: invalid storage class for function 'xfs_free_bufhash'
fs/xfs/linux-2.6/xfs_buf.c:1414: error: invalid storage class for function 'xfs_register_buftarg'
fs/xfs/linux-2.6/xfs_buf.c:1423: error: invalid storage class for function 'xfs_unregister_buftarg'
fs/xfs/linux-2.6/xfs_buf.c:1456: error: invalid storage class for function 'xfs_setsize_buftarg_flags'
fs/xfs/linux-2.6/xfs_buf.c:1489: error: invalid storage class for function 'xfs_setsize_buftarg_early'
fs/xfs/linux-2.6/xfs_buf.c:1507: error: invalid storage class for function 'xfs_mapping_buftarg'
fs/xfs/linux-2.6/xfs_buf.c:1540: error: invalid storage class for function 'xfs_alloc_delwrite_queue'
fs/xfs/linux-2.6/xfs_buf.c:1590: error: invalid storage class for function 'xfs_buf_delwri_queue'
fs/xfs/linux-2.6/xfs_buf.c:1640: error: invalid storage class for function 'xfs_buf_runall_queues'
fs/xfs/linux-2.6/xfs_buf.c:1648: error: invalid storage class for function 'xfsbufd_wakeup'
fs/xfs/linux-2.6/xfs_buf.c:1671: error: invalid storage class for function 'xfs_buf_delwri_split'
fs/xfs/linux-2.6/xfs_buf.c:1708: error: invalid storage class for function 'xfsbufd'
fs/xfs/linux-2.6/xfs_buf.c:1846: error: expected declaration or statement at end of input
make[2]: *** [fs/xfs/linux-2.6/xfs_buf.o] Error 1
make[1]: *** [fs/xfs] Error 2
make: *** [fs] Error 2
make: *** Waiting for unfinished jobs....
make[2]: *** [net/ipv4/netfilter] Interrupt
make[1]: *** [net/netfilter] Interrupt
make[1]: *** [net/ipv4] Interrupt
make: *** [net] Interrupt
make[3]: *** [drivers/char/ipmi/modules.order] Interrupt
make[2]: *** [drivers/char/ipmi] Interrupt
make[1]: *** [drivers/char] Interrupt
make: *** [drivers] Interrupt

It looks like the patch is invalid?

For example
-                       if (blocksize < PAGE_CACHE_SIZE && !PagePrivate(page)) {
+                       if (blocksize >= PAGE_CACHE_SIZE) {
+                               if (flags & XBF_READ)
+                                       bp->b_flags |= _XBF_PAGE_LOCKED;
>>>>>>> Probably a brace is missing here? Or remove the one 3 lines up
+                       else if (!PagePrivate(page)) {
                                if (test_page_region(page, offset, nbytes))
                                        page_count++;
                        }
Comment 51 Eric Sandeen 2008-05-19 05:50:10 UTC
Yes, there's a missing brace (as noted in comment #49) :)

Try http://www.linux.sgi.com/archives/xfs/2008-05/msg00190.html

-Eric
Comment 52 Denys Fedoryshchenko 2008-05-22 07:10:52 UTC
3 days and it is still fine.
Comment 53 Denys Fedoryshchenko 2008-05-28 07:35:21 UTC
Yes, it fixes the bug. Is it going to be in mainline?