Most recent kernel where this bug did not occur: not stated
Distribution: Debian GNU/Linux 3.0 Sarge
Hardware Environment: 2x Pentium Xeon 2.8, 512MB DDR, 3ware 9xxx SATA RAID controller, 6x 250GB SATA HDDs, RAID0 = 1.36T
Software Environment: libdevmapper 1.01.04 and lvm 2.01.09 stable (latest dm 1.01.05 and lvm 2.01.15 from cvs tested too), xfsprogs stable .deb, samba 3.0.14a

Problem Description:
When a snapshot of an XFS logical volume is created or removed (lvcreate/lvremove) while data is being copied from a remote host to an smb share, xfs_io hangs and stays in memory as a dead process. It happens at random, about once every 4, 10 or 20 iterations.

Steps to reproduce:
pvcreate /dev/sda
vgcreate vg /dev/sda
lvcreate -L 698G -n lv vg /dev/sda
mkfs.xfs /dev/vg0
mount -t xfs -o usrquota,grpquota,noatime,nodiratime /dev/vg/lv /mnt/lv
set up an smb share at /mnt/smb

Start copying data from the remote host to the smb share and, while the copy is running, create and remove 2 big snapshots (347G both) with xfs_freeze -f/u around each operation:

xfs_freeze -f /mnt/lv
lvcreate -s -l 11168 -n S1 -p rw /dev/vg/lv &
sleep 7
xfs_freeze -u /mnt/lv
mount -t xfs -o noatime,nodiratime,nouuid,ro /dev/vg/S1 /snapshots/S1
sleep 30
(create a second snapshot, named S2, in the same way as S1 above and mount it too)
sleep 3m
xfs_freeze -f /mnt/lv
lvremove /dev/vg/lv/S1 &
sleep 1
xfs_freeze -u /mnt/lv
sleep 30
(remove the second snapshot in the same way as S1)

Run a few iterations of creating and removing the 2 snapshots as shown above.

Without xfs_freeze -f/u, lvcreate and lvremove of a snapshot always hang if data is being copied at the same time.

Thanks for any help.
Best Regards,
Hubert
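For anyone trying to reproduce this, the create/remove cycle from the steps above could be scripted roughly as follows. This is only a sketch built from the commands in the report: DRY_RUN, run, run_bg, create_snapshot and remove_snapshot are hypothetical helper names, the umount before lvremove is an assumption (the report does not show it), and the snapshot path /dev/vg/S1 is assumed to match the mount command although the report writes /dev/vg/lv/S1 for lvremove. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of one snapshot create/remove cycle, using the report's commands.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
ORIGIN_LV=/dev/vg/lv      # origin logical volume named in the report
ORIGIN_MNT=/mnt/lv        # mount point of the origin volume

# Run a command in the foreground, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Run a command in the background, as the report does for lvcreate/lvremove.
run_bg() {
    if [ "$DRY_RUN" = 1 ]; then echo "$* &"; else "$@" & fi
}

create_snapshot() {
    name=$1
    run xfs_freeze -f "$ORIGIN_MNT"
    run_bg lvcreate -s -l 11168 -n "$name" -p rw "$ORIGIN_LV"
    run sleep 7
    run xfs_freeze -u "$ORIGIN_MNT"
    run mount -t xfs -o noatime,nodiratime,nouuid,ro "/dev/vg/$name" "/snapshots/$name"
}

remove_snapshot() {
    name=$1
    run umount "/snapshots/$name"      # assumption: not shown in the report
    run xfs_freeze -f "$ORIGIN_MNT"
    run_bg lvremove "/dev/vg/$name"    # assumption: report writes /dev/vg/lv/S1
    run sleep 1
    run xfs_freeze -u "$ORIGIN_MNT"
}
```

One iteration would then be create_snapshot S1, sleep 30, create_snapshot S2, sleep 3m, remove_snapshot S1, sleep 30, remove_snapshot S2, with the smb copy running in parallel. Setting DRY_RUN=0 executes the commands for real, so the printed sequence should be checked first.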
If you are so inclined, you could get the cvs kernel from oss.sgi.com, which has kdb, and when xfs_io hangs you could break into the debugger and issue a "ps D" to see which threads are in D state, and backtrace them. This could offer some good clues on where things are hung up. It might also be interesting to take smb out of the picture; do you see the same issues with local IO to filesystem on the lvm device? That might be a simpler case.
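As a userspace complement to the kdb "ps D" suggestion, tasks in uninterruptible sleep can often be listed with the ordinary ps while the hang is in progress. A minimal sketch, assuming procps ps with the pid, stat, wchan and comm output fields; d_state_filter and list_d_state are hypothetical helper names, and this shows only the wait channel, not a full kernel backtrace.

```shell
#!/bin/sh
# Keep the header row plus every row whose STAT column starts with D
# (uninterruptible sleep). Expects "pid stat wchan comm" column order.
d_state_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List all D-state tasks together with the kernel wait channel.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | d_state_filter
}
```

Since this is plain shell output, it can be redirected to a file, for example list_d_state > dstate.txt, and attached to the report.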
Thanks for your tips, Eric. I did it. When xfs_io hung I issued "ps D" in KDB and backtraced the processes with "btp process_id", but I don't know how to write the output to a text file. Do you? Maybe I should use crash or another tool?

I cannot take smb out of the picture because I need the smb shares for copying data from the remote host. I did notice some abnormal behaviour of the kernel pdflush thread under local IO to the filesystem on the lvm device: it sometimes hangs, then xfssyncd hangs too, and after that no IO to this lv is possible. Sometimes pdflush hangs while data is being copied to the smb share and a snapshot is being created or removed with lvcreate or lvremove at the same time. I think it could be a bug.
Is this issue still present in kernel 2.6.19?
Hi Adrian,
Sorry, I don't know, because I don't work on this any more. Maybe someone else has checked it or could check it for you.
Best Regards,
Hubert