Bug 7668 - XFS mount of LVM2 snapshot hangs occasionally
Summary: XFS mount of LVM2 snapshot hangs occasionally
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: i386 Linux
Importance: P2 high
Assignee: XFS Guru
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-12-11 16:18 UTC by Kuba Pamirski
Modified: 2008-04-13 22:47 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.17.7
Tree: Mainline
Regression: ---


Description Kuba Pamirski 2006-12-11 16:18:57 UTC
Most recent kernel where this bug did *NOT* occur: unknown
Distribution: Fedora
Hardware Environment: PC, P4 2.4GHz HT, Abit Mainboard, HD: WD2000-JS 200GB
Software Environment: LVM2, Logical Volume and LVM snapshots
Problem Description:

A mount operation applied to an LVM2 snapshot occasionally hangs whenever the
main Logical Volume is in use, possibly due to heavy IO load or congestion.

I was able to track this error down with the help of a simple debugging methodology.

First, I discovered that the hang occurs within the xlog_recover() function in
the xfs_log_recover.c source file.

xlog_recover() calls xlog_do_recover(), where I placed a few printk's to narrow
down the location of the problem:

STATIC int
xlog_do_recover(
	xlog_t		*log,
	xfs_daddr_t	head_blk,
	xfs_daddr_t	tail_blk)
{
	int		error;
	xfs_buf_t	*bp;
	xfs_sb_t	*sbp;
printk("xlog_do_recover: #1\n");
	/*
	 * First replay the images in the log.
	 */
	error = xlog_do_log_recovery(log, head_blk, tail_blk);
	if (error) {
		return error;
	}

printk("xlog_do_recover: #2\n");
	XFS_bflush(log->l_mp->m_ddev_targp);
printk("xlog_do_recover: #3\n");

	/*
	 * If IO errors happened during recovery, bail out.
	 */
	if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
		return (EIO);
	}
printk("xlog_do_recover: #4\n");

	/*
	 * We now update the tail_lsn since much of the recovery has completed
	 * and there may be space available to use.  If there were no extent
	 * or iunlinks, we can free up the entire log and set the tail_lsn to
	 * be the last_sync_lsn.  This was set in xlog_find_tail to be the
	 * lsn of the last known good LR on disk.  If there are extent frees
	 * or iunlinks they will have some entries in the AIL; so we look at
	 * the AIL to determine how to set the tail_lsn.
	 */
printk("xlog_do_recover: #5\n");
	xlog_assign_tail_lsn(log->l_mp);

	/*
	 * Now that we've finished replaying all buffer and inode
	 * updates, re-read in the superblock.
	 */
printk("xlog_do_recover: #6\n");
	bp = xfs_getsb(log->l_mp, 0);
	XFS_BUF_UNDONE(bp);
	XFS_BUF_READ(bp);
	xfsbdstrat(log->l_mp, bp);
printk("xlog_do_recover: #7\n");
	if ((error = xfs_iowait(bp))) {
printk("xlog_do_recover: #100\n");
		xfs_ioerror_alert("xlog_do_recover",
				  log->l_mp, bp, XFS_BUF_ADDR(bp));
		ASSERT(0);
		xfs_buf_relse(bp);
printk("xlog_do_recover: #101\n");
		return error;
	}

	/* Convert superblock from on-disk format */
	sbp = &log->l_mp->m_sb;
printk("xlog_do_recover: #8\n");
	xfs_xlatesb(XFS_BUF_TO_SBP(bp), sbp, 1, XFS_SB_ALL_BITS);
	ASSERT(sbp->sb_magicnum == XFS_SB_MAGIC);
	ASSERT(XFS_SB_GOOD_VERSION(sbp));
	xfs_buf_relse(bp);

printk("xlog_do_recover: #9\n");
	xlog_recover_check_summary(log);

	/* Normal transactions can now occur */
	log->l_flags &= ~XLOG_ACTIVE_RECOVERY;
	return 0;
}
 
Whenever the mount operation hangs, it hangs after displaying the
"xlog_do_recover: #7" message, i.e. on the xfs_iowait(bp) function call.

The WCHAN of the hanging mount process suggested an infinite wait on a semaphore.
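(WCHAN can be inspected with, for example, ps -o pid,stat,wchan:32,cmd -C mount,
or by reading /proc/<pid>/wchan.)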

In xfs_buf.h we read:

#define xfs_iowait(bp)	xfs_buf_iowait(bp)

and xfs_buf_iowait() is, in turn, defined in xfs_buf.c.

Within this function, the mount operation hangs on: down(&bp->b_iodonesema);

In order to log some useful information, I changed the xfs_buf_iowait()
definition so that:

a. it first tries to take the semaphore using down_trylock();
b. if that fails, it logs some fields of the xfs_buf_t structure involved;
c. finally, it blocks on the original down(&bp->b_iodonesema).

My function looks as follows:

int
xfs_buf_iowait(
	xfs_buf_t		*bp)
{
	XB_TRACE(bp, "iowait", 0);
	if (atomic_read(&bp->b_io_remaining))
		blk_run_address_space(bp->b_target->bt_mapping);
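	/*
	 * down_trylock() returns nonzero when the semaphore is already
	 * held, i.e. when this call would block: in that case, log
	 * some diagnostics before falling back to the blocking down().
	 */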
	if (down_trylock(&bp->b_iodonesema)) {
		printk("xfs_buf_iowait blocked, b_io_rem: %d, b_error: %d, "
		       "b_flags: %d, block: %lld, dev_maj: %d, dev_min: %d, "
		       "XBF_ASYNC: %d\n",
		       atomic_read(&bp->b_io_remaining), bp->b_error,
		       bp->b_flags, (long long)bp->b_bn,
		       MAJOR(bp->b_target->bt_dev),
		       MINOR(bp->b_target->bt_dev),
		       ((bp->b_flags & XBF_ASYNC) != 0) ? 1 : 0);
		down(&bp->b_iodonesema);
	}
	XB_TRACE(bp, "iowaited", (long)bp->b_error);
	return bp->b_error;
}
 
To log even more diagnostic information, I added additional printk's within the
dm_request() and dec_pending() functions of the dm.c file. This logging takes
place at the start and at the end of every IO operation entering the device
mapper. To limit excessive logging, I only log BIOs with bi_sector == 0, which
is sufficient in this case.
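
The dm_request() hook looked roughly like this (a sketch reconstructed to match
the log format below, not the exact code; the dec_pending() hook is analogous,
taking the bio from io->bio):

	/* Log only sector-0 BIOs entering the device mapper. */
	if (bio->bi_sector == 0)
		printk("dm_request: request for sector %llu, device: %d:%d, "
		       "dir: %s\n",
		       (unsigned long long)bio->bi_sector,
		       MAJOR(bio->bi_bdev->bd_dev),
		       MINOR(bio->bi_bdev->bd_dev),
		       bio_data_dir(bio) == READ ? "READ" : "WRITE");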

Now, with a proper, non-hanging execution of the mount operation, we get
something that looks like this:

Dec 12 01:13:08 localhost kernel: [17179773.208000] XFS mounting filesystem dm-5
Dec 12 01:13:08 localhost kernel: [17179773.344000] xfs_buf_iowait blocked, 
b_io_rem: 1, b_error: 0, b_flags: 262149, block: 40960032, dev_maj: 253, 
dev_min: 5, XBF_ASYNC: 0 
...
Dec 12 01:13:15 localhost kernel: [17179786.228000] xlog_do_recover: #2 
...
Dec 12 01:13:21 localhost kernel: [17179786.300000] xlog_do_recover: #3
Dec 12 01:13:21 localhost kernel: [17179786.300000] xlog_do_recover: #4
Dec 12 01:13:24 localhost kernel: [17179786.300000] xlog_do_recover: #5
Dec 12 01:13:33 localhost kernel: [17179786.300000] xlog_do_recover: #6 

>>>>>>>>

Dec 12 01:13:33 localhost kernel: [17179786.300000] dm_request: request for 
sector 0, device: 253:5, dir: READ
Dec 12 01:13:33 localhost kernel: [17179786.300000] xlog_do_recover: #7
Dec 12 01:13:34 localhost kernel: [17179786.300000] xfs_buf_iowait blocked, 
b_io_rem: 1, b_error: 0, b_flags: 131333, block: 0, dev_maj: 253, dev_min: 5, 
XBF_ASYNC: 0
Dec 12 01:13:35 localhost kernel: [17179786.416000] dec_pending: ending bio for 
sector 0, device: 253:5, dir: READ

>>>>>>>>

Dec 12 01:13:35 localhost kernel: [17179786.416000] xlog_do_recover: #8
Dec 12 01:13:35 localhost kernel: [17179786.416000] xlog_do_recover: #9 
...

Our focus is on the xlog_do_recover: #7 line, the IO request that precedes it,
the xfs_iowait() call that follows, and the final endio of the IO request.

The preceding IO request is always a READ request for block number 0 (the superblock re-read issued via xfs_getsb() and xfsbdstrat() just before the xfs_iowait() call).
On the following line:

Dec 12 01:13:34 localhost kernel: [17179786.300000] xfs_buf_iowait blocked, 
b_io_rem: 1, b_error: 0, b_flags: 131333, block: 0, dev_maj: 253, dev_min: 5, 
XBF_ASYNC: 0

you can see that the xfs_buf_iowait() call was about to block on b_iodonesema
but eventually got woken up.

The b_flags of the xfs_buf_t involved did NOT have the XBF_ASYNC flag set.

Now, with the hanging path of mount execution, you get something like this:

>>>>>>>>>>
Dec 12 01:20:33 localhost kernel: [17180171.176000] dm_request: request for 
sector 0, device: 253:9, dir: READ
Dec 12 01:20:33 localhost kernel: [17180171.176000] xlog_do_recover: #7
Dec 12 01:20:34 localhost kernel: [17180171.176000] xfs_buf_iowait blocked, 
b_io_rem: 1, b_error: 0, b_flags: 131349, block: 0, dev_maj: 253, dev_min: 9, 
XBF_ASYNC: 1
Dec 12 01:20:34 localhost kernel: [17180171.176000] dec_pending: ending bio for 
sector 0, device: 253:9, dir: READ 
>>>>>>>

Everything looks similar, except that the preceding READ operation on block
number 0 was initiated in asynchronous mode.

This pattern is 100% repeatable: whenever the XBF_ASYNC flag is 1 for this READ
operation on block number 0, the outstanding xfs_iowait() never exits its
waiting state!
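
As a cross-check on the logged values (assuming the 2.6.17-era XBF_* flag
encoding in xfs_buf.h, where XBF_ASYNC is (1 << 4) = 0x10): the non-hanging
trace shows b_flags = 131333 = 0x20105, while the hanging trace shows
b_flags = 131349 = 0x20115. The two values differ in exactly that one bit,
which matches the XBF_ASYNC: 0 vs. XBF_ASYNC: 1 fields printed by the
instrumented xfs_buf_iowait().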


Steps to reproduce:

1. Create a Logical Volume, make an XFS filesystem on it, and mount it under /lv
2. Run the following loop:

while [ true ];
do
	lvcreate -L15GB -s -c256k -n1 /dev/VolGroup00/LogVol00;
	lvcreate -L15GB -s -n2 /dev/VolGroup00/LogVol00;
	lvcreate -L15GB -s -c256k -n3 /dev/VolGroup00/LogVol00;
	lvcreate -L15GB -s -n4 /dev/VolGroup00/LogVol00;
	for i in 1;
	do
		mount -onouuid /dev/VolGroup00/1 /s/1;
		mount -onouuid /dev/VolGroup00/2 /s/2;
		mount -onouuid /dev/VolGroup00/3 /s/3;
		mount -onouuid /dev/VolGroup00/4 /s/4;
		umount /s/1;
		umount /s/2;
		umount /s/3;
		umount /s/4;
	done

	echo "Removing /dev/VolGroup00/1";
	time echo "y" | lvremove /dev/VolGroup00/1;
	echo "Removing /dev/VolGroup00/2";
	time echo "y" | lvremove /dev/VolGroup00/2;
	echo "Removing /dev/VolGroup00/3";
	time echo "y" | lvremove /dev/VolGroup00/3;
	echo "Removing /dev/VolGroup00/4";
	time echo "y" | lvremove /dev/VolGroup00/4;
	echo "Removed all snapshots";
done

3. In the background, run an additional script to generate IO load on the system:

while [ true ];
do
	cat /dev/zero | head -c1000m > /lv/output;
	rm -f /lv/output;
done

4. After some time, the script from step 2 will hang on a mount operation.
Comment 1 Dave Chinner 2006-12-11 17:00:46 UTC
Yeah, I'm trying to track this down myself right now. Thanks for
the test script - that will help, as the current method we have
for tripping this takes some hours to fall over the bug.
 
Comment 2 Natalie Protasevich 2008-04-13 17:41:42 UTC
Any updates on this problem, please?
Comment 3 Dave Chinner 2008-04-13 20:14:05 UTC
Fixed a long time ago, IIRC.
Comment 4 Natalie Protasevich 2008-04-13 22:47:33 UTC
Great, thanks. Closing the bug.
