Most recent kernel where this bug did *NOT* occur: unknown
Distribution: Fedora
Hardware Environment: PC, P4 2.4GHz HT, Abit mainboard, HD: WD2000-JS 200GB
Software Environment: LVM2, logical volumes and LVM snapshots

Problem Description:

A mount operation applied to an LVM2 snapshot occasionally hangs whenever the main logical volume is in use, possibly due to heavy IO load/congestion.

I was able to track this down with a simple debugging methodology. First, I discovered that the hang occurs within the xlog_recover() function in xfs_log_recover.c. xlog_recover() calls xlog_do_recover(), where I placed a few printk's to narrow down the location of the problem:

xlog_do_recover(
	xlog_t		*log,
	xfs_daddr_t	head_blk,
	xfs_daddr_t	tail_blk)
{
	int		error;
	xfs_buf_t	*bp;
	xfs_sb_t	*sbp;

	printk("xlog_do_recover: #1\n");

	/*
	 * First replay the images in the log.
	 */
	error = xlog_do_log_recovery(log, head_blk, tail_blk);
	if (error) {
		return error;
	}

	printk("xlog_do_recover: #2\n");
	XFS_bflush(log->l_mp->m_ddev_targp);
	printk("xlog_do_recover: #3\n");

	/*
	 * If IO errors happened during recovery, bail out.
	 */
	if (XFS_FORCED_SHUTDOWN(log->l_mp)) {
		return (EIO);
	}

	printk("xlog_do_recover: #4\n");

	/*
	 * We now update the tail_lsn since much of the recovery has completed
	 * and there may be space available to use.  If there were no extent
	 * or iunlinks, we can free up the entire log and set the tail_lsn to
	 * be the last_sync_lsn.  This was set in xlog_find_tail to be the
	 * lsn of the last known good LR on disk.  If there are extent frees
	 * or iunlinks they will have some entries in the AIL; so we look at
	 * the AIL to determine how to set the tail_lsn.
	 */
	printk("xlog_do_recover: #5\n");
	xlog_assign_tail_lsn(log->l_mp);

	/*
	 * Now that we've finished replaying all buffer and inode
	 * updates, re-read in the superblock.
	 */
	printk("xlog_do_recover: #6\n");
	bp = xfs_getsb(log->l_mp, 0);
	XFS_BUF_UNDONE(bp);
	XFS_BUF_READ(bp);
	xfsbdstrat(log->l_mp, bp);
	printk("xlog_do_recover: #7\n");
	if ((error = xfs_iowait(bp))) {
		printk("xlog_do_recover: #100\n");
		xfs_ioerror_alert("xlog_do_recover",
				  log->l_mp, bp, XFS_BUF_ADDR(bp));
		ASSERT(0);
		xfs_buf_relse(bp);
		printk("xlog_do_recover: #101\n");
		return error;
	}

	/* Convert superblock from on-disk format */
	sbp = &log->l_mp->m_sb;
	printk("xlog_do_recover: #8\n");
	xfs_xlatesb(XFS_BUF_TO_SBP(bp), sbp, 1, XFS_SB_ALL_BITS);
	ASSERT(sbp->sb_magicnum == XFS_SB_MAGIC);
	ASSERT(XFS_SB_GOOD_VERSION(sbp));
	xfs_buf_relse(bp);
	printk("xlog_do_recover: #9\n");

	xlog_recover_check_summary(log);

	/* Normal transactions can now occur */
	log->l_flags &= ~XLOG_ACTIVE_RECOVERY;
	return 0;
}

Whenever the mount operation hangs, it hangs after printing the "xlog_do_recover: #7" message, i.e. on the xfs_iowait(bp) call. The WCHAN of the hanging mount process suggested an indefinite wait on a semaphore. In xfs_buf.h we read:

#define xfs_iowait(bp)	xfs_buf_iowait(bp)

and xfs_buf_iowait() is in turn defined in xfs_buf.c. Within that function, the mount operation hangs on:

down(&bp->b_iodonesema);

To be able to log some useful information, I changed the definition of xfs_buf_iowait() so that:
a. it first tries to take the semaphore using down_trylock();
b. if that fails, it logs some fields of the xfs_buf_t structure involved;
c. finally, it blocks on the original down(&bp->b_iodonesema).

My function looks as follows:

int
xfs_buf_iowait(
	xfs_buf_t	*bp)
{
	XB_TRACE(bp, "iowait", 0);
	if (atomic_read(&bp->b_io_remaining))
		blk_run_address_space(bp->b_target->bt_mapping);
	if (down_trylock(&bp->b_iodonesema)) {
		printk("xfs_buf_iowait blocked, b_io_rem: %d, b_error: %d, "
		       "b_flags: %d, block: %lld, dev_maj: %d, dev_min: %d, "
		       "XBF_ASYNC: %d\n",
		       atomic_read(&bp->b_io_remaining), bp->b_error,
		       bp->b_flags, bp->b_bn,
		       MAJOR(bp->b_target->bt_dev),
		       MINOR(bp->b_target->bt_dev),
		       (((bp->b_flags & XBF_ASYNC) != 0) ? 1 : 0));
		down(&bp->b_iodonesema);
	}
	XB_TRACE(bp, "iowaited", (long)bp->b_error);
	return bp->b_error;
}

To be able to log even more diagnostic information, I added additional printk's within the dm_request() and dec_pending() functions of dm.c, so that the start and end of every IO operation entering the device mapper is logged. To limit excessive logging, only BIOs with bi_sector == 0 are logged, which is sufficient in this case (a rough sketch of this instrumentation is included below, after the first log excerpt).

Now, with a proper, non-hanging execution of the mount operation we get something that looks like:

Dec 12 01:13:08 localhost kernel: [17179773.208000] XFS mounting filesystem dm-5
Dec 12 01:13:08 localhost kernel: [17179773.344000] xfs_buf_iowait blocked, b_io_rem: 1, b_error: 0, b_flags: 262149, block: 40960032, dev_maj: 253, dev_min: 5, XBF_ASYNC: 0
...
Dec 12 01:13:15 localhost kernel: [17179786.228000] xlog_do_recover: #2
...
Dec 12 01:13:21 localhost kernel: [17179786.300000] xlog_do_recover: #3
Dec 12 01:13:21 localhost kernel: [17179786.300000] xlog_do_recover: #4
Dec 12 01:13:24 localhost kernel: [17179786.300000] xlog_do_recover: #5
Dec 12 01:13:33 localhost kernel: [17179786.300000] xlog_do_recover: #6
>>>>>>>>
Dec 12 01:13:33 localhost kernel: [17179786.300000] dm_request: request for sector 0, device: 253:5, dir: READ
Dec 12 01:13:33 localhost kernel: [17179786.300000] xlog_do_recover: #7
Dec 12 01:13:34 localhost kernel: [17179786.300000] xfs_buf_iowait blocked, b_io_rem: 1, b_error: 0, b_flags: 131333, block: 0, dev_maj: 253, dev_min: 5, XBF_ASYNC: 0
Dec 12 01:13:35 localhost kernel: [17179786.416000] dec_pending: ending bio for sector 0, device: 253:5, dir: READ
>>>>>>>>
Dec 12 01:13:35 localhost kernel: [17179786.416000] xlog_do_recover: #8
Dec 12 01:13:35 localhost kernel: [17179786.416000] xlog_do_recover: #9
...

The points of interest are the "xlog_do_recover: #7" line, the IO request that precedes it, the xfs_iowait() call that follows, and the final endio of that IO request. The preceding IO request is always a READ of block number 0. On the line

Dec 12 01:13:34 localhost kernel: [17179786.300000] xfs_buf_iowait blocked, b_io_rem: 1, b_error: 0, b_flags: 131333, block: 0, dev_maj: 253, dev_min: 5, XBF_ASYNC: 0

you can see that the xfs_buf_iowait() call was about to block on b_iodonesema but was eventually woken up. Note that b_flags of the xfs_buf_t involved did NOT have the XBF_ASYNC flag set.
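For reference, the dm.c instrumentation mentioned above looked roughly like the following. This is only a sketch, not the actual patch: the helper name dm_trace_bio is mine, and the field names assume a 2.6-era struct bio. It is called at the top of dm_request() with what = "request" and from dec_pending() with what = "ending bio".

/*
 * Sketch of the extra logging added to drivers/md/dm.c for this report.
 * Only BIOs starting at sector 0 are logged, to limit the log volume.
 */
static void dm_trace_bio(const char *func, const char *what, struct bio *bio)
{
	if (bio->bi_sector != 0)
		return;

	printk("%s: %s for sector %llu, device: %d:%d, dir: %s\n",
	       func, what,
	       (unsigned long long)bio->bi_sector,
	       MAJOR(bio->bi_bdev->bd_dev),
	       MINOR(bio->bi_bdev->bd_dev),
	       bio_data_dir(bio) == READ ? "READ" : "WRITE");
}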
Now, with a hanging execution of the mount operation, you get something like this:

>>>>>>>>>>
Dec 12 01:20:33 localhost kernel: [17180171.176000] dm_request: request for sector 0, device: 253:9, dir: READ
Dec 12 01:20:33 localhost kernel: [17180171.176000] xlog_do_recover: #7
Dec 12 01:20:34 localhost kernel: [17180171.176000] xfs_buf_iowait blocked, b_io_rem: 1, b_error: 0, b_flags: 131349, block: 0, dev_maj: 253, dev_min: 9, XBF_ASYNC: 1
Dec 12 01:20:34 localhost kernel: [17180171.176000] dec_pending: ending bio for sector 0, device: 253:9, dir: READ
>>>>>>>

Everything looks similar, except that the preceding READ operation on block number 0 was initiated in asynchronous mode. This pattern is 100% repeatable: whenever the XBF_ASYNC flag is 1 for this READ of block number 0, the outstanding xfs_iowait() never leaves its waiting state! (A small decode of the two b_flags values is shown after the reproduction steps below.)

Steps to reproduce:

1. Create a logical volume and mount it under /lv.

2. Run the following loop:

while [ true ]; do
	lvcreate -L15GB -s -c256k -n1 /dev/VolGroup00/LogVol00;
	lvcreate -L15GB -s -n2 /dev/VolGroup00/LogVol00;
	lvcreate -L15GB -s -c256k -n3 /dev/VolGroup00/LogVol00;
	lvcreate -L15GB -s -n4 /dev/VolGroup00/LogVol00;
	for i in 1; do
		mount -onouuid /dev/VolGroup00/1 /s/1;
		mount -onouuid /dev/VolGroup00/2 /s/2;
		mount -onouuid /dev/VolGroup00/3 /s/3;
		mount -onouuid /dev/VolGroup00/4 /s/4;
		umount /s/1;
		umount /s/2;
		umount /s/3;
		umount /s/4;
	done
	echo "Removing /dev/VolGroup00/1"; time echo "y" | lvremove /dev/VolGroup00/1;
	echo "Removing /dev/VolGroup00/2"; time echo "y" | lvremove /dev/VolGroup00/2;
	echo "Removing /dev/VolGroup00/3"; time echo "y" | lvremove /dev/VolGroup00/3;
	echo "Removing /dev/VolGroup00/4"; time echo "y" | lvremove /dev/VolGroup00/4;
	echo "Removed all snapshots";
done

3. In the background, run an additional script to generate IO load on the system:

while [ true ]; do
	cat /dev/zero | head -c1000m > /lv/output;
	rm -f /lv/output;
done

4. After some time, the loop from step 2 will hang on one of the mount operations.
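As a side note on the two b_flags values reported above (131333 in the non-hanging case, 131349 in the hanging one): a trivial user-space check, using just the numbers copied from the log lines, shows that they differ in exactly one bit, which matches the XBF_ASYNC field the printk derives from b_flags.

#include <stdio.h>

int main(void)
{
	unsigned int flags_ok   = 131333;	/* non-hanging case, XBF_ASYNC: 0 */
	unsigned int flags_hang = 131349;	/* hanging case,     XBF_ASYNC: 1 */

	printf("ok   = 0x%x\n", flags_ok);		/* 0x20105 */
	printf("hang = 0x%x\n", flags_hang);		/* 0x20115 */
	printf("diff = 0x%x\n", flags_ok ^ flags_hang);	/* 0x10: the only differing bit */

	return 0;
}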
Yeah, I'm trying to track this down myself right now. Thanks for the test script - that will help as the current method we have for tripping this takes some hours to fall over the bug.
Any updates on this problem, please?
Fixed a long time ago, IIRC.
Great, thanks. Closing the bug.