Bug 9076 - umount triggers a warning in jfs and takes almost a minute
Summary: umount triggers a warning in jfs and takes almost a minute
Status: CLOSED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: JFS (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Dave Kleikamp
URL:
Keywords:
Depends on:
Blocks: 9056
  Show dependency tree
 
Reported: 2007-09-26 14:08 UTC by Rafael J. Wysocki
Modified: 2008-09-23 03:05 UTC (History)
2 users (show)

See Also:
Kernel Version: 2.6.23-rc4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Make sure special inode data is written after journal is flushed (2.56 KB, patch)
2007-10-02 12:44 UTC, Dave Kleikamp
Details | Diff

Description Rafael J. Wysocki 2007-09-26 14:08:46 UTC
References      : http://lkml.org/lkml/2007/9/4/73
Submitter       : Oliver Neukum <oliver@neukum.org>
Comment 1 Jan Kara 2007-10-02 08:45:18 UTC
I did a bit research on this bug and it seems the problem is that log->synclist is not empty on umount. We wait for lazycommit thread to wake up and empty the synclist but that seems to never happen so we time out after 50s, issue a warning and bail out. BTW: I can sometime trigger the problem also in 2.6.22.5 kernel.
I don't know JFS internals much but shouldn't we wake up the lazy commit process somehow in jfs_flush_journal()?
Comment 2 Dave Kleikamp 2007-10-02 10:54:12 UTC
The lazy commit thread gets awakened when I/O to the journal has completed.

Oliver had send me some more syslog output:

Sep 19 14:13:04 oenone kernel: jfs_flush_journal: synclist not empty
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57b8: 0e161000 00000000 000036a0 00002ae6
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57c8: 508e7f00 ffff8100 508e7f00 ffff8100
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57d8: 00000004 00000000 00000000 00000000
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57e8: 63ad9000 ffff8100 00000055 00000000
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57f8: 00000001 00002ae6 63ae5800 ffff8100
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae5808: 63ae5800 ffff8100 9f28df78 ffff8100
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae5818: 00001000 0000376c 00000000 00002ae6
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae5828: 508e7e00 ffff8100
Sep 19 14:13:04 oenone kernel: page: ffff81009f28df78: 010000000000087c ffffffff00000001
Sep 19 14:13:04 oenone kernel: page: ffff81009f28df88: ffff810063ae57b8 ffff810063ae3e28
Sep 19 14:13:04 oenone kernel: page: ffff81009f28df98: 0000000000000055 ffff81009f28df68
Sep 19 14:13:04 oenone kernel: page: ffff81009f28dfa8: ffff81009fc79790

metapage->flag = 4 (META_dirty) and page->flags = 87c, which both indicate that I/O has not started on the page.  I'm not sure why.

Hmm.  metapage->lid = 0.  That may be a problem.  I need to dig through the code a bit more.  It's been a while since I messed with this part of the code.

I'm thinking the speed of the flash may be aggravating some race that we don't normally see.
Comment 3 Dave Kleikamp 2007-10-02 11:52:48 UTC
Upon further review, metapage->lid = 0 is okay.  I think the problem is that jfs_flush_journal() tries to write all the metadata before the journal is flushed, and dirty pages that can't be written due to uncommitted transactions are not re-written.  If I'm right, I'm surprised this is seen so rarely, as this code hasn't changed in ages.
Comment 4 Dave Kleikamp 2007-10-02 12:44:11 UTC
Created attachment 13023 [details]
Make sure special inode data is written after journal is flushed

I sent this patch to Oliver to see if it fixes the problem
Comment 5 Alan 2008-09-23 03:05:39 UTC
Verified this has been applied

Note You need to log in before you can comment on or make changes to this bug.