Bug 9076

Summary: umount triggers a warning in jfs and takes almost a minute
Product: File System Reporter: Rafael J. Wysocki (rjwysocki)
Component: JFSAssignee: Dave Kleikamp (shaggy)
Status: CLOSED CODE_FIX    
Severity: normal CC: jack, shaggy
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.23-rc4 Tree: Mainline
Regression: Yes
Bug Depends on:    
Bug Blocks: 9056    
Attachments: Make sure special inode data is written after journal is flushed

Description Rafael J. Wysocki 2007-09-26 14:08:46 UTC
References      : http://lkml.org/lkml/2007/9/4/73
Submitter       : Oliver Neukum <oliver@neukum.org>
Comment 1 Jan Kara 2007-10-02 08:45:18 UTC
I did a bit research on this bug and it seems the problem is that log->synclist is not empty on umount. We wait for lazycommit thread to wake up and empty the synclist but that seems to never happen so we time out after 50s, issue a warning and bail out. BTW: I can sometime trigger the problem also in 2.6.22.5 kernel.
I don't know JFS internals much but shouldn't we wake up the lazy commit process somehow in jfs_flush_journal()?
Comment 2 Dave Kleikamp 2007-10-02 10:54:12 UTC
The lazy commit thread gets awakened when I/O to the journal has completed.

Oliver had send me some more syslog output:

Sep 19 14:13:04 oenone kernel: jfs_flush_journal: synclist not empty
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57b8: 0e161000 00000000 000036a0 00002ae6
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57c8: 508e7f00 ffff8100 508e7f00 ffff8100
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57d8: 00000004 00000000 00000000 00000000
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57e8: 63ad9000 ffff8100 00000055 00000000
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae57f8: 00000001 00002ae6 63ae5800 ffff8100
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae5808: 63ae5800 ffff8100 9f28df78 ffff8100
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae5818: 00001000 0000376c 00000000 00002ae6
Sep 19 14:13:04 oenone kernel: metapage: ffff810063ae5828: 508e7e00 ffff8100
Sep 19 14:13:04 oenone kernel: page: ffff81009f28df78: 010000000000087c ffffffff00000001
Sep 19 14:13:04 oenone kernel: page: ffff81009f28df88: ffff810063ae57b8 ffff810063ae3e28
Sep 19 14:13:04 oenone kernel: page: ffff81009f28df98: 0000000000000055 ffff81009f28df68
Sep 19 14:13:04 oenone kernel: page: ffff81009f28dfa8: ffff81009fc79790

metapage->flag = 4 (META_dirty) and page->flags = 87c, which both indicate that I/O has not started on the page.  I'm not sure why.

Hmm.  metapage->lid = 0.  That may be a problem.  I need to dig through the code a bit more.  It's been a while since I messed with this part of the code.

I'm thinking the speed of the flash may be aggravating some race that we don't normally see.
Comment 3 Dave Kleikamp 2007-10-02 11:52:48 UTC
Upon further review, metapage->lid = 0 is okay.  I think the problem is that jfs_flush_journal() tries to write all the metadata before the journal is flushed, and dirty pages that can't be written due to uncommitted transactions are not re-written.  If I'm right, I'm surprised this is seen so rarely, as this code hasn't changed in ages.
Comment 4 Dave Kleikamp 2007-10-02 12:44:11 UTC
Created attachment 13023 [details]
Make sure special inode data is written after journal is flushed

I sent this patch to Oliver to see if it fixes the problem
Comment 5 Alan 2008-09-23 03:05:39 UTC
Verified this has been applied