Bug 14602

Summary: JBD2 journal abort / checkpoint creation racy?
Product: File System Reporter: Andi Kleen (andi-bz)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, tytso
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.32-rc6 Subsystem:
Regression: No Bisected commit-id:

Description Andi Kleen 2009-11-14 12:05:13 UTC
I was testing a new file system feature that triggered IO errors
on inode read. I did not actually change the abort part here
so I believe this is unrelated to my changes.

I had one case during testing where the journal abort didn't work
and jbd2 journal abort errored out.

<putting some stress on the file system>
EXT4-fs error (device sda1): ext4_iget: triggering IO error
EXT4-fs error (device sda1): ext4_put_super: Couldn't clean up the journal

This happens in
ext4_put_super->jbd2_journal_destroy->jbd2_log_do_checkpoint
and then finally 


  if (journal->j_sb_buffer) {
                if (!is_journal_aborted(journal)) {
                        /* We can now mark the journal as empty. */
                        journal->j_tail = 0;
                        journal->j_tail_sequence =
                                ++journal->j_transaction_sequence;
                        jbd2_journal_update_superblock(journal, 1);
                } else {
                        err = -EIO; <------------ this is triggered
                }

So it looks like jbd2_log_do_checkpoint sometimes does not succeed?
It does a couple of retry, maybe that log is not enough.

The problem is not easy to trigger unfortunately, i only saw it very rarely.
I assume it's some race in the checkpoint creation.

I tried to reproduce it with jbd2 debugging enabled, but no luck so far.
Comment 1 Theodore Tso 2009-11-15 22:27:58 UTC
If the journal is aborted, then any attempts to destroy the journal, checkpoint the journal, create a new transaction, etc., will result in -EIO being returned.   In the code that you quoted, the EIO is being returned because is_journal_aborted(journal) returns true.

The normal reason why the journal gets aborted is because ext4_error() has called ext4_handle_error(), which then calls jbd2_journal_abort() and the errors behavior is remount-read-only.   The basic idea is that if some kind of file system error or corruption has been detected, we want to stop the file system from any further modifications, which might cause more damage and/or user data loss.

We don't have a good errno value to use, so we just use EIO, and the error handling after an aborted journal is admittedly not great.   It triggers a lot of scary, and to someone who isn't an ext3/ext4 veteran, misleading, error messages, and unfortunately it can cause the key initial failure to scroll off of a VT console.

So it's not a race condition; the system is functioning as designed, although at some point we may put in better error handling and some earlier tests for an aborted journal in some of the upper layers of various ext4 functions.
Comment 2 Andi Kleen 2009-11-16 16:04:33 UTC
Hmm, it still seems strange because sometimes the error happens and more often
not, even when triggering the same underlying inode IO error. I think
something is at least inconsistent.
Comment 3 Theodore Tso 2009-11-16 18:59:35 UTC
Well, an I/O error won't cause an ext4_error() --- unless the garbage returned is corrupted enough that it causes the ext4 file system code to decide to throw an ext4_error.   So the fact that you sometimes get an aborted journal (caused by an ext4_error) isn't entirely surprising.  

The ext4_error is going to depend on whether or not the ext4 fs code things the file system is corrupted, which is going to be data dependent, and that might be variable after an I/O error.   The real problem may be that we need to be doing a better job of doing error checking....