Bug 51571
| Summary: | Assertion of j_running_transaction on jbd2_journal_flush() | | |
|---|---|---|---|
| Product: | File System | Reporter: | yjwsignal |
| Component: | ext4 | Assignee: | Jan Kara (jack) |
| Status: | RESOLVED CODE_FIX | | |
| Severity: | normal | CC: | jack |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.4.0 | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | Patch fixing assertion failure in jbd2_journal_flush() | | |
Description
yjwsignal
2012-12-11 09:26:14 UTC
Thanks for the detailed report! I presume you have a crash dump, since you could provide such details? Can you have a look at what journal->j_running_transaction->t_handle_count is? I can see the transaction was started some 14 ms ago. That's not too long, but still plenty in terms of CPU time. What I think is happening is:

    Process A                                   Process B
    start_this_handle().
      if (journal->j_barrier_count) # false
      if (!journal->j_running_transaction) { # true
        read_unlock(&journal->j_state_lock);
                                                jbd2_journal_lock_updates()
                                                jbd2_journal_flush()
                                                  write_lock(&journal->j_state_lock);
                                                  if (journal->j_running_transaction) { # false
                                                  ... wait for committing trans ...
                                                  write_unlock(&journal->j_state_lock);
        ...
        write_lock(&journal->j_state_lock);
        jbd2_get_transaction(journal, new_transaction); # Sets j_running_transaction
        write_unlock(&journal->j_state_lock);
        goto repeat; # eventually blocks on j_barrier_count > 0

I will attach here a patch that should fix this...

Created attachment 88941
Patch fixing assertion failure in jbd2_journal_flush()

Can you test this patch please?
Dear Jan Kara,

journal->j_running_transaction->t_handle_count is 0x0. I also agree with your opinion. Thank you. But this bug happens rarely, i.e. it is very difficult to reproduce. As you mentioned, I also think that this bug is connected with j_barrier_count in this race situation. I wonder whether this patch could be applied to jbd2 in the next kernel release. Thank you.

Thanks, t_handle_count == 0 confirms the scenario I described likely happened. I'll push the fix upstream.

Patch sent.

Dear Jan Kara,

Thank you sincerely for your help. Besides, I would like to know where your patch is being handled, by whom, and at which kernel site. Thank you.

I sent the fix to the ext4 maintainer and the linux-ext4 mailing list - see e.g. http://lists.openwall.net/linux-ext4/2012/12/12/11