Bug 15792

Summary: ext4_inode_info->i_flags modification is racy
Product: File System Reporter: Dmitry Monakhov (dmonakhov)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED CODE_FIX    
Severity: normal    
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.33-rc1 Subsystem:
Regression: No Bisected commit-id:
Attachments: dmesg
xfstest testcase patch
i_flags modification history

Description Dmitry Monakhov 2010-04-15 22:13:29 UTC
Created attachment 26025 [details]
dmesg

I've got an OOPS on 8'cores host under heavy IO load
Comment 1 Dmitry Monakhov 2010-04-15 22:17:54 UTC
Created attachment 26026 [details]
xfstest testcase patch

This is trivial but proven to be useful fsstress based test-case.
i'm run it like follows
while true; do ./check 227 || break ; done
Comment 2 Dmitry Monakhov 2010-04-15 22:31:55 UTC
The oops happens because non tested branch was triggered.
./fs/ext4/extents.c
3477:    if (unlikely(EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL)) {
                if (unlikely(!eh->eh_entries)) {
                        EXT4_ERROR_INODE(inode,
                                         "eh->eh_entries == 0 ee_block %d",
                                         ex->ee_block);
### OOPS here because ex == NULL ^^^^^^^^^^^^^^^^^^^^^^^^
                        err = -EIO;
                        goto out2;
                }

Bug was introduced by following commit:
commit 273df556b6ee2065bfe96edab5888d3dc9b108d8
Author: Frank Mayhar <fmayhar@google.com>
Date:   Tue Mar 2 11:46:09 2010 -0500

And in fact it is rather trivial to fix.
But most interesting question what the hell we are doing on that error path?
inode has EXT4_EOFBLOCKS_FL flag enabled but eh->eh_entries, and in fact
after adding more debug information i've found that inode is simply blockless.
i_blocks == 0, i_size == 0.

I've collected per-inode i_flag modification history (see an attachment)
Comment 3 Dmitry Monakhov 2010-04-15 22:39:15 UTC
Created attachment 26027 [details]
i_flags modification history
Comment 4 Dmitry Monakhov 2010-04-15 22:52:22 UTC
Most interesting part is last lines

[61] [2]   truncate clr s:737987 d:737987 b:104 fl:80000 
[62] [0]   clr ext4_inode_blocks_set b:104 fl:480000 bit:40000 
[63] [2]   trunc_ext begin s:737987 d:737987 b:104 fl:480000 ml:1 

I.E.
CPU2: is doing  EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL
CPU0: is doung  ei->i_flags &= ~EXT4_HUGE_FILE_FL
CPU2: Wow  EXT4_EOFBLOCKS_FL appear again due to race with cpu0.
So even if truncate holds i_mutex it is possible to modify i_flags.
Seems that we have to modify i_flags via anomic bits operations.

A fix is almost ready. Currently i'm testing it.
Comment 5 Dmitry Monakhov 2010-04-19 03:42:20 UTC
Ohh.. one more thing, which is not actually related with the name of the bug,
But this peace of code was already mention here. Let's just document this here.

fs/ext4/extents.c 
3288:int ext4_ext_get_blocks(...)
{
....
3477:   if (unlikely(EXT4_I(inode)->i_flags & EXT4_EOFBLOCKS_FL)) {
3478:           if (unlikely(!eh->eh_entries)) {
3479:                   EXT4_ERROR_INODE(inode,
3480:                                    "eh->eh_entries == 0 ee_block %d",
3481:                                    ex->ee_block);
3482:                   err = -EIO;
3483:                   goto out2;
3484:           }
3485:           last_ex = EXT_LAST_EXTENT(eh);
3486:           if (iblock + ar.len > le32_to_cpu(last_ex->ee_block)
3487:               + ext4_ext_get_actual_len(last_ex))
##### ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
##### If depth > 0 then last_ex may not be the latest extent of the file.
##### because "eh" is just a one of leafs(writing at the middle of a file)
##### We have to find latest extent by traversing a whole path again but
##### always chose the latest eh,ex.
3488:                   EXT4_I(inode)->i_flags &= ~EXT4_EOFBLOCKS_FL;
for (  
3489:   }
Comment 6 Dmitry Monakhov 2010-04-19 14:38:10 UTC
Proposed patches
http://patchwork.ozlabs.org/patch/50468/
http://patchwork.ozlabs.org/patch/50469/

My hosts survived reasonable amount of time with these patches.
If everybody agree with the first patch, I'll prepare similar one for ext3.