Bug 206211

Summary: BUG_ON() assertion fails in __do_readpage() fs/btrfs/extent_io.c.
Product: File System Reporter: Ryusuke Goto (ryusuke1925)
Component: btrfsAssignee: BTRFS virtual assignee (fs_btrfs)
Status: RESOLVED CODE_FIX    
Severity: normal CC: fdmanana, marcos.souza.org
Priority: P1    
Hardware: x86-64   
OS: Linux   
Kernel Version: 5.5-rc5 Subsystem:
Regression: No Bisected commit-id:
Attachments: experiments text
Reproduction code
Reproduction code

Description Ryusuke Goto 2020-01-15 06:09:25 UTC
Hello.

We encounted a BUG_ON() case when we use a JAVA's persistent caching library Ehcache on btrfs.
And we tried to find out the cause of BUG_ON() case and we also tried to fix this bug.
Details are following.

[Bug]
BUG_ON() assertion fails in __do_readpage() fs/btrfs/extent_io.c.

static int __do_readpage(...) {
	...
	while (cur <= end) {
		...
		em = __get_extent_map(inode, page, pg_offset, cur, 
				end - cur + 1, get_extent, em_cached);
		...
		BUG_ON(extent_map_end(em) <= cur); // <-here
		...
	}
	...
}

kern.log is attached.

[How to reproduce]
Keep attached program running on btrfs. This program is a simplified JAVA program which uses Ehcache. The bug can be reproduced in 30 minutes to 2 hours.
We tried to find out easier reproduction code (it may be a user space C program), but we have not yet.

[Cause]
Extent map struct is over-written by another thread.
It might be a race condition.
In try_merge_map(), the extent map is over-written even though another thread is still reading. 
(We figured out previous facts by some experiments. See experiments.txt for more details.)

[Improvement]
To eliminate the race condition, it is necessary to wait for the reference count of the extent map to decrease.
By adapting this improvement, no bugs occurred for more than 72 hours running previous program.

Signed-off-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
---
 extent_map.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/extent_map.c b/extent_map.c
index 6f417ff..6b0fd84 100644
--- a/extent_map.c
+++ b/extent_map.c
@@ -227,6 +227,8 @@ static int mergable_maps(struct extent_map *prev, struct extent_map *next)
 	      prev->block_start == EXTENT_MAP_INLINE) ||
 	     (next->block_start < EXTENT_MAP_LAST_BYTE - 1 &&
 	      next->block_start == extent_map_block_end(prev)))) {
+		while (refcount_read(&next->refs) > 2) {
+		}
 		return 1;
 	}
 	return 0;
-- 
2.17.1

We'd like to fix this bug, because the bug still occured on our production environment.
Could you advice me next actions (sending patch? or studying more about BTRFS?) .

Thanks,

Ryusuke
Comment 1 Ryusuke Goto 2020-01-15 06:10:39 UTC
Created attachment 286821 [details]
experiments text
Comment 2 Ryusuke Goto 2020-01-15 06:12:30 UTC
Created attachment 286823 [details]
Reproduction code
Comment 3 Ryusuke Goto 2020-01-15 06:12:47 UTC
Created attachment 286825 [details]
Reproduction code
Comment 4 Filipe David Manana 2020-01-31 16:36:00 UTC
Hi,

Thanks for the analysis!

I had someone else report this to me some time ago as well, precisely using java and ehcache as well.

Your analysis is mostly correct. The problem can only happen with concurrent reads (readpage) and fsync. All other interactions are protected by the file range locks.

I sent a patch earlier today for this:

https://patchwork.kernel.org/patch/11359975/

Thanks for the report and analysis.
Comment 5 Ryusuke Goto 2020-02-03 04:03:56 UTC
I'm glad to help!

(In reply to Filipe David Manana from comment #4)
> Hi,
> 
> Thanks for the analysis!
> 
> I had someone else report this to me some time ago as well, precisely using
> java and ehcache as well.
> 
> Your analysis is mostly correct. The problem can only happen with concurrent
> reads (readpage) and fsync. All other interactions are protected by the file
> range locks.
> 
> I sent a patch earlier today for this:
> 
> https://patchwork.kernel.org/patch/11359975/
> 
> Thanks for the report and analysis.
Comment 6 Marcos Souza 2020-08-24 00:48:14 UTC
(In reply to Filipe David Manana from comment #4)
> Hi,
> 
> Thanks for the analysis!
> 
> I had someone else report this to me some time ago as well, precisely using
> java and ehcache as well.
> 
> Your analysis is mostly correct. The problem can only happen with concurrent
> reads (readpage) and fsync. All other interactions are protected by the file
> range locks.
> 
> I sent a patch earlier today for this:
> 
> https://patchwork.kernel.org/patch/11359975/
> 
> Thanks for the report and analysis.

This commit was merged in

  commit 713db356041071d16360e82247de3107ec9ed57f
  Merge: 288b27a06e4f 1b9867eb6120
  Author: Linus Torvalds <torvalds@linux-foundation.org>
  Date:   Sun Feb 16 11:43:45 2020 -0800
  
      Merge tag 'for-5.6-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Closing as fixed.