Bug 216559

Summary: btrfs crash root mount RAID0
Product: File System
Component: btrfs
Reporter: Viktor Kuzmin (kvaster)
Assignee: BTRFS virtual assignee (fs_btrfs)
Status: RESOLVED CODE_FIX
Severity: high
Priority: P1
CC: dsterba, jbowler, regressions, wqu
Hardware: All
OS: Linux
Kernel Version: 6.0.0
Regression: No
Attachments: crash-1
crash-2
crash-3
crash-4
crash-5
crash-6

Description Viktor Kuzmin 2022-10-08 20:41:32 UTC
In Linux 6.0.0 there was a change in fs/btrfs/block-group.c, in the function btrfs_rmap_block():

was:

if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
	stripe_nr = stripe_nr * map->num_stripes + i;
	stripe_nr = div_u64(stripe_nr, map->sub_stripes);
} else if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
	stripe_nr = stripe_nr * map->num_stripes + i;
}

new:

if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
		 BTRFS_BLOCK_GROUP_RAID10)) {
	stripe_nr = stripe_nr * map->num_stripes + i;
	stripe_nr = div_u64(stripe_nr, map->sub_stripes);
}

After this change I get a crash with a divide-by-zero error. It seems that map->sub_stripes can be zero.
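
For illustration, here is a minimal userspace sketch of the failing arithmetic, assuming only that div_u64() boils down to a plain u64/u32 division (the values are made up; the sub_stripes == 0 divisor is the point). Compiled without optimization it dies with SIGFPE, mirroring the in-kernel divide error:

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's div_u64(): u64 dividend, u32 divisor. */
static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;	/* divisor == 0 => divide fault */
}

int main(void)
{
	uint64_t stripe_nr = 42;	/* arbitrary example value */
	uint32_t num_stripes = 2;	/* e.g. two NVMe devices in RAID0 */
	uint32_t sub_stripes = 0;	/* invalid on-disk value from an old mkfs */
	int i = 0;			/* first stripe */

	/* The unified RAID0/RAID10 path from the "new" code above: */
	stripe_nr = stripe_nr * num_stripes + i;
	stripe_nr = div_u64(stripe_nr, sub_stripes);	/* crashes here */

	printf("stripe_nr = %llu\n", (unsigned long long)stripe_nr);
	return 0;
}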

My setup is btrfs RAID0 across 2x 1TB NVMe drives, mounted with space_cache=v2 and discard=async.
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-11 09:19:09 UTC
You mean this change? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ac0677348f3c2

And could you please share the full log with the divide-by-zero error?
Comment 2 Viktor Kuzmin 2022-10-11 11:45:40 UTC
Created attachment 302972 [details]
crash-1
Comment 3 Viktor Kuzmin 2022-10-11 11:46:09 UTC
Created attachment 302973 [details]
crash-2
Comment 4 Viktor Kuzmin 2022-10-11 11:46:54 UTC
Created attachment 302974 [details]
crash-3
Comment 5 Viktor Kuzmin 2022-10-11 11:49:35 UTC
Created attachment 302975 [details]
crash-4
Comment 6 Viktor Kuzmin 2022-10-11 11:50:07 UTC
Created attachment 302976 [details]
crash-5
Comment 7 Viktor Kuzmin 2022-10-11 11:50:44 UTC
Created attachment 302977 [details]
crash-6
Comment 8 Viktor Kuzmin 2022-10-11 11:52:50 UTC
Yes, I'm talking about exactly that commit. I've attached screenshots of the kernel crash taken over remote KVM. Unfortunately, I have no plain-text full log...

I have no more problems after reverting this commit. The crash is at:

stripe_nr = div_u64(stripe_nr, map->sub_stripes);

It seems that map->sub_stripes is zero in my case.
Comment 9 Qu Wenruo 2022-10-11 12:14:00 UTC
I believe it was some older mkfs that left sub_stripes as zero in your chunk items.

Normally I would prefer to make the tree-checker reject such older chunk items (the ones with invalid values).

But you'd better mount the fs with an older kernel and run a balance to get rid of such old chunks first, just in case we go the reject path.
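
For reference, the "reject" option would look roughly like the following in btrfs_check_chunk_valid() in fs/btrfs/tree-checker.c. This is only a sketch of the idea, not the actual patch; the chunk_err() helper exists there, but its exact arguments vary between kernel versions:

	/* Sketch: refuse chunk items whose on-disk sub_stripes is 0. */
	if (!sub_stripes) {
		chunk_err(leaf, chunk, logical,
			  "invalid sub_stripes, have 0 expect at least 1");
		return -EUCLEAN;
	}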
Comment 10 Viktor Kuzmin 2022-10-11 13:42:19 UTC
This server has been running for 5 years already, I think. I will run a balance and recheck. Thanks!
Comment 11 Qu Wenruo 2022-10-11 23:19:30 UTC
You can always verify whether you have such offending chunk items with the following command (it can be executed on a mounted device):

# btrfs ins dump-tree -t chunk <device>

The field to look for is "sub_stripes":

	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15783 itemsize 112
		length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 2 sub_stripes 1
			stripe 0 devid 1 offset 30408704
			dev_uuid 7eec3a5e-6463-4c4b-a2c8-716abd5b08f5
			stripe 1 devid 2 offset 9437184
			dev_uuid f4e381b7-e378-497d-974d-0a8e7f7e71a7

Although sub_stripes really only makes sense for RAID10 (where it should be 2), for all other profiles it should be 1, no matter what.

If you see something like "sub_stripes 0", then that chunk should be balanced.

If there are no more chunk items with "sub_stripes 0", then it should be safe to use a 6.0 kernel.
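
For a quick check, the dump can be piped through grep (the device path here is just an example):

# btrfs ins dump-tree -t chunk /dev/nvme0n1 | grep -c 'sub_stripes 0'

A count of 0 means no offending chunk items remain.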
Comment 12 Viktor Kuzmin 2022-10-12 18:30:51 UTC
Thanks. This command showed me chunks with "sub_stripes 0".

And they are gone after "btrfs balance start --full-balance --bg /".
Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-13 12:44:13 UTC
(In reply to Viktor Kuzmin from comment #12)

> And they are gone after "btrfs balance start --full-balance --bg /".

Well, good for you, but one question remains: might others fall into this trap? It sounds like it; hence the kernel should ideally be modified to handle this situation. Or not?
Comment 14 Viktor Kuzmin 2022-10-13 14:21:49 UTC
I have 12 servers set up with btrfs RAID0 disks. 9 of them were set up at various times more than a year ago, and all of them have this problem: some chunks had 'sub_stripes 0'. I think others may also fall into this trap.
Comment 15 Qu Wenruo 2022-10-21 00:23:02 UTC
I'm surprised to find that I had already submitted such a patch back in March 2022 for the same problem.

But at that time I didn't have a real-world report, nor a known progs version that could cause such 0 sub_stripes.

Could you provide the history of the filesystems which still have the 0 sub_stripes values?
I'm particularly interested in which btrfs-progs version is causing this problem.
Comment 16 John Bowler 2022-10-23 00:34:14 UTC
I see the same issue.  In my case it is on the RAID root and two other non-RAID btrfs partitions.  A more recently created btrfs partition does not have the problem.

In all three cases there are exactly three "sub_stripes 0" items; they are contiguous item numbers and they are right at the start, either items 2,3,4 or 3,4,5 (1-based).

I'm using Gentoo (I believe from the screenshots @Viktor is too) and I'm running the dev (~) release, so I may have been using a mkfs.btrfs that never hit the standard world. Nevertheless, I regard the bug as a showstopper; it apparently can't be fixed from a running 6.0.x (or 6.1.x?) system.

This is my gentoo bug: https://bugs.gentoo.org/878023

If the mkfs.btrfs version is recorded in the FS, I can readily retrieve it; however, I am running the balance on all three affected FSs, so I hope the problem will disappear from my sight :-)
Comment 17 Qu Wenruo 2022-10-23 00:58:12 UTC
(In reply to John Bowler from comment #16)
> Nevertheless, I regard the bug as a showstopper; it apparently can't be
> fixed from a running 6.0.x (or 6.1.x?) system.

Or you can try this patch:
https://patchwork.kernel.org/project/linux-btrfs/patch/90e84962486d7ab5a8bca92e329fe3ee6864680f.1666312963.git.wqu@suse.com/

This should make btrfs properly handle such older chunk items.
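
A sketch of the idea, assuming the fix takes sub_stripes from the kernel's built-in RAID table rather than from the on-disk chunk item (btrfs_raid_array and btrfs_bg_flags_to_raid_index() exist in fs/btrfs/volumes.{c,h}; the exact lines here are illustrative, not the merged patch):

	/* Sketch: when building the in-memory map from a chunk item, take
	 * sub_stripes from the kernel's RAID table instead of trusting the
	 * (possibly bogus) on-disk value. */
	const int index = btrfs_bg_flags_to_raid_index(map->type);

	map->sub_stripes = btrfs_raid_array[index].sub_stripes;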

It will be backported to v6.0 (the only affected release AFAIK).
Comment 18 John Bowler 2022-10-23 01:52:36 UTC
> This should make btrfs properly handle such older chunk items.

Alas I no longer have a test case - I ran the btrfs balance on all my current file systems.

I can confirm that the workaround works; I now have a functional 6.0.3 with all file systems mounted.
Comment 19 David Sterba 2022-10-25 08:58:20 UTC
The patch has been queued for merge and will appear in a near-future 6.0.x stable release. Thanks for the reports and the fix.