Bug 216559 - btrfs crash root mount RAID0
Summary: btrfs crash root mount RAID0
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 high
Assignee: BTRFS virtual assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-10-08 20:41 UTC by Viktor Kuzmin
Modified: 2022-10-25 08:58 UTC
CC List: 4 users

See Also:
Kernel Version: 6.0.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
crash-1 (335.61 KB, image/jpeg), 2022-10-11 11:45 UTC, Viktor Kuzmin
crash-2 (321.37 KB, image/jpeg), 2022-10-11 11:46 UTC, Viktor Kuzmin
crash-3 (262.95 KB, image/jpeg), 2022-10-11 11:46 UTC, Viktor Kuzmin
crash-4 (319.98 KB, image/jpeg), 2022-10-11 11:49 UTC, Viktor Kuzmin
crash-5 (369.79 KB, image/jpeg), 2022-10-11 11:50 UTC, Viktor Kuzmin
crash-6 (315.57 KB, image/jpeg), 2022-10-11 11:50 UTC, Viktor Kuzmin

Description Viktor Kuzmin 2022-10-08 20:41:32 UTC
In Linux 6.0.0 there was a change in fs/btrfs/block-group.c, in the function btrfs_rmap_block:

was:

if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
	stripe_nr = stripe_nr * map->num_stripes + i;
	stripe_nr = div_u64(stripe_nr, map->sub_stripes);
} else if (map->type & BTRFS_BLOCK_GROUP_RAID0) {
	stripe_nr = stripe_nr * map->num_stripes + i;
}

new:

if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
		 BTRFS_BLOCK_GROUP_RAID10)) {
	stripe_nr = stripe_nr * map->num_stripes + i;
	stripe_nr = div_u64(stripe_nr, map->sub_stripes);
}

After this change I get a crash with a divide-by-zero error. It seems that map->sub_stripes can be zero.

My setup is btrfs RAID0 over 2x 1 TB NVMe drives, mounted with space_cache=v2 and discard=async.
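
For illustration, a guard like the following would avoid the division by zero (this is only a sketch of the idea, not a proposed patch):

if (map->type & (BTRFS_BLOCK_GROUP_RAID0 |
		 BTRFS_BLOCK_GROUP_RAID10)) {
	/* Old mkfs versions wrote sub_stripes == 0 for non-RAID10 chunks;
	 * treating that as 1 keeps div_u64() from dividing by zero. */
	u32 sub_stripes = map->sub_stripes ? map->sub_stripes : 1;

	stripe_nr = stripe_nr * map->num_stripes + i;
	stripe_nr = div_u64(stripe_nr, sub_stripes);
}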
Comment 1 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-11 09:19:09 UTC
You mean this change? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ac0677348f3c2

And could you please share the full log with the divide-by-zero error?
Comment 2 Viktor Kuzmin 2022-10-11 11:45:40 UTC
Created attachment 302972 [details]
crash-1
Comment 3 Viktor Kuzmin 2022-10-11 11:46:09 UTC
Created attachment 302973 [details]
crash-2
Comment 4 Viktor Kuzmin 2022-10-11 11:46:54 UTC
Created attachment 302974 [details]
crash-3
Comment 5 Viktor Kuzmin 2022-10-11 11:49:35 UTC
Created attachment 302975 [details]
crash-4
Comment 6 Viktor Kuzmin 2022-10-11 11:50:07 UTC
Created attachment 302976 [details]
crash-5
Comment 7 Viktor Kuzmin 2022-10-11 11:50:44 UTC
Created attachment 302977 [details]
crash-6
Comment 8 Viktor Kuzmin 2022-10-11 11:52:50 UTC
Yes, I'm talking about exactly that commit. I've attached screenshots of the kernel crash taken from a remote KVM console. Unfortunately I have no plain-text full log...

I have no more problems after reverting this commit. The problem is with this line:

stripe_nr = div_u64(stripe_nr, map->sub_stripes);

It seems that map->sub_stripes is zero in my case.
Comment 9 Qu Wenruo 2022-10-11 12:14:00 UTC
I believe an older mkfs caused the sub_stripes to be zero in your chunk items.

Normally I would prefer to make the tree-checker reject such older chunk items (the ones with invalid values).

But you'd better mount with an older kernel and run a balance to get rid of such old chunks first,
just in case we go the rejection path.
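
Roughly speaking, rejecting them would mean a check along these lines when a chunk item is validated (a simplified sketch only; the real checks live in fs/btrfs/tree-checker.c and use its own error reporting helpers):

u64 type = btrfs_chunk_type(leaf, chunk);
u16 sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);

/* RAID10 chunks need sub_stripes == 2, every other profile needs 1.
 * Some old mkfs versions wrote 0 here, which would now be refused. */
if (type & BTRFS_BLOCK_GROUP_RAID10) {
	if (sub_stripes != 2)
		return -EUCLEAN;
} else if (sub_stripes != 1) {
	return -EUCLEAN;
}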
Comment 10 Viktor Kuzmin 2022-10-11 13:42:19 UTC
This server has been running for about 5 years already, I think. I will run a balance and recheck. Thanks!
Comment 11 Qu Wenruo 2022-10-11 23:19:30 UTC
You can always verify whether you have such offending chunk items with the following command (it can be executed on a mounted device):

# btrfs ins dump-tree -t chunk <device>

The field to look at is "sub_stripes":

	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15783 itemsize 112
		length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 2 sub_stripes 1
			stripe 0 devid 1 offset 30408704
			dev_uuid 7eec3a5e-6463-4c4b-a2c8-716abd5b08f5
			stripe 1 devid 2 offset 9437184
			dev_uuid f4e381b7-e378-497d-974d-0a8e7f7e71a7

Although sub_stripes really only makes sense for RAID10 (where it should be 2),
for all other profiles it should be 1, no matter what.

If you see something like "sub_stripes 0", then that chunk should be balanced.

If there are no more chunk items with "sub_stripes 0", then it should be safe to use the 6.0 kernel.
Comment 12 Viktor Kuzmin 2022-10-12 18:30:51 UTC
Thanks. This command showed me chunks with "sub_stripes 0".

And they are gone after "btrfs balance start --full-balance --bg /".
Comment 13 The Linux kernel's regression tracker (Thorsten Leemhuis) 2022-10-13 12:44:13 UTC
(In reply to Viktor Kuzmin from comment #12)

> And they are gone after "btrfs balance start --full-balance --bg /".

Well, good for you, but one question remains: might others fall into this trap? It sounds like it; hence the kernel should ideally be modified to handle this situation. Or not?
Comment 14 Viktor Kuzmin 2022-10-13 14:21:49 UTC
I have 12 servers with a btrfs RAID0 disk setup. 9 of them were set up at different times more than a year ago, and all of them have this problem: some chunks had 'sub_stripes 0'. I think others may also fall into this trap.
Comment 15 Qu Wenruo 2022-10-21 00:23:02 UTC
I'm surprised to find that I had already submitted such a patch in March 2022 for the same problem.

But at that time I didn't have a real-world report, nor a known btrfs-progs version that produced such 0 sub_stripes.

Could you provide the history of the filesystems which still have the 0 sub_stripes values?
I'm particularly interested in which btrfs-progs version is causing this problem.
Comment 16 John Bowler 2022-10-23 00:34:14 UTC
I see the same issue.  In my case it is on the RAID root and two other non-RAID btrfs partitions.  A more recently created btrfs partition does not have the problem.

In all three cases there are exactly three "sub_stripes 0" items; they have contiguous item numbers and they are right at the start, either items 2,3,4 or 3,4,5 (1-based).

I'm using Gentoo (I believe from the screenshots @Viktor is too) and I'm running the dev (~) release, so I may have been using a mkfs.btrfs that never hit the standard world. Nevertheless I regard the bug as a showstopper; it apparently can't be fixed from a running 6.0.x (or 6.1.x?) system.

This is my gentoo bug: https://bugs.gentoo.org/878023

If the mkfs.btrfs version is recorded in the FS I can readily retrieve it; however, I am running the balance on all three affected FSs, so I hope the problem will disappear from my sight :-)
Comment 17 Qu Wenruo 2022-10-23 00:58:12 UTC
(In reply to John Bowler from comment #16)
> I see the same issue.  In my case it is on the RAID root and two other
> non-RAID btrfs partitions.  A more recently created btrfs partition does not
> have the problem.
> 
> In all three cases there are exactly three "sub_stripes 0" items, they are
> contiguous item numbers and they are right at the start; either items 2,3,4
> or 3,4,5 (1-based).
> 
> I'm using gentoo (I believe from the screenshots @Viktor is too) and I'm
> running the dev (~) release so I may have been using a mkfs.btrfs that never
> hit the standard world.  Nevertheless I regard the bug as a showstopper; it
> apparently can't be fixed from a running 6.0.x (or 6.1.x?) system.
> 
> This is my gentoo bug: https://bugs.gentoo.org/878023
> 
> If the mkfs.btrfs version is recorded in the FS I can readily retrieve it,
> however I am doing the balance on all three affected FSs so I hope the
> problem will disappear from my sight :-)

Or you can try this patch:
https://patchwork.kernel.org/project/linux-btrfs/patch/90e84962486d7ab5a8bca92e329fe3ee6864680f.1666312963.git.wqu@suse.com/

This should make btrfs properly handle such older chunk items.

It will be backported to v6.0 (the only affected release AFAIK).
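
In short, the idea is to stop trusting the on-disk sub_stripes value when the in-memory chunk map is built and to derive it from the RAID profile instead. A simplified sketch of that approach (see the patch itself for the real change):

/* Illustrative only: take sub_stripes from the profile table rather
 * than from the (possibly zero) value stored in an old chunk item,
 * so a 0 can never reach div_u64(). */
int index = btrfs_bg_flags_to_raid_index(map->type);

map->sub_stripes = btrfs_raid_array[index].sub_stripes;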
Comment 18 John Bowler 2022-10-23 01:52:36 UTC
> This should make btrfs properly handle such older chunk items.

Alas I no longer have a test case - I ran the btrfs balance on all my current file systems.

I can confirm that the workaround works; I now have a functional 6.0.3 with all file systems mounted.
Comment 19 David Sterba 2022-10-25 08:58:20 UTC
The patch has been queued for merging and will appear in an upcoming 6.0.x stable release. Thanks for the reports and the fix.
