My distribution is Manjaro 23.0.0; the CPU is an AMD A10.
The system disk is an SSD PNY_CS900 120GB.

With Linux 6.4.6.1-MANJARO, a scrub of the root filesystem on the system disk reports errors and sometimes freezes the system:

sudo btrfs scrub status /
UUID:             53e62983-fed7-46ed-b97c-23d19af4f26f
Scrub started:    Tue Aug  1 13:26:14 2023
Status:           running
Duration:         0:03:21
Time left:        1:12:50
ETA:              Tue Aug  1 14:42:25 2023
Total to scrub:   85.39GiB
Bytes scrubbed:   3.75GiB  (4.40%)
Rate:             19.13MiB/s
Error summary:    read=80
  Corrected:      34
  Uncorrectable:  46
  Unverified:     0

In this case, this report is the last one before the system froze.
The scrub does not always freeze the system, but when it does, only low-level functionality such as ping or ps still works.

If I downgrade the kernel to 6.1.41-1-MANJARO, there are no errors:

sudo btrfs scrub status /
UUID:             53e62983-fed7-46ed-b97c-23d19af4f26f
Scrub started:    Thu Aug  3 14:17:11 2023
Status:           finished
Duration:         0:04:41
Total to scrub:   66.72GiB
Rate:             225.62MiB/s
Error summary:    no errors found

Ask me if you want more details.
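For comparing runs across kernels, the uncorrectable count can be pulled out of the scrub status text automatically. The following is only a minimal sketch: the function name scrub_uncorrectable is made up for illustration, and it assumes the status output keeps the "Uncorrectable:" line shown in the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): read `btrfs scrub status`
# output on stdin and print the "Uncorrectable" count.
# Prints 0 when no such line is present, e.g. for "no errors found".
scrub_uncorrectable() {
    awk '
        /^[[:space:]]*Uncorrectable:/ { count = $2; found = 1 }
        END { print (found ? count : 0) }
    '
}

# Possible usage on a live system (requires root):
#   sudo btrfs scrub status / | scrub_uncorrectable
```

Recording this single number per kernel version makes it easier to tell a good boot from a bad one when testing several kernels.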
(In reply to Olivier Wuillemin from comment #0)
> My distribution is a Manjaro 23.0.0, CPU is AMD A10.
> System disk is an SSD PNY_CS900 120Gb
>
> With Linux 6.4.6.1-MANJARO, a scrub on then system disk root reports errors
> and some times freeze the linux :
>
> sudo btrfs scrub status /
> UUID: 53e62983-fed7-46ed-b97c-23d19af4f26f
> Scrub started: Tue Aug 1 13:26:14 2023
> Status: running
> Duration: 0:03:21
> Time left: 1:12:50
> ETA: Tue Aug 1 14:42:25 2023
> Total to scrub: 85.39GiB
> Bytes scrubbed: 3.75GiB (4.40%)
> Rate: 19.13MiB/s
> Error summary: read=80
> Corrected: 34
> Uncorrectable: 46
> Unverified: 0
>
> In this case, this report is the last one before the Linux freeze.
> Scrub not always freese, but when it freeze only low level functionnalities
> as ping or ps are working.
>
> If Y downgrade the linux kernel to 6.1.41-1-MANJARO there is no errors :
>
> sudo btrfs scrub status /
> UUID: 53e62983-fed7-46ed-b97c-23d19af4f26f
> Scrub started: Thu Aug 3 14:17:11 2023
> Status: finished
> Duration: 0:04:41
> Total to scrub: 66.72GiB
> Rate: 225.62MiB/s
> Error summary: no errors found
>
> Ask me if you want more details.

Can you bisect between v6.1 and v6.4? See Documentation/admin-guide/bug-bisect.rst in the kernel sources for instructions.
Scrub reports no errors on kernel versions up to 6.3.13-2.
Scrub reports errors on versions 6.4.6-1 and 6.5.0rc3-1.
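These results narrow the suspect range to roughly v6.3..v6.4. As a rough sketch of what a bisection over that range could look like, the helper below only prints the commands rather than running them. The function name print_bisect_plan is invented for illustration, the default tags assume a mainline kernel tree, and the build-and-boot step is left as a comment because it depends on the distribution.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a bisect plan, runs nothing.
# $1 = last known good tag (default v6.3), $2 = first known bad tag (default v6.4).
print_bisect_plan() {
    good=${1:-v6.3}
    bad=${2:-v6.4}
    echo "git bisect start $bad $good"
    echo "# build and boot the checked-out kernel, then run: sudo btrfs scrub start -B /"
    echo "git bisect good   # if the scrub finished without errors"
    echo "git bisect bad    # if the scrub reported errors or the system hung"
    echo "git bisect reset  # once git has named the first bad commit"
}

# Example: print_bisect_plan v6.3 v6.4
```

Each good/bad answer halves the remaining range, so the build, boot and scrub cycle is repeated until git reports the first bad commit.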
If the system freezes, it may be a kernel crash; please provide the dmesg output if possible.

You may want to set up netconsole to catch the dying messages on another machine:
https://docs.kernel.org/networking/netconsole.html
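As an illustration of the netconsole setup, here is a small sketch that assembles the module parameter in the src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac form described in the linked documentation. The function name netconsole_param is invented, and every address, interface name and port in the usage comments is a placeholder to be replaced with real values.

```shell
#!/bin/sh
# Hypothetical helper: build the netconsole= parameter string.
# Arguments: source IP, network device, target IP, target MAC, optional port
# (the same port is used on both sides here; 6666 is an assumed default).
netconsole_param() {
    src_ip=$1; dev=$2; tgt_ip=$3; tgt_mac=$4; port=${5:-6666}
    printf '%s@%s/%s,%s@%s/%s\n' \
        "$port" "$src_ip" "$dev" "$port" "$tgt_ip" "$tgt_mac"
}

# On the machine that hangs (placeholder values, requires root):
#   sudo modprobe netconsole \
#       netconsole="$(netconsole_param 192.168.1.10 eth0 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine (exact flags differ between netcat variants):
#   nc -u -l 6666
```

Starting the listener before the scrub means the last kernel messages are captured even if the local disk and shell stop responding.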
Created attachment 304776 [details]
Log file with btrfs scrub "errors"

This log is the most representative of the errors encountered.
As the computer is not located where I live, I rebooted it the next morning to run a memory test. I don't remember whether the computer was frozen at that time.
I didn't notice a true kernel crash.
As the bug is reproducible, does not cause damage, and switching kernels is easy on Manjaro, I can run a 6.4 or 6.5 kernel to make the scrub log more verbose if such options exist.
But I'm not able to recompile a kernel myself.
If it's not a crash but something like a hang, the dmesg would still help us locate the cause.

There is a branch with performance fixes, but I'm afraid you would have to compile it yourself:
https://github.com/adam900710/linux/tree/scrub_testing
Hi, I hit the same problem, and here is the dmesg: http://fars.ee/j6S5
Thanks for the dmesg, it looks related to a chunk mapping error.

This at least means it's a false alert, but I'm still unsure what's going on.

In the cases I can think of, it looks like a chunk is being cleaned up while it's also being scrubbed.
This happens when a chunk is going to be deleted in the current transaction, but in the previous transaction there are still extents in it.

This means the target fs is also under at least some workload.

If that's the case, I can make the scrub process skip the whole chunk, but this should be rare, as we mark the block group read-only during scrub, thus it shouldn't be deleted halfway...

@zhaoyang, if you can reproduce the problem reliably, would you mind compiling the btrfs module to test some debug patches?
Sure. It's pretty reproducible with newer kernels. What should I do to compile the debug patches?
Created attachment 304824 [details]
Extra debug output for scrub errors

Here is the new debug patch for the scrub error.

The current debug target is to check if the block group is removed halfway during scrub.
Created attachment 304832 [details] log file with patch applied
Created attachment 305220 [details] dmesg output
I think I may be seeing the same issue. I see the issue frequently in combination with

kernel BUG at include/linux/scatterlist.h:115!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI

See attachment above.
Thinking about it, I believe I've only seen this error with filesystems converted from ext4. I'm not sure this means much, as most filesystems on this machine have been converted.

Anyway, I converted a FS using `btrfs-convert --csum xxhash` and it looks like I can reproduce this error. I uploaded the converted image at https://gitlab.com/pgerber/btrfs-bug/-/raw/main/image.zst. I hope it helps diagnose this issue.
I should probably mention the btrfs-progs version I used:

user@disp8532:~$ dpkg -l btrfs-progs
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-===============================================
ii  btrfs-progs    6.2-1        amd64        Checksumming Copy on Write Filesystem utilities
user@disp8532:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm
Checking the debug output, it looks like something is wrong in the bio mapping part, as we got the following message:

> unable to find logical 17708773376 length 4096

Another thing: the debug output confirmed that it's not a block group being deleted halfway, but more like a use-after-free. I have received some internal reports about it but have been unable to pin it down yet.

I'll keep you updated when there is some progress.
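For anyone collecting more logs, it may help to summarise which logical addresses fail to map and how often. This is only a sketch: the function name failed_logicals is invented, and it assumes the kernel log lines contain the exact "unable to find logical <addr> length <len>" text quoted above, whatever prefix precedes it.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each distinct
# logical address from "unable to find logical <addr> length <len>" messages,
# followed by the number of times it appeared.
failed_logicals() {
    sed -n 's/.*unable to find logical \([0-9][0-9]*\) length [0-9][0-9]*.*/\1/p' |
        sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Possible usage (requires permission to read the kernel log):
#   sudo dmesg | failed_logicals
```

A small set of addresses repeating across runs would point at specific chunks, while addresses that change on every run would fit better with the use-after-free suspicion.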
I confirm that the system partition was converted from ext4 in 2021 Q1.
Today I switched to kernel 6.6.5.3 and started a scrub on the system partition.
The scrub stayed in the "Running" state at 0 Mb/s with 0 bytes scrubbed.
After a short time, the whole system locked up and I had to reset the server.
I fell back to Linux 6.1.66.1 and scrubbed the system partition successfully.