Bug 178781 - btrfsck hangs in an infinite loop in deal_root_from_list(), preventing me from returning my filesystem to a sane state
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: x86-64 Linux
Importance: P1 blocking
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-20 17:30 UTC by ytrezq
Modified: 2016-10-21 15:04 UTC
CC List: 2 users

See Also:
Kernel Version: 4.5.0 btrfs-progs v4.7.3
Tree: Mainline
Regression: No


Attachments
I know this is irrelevant, but here’s what happens when I try to delete some files which are corrupt (23.61 KB, text/plain)
2016-10-20 17:30 UTC, ytrezq

Description ytrezq 2016-10-20 17:30:08 UTC
Created attachment 242101
I know this is irrelevant, but here’s what happens when I try to delete some files which are corrupt

Here’s my problem:

I ran duperemove and its process got stuck in an uninterruptible state, forcing me to perform an unclean shutdown (the computer could not shut down cleanly because the btrfs filesystem could not be unmounted while duperemove was using it).
Since then, the filesystem has been damaged in a way that prevents me from removing some large files (attempting to do so produces a kernel backtrace in the log).



I expected that btrfs check --repair --init-extent-tree -b /dev/dm-7 would return my filesystem to a state that would allow me to delete its largest files.

Unfortunately, it always seems to get stuck in an infinite loop (full CPU usage with no disk access) after printing this:
Checking filesystem on /dev/dm-7
UUID: 56040bbb-ed5c-47f2-82e2-34457bd7b4f3
Creating a new extent tree
Failed to find [75191291904, 168, 4096]
btrfs unable to find ref byte nr 75191291904 parent 0 root 1  owner 1 offset 0
Failed to find [75191316480, 168, 4096]
btrfs unable to find ref byte nr 75191316480 parent 0 root 1  owner 0 offset 1
parent transid verify failed on 75191349248 wanted 3555361 found 3555362
Ignoring transid failure
checking extents [O]

Before the unclean shutdown, btrfs check took less than a minute (the filesystem contains ~240 GB of data in 51 GB, about 150,000 files, and no snapshots or subvolumes).

After investigating a bit in gdb, I identified the piece of code where the problem lies:
		while (1) {
			ret = run_next_block(root, bits, bits_nr, &last,
					     pending, seen, reada, nodes,
					     extent_cache, chunk_cache,
					     dev_cache, block_group_cache,
					     dev_extent_cache, rec);
			if (ret != 0)
				break;
		}
Because ret never becomes nonzero, the loop runs forever once the check reaches a certain point.
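
To make the failure mode concrete, here is a minimal, self-contained sketch of the same loop shape. It is not btrfs-progs code: `struct work_queue`, `process_next()` and `drain_with_guard()` are hypothetical names invented for illustration, and the sketch makes no claim about why run_next_block() keeps returning 0 on this filesystem. It only shows that a `while (1)` loop which exits solely on a nonzero return spins forever when the callee returns 0 without consuming any work, and how a generic forward-progress check would turn that into an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list of pending blocks (not a btrfs-progs type). */
struct work_queue {
	size_t pending; /* items still waiting to be processed */
	bool stuck;     /* when true, a step returns 0 but consumes nothing */
};

/* Hypothetical step: returns 0 for "keep going", nonzero once nothing is left. */
static int process_next(struct work_queue *q)
{
	if (q->pending == 0)
		return 1;
	if (!q->stuck)
		q->pending--;
	return 0;
}

/*
 * Same loop shape as in the report, plus a guard: if a step returns 0
 * without reducing the pending count, give up with -1 instead of spinning.
 */
static int drain_with_guard(struct work_queue *q)
{
	assert(q != NULL);

	while (1) {
		size_t before = q->pending;
		int ret = process_next(q);

		if (ret != 0)
			return 0;  /* finished normally */
		if (q->pending == before)
			return -1; /* no forward progress was made */
	}
}
```

With a healthy queue, `drain_with_guard()` returns 0 once everything has been consumed. With `stuck` set, it returns -1 on the first step instead of looping at full CPU with no I/O, which is the symptom described above.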


I’m not an expert in filesystems, so I couldn’t investigate further, but the issue is still reproducible, and I still can’t delete some of my largest files.
I can post additional information if necessary.
Comment 1 ytrezq 2016-10-20 17:36:12 UTC
Of course, to be sure the loop is infinite, I let btrfs check run for 41 hours (it was using 408 MB of RAM).
I also tried without -b and --init-extent-tree, but I got the same problem, forcing me to press Ctrl+C.
Comment 2 ytrezq 2016-10-21 14:40:19 UTC
Some supplemental information:

btrfs fi df /mnt/
Data, single: total=66.01GiB, used=0.00B
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=5.00GiB, used=28.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=4.00MiB, used=0.00B

Label: 'backup'  uuid: 56040bbb-ed5c-47f2-82e2-34457bd7b4f3
        Total devices 1 FS bytes used 44.00KiB
        devid    1 size 298.91GiB used 76.04GiB path /dev/mapper/isw_bdffeeeijj_Volume0p7

Result of btrfs-image on /dev/mapper/isw_bdffeeeijj_Volume0p7:
https://web.archive.org/web/20161020220914/https://filebin.net/7ni8kfpog1dxw4jc/btrfs-image_capture.xz
