Bug 215712 - kernel deadlocks while mounting the image
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
 
Reported: 2022-03-21 11:56 UTC by bughunter
Modified: 2022-03-24 12:46 UTC

Kernel Version: 5.15.4
Regression: No

Description bughunter 2022-03-21 11:56:05 UTC
I created an image with mkfs.ext4 and modified some of its metadata. Unfortunately, when I tried to mount the image through a loop device, the kernel deadlocked. I tried many ways to stop the mount process, including the 'kill' command, but they all failed; the only thing I could do was reboot the system. Can anyone tell me why the kernel deadlocked, and how I can fix this problem?

I have uploaded the image to Google Drive (https://drive.google.com/file/d/1NjUKdMufpoiyMscpFMdbiwOzyOios-aa/view?usp=sharing). Looking forward to a reply :)
Comment 1 Artem S. Tashkinov 2022-03-21 12:32:07 UTC
(In reply to bughunter from comment #0)
> I created an image with mkfs.ext4 and modified some of its metadata.
> Unfortunately, when I tried to mount the image through a loop device, the
> kernel deadlocked. I tried many ways to stop the mount process, including
> the 'kill' command, but they all failed; the only thing I could do was
> reboot the system. Can anyone tell me why the kernel deadlocked, and how I
> can fix this problem?

That's a bug in the kernel; this situation shouldn't happen. Hopefully someone will debug and fix this.

In the meantime, it would be great if you could confirm whether 5.17 is also affected.
Comment 2 bughunter 2022-03-22 02:58:32 UTC
Thank you for the prompt reply! I have tested this on kernel v5.17, and the problem still exists.
Comment 3 Christian Kujau 2022-03-23 23:06:36 UTC
A 5.17 kernel *is* able to mount the image here, but it takes quite some time to complete:

=======================================
$ time mount -v -t ext4 -o loop,ro,debug tmp.img /mnt/disk/
mount: /dev/loop0 mounted on /mnt/disk.

real    1m32.694s
user    0m0.008s
sys     1m32.665s
=======================================

During that time the CPU is spinning like crazy, but I don't know how to debug further why it is spinning. perf comes to mind, but maybe something more ext4-specific would be more useful here. For this mount operation, dmesg shows:

=======================================
[  188.269405] [EXT4 FS bs=1024, gc=2, bpg=8192, ipg=2048, mo=a802c818, mo2=0002]
[  280.932637] EXT4-fs (loop0): mounted filesystem with ordered data mode. Quota mode: disabled.
[  595.249319] EXT4-fs (loop0): error count since last fsck: 1
[  595.250559] EXT4-fs (loop0): initial error at time 1647888893: ext4_mb_generate_buddy:756
[  595.253403] EXT4-fs (loop0): last error at time 1647888893: ext4_mb_generate_buddy:756
=======================================
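The "initial error" and "last error" times in the dmesg lines above are seconds since the Unix epoch. A minimal sketch for turning one into a readable date, assuming GNU date is available (the `-d @SECONDS` form is a GNU extension, and the variable name is only illustrative):

```shell
# Assumption: GNU date; other date implementations use different flags.
# Value copied from the "initial error at time ..." dmesg line above.
error_time=1647888893
date -u -d "@${error_time}" '+%Y-%m-%d %H:%M:%S UTC'
```

This prints 2022-03-21 18:54:53 UTC, the same day the bug was reported.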

@Ming, can you share details on how the image has been modified?
Comment 4 bughunter 2022-03-24 07:13:43 UTC
I modified an original image by replacing some of its metadata values and recalculating the checksums. Unfortunately, I did not record the modification process. The original image, from before the modification, is available at https://drive.google.com/file/d/10Pf8E4OwHH7UDVhP3-mxhQuyijvHO3lE/view?usp=sharing. I hope you have a way to compare the two images.
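
One possible way to compare the two images is to list the filesystem blocks in which they differ. The following is only a sketch: the function name `diff_blocks` and the file names in the usage comment are hypothetical, and the default block size of 1024 is taken from the `bs=1024` value in the dmesg output of comment 3.

```shell
# diff_blocks ORIGINAL MODIFIED [BLOCK_SIZE]
# Prints the zero-based number of every block in which the two files differ,
# in ascending order. BLOCK_SIZE defaults to 1024 bytes.
diff_blocks() {
    # cmp -l prints one line per differing byte:
    # the 1-based byte offset, then both byte values in octal.
    cmp -l "$1" "$2" | awk -v bs="${3:-1024}" '
        { blk = int(($1 - 1) / bs) }          # byte offset -> block number
        !(blk in seen) { seen[blk] = 1; print blk }
    '
}

# Hypothetical usage (file names are placeholders):
#   diff_blocks original.img tmp.img 1024
```

The resulting block numbers could then be inspected further with tools from e2fsprogs such as debugfs or dumpe2fs, to see which metadata structures the changed blocks belong to.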
Comment 5 Christian Kujau 2022-03-24 12:21:38 UTC
Another attempt to mount the image, this time under trace-cmd, took an hour to complete and produced a trace.dat file. Good thing btrfs compression shrank that down to 18 GB :-)

$ trace-cmd record -e ext4 mount -v -t ext4 -o loop,ro,debug ~/tmp.img /mnt/test/
$ trace-cmd hist
  %-2110.90  (599) mount    ext4_es_lookup_extent_enter #2097643693
         |
         --- *ext4_es_lookup_extent_enter*

  %-2110.90  (599) mount     ext4_es_lookup_extent_exit #2097643669
         |
         --- *ext4_es_lookup_extent_exit*

  %-0.29  (33) kswapd0           ext4_es_shrink_count #292356
         |
         --- *ext4_es_shrink_count*

  %-0.00  (597) trace-cmd           ext4_es_shrink_count #2592
         |
         --- *ext4_es_shrink_count*
[....]
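
The per-event counts above could also be cross-checked from the textual trace. This is a rough sketch, assuming each line of `trace-cmd report` output carries the event name as a field of the form `ext4_<name>:` (the exact column layout can vary between trace-cmd versions); the function name `count_ext4_events` is made up for illustration.

```shell
# count_ext4_events: read trace lines on stdin and print "COUNT EVENT" pairs,
# most frequent event first.
# Assumption: the event name appears as a field like "ext4_es_lookup_extent_enter:".
count_ext4_events() {
    awk '
        {
            # Take the first field on the line that looks like an ext4 event name.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ext4_[a-z0-9_]+:$/) {
                    name = substr($i, 1, length($i) - 1)   # drop trailing colon
                    count[name]++
                    break
                }
            }
        }
        END {
            for (name in count) printf "%d %s\n", count[name], name
        }
    ' | sort -rn
}

# Hypothetical usage:
#   trace-cmd report -i trace.dat | count_ext4_events | head
```

With roughly two billion ext4_es_lookup_extent_enter/exit pairs recorded for a single mount, a tally like this mainly confirms that the time is spent in repeated extent status lookups.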
Comment 6 bughunter 2022-03-24 12:46:26 UTC
Why does the 'mount' command take so long when using such a small image?
