Bug 199947 - Mounting and operating corrupted ext4 image causes invalid error code being returned to user space
Summary: Mounting and operating corrupted ext4 image causes invalid error code being r...
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: Eric Sandeen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-06 17:50 UTC by Anatoly Trosinenko
Modified: 2021-01-25 01:59 UTC
2 users

See Also:
Kernel Version: v4.17
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Invalid ext4 FS image causing a bug (718 bytes, application/x-bzip)
2018-06-06 17:50 UTC, Anatoly Trosinenko
Kernel .config file (102.52 KB, text/x-mpsub)
2018-06-06 17:51 UTC, Anatoly Trosinenko
Kernel log (19.63 KB, text/plain)
2018-06-06 17:51 UTC, Anatoly Trosinenko

Description Anatoly Trosinenko 2018-06-06 17:50:44 UTC
Created attachment 276345
Invalid ext4 FS image causing a bug

Performing some operations on an invalid ext4 partition causes a hang of about 1 second (e.g. "user 0m 0.00s sys 0m 1.03s") and an incorrect syscall return code.

How to reproduce:

1. Compile the 4.17 kernel with the attached config
2. Unpack and mount the attached FS image as ext4 (assuming the mount point is /mnt)
3. Run `time ln -s /mnt/abc /mnt/abc`
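Step 3 can also be driven from a small C program that reports the errno value and the time spent in the call directly, instead of parsing the output of ln and time. This is a minimal sketch assuming a POSIX system; `try_symlink` and `report` are hypothetical helpers, and calling symlink(path, path) only approximates what `ln -s /mnt/abc /mnt/abc` does (ln may use symlinkat() and perform extra checks).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create a symlink named `path` that points at `path`
 * itself (roughly what `ln -s /mnt/abc /mnt/abc` asks for), timing the call.
 * Returns 0 on success, otherwise the errno value reported by symlink(). */
static int try_symlink(const char *path, double *elapsed_s)
{
    struct timespec t0, t1;
    assert(path != NULL && elapsed_s != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int err = (symlink(path, path) == 0) ? 0 : errno; /* capture errno before the next call */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *elapsed_s = (double)(t1.tv_sec - t0.tv_sec) +
                 (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return err;
}

/* Print one attempt in a form comparable to the ln/time output in the report. */
static void report(const char *path)
{
    double t = 0.0;
    int err = try_symlink(path, &t);
    if (err == 0)
        printf("%s: created, %.3f s\n", path, t);
    else
        printf("%s: errno %d (%s), %.3f s\n", path, err, strerror(err), t);
}
```

On a healthy file system the first call creates the (self-referencing) link and a repeat fails with EEXIST. Going by the log, calling `report("/mnt/abc")` with the attached image mounted on an unpatched 4.17 kernel would be expected to show errno 4094 after roughly a second.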

What happens:

[    1.858452] EXT4-fs warning (device sda): dx_probe:754: inode #2: comm exe: Unrecognised inode hash code 220
[    1.858670] EXT4-fs warning (device sda): dx_probe:865: inode #2: comm exe: Corrupt directory, running e2fsck is recommended
[    2.510052] EXT4-fs warning (device sda): dx_probe:754: inode #2: comm exe: Unrecognised inode hash code 220
[    2.510280] EXT4-fs warning (device sda): dx_probe:865: inode #2: comm exe: Corrupt directory, running e2fsck is recommended
ln: /mnt/abc: Unknown error 4094
[    3.166796] exe (952) used greatest stack depth: 13664 bytes left
Command exited with non-zero status 1
real	0m 1.31s
user	0m 0.00s
sys	0m 1.30s

Step 3 can be repeated any number of times without remounting, with the same 1-second hang each time.
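The number 4094 is not a standard errno value. It matches ext4's internal ERR_BAD_DX_DIR code, which lies inside the range the kernel treats as an encoded error pointer, so it can travel up the call chain like any other error and end up as the syscall's return value. The C library's strerror() has no message for it; glibc falls back to "Unknown error 4094". The sketch below is a user-space model, not kernel code: the two constants are assumed to mirror the v4.17 definitions, and `is_err_value`, `user_errno_for` and `show` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values assumed from the v4.17 sources:
 *   include/linux/err.h: MAX_ERRNO is 4095
 *   fs/ext4/namei.c:     ERR_BAD_DX_DIR is -(MAX_ERRNO - 1) */
#define MAX_ERRNO 4095
#define ERR_BAD_DX_DIR (-(MAX_ERRNO - 1))

/* User-space model of IS_ERR_VALUE(): the top MAX_ERRNO values of the
 * address space are treated as encoded error codes, not valid pointers. */
static int is_err_value(uintptr_t v)
{
    return v >= (uintptr_t)-MAX_ERRNO;
}

/* A failing syscall returns -code; the C library stores code in errno. */
static int user_errno_for(long kernel_ret)
{
    assert(kernel_ret < 0 && kernel_ret >= -MAX_ERRNO);
    return (int)-kernel_ret;
}

/* Print how a kernel return value would surface in user space. */
static void show(long kernel_ret)
{
    int e = user_errno_for(kernel_ret);
    printf("kernel returned %ld -> errno %d: %s\n", kernel_ret, e, strerror(e));
}
```

Because ERR_BAD_DX_DIR passes the same range check as -ENOENT or -EIO, nothing on the way out distinguishes it from a real errno; only the consumer (ln here) notices that it has no name.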
Comment 1 Anatoly Trosinenko 2018-06-06 17:51:16 UTC
Created attachment 276347
Kernel .config file
Comment 2 Anatoly Trosinenko 2018-06-06 17:51:38 UTC
Created attachment 276349
Kernel log
Comment 3 Eric Sandeen 2018-06-06 18:50:44 UTC
I don't think 1s counts as a hang; it's just doing a slower traversal of a corrupt directory.

The nonstandard error code is incorrect though, I've sent a patch upstream for that.

[PATCH] ext4: Reset error code in ext4_find_entry in fallback

When ext4_find_entry() falls back to "searching the old fashioned
way" due to a corrupt dx dir, it needs to reset the error code
to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned
to userspace.

https://bugzilla.kernel.org/show_bug.cgi?id=199947

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
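The control flow the patch description refers to can be modelled in user space. In this sketch the hashed (dx) lookup reports ERR_BAD_DX_DIR on a corrupt index, the caller falls back to a linear scan, and if the scan finds nothing the leftover error pointer is returned unless it was reset first. Everything here is a hypothetical stand-in (`struct dir`, `dx_find`, `linear_find`, `find_entry`, and the `reset_on_fallback` switch); only the shape of the fallback and the "reset the error code to NULL" fix come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ERRNO 4095
#define ERR_BAD_DX_DIR (-(MAX_ERRNO - 1))

/* Minimal user-space stand-ins for the ERR_PTR/PTR_ERR/IS_ERR helpers. */
static void *err_ptr(long e) { return (void *)(intptr_t)e; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

/* Hypothetical directory: a flag for a corrupt hash index plus a flat name list. */
struct dir {
    int dx_corrupt;
    const char *const *names;
    size_t count;
};

/* "The old fashioned way": scan every entry. Not found is NULL, not an error. */
static void *linear_find(const struct dir *d, const char *name)
{
    for (size_t i = 0; i < d->count; i++)
        if (strcmp(d->names[i], name) == 0)
            return (void *)d->names[i];
    return NULL;
}

/* Stand-in for the htree lookup: a corrupt index yields ERR_BAD_DX_DIR. */
static void *dx_find(const struct dir *d, const char *name)
{
    if (d->dx_corrupt)
        return err_ptr(ERR_BAD_DX_DIR);
    return linear_find(d, name); /* model only: same result as a hashed lookup */
}

/* Shaped like the fallback described in the patch; the flag toggles the fix. */
static void *find_entry(const struct dir *d, const char *name, int reset_on_fallback)
{
    assert(d != NULL && name != NULL);
    void *ret = dx_find(d, name);
    if (!is_err(ret) || ptr_err(ret) != ERR_BAD_DX_DIR)
        return ret;                   /* hit, clean miss, or a genuine error */
    if (reset_on_fallback)
        ret = NULL;                   /* the fix: drop the internal code */
    void *hit = linear_find(d, name); /* fallback scan over the whole directory */
    return hit ? hit : ret;           /* a miss leaves ret untouched */
}
```

Without the reset, a lookup of a missing name in a corrupt directory returns ERR_PTR(ERR_BAD_DX_DIR) instead of NULL, which is how -4094 reached `ln`. The linear scan also explains the roughly one-second cost that Comment 3 attributes to a slower traversal.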
Comment 4 Anatoly Trosinenko 2018-06-06 19:29:10 UTC
Technically, I have now reproduced this on my primary Ubuntu 18.04 system: I remounted everything r/o (so as not to break real partitions too much :) ), mounted this image and ran

#!/bin/bash
# Repeatedly trigger the slow lookup on the corrupted mount; $1 is the file name.

while true
do
  ln -s "/mnt/$1" "/mnt/$1"
done

... with different values for the first argument. When I ran about 6 copies on my 6-core machine, the system started responding with random delays and kernel CPU time was about 90% (and what would happen if they had CPU affinity set?). Given the question of whether containers should be permitted to mount crafted FS images, this could be used to DoS the host system.

On the other hand, AFAIK permitting third-party containers to mount arbitrary FS images is considered bad practice anyway, and this seems to be only a DoS, so maybe it is not a problem (there are probably many other ways to DoS the host system with such high privileges, and I didn't test what would happen with properly configured CPU time limits).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e968c9f1401088abc9a19ae6ff571644d37a355
Comment 5 Eric Sandeen 2018-06-07 17:29:21 UTC
I really don't see this as a DOS; by default it's not hard for a user to consume the majority of IO on a system and slow things down, but *shrug* perhaps others will have different opinions on the matter.
Comment 6 Anatoly Trosinenko 2018-07-08 11:00:55 UTC
> ... but *shrug* perhaps others will have different opinions on the matter.

Since no differing opinions were presented, I renamed the issue to reflect the actual problem. :)
