Created attachment 276345 [details]
Invalid ext4 FS image causing a bug

Performing some operations on an invalid ext4 partition causes hangs of about 1 second (such as "user 0m 0.00s sys 0m 1.03s") and an incorrect syscall return code.

How to reproduce:
1. Compile the 4.17 kernel with the attached config
2. Unpack and mount the attached FS image as ext4 (assuming the mount point is /mnt)
3. Run `time ln -s /mnt/abc /mnt/abc`

What happens:

[ 1.858452] EXT4-fs warning (device sda): dx_probe:754: inode #2: comm exe: Unrecognised inode hash code 220
[ 1.858670] EXT4-fs warning (device sda): dx_probe:865: inode #2: comm exe: Corrupt directory, running e2fsck is recommended
[ 2.510052] EXT4-fs warning (device sda): dx_probe:754: inode #2: comm exe: Unrecognised inode hash code 220
[ 2.510280] EXT4-fs warning (device sda): dx_probe:865: inode #2: comm exe: Corrupt directory, running e2fsck is recommended
ln: /mnt/abc: Unknown error 4094
[ 3.166796] exe (952) used greatest stack depth: 13664 bytes left
Command exited with non-zero status 1
real 0m 1.31s
user 0m 0.00s
sys 0m 1.30s

Step (3) can be repeated any number of times without remounting, with the same 1 second hang each time.
Created attachment 276347 [details] Kernel .config file
Created attachment 276349 [details] Kernel log
I don't think 1s is considered a hang; it's just doing a slower traversal of a corrupt directory. The nonstandard error code is incorrect though, and I've sent a patch upstream for that.

[PATCH] ext4: Reset error code in ext4_find_entry in fallback

When ext4_find_entry() falls back to "searching the old fashioned way" due to a corrupt dx dir, it needs to reset the error code to NULL so that the nonstandard ERR_BAD_DX_DIR code isn't returned to userspace.

https://bugzilla.kernel.org/show_bug.cgi?id=199947

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@yandex.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
I have now reproduced this on my primary Ubuntu 18.04 system: I remounted everything r/o (to not break real partitions too much :) ), mounted this image and ran

#!/bin/bash
while true
do
    ln -s /mnt/$1 /mnt/$1
done

... with different first argument values. When I ran about 6 copies on my 6-core machine, the system started responding with random delays and kernel CPU time was about 90% (and what would happen if they had CPU affinity set?).

Regarding the question of permitting crafted FS images to be mounted from containers, this could be used to DOS the host system. On the other hand, AFAIK letting third-party containers mount arbitrary FS images was considered bad practice anyway, and this seems to be only a DOS, so maybe it is not a problem (there are probably many other ways to DOS the host system with such high permissions, and I didn't test what happens if CPU time limits are set properly).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3e968c9f1401088abc9a19ae6ff571644d37a355
I really don't see this as a DOS; by default it's not hard for a user to consume the majority of IO on a system and slow things down, but *shrug* perhaps others will have different opinions on the matter.
> ... but *shrug* perhaps others will have different opinions on the matter.

Since no different opinions were presented, I renamed the issue to reflect the actual problem. :)
This was merged long ago. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/ext4?id=f39b3f45dbcb0343822cce31ea7636ad66e60bc2