Reproducer: Ran inside an 8 proc VM instance, i386 kernel, 4MB of memory, with the stress-ng xattr stress test on a cleanly formatted ext4 partition (e.g. mkfs.ext4 /dev/vdb1).

Using stress-ng. To build from source on Ubuntu systems:

    git clone git://kernel.ubuntu.com/cking/stress-ng
    sudo apt-get build-dep stress-ng
    cd stress-ng
    make

Test script to invoke the stressor, needs to be run as root:

    #!/bin/bash -x
    for i in $(seq 30)
    do
        ./stress-ng/stress-ng --xattr 8 -t 10
        rc=$?
        if [ $rc -ne 0 ]; then
            exit 1
        fi
    done

This will trigger errors such as:

    EXT4-fs error (device vdb1): ext4_xattr_block_find:802: inode #131074: comm stress-ng-xattr: bad block 532519

and more often than not one needs to fsck the partition.

I've tested this on mainline kernels from 3.19 through to 4.10-rc6 and I can trigger this on these kernels only on i386 systems. I've tested other 32 bit platforms (32 bit arm, raspberry pi 2) but can't trigger it. I cannot trigger this issue on amd64 builds of the same kernels in a VM.

I put some debug into ext4_xattr_block_set() at the /* update the inode. */ comment and with that in place the bug does not trip, so it seems that the added debug reduces the risk of the race condition occurring.
Forgot to mention, this can be triggered on a SMP ppc64el VM with 4.8, but far less frequently than the i386.
..and also userspace xattr setting calls get errno EUCLEAN 117 "Structure needs cleaning".
For reference, the xattr stressor is: http://kernel.ubuntu.com/git/cking/stress-ng.git/tree/stress-xattr.c
Thanks for the bug report! Does this require using the latest version of stress-ng (from the git tree), or is it reproducible using the version of stress-ng in Ubuntu or Debian Jessie?
I'd recommend the latest version from the git repo.
And since you appear to be the maintainer of the stress-ng package, while I'm here, any objections if I upload a backport of stress-ng v0.07.16 to jessie-backports? :-)
An upload to jessie-backports is doable, but I'm going to push another release at the end of this week.
Hi Ted, is there anything else I need to add to this bug report?
Wow, this triggers in a Debian/unstable 64-bit VM here after the first run:

    $ mkfs.ext4 /dev/sdd
    $ mount -t ext4 /dev/sdd /mnt/disk && cd /mnt/disk
    $ time ~/193661.sh
    ++ seq 30
    + for i in $(seq 30)
    + /usr/local/sbin/stress-ng --xattr 8 -t 10
    stress-ng: info: [2639] dispatching hogs: 8 xattr
    stress-ng: info: [2639] cache allocate: default cache size: 6144K
    stress-ng: fail: [2647] stress-ng-xattr: fsetxattr failed, errno=117 (Structure needs cleaning)
    stress-ng: error: [2639] process 2647 (stress-ng-xattr) terminated with an error, exit status=1
    stress-ng: info: [2639] unsuccessful run completed in 10.01s
    + rc=2
    + '[' 2 -ne 0 ']'
    + exit 1

    real 0m10.019s
    user 0m1.460s
    sys 0m26.752s

    $ dmesg | tail -2
    [ 494.311212] EXT4-fs (sdd): mounted filesystem with ordered data mode. Opts: (null)
    [ 550.828337] EXT4-fs error (device sdd): ext4_xattr_block_find:786: inode #524292: comm stress-ng-xattr: bad block 2105384

    $ uname -rv
    4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28)
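
For anyone else trying to confirm a reproduction, a small helper along these lines can count kernel log lines that match the error reported in this thread. This is only a sketch: the function name count_ext4_xattr_errors is made up for illustration and is not part of stress-ng or the kernel tooling, and the pattern is derived solely from the two messages quoted in this bug, so other kernel versions may word the error differently.

```shell
#!/bin/bash
# Hypothetical helper (not part of stress-ng): count kernel log lines that
# look like the ext4 xattr block error quoted in this bug report.
# Reads log text on stdin, so it works with live dmesg output or a saved log.
count_ext4_xattr_errors() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -cE 'EXT4-fs error \(device [^)]+\): ext4_xattr_block_find:[0-9]+: .*bad block [0-9]+' || true
}
```

Typical use would be `dmesg | count_ext4_xattr_errors` after a run of the test script; a non-zero count suggests the partition needs an fsck before the next attempt.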
Created attachment 254979 [details] ext4: lock the xattr block before calculating its checksum
Created attachment 255541 [details] ext4: lock the xattr block before checksuming it
I've given this a good soak test on 32 bit and 64 bit x86 builds and it fixes the issue. Thanks Ted. Tested-by: Colin Ian King <colin.king@canonical.com>