I use ext4 on zram for my temp directories, and occasionally things get corrupted. ext4 on a normal disk works fine in the same scenarios. I haven't managed to figure out what exactly is going on, but I do have a 157 GB strace log of it happening. One scenario that reproduces it fairly reliably is building 3 copies of binutils in parallel: about half the time, /var/tmp/portage/cross-i686-w64-mingw32/binutils-2.37_p1-r2/work/build/binutils/.deps/stabs.Po ends up truncated and one of the builds fails. The only other scenario in which I've seen it happen (much less reproducibly) is running the Bitcoin functional tests. In that case, however, the ext4 structure itself got corrupted and Linux was unable to recover (the affected directories became unusable until reboot). I suspect it's a threading-related issue, but it could plausibly be page-size related (I *think* I'm using 64k pages), though in the latter case I would expect it to be much more common.
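The page-size guess above can be checked directly rather than assumed. A minimal sketch, assuming only that the POSIX getconf utility is available; the variable name page_size is illustrative:

```shell
# Query the system page size in bytes.
# A value of 65536 would confirm 64k pages; 4096 would mean 4k pages.
page_size=$(getconf PAGESIZE)
echo "page size: ${page_size} bytes"
```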
Please could you post the output of zramctl --output-all and of mount? A snippet from the strace log showing zram being exercised might also be useful. Thanks.
$ zramctl --output-all
NAME       DISKSIZE  DATA COMPR ALGORITHM STREAMS ZERO-PAGES TOTAL MEM-LIMIT MEM-USED MIGRATED MOUNTPOINT
/dev/zram2    62.5G 19.8G  7.1G zstd           64       5644  7.2G        0B    20.6G   502.9K /var/tmp
/dev/zram1    62.5G 16.1G  3.2G zstd           64      30131  3.2G        0B    26.6G    31.9K /tmp
/dev/zram0      16G 15.9G  1.9G zstd           64      19365    2G        0B     2.3G    61.5K [SWAP]

$ mount | grep zram
/dev/zram1 on /tmp type ext4 (rw,discard,stripe=16)
/dev/zram2 on /var/tmp type ext4 (rw,discard,stripe=16)
Created attachment 300376: snippet of strace log. Here are 50000 lines, taken from maybe 5 GB or so before the end of the log.
Since you are able to reproduce this, it may be helpful to try reproducing it under a few slightly different environments to isolate it a bit. As a starting point, are you able to try reproducing (a) with a different compression algorithm (e.g., lz4, lzo); (b) with a different filesystem (e.g., XFS, btrfs); and (c) with all but one core turned off (e.g., use echo 0 > /sys/devices/system/cpu/cpu<X>/online)? This may help narrow it down somewhat.
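For (c), the per-CPU echo can be wrapped in a loop. This is only a sketch: the function name offline_all_but_cpu0 and its optional directory argument are hypothetical, added so the loop can be tried against a scratch directory before touching the real sysfs tree. It assumes cpu0 is the core to keep and that it is run as root.

```shell
# Take every CPU except cpu0 offline by writing 0 to its "online" file.
# The optional first argument (hypothetical, for dry runs) overrides the
# sysfs directory; by default the real /sys/devices/system/cpu is used.
offline_all_but_cpu0() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/online; do
        # Skip the unexpanded pattern if no CPU exposes an "online" file.
        [ -e "$f" ] || continue
        # Leave cpu0 running so the system keeps one core.
        case "$f" in
            "$cpu_root"/cpu0/online) continue ;;
        esac
        echo 0 > "$f"
    done
}
```

Calling offline_all_but_cpu0 with no argument acts on the live system; writing 1 back to the same files brings the cores online again.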
Regarding (a), I'm still having this issue with lz4. I've configured my system to use btrfs the next time I reboot. No idea how soon that will be.