This bug report is about ext4 metadata corruption on large (>=10TB) ext4 volumes. This was also reported in 2014 [ http://marc.info/?l=linux-ext4&m=139878494527370&w=2 ] I'm getting sporadic FS errors like this one: (More of these i've pasted at https://8n1.org/10745/cc34) | EXT4-fs error (device vdb): ext4_mb_generate_buddy:757: | group 79842, block bitmap and bg descriptor inconsistent: 10073 vs 10071 | free clusters | Aborting journal on device vdb-8. An e2fsck run then shows: | Pass 5: Checking group summary information | Block bitmap differences: +(2616281446--2616281447) | Free blocks count wrong (170942497, counted=129906218). | Free inodes count wrong (670863012, counted=670860975). I've patched my kernel with WARN_ON(1); inserted in tactical places and caught one such situation: | EXT4-fs (vdb): pa ffff880016544888: logic 982168, phys. 2469410748, len 104 | EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3773: group 75360, free 38, pa_free 36 | Aborting journal on device vdb-8. | EXT4-fs (vdb): Remounting filesystem read-only | ------------[ cut here ]------------ | WARNING: CPU: 1 PID: 1706 at fs/ext4/mballoc.c:3774 ext4_mb_release_inode_pa.isra.27+0x1cb/0x2c0() | Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_tcpudp ip6_tables | nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables | x_tables cirrus ttm drm_kms_helper drm kvm_intel kvm ppdev syscopyarea sysfillrect | 8250_fintek serio_raw i2c_piix4 sysimgblt pvpanic parport_pc mac_hid nfsd auth_rpcgss nfs_acl | lockd grace sunrpc lp parport autofs4 psmouse floppy pata_acpi | CPU: 1 PID: 1706 Comm: deluged Not tainted 3.19.8-ckt4 #1 | Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 | ffffffff81ab4fef ffff8800da1bb978 ffffffff817c3760 0000000000000007 | 0000000000000000 ffff8800da1bb9b8 ffffffff8107696a ffff8800da1bb9a8 | 0000000000000026 0000000000003825 0000000000003824 ffff880016544888 | Call Trace: | [<ffffffff817c3760>] dump_stack+0x45/0x57 | [<ffffffff8107696a>] warn_slowpath_common+0x8a/0xc0 | [<ffffffff81076a5a>] warn_slowpath_null+0x1a/0x20 | [<ffffffff812b01bb>] ext4_mb_release_inode_pa.isra.27+0x1cb/0x2c0 | [<ffffffff812739df>] ? ext4_read_block_bitmap_nowait+0x26f/0x5f0 | [<ffffffff812b3c6a>] ext4_discard_preallocations+0x30a/0x490 | [<ffffffff8127b578>] ext4_da_update_reserve_space+0x178/0x1b0 | [<ffffffff812a9129>] ext4_ext_map_blocks+0xcd9/0xe50 | [<ffffffff8127b6d9>] ext4_map_blocks+0x129/0x570 | [<ffffffff8127e89d>] ? ext4_writepages+0x35d/0xca0 | [<ffffffff812ab3a9>] ? __ext4_journal_start_sb+0x69/0xe0 | [<ffffffff8127eac2>] ext4_writepages+0x582/0xca0 | [<ffffffff81187a4e>] do_writepages+0x1e/0x30 | [<ffffffff8117bbe9>] __filemap_fdatawrite_range+0x59/0x60 | [<ffffffff8117bc4c>] filemap_write_and_wait+0x2c/0x60 | [<ffffffff8120903d>] do_vfs_ioctl+0x3fd/0x4e0 | [<ffffffff812091a1>] SyS_ioctl+0x81/0xa0 | [<ffffffff817ca84d>] system_call_fastpath+0x16/0x1b | ---[ end trace c7de4d0d78cb95b6 ]--- | EXT4-fs error (device vdb) in ext4_writepages:2412: IO failure | EXT4-fs (vdb): ext4_writepages: jbd2_start: 9223372036854775751 pages, ino 84149503; err -30 After this, the system started logging a lot of this same message: | EXT4-fs error (device vdb): ext4_find_extent:900: inode #84149503: comm deluged: | pblk 225181822 bad header/extent: invalid magic - magic 53fd, entries 37907, | max 27407(0), depth 50401(0) Ran e2fsck and got: | Pass 5: Checking group summary information | Block bitmap differences: +(1431556444--1431556445) +(2469410748-2469410749) | Free blocks count wrong (134030133, counted=57970467). | Free inodes count wrong (670746893, counted=670746452). Which is usually the same output for fsck in these situations. This server is a QEMU KVM virtual machine running on Intel x64 hardware.
Hit by this bug again: | EXT4-fs error (device vdb): ext4_mb_generate_buddy:757: group 76916, | block bitmap and bg descriptor inconsistent: 959 vs 957 free | clusters | Aborting journal on device vdb-8. | EXT4-fs (vdb): Remounting filesystem read-only | ------------[ cut here ]------------ | WARNING: CPU: 0 PID: 5000 at fs/ext4/mballoc.c:758 | ext4_mb_generate_buddy+0x1fa/0x340() | Modules linked in: btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs | msdos jfs xfs libcrc32c xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 | ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack | nf_conntrack iptable_filter ip_tables x_tables cirrus ttm drm_kms_helper | drm kvm_intel kvm ppdev syscopyarea sysfillrect serio_raw pvpanic | sysimgblt 8250_fintek parport_pc i2c_piix4 mac_hid nfsd auth_rpcgss | nfs_acl lockd grace sunrpc lp parport autofs4 psmouse floppy pata_acpi | CPU: 0 PID: 5000 Comm: deluged Not tainted 3.19.8-ckt4 #1 | Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 | ffffffff81ab4fef ffff8800db637858 ffffffff817c3760 ffff88011fc0fdb8 | 0000000000000000 ffff8800db637898 ffffffff8107696a 0000000000000001 | 00000000000003bf ffff8800d98e4000 0000000000008000 ffff8800084c1000 | Call Trace: | [<ffffffff817c3760>] dump_stack+0x45/0x57 | [<ffffffff8107696a>] warn_slowpath_common+0x8a/0xc0 | [<ffffffff81076a5a>] warn_slowpath_null+0x1a/0x20 | [<ffffffff812ae6ea>] ext4_mb_generate_buddy+0x1fa/0x340 | [<ffffffff812aed00>] ext4_mb_init_cache+0x390/0x700 | [<ffffffff812af5cd>] ext4_mb_load_buddy+0x1ad/0x360 | [<ffffffff812b25d7>] ext4_mb_regular_allocator+0x1b7/0x450 | [<ffffffff812b41f1>] ext4_mb_new_blocks+0x401/0x550 | [<ffffffff812a4120>] ? ext4_find_extent+0x140/0x330 | [<ffffffff812a8a87>] ext4_ext_map_blocks+0x637/0xe50 | [<ffffffff8127b6d9>] ext4_map_blocks+0x129/0x570 | [<ffffffff8127eac2>] ext4_writepages+0x582/0xca0 | [<ffffffff810f0700>] ? get_futex_key+0x1c0/0x2b0 | [<ffffffff810a71c8>] ? __enqueue_entity+0x78/0x80 | [<ffffffff81187a4e>] do_writepages+0x1e/0x30 | [<ffffffff8117bbe9>] __filemap_fdatawrite_range+0x59/0x60 | [<ffffffff8117bc4c>] filemap_write_and_wait+0x2c/0x60 | [<ffffffff8120903d>] do_vfs_ioctl+0x3fd/0x4e0 | [<ffffffff812091a1>] SyS_ioctl+0x81/0xa0 | [<ffffffff817ca84d>] system_call_fastpath+0x16/0x1b | ---[ end trace 860c733f0437c9b7 ]--- After which the kernel starts logging in rapid succession: | EXT4-fs error (device vdb): ext4_find_extent:900: inode #84410502: | comm deluged: pblk 223281469 bad header/extent: invalid magic - magic | 80da, entries 13942, max 3979(0), depth 42746(0) | EXT4-fs error: 40 callbacks suppressed
e2fsck 1.42.9 (4-Feb-2014) [ .. ] Pass 5: Checking group summary information Block bitmap differences: +(2520387693--2520387694) Fix? yes Free blocks count wrong (57970467, counted=36036558). Fix? yes Free inodes count wrong (670746452, counted=670746137). Fix? yes /dev/vdb: ***** FILE SYSTEM WAS MODIFIED ***** /dev/vdb: 342503/671088640 files (5.9% non-contiguous), 2648318002/2684354560 blocks
Since then, running 4.2.0-16-generic (Ubuntu 'Wily', 15.10), problem still persists. I've re-created the FS, made it 10T bigger (20T total, 11T in use), copied all data. Logs: Nov 4 15:45:16 [447918.885531] EXT4-fs (vdb): pa ffff88002162ef08: logic 192512, phys. 2823200768, len 1929 Nov 4 15:45:16 [447918.885776] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86157, free 1917, pa_free 1915 Nov 4 15:45:16 [447918.926300] EXT4-fs (vdb): Remounting filesystem read-only Nov 4 15:45:16 [447918.926652] EXT4-fs error (device vdb) in ext4_writepages:2520: IO failure Nov 4 15:45:16 [447918.968068] EXT4-fs (vdb): ext4_writepages: jbd2_start: 9223372036854775790 pages, ino 310215523; err -30 Nov 4 15:54:39 [448481.868756] EXT4-fs error (device vdb): ext4_mb_generate_buddy:758: group 86156, block bitmap and bg descriptor inconsistent: 28672 vs 21008 free clusters Nov 4 15:54:39 [448481.869114] EXT4-fs (vdb): pa ffff880105bb9888: logic 301056, phys. 2823186432, len 2048 Nov 4 15:54:39 [448481.869343] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86156, free 2048, pa_free 1926 Nov 4 15:54:39 [448482.020157] EXT4-fs (vdb): pa ffff88010b6298f0: logic 464896, phys. 2823178240, len 2048 Nov 4 15:54:39 [448482.020434] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86156, free 2048, pa_free 1683 Nov 4 15:54:40 [448482.570440] EXT4-fs error (device vdb): ext4_put_super:803: Couldn't clean up the journal Nov 4 18:21:31 [457294.270301] EXT4-fs error (device vdb): ext4_mb_generate_buddy:758: group 86089, block bitmap and bg descriptor inconsistent: 13143 vs 13141 free clusters Nov 4 18:21:31 [457294.333046] EXT4-fs (vdb): Remounting filesystem read-only Nov 4 18:21:31 [457294.333211] EXT4-fs error (device vdb) in ext4_writepages:2520: IO failure Nov 4 21:22:53 [ 7546.252366] EXT4-fs (vdb): pa ffff8800062eabc8: logic 64708, phys. 2822632644, len 828 Nov 4 21:22:53 [ 7546.252475] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86139, free 42, pa_free 40 Nov 4 21:22:53 [ 7546.317200] EXT4-fs (vdb): Remounting filesystem read-only Nov 4 21:22:53 [ 7546.317529] EXT4-fs error (device vdb) in ext4_writepages:2520: IO failure Nov 4 21:22:53 [ 7546.374769] EXT4-fs (vdb): ext4_writepages: jbd2_start: 9223372036854774720 pages, ino 310215559; err -30 After this last incident at 21:22:53 CET, i've reconfigured the VM to use 'default' values for 'Cache Mode' and 'IO Mode' in libvirt/qemu. It was set to 'Cache Mode: None', 'IO Mode: Native'. It still uses the VirtIO disk bus. The problem must be triggered by large spikes of random IO (non-sequential reads & writes) to sparse files on the FS. I will try to re-create this in a test-VM.
And again it went RO. libvirt/kvm Cache Mode / IO Mode is not related, it seems, was worth a shot...
Running Fedora with 4.2.8, that crashed my X completely: [ 2797.949100] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 1, block bitmap and bg descriptor inconsistent: 2431 vs 2396 free clusters [ 2797.951110] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 5, block bitmap and bg descriptor inconsistent: 2449 vs 2421 free clusters [ 2797.952697] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 12, block bitmap and bg descriptor inconsistent: 6323 vs 6319 free clusters [ 2797.952878] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 13, block bitmap and bg descriptor inconsistent: 12510 vs 12506 free clusters [ 2797.953017] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 14, block bitmap and bg descriptor inconsistent: 7196 vs 7190 free clusters [ 2797.961216] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 23, block bitmap and bg descriptor inconsistent: 916 vs 897 free clusters [ 2797.961765] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 25, block bitmap and bg descriptor inconsistent: 722 vs 721 free clusters [ 2797.963208] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 26, block bitmap and bg descriptor inconsistent: 904 vs 894 free clusters [ 2797.976310] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 35, block bitmap and bg descriptor inconsistent: 21 vs 65507 free clusters [ 2797.977395] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 44, block bitmap and bg descriptor inconsistent: 809 vs 64297 free clusters [ 2798.063978] JBD2: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash. [ 2798.064804] JBD2: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash. That's on my standard install, not in a VM (hich is what was reported by many other users).
We were recently hit by this issue, too on a physical x64 machine that hosts one kvm virtual machine guest with a windows server 2016 x64 operating system. The filesystem is a mdadm software raid1 spanning over two disks with ext4 on-top. $ uname -r 4.15.0-43-generic $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 18.04.1 LTS Release: 18.04 Codename: bionic The filesystem of the physical machine went read only and effectively crashed all processes, including the guest operating system. A hard reset and disk check appears to have solved the symptom, but we have mixed feelings since we don't know about any collateral damage this may have caused. dmesg displays worrying information that is hopefully not a symptom of irreversible root filesystem corruption: systemd-journald[457]: File /var/log/journal/7346ea28f12b763f29b8995058d63291/user-1000.journal corrupted or uncleanly shut down, renaming and replacing. The symptom manifested while booting the guest machine and thus may well be related to a burst of random reads and writes of the kvm guest. We observed moderate disk activity in htop.