Bug 104571

Summary: ext4_mb_generate_buddy block bitmap and bg descriptor inconsistent
Product: File System Reporter: Sander Smeenk (linux-ext4)
Component: ext4Assignee: fs_ext4 (fs_ext4)
Status: NEW ---    
Severity: normal CC: dpsenner, k.kotlenga, mario, szg00000
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 4.2.0-16-generic Subsystem:
Regression: No Bisected commit-id:

Description Sander Smeenk 2015-09-15 08:10:58 UTC
This bug report is about ext4 metadata corruption on large (>=10TB) ext4 volumes.
This was also reported in 2014 [ http://marc.info/?l=linux-ext4&m=139878494527370&w=2 ]

I'm getting sporadic FS errors like this one:
(More of these i've pasted at https://8n1.org/10745/cc34)

|  EXT4-fs error (device vdb): ext4_mb_generate_buddy:757:
|   group 79842, block bitmap and bg descriptor inconsistent: 10073 vs 10071
|   free clusters
| Aborting journal on device vdb-8.

An e2fsck run then shows:
| Pass 5: Checking group summary information
| Block bitmap differences:  +(2616281446--2616281447)
| Free blocks count wrong (170942497, counted=129906218).
| Free inodes count wrong (670863012, counted=670860975).

I've patched my kernel with WARN_ON(1); inserted in tactical places and caught
one such situation:

| EXT4-fs (vdb): pa ffff880016544888: logic 982168, phys. 2469410748, len 104
| EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3773: group 75360, free 38, pa_free 36
| Aborting journal on device vdb-8.
| EXT4-fs (vdb): Remounting filesystem read-only
| ------------[ cut here ]------------
| WARNING: CPU: 1 PID: 1706 at fs/ext4/mballoc.c:3774 ext4_mb_release_inode_pa.isra.27+0x1cb/0x2c0()
| Modules linked in: nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter xt_tcpudp ip6_tables
|     nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables
|     x_tables cirrus ttm drm_kms_helper drm kvm_intel kvm ppdev syscopyarea sysfillrect
|     8250_fintek serio_raw i2c_piix4 sysimgblt pvpanic parport_pc mac_hid nfsd auth_rpcgss nfs_acl
|     lockd grace sunrpc lp parport autofs4 psmouse floppy pata_acpi
| CPU: 1 PID: 1706 Comm: deluged Not tainted 3.19.8-ckt4 #1
| Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
|  ffffffff81ab4fef ffff8800da1bb978 ffffffff817c3760 0000000000000007
|  0000000000000000 ffff8800da1bb9b8 ffffffff8107696a ffff8800da1bb9a8
|  0000000000000026 0000000000003825 0000000000003824 ffff880016544888
| Call Trace:
|  [<ffffffff817c3760>] dump_stack+0x45/0x57
|  [<ffffffff8107696a>] warn_slowpath_common+0x8a/0xc0
|  [<ffffffff81076a5a>] warn_slowpath_null+0x1a/0x20
|  [<ffffffff812b01bb>] ext4_mb_release_inode_pa.isra.27+0x1cb/0x2c0
|  [<ffffffff812739df>] ? ext4_read_block_bitmap_nowait+0x26f/0x5f0
|  [<ffffffff812b3c6a>] ext4_discard_preallocations+0x30a/0x490
|  [<ffffffff8127b578>] ext4_da_update_reserve_space+0x178/0x1b0
|  [<ffffffff812a9129>] ext4_ext_map_blocks+0xcd9/0xe50
|  [<ffffffff8127b6d9>] ext4_map_blocks+0x129/0x570
|  [<ffffffff8127e89d>] ? ext4_writepages+0x35d/0xca0
|  [<ffffffff812ab3a9>] ? __ext4_journal_start_sb+0x69/0xe0
|  [<ffffffff8127eac2>] ext4_writepages+0x582/0xca0
|  [<ffffffff81187a4e>] do_writepages+0x1e/0x30
|  [<ffffffff8117bbe9>] __filemap_fdatawrite_range+0x59/0x60
|  [<ffffffff8117bc4c>] filemap_write_and_wait+0x2c/0x60
|  [<ffffffff8120903d>] do_vfs_ioctl+0x3fd/0x4e0
|  [<ffffffff812091a1>] SyS_ioctl+0x81/0xa0
|  [<ffffffff817ca84d>] system_call_fastpath+0x16/0x1b
| ---[ end trace c7de4d0d78cb95b6 ]---
| EXT4-fs error (device vdb) in ext4_writepages:2412: IO failure
| EXT4-fs (vdb): ext4_writepages: jbd2_start: 9223372036854775751 pages, ino 84149503; err -30

After this, the system started logging a lot of this same message:
| EXT4-fs error (device vdb): ext4_find_extent:900: inode #84149503: comm deluged: 
|    pblk 225181822 bad header/extent: invalid magic - magic 53fd, entries 37907,
|    max 27407(0), depth 50401(0)

Ran e2fsck and got:
| Pass 5: Checking group summary information
| Block bitmap differences:   +(1431556444--1431556445) +(2469410748-2469410749)
| Free blocks count wrong (134030133, counted=57970467).
| Free inodes count wrong (670746893, counted=670746452).

Which is usually the same output for fsck in these situations.

This server is a QEMU KVM virtual machine running on Intel x64 hardware.
Comment 1 Sander Smeenk 2015-09-28 08:04:48 UTC
Hit by this bug again:

| EXT4-fs error (device vdb): ext4_mb_generate_buddy:757: group 76916,
|    block bitmap and bg descriptor inconsistent: 959 vs 957 free
|    clusters
| Aborting journal on device vdb-8.
| EXT4-fs (vdb): Remounting filesystem read-only
| ------------[ cut here ]------------
| WARNING: CPU: 0 PID: 5000 at fs/ext4/mballoc.c:758
|    ext4_mb_generate_buddy+0x1fa/0x340()
| Modules linked in: btrfs xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs
|    msdos jfs xfs libcrc32c xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6
|    ip6table_filter ip6_tables nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
|    nf_conntrack iptable_filter ip_tables x_tables cirrus ttm drm_kms_helper
|    drm kvm_intel kvm ppdev syscopyarea sysfillrect serio_raw pvpanic
|    sysimgblt 8250_fintek parport_pc i2c_piix4 mac_hid nfsd auth_rpcgss
|    nfs_acl lockd grace sunrpc lp parport autofs4 psmouse floppy pata_acpi
| CPU: 0 PID: 5000 Comm: deluged Not tainted 3.19.8-ckt4 #1
| Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
|  ffffffff81ab4fef ffff8800db637858 ffffffff817c3760 ffff88011fc0fdb8
|  0000000000000000 ffff8800db637898 ffffffff8107696a 0000000000000001
|  00000000000003bf ffff8800d98e4000 0000000000008000 ffff8800084c1000
| Call Trace:
|  [<ffffffff817c3760>] dump_stack+0x45/0x57
|  [<ffffffff8107696a>] warn_slowpath_common+0x8a/0xc0
|  [<ffffffff81076a5a>] warn_slowpath_null+0x1a/0x20
|  [<ffffffff812ae6ea>] ext4_mb_generate_buddy+0x1fa/0x340
|  [<ffffffff812aed00>] ext4_mb_init_cache+0x390/0x700
|  [<ffffffff812af5cd>] ext4_mb_load_buddy+0x1ad/0x360
|  [<ffffffff812b25d7>] ext4_mb_regular_allocator+0x1b7/0x450
|  [<ffffffff812b41f1>] ext4_mb_new_blocks+0x401/0x550
|  [<ffffffff812a4120>] ? ext4_find_extent+0x140/0x330
|  [<ffffffff812a8a87>] ext4_ext_map_blocks+0x637/0xe50
|  [<ffffffff8127b6d9>] ext4_map_blocks+0x129/0x570
|  [<ffffffff8127eac2>] ext4_writepages+0x582/0xca0
|  [<ffffffff810f0700>] ? get_futex_key+0x1c0/0x2b0
|  [<ffffffff810a71c8>] ? __enqueue_entity+0x78/0x80
|  [<ffffffff81187a4e>] do_writepages+0x1e/0x30
|  [<ffffffff8117bbe9>] __filemap_fdatawrite_range+0x59/0x60
|  [<ffffffff8117bc4c>] filemap_write_and_wait+0x2c/0x60
|  [<ffffffff8120903d>] do_vfs_ioctl+0x3fd/0x4e0
|  [<ffffffff812091a1>] SyS_ioctl+0x81/0xa0
|  [<ffffffff817ca84d>] system_call_fastpath+0x16/0x1b
| ---[ end trace 860c733f0437c9b7 ]---

After which the kernel starts logging in rapid succession:
| EXT4-fs error (device vdb): ext4_find_extent:900: inode #84410502:
|    comm deluged: pblk 223281469 bad header/extent: invalid magic - magic
|    80da, entries 13942, max 3979(0), depth 42746(0)
| EXT4-fs error: 40 callbacks suppressed
Comment 2 Sander Smeenk 2015-09-28 08:37:24 UTC
e2fsck 1.42.9 (4-Feb-2014)
[ .. ]
Pass 5: Checking group summary information
Block bitmap differences:  +(2520387693--2520387694)
Fix? yes

Free blocks count wrong (57970467, counted=36036558).
Fix? yes

Free inodes count wrong (670746452, counted=670746137).
Fix? yes

/dev/vdb: ***** FILE SYSTEM WAS MODIFIED *****
/dev/vdb: 342503/671088640 files (5.9% non-contiguous), 2648318002/2684354560 blocks
Comment 3 Sander Smeenk 2015-11-04 21:12:08 UTC
Since then, running 4.2.0-16-generic (Ubuntu 'Wily', 15.10), problem still persists. I've re-created the FS, made it 10T bigger (20T total, 11T in use), copied all data.

Logs:

Nov  4 15:45:16 [447918.885531] EXT4-fs (vdb): pa ffff88002162ef08: logic 192512, phys. 2823200768, len 1929
Nov  4 15:45:16 [447918.885776] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86157, free 1917, pa_free 1915
Nov  4 15:45:16 [447918.926300] EXT4-fs (vdb): Remounting filesystem read-only
Nov  4 15:45:16 [447918.926652] EXT4-fs error (device vdb) in ext4_writepages:2520: IO failure
Nov  4 15:45:16 [447918.968068] EXT4-fs (vdb): ext4_writepages: jbd2_start: 9223372036854775790 pages, ino 310215523; err -30

Nov  4 15:54:39 [448481.868756] EXT4-fs error (device vdb): ext4_mb_generate_buddy:758: group 86156, block bitmap and bg descriptor inconsistent: 28672 vs 21008 free clusters
Nov  4 15:54:39 [448481.869114] EXT4-fs (vdb): pa ffff880105bb9888: logic 301056, phys. 2823186432, len 2048
Nov  4 15:54:39 [448481.869343] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86156, free 2048, pa_free 1926
Nov  4 15:54:39 [448482.020157] EXT4-fs (vdb): pa ffff88010b6298f0: logic 464896, phys. 2823178240, len 2048
Nov  4 15:54:39 [448482.020434] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86156, free 2048, pa_free 1683
Nov  4 15:54:40 [448482.570440] EXT4-fs error (device vdb): ext4_put_super:803: Couldn't clean up the journal 

Nov  4 18:21:31 [457294.270301] EXT4-fs error (device vdb): ext4_mb_generate_buddy:758: group 86089, block bitmap and bg descriptor inconsistent: 13143 vs 13141 free clusters
Nov  4 18:21:31 [457294.333046] EXT4-fs (vdb): Remounting filesystem read-only
Nov  4 18:21:31 [457294.333211] EXT4-fs error (device vdb) in ext4_writepages:2520: IO failure

Nov  4 21:22:53 [ 7546.252366] EXT4-fs (vdb): pa ffff8800062eabc8: logic 64708, phys. 2822632644, len 828
Nov  4 21:22:53 [ 7546.252475] EXT4-fs error (device vdb): ext4_mb_release_inode_pa:3788: group 86139, free 42, pa_free 40 
Nov  4 21:22:53 [ 7546.317200] EXT4-fs (vdb): Remounting filesystem read-only
Nov  4 21:22:53 [ 7546.317529] EXT4-fs error (device vdb) in ext4_writepages:2520: IO failure
Nov  4 21:22:53 [ 7546.374769] EXT4-fs (vdb): ext4_writepages: jbd2_start: 9223372036854774720 pages, ino 310215559; err -30


After this last incident at 21:22:53 CET, i've reconfigured the VM to use 'default' values for 'Cache Mode' and 'IO Mode' in libvirt/qemu. It was set to 'Cache Mode: None', 'IO Mode: Native'. It still uses the VirtIO disk bus.

The problem must be triggered by large spikes of random IO (non-sequential reads & writes) to sparse files on the FS. I will try to re-create this in a test-VM.
Comment 4 Sander Smeenk 2015-11-05 07:38:32 UTC
And again it went RO. libvirt/kvm Cache Mode / IO Mode is not related, it seems, was worth a shot...
Comment 5 mpeshev 2016-01-17 22:15:33 UTC
Running Fedora with 4.2.8, that crashed my X completely:

[ 2797.949100] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 1, block bitmap and bg descriptor inconsistent: 2431 vs 2396 free clusters
[ 2797.951110] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 5, block bitmap and bg descriptor inconsistent: 2449 vs 2421 free clusters
[ 2797.952697] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 12, block bitmap and bg descriptor inconsistent: 6323 vs 6319 free clusters
[ 2797.952878] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 13, block bitmap and bg descriptor inconsistent: 12510 vs 12506 free clusters
[ 2797.953017] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 14, block bitmap and bg descriptor inconsistent: 7196 vs 7190 free clusters
[ 2797.961216] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 23, block bitmap and bg descriptor inconsistent: 916 vs 897 free clusters
[ 2797.961765] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 25, block bitmap and bg descriptor inconsistent: 722 vs 721 free clusters
[ 2797.963208] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 26, block bitmap and bg descriptor inconsistent: 904 vs 894 free clusters
[ 2797.976310] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 35, block bitmap and bg descriptor inconsistent: 21 vs 65507 free clusters
[ 2797.977395] EXT4-fs error (device dm-4): ext4_mb_generate_buddy:758: group 44, block bitmap and bg descriptor inconsistent: 809 vs 64297 free clusters
[ 2798.063978] JBD2: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
[ 2798.064804] JBD2: Spotted dirty metadata buffer (dev = dm-4, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

That's on my standard install, not in a VM (hich is what was reported by many other users).
Comment 6 dpsenner 2019-04-09 07:57:29 UTC
We were recently hit by this issue, too on a physical x64 machine that hosts one kvm virtual machine guest with a windows server 2016 x64 operating system. The filesystem is a mdadm software raid1 spanning over two disks with ext4 on-top.

$ uname -r
4.15.0-43-generic

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic

The filesystem of the physical machine went read only and effectively crashed all processes, including the guest operating system. A hard reset and disk check appears to have solved the symptom, but we have mixed feelings since we don't know about any collateral damage this may have caused. dmesg displays worrying information that is hopefully not a symptom of irreversible root filesystem corruption:

systemd-journald[457]: File /var/log/journal/7346ea28f12b763f29b8995058d63291/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.

The symptom manifested while booting the guest machine and thus may well be related to a burst of random reads and writes of the kvm guest. We observed moderate disk activity in htop.