Bug 89621 - EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
Reported: 2014-12-12 12:28 UTC by John Hughes
Modified: 2016-03-20 11:25 UTC
CC: 5 users

Kernel Version: 3.16
Tree: Mainline
Regression: Yes



Description John Hughes 2014-12-12 12:28:11 UTC
I've been getting this EXT4 error (on an ext3 filesystem) on two virtual machines, running Debian kernel 3.16-0.bpo.2-amd64.

The stack is:

Host (3.16): Mdadm raid1
LVM
Guest (also 3.16): LVM2 volume, ext3 formatted

[    7.413209] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
[    7.431191] EXT4-fs (dm-4): warning: maximal mount count reached, running e2fsck is recommended
[    7.442501] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
...
[957229.875900] EXT4-fs (dm-4): pa ffff88001f7ff980: logic 0, phys. 9176576, len 128
[957229.878640] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 280, free 128, pa_free 127
[957229.881350] Aborting journal on device dm-4-8.
[957229.891273] EXT4-fs (dm-4): Remounting filesystem read-only
[957561.369731] EXT4-fs (dm-4): pa ffff880009f7b028: logic 0, phys. 9192448, len 512
[957561.371926] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 280, free 512, pa_free 511
[957688.608554] EXT4-fs error (device dm-4): ext4_put_super:792: Couldn't clean up the journal

This is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=772848
Comment 1 Theodore Tso 2014-12-13 03:36:07 UTC
This error just means there is an inconsistency between the number of free blocks in the bitmap and the number of blocks reported in the block group descriptors.  So the errors reported by fsck could be consistent with what you are seeing.

This might be fixed by d89b2f764e9f9eb6c1ac90d25e6fc5b43b676a48.   It's a bit painful because I'm not sure what is included in 3.16-0.bpo.2-amd64.   You need to ask the Debian kernel distro folks that question.
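For anyone with a clone of the stable tree handy, one way to answer "is this commit in my kernel?" without asking the distro folks is `git merge-base --is-ancestor` (a sketch; the hash and tag are the ones under discussion here, and the helper name is made up):

```shell
# has_commit <sha> <ref>: succeed if <sha> is already contained in <ref>.
has_commit() {
    git merge-base --is-ancestor "$1" "$2"
}

# e.g., in a linux-stable clone, for the fix mentioned above:
#   has_commit d89b2f764e9f9eb6c1ac90d25e6fc5b43b676a48 v3.16.5 && echo included
```

This only works against the upstream tags, of course; distro kernels may carry the fix as a cherry-pick with a different hash.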
Comment 2 John Hughes 2014-12-15 11:57:41 UTC
Well, looking at fs/ext4/extents.c I see:

static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
                                  ext4_lblk_t len, loff_t new_size,
                                  int flags, int mode)

So it looks to me as if this kernel (based on 3.16.5) includes that commit.
Comment 3 Theodore Tso 2014-12-15 14:51:12 UTC
Well, was this a one-time thing or can you easily reproduce this error?   The problem is that almost anything can cause this --- bad memory, a wild pointer dereference coming from a proprietary video driver from Nvidia, etc.   And by the time we get the ext4-error, basically what we have is the dead body lying in the alley.   Where the bullet came from, and from what direction, and who fired the gun, is not something that will be immediately evident; just that we have a very specific kind of inconsistency between the allocation bitmap and the block group summary statistics.  So in order to track it down, we need to be able to try to isolate causes.   So for example, does the problem show up if you use the latest bleeding edge kernel from mainline, etc.?
Comment 4 John Hughes 2014-12-15 15:21:22 UTC
I've seen it 3 times so far.

Once on boot; I ran fsck and the error has not come back (so far).

Once (on another, but similarly configured VM) the error came back almost immediately after being fixed by fsck.  A second run of fsck and no problems so far.

I'm not running any odd drivers in these VMs.  The host of one of them has an Eicon DIVA card with its drivers; the host of the other doesn't.

The strangest part of my setup is that I have an mdadm raid1 on the host system, that is made available as a virtio block device to a guest (running under LVM).  In the guest I have LVM2 volumes, with ext3 formatted filesystems.  

Everything is now running this 3.16 Debian kernel.

One filesystem was mounted with data=journal, the other is the root filesystem
mounted with the default data=ordered.

Barriers are not explicitly disabled, so they should be enabled by default by now (yes?)

The virtio block devices have cache=none.

I'll try to schedule the installation of a more up-to-date kernel on at least one of the machines, but it may take me a few days.
Comment 5 Theodore Tso 2014-12-15 22:27:11 UTC
Yes, barriers are enabled by default, and with cache=none, the flush commands *should* be sent down to the host device.   It might be worth doing a quick blktrace to make sure the host device is getting the cache flushes and they aren't getting dropped by qemu or by mdraid, but the configuration at least looks sane.

In the cases where you've seen the problem, has there been any instance where either (a) the host has suffered a crash or shutdown or power drop, or (b) the guest VM has been abruptly shut down or killed without triggering a normal guest OS shutdown?
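For the record, a minimal version of that blktrace check might look like the following. This is a sketch, not from the report: the device name is an example, blktrace/blkparse must be installed, and the live capture needs root. Run it on the host while the guest does fsync-heavy work:

```shell
# Capture 60 seconds of traffic on the backing device and save the
# parsed trace (requires root; /dev/md0 is an example device name):
#   blktrace -d /dev/md0 -w 60 -o - | blkparse -i - > trace.txt

# Count completed flush requests in a parsed trace.  In blkparse's
# default output, field 6 is the action code ('C' = completed) and
# field 7 is the RWBS flags; flush requests start with 'F' (e.g. "FWS").
count_flushes() {
    awk '$6 == "C" && $7 ~ /^F/' "$1" | wc -l
}
```

If the count stays at zero while the guest is issuing fsyncs, the flushes are being dropped somewhere between qemu and the host device.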
Comment 6 John Hughes 2014-12-16 11:46:23 UTC
Could you give me a clue about what exactly I should do for a "quick blktrace"? What should I look for?

As for crashes and so on:

Case #1: I found my guest machine "cedric" dead on 3 Dec, root filesystem read-only.  I ran fsck manually and after reboot the log shows:

Dec  3 11:04:26 cedric kernel: [    1.440438] EXT4-fs (dm-0): mounting ext3 file system using the ext4 subsystem
Dec  3 11:04:26 cedric kernel: [    1.468114] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
Dec  3 11:04:26 cedric kernel: [    6.139705] EXT4-fs (dm-0): re-mounted. Opts: (null)
Dec  3 11:04:26 cedric kernel: [    6.311768] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
Dec  3 11:09:11 cedric kernel: [  302.048065] EXT4-fs (dm-0): error count since last fsck: 3
Dec  3 11:09:11 cedric kernel: [  302.048102] EXT4-fs (dm-0): initial error at time 1417556618: ext4_mb_release_inode_pa:3773
Dec  3 11:09:11 cedric kernel: [  302.048105] EXT4-fs (dm-0): last error at time 1417596801: ext4_remount:4853

"cedric" had been up since 26 Nov.

Case #2, a filesystem went read-only on my guest machine "olympic" while the cyrus imap "squatter" was indexing all the messages.

Dec 11 17:17:06 olympic kernel: [957229.875900] EXT4-fs (dm-4): pa ffff88001f7ff980: logic 0, phys. 9176576, len 128
Dec 11 17:17:06 olympic kernel: [957229.878640] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 280, free 128, pa_free 127
Dec 11 17:17:06 olympic kernel: [957229.881350] Aborting journal on device dm-4-8.
Dec 11 17:17:06 olympic kernel: [957229.891273] EXT4-fs (dm-4): Remounting filesystem read-only
Dec 11 17:22:37 olympic kernel: [957561.369731] EXT4-fs (dm-4): pa ffff880009f7b028: logic 0, phys. 9192448, len 512
Dec 11 17:22:37 olympic kernel: [957561.371926] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 280, free 512, pa_free 511
Dec 11 17:24:45 olympic kernel: [957688.608554] EXT4-fs error (device dm-4): ext4_put_super:792: Couldn't clean up the journal

I ran fsck to fix the problem and re-mounted the filesystem; the problem re-occurred:

Dec 11 17:34:19 olympic kernel: [958263.206923] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
Dec 11 17:34:19 olympic kernel: [958263.253131] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
Dec 11 17:35:27 olympic kernel: [958331.251355] EXT4-fs (dm-4): pa ffff88003d99aa50: logic 0, phys. 60800, len 32
Dec 11 17:35:27 olympic kernel: [958331.254094] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 1, free 32, pa_free 31
Dec 11 17:35:27 olympic kernel: [958331.256878] Aborting journal on device dm-4-8.
Dec 11 17:35:27 olympic kernel: [958331.266214] EXT4-fs (dm-4): Remounting filesystem read-only
Dec 11 17:36:17 olympic kernel: [958380.640506] EXT4-fs error (device dm-4): ext4_put_super:792: Couldn't clean up the journal

I fixed it again with fsck; now all seems well (so far):

Dec 11 17:48:47 olympic kernel: [959131.397372] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
Dec 11 17:48:47 olympic kernel: [959131.430854] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro

The system had been running since Nov 30.
Comment 7 Theodore Tso 2014-12-16 14:55:57 UTC
OK, so vis-a-vis case #1, it looks like the fsck didn't take.

Dec  3 11:04:26 cedric kernel: [    1.440438] EXT4-fs (dm-0): mounting ext3 file system using the ext4 subsystem
Dec  3 11:04:26 cedric kernel: [    1.468114] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
Dec  3 11:04:26 cedric kernel: [    6.139705] EXT4-fs (dm-0): re-mounted. Opts: (null)
Dec  3 11:04:26 cedric kernel: [    6.311768] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
Dec  3 11:09:11 cedric kernel: [  302.048065] EXT4-fs (dm-0): error count since last fsck: 3
Dec  3 11:09:11 cedric kernel: [  302.048102] EXT4-fs (dm-0): initial error at time 1417556618: ext4_mb_release_inode_pa:3773
Dec  3 11:09:11 cedric kernel: [  302.048105] EXT4-fs (dm-0): last error at time 1417596801: ext4_remount:4853

I'm not exactly sure what time zone your system messages are in, but e2fsck 1.42.5 (I assume you're using e2fsprogs from debian stable?) will clear the error stats, and time 1417596801 is:

% TZ=GMT date -d @1417596801
Wed Dec  3 08:53:21 GMT 2014

Either that, or maybe the raid system is acting up and writes aren't going to both halves, or something weird like that?    Can you do me a favor and configure a handful of VM's that ***don't*** use MD RAID, just so we can rule out funny business on that score?
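For completeness, the initial error timestamp (1417556618) from the same cedric log decodes the same way:

```shell
# Decode the "initial error at time 1417556618" epoch stamp:
TZ=GMT date -d @1417556618
# Tue Dec  2 21:43:38 GMT 2014 -- i.e. the corruption was first
# recorded the evening before the Dec 3 fsck.
```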
Comment 8 Michal Cihar 2015-05-21 19:01:44 UTC
I've seen this several times as well. All cases were after upgrade to Debian Jessie with the 3.16.0 kernel running in KVM. The host kernel is unchanged, so I'm pretty sure the guest kernel is guilty here. The host does LVM + RAID; the guest just gets a virtio-based disk, so nothing complex on its side.

The error in my case:

[149757.256024] EXT4-fs (vda1): pa ffff88001c92bd90: logic 0, phys. 1944064, len 64
[149757.323573] EXT4-fs error (device vda1): ext4_mb_release_inode_pa:3773: group 59, free 4, pa_free 2
[149757.349776] Aborting journal on device vda1-8.
[149757.386808] EXT4-fs (vda1): Remounting filesystem read-only
[236229.600081] EXT4-fs (vda1): error count since last fsck: 1
[236229.600086] EXT4-fs (vda1): initial error at time 1432088178: ext4_mb_release_inode_pa:3773
[236229.600088] EXT4-fs (vda1): last error at time 1432088178: ext4_mb_release_inode_pa:3773

I'm not able to reproduce this, but I've already seen the error several times.
Comment 9 John Hughes 2015-05-27 09:53:06 UTC
So the common factors between Michal's case and mine are:

Debian 3.16.0 kernel running in KVM
Host kernel doing MDADM (Raid 1 for me)

Michal is running LVM on the host, I'm running it on the guest.

Michal, my filesystems are ext3, (now being handled by ext4 subsystem).  Are yours ext3 or ext4?
Comment 10 Michal Cihar 2015-05-27 10:01:37 UTC
In my case it is ext4. And I have LVM + RAID 1 on host.
Comment 11 Theodore Tso 2015-05-28 00:27:29 UTC
Why are you still using v3.16.0?    The latest 3.16 kernel is v3.16.7.  I believe the bug in question was fixed in a backport that first showed up in v3.16.2:

commit 868be28e321b764da29592cfd9fd659434e823aa
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Wed Jul 30 22:17:17 2014 -0400

    ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct
    
    commit 86f0afd463215fc3e58020493482faa4ac3a4d69 upstream.
    
    If there is a failure while allocating the preallocation structure, a
    number of blocks can end up getting marked in the in-memory buddy
    bitmap, and then not getting released.  This can result in the
    following corruption getting reported by the kernel:
    
    EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
    12793 clusters in bitmap, 12729 in gd
    
    In that case, we need to release the blocks using mb_free_blocks().
    
    Tested: fs smoke test; also demonstrated that with injected errors,
        the file system is no longer getting corrupted
    
    Google-Bug-Id: 16657874
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Comment 12 Theodore Tso 2015-05-28 01:22:48 UTC
Ah, never mind; the "debian 3.16.0" kernel is actually based on 3.16.7-ckt9 -- at least, the latest version of debian 3.16.0 is.   You'd have to specify exactly which version of the "debian 3.16.0" kernel you're running to know for sure.   (But this is why many upstream kernel developers don't try to support distro kernels.  It's hard to know what commits might or might not be in a particular distro kernel.)
Comment 13 Michal Cihar 2015-05-28 05:24:35 UTC
Sorry for being unclear, in my case it's 3.16.7-ckt9-3~deb8u1, latest version available in Debian stable.
Comment 14 John Hughes 2015-05-28 10:35:19 UTC
And for me it's:

Linux olympic 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt7-1~bpo70+1 (2015-04-07) x86_64 GNU/Linux

The last but one backported version for oldstable.  I can update to 3.16.7-ckt9...

(What a horrid name)
Comment 15 John Hughes 2015-07-24 16:17:27 UTC
I've been seeing this bug regularly:

Jul  9 01:33:50 olympic kernel: [2914041.432613] Aborting journal on device dm-4-8.
Jul 15 14:00:17 olympic kernel: [196797.659798] Aborting journal on device dm-2-8.
Jun 23 11:16:37 olympic kernel: [1566607.798081] Aborting journal on device dm-4-8.

And a bunch of times on / where it didn't get logged, but I needed to reboot after fsck:

Jul 13 07:20:31 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jul 15 14:33:18 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jul 19 15:36:11 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jul 23 08:00:30 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.

(More or less at least once every 4 days)

I've upgraded the kernel to stock 3.18.19.

I'll report here if I see the problem again.
Comment 16 John Hughes 2015-08-11 15:15:01 UTC
Ok, I've now seen the bug while running 3.18.19 (compiled by me straight from kernel git).

It stayed up for way longer this time :-) but then...

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.18.19-jh1 (john@olympic) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 SMP Thu Jul 23 14:54:25 CEST 2015
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.18.19-jh1 root=/dev/mapper/olympic-olympic--root ro console=tty0 console=ttyS0,19200n8 quiet
...
[    5.310778] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
[    7.917833] EXT4-fs (vda1): mounting ext3 file system using the ext4 subsystem
[    7.937645] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[    7.973622] EXT4-fs (dm-2): mounting ext3 file system using the ext4 subsystem
[    8.020501] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[    8.051234] EXT4-fs (dm-3): mounting ext3 file system using the ext4 subsystem
[    8.084443] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: errors=remount-ro
[    8.110174] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
[    8.110180] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
[    8.163955] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
...
[1570311.174406] EXT4-fs (dm-2): pa ffff8800281ac980: logic 0, phys. 1351168, len 512
[1570311.176802] EXT4-fs error (device dm-2): ext4_mb_release_inode_pa:3773: group 41, free 431, pa_free 430
[1570311.179413] Aborting journal on device dm-2-8.
[1570311.199091] EXT4-fs (dm-2): Remounting filesystem read-only


On reboot the system complained about dm-2 being bad:

[    9.196972] EXT4-fs (dm-2): mounting ext3 file system using the ext4 subsystem
[   10.134032] EXT4-fs warning (device dm-2): ext4_clear_journal_err:4659: Filesystem error recorded from previous mount: IO failure
[   10.136694] EXT4-fs warning (device dm-2): ext4_clear_journal_err:4660: Marking fs in need of filesystem check.
[   10.158051] EXT4-fs (dm-2): warning: mounting fs with errors, running e2fsck is recommended
[   10.171173] EXT4-fs (dm-2): Errors on filesystem, clearing orphan list.
[   10.171173]
[   10.173578] EXT4-fs (dm-2): recovery complete
[   10.188013] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: errors=remount-ro


So I unmounted it and did a fsck:

Script started on Tue Aug 11 16:54:56 2015
root@olympic:~# fsck -y -f /dev/mapper/olympic-olympic--home 
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found.  Fix? yes

Inode 3827814 was part of the orphaned inode list.  FIXED.
Deleted inode 3828431 has zero dtime.  Fix? yes

Inode 4137318 was part of the orphaned inode list.  FIXED.
Inode 4137485 was part of the orphaned inode list.  FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(723856--723871) -(738144--738172) +1351589 -7897632 -(7926715--7926730) -(11616272--11616287) -(11627328--11627356) -18842491 -(18846824--18846839) -29065743 -(29093767--29093782) -32965419 -(32983830--32983845) -(36864912--36864919) -(36965200--36965204)
Fix? yes

Free blocks count wrong for group #22 (27818, counted=27863).
Fix? yes

Free blocks count wrong for group #241 (25273, counted=25290).
Fix? yes

Free blocks count wrong for group #354 (23179, counted=23224).
Fix? yes

Free blocks count wrong for group #575 (25214, counted=25231).
Fix? yes

Free blocks count wrong for group #887 (24494, counted=24511).
Fix? yes

Free blocks count wrong for group #1006 (21249, counted=21266).
Fix? yes

Free blocks count wrong for group #1125 (25845, counted=25853).
Fix? yes

Free blocks count wrong for group #1128 (28938, counted=28943).
Fix? yes

Free blocks count wrong (22597710, counted=22597881).
Fix? yes

Inode bitmap differences:  -3827814 -3828431 -4137318 -4137485
Fix? yes

Free inodes count wrong for group #467 (5325, counted=5327).
Fix? yes

Free inodes count wrong for group #505 (6985, counted=6987).
Fix? yes

Free inodes count wrong (9975485, counted=9975489).
Fix? yes


olympic-home: ***** FILE SYSTEM WAS MODIFIED *****
olympic-home: 510271/10485760 files (13.1% non-contiguous), 19345159/41943040 blocks
root@olympic:~# exit

Script done on Tue Aug 11 17:01:11 2015
Comment 17 Theodore Tso 2015-08-11 18:16:09 UTC
John, are you willing to give us a lot more details about your setup?   Hardware, software, details of the host OS, details of the guest OS, what is the workload where the problem is happening, etc.?

And it might be helpful if you opened a separate bug which we use exclusively for tracking your particular issue.   One of the things which is often disastrous, and why I've largely given up on Ubuntu's launchpad, is when multiple people assume that the problem they are seeing is the same as everyone else's, when in fact they could be separate bugs, or one of the bugs could be a hardware problem, etc.

Then when one person doesn't give details about their hardware because they saw someone else already had, it makes finding the problems even harder.

One thing to keep in mind when we are talking about problems found in the guest OS is that misconfigurations at the host OS level can be indistinguishable from hardware problems.   So for example, depending on the caching policy used by KVM in the host OS, it may be that cache flush commands issued in the guest OS don't percolate all the way down to the actual hardware platters.   Thus, after a power fail event, even if everything is configured correctly in the guest OS, it can result in an inconsistent file system image, because KVM / the host OS didn't make sure everything got pushed to the disk after a journal commit.

So knowing whether or not there were any shutdowns caused by KVM crashing, the host OS crashing, or the guest OS crashing is all very important in trying to debug things, because the error message reported simply means that there was an inconsistency detected in the allocation bitmaps.   This can be caused by any number of things --- which is why doctors hate it when someone asks them what's wrong because "I have a cough" at a social occasion, and then someone else chimes in, "well, my grandfather had a cough right before he was diagnosed with terminal lung cancer"...
Comment 18 cwseys 2015-08-11 21:11:02 UTC
Hi John,
  Me too!  :p  We're having similar symptoms.  Running VMs with Wheezy (kernel version 3.2), SL6 (kernel version 2.6), and Jessie (kernel version 3.16).

  Only Jessie VMs go read-only sporadically, even when running on the same host.  I'm recording which VM host this is happening on, hoping to find a pattern, but no luck so far.

  Also, it's not easily reproducible.  E.g. for me (maybe not the same bug!) the hosts have been running both software RAID and hardware RAID cards.
Not all the VMs running 3.16 have had this problem, but some have had it multiple times.

If I hit on a reliable failure I will start git bisecting between 3.2 and 3.16.  (A brute force approach, but maybe helpful.)

Chad.
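The bisect Chad proposes is mechanical once there's a reliable reproducer. A sketch (the helper name and `reproduce.sh` are hypothetical; the reproducer must exit 0 on a good kernel and 1-124 on a bad one):

```shell
# bisect_between <good-ref> <bad-ref> <test-script>
# Lets git binary-search for the first commit where <test-script> fails.
bisect_between() {
    git bisect start "$2" "$1" &&   # bad ref first, then good ref
    git bisect run "$3"             # prints "<sha> is the first bad commit"
    git bisect reset                # always return to the original HEAD
}

# e.g., in a kernel tree, with a script that provokes the corruption:
#   bisect_between v3.2 v3.16 ./reproduce.sh
```

With roughly 100,000 commits between v3.2 and v3.16, that's about 17 reproducer runs, so the approach stands or falls on how reliably the corruption can be triggered.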
Comment 19 John Hughes 2015-08-12 09:08:36 UTC
(In reply to Theodore Tso from comment #17)
> John, are you willing to give us a lot more details about your set up?  
> Hardware, software, details of the host OS, details of the guest OS, what is
> the workload where the problem is happening, etc?
> 
> And it might be helpful if you opened a separate bug which we use
> exclusively for tracking your particular issue.

Ok, I've opened https://bugzilla.kernel.org/show_bug.cgi?id=102731

Please let me know if you need more info.
 
> One thing to keep in mind when we are talking about problems found in the
> guest OS is that misconfigurations at the host OS level can be
> indistinguishable from hardware problems.   So for example, depending on the
> caching policy used by KVM in the host OS issues, it may be that cache flush
> commands in the guest OS don't peculate down all the way down to the actual
> hardware platters.   Thus, after a power fail event, even if everything is
> configured correctly at the guest OS, it can result in an inconsistent file
> system image because the KVM / the host OS didn't make sure everything got
> pushed to the disk after a journal commit.

I have had one power fail, but since then mdadm has run a consistency check and found no inconsistencies.

I'm also running regular (once a week) fscks on (snapshots of) my filesystems.  So far the fscks have never found a problem.
