Bug 89621
| Summary: | EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 | | |
|---|---|---|---|
| Product: | File System | Reporter: | John Hughes (john) |
| Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
| Status: | NEW | | |
| Severity: | normal | CC: | cwseys, k.kotlenga, kernel.org-115, michal, szg00000, tytso |
| Priority: | P1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Kernel Version: | 3.16 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
Description
John Hughes
2014-12-12 12:28:11 UTC
This error just means there is an inconsistency between the number of free blocks in the bitmap and the number of free blocks reported in the block group descriptors, so the errors reported by fsck could be consistent with what you are seeing. This might be fixed by d89b2f764e9f9eb6c1ac90d25e6fc5b43b676a48. It's a bit painful because I'm not sure what is included in 3.16-0.bpo.2-amd64; you need to ask the Debian kernel distro folks that question.

Well, looking at fs/ext4/extents.c I see:

    static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
                                      ext4_lblk_t len, loff_t new_size,
                                      int flags, int mode)

so it looks to me as if this kernel (based on 3.16.5) includes that commit.

Well, was this a one-time thing, or can you easily reproduce this error? The problem is that almost anything can cause this --- bad memory, a wild pointer dereference coming from a proprietary Nvidia video driver, etc. And by the time we get the ext4 error, what we basically have is the dead body lying in the alley. Where the bullet came from, from what direction, and who fired the gun is not something that will be immediately evident; all we know is that we have a very specific kind of inconsistency between the allocation bitmap and the block group summary statistics. So in order to track it down, we need to be able to isolate causes. For example, does the problem show up if you use the latest bleeding-edge kernel from mainline?

I've seen it 3 times so far. Once on boot; I ran fsck, and the error has not come back (so far). Once (on another, but similarly configured VM) the error came back almost immediately after being fixed by fsck; a second run of fsck and no problems so far. I'm not running any odd drivers in these VMs. The host of one of them has an Eicon DIVA card with its drivers; the host of the other doesn't. The strangest part of my setup is that I have an mdadm RAID 1 on the host system that is made available as a virtio block device to a guest (running under LVM). In the guest I have LVM2 volumes with ext3-formatted filesystems. Everything is now running this 3.16 Debian kernel. One filesystem was mounted with data=journal; the other is the root filesystem, mounted with the default data=ordered. Barriers are not explicitly disabled, so they should be enabled by default by now (yes?). The virtio block devices have cache=none. I'll try to schedule the installation of a more up-to-date kernel on at least one of the machines, but it may take me a few days.

Yes, barriers are enabled by default, and with cache=none the flush commands *should* be sent down to the host device. It might be worth doing a quick blktrace to make sure the host device is getting the cache flushes and they aren't getting dropped by qemu or by mdraid, but the configuration at least looks sane. In the cases where you've seen the problem, has there been any case where either (a) the host has suffered a crash, shutdown, or power drop, or (b) the guest VM has been abruptly shut down or killed without triggering a normal guest OS shutdown?
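A minimal sketch of the kind of check being suggested here, run on the host while the guest does some fsync-heavy work. The device path and trace duration are placeholders, not details from this report:

    # Trace the md device backing the guest for 60 seconds (requires the
    # blktrace package and CONFIG_BLK_DEV_IO_TRACE on the host kernel).
    blktrace -d /dev/md0 -w 60 -o - | blkparse -i - > /tmp/md0.trace
    # Then inspect the RWBS column of /tmp/md0.trace for flush requests;
    # they are usually marked with an "F", though the exact notation
    # depends on the blktrace/blkparse version.  Seeing no flushes at all
    # would suggest they are being dropped somewhere between the guest
    # and the md device.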
Could you give me a clue about what exactly I should do for a "quick blktrace", and what should I look for?

As for crashes and so on: case #1, I found my guest machine "cedric" dead on 3 Dec, root filesystem read-only. I ran fsck manually, and after reboot the log shows:

    Dec 3 11:04:26 cedric kernel: [ 1.440438] EXT4-fs (dm-0): mounting ext3 file system using the ext4 subsystem
    Dec 3 11:04:26 cedric kernel: [ 1.468114] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
    Dec 3 11:04:26 cedric kernel: [ 6.139705] EXT4-fs (dm-0): re-mounted. Opts: (null)
    Dec 3 11:04:26 cedric kernel: [ 6.311768] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
    Dec 3 11:09:11 cedric kernel: [ 302.048065] EXT4-fs (dm-0): error count since last fsck: 3
    Dec 3 11:09:11 cedric kernel: [ 302.048102] EXT4-fs (dm-0): initial error at time 1417556618: ext4_mb_release_inode_pa:3773
    Dec 3 11:09:11 cedric kernel: [ 302.048105] EXT4-fs (dm-0): last error at time 1417596801: ext4_remount:4853

"cedric" had been up since 26 Nov.

Case #2, a filesystem went read-only on my guest machine "olympic" while the cyrus imap "squatter" was indexing all the messages:

    Dec 11 17:17:06 olympic kernel: [957229.875900] EXT4-fs (dm-4): pa ffff88001f7ff980: logic 0, phys. 9176576, len 128
    Dec 11 17:17:06 olympic kernel: [957229.878640] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 280, free 128, pa_free 127
    Dec 11 17:17:06 olympic kernel: [957229.881350] Aborting journal on device dm-4-8.
    Dec 11 17:17:06 olympic kernel: [957229.891273] EXT4-fs (dm-4): Remounting filesystem read-only
    Dec 11 17:22:37 olympic kernel: [957561.369731] EXT4-fs (dm-4): pa ffff880009f7b028: logic 0, phys. 9192448, len 512
    Dec 11 17:22:37 olympic kernel: [957561.371926] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 280, free 512, pa_free 511
    Dec 11 17:24:45 olympic kernel: [957688.608554] EXT4-fs error (device dm-4): ext4_put_super:792: Couldn't clean up the journal

I ran fsck to fix the problem and re-mounted the filesystem; the problem re-occurred:

    Dec 11 17:34:19 olympic kernel: [958263.206923] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
    Dec 11 17:34:19 olympic kernel: [958263.253131] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
    Dec 11 17:35:27 olympic kernel: [958331.251355] EXT4-fs (dm-4): pa ffff88003d99aa50: logic 0, phys. 60800, len 32
    Dec 11 17:35:27 olympic kernel: [958331.254094] EXT4-fs error (device dm-4): ext4_mb_release_inode_pa:3773: group 1, free 32, pa_free 31
    Dec 11 17:35:27 olympic kernel: [958331.256878] Aborting journal on device dm-4-8.
    Dec 11 17:35:27 olympic kernel: [958331.266214] EXT4-fs (dm-4): Remounting filesystem read-only
    Dec 11 17:36:17 olympic kernel: [958380.640506] EXT4-fs error (device dm-4): ext4_put_super:792: Couldn't clean up the journal

I fixed it again with fsck, and now all seems well (so far):

    Dec 11 17:48:47 olympic kernel: [959131.397372] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
    Dec 11 17:48:47 olympic kernel: [959131.430854] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro

The system had been running since Nov 30.

OK, so vis-a-vis case #1, it looks like the fsck didn't take:

    Dec 3 11:04:26 cedric kernel: [ 1.440438] EXT4-fs (dm-0): mounting ext3 file system using the ext4 subsystem
    Dec 3 11:04:26 cedric kernel: [ 1.468114] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
    Dec 3 11:04:26 cedric kernel: [ 6.139705] EXT4-fs (dm-0): re-mounted. Opts: (null)
    Dec 3 11:04:26 cedric kernel: [ 6.311768] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
    Dec 3 11:09:11 cedric kernel: [ 302.048065] EXT4-fs (dm-0): error count since last fsck: 3
    Dec 3 11:09:11 cedric kernel: [ 302.048102] EXT4-fs (dm-0): initial error at time 1417556618: ext4_mb_release_inode_pa:3773
    Dec 3 11:09:11 cedric kernel: [ 302.048105] EXT4-fs (dm-0): last error at time 1417596801: ext4_remount:4853

I'm not exactly sure what time zone your system messages are in, but e2fsck 1.42.5 (I assume you're using e2fsprogs from Debian stable?) will clear the error stats, and time 1417596801 is:

    % TZ=GMT date -d @1417596801
    Wed Dec 3 08:53:21 GMT 2014

Either that, or maybe the RAID system is acting up and writes aren't going to both halves, or something weird like that. Can you do me a favor and configure a handful of VMs that ***don't*** use MD RAID, just so we can rule out funny business on that score?
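For reference, the error count and first/last error timestamps behind those "error count since last fsck" messages are stored in the superblock and can be inspected directly. A sketch, with an illustrative device path and assuming a reasonably recent e2fsprogs that prints these fields:

    # Print the superblock summary; if errors have been recorded, fields
    # such as "FS Error count", "First error time" and "Last error time"
    # appear in the output.
    dumpe2fs -h /dev/dm-0
    # Convert a raw epoch timestamp from the kernel log by hand:
    TZ=GMT date -d @1417556618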
I've seen this several times as well. All cases were after an upgrade to Debian Jessie with the 3.16.0 kernel running in KVM. The host kernel is unchanged, so I'm pretty sure the guest kernel is guilty here. The host does LVM + RAID; the guest just gets a virtio-based disk, so nothing complex on its side. The error in my case:

    [149757.256024] EXT4-fs (vda1): pa ffff88001c92bd90: logic 0, phys. 1944064, len 64
    [149757.323573] EXT4-fs error (device vda1): ext4_mb_release_inode_pa:3773: group 59, free 4, pa_free 2
    [149757.349776] Aborting journal on device vda1-8.
    [149757.386808] EXT4-fs (vda1): Remounting filesystem read-only
    [236229.600081] EXT4-fs (vda1): error count since last fsck: 1
    [236229.600086] EXT4-fs (vda1): initial error at time 1432088178: ext4_mb_release_inode_pa:3773
    [236229.600088] EXT4-fs (vda1): last error at time 1432088178: ext4_mb_release_inode_pa:3773

I'm not able to reproduce this, but I've already seen the error several times.

So the common factors between Michal's case and mine are: a Debian 3.16.0 kernel running in KVM, and a host kernel doing mdadm RAID (RAID 1 for me). Michal is running LVM on the host; I'm running it in the guest. Michal, my filesystems are ext3 (now being handled by the ext4 subsystem). Are yours ext3 or ext4?

In my case it is ext4. And I have LVM + RAID 1 on the host.

Why are you still using v3.16.0? The latest 3.16 kernel is v3.16.7. I believe the bug in question was fixed in a backport that first showed up in v3.16.2:

    commit 868be28e321b764da29592cfd9fd659434e823aa
    Author: Theodore Ts'o <tytso@mit.edu>
    Date:   Wed Jul 30 22:17:17 2014 -0400

        ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa struct

        commit 86f0afd463215fc3e58020493482faa4ac3a4d69 upstream.

        If there is a failure while allocating the preallocation structure, a
        number of blocks can end up getting marked in the in-memory buddy
        bitmap, and then not getting released.  This can result in the
        following corruption getting reported by the kernel:

        EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
        12793 clusters in bitmap, 12729 in gd

        In that case, we need to release the blocks using mb_free_blocks().

        Tested: fs smoke test; also demonstrated that with injected errors,
            the file system is no longer getting corrupted

        Google-Bug-Id: 16657874

        Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Ah, never mind; the "debian 3.16.0" kernel is actually based on 3.16.7-ckt9 --- at least, the latest version of it is. You'd have to specify exactly which version of the "debian 3.16.0" kernel you are running to know for sure. (But this is why many upstream kernel developers don't try to support distro kernels: it's hard to tell which commits might or might not be in a particular distro kernel.)
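For a mainline or stable git tree (this doesn't help with distro kernels, as noted above), one way to check whether a fix is included in a given release is to ask git directly. A sketch using the stable backport commit quoted above, run inside a clone of the linux-stable tree:

    # List the tags whose history contains the backported fix
    # (on the 3.16 branch this should include v3.16.2 and later).
    git tag --contains 868be28e321b764da29592cfd9fd659434e823aa | head
    # Or test one specific release:
    git merge-base --is-ancestor 868be28e321b764da29592cfd9fd659434e823aa v3.16.2 \
        && echo "fix is in v3.16.2"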
Sorry for being unclear: in my case it's 3.16.7-ckt9-3~deb8u1, the latest version available in Debian stable.

And for me it's:

    Linux olympic 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt7-1~bpo70+1 (2015-04-07) x86_64 GNU/Linux

the last-but-one backported version for oldstable. I can update to 3.16.7-ckt9... (what a horrid name), since I've seen this bug regularly:

    Jul 9 01:33:50 olympic kernel: [2914041.432613] Aborting journal on device dm-4-8.
    Jul 15 14:00:17 olympic kernel: [196797.659798] Aborting journal on device dm-2-8.
    Jun 23 11:16:37 olympic kernel: [1566607.798081] Aborting journal on device dm-4-8.

and a bunch of times on / where it didn't get logged, but I needed to reboot after fsck:

    Jul 13 07:20:31 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
    Jul 15 14:33:18 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
    Jul 19 15:36:11 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
    Jul 23 08:00:30 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.

(more or less at least once every 4 days). I've upgraded the kernel to stock 3.18.19. I'll report here if I see the problem again.
Ok, I've now seen the bug while running 3.18.19 (compiled by me straight from kernel git). It stayed up for way longer this time :-) but then...

    [ 0.000000] Initializing cgroup subsys cpuset
    [ 0.000000] Initializing cgroup subsys cpu
    [ 0.000000] Initializing cgroup subsys cpuacct
    [ 0.000000] Linux version 3.18.19-jh1 (john@olympic) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 SMP Thu Jul 23 14:54:25 CEST 2015
    [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.18.19-jh1 root=/dev/mapper/olympic-olympic--root ro console=tty0 console=ttyS0,19200n8 quiet
    ...
    [ 5.310778] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
    [ 7.917833] EXT4-fs (vda1): mounting ext3 file system using the ext4 subsystem
    [ 7.937645] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: errors=remount-ro
    [ 7.973622] EXT4-fs (dm-2): mounting ext3 file system using the ext4 subsystem
    [ 8.020501] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: errors=remount-ro
    [ 8.051234] EXT4-fs (dm-3): mounting ext3 file system using the ext4 subsystem
    [ 8.084443] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: errors=remount-ro
    [ 8.110174] EXT4-fs: Warning: mounting with data=journal disables delayed allocation and O_DIRECT support!
    [ 8.110180] EXT4-fs (dm-4): mounting ext3 file system using the ext4 subsystem
    [ 8.163955] EXT4-fs (dm-4): mounted filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
    ...
    [1570311.174406] EXT4-fs (dm-2): pa ffff8800281ac980: logic 0, phys. 1351168, len 512
    [1570311.176802] EXT4-fs error (device dm-2): ext4_mb_release_inode_pa:3773: group 41, free 431, pa_free 430
    [1570311.179413] Aborting journal on device dm-2-8.
    [1570311.199091] EXT4-fs (dm-2): Remounting filesystem read-only

On reboot the system complained about dm-2 being bad:

    [ 9.196972] EXT4-fs (dm-2): mounting ext3 file system using the ext4 subsystem
    [ 10.134032] EXT4-fs warning (device dm-2): ext4_clear_journal_err:4659: Filesystem error recorded from previous mount: IO failure
    [ 10.136694] EXT4-fs warning (device dm-2): ext4_clear_journal_err:4660: Marking fs in need of filesystem check.
    [ 10.158051] EXT4-fs (dm-2): warning: mounting fs with errors, running e2fsck is recommended
    [ 10.171173] EXT4-fs (dm-2): Errors on filesystem, clearing orphan list.
    [ 10.173578] EXT4-fs (dm-2): recovery complete
    [ 10.188013] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: errors=remount-ro

So I unmounted it and did a fsck:

    Script started on Tue Aug 11 16:54:56 2015
    root@olympic:~# fsck -y -f /dev/mapper/olympic-olympic--home
    fsck from util-linux 2.20.1
    e2fsck 1.42.5 (29-Jul-2012)
    Pass 1: Checking inodes, blocks, and sizes
    Inodes that were part of a corrupted orphan linked list found. Fix? yes
    Inode 3827814 was part of the orphaned inode list. FIXED.
    Deleted inode 3828431 has zero dtime. Fix? yes
    Inode 4137318 was part of the orphaned inode list. FIXED.
    Inode 4137485 was part of the orphaned inode list. FIXED.
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Block bitmap differences: -(723856--723871) -(738144--738172) +1351589 -7897632 -(7926715--7926730) -(11616272--11616287) -(11627328--11627356) -18842491 -(18846824--18846839) -29065743 -(29093767--29093782) -32965419 -(32983830--32983845) -(36864912--36864919) -(36965200--36965204) Fix? yes
    Free blocks count wrong for group #22 (27818, counted=27863). Fix? yes
    Free blocks count wrong for group #241 (25273, counted=25290). Fix? yes
    Free blocks count wrong for group #354 (23179, counted=23224). Fix? yes
    Free blocks count wrong for group #575 (25214, counted=25231). Fix? yes
    Free blocks count wrong for group #887 (24494, counted=24511). Fix? yes
    Free blocks count wrong for group #1006 (21249, counted=21266). Fix? yes
    Free blocks count wrong for group #1125 (25845, counted=25853). Fix? yes
    Free blocks count wrong for group #1128 (28938, counted=28943). Fix? yes
    Free blocks count wrong (22597710, counted=22597881). Fix? yes
    Inode bitmap differences: -3827814 -3828431 -4137318 -4137485 Fix? yes
    Free inodes count wrong for group #467 (5325, counted=5327). Fix? yes
    Free inodes count wrong for group #505 (6985, counted=6987). Fix? yes
    Free inodes count wrong (9975485, counted=9975489). Fix? yes
    olympic-home: ***** FILE SYSTEM WAS MODIFIED *****
    olympic-home: 510271/10485760 files (13.1% non-contiguous), 19345159/41943040 blocks
    root@olympic:~# exit
    Script done on Tue Aug 11 17:01:11 2015

John, are you willing to give us a lot more details about your setup? Hardware, software, details of the host OS, details of the guest OS, the workload under which the problem is happening, etc.? And it might be helpful if you opened a separate bug which we can use exclusively for tracking your particular issue. One of the things which is often disastrous, and why I've largely given up on Ubuntu's Launchpad, is when multiple people assume that the problem they are seeing is the same as everyone else's, when in fact they could be separate bugs, or one of the bugs could be a hardware problem, etc. Then, when one person doesn't give details about their hardware because they saw someone else already had, it makes finding the problems even harder.

One thing to keep in mind when we are talking about problems found in the guest OS is that misconfigurations at the host OS level can be indistinguishable from hardware problems. For example, depending on the caching policy used by KVM on the host, it may be that cache flush commands issued in the guest OS don't percolate all the way down to the actual hardware platters.
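As a concrete illustration of checking that caching policy from the host side, one could look at the cache= attribute of the guest's disk definition; cache=unsafe is the mode that drops guest flushes entirely, whereas cache=none (as used here) is supposed to pass them through. A sketch, assuming a libvirt-managed guest named "olympic" (the name is a placeholder):

    # Show the disk <driver> elements for the guest; the cache= attribute
    # is the policy being discussed.
    virsh dumpxml olympic | grep "<driver"
    # For a hand-started qemu, the same information is on the command line:
    ps -o args= -C qemu-system-x86_64 | tr ' ,' '\n\n' | grep '^cache='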
Thus, after a power-fail event, even if everything is configured correctly in the guest OS, the result can be an inconsistent file system image because KVM / the host OS didn't make sure everything got pushed to the disk after a journal commit. So knowing whether there were any shutdowns caused by KVM crashing, or the host OS crashing, or the guest OS crashing, is all very important in trying to debug things, because the error message reported simply means that an inconsistency was detected in the allocation bitmaps. This can be caused by any number of things --- which is why doctors hate it when someone asks them what's wrong because "I have a cough" at a social occasion, and then someone else chimes in, "well, my grandfather had a cough right before he was diagnosed with terminal lung cancer"...

Hi John, me too! :p We're having similar symptoms: VMs running Wheezy (kernel version 3.2), SL6 (kernel version 2.6), and Jessie (kernel version 3.16). Only the Jessie VMs go read-only sporadically, even when running on the same host. I'm recording which VM host this happens on, hoping to find a pattern, but so far no luck. Also, it is not easily reproducible. E.g. for me (maybe not the same bug!) the hosts have been running both software RAID and hardware RAID cards. Not all the VMs running 3.16 have had this problem, but some have had it multiple times. If I hit on a reliable failure I will start git bisecting between 3.2 and 3.16. (A brute-force approach, but maybe helpful.) Chad.

(In reply to Theodore Tso from comment #17)
> John, are you willing to give us a lot more details about your setup?
> Hardware, software, details of the host OS, details of the guest OS, the
> workload under which the problem is happening, etc.?
>
> And it might be helpful if you opened a separate bug which we can use
> exclusively for tracking your particular issue.

Ok, I've opened https://bugzilla.kernel.org/show_bug.cgi?id=102731 -- please let me know if you need more info.

> One thing to keep in mind when we are talking about problems found in the
> guest OS is that misconfigurations at the host OS level can be
> indistinguishable from hardware problems. For example, depending on the
> caching policy used by KVM on the host, it may be that cache flush commands
> issued in the guest OS don't percolate all the way down to the actual
> hardware platters. Thus, after a power-fail event, even if everything is
> configured correctly in the guest OS, the result can be an inconsistent
> file system image because KVM / the host OS didn't make sure everything
> got pushed to the disk after a journal commit.

I have had one power fail, but since then mdadm has run a consistency check and found no inconsistencies. I'm also running regular (once a week) fscks on (snapshots of) my filesystems. So far those fscks have never found a problem.
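A routine check along those lines might look something like the following; the volume group, logical volume names, and snapshot size are illustrative placeholders, not taken from this report. Because e2fsck -n never writes, it is safe to run against the snapshot while the real filesystem stays mounted:

    # Snapshot the logical volume, check the snapshot read-only, then drop it.
    lvcreate --snapshot --size 2G --name home-snap /dev/olympic/olympic-home
    e2fsck -f -n /dev/olympic/home-snap
    lvremove -f /dev/olympic/home-snap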
This is on an HP ProLiant DL380 G7 with an HP Smart Array P410 controller and 8 disks configured as RAID-6. S.M.A.R.T. reports the disks as healthy. The server had been running for roughly ten months before this issue occurred on the 1st of this month. fsck did some repairs. Two days later, on the 3rd, the issue occurred again.

What is the meaning of the suffix -8 as reported by https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/tree/fs/jbd2/journal.c#n2422 for j_devname? Regardless of the actual device, all bug reports appear to have that -8 in common. Examples:

    "Aborting journal on device sda3-8." https://askubuntu.com/questions/595024/suddenly-read-only-file-system-ext4
    "Aborting journal on device dm-2-8." https://bugzilla.kernel.org/show_bug.cgi?id=102731
    "Aborting journal on device sda2:8." https://bbs.archlinux.org/viewtopic.php?id=70077
    "Aborting journal on device vda1-8." https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672
    "Aborting journal on device dm-1-8." https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672/comments/26
    "Aborting journal on device dm-9-8." https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672/comments/28

"This" in my last comment was referring to the following output, which I had been omitting unintentionally:

    kvm [12151]: vcpu0, guest rIP: 0xffffffff010410a8 disabled perfctr wrmsr: 0xc1 data 0xabcd
    kvm [14901]: vcpu0, guest rIP: 0xffffffff010410a8 disabled perfctr wrmsr: 0xc1 data 0xabcd
    blk_update_request: critical medium error, dev sda, sector 1736391592 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
    Aborting journal on device sda4-8.
    EXT4-fs error (device sda4): ext4_journal_check_start:83: comm worker: Detected aborted journal
    EXT4-fs (sda4): Remounting filesystem read-only

The first two lines may not be related, but at least some (all?) of the bug reports on this topic mention the machine in question being used as a hypervisor.
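Regarding the "-8" suffix asked about above: if I read the jbd2 code correctly, when a filesystem uses an internal journal the journal inode's number is appended to the device name while building j_devname, and ext3/ext4 keep the internal journal in reserved inode 8 -- so "dm-2-8" just means "the journal (inode 8) on dm-2", and the common "-8" does not indicate a shared cause across these reports. A way to peek at that inode, with an illustrative device path:

    # Show the journal inode (reserved inode 8) on an ext3/ext4 device;
    # debugfs opens the filesystem read-only by default.
    debugfs -R 'stat <8>' /dev/mapper/olympic-olympic--home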