Bug 102731
| Summary: | I have a cough. | | |
|---|---|---|---|
| Product: | File System | Reporter: | John Hughes (john) |
| Component: | ext4 | Assignee: | fs_ext4 (fs_ext4) |
| Status: | NEW --- | | |
| Severity: | normal | CC: | andre.arnold, cwseys, k.kotlenga, pmhahn, szg00000, tytso, vaclav.ovsik |
| Priority: | P1 | | |
| Hardware: | x86-64 | | |
| OS: | Linux | | |
| Kernel Version: | 3.16 | Subsystem: | |
| Regression: | Yes | Bisected commit-id: | |
| Attachments: | config for kernel 3.18.19; fsck output after reboot; attachment-20424-0.html | | |
Description
John Hughes
2015-08-12 08:47:05 UTC
The hardware: A couple of Dell PowerEdge 840's (Xeon 3050, 4G). disks SATA internal and SCSI in external cabinets Host OS Debian 7.7 kernel 3.16, version 3.16.5-1~bpo70+1 Disks assembled into RAID1 arrays then passed to kvm guest as virtio. Guest OS Debian 7.7, kernel either as host, or now 3.18.19 Guest uses LVM2 All filesystems on guest are ext3, now handled by ext4 subsystem. host uptime: 10:59:22 up 24 days, 14:21, 2 users, load average: 0.39, 0.51, 0.90 (Last reboot caused by power cut) PCI config on host: 00:00.0 Host bridge: Intel Corporation E7230/3000/3010 Memory Controller Hub 00:01.0 PCI bridge: Intel Corporation E7230/3000/3010 PCI Express Root Port 00:1c.0 PCI bridge: Intel Corporation NM10/ICH7 Family PCI Express Port 1 (rev 01) 00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01) 00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01) 00:1d.0 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #1 (rev 01) 00:1d.1 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #2 (rev 01) 00:1d.2 USB controller: Intel Corporation NM10/ICH7 Family USB UHCI Controller #3 (rev 01) 00:1d.7 USB controller: Intel Corporation NM10/ICH7 Family USB2 EHCI Controller (rev 01) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) 00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01) 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) 00:1f.3 SMBus: Intel Corporation NM10/ICH7 Family SMBus Controller (rev 01) 01:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09) 01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09) 02:08.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01) 03:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI Bridge A (rev 09) 04:01.0 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01) 04:01.1 SCSI storage controller: Adaptec AHA-3960D / AIC-7899A U160/m (rev 01) 04:02.0 Ethernet controller: Intel Corporation 82543GC Gigabit Ethernet Controller (Copper) (rev 02) 05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5721 Gigabit Ethernet PCI Express (rev 11) 07:00.0 Ethernet controller: Altima (nee Broadcom) AC9100 Gigabit Ethernet (rev 15) 07:05.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI ES1000 (rev 02) Disk configuration on host: [0:0:0:0] cd/dvd LITE-ON CD-ROM LTN-4891S NDS3 /dev/sr0 [2:0:0:0] disk HITACHI HUS151473VL3800 S110 /dev/sda [2:0:1:0] disk FUJITSU MAP3735NC 5605 /dev/sdb [2:0:2:0] disk FUJITSU MAP3735NC 5605 /dev/sdc [2:0:3:0] disk FUJITSU MAP3735NC 5605 /dev/sdd [2:0:8:0] disk QUANTUM ATLAS10K2-TY367J DA40 /dev/sde [2:0:10:0] disk QUANTUM ATLAS10K2-TY367J DA40 /dev/sdf [3:0:0:0] disk IBM IC35L036UCDY10-0 S27E /dev/sdg [3:0:1:0] disk QUANTUM ATLAS10K2-TY367J DA40 /dev/sdh [3:0:2:0] disk FUJITSU MAP3735NC 5608 /dev/sdi [3:0:3:0] disk FUJITSU MAP3735NC 5608 /dev/sdj [3:0:4:0] disk FUJITSU MAP3735NC 5608 /dev/sdk [3:0:5:0] disk FUJITSU MAP3735NC 5608 /dev/sdl [3:0:9:0] disk IBM DMVS18M 0220 /dev/sdm [3:0:10:0] disk IBM DMVS18D 02B0 /dev/sdn [3:0:11:0] disk IBM IC35L036UCDY10-0 S27F /dev/sdo [3:0:12:0] disk SEAGATE ST318305LC 2203 /dev/sdp [3:0:13:0] disk FUJITSU MAX3073NC 0104 /dev/sdq [3:0:15:0] process Dell 12 BAY U2W CU 0209 - [4:0:0:0] disk ATA ST3808110AS J /dev/sds [4:0:1:0] disk ATA ST3808110AS J /dev/sdr 
[4:0:2:0] disk ATA ST3808110AS J /dev/sdt [4:0:3:0] disk ATA ST3808110AS J /dev/sdu mdadm config on host: Personalities : [raid1] md122 : active raid1 sda1[2] sdj1[3] 71680902 blocks super 1.2 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md123 : active raid1 sdb1[0] sdk1[1] 71616384 blocks super 1.2 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md124 : active raid1 sdg1[3] sdf1[2] 35549071 blocks super 1.2 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md125 : active raid1 sdh1[2] sde1[3] 35549071 blocks super 1.2 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md126 : active raid1 sdc1[0] sdi1[2] 71680902 blocks super 1.2 [2/2] [UU] bitmap: 1/1 pages [4KB], 65536KB chunk md127 : active raid1 sdd[0] sdl[1] 71621696 blocks super 1.2 [2/2] [UU] md2 : active (auto-read-only) raid1 sdt1[2] sdu1[1] 78121912 blocks super 1.2 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md1 : active raid1 sds2[0] sdr2[1] 77623224 blocks super 1.2 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk md0 : active raid1 sds1[0] sdr1[1] 498676 blocks super 1.2 [2/2] [UU] bitmap: 0/1 pages [0KB], 65536KB chunk Last dmesg messages on host: [1225118.952197] md: data-check of RAID array md0 [1225118.952291] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225118.952384] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225118.952528] md: using 128k window, over a total of 498676k. [1225119.166383] md: delaying data-check of md1 until md0 has finished (they share one or more physical units) [1225119.178563] md: data-check of RAID array md122 [1225119.178644] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225119.178735] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225119.178877] md: using 128k window, over a total of 71680902k. [1225119.192249] md: data-check of RAID array md123 [1225119.192331] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225119.192422] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225119.192564] md: using 128k window, over a total of 71616384k. [1225119.205378] md: data-check of RAID array md124 [1225119.205461] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225119.210851] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225119.216548] md: using 128k window, over a total of 35549071k. [1225119.230698] md: data-check of RAID array md125 [1225119.236498] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225119.242171] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225119.247721] md: using 128k window, over a total of 35549071k. [1225119.265408] md: data-check of RAID array md126 [1225119.270962] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225119.276552] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225119.282158] md: using 128k window, over a total of 71680902k. [1225119.296781] md: data-check of RAID array md127 [1225119.302834] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [1225119.309046] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225119.315202] md: using 128k window, over a total of 71621696k. [1225127.248170] md: md0: data-check done. [1225127.284640] md: data-check of RAID array md1 [1225127.290534] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. 
[1225127.296453] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check. [1225127.302521] md: using 128k window, over a total of 77623224k. [1226450.960973] md: md1: data-check done. [1226756.371905] md: md125: data-check done. [1227131.051785] md: md127: data-check done. [1227141.431898] md: md124: data-check done. [1227808.276946] md: md126: data-check done. [1227847.417636] md: md123: data-check done. [1227978.744149] md: md122: data-check done. Configuration of guest: <domain type='kvm' id='4'> <name>olympic</name> <uuid>b42a5958-4258-1e0b-cbd0-05a0768c250b</uuid> <memory unit='KiB'>2097152</memory> <currentMemory unit='KiB'>1048576</currentMemory> <vcpu placement='static'>1</vcpu> <os> <type arch='x86_64' machine='pc-1.1'>hvm</type> <boot dev='hd'/> </os> <clock offset='utc' adjustment='reset'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/bin/kvm</emulator> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:70a'/> <target dev='vda' bus='virtio'/> <alias name='virtio-disk0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:70b'/> <target dev='vdb' bus='virtio'/> <alias name='virtio-disk1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:34c'/> <target dev='vdc' bus='virtio'/> <alias name='virtio-disk2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:34d'/> <target dev='vdd' bus='virtio'/> <alias name='virtio-disk3'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:70e'/> <target dev='vde' bus='virtio'/> <alias name='virtio-disk4'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:70f'/> <target dev='vdf' bus='virtio'/> <alias name='virtio-disk5'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </disk> <controller type='usb' index='0'> <alias name='usb0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> </controller> <interface type='bridge'> <mac address='00:16:3e:4b:86:33'/> <source bridge='calvaedi'/> <target dev='vnet0'/> <model type='virtio'/> <alias name='net0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface> <serial type='pty'> <source path='/dev/pts/0'/> <target port='0'/> <alias name='serial0'/> </serial> <console type='pty' tty='/dev/pts/0'> <source path='/dev/pts/0'/> <target type='serial' port='0'/> <alias name='serial0'/> </console> <memballoon model='virtio'> <alias name='balloon0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/> </memballoon> </devices> <seclabel type='none'/> </domain> What is the guest doing? It's a NFSv4 server for my home directories. 
It's my email server (sendmail and cyrus imap). I've seen the problem (bug 89621) on the root filesystem, the home filesystem and the imap filesystem. Most often on the root, a little less frequently on the homedir filesystem.

Can you upload your kernel .config file so I can try to replicate your environment as closely as possible? -- Ted

Created attachment 184741 [details]
config for kernel 3.18.19
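For reference, the running guest's kernel configuration can usually be recovered with one of the following; a minimal sketch, where the output filename is illustrative:

    # copy the distribution config for the running kernel
    cp /boot/config-$(uname -r) config-3.18.19
    # or, if the kernel was built with CONFIG_IKCONFIG_PROC enabled:
    zcat /proc/config.gz > config-3.18.19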
Well, it's just happened again.

[1728664.535798] EXT4-fs (dm-2): pa ffff88001ce2d368: logic 512, phys. 19774464, len 512
[1728664.538530] EXT4-fs error (device dm-2): ext4_mb_release_inode_pa:3773: group 603, free 449, pa_free 448
[1728664.541378] Aborting journal on device dm-2-8.
[1728664.549493] EXT4-fs (dm-2): Remounting filesystem read-only
[1728664.551281] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:4775: Journal has aborted
[1728664.590678] EXT4-fs error (device dm-2): ext4_journal_check_start:56: Detected aborted journal
[1728664.599749] EXT4-fs error (device dm-2) in ext4_orphan_del:2688: Journal has aborted
[1728664.612118] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:4775: Journal has aborted

I'll attach the fsck output.

Created attachment 186381 [details]
fsck output after reboot.
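Once the journal has aborted and the filesystem is remounted read-only, the usual recovery is an offline forced check. A minimal sketch, assuming the affected dm-2 device is the olympic-home volume identified later in this report (unmount it, or boot to rescue media, first):

    umount /home
    e2fsck -f /dev/olympic/olympic-home   # forced full check; review and accept the bitmap fixes
    mount /home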
How many guest VMs are you running, and how often does the file system corruption happen? It sounds like it's about once every 2-3 weeks?

Would you be willing to experiment with running the VM without using the RAID1 so we're bypassing the MD layer? One of the common factors among the people who are reporting this is that (a) they are using KVM, and (b) they are using RAID. Now, I'm doing all of my testing using KVM, and I've never seen anything like this. So I wonder if the operative issue is happening at the MD layer.... Your workload is common enough that I would expect that if this was happening for everyone who was using ext4, I should be getting a lot more bug reports.

> How many guest VMs are you running

One at the moment.

> how often does the file system corruption happen? It sounds like it's
> about once every 2-3 weeks?

With the Debian 3.16 kernel it was happening about once every 4 days. I installed my 3.18.19 kernel on 24/7/2015 and have seen the bug twice: 11/8/2015 and 31/8/2015.

> Would you be willing to experiment with running the VM without using the RAID1 so
> we're bypassing the MD layer?

I guess I'll have to try that. It's going to be a monumental pain in the arse: the guest's /home is a 160GiB LVM volume striped across 4 mdadm raid1's. What I can try first is just disabling one side of the mdadm mirrors; if the problem re-occurs I can get rid of mdadm completely. (This will be the first time I've run one of my servers without disk mirroring since 1986. Ugh.)

If you're getting failures faster in kernel 3.16 you might also get failures faster by backtracking 3.15.x, 3.14.x, etc. towards 3.2 until you have to wait long enough to be confident you've passed the bug. Also, less surgery on Linux raid.

Chad.

On 01/09/15 16:43, bugzilla-daemon@bugzilla.kernel.org wrote:
> If you're getting failures faster in kernel 3.16 you might also get failures
> faster by backtracking
> 3.15.x
> 3.14.x
> etc towards 3.2

Well, since a similar bug is known to be present before 3.16.something that doesn't seem like a good idea. But you're right in one thing: since the bug happens faster with the Debian 3.16 than with my 3.18.19 I might try running that one without the raid.

Ok, as of a few minutes ago I am running the Debian 3.16 kernel again, with all the mdadm devices unmirrored. More news if I see the problem again.

So it's been 12 days, and previously when you were using the Debian 3.16 kernel, it was triggering once every four days, right? Can I assume that your silence indicates that you haven't seen a problem to date?

If so, then it really does seem that it might be an interaction between LVM/MD and KVM.

So if that's the case, then the next thing to ask is to try to figure out what might be the triggering cause. A couple of things come to mind:

1) Some failure to properly handle a flush cache command being sent to the MD device. This, combined with either a power failure or a crash of the guest OS (depending on how KVM is configured), might explain a block update getting lost. The fact that the block bitmap is out of sync with the block group descriptor is consistent with this failure. However, if you were seeing failures once every four days, that would imply that the guest OS and/or host OS would be crashing at or about that level of frequency, and you haven't reported that.

2) Some kind of race between a 4k write and a RAID1 resync leading to a block write getting lost.
Again, this reported data corruption is consistent with this theory --- but this also requires the guest OS crashing due to some kind of kernel crash or KVM/qemu shutdown and/or host OS crash / power failure, as in (1) above. If you weren't seeing these failures once every four days or so, then this isn't a likely explanation. 3) Some kind of corruption caused by the TRIM command being sent to the RAID/MD device, possibly racing with a block bitmap update. This could be caused either by the file system being mounted with the -o discard mount option, or by fstrim getting run out of cron, or by e2fsck explicitly being asked to discard unused blocks (with the "-E discard" option). 4) Some kind of bug which happens rarely either in qemu, the host kernel or the guest kernel depending on how it communicates with the virtual disk. (i.e., virtio, scsi, ide, etc.) Virtio is the most likely use case, and so trying to change to use scsi emulation might be interesting. (OTOH, if the problem is specific to the MD layer, then this possibility is less likely.) So as far as #3 is concerned, can you check to see if you had fstrim enabled, or are mounting the file system with -o discard? On 28/09/15 19:06, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=102731 > > --- Comment #13 from Theodore Tso <tytso@mit.edu> --- > So it's been 12 days, and previously when you were using the Debian 3.16 > kernel, it was triggering once every four days, right? Can I assume that > your > silence indicates that you haven't seen a problem to date? I haven't seen the problem, but unfortunately I'm running 3.18.19 at the moment (I screwed up on the last boot and let it boot the default kernel). I haven't had time to reboot. So I'd like to give it a bit more time. > > If so, then it really does seen that it might be an interaction between > LVM/MD > and KVM. > > So if that's the case, then the next thing to ask is to try to figure out > what > might be the triggering cause. A couple of things come to mind: > > 1) Some failure to properly handle a flush cache command being sent to the MD > device. This combined to either a power failure or a crash of the guest OS > (depending on how KVM is configured), might explain a block update getting > lost. The fact that the block bitmap is out of sync with the block group > descriptor is consistent with this failure. However, if you were seeing > failures once every four days, that would imply that the guest OS and/or host > OS would be crashing at that or about that level of frequency, and you > haven't > reported that. I haven't had any host or guest crashes. > > 2) Some kind a race between a 4k write and a RAID1 resync leading to a block > write getting lost. Again, this reported data corruption is consistent with > this theory --- but this also requires the guest OS crashing due to some kind > of kernel crash or KVM/qemu shutdown and/or host OS crash / power failure, as > in (1) above. If you weren't seeing these failures once every four days or > so, > then this isn't a likely explanation. No crashes. > > 3) Some kind of corruption caused by the TRIM command being sent to the > RAID/MD device, possibly racing with a block bitmap update. This could be > caused either by the file system being mounted with the -o discard mount > option, or by fstrim getting run out of cron, or by e2fsck explicitly being > asked to discard unused blocks (with the "-E discard" option). 
I'm not using "-o discard", or fstrim, I've never used the "-E discard" option to fsck. > > 4) Some kind of bug which happens rarely either in qemu, the host kernel or > the guest kernel depending on how it communicates with the virtual disk. > (i.e., virtio, scsi, ide, etc.) Virtio is the most likely use case, and so > trying to change to use scsi emulation might be interesting. (OTOH, if the > problem is specific to the MD layer, then this possibility is less likely.) > > So as far as #3 is concerned, can you check to see if you had fstrim enabled, > or are mounting the file system with -o discard? > I'm a bit overwhelmed with work at the moment so I haven't had time to read this message with the care it deserves, I'll get back to you with more detail next week. On 28/09/15 19:06, bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=102731 > > --- Comment #13 from Theodore Tso <tytso@mit.edu> --- > So it's been 12 days, and previously when you were using the Debian 3.16 > kernel, it was triggering once every four days, right? Can I assume that > your > silence indicates that you haven't seen a problem to date? As I said I was, due to a silly mistake (I rebooted the system and forgot that the default kernel was still the 3.18.19 one) not running the Debian kernel. And, after 19 days the problem has shown up again: Oct 7 17:34:24 olympic kernel: [1657429.788105] EXT4-fs (dm-2): pa ffff880004211a50: logic 512, phys. 10298368, len 512 Oct 7 17:34:24 olympic kernel: [1657429.790412] EXT4-fs error (device dm-2): ext4_mb_release_inode_pa:3773: group 314, free 497, pa_free 495 Oct 7 17:34:24 olympic kernel: [1657429.793168] Aborting journal on device dm-2-8. Oct 7 17:34:24 olympic kernel: [1657429.795367] EXT4-fs (dm-2): Remounting filesystem read-only All the filesystem where it happened, "dm-2", is a LVM volume: # lvdisplay -m /dev/olympic/olympic-home --- Logical volume --- LV Path /dev/olympic/olympic-home LV Name olympic-home VG Name olympic LV UUID drA6nQ-zbcu-SDLc-UeXH-foML-fPcB-ve1HQf LV Write Access read/write LV Creation host, time , LV Status available # open 1 LV Size 160.00 GiB Current LE 40960 Segments 3 Allocation inherit Read ahead sectors auto - currently set to 512 Block device 253:2 --- Segments --- Logical extent 0 to 34999: Type striped Stripes 2 Stripe size 64.00 KiB Stripe 0: Physical volume /dev/vdf Physical extents 0 to 17499 Stripe 1: Physical volume /dev/vdb Physical extents 0 to 17499 Logical extent 35000 to 40861: Type striped Stripes 2 Stripe size 64.00 KiB Stripe 0: Physical volume /dev/vdd Physical extents 4339 to 7269 Stripe 1: Physical volume /dev/vdc Physical extents 4388 to 7318 Logical extent 40862 to 40959: Type striped Stripes 2 Stripe size 64.00 KiB Stripe 0: Physical volume /dev/vdd Physical extents 7270 to 7318 Stripe 1: Physical volume /dev/vdc Physical extents 4339 to 4387 The devices, vdf, vdb, vdd, vdc are all unmirrored: <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:70b'/> <target dev='vdb' bus='virtio'/> <alias name='virtio-disk1'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> </disk> <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:34c'/> <target dev='vdc' bus='virtio'/> <alias name='virtio-disk2'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> </disk> <disk type='block' device='disk'> <driver 
name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:34d'/> <target dev='vdd' bus='virtio'/> <alias name='virtio-disk3'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> </disk> ... <disk type='block' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source dev='/dev/disk/by-id/md-name-olympic:70f'/> <target dev='vdf' bus='virtio'/> <alias name='virtio-disk5'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/> </disk> lrwxrwxrwx 1 root root 11 Oct 7 17:48 /dev/disk/by-id/md-name-olympic:34c -> ../../md125 lrwxrwxrwx 1 root root 11 Oct 7 17:48 /dev/disk/by-id/md-name-olympic:34d -> ../../md124 lrwxrwxrwx 1 root root 11 Oct 7 17:48 /dev/disk/by-id/md-name-olympic:70b -> ../../md126 lrwxrwxrwx 1 root root 11 Oct 7 17:48 /dev/disk/by-id/md-name-olympic:70f -> ../../md122 md122 : active raid1 sda1[2] sdj1[3](F) 71680902 blocks super 1.2 [2/1] [U_] -- md124 : active raid1 sdg1[3] sdf1[2](F) 35549071 blocks super 1.2 [2/1] [U_] -- md125 : active raid1 sdh1[2] sde1[3](F) 35549071 blocks super 1.2 [2/1] [U_] -- md126 : active raid1 sdc1[0] sdi1[2](F) 71680902 blocks super 1.2 [2/1] [U_] Minor detail: The ext4 decided the filesystem was corrupt on 7/10, but on 4/10 I'd run an online check on it and it was OK. (took a LVM snapshot, fsck'd that). Oct 4 02:44:27 olympic online-check: .IN: Online fileystem check /dev/olympic/olympic-home Oct 4 02:52:00 olympic online-check: .NO: Volume /dev/olympic/olympic-home checked OK Oct 4 02:52:01 olympic online-check: .IN: Snapshot /dev/olympic/olympic-home-snap used 0.10% I just realized something which I forgot to ask you --- what file system are you using on the *host*? You said that on the guest you are using ext3 file systems with the ext4 driver --- but what about on the host OS side? I will say that using a full ext4 file system is far more stable on 3.18.21 than using an ext3 file system in compatibility mode. Ext4 with the 4.2 kernel will support ext3 with zero unexpected test failures, but there are a handful of test failures that are showing up with the 3.18.21 kernel. I've backported a few bug fixes, but they haven't shown up in the stable kernel series yet, and there are still a half-dozen test failures that I haven't had time to characterized yet. (Note: support for long-term stable kernel is something I do as a very low priority background task. My priority is upstream regressions and upstream development, and ideally someone would help with testing the stable kernels and identifying patches that require manual backports, but I haven't found a sucker yet.) One of the reasons why I ask is that the PUNCH hole functionality was relatively new in the 3.18 kernel, and KVM uses it --- and I am suspicious that there might be some bug fixes that didn't land in the 3.18 kernel. So one thing that might be worth trying is to get a 4.2.3 kernel for both your Host and Guest kernels, and see what that does for you. > I just realized something which I forgot to ask you --- what file system are > you using on the *host*? You said that on the guest you are using ext3 file > systems with the ext4 driver --- but what about on the host OS side? The volumes passed to the guest are just mdadm volumes, no fs from the point of view of the host. The host filesystems are all ext3 (on lvm, on mdadm) , the host runs the debian 3.16 backport kernel, but it does very little filesystem I/O. 
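The "online check" John describes above -- snapshotting the live LV and running fsck on the snapshot -- might look roughly like the following. This is only a sketch; the snapshot name and size are illustrative:

    lvcreate -s -L 4G -n olympic-home-snap /dev/olympic/olympic-home   # snapshot of the mounted volume
    e2fsck -fn /dev/olympic/olympic-home-snap                          # read-only forced check of the snapshot
    lvremove -f /dev/olympic/olympic-home-snap                         # discard the snapshot afterwards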
> I will say that using a full ext4 file system is far more stable on 3.18.21
> than using an ext3 file system in compatibility mode.

What's the magic command to convert an ext3 fs to ext4? The wisdom of the net seems to say:

tune2fs -O extents,uninit_bg,dir_index /dev/xxx
fsck.ext4 -yfD /dev/xxx

Or would it be better to make a new FS and copy the files? (A sketch of the in-place conversion appears after Chad's reply below.)

> One of the reasons why I ask is that the PUNCH hole functionality was
> relatively new in the 3.18 kernel, and KVM uses it --- and I am suspicious that
> there might be some bug fixes that didn't land in the 3.18 kernel.

Was it used in 3.16? I first saw the problem with Debian's 3.16-based kernel.

> So one thing that might be worth trying is to get a 4.2.3 kernel for both your Host
> and Guest kernels, and see what that does for you.

Ok, I'll investigate that.

Might be interesting to convert the filesystem to XFS via a series of partition adds / copies / removes. At least then the ext4 connection could be more firmly established. (Actually, might want to "convert" to ext4 instead of XFS first and verify that the problem does not go away.)

Chad.

So yesterday it happened again:

Oct 14 18:08:48 olympic kernel: [605450.518092] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:757: group 434, block bitmap and bg descriptor inconsistent: 1834 vs 1833 free clusters
Oct 14 18:08:48 olympic kernel: [605450.522151] Aborting journal on device dm-2-8.
Oct 14 18:08:48 olympic kernel: [605450.524218] EXT4-fs (dm-2): Remounting filesystem read-only
Oct 14 18:08:48 olympic kernel: [605450.572244] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:4775: Journal has aborted
Oct 14 18:08:48 olympic kernel: [605450.577125] EXT4-fs error (device dm-2) in ext4_orphan_del:2688: Journal has aborted
Oct 14 18:08:48 olympic kernel: [605450.583858] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:4775: Journal has aborted

(In reply to cwseys from comment #19)
> Might be interesting to convert the filesystem to XFS via a series of
> partition adds / copies / removes. At least then the ext4 connection
> could be more firmly established.

Well, since I've demirrored the system (ugh) I have enough space to make copies of my filesystems.

> (Actually, might want to "convert" to ext4 instead of XFS first and verify
> that the problem does not go away.)

I'm planning to move to ext4 over the weekend. I've not yet decided whether to do it in-place or by copying the data to a new filesystem. Will doing an in-place conversion be enough?

But on reflection maybe it's better to move to 4.2 as Theodore says: "Ext4 with the 4.2 kernel will support ext3 with zero unexpected test failures, but there are a handful of test failures that are showing up with the 3.18.21 kernel."

Decisions, decisions.

So I've booted the guest with kernel 4.3.2:

Linux olympic 4.2.3-jh1 #2 SMP Thu Oct 15 19:04:09 CEST 2015 x86_64 GNU/Linux

I'm leaving the host on Debian's 3.16.5-1~bpo70+1 kernel for the moment.

If I see the problem again I'll move the host to 4.2.3.

If I see the problem with host and guest at 4.2.3 I'll convert the filesystems (in place) to ext4.

If I see the problems with in-place converted ext4 I'll rebuild the filesystems as native ext4 and copy all the files over.

Seems like a good plan?

Hi John,
Thanks for working on this!
I've attempted to send debian infrastructure an email asking if they've
noticed this problem. If not, I'd like to find the difference. (Maybe they
don't use ext4 in VMs.)
> If I see the problem with host and guest at 4.2.3 I'll convert the
> filesystems (in place) to ext4.
>
> If I see the problems with in-place converted ext4 I'll rebuild the
> filesysems as native ext4 and copy all the files over.
I am using filesystems formatted as "native ext4" and seem to see the same
problem. But it will be interesting to hear your results over a couple of
months. :)
Chad.
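The in-place ext3-to-ext4 conversion John asks about above could look roughly like this. A sketch only, not a recommendation: the device path is illustrative, the filesystem must be unmounted, and a backup is assumed.

    umount /home
    tune2fs -O extents,uninit_bg,dir_index /dev/olympic/olympic-home
    e2fsck -fDy /dev/olympic/olympic-home   # full check plus directory optimization after the feature change
    mount /home
    # note: files written before the conversion keep their old indirect-block maps;
    # only newly written files use extents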
P.S. There is a bug report for Ubuntu VMs going read-only has similar backtraces. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672 There are some with "ext4_mb_generate_buddy" (NOT what John originally reported) and some with "ext4_mb_release_inode_pa" (John's original bug report). They seem to be mutually exclusive - if one appears the other does not. The ones with "ext4_mb_release_inode_pa" show up in kernel 3.13.? (Ubuntu's 3.13.0-54-generic). Theo said of backtraces containing "ext4_mb_generate_buddy" - "I believe the bug in question was fixed in a backport that first showed up in v3.16.2" C. Well I've just had the problem happen when running the 4.2.3 kernel (on the guest, the host is still 3.16). This time it was on the root filesystem, so no log messages unfortunately. Will try and schedule some time to move the host to 4.2.3 this week. And now I have had (with 4.2.3) an error on my /home filesystem: Oct 20 16:06:58 olympic kernel: [ 8027.714881] EXT4-fs error (device dm-2): ext4_mb_generate_buddy:758: group 519, block bitmap and bg descriptor inconsistent: 22589 vs 22588 free clusters Oct 20 16:06:58 olympic kernel: [ 8027.718806] Aborting journal on device dm-2-8. Oct 20 16:06:58 olympic kernel: [ 8027.722387] EXT4-fs (dm-2): Remounting filesystem read-only Oct 20 16:06:58 olympic kernel: [ 8027.724291] EXT4-fs error (device dm-2) in ext4_free_blocks:4889: Journal has aborted Oct 20 16:06:58 olympic kernel: [ 8027.727369] EXT4-fs error (device dm-2) in ext4_do_update_inode:4504: Journal has aborted Oct 20 16:06:58 olympic kernel: [ 8027.735809] EXT4-fs error (device dm-2) in ext4_truncate:3789: IO failure Oct 20 16:06:58 olympic kernel: [ 8027.739795] EXT4-fs error (device dm-2) in ext4_orphan_del:2895: Journal has aborted Oct 20 16:06:58 olympic kernel: [ 8027.743834] EXT4-fs error (device dm-2) in ext4_do_update_inode:4504: Journal has aborted Note that this is the ext4_mb_generate_buddy error cwseys mentioned. AARGH! When I ran fsck only one error was detected: "block bitmap differences +17016320" As the system was down I've moved the host to 4.2.3 (the users were already screaming, so...) I've just noticed that the system had pretty low memory -- only 1G. I've upped it to 2G. Don't know whether that will make a difference. I see some allocation failures, mostly relating to apache and tcp. 
Oct 17 05:26:10 olympic kernel: [53046.469494] apache2: page allocation failure: order:1, mode:0x204020
Oct 17 05:26:10 olympic kernel: [53046.469812] apache2: page allocation failure: order:1, mode:0x204020
Oct 17 05:26:10 olympic kernel: [53046.469985] apache2: page allocation failure: order:1, mode:0x204020
Oct 17 05:26:10 olympic kernel: [53046.470154] apache2: page allocation failure: order:1, mode:0x204020
Oct 19 22:43:46 olympic kernel: [288101.737120] apache2: page allocation failure: order:1, mode:0x204020
Oct 19 22:43:46 olympic kernel: [288101.737833] kswapd0: page allocation failure: order:1, mode:0x204020
Oct 20 06:12:46 olympic kernel: [315042.431636] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 06:12:46 olympic kernel: [315042.431952] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 06:12:46 olympic kernel: [315042.631774] swapper/0: page allocation failure: order:1, mode:0x204020
Oct 20 06:12:46 olympic kernel: [315042.635743] swapper/0: page allocation failure: order:1, mode:0x204020
Oct 20 07:03:32 olympic kernel: [318088.158197] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 07:03:32 olympic kernel: [318088.160930] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 07:03:32 olympic kernel: [318088.163835] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 10:44:54 olympic kernel: [331369.850853] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 10:44:54 olympic kernel: [331369.851186] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 10:44:54 olympic kernel: [331369.851357] apache2: page allocation failure: order:1, mode:0x204020
Oct 20 10:54:04 olympic kernel: [331920.651891] apache2: page allocation failure: order:1, mode:0x204020

If what we're seeing is the same thing... 8 of 10 VMs with <=1GB have gone read-only; 0 of 2 VMs with >1GB have ever had the problem. (This is an estimate: I may have missed some which have not gone read-only.) So let's see what happens!

C.

But, FWIW, awhile back I had a test VM with 1GB RAM and ran 'stress' on the VM for a couple of weeks with no problems.

C.

Ok, it's now been 35 days since the last problem. I'm currently running:

Host: 4.2.3
Guest: 4.2.3

Unfortunately when I moved the host from 3.16 to 4.2.3 I also changed the amount of memory available for the guest -- poor style for debugging, but I was getting a lot of flak from disgruntled users.

I will now try moving the guest back to 3.16 with 1G of RAM to see whether the problem was solved by changing the host kernel or by changing the guest memory size. More news as it becomes available.

Well I've (finally) rebooted with guest kernel 3.16 and 1GB memory. It took a long time to convince the users to risk a (potentially) unstable system after a long time of all working OK.

For the record -- the problem went away when I moved the host kernel to 4.2.3 and increased the memory of the guest system to 2GB (see the libvirt sketch after the list below). Changing the guest kernel from 4.2.3 to 3.16 *did not* make the problem go away.

Possible results --

1. problem does not show up, implies the fix is moving the host kernel to 4.2.3
2. problem does show up, implies the fix is either:
   a. increasing the guest memory to 2GB, or
   b. moving the host to 4.2.3 *and* increasing the guest memory to 2GB
   c. moving the guest to 4.2.3 *and* increasing the guest memory to 2GB
   d. moving the host to 4.2.3 *and* moving the guest to 4.2.3
   e. moving the host to 4.2.3 *and* moving the guest to 4.2.3 *and* increasing the guest memory to 2GB.
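The memory bump described above can be made through libvirt rather than by editing the XML by hand -- roughly as follows. This is a sketch: the domain name comes from the configuration earlier in this report, sizes are in KiB, and the guest needs a restart for a new maximum to take effect.

    virsh setmaxmem olympic 2097152 --config   # allow up to 2 GiB
    virsh setmem olympic 2097152 --config      # raise the configured allocation
    virsh shutdown olympic
    virsh start olympic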
More news as it happens.

Hello,

Could you summarize your findings? It is hard to keep track after so much time. :) Here is a beginning:

Hversion  Gversion  Gmem   read-only?    uptime
3.16      3.16      ????   yes           ~14 days
4.2.3     4.2.3     2 GB   no            ~60 days
4.2.3     3.16      ? GB   in progress

Thanks!
Chad.

I first started seeing the problem when I moved the guest kernel to 3.16 (i.e. started using the ext4 driver for my ext3 filesystems).

1. Host 3.16, Guest 3.16, 1G memory, filesystem goes readonly after around 4 days

Then I upgraded the guest to 3.18.19

2. Host 3.16, Guest 3.18.19, 1G memory, filesystem goes readonly after around 20 days

Then I upgraded the guest to 4.3.2

3. Host 3.16, Guest 4.3.2, 1G memory, filesystem goes readonly after 4 days

Then I upgraded the host to 4.3.2, noticed the guest had only 1G of memory (and the guest kernel was bitching about not having enough memory during some TCP operations) so I increased the guest memory to 2G. Bad debugging style, but I was desperate.

4. Host 4.3.2, Guest 4.3.2, 2G memory, system ran for 90 days with no problem.

So now I've set the guest to 3.16 with 1G of memory. We'll see what happens next.

Thanks! I think you mistyped 4.2.3 -> 4.3.2. Here's the table (helps me see any pattern b/c it is closer together. Less memory required. :) )

Hversion  Gversion  Gmem   time before read-only
3.16      3.16      1 GB   ~4 days
3.16      3.18.19   1 GB   ~20 days
3.16      4.2.3     1 GB   4 days
4.2.3     4.2.3     2 GB   not after ~90 days
4.2.3     3.16      1 GB   in progress

> Then I upgraded the host to 4.3.2, noticed the guest had only 1G of
> memory (and the guest kernel was bitching about not having enough memory
> during some TCP operations) so I increased the guest memory to 2G. Bad
> debugging style, but I was desperate.

Debugging and production don't get along well together. :)

> So now I've set the guest to 3.16 with 1G of memory. We'll see what
> happens next.

I haven't tried changing the host's kernel version yet. Will be interested to know how it turns out! Thanks for debugging, and hopefully you'll help us (Debian/Ubuntu users) find a workaround at least.

Thanks again,
Chad.

Created attachment 200981 [details]
attachment-20424-0.html

On 22/01/16 16:20, bugzilla-daemon@bugzilla.kernel.org wrote:
> I think you mistyped 4.2.3 -> 4.3.2.

Duh, yes.

# uname -a
Linux baltic *4.2.3*-jh1 #2 SMP Thu Oct 15 19:04:09 CEST 2015 x86_64 GNU/Linux

Hello all,

thanks for the information provided, it helped me a lot in fixing my VM crashes in KVM. After downgrading the Gversion from 3.16 the crashes were gone.

Chad, what's your experience with your latest test? Running Hversion 4.3.2 and Gversion on 3.16?

Thanks in advance.
Cheers
André

My current status is:

Host: 4.2.3
Guest: 3.16.7
Guest uptime: 20 days.

I'm going to leave it like this for some more time, but my current feeling is that the ext4 subsystem in a KVM guest is only stable if the host kernel is > 4.something. The ext3 subsystem did not seem to be vulnerable to whatever the host was doing wrong before 4.whatever.

Hello,

please take a look at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818502

IMO the problem is in the hypervisor kernel 3.16 combined with an older Intel CPU. Currently I'm testing hypervisor 4.1 and it seems to be OK.
-- Zito

Issue attributes should be changed. This is not about the filesystem in the VM. My colleague had a failure with btrfs in a VM too...
-------- Forwarded message --------
Subject: [logcheck] ttrss.home.nahorany.net 2016-03-11 15:00 +0100 Security Events
Date: Fri, 11 Mar 2016 15:00:16 +0100 (CET)
From: logcheck system account <logcheck@home.nahorany.net>
To: centrino@nahorany.net

Security Events for kernel
=-=-=-=-=-=-=-=-=-=-=-=-=-

Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 2195232
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 2198280
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 2973336
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 3351896
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 2198280
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 2973336
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 3351896

System Events
=-=-=-=-=-=-=

Mar 11 14:41:36 ttrss sshd[23130]: Received disconnect from 172.16.0.22: 11: disconnected by user
Mar 11 14:41:06 ttrss kernel: [2816393.980768] end_request: I/O error, dev vda, sector 2195232
Mar 11 14:23:32 ttrss ansible-command: Invoked with warn=True executable=None chdir=None _raw_params=apt-get update removes=None creates=None _uses_shell=False
Mar 11 14:25:14 ttrss ansible-command: Invoked with warn=True executable=None chdir=None _raw_params=apt-get upgrade -y removes=None creates=None _uses_shell=False
Mar 11 14:39:19 ttrss kernel: [2816286.989244] device-mapper: uevent: version 1.0.3
Mar 11 14:39:19 ttrss kernel: [2816287.638040] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel@redhat.com
Mar 11 14:39:19 ttrss kernel: [2816286.989244] device-mapper: uevent: version 1.0.3
Mar 11 14:39:19 ttrss kernel: [2816287.638040] device-mapper: ioctl: 4.27.0-ioctl (2013-10-30) initialised: dm-devel@redhat.com
Mar 11 14:39:26 ttrss kernel: [2816295.076906] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Mar 11 14:39:26 ttrss kernel: [2816295.076906] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Mar 11 14:39:27 ttrss kernel: [2816295.173689] JFS: nTxBlock = 1938, nTxLock = 15510
Mar 11 14:39:27 ttrss kernel: [2816295.414255] ntfs: driver 2.1.30 [Flags: R/W MODULE].
Mar 11 14:39:27 ttrss kernel: [2816295.721879] QNX4 filesystem 0.2.3 registered.
Mar 11 14:39:27 ttrss kernel: [2816295.173689] JFS: nTxBlock = 1938, nTxLock = 15510
Mar 11 14:39:27 ttrss kernel: [2816295.414255] ntfs: driver 2.1.30 [Flags: R/W MODULE].
Mar 11 14:39:27 ttrss kernel: [2816295.721879] QNX4 filesystem 0.2.3 registered.
Mar 11 14:39:28 ttrss kernel: [2816296.129309] fuse init (API version 7.23)
Mar 11 14:39:28 ttrss kernel: [2816296.129309] fuse init (API version 7.23)
Mar 11 14:39:28 ttrss os-prober: debug: /dev/vda1: is active swap
Mar 11 14:39:29 ttrss ansible-command: Invoked with warn=True executable=None chdir=None _raw_params=apt-get -y upgrade removes=None creates=None _uses_shell=False
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:41:06 ttrss kernel: [2816393.980768] BTRFS: bdev /dev/vda2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Mar 11 14:45:30 ttrss ansible-command: Invoked with warn=True executable=None chdir=None _raw_params=apt-get -y upgrade removes=None creates=None _uses_shell=False

-- Zito

I'm not sure we need to leave this bug open at all. It sounds like it's some combination of a buggy Intel CPU and buggy kernel code in the KVM/virtualization support. The fix seems to be to update to a newer host kernel (4.1 or newer). This is problematic for Debian users, but at this point it's a bug for the Debian kernel maintainers, and this bugzilla is for the upstream kernel....

Yes, you are right. I found a commit https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?h=7dec5603b6b8dc4c3e1c65d318bd2a5a8c62a424 using git bisect. I hope this commit can be used for the Debian kernel package. Thanks

Vaclav, I want to really, really thank you for doing the bisect. Looking at the testing procedure you used for the git bisect here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=818502 I'm quite confident you've found the commit that needs to be backported into the Debian 3.16 kernel. Given that the commit was added in 3.17, it should be fairly easy to backport it into the Debian kernel.

It's really too bad this commit hadn't been marked with a cc:stable@vger.kernel.org tag, but to be fair I've sometimes made this mistake, either out of sheer forgetfulness or because I didn't recognize the seriousness of the bugs that a commit could address. To be fair to the KVM maintainers, the commit description doesn't really list the potential impacts of the bug it was fixing.

Also, with 20-20 hindsight, it's perhaps unfortunate that during this time period there was a real divergence[1] of kernels that were used by distributions. So a bug that would only show up on certain generations of Intel chipsets, and only when used in virtualization as a host, is precisely the sort of bug that is not likely to be noticed until it goes into enterprise-wide deployment --- and so the fact that other distributions didn't standardize on a single kernel in this time period (and Debian standardized on the oldest kernel in this rough time period, 3.16, and the bug in question was fixed in 3.17) meant that it's not all that surprising that this slipped through.
And while it would have been more convenient if Debian had been willing to switch over to a 3.18-based stable series, it wasn't compatible with their release schedule.

[1] Ubuntu 14.04 LTS stayed on 3.16 for only 18 months before moving to 3.19; Fedora 20 was on 3.11, and Fedora 21 jumped to 3.17, etc.

Vaclav, thanks again for finding a simple/easy repro, and then bisecting until you found the commit that is needed for the 3.16 Debian kernel!

Any ideas why ext4 in guests with kernel version 3.2.x (Wheezy) did not go read-only, but guests with 3.16.x (Jessie) did? I suppose running a vmhost with kernel 3.16 and bisecting the guest's kernel from 3.2 to 3.16 would give us this answer. But I'm not sure how interesting it would be, given that the root cause is KVM not simulating the real machine correctly.

C.

Sorry for my lack of responsiveness, I've been a bit ill (had a terrible cough, actually).

I've been running with a 4.2.3 kernel in the host since 19/1/2016 (around 80 days) with no problems. Yesterday my host was rebooted (power cut) and ran with the 3.16 kernel for one day, and I saw the problem again.

Based on that, and Vaclav's superb research, I am convinced that the problem is in the host system, probably fixed by the patch he found.
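For anyone still tracking this on an affected guest, the recurring symptom is easy to watch for in the kernel log. A minimal sketch; the log path assumes a Debian-style syslog setup:

    grep -E 'ext4_mb_(release_inode_pa|generate_buddy)|Aborting journal' /var/log/kern.log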