We are using machines with the current official 4.4.72 longterm stable kernel from kernel.org. Storage is NFS, with client and server running the same kernel. To make use of discard features we switched from NFS 4.0 to 4.2.

When running qemu with raw disk images on NFS 4.2, we can no longer live-migrate VMs successfully (VM migration, not disk migration). It works flawlessly on NFS 4.0.

The process flow is as follows:
 - qemu is running on the source side
 - qemu is started on the target side (with access to the same disk images on NFS)
 - migration is started from source to destination
 - data is written to disk inside the VM
 - qemu wants to write the data to disk
 - the write fails on the source with EACCES (Permission denied)
 - qemu crashes

Enabling NFS client debug logging on the source gives the following output just before the crash:

    ...
    nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
    NFS: nfs_update_inode(0:46/19327352961 fh_crc=0x40f4f073 ct=2 info=0x2427e7f)
    NFS: change_attr change on server for file 0:46/19327352961
    NFS: (0:46/19327352961) revalidation complete
    journal: operation failed: migration job: failed due to I/O error

Looking at the code, it might be this area:

    if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
            if (inode->i_version != fattr->change_attr) {
                    dprintk("NFS: change_attr change on server for file %s/%ld\n",
                                    inode->i_sb->s_id, inode->i_ino);
                    invalid |= NFS_INO_INVALID_ATTR
                            | NFS_INO_INVALID_DATA
                            | NFS_INO_INVALID_ACCESS
                            | NFS_INO_INVALID_ACL;
                    if (S_ISDIR(inode->i_mode))
                            nfs_force_lookup_revalidate(inode);
                    inode->i_version = fattr->change_attr;
            }
    } else {

A diff against newer kernels shows at least two commits touching that code area:

    ca0daa277acac1029f74d9fea838c9e507398226 NFS: Cache aggressively when file is open for writing
    38512aa98a3feb6acd7da8f0ed5dade5b592b426 NFS: Don't flush caches for a getattr that races with writeback

Is this the right analysis? If so, which patches are needed to fix the issue completely? And finally, does this qualify as a 4.4 longterm bug that should be fixed?
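To make the suspected path concrete, here is a small userspace model of the check quoted above. It is only a sketch: the `MODEL_*` flag values, `struct model_inode` and `model_update_inode()` are hypothetical names invented for illustration and do not match the kernel's `NFS_INO_*` constants or `struct inode`. It shows that a single change-attribute mismatch marks the access cache invalid along with the attribute and data caches, which would presumably lead the client to re-check access with the server on the next write.

```c
#include <stdint.h>

/* Hypothetical flag values for this model only; the kernel's NFS_INO_* constants differ. */
enum {
    MODEL_INVALID_ATTR   = 1u << 0,
    MODEL_INVALID_DATA   = 1u << 1,
    MODEL_INVALID_ACCESS = 1u << 2,
    MODEL_INVALID_ACL    = 1u << 3,
};

/* Hypothetical stand-in for the few inode fields the quoted check touches. */
struct model_inode {
    uint64_t change_attr;    /* last change attribute seen from the server */
    unsigned cache_invalid;  /* accumulated invalidation flags */
};

/*
 * Compare the cached change attribute with the one the server reported.
 * On a mismatch, mark all four caches invalid and remember the new value.
 * Returns the flags newly raised by this call (0 if nothing changed).
 */
static unsigned model_update_inode(struct model_inode *inode,
                                   uint64_t server_change_attr)
{
    unsigned invalid = 0;

    if (inode->change_attr != server_change_attr) {
        invalid = MODEL_INVALID_ATTR | MODEL_INVALID_DATA |
                  MODEL_INVALID_ACCESS | MODEL_INVALID_ACL;
        inode->change_attr = server_change_attr;
    }
    inode->cache_invalid |= invalid;
    return invalid;
}
```

In this model, any change the second client makes that bumps the change attribute is enough to drop the cached access result on the first client, regardless of whether the permissions themselves changed.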
More details can be found at: https://bugzilla.redhat.com/show_bug.cgi?id=1464787 That bug report was filed against a distro-specific kernel version. During the analysis we switched to the upstream kernel, but the issue still occurred, hence this second bug report.
"Write fails on source with EACCES (Permission denied)" Is it possible that something during migration changed the file's permissions? Would it be possible to try the experiment with NFSv4.1? If it works with 4.1, then maybe there's a labeled NFS issue.
Looking at the Red Hat BZ, it's clear that this is a labeled NFS issue. The revalidation may be relevant there only insofar as it's what caused the source client to recognize a label change. Let's close this and continue the discussion on the Red Hat BZ.
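For anyone reproducing this, one way to see whether the mode, ownership or security label of the disk image changes during migration is to dump them on the source client before and after starting the target qemu. The following is a minimal Linux-only sketch using `stat(2)` and `getxattr(2)`. The helper names `read_security_label()` and `print_access_state()` are made up for this example, and it assumes the label is exposed to the client through the `security.selinux` extended attribute.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Copy the SELinux label of `path` into `buf` as a NUL-terminated string.
 * Returns the label length, or -1 if it cannot be read (errno is set by
 * getxattr in that case) or if `buflen` is 0.
 */
static ssize_t read_security_label(const char *path, char *buf, size_t buflen)
{
    ssize_t n;

    if (buflen == 0)
        return -1;

    /* Leave room for the terminator; the stored value may lack one. */
    n = getxattr(path, "security.selinux", buf, buflen - 1);
    if (n < 0) {
        buf[0] = '\0';
        return -1;
    }
    buf[n] = '\0';
    return n;
}

/* Print mode, owner and label of `path`. Returns 0 on success, -1 if stat fails. */
static int print_access_state(const char *path)
{
    struct stat st;
    char label[256];

    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    printf("%s mode=%04o uid=%u gid=%u\n", path,
           (unsigned)(st.st_mode & 07777),
           (unsigned)st.st_uid, (unsigned)st.st_gid);

    if (read_security_label(path, label, sizeof label) >= 0)
        printf("  label=%s\n", label);
    else
        perror("  getxattr(security.selinux)");
    return 0;
}
```

Calling `print_access_state()` on the image path once before the target qemu starts and once after the write fails, then comparing the two outputs, should show whether only the label differs.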