Created attachment 304789 [details]
full boot sequence where XFS fails

Background: I've seen this happen on a few machines now, both consumer (my personal laptop) and enterprise (VMs on a server with ECC memory), so I think any kind of disk corruption can be ruled out here. The common thread between the machines is that all of them run Debian and have been dist-upgraded multiple times. IIRC the filesystems were originally created on a 4.9 kernel, so basically the XFS was formatted a long time ago:

xfs_db> version
versionnum [0xb4b4+0x8a] = V4,NLINK,DIRV2,ATTR,ALIGN,LOGV2,EXTFLG,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT

xfs_info /dev/vda2
meta-data=/dev/vda2              isize=256    agcount=4, agsize=624064 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=2496256, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

And now *some* of them have started to crash near-immediately on boot. After upgrading to Debian's 6.1 I get the crash; booting an old kernel (5.17, 5.10 or 4.19) works fine. I also tried 6.5~rc4 from experimental with the same effect, and 6.3 on my laptop (where only one of the partitions crashes, as in the attachment).

The other interesting part is that the server mentioned in the logs has an HA pair that was formatted on the same date and is used in a near-identical way, but (so far) doesn't show the same behaviour.

In both cases XFS complains about in-memory corruption, but I don't believe that's the case: aside from one of the systems having ECC memory (and a dozen other VMs running without problems), I also memtested them for about an hour just to be sure. I also booted that VM on another machine, with the same effect.
The log in question:

[ 16.115185] ------------[ cut here ]------------
[ 16.115856] WARNING: CPU: 2 PID: 646 at fs/xfs/xfs_inode.c:1831 xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.118200] Modules linked in: tcp_diag inet_diag binfmt_misc ext4 ghash_clmulni_intel sha512_ssse3 sha512_generic crc16 mbcache jbd2 aesni_intel crypto_simd cryptd cirrus drm_shmem_helper drm_kms_helper i6300esb virtio_balloon watchdog pcspkr button joydev evdev serio_raw loop fuse drm efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng rng_core ip_tables x_tables autofs4 hid_generic usbhid hid xfs libcrc32c crc32c_generic ata_generic virtio_net virtio_blk net_failover failover uhci_hcd ata_piix crct10dif_pclmul crct10dif_common ehci_hcd libata virtio_pci crc32_pclmul floppy crc32c_intel virtio_pci_legacy_dev virtio_pci_modern_dev virtio scsi_mod usbcore psmouse scsi_common virtio_ring usb_common i2c_piix4
[ 16.129279] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Not tainted 6.5.0-0-amd64 #1 Debian 6.5~rc4-1~exp1
[ 16.129290] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1 01/01/2007
[ 16.129293] RIP: 0010:xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.134173] Code: 89 4c 24 04 72 2d e8 ea 5f f1 d8 8b 74 24 04 48 8d bd e0 00 00 00 e8 8a 40 94 d9 48 89 c3 48 85 c0 74 07 48 83 78 20 00 75 26 <0f> 0b e8 24 9c f1 d8 eb 13 f3 0f 1e fa 48 c7 c6 be fe 64 c0 4c 89
[ 16.134177] RSP: 0018:ffffae6740c67b60 EFLAGS: 00010246
[ 16.134189] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 16.134192] RDX: 0000000000000000 RSI: ffff9810817bfd98 RDI: 0000000000088b40
[ 16.134194] RBP: ffff9810809cd800 R08: ffff9810817bff28 R09: ffff9810809cd8e0
[ 16.134196] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9810861e2ae0
[ 16.134199] R13: 00000000000101c0 R14: ffff981081642900 R15: ffff981084217c00
[ 16.146474] FS: 00007f89734e06c0(0000) GS:ffff981119d00000(0000) knlGS:0000000000000000
[ 16.146478] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.146480] CR2: 00007f8972b04000 CR3: 00000001024a6002 CR4: 00000000000206e0
[ 16.146493] Call Trace:
[ 16.146533] <TASK>
[ 16.146536] ? xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.153113] ? __warn+0x81/0x130
[ 16.153193] ? xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.154796] ? report_bug+0x191/0x1c0
[ 16.155433] ? handle_bug+0x3c/0x80
[ 16.155468] ? exc_invalid_op+0x17/0x70
[ 16.155472] ? asm_exc_invalid_op+0x1a/0x20
[ 16.155498] ? xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.159010] xfs_rename+0xaf9/0xe50 [xfs]
[ 16.159285] xfs_vn_rename+0xfe/0x170 [xfs]
[ 16.161080] ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[ 16.161137] vfs_rename+0xb7e/0xd40
[ 16.163241] ? do_renameat2+0x57a/0x5f0
[ 16.163256] do_renameat2+0x57a/0x5f0
[ 16.163277] __x64_sys_rename+0x43/0x50
[ 16.165707] do_syscall_64+0x60/0xc0
[ 16.165740] ? do_syscall_64+0x6c/0xc0
[ 16.165746] ? do_syscall_64+0x6c/0xc0
[ 16.165749] ? do_syscall_64+0x6c/0xc0
[ 16.165754] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 16.165776] RIP: 0033:0x7f89b7277997
[ 16.170373] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[ 16.170377] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 16.170382] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX: 00007f89b7277997
[ 16.170384] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI: 00007f8972b5fb30
[ 16.170386] RBP: 00007f8975544480 R08: 0000000000000009 R09: 0000000000000fa0
[ 16.170388] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f89734dfca0
[ 16.170399] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15: 00007f8975544480
[ 16.170407] </TASK>
[ 16.170408] ---[ end trace 0000000000000000 ]---
[ 16.170419] XFS (vda2): Internal error xfs_trans_cancel at line 1104 of file fs/xfs/xfs_trans.c. Caller xfs_rename+0x613/0xe50 [xfs]
[ 16.201330] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Tainted: G W 6.5.0-0-amd64 #1 Debian 6.5~rc4-1~exp1
[ 16.201335] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1 01/01/2007
[ 16.201337] Call Trace:
[ 16.201377] <TASK>
[ 16.201382] dump_stack_lvl+0x47/0x60
[ 16.201412] xfs_trans_cancel+0x131/0x150 [xfs]
[ 16.214058] xfs_rename+0x613/0xe50 [xfs]
[ 16.218956] xfs_vn_rename+0xfe/0x170 [xfs]
[ 16.218956] ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[ 16.222083] vfs_rename+0xb7e/0xd40
[ 16.222083] ? do_renameat2+0x57a/0x5f0
[ 16.222083] do_renameat2+0x57a/0x5f0
[ 16.222083] __x64_sys_rename+0x43/0x50
[ 16.222083] do_syscall_64+0x60/0xc0
[ 16.222083] ? do_syscall_64+0x6c/0xc0
[ 16.222083] ? do_syscall_64+0x6c/0xc0
[ 16.222083] ? do_syscall_64+0x6c/0xc0
[ 16.222083] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 16.222083] RIP: 0033:0x7f89b7277997
[ 16.222083] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[ 16.222083] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 16.222083] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX: 00007f89b7277997
[ 16.222083] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI: 00007f8972b5fb30
[ 16.222083] RBP: 00007f8975544480 R08: 0000000000000009 R09: 0000000000000fa0
[ 16.222083] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f89734dfca0
[ 16.222083] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15: 00007f8975544480
[ 16.222083] </TASK>
[ 16.223078] XFS (vda2): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x14a/0x150 [xfs] (fs/xfs/xfs_trans.c:1105). Shutting down filesystem.
[ 16.223941] systemd-journald[251]: /var/log/journal/48d6a6734183b423cc1f60686f4553bb/system.journal: IO error, rotating.
[ 16.224548] XFS (vda2): Please unmount the filesystem and rectify the problem(s)
Please try running xfs_repair on the filesystems in question, and capture the output. (You can use xfs_repair -n to do a dry run if you prefer; it will yield the same basic information.) My guess is that you will find complaints about unlinked inodes - please let us know.
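For reference, the invocations would look roughly like this (device name taken from the xfs_info output above; the filesystem should be unmounted first):

  # dry run: report problems but change nothing on disk
  xfs_repair -n /dev/vda2

  # once you are happy with what it reports, let it actually fix things
  xfs_repair /dev/vda2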
It did, thanks for the help! Why is that reported as "corruption of in-memory data"? The filesystem on the other machine also had exactly 3 disconnected inodes.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
agi unlinked bucket 0 is 559936 in ag 2 (inode=34114368)
agi unlinked bucket 42 is 175466 in ag 2 (inode=33729898)
agi unlinked bucket 53 is 198581 in ag 2 (inode=33753013)
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 33729898, would move to lost+found
disconnected inode 33753013, would move to lost+found
disconnected inode 34114368, would move to lost+found
Phase 7 - verify link counts...
It's essentially an unexpected/inconsistent in-memory state, as opposed to an on-disk structure that was found to be corrupt. I presume that it boots ok now post-repair? Do you know if this was the root or /boot filesystem or something else? It's still a mystery about how filesystems get into this state; we should never have a clean filesystem that requires no log recovery, but with unlinked inodes ... recovery is supposed to clear that. It may have persisted on this filesystem for a very long time and it's just recent code changes that have started tripping over it, but I've always had a hunch that /boot seems to show the problem more often.
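If you're curious, you can see the leftover on-disk state directly with a read-only xfs_db invocation along these lines (AG number and device are the ones from this report; exact field formatting varies a bit between xfsprogs versions):

  # dump the AGI header for AG 2; on a clean filesystem the unlinked[]
  # bucket array should be entirely empty, but here buckets 0, 42 and 53
  # still point at inodes
  xfs_db -r -c "agi 2" -c "print" /dev/vda2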
> It's essentially an unexpected/inconsistent in-memory state, as opposed to an
> on-disk structure that was found to be corrupt.

Shouldn't that also mark the filesystem as "dirty"? The problem is that this *basically* makes the system unbootable without intervention: the OS thinks the filesystem is clean so it doesn't run xfs_repair, then the driver crashes without marking it unclean, the machine reboots, and the process repeats. The crash also takes down every mounted XFS filesystem, which means that even if the log partition doesn't have the problem, none of the logs persist. I had to reformat /var/log as ext4 just to gather them on my laptop.

> I presume that it boots ok now post-repair?

Yes

> Do you know if this was the root or /boot filesystem or something else? It's
> still a mystery about how filesystems get into this state; we should never
> have a clean filesystem that requires no log recovery, but with unlinked
> inodes ... recovery is supposed to clear that.

It was root in both cases; we keep /boot on ext4. So far (well, we have a few hundred more machines to upgrade) I've only seen this on old ones - might this be some bug that was since fixed but left its mark on the filesystem?

> It may have persisted on this filesystem for a very long time and it's just
> recent code changes that have started tripping over it, but I've always had a
> hunch that /boot seems to show the problem more often.

That would track; I've only seen this on old machines (I think they were formatted around the 4.9 kernel release, some even earlier). I just had another case on a machine, but this time reading certain files triggered it.
Ok thanks, the other instance I recently saw of this problem likely also started on a rather old kernel.
I still have an image of the VM with the problem, if you want me to check something on it.
What might be most useful is to create an xfs_metadump image of the problematic filesystem (with -o if you are ok with showing filenames in the clear) and from there we can examine things. A metadump image is metadata only, no file data, and compresses well. This can then be turned back into a filesystem image for analysis.
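A rough sketch of the round trip, in case it helps (output filenames here are just placeholders):

  # capture metadata only; -g shows progress, -o skips filename obfuscation
  xfs_metadump -g -o /dev/vda2 vda2.metadump

  # metadumps compress very well
  gzip vda2.metadump

  # on the analysis side, turn it back into a (sparse) filesystem image
  gunzip vda2.metadump.gz
  xfs_mdrestore vda2.metadump vda2.img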
Created attachment 304795 [details] Metadata dump
Created attachment 304796 [details] Metadata dump after xfs_repair
I've attached metadata dumps from before and after xfs_repair.
Another question - Did the systems repeatedly fail to boot, or fail once and then succeed? If anything at all had happened on the filesystem prior to encountering the problem, I think that the next boot should have seen a dirty log, and cleaned up the problem as a result. But if the first action on the fs was to create a tmpfile or unlink a file, we might shut down before the log ever gets dirty and requires replay.
The filesystem was never marked as dirty, so fsck (which would have fixed the problem) wasn't run on boot. So yeah - once I upgraded the kernel, if it failed the first time it failed on every reboot after that, until I ran xfs_repair manually or downgraded the kernel. On the one machine where it failed after boot (reading certain files triggered it), that too was repeatable.

Maybe on "corruption of in-memory data" the driver should also mark the FS as dirty? After all, if the data is corrupted by an actual memory error (and not, like here, by loading bad data), there is a nonzero chance that some of that data has already been written to disk.
Was VMware (the hypervisor) ever involved here, eg. were these VMware guests, was VMware Tools installed, were they converted from VMware to KVM, or anything similar?
Nope, and I've seen the same problem on bare metal.
No problems. We had a similar bug reported internally that happens on VMware guests, and I'm just trying to rule out VMware as a factor.
On Wed, Aug 09, 2023 at 07:10:03PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217769
>
> --- Comment #15 from Richard W.M. Jones (rjones@redhat.com) ---
> No problems. We had a similar bug reported internally that
> happens on VMware guests, and I'm just trying to rule out VMware
> as a factor.

Does this:

https://lore.kernel.org/linux-xfs/20230829232043.GE28186@frogsfrogsfrogs/T/#u

help in any way?

--D
We're experiencing the same bug following a data migration to new servers.

The servers are all running a fresh install of Debian 12 with brand new hardware.

So far in the past 3 days we've had 2 mounts fail with:

[28797.357684] XFS (sdn): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_rename+0x61a/0xea0 [xfs]
[28797.488475] XFS (sdn): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x146/0x150 [xfs] (fs/xfs/xfs_trans.c:1098). Shutting down filesystem.
[28797.488595] XFS (sdn): Please unmount the filesystem and rectify the problem(s)

Both occurred in the same function on separate servers: xfs_rename+0x61a/0xea0

Neither mount is the root filesystem.

versionnum [0xbcf5+0x18a] = V5,NLINK,DIRV2,ATTR,QUOTA,ALIGN,LOGV2,EXTFLG,SECTOR,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT,CRC,FTYPE,FINOBT,SPARSE_INODES,REFLINK,INOBTCNT,BIGTIME

meta-data=/dev/sdk               isize=512    agcount=17, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=4394582016, imaxpct=50
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Please let me know if I can provide more information.
On Tue, Nov 14, 2023 at 03:57:06PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217769
>
> --- Comment #17 from Grant Millar (grant@cylo.net) ---
> We're experiencing the same bug following a data migration to new servers.
>
> The servers are all running a fresh install of Debian 12 with brand new
> hardware.
>
> So far in the past 3 days we've had 2 mounts fail with:
>
> [28797.357684] XFS (sdn): Internal error xfs_trans_cancel at line 1097 of
> file fs/xfs/xfs_trans.c. Caller xfs_rename+0x61a/0xea0 [xfs]
> [28797.488475] XFS (sdn): Corruption of in-memory data (0x8) detected at
> xfs_trans_cancel+0x146/0x150 [xfs] (fs/xfs/xfs_trans.c:1098). Shutting down
> filesystem.
> [28797.488595] XFS (sdn): Please unmount the filesystem and rectify the
> problem(s)
>
> Both occurred in the same function on separate servers:
> xfs_rename+0x61a/0xea0
>
> Neither mount is the root filesystem.

This should be fixed in 6.6, could you try that and report back?
(See "xfs: reload entire unlinked bucket lists")

--D