Bug 217769 - XFS crash on mount on kernels >= 6.1
Summary: XFS crash on mount on kernels >= 6.1
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: XFS
Hardware: All
OS: Linux
Importance: P3 normal
Assignee: FileSystem/XFS Default Virtual Assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-07 16:35 UTC by Mariusz Gronczewski
Modified: 2023-11-14 23:46 UTC
CC List: 6 users

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
full boot sequence where XFS fails (98.63 KB, text/plain)
2023-08-07 16:35 UTC, Mariusz Gronczewski
Metadata dump (2.31 MB, application/x-xz)
2023-08-08 14:51 UTC, Mariusz Gronczewski
Metadata dump after xfs_repair (2.31 MB, application/x-xz)
2023-08-08 14:54 UTC, Mariusz Gronczewski

Description Mariusz Gronczewski 2023-08-07 16:35:35 UTC
Created attachment 304789 [details]
full boot sequence where XFS fails

Background:

I've seen this happen on a few machines now, both consumer (my personal laptop) and enterprise (VMs on a server with ECC memory), so I think any kind of disk corruption can be ruled out here.

The common thread between the machines is that all of them were on Debian, dist-upgraded multiple times. IIRC the filesystem was originally formatted on a 4.9 kernel, so basically the XFS was formatted a long time ago:

xfs_db> version
versionnum [0xb4b4+0x8a] = V4,NLINK,DIRV2,ATTR,ALIGN,LOGV2,EXTFLG,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT

xfs_info /dev/vda2 
meta-data=/dev/vda2              isize=256    agcount=4, agsize=624064 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=2496256, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

And now *some* of them have started to crash near-immediately on boot. After upgrading to Debian's 6.1 I get the crash; booting an older version (5.17, 5.10 or 4.19) works fine. I also tried 6.5~rc4 from experimental with the same effect, and 6.3 on my laptop (where it only crashes on one of the partitions, as in the attachment).

The other interesting part is that the server mentioned in the logs has an HA pair machine that was formatted on the same date and used in a near-identical way, but it doesn't (so far) show the same behaviour.

In both cases XFS complains about in-memory corruption, but I don't believe that's actually the case: aside from one of the systems having ECC memory (and a dozen other VMs running without problems), I also memtested them for about an hour just to be sure. I also booted that VM on another machine, with the same effect.

The log in question: 

[   16.115185] ------------[ cut here ]------------
[   16.115856] WARNING: CPU: 2 PID: 646 at fs/xfs/xfs_inode.c:1831 xfs_iunlink+0x165/0x1e0 [xfs]
[   16.118200] Modules linked in: tcp_diag inet_diag binfmt_misc ext4 ghash_clmulni_intel sha512_ssse3 sha512_generic crc16 mbcache jbd2 aesni_intel crypto_simd cryptd cirrus drm_shmem_helper drm_kms_helper i6300esb virtio_balloon watchdog pcspkr button joydev evdev serio_raw loop fuse drm efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng rng_core ip_tables x_tables autofs4 hid_generic usbhid hid xfs libcrc32c crc32c_generic ata_generic virtio_net virtio_blk net_failover failover uhci_hcd ata_piix crct10dif_pclmul crct10dif_common ehci_hcd libata virtio_pci crc32_pclmul floppy crc32c_intel virtio_pci_legacy_dev virtio_pci_modern_dev virtio scsi_mod usbcore psmouse scsi_common virtio_ring usb_common i2c_piix4
[   16.129279] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Not tainted 6.5.0-0-amd64 #1  Debian 6.5~rc4-1~exp1
[   16.129290] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1 01/01/2007
[   16.129293] RIP: 0010:xfs_iunlink+0x165/0x1e0 [xfs]
[   16.134173] Code: 89 4c 24 04 72 2d e8 ea 5f f1 d8 8b 74 24 04 48 8d bd e0 00 00 00 e8 8a 40 94 d9 48 89 c3 48 85 c0 74 07 48 83 78 20 00 75 26 <0f> 0b e8 24 9c f1 d8 eb 13 f3 0f 1e fa 48 c7 c6 be fe 64 c0 4c 89
[   16.134177] RSP: 0018:ffffae6740c67b60 EFLAGS: 00010246
[   16.134189] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[   16.134192] RDX: 0000000000000000 RSI: ffff9810817bfd98 RDI: 0000000000088b40
[   16.134194] RBP: ffff9810809cd800 R08: ffff9810817bff28 R09: ffff9810809cd8e0
[   16.134196] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9810861e2ae0
[   16.134199] R13: 00000000000101c0 R14: ffff981081642900 R15: ffff981084217c00
[   16.146474] FS:  00007f89734e06c0(0000) GS:ffff981119d00000(0000) knlGS:0000000000000000
[   16.146478] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.146480] CR2: 00007f8972b04000 CR3: 00000001024a6002 CR4: 00000000000206e0
[   16.146493] Call Trace:
[   16.146533]  <TASK>
[   16.146536]  ? xfs_iunlink+0x165/0x1e0 [xfs]
[   16.153113]  ? __warn+0x81/0x130
[   16.153193]  ? xfs_iunlink+0x165/0x1e0 [xfs]
[   16.154796]  ? report_bug+0x191/0x1c0
[   16.155433]  ? handle_bug+0x3c/0x80
[   16.155468]  ? exc_invalid_op+0x17/0x70
[   16.155472]  ? asm_exc_invalid_op+0x1a/0x20
[   16.155498]  ? xfs_iunlink+0x165/0x1e0 [xfs]
[   16.159010]  xfs_rename+0xaf9/0xe50 [xfs]
[   16.159285]  xfs_vn_rename+0xfe/0x170 [xfs]
[   16.161080]  ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[   16.161137]  vfs_rename+0xb7e/0xd40
[   16.163241]  ? do_renameat2+0x57a/0x5f0
[   16.163256]  do_renameat2+0x57a/0x5f0
[   16.163277]  __x64_sys_rename+0x43/0x50
[   16.165707]  do_syscall_64+0x60/0xc0
[   16.165740]  ? do_syscall_64+0x6c/0xc0
[   16.165746]  ? do_syscall_64+0x6c/0xc0
[   16.165749]  ? do_syscall_64+0x6c/0xc0
[   16.165754]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   16.165776] RIP: 0033:0x7f89b7277997
[   16.170373] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[   16.170377] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[   16.170382] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX: 00007f89b7277997
[   16.170384] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI: 00007f8972b5fb30
[   16.170386] RBP: 00007f8975544480 R08: 0000000000000009 R09: 0000000000000fa0
[   16.170388] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f89734dfca0
[   16.170399] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15: 00007f8975544480
[   16.170407]  </TASK>
[   16.170408] ---[ end trace 0000000000000000 ]---
[   16.170419] XFS (vda2): Internal error xfs_trans_cancel at line 1104 of file fs/xfs/xfs_trans.c.  Caller xfs_rename+0x613/0xe50 [xfs]
[   16.201330] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Tainted: G        W          6.5.0-0-amd64 #1  Debian 6.5~rc4-1~exp1
[   16.201335] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1 01/01/2007
[   16.201337] Call Trace:
[   16.201377]  <TASK>
[   16.201382]  dump_stack_lvl+0x47/0x60
[   16.201412]  xfs_trans_cancel+0x131/0x150 [xfs]
[   16.214058]  xfs_rename+0x613/0xe50 [xfs]
[   16.218956]  xfs_vn_rename+0xfe/0x170 [xfs]
[   16.218956]  ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[   16.222083]  vfs_rename+0xb7e/0xd40
[   16.222083]  ? do_renameat2+0x57a/0x5f0
[   16.222083]  do_renameat2+0x57a/0x5f0
[   16.222083]  __x64_sys_rename+0x43/0x50
[   16.222083]  do_syscall_64+0x60/0xc0
[   16.222083]  ? do_syscall_64+0x6c/0xc0
[   16.222083]  ? do_syscall_64+0x6c/0xc0
[   16.222083]  ? do_syscall_64+0x6c/0xc0
[   16.222083]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   16.222083] RIP: 0033:0x7f89b7277997
[   16.222083] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[   16.222083] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[   16.222083] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX: 00007f89b7277997
[   16.222083] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI: 00007f8972b5fb30
[   16.222083] RBP: 00007f8975544480 R08: 0000000000000009 R09: 0000000000000fa0
[   16.222083] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f89734dfca0
[   16.222083] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15: 00007f8975544480
[   16.222083]  </TASK>
[   16.223078] XFS (vda2): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x14a/0x150 [xfs] (fs/xfs/xfs_trans.c:1105).  Shutting down filesystem.
[   16.223941] systemd-journald[251]: /var/log/journal/48d6a6734183b423cc1f60686f4553bb/system.journal: IO error, rotating.
[   16.224548] XFS (vda2): Please unmount the filesystem and rectify the problem(s)
Comment 1 Eric Sandeen 2023-08-07 17:17:35 UTC
Please try running xfs_repair on the filesystems in question, and capture the output. (You can use xfs_repair -n to do a dry run if you prefer; it will yield the same basic information.)

My guess is that you will find complaints about unlinked inodes - please let us know.
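
A no-modify run on the filesystem shown above would look roughly like this (device name taken from the earlier xfs_info output; the filesystem has to be unmounted first, e.g. from a rescue shell):

  xfs_repair -n /dev/vda2    # -n: report problems only, change nothing
  xfs_repair /dev/vda2       # run again without -n to actually fix them once the output is captured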
Comment 2 Mariusz Gronczewski 2023-08-07 18:06:34 UTC
It did, thanks for the help! Why is that reported as "corruption of in-memory data"?

The filesystem on the other machine also had exactly 3 disconnected inodes.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
agi unlinked bucket 0 is 559936 in ag 2 (inode=34114368)
agi unlinked bucket 42 is 175466 in ag 2 (inode=33729898)
agi unlinked bucket 53 is 198581 in ag 2 (inode=33753013)
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 33729898, would move to lost+found
disconnected inode 33753013, would move to lost+found
disconnected inode 34114368, would move to lost+found
Phase 7 - verify link counts...
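
For reference, the AGI unlinked buckets that repair reports above can also be inspected read-only with xfs_db; a rough sketch (AG number and device taken from this report, field name as in current xfsprogs):

  xfs_db -r -c "agi 2" -c "print unlinked" /dev/vda2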
Comment 3 Eric Sandeen 2023-08-07 19:01:09 UTC
It's essentially an unexpected/inconsistent in-memory state, as opposed to an on-disk structure that was found to be corrupt.

I presume that it boots ok now post-repair?

Do you know if this was the root or /boot filesystem or something else? It's still a mystery about how filesystems get into this state; we should never have a clean filesystem that requires no log recovery, but with unlinked inodes ... recovery is supposed to clear that.

It may have persisted on this filesystem for a very long time and it's just recent code changes that have started tripping over it, but I've always had a hunch that /boot seems to show the problem more often.
Comment 4 Mariusz Gronczewski 2023-08-07 19:40:19 UTC
> It's essentially an unexpected/inconsistent in-memory state, as opposed to an
> on-disk structure that was found to be corrupt.

Shouldn't that also mark the filesystem as "dirty"? The problem is that this *basically* makes the system unbootable without intervention: the OS thinks the filesystem is clean so it doesn't run xfs_repair, then the driver crashes without marking it unclean, the machine reboots, and the process repeats. The crash also takes down every mounted XFS filesystem, which means that even if the log partition doesn't have the problem, none of the logs will persist. I had to reformat /var/log as ext4 to even gather them on my laptop.

> I presume that it boots ok now post-repair?

Yes

> Do you know if this was the root or /boot filesystem or something else? It's
> still a mystery about how filesystems get into this state; we should never
> have a clean filesystem that requires no log recovery, but with unlinked
> inodes ... recovery is supposed to clear that.

It was root in both cases; we keep /boot on ext4.

So far (well, we have a few hundred more machines to upgrade) I've only seen this on old ones. Might it be some bug that was fixed but left its mark on the filesystem?

> It may have persisted on this filesystem for a very long time and it's just
> recent code changes that have started tripping over it, but I've always had a
> hunch that /boot seems to show the problem more often.

That would track; I've only seen it on old machines (I think they were formatted around the 4.9 kernel release, some even earlier). I just had another case on a machine, but this time reading certain files triggered it.
Comment 5 Eric Sandeen 2023-08-07 20:15:11 UTC
OK, thanks. The other instance of this problem I saw recently likely also started on a rather old kernel.
Comment 6 Mariusz Gronczewski 2023-08-07 22:34:20 UTC
I still have an image of the VM with the problem, if you want me to check something on it.
Comment 7 Eric Sandeen 2023-08-08 13:39:03 UTC
What might be most useful is to create an xfs_metadump image of the problematic filesystem (with -o if you are OK with showing filenames in the clear); from there we can examine things.

A metadump image is metadata only, no file data, and compresses well. This can then be turned back into a filesystem image for analysis.
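
A sketch of that workflow, using the device name from earlier in this report (the filesystem should be unmounted while dumping):

  xfs_metadump -o /dev/vda2 vda2.metadump   # -o: leave filenames unobfuscated
  xz vda2.metadump                          # metadata-only dumps compress well

  # later, on the analysis side (after unxz):
  xfs_mdrestore vda2.metadump vda2.img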
Comment 8 Mariusz Gronczewski 2023-08-08 14:51:53 UTC
Created attachment 304795 [details]
Metadata dump
Comment 9 Mariusz Gronczewski 2023-08-08 14:54:36 UTC
Created attachment 304796 [details]
Metadata dump after xfs_repair
Comment 10 Mariusz Gronczewski 2023-08-08 14:55:06 UTC
I've attached metadata dumps from before and after xfs_repair.
Comment 11 Eric Sandeen 2023-08-08 21:54:04 UTC
Another question: did the systems repeatedly fail to boot, or fail once and then succeed?

If anything at all had happened on the filesystem prior to encountering the problem, I think that the next boot should have seen a dirty log, and cleaned up the problem as a result.

But if the first action on the fs was to create a tmpfile or unlink a file, we might shut down before the log ever gets dirty and requires replay.
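
If it helps with diagnosis, whether the on-disk log is clean can be checked before mounting with xfs_logprint from xfsprogs; a sketch only, and the exact output varies by version:

  xfs_logprint -t /dev/vda2 | head   # transactional view; a clean log has nothing to replay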
Comment 12 Mariusz Gronczewski 2023-08-08 22:07:19 UTC
The filesystem was never marked as dirty, so fsck (which fixed the problem) wasn't happening on boot. So yeah, once I upgraded the kernel, if it failed the first time it failed on every reboot after that, until I manually ran fsck or downgraded the kernel.

On the one machine where it failed after boot (reading certain files triggered it), that too was repeatable.

Maybe on "corruption of in-memory data" driver should also mark FS as dirty? After all if data is corrupted from actual memory error (and not like here, from loading bad data) there is nonzero chance some of that data ended up being written to disk
Comment 13 Richard W.M. Jones 2023-08-09 16:43:45 UTC
Was VMware (the hypervisor) ever involved here? E.g. were these
VMware guests, was VMware Tools installed, were they converted
from VMware to KVM, or anything similar?
Comment 14 Mariusz Gronczewski 2023-08-09 18:31:01 UTC
Nope, and I've seen the same problem on bare metal.
Comment 15 Richard W.M. Jones 2023-08-09 19:10:03 UTC
No problems.  We had a similar bug reported internally that
happens on VMware guests, and I'm just trying to rule out VMware
as a factor.
Comment 16 Darrick J. Wong 2023-08-29 23:42:42 UTC

Does this:
https://lore.kernel.org/linux-xfs/20230829232043.GE28186@frogsfrogsfrogs/T/#u

help in any way?

--D

Comment 17 Grant Millar 2023-11-14 15:57:06 UTC
We're experiencing the same bug following a data migration to new servers.

The servers are all running a fresh install of Debian 12 with brand new hardware.

So far in the past 3 days we've had 2 mounts fail with:

[28797.357684] XFS (sdn): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c.  Caller xfs_rename+0x61a/0xea0 [xfs]
[28797.488475] XFS (sdn): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x146/0x150 [xfs] (fs/xfs/xfs_trans.c:1098).  Shutting down filesystem.
[28797.488595] XFS (sdn): Please unmount the filesystem and rectify the problem(s)

Both occurred in the same function on separate servers: xfs_rename+0x61a/0xea0

Neither mount is the root filesystem.

versionnum [0xbcf5+0x18a] = V5,NLINK,DIRV2,ATTR,QUOTA,ALIGN,LOGV2,EXTFLG,SECTOR,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT,CRC,FTYPE,FINOBT,SPARSE_INODES,REFLINK,INOBTCNT,BIGTIME

meta-data=/dev/sdk               isize=512    agcount=17, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=4394582016, imaxpct=50
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Please let me know if I can provide more information.
Comment 18 Darrick J. Wong 2023-11-14 23:46:06 UTC

This should be fixed in 6.6, could you try that and report back?

(See "xfs: reload entire unlinked bucket lists")

--D
