Bug 213137 - NVMe device file system corruption immediately after mkfs
Summary: NVMe device file system corruption immediately after mkfs
Status: RESOLVED INSUFFICIENT_DATA
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: ARM Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-19 00:44 UTC by Brian T. McKee
Modified: 2022-02-12 10:43 UTC
CC List: 0 users

See Also:
Kernel Version: 5.11.21
Subsystem:
Regression: No
Bisected commit-id:


Description Brian T. McKee 2021-05-19 00:44:42 UTC
After upgrading from 5.11.0 to 5.12.x or 5.11.21, an issue cropped up with an ext4 partition on an NVMe drive: with the later kernels, boot failed with an error on that drive. I have since gone back to 5.11 on the ARMv7 board and it works fine.

For the following, keep in mind that nvme0n1p1 is the correct partition.

The following is the kernel panic I received from 5.11.21:

[    2.764492] EXT4-fs error (device nvme0n1p1): ext4_get_journal_inode:5227: inode #8: comm swapper/0: iget: checksum invalid
[    2.774712]  mmcblk0: p1 p2 p3
[    2.775694] EXT4-fs (nvme0n1p1): no journal found
[    2.791532] exFAT-fs (nvme0n1p1): invalid boot record signature
[    2.797446] exFAT-fs (nvme0n1p1): failed to read boot sector
[    2.803118] exFAT-fs (nvme0n1p1): failed to recognize exfat type
[    2.809739] List of all partitions:
[    2.813229] 0100            8192 ram0 
[    2.813238]  (driver?)
[    2.819324] 0101            8192 ram1 
[    2.819332]  (driver?)
[    2.825465] 103:00000  244198584 nvme0n1 
[    2.825475]  (driver?)
[    2.831830]   103:00001  244194304 nvme0n1p1 41a7d0bc-dd30-544f-8854-7939357a793d
[    2.831840] 
[    2.840788] b300         7847936 mmcblk0 
[    2.840797]  driver: mmcblk
[    2.847572]   b301         1024000 mmcblk0p1 a2b2f070-01
[    2.847581] 
[    2.854374]   b302         1843200 mmcblk0p2 a2b2f070-02
[    2.854382] 
[    2.861169]   b303           10240 mmcblk0p3 a2b2f070-03
[    2.861177] 
[    2.867950] No filesystem could mount root, tried: 
[    2.867955]  ext3
[    2.872821]  ext4
[    2.874740]  ext2
[    2.876658]  vfat
[    2.878576]  exfat
[    2.880504]  ntfs
[    2.882510] 
[    2.885912] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(259,1)
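
For readers decoding the partition list above, here is a minimal Python sketch. It assumes (from my reading of `block/genhd.c`, not from anything stated in this report) that the kernel prints major:minor in hexadecimal, as `%02x%02x` when both numbers fit in a byte and as `%03x:%05x` otherwise, and that the size column counts 1 KiB blocks. Under that assumption `103:00001` is device (259,1), the same `unknown-block(259,1)` the kernel failed to mount as root, i.e. nvme0n1p1. `decode_bdevt` and `kib_to_bytes` are hypothetical helper names.

```python
"""Decode device numbers and sizes from a "List of all partitions" dump.

Assumption: major:minor is printed in hex ("%02x%02x" or "%03x:%05x")
and the size column is in 1 KiB units.
"""


def decode_bdevt(field: str) -> tuple[int, int]:
    """Return (major, minor) for a hex field such as '103:00001' or 'b301'."""
    if ":" in field:
        major_hex, minor_hex = field.split(":")
        return int(major_hex, 16), int(minor_hex, 16)
    # Short form: two hex digits of major followed by two of minor.
    return int(field[:2], 16), int(field[2:], 16)


def kib_to_bytes(kib: int) -> int:
    """Convert the 1 KiB block count shown in the listing to bytes."""
    return kib * 1024


if __name__ == "__main__":
    # Rows copied from the panic log: (device field, size in KiB, name).
    rows = [
        ("0100", 8192, "ram0"),
        ("103:00000", 244198584, "nvme0n1"),
        ("103:00001", 244194304, "nvme0n1p1"),
        ("b300", 7847936, "mmcblk0"),
        ("b301", 1024000, "mmcblk0p1"),
    ]
    for field, kib, name in rows:
        major, minor = decode_bdevt(field)
        print(f"{name:10} ({major},{minor}) {kib_to_bytes(kib):>16,} bytes")
```

Run as a script, it prints each device's decimal (major,minor) pair and its size in bytes; for nvme0n1p1 that is (259,1) and 250,054,967,296 bytes.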



After booting from MMC, I ran e2fsck:

threads /mnt # e2fsck /dev/sdb  
e2fsck 1.46.2 (28-Feb-2021) 
ext2fs_open2: Bad magic number in super-block 
e2fsck: Superblock invalid, trying backup blocks... 
e2fsck: Bad magic number in super-block while trying to open /dev/sdb 
The superblock could not be read or does not describe a valid ext2/ext3/ext4 
filesystem. If the device is valid and it really contains an ext2/ext3/ext4 
filesystem (and not swap or ufs or something else), then the superblock 
is corrupt, and you might try running e2fsck with an alternate superblock: 
   e2fsck -b 8193 <device> 
 or 
   e2fsck -b 32768 <device> 
/dev/sdb contains `DOS/MBR boot sector; partition 1 : ID=0x83, start-CHS (0x0,32,33), end-CHS (0x199,250,33), startsector 2048, 500101120 sectors, extended partition table (last)' data
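
e2fsck's suggestion of `-b 8193` or `-b 32768` follows from where ext2/3/4 keep superblock copies. The sketch below illustrates this under stated assumptions: the standard on-disk layout (primary superblock at byte 1024, with a little-endian 16-bit `s_magic` of 0xEF53 at offset 0x38 inside it), the `sparse_super` feature (backups only in group 1 and in groups that are powers of 3, 5 or 7), and the default of 8 * block_size blocks per group. `has_ext_magic` and `backup_superblocks` are hypothetical helper names. For the 62514766-block file system created later in this report, it reproduces the backup list that mke2fs prints.

```python
import struct

EXT_SUPER_MAGIC = 0xEF53   # s_magic value shared by ext2, ext3 and ext4
PRIMARY_SB_OFFSET = 1024   # byte offset of the primary superblock
MAGIC_OFFSET_IN_SB = 0x38  # offset of the little-endian s_magic field


def has_ext_magic(path: str, sb_offset: int = PRIMARY_SB_OFFSET) -> bool:
    """Return True if the superblock starting at sb_offset carries the ext magic."""
    with open(path, "rb") as f:
        f.seek(sb_offset + MAGIC_OFFSET_IN_SB)
        raw = f.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_SUPER_MAGIC


def _is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 0."""
    if n < 1:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def backup_superblocks(total_blocks: int, block_size: int = 4096) -> list[int]:
    """Block numbers of backup superblocks, assuming sparse_super (sketch)."""
    blocks_per_group = 8 * block_size  # one block bitmap covers this many blocks
    first_data_block = 1 if block_size == 1024 else 0
    # Number of block groups, rounding up for a partial last group.
    groups = -(-(total_blocks - first_data_block) // blocks_per_group)
    return [
        g * blocks_per_group + first_data_block
        for g in range(1, groups)  # group 0 holds the primary superblock
        if g == 1 or any(_is_power_of(g, b) for b in (3, 5, 7))
    ]
```

To probe a device by hand (read access required), `has_ext_magic("/dev/nvme0n1p1")` checks the primary copy, and `has_ext_magic("/dev/nvme0n1p1", 32768 * 4096)` checks the first backup of a 4 KiB-block file system, since a backup superblock starts at the beginning of its block.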

I was not able to repair this at all. I even blanked the partition table and created a new partition; as soon as the file system was made, it failed to mount and then failed fsck. I tried both ext4 and ext2:

root@cyclone5:/mnt# mount /dev/nvme0n1p1 /mnt/gentoo
[  811.671376] EXT4-fs error (device nvme0n1p1): ext4_get_journal_inode:5227: inode #8: comm mount: iget: bad extra_isize 51821 (inode size 256)
[  811.684101] EXT4-fs (nvme0n1p1): no journal found
mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error.
root@cyclone5:/mnt# fsck.ext4 /dev/nvme0n1p1
e2fsck 1.45.3 (14-Jul-2019)
/dev/nvme0n1p1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p1: 11/15630336 files (0.0% non-contiguous), 1258688/62514766 blocks
root@cyclone5:/mnt# fsck.ext4 /dev/nvme0n1p1
e2fsck 1.45.3 (14-Jul-2019)
/dev/nvme0n1p1: clean, 11/15630336 files, 1258688/62514766 blocks
root@cyclone5:/mnt# mount /dev/nvme0n1p1 /mnt/gentoo
[  868.054311] EXT4-fs error (device nvme0n1p1): ext4_get_journal_inode:5227: inode #8: comm mount: iget: bad extra_isize 51821 (inode size 256)
[  868.067073] EXT4-fs (nvme0n1p1): no journal found
mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error.
root@cyclone5:/mnt# fsck.ext4 /dev/nvme0n1p1        
e2fsck 1.45.3 (14-Jul-2019)
/dev/nvme0n1p1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/nvme0n1p1: 11/15630336 files (0.0% non-contiguous), 1258688/62514766 blocks
root@cyclone5:/mnt# fsck.ext4 /dev/nvme0n1p1
e2fsck 1.45.3 (14-Jul-2019)
/dev/nvme0n1p1: clean, 11/15630336 files, 1258688/62514766 blocks
root@cyclone5:/mnt# mkfs.ext2 /dev/nvme0n1p1 
mke2fs 1.45.3 (14-Jul-2019)
/dev/nvme0n1p1 contains a ext4 file system
        created on Tue May 18 23:21:23 2021
Proceed anyway? (y,N) y
Discarding device blocks: done                            
Creating filesystem with 62514766 4k blocks and 15630336 inodes
Filesystem UUID: 0e5ae116-2982-4ad5-b178-2099b6bda7f2
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done     

root@cyclone5:/mnt# mount /dev/nvme0n1p1 /mnt/gentoo
[  926.309614] EXT2-fs (nvme0n1p1): error: can't find an ext2 filesystem on dev nvme0n1p1.
mount: /mnt/gentoo: wrong fs type, bad option, bad superblock on /dev/nvme0n1p1, missing codepage or helper program, or other error.
root@cyclone5:/mnt# fsck.ext2 /dev/nvme0n1p1         
e2fsck 1.45.3 (14-Jul-2019)
/dev/nvme0n1p1: clean, 11/15630336 files, 996093/62514766 blocks
root@cyclone5:/mnt# uname -a
Linux cyclone5 5.11.21-wtec #1 SMP Mon May 17 16:36:49 PDT 2021 armv7l GNU/Linux
root@cyclone5:/mnt# halt
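
The repeated `iget: bad extra_isize 51821 (inode size 256)` message comes from a sanity check on the on-disk inode. Below is a simplified Python re-statement, paraphrased from my reading of `fs/ext4/inode.c` rather than copied from it: an inode larger than the original 128 bytes stores a 16-bit little-endian `i_extra_isize` at byte offset 0x80, and the kernel rejects it unless 128 + i_extra_isize fits within the inode size and the value is a multiple of 4. `parse_extra_isize` and `extra_isize_ok` are hypothetical names. The value 51821 fails both conditions for a 256-byte inode.

```python
import struct

EXT4_GOOD_OLD_INODE_SIZE = 128  # size of the original (revision 0) inode


def parse_extra_isize(raw_inode: bytes) -> int:
    """Read the little-endian 16-bit i_extra_isize field at offset 0x80."""
    return struct.unpack_from("<H", raw_inode, EXT4_GOOD_OLD_INODE_SIZE)[0]


def extra_isize_ok(i_extra_isize: int, inode_size: int) -> bool:
    """Simplified version of the check behind 'iget: bad extra_isize'."""
    if inode_size <= EXT4_GOOD_OLD_INODE_SIZE:
        return True  # no space beyond the old inode, nothing to validate
    fits = EXT4_GOOD_OLD_INODE_SIZE + i_extra_isize <= inode_size
    aligned = i_extra_isize % 4 == 0
    return fits and aligned


if __name__ == "__main__":
    # Synthetic 256-byte inode buffer holding the value from the log.
    raw = bytearray(256)
    struct.pack_into("<H", raw, 0x80, 51821)
    value = parse_extra_isize(bytes(raw))
    print(value, extra_isize_ok(value, 256))  # prints: 51821 False
```

A small aligned value such as 32 passes the same check for a 256-byte inode.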

I'm having a similar (but possibly unrelated) problem with a JMicron USB-to-PCIe controller on my x86-64 laptop: it seems to have the same trouble creating usable partitions on the NVMe drive. I have not verified whether it works with an older kernel. I'll have to build and install one by hand, as I have already removed the old kernel from my Gentoo install.

I have many x86-64 machines running 5.12.4 with ext4 partitions on NVMe, so I'm a little nervous and am considering going back to an older kernel for a while.

Perhaps someone broke backward compatibility in the file system driver?

I have to dig into this because I can't ship embedded hardware with NVMe issues, and I don't like limiting myself to old kernels, so I thought I'd come here and ask for advice. Keep in mind I'm a hardware engineer, so be nice and I'll let you borrow my soldering iron from time to time.

Comment 1 Brian T. McKee 2022-02-12 10:43:15 UTC
This ticket can be closed. I think the issue is better explained in this ticket:

https://bugzilla.kernel.org/show_bug.cgi?id=215595
