Bug 186551 - mkfs.ext4 tries to discard sector beyond the end of the device
Summary: mkfs.ext4 tries to discard sector beyond the end of the device
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
Reported: 2016-11-01 17:30 UTC by Tamas Vincze
Modified: 2016-11-02 14:30 UTC

See Also:
Kernel Version: 3.10.0-327.36.3.el7.x86_64
Tree: Mainline
Regression: No


Attachments
full dmesg (106.48 KB, text/plain)
2016-11-01 17:50 UTC, Tamas Vincze

Description Tamas Vincze 2016-11-01 17:30:21 UTC
The 800GB Intel P3700 SSD has 195,352,576 4k sectors, but mkfs.ext4 tried to discard sector 1,530,955,776, which is beyond the end of the device.

# mkfs.ext4 /dev/nvme0n1p1
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: failed - Input/output error
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
48840704 inodes, 195352576 blocks
9767628 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2344615936
5962 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
	102400000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done     

[61825.159172] blk_update_request: I/O error, dev nvme0n1, sector 1530955776

# parted /dev/nvme0n1 "unit s" "print"
Model: Unknown (unknown)
Disk /dev/nvme0n1: 195353046s
Sector size (logical/physical): 4096B/4096B
Partition Table: gpt
Disk Flags: 

Number  Start  End         Size        File system  Name     Flags
 1      256s   195352831s  195352576s               primary
Comment 1 Eric Sandeen 2016-11-01 17:40:41 UTC
195352576 4k sectors is 1562820608 512-byte sectors.

The discard at 1530955776 seems to be within the size of the device.

What does /proc/partitions say, what kernel are you running, and can you attach the full dmesg?
Comment 2 Tamas Vincze 2016-11-01 17:48:00 UTC
# cat /proc/partitions 
major minor  #blocks  name

   8        0  292935982 sda
   8        1     512000 sda1
   8        2    1048576 sda2
   8        3    1048576 sda3
   8        4  290325504 sda4
  11        0    1048575 sr0
 259        0  781412184 nvme0n1
 259        1  781410304 nvme0n1p1
 253        0   16777216 dm-0
 253        1   33554432 dm-1
 253        2  134217728 dm-2

I thought that the sector number in
"blk_update_request: I/O error, dev nvme0n1, sector 1530955776"
is a 4k sector number.
Comment 3 Tamas Vincze 2016-11-01 17:50:28 UTC
Created attachment 243511
full dmesg
Comment 4 Eric Sandeen 2016-11-01 17:51:11 UTC
I am pretty sure all of those messages will be in 512-byte sectors.

/proc/partitions is in 1k units (just to keep it interesting).

781412184 * 1024 / 512 = 1562824368 512-byte sectors, so the end of the device is also past the sector it's trying to discard.
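
The unit conversions above can be double-checked with a minimal Python sketch. The variable names are illustrative only; the numbers are copied from the mkfs.ext4 output, /proc/partitions, and the dmesg line in this report, and the sketch assumes the kernel message is in 512-byte sectors.

```python
# Numbers copied from this report; names are made up for illustration.
FS_BLOCKS_4K = 195352576      # "195352576 blocks" in the mkfs.ext4 output
DISK_KIB = 781412184          # nvme0n1 in /proc/partitions (1 KiB units)
FAILED_SECTOR = 1530955776    # blk_update_request, assumed 512-byte units

# Convert both sizes to 512-byte sectors so they are comparable.
fs_sectors_512 = FS_BLOCKS_4K * 4096 // 512
disk_sectors_512 = DISK_KIB * 1024 // 512

print(fs_sectors_512)                     # 1562820608
print(disk_sectors_512)                   # 1562824368
print(FAILED_SECTOR < disk_sectors_512)   # True: inside the device
```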

I wonder what happens if you try a blkdiscard for the entire device, and see if you get a failure from it as well.  (*** note that this will clear any data on the device, as it will discard every block on the device ***).

It seems unlikely that this is an ext4/e2fsprogs problem at this point.
Comment 5 Tamas Vincze 2016-11-01 19:04:40 UTC
It isn't in use yet, so here you go:

# blkdiscard -v /dev/nvme0n1p1 
blkdiscard: /dev/nvme0n1p1: BLKDISCARD ioctl failed: Input/output error

[602663.440080] blk_update_request: I/O error, dev nvme0n1, sector 1518338648
[602663.440106] blk_update_request: I/O error, dev nvme0n1, sector 1543504448
[602663.440124] blk_update_request: I/O error, dev nvme0n1, sector 1509950048
[602663.440143] blk_update_request: I/O error, dev nvme0n1, sector 1501561448
[602663.440160] blk_update_request: I/O error, dev nvme0n1, sector 1526727248
[602663.440178] blk_update_request: I/O error, dev nvme0n1, sector 1535115848
[602663.440196] blk_update_request: I/O error, dev nvme0n1, sector 1560281648
[602663.440234] blk_update_request: I/O error, dev nvme0n1, sector 1551893048
Comment 6 Eric Sandeen 2016-11-01 19:14:59 UTC
Well - I'm afraid that all I can tell you is that it's not an ext4 or e2fsprogs problem, then.  Issuing discards straight to the device yielded those errors with no filesystem code involved.
Comment 7 Eric Sandeen 2016-11-01 19:41:10 UTC
Those seem to be spaced out at roughly 4G boundaries - 4k short of 4G if my math is right.  Not sure what that implies though.
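
A rough Python sketch of that spacing calculation, using the sector numbers from comment 5 and assuming they are in 512-byte units (the names are illustrative only):

```python
# Failing sectors from the blkdiscard run in comment 5 (512-byte units assumed).
failed_sectors = [
    1518338648, 1543504448, 1509950048, 1501561448,
    1526727248, 1535115848, 1560281648, 1551893048,
]

ordered = sorted(failed_sectors)
# Distance between neighbouring failures, in sectors.
gaps = {b - a for a, b in zip(ordered, ordered[1:])}
gap_bytes = gaps.pop() * 512 if len(gaps) == 1 else None

print(gap_bytes)               # 4294963200
print(4 * 2**30 - gap_bytes)   # 4096, i.e. 4k short of 4G
```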

You might try "-v" and/or smaller step values with blkdiscard to see if anything works.

What do /sys/block/nvme0n1/discard_alignment and /sys/block/nvme0n1/queue/discard_granularity contain?
Comment 8 Eric Sandeen 2016-11-01 19:42:41 UTC
And maybe /sys/block/nvme0n1/alignment_offset as well - perhaps this is some alignment problem with the discard requests, though I'd expect more than that small handful if that were the problem.
Comment 9 Tamas Vincze 2016-11-01 19:47:29 UTC
# cat /sys/block/nvme0n1/discard_alignment
4096
# cat /sys/block/nvme0n1/queue/discard_granularity
4096
# cat /sys/block/nvme0n1/alignment_offset
0
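
As a quick check of the alignment theory, here is a small Python sketch that tests whether the failing sectors reported so far fall on a 4096-byte boundary. It assumes the dmesg sectors are in 512-byte units; since discard_alignment equals discard_granularity here, a plain modulo on the byte offset is used. The names are illustrative only.

```python
DISCARD_GRANULARITY = 4096   # from /sys/block/nvme0n1/queue/discard_granularity
SECTOR_BYTES = 512           # assumed unit of the dmesg sector numbers

# Failing sectors from the mkfs.ext4 run and from comment 5.
failed_sectors = [
    1530955776,
    1518338648, 1543504448, 1509950048, 1501561448,
    1526727248, 1535115848, 1560281648, 1551893048,
]

misaligned = [s for s in failed_sectors
              if (s * SECTOR_BYTES) % DISCARD_GRANULARITY != 0]
print(misaligned)   # []
```

Under those assumptions every failing request starts on a granularity boundary, so the start offsets alone don't point to an alignment problem.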

I ran it with "-v", but it wasn't verbose at all.
Comment 10 Tamas Vincze 2016-11-01 19:54:09 UTC
# blkdiscard -v --offset 700G --step 1073741824 /dev/nvme0n1p1 
blkdiscard: /dev/nvme0n1p1: BLKDISCARD ioctl failed: Input/output error

[606050.283746] blk_update_request: I/O error, dev nvme0n1, sector 1547700224

# blkdiscard -v --length=700G /dev/nvme0n1p1
/dev/nvme0n1p1: Discarded 751619276800 bytes from the offset 0

Looks like it's having problems above 700G.
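
A minimal Python sketch to cross-check this against the earlier errors. It assumes the dmesg sectors are device-relative 512-byte sectors and takes the partition start (256 4k sectors) from the parted output; the names are illustrative only.

```python
GIB = 2**30
SECTOR_BYTES = 512        # assumed unit of the dmesg sector numbers
PART_START_4K = 256       # partition 1 start from parted, in 4096-byte sectors

discarded_bytes = 700 * GIB   # matches "Discarded 751619276800 bytes"
# First device sector past the range that discarded successfully.
boundary_sector = (PART_START_4K * 4096 + discarded_bytes) // SECTOR_BYTES

# Failing sectors from the mkfs.ext4 run, comment 5, and the run above.
failed_sectors = [
    1530955776,
    1518338648, 1543504448, 1509950048, 1501561448,
    1526727248, 1535115848, 1560281648, 1551893048,
    1547700224,
]

print(discarded_bytes)                                      # 751619276800
print(boundary_sector)                                      # 1468008448
print(all(s >= boundary_sector for s in failed_sectors))    # True
```

Every failure seen so far lies beyond the 700G mark of the partition, which is consistent with the lower range discarding cleanly.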
Comment 11 Tamas Vincze 2016-11-02 14:30:15 UTC
Updated the SSD's firmware and now it works:

# blkdiscard -v /dev/nvme0n1p1
/dev/nvme0n1p1: Discarded 800164151296 bytes from the offset 0

For the record: old firmware: 8DV10171, updated to: 8DV101F0
Oddly enough, its release notes don't mention any discard-related fix.

Thanks Eric for your help!
