Bug 110231 - [Regression] Crash at blk_queue_split+0x22a/0x490
Summary: [Regression] Crash at blk_queue_split+0x22a/0x490
Status: RESOLVED CODE_FIX
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: Intel Linux
Importance: P1 blocking
Assignee: Jens Axboe
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-01 15:35 UTC by Greg White
Modified: 2019-08-09 02:11 UTC
CC List: 3 users

See Also:
Kernel Version: 4.4-rc7
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Segment split patch (500 bytes, patch)
2016-01-02 02:56 UTC, Jens Axboe
Fix for split on first bio vector (602 bytes, application/octet-stream)
2016-01-03 16:15 UTC, Keith Busch
Re-attaching as a patch. (602 bytes, patch)
2016-01-03 16:17 UTC, Keith Busch
patch submitted to list (2.00 KB, patch)
2016-01-04 22:53 UTC, Keith Busch

Description Greg White 2016-01-01 15:35:29 UTC
This just started happening with mainline, but I bisected it back to the following commit:

commit d3805611130af9b911e908af9f67a3f64f4f0914
Author: Keith Busch <keith.busch@intel.com>
Date:   Tue Dec 22 15:48:44 2015 -0700

    block: Split bios on chunk boundaries
    
    For h/w that advertise their block storage's underlying chunk size, it's
    a big performance win to not submit commands that cross them. This patch
    uses that criteria if it is provided. If it is not provided, this patch
    uses the max sectors as before.
    
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@fb.com>
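
For context, a minimal C sketch of the capping that commit message describes (illustrative names, not the kernel's code): if the device reports a chunk size, an I/O starting at start_sector is limited so it does not cross the next chunk boundary; otherwise the old max_sectors limit applies. chunk_sectors is assumed to be a power of two, as the block layer requires.

/*
 * Illustrative sketch, not kernel source. Largest I/O (in sectors)
 * allowed at start_sector. chunk_sectors must be a power of two;
 * 0 means the device reported no chunk size.
 */
static unsigned int max_io_sectors(unsigned int max_sectors,
                                   unsigned int chunk_sectors,
                                   unsigned long long start_sector)
{
    unsigned int to_boundary;

    if (!chunk_sectors)
        return max_sectors;     /* no chunk limit: behave as before */

    /* Sectors left before the next chunk boundary. */
    to_boundary = chunk_sectors -
                  (unsigned int)(start_sector & (chunk_sectors - 1));

    return to_boundary < max_sectors ? to_boundary : max_sectors;
}

The oops below appears consistent with the new split code computing a zero-length first half when the very first bio vector already crosses the allowed length: bio_split() begins with a BUG_ON(sectors <= 0) sanity check, and the reported BUG at block/bio.c:1787 falls inside that function.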


[  938.125561] kernel BUG at block/bio.c:1787!
[  938.127100] invalid opcode: 0000 [#1] SMP 
[  938.128622] Modules linked in: zram
[  938.130128] CPU: 1 PID: 3424 Comm: rsync Tainted: G     U          4.4.0-rc7-GTW+ #1
[  938.131647] Hardware name: ASUS All Series/SABERTOOTH Z97 MARK 1, BIOS 2702 10/27/2015
[  938.133170] task: ffff8807f1126600 ti: ffff88080d2f0000 task.ti: ffff88080d2f0000
[  938.134692] RIP: 0010:[<ffffffff813dfa75>]  [<ffffffff813dfa75>] bio_split+0x65/0x70
[  938.136227] RSP: 0018:ffff88080d2f3a18  EFLAGS: 00010246
[  938.137753] RAX: 0000000000000000 RBX: 0000000000001000 RCX: ffff880819d18180
[  938.139281] RDX: 0000000002400000 RSI: 0000000000000000 RDI: ffff88078bf7ccc0
[  938.140787] RBP: ffff88080d2f3aa0 R08: ffff88081b740800 R09: 0000000000000004
[  938.142281] R10: ffff88078bf7ccc0 R11: 0000000000000000 R12: 000000000002b000
[  938.143761] R13: 0000000000000000 R14: 0000000000000015 R15: ffff880814b2d400
[  938.145239] FS:  00007f92b78ca700(0000) GS:ffff88083fa40000(0000) knlGS:0000000000000000
[  938.146724] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  938.148209] CR2: 000000000243c288 CR3: 00000007e26e7000 CR4: 00000000001406e0
[  938.149696] Stack:
[  938.151173]  ffffffff813ed34a ffff880819d18180 ffffe8ffffc41f00 ffff88080d2f3a58
[  938.152682]  ffff88078bf7ccc0 ffff88080d2f3ac8 00000000000dbafc 0000000000000000
[  938.154197]  0000000000000000 ffff88078bf7ccc0 ffff880818fbccc0 000000000001013d
[  938.155718] Call Trace:
[  938.157227]  [<ffffffff813ed34a>] ? blk_queue_split+0x22a/0x490
[  938.158761]  [<ffffffff813f29fc>] blk_mq_make_request+0x5c/0x390
[  938.160294]  [<ffffffff8128bdad>] ? do_mpage_readpage+0x42d/0x6e0
[  938.161822]  [<ffffffff813e71f3>] generic_make_request+0xd3/0x180
[  938.163345]  [<ffffffff813e7307>] submit_bio+0x67/0x140
[  938.164872]  [<ffffffff8128c19a>] mpage_readpages+0x13a/0x160
[  938.166402]  [<ffffffff8132b610>] ? fat_detach+0xd0/0xd0
[  938.167934]  [<ffffffff8132b610>] ? fat_detach+0xd0/0xd0
[  938.169456]  [<ffffffff8123876c>] ? alloc_pages_current+0x8c/0x110
[  938.170984]  [<ffffffff8132b84d>] fat_readpages+0x1d/0x20
[  938.172497]  [<ffffffff811fb8c8>] __do_page_cache_readahead+0x168/0x200
[  938.174001]  [<ffffffff811fba30>] ondemand_readahead+0xd0/0x250
[  938.175503]  [<ffffffff811fbd9e>] page_cache_sync_readahead+0x2e/0x50
[  938.177015]  [<ffffffff811f043f>] generic_file_read_iter+0x46f/0x570
[  938.178533]  [<ffffffff811fd4b7>] ? lru_cache_add_active_or_unevictable+0x27/0x80
[  938.180060]  [<ffffffff8121b644>] ? handle_mm_fault+0xe04/0x1440
[  938.181585]  [<ffffffff81250dc7>] __vfs_read+0xa7/0xd0
[  938.183101]  [<ffffffff81251566>] vfs_read+0x86/0x130
[  938.184612]  [<ffffffff81252216>] SyS_read+0x46/0xb0
[  938.186115]  [<ffffffff819de3b6>] entry_SYSCALL_64_fastpath+0x16/0x75
[  938.187618] Code: 4d 85 ed 74 12 44 89 e6 48 89 df c1 e6 09 41 89 75 28 e8 bf f1 ff ff 5b 4c 89 e8 41 5c 41 5d 5d c3 e8 b0 fc ff ff 49 89 c5 eb d5 <0f> 0b 0f 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 
[  938.189260] RIP  [<ffffffff813dfa75>] bio_split+0x65/0x70
[  938.190843]  RSP <ffff88080d2f3a18>
[  938.211401] ---[ end trace 1f58ea74114814ec ]---
[  938.212799] ------------[ cut here ]------------
[  938.212801] WARNING: CPU: 1 PID: 3424 at kernel/exit.c:661 do_exit+0x50/0xac0()
[  938.212802] Modules linked in: zram
[  938.212803] CPU: 1 PID: 3424 Comm: rsync Tainted: G     UD         4.4.0-rc7-GTW+ #1
[  938.212804] Hardware name: ASUS All Series/SABERTOOTH Z97 MARK 1, BIOS 2702 10/27/2015
[  938.212805]  ffffffff81c2030d ffff88080d2f3738 ffffffff8140eaa4 0000000000000000
[  938.212806]  ffff88080d2f3770 ffffffff8111c662 ffff8807f1126600 000000000000000b
[  938.212807]  ffff88080d2f3968 0000000000000000 0000000000000000 ffff88080d2f3780
[  938.212808] Call Trace:
[  938.212810]  [<ffffffff8140eaa4>] dump_stack+0x44/0x60
[  938.212812]  [<ffffffff8111c662>] warn_slowpath_common+0x82/0xc0
[  938.212813]  [<ffffffff8111c75a>] warn_slowpath_null+0x1a/0x20
[  938.212814]  [<ffffffff8111de00>] do_exit+0x50/0xac0
[  938.212816]  [<ffffffff81053ac1>] oops_end+0xa1/0xd0
[  938.212817]  [<ffffffff81053c1b>] die+0x4b/0x70
[  938.212818]  [<ffffffff81050f91>] do_trap+0xb1/0x140
[  938.212819]  [<ffffffff81051097>] do_error_trap+0x77/0xe0
[  938.212820]  [<ffffffff813dfa75>] ? bio_split+0x65/0x70
[  938.212823]  [<ffffffff812824ea>] ? __find_get_block+0xaa/0x100
[  938.212824]  [<ffffffff811f1265>] ? mempool_alloc_slab+0x15/0x20
[  938.212825]  [<ffffffff811f1265>] ? mempool_alloc_slab+0x15/0x20
[  938.212826]  [<ffffffff81051370>] do_invalid_op+0x20/0x30
[  938.212827]  [<ffffffff819dfdae>] invalid_op+0x1e/0x30
[  938.212828]  [<ffffffff813dfa75>] ? bio_split+0x65/0x70
[  938.212830]  [<ffffffff813ef900>] ? __blk_mq_alloc_request+0xe0/0x1e0
[  938.212831]  [<ffffffff813ed34a>] ? blk_queue_split+0x22a/0x490
[  938.212832]  [<ffffffff813f29fc>] blk_mq_make_request+0x5c/0x390
[  938.212833]  [<ffffffff8128bdad>] ? do_mpage_readpage+0x42d/0x6e0
[  938.212835]  [<ffffffff813e71f3>] generic_make_request+0xd3/0x180
[  938.212836]  [<ffffffff813e7307>] submit_bio+0x67/0x140
[  938.212837]  [<ffffffff8128c19a>] mpage_readpages+0x13a/0x160
[  938.212838]  [<ffffffff8132b610>] ? fat_detach+0xd0/0xd0
[  938.212839]  [<ffffffff8132b610>] ? fat_detach+0xd0/0xd0
[  938.212841]  [<ffffffff8123876c>] ? alloc_pages_current+0x8c/0x110
[  938.212842]  [<ffffffff8132b84d>] fat_readpages+0x1d/0x20
[  938.212844]  [<ffffffff811fb8c8>] __do_page_cache_readahead+0x168/0x200
[  938.212845]  [<ffffffff811fba30>] ondemand_readahead+0xd0/0x250
[  938.212846]  [<ffffffff811fbd9e>] page_cache_sync_readahead+0x2e/0x50
[  938.212847]  [<ffffffff811f043f>] generic_file_read_iter+0x46f/0x570
[  938.212848]  [<ffffffff811fd4b7>] ? lru_cache_add_active_or_unevictable+0x27/0x80
[  938.212850]  [<ffffffff8121b644>] ? handle_mm_fault+0xe04/0x1440
[  938.212851]  [<ffffffff81250dc7>] __vfs_read+0xa7/0xd0
[  938.212852]  [<ffffffff81251566>] vfs_read+0x86/0x130
[  938.212853]  [<ffffffff81252216>] SyS_read+0x46/0xb0
[  938.212854]  [<ffffffff819de3b6>] entry_SYSCALL_64_fastpath+0x16/0x75
Comment 1 Greg White 2016-01-01 15:36:38 UTC
The block device in question is an Intel 750 NVMe SSD:

02:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Intel Corporation Device 370d
        Flags: bus master, fast devsel, latency 0, NUMA node 0
        Memory at dfe10000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at dfe00000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable+ Count=32 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [180] Power Budgeting <?>
        Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [270] Device Serial Number 55-cd-2e-41-4c-88-d1-e8
        Capabilities: [2a0] #19
        Kernel driver in use: nvme
Comment 2 Greg White 2016-01-01 15:47:41 UTC
Reverting that single commit seems to fix the problem with mainline. I have what seems to be a consistent way to reproduce this (building the kernel, aptly enough).
Comment 3 Jens Axboe 2016-01-02 02:41:07 UTC
Thanks, we'll take a look.
Comment 4 Jens Axboe 2016-01-02 02:56:09 UTC
Created attachment 198601 [details]
Segment split patch

Can you try this patch?
Comment 5 Greg White 2016-01-02 03:06:05 UTC
Thanks.

It no longer seems to reproduce with that patch applied.
Comment 6 Keith Busch 2016-01-03 16:15:55 UTC
Created attachment 198681 [details]
Fix for split on first bio vector

Thanks for the catch. This fails xfstests as well.

I've attached an alternative fix that still splits the command. For performance on this hardware, it's preferable that such commands be split.
Comment 7 Keith Busch 2016-01-03 16:17:57 UTC
Created attachment 198691 [details]
Re-attaching as a patch.
Comment 8 Greg White 2016-01-03 16:21:45 UTC
Retested with patch #2.  This also seems to work.
Comment 9 Keith Busch 2016-01-04 15:29:17 UTC
Great, thanks!

I'll sync with Jens this week to see which route to go. I recommend mine for a couple of reasons. A bio can be split in the middle of a vector, so we might as well split at the preferred alignment instead of requiring the driver to accept the entire vector. And I think there's an issue in Jens' patch (perhaps only in theory) if the first bio vector's length is greater than the h/w's max transfer size.
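
To illustrate the mid-vector split described above, a small self-contained sketch (hypothetical names, not the posted patch): when adding a vector would exceed the allowed length, keep only the sectors that fit and split there, rather than rounding the split point to a whole vector.

#include <stdio.h>

/*
 * Illustrative sketch: how many sectors of the current vector to keep
 * in the first half of a split. Assumes done_sectors < max_sectors,
 * i.e. the caller splits before the limit is exceeded.
 */
static unsigned int sectors_to_keep(unsigned int done_sectors,
                                    unsigned int vec_sectors,
                                    unsigned int max_sectors)
{
    if (done_sectors + vec_sectors <= max_sectors)
        return vec_sectors;             /* whole vector fits */
    return max_sectors - done_sectors;  /* split mid-vector at the limit */
}

int main(void)
{
    /* 240 sectors already queued, a 32-sector vector, 256-sector limit:
     * keep 16 sectors of this vector and split the rest off. Prints 16. */
    printf("%u\n", sectors_to_keep(240, 32, 256));
    return 0;
}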
Comment 10 Keith Busch 2016-01-04 16:51:57 UTC
I think there's potential for my patch to report the wrong segment count. I'll fix that up and resend to the mailing list after a successful xfstests run.
Comment 11 Jens Axboe 2016-01-04 22:46:29 UTC
Keith, your approach is the best one, for sure. Let me know when you have the segment part tested, and I can queue up the fix.
Comment 12 Keith Busch 2016-01-04 22:53:10 UTC
Created attachment 198751 [details]
patch submitted to list

This one passed the xfstests case that was failing before.

The previous patch passed too, but I think that was more of a coincidence: we still need to split on SG page gaps, which the earlier version didn't take into account.
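
For reference, a sketch of the kind of SG gap check meant here (illustrative names, modeled on the block layer's virt-boundary test): a device with a virtual boundary (e.g. the NVMe PRP page size) cannot merge two adjacent bio vectors into one scatter/gather entry unless the first ends on the boundary and the second starts on one; otherwise the bio must be split between them.

#include <stdbool.h>

struct vec { unsigned int offset, len; };

/*
 * Illustrative sketch, not the posted patch. True if prev and next
 * leave a gap relative to the device's virtual boundary mask and so
 * cannot share a scatter/gather segment. A mask of 0 means the device
 * imposes no SG gap constraint.
 */
static bool sg_gap_between(const struct vec *prev, const struct vec *next,
                           unsigned long virt_boundary_mask)
{
    if (!virt_boundary_mask)
        return false;

    /* Nonzero if prev doesn't end, or next doesn't start, on the boundary. */
    return ((prev->offset + prev->len) | next->offset) & virt_boundary_mask;
}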
Comment 13 Jens Axboe 2016-01-04 22:54:22 UTC
Thanks, I saw that right after writing here. Looks good to me, queued up.
Comment 14 Marcos Souza 2019-08-09 02:11:25 UTC
Closing as fixed.
