Hello, I'm using a caching SSD with a backing RAID1 HDDs, writeback mode, formatted as ext4 In order to trigger this bug, fill the cache with some dirty data. I use writeback_percent at it's default (10) and writeback_rate at 512. When there is some dirty data, wait for the bcache_writeback to start writing it to the backing device (0-3 minutes) Then, this happens: [ 83.436844] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 [ 83.437025] IP: [<ffffffff81309ce5>] bio_trim+0x15/0xe0 [ 83.437128] PGD 0 [ 83.437227] Oops: 0000 [#1] SMP [ 83.437373] Modules linked in: [ 83.437472] CPU: 0 PID: 3160 Comm: kworker/0:3 Not tainted 4.4.0-bcache+ #2 [ 83.437524] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H77 Pro4-M, BIOS P1.60 09/20/2012 [ 83.437580] Workqueue: events write_dirty [ 83.437679] task: ffff88040ae6d240 ti: ffff88040b814000 task.ti: ffff88040b814000 [ 83.437731] RIP: 0010:[<ffffffff81309ce5>] [<ffffffff81309ce5>] bio_trim+0x15/0xe0 [ 83.437833] RSP: 0018:ffff88040b817c68 EFLAGS: 00010246 [ 83.437884] RAX: ffff88040b852038 RBX: 0000000000000001 RCX: 000000000000560b [ 83.437936] RDX: 0000000000113000 RSI: 0000000000000000 RDI: 0000000000000000 [ 83.437988] RBP: ffff88040b817c88 R08: 0000000000000000 R09: ffffffff8114d492 [ 83.438041] R10: ffff88041ec182e0 R11: ffffea00102e37c0 R12: 0000000000000000 [ 83.438093] R13: 0000000000113000 R14: ffff88040a00a500 R15: ffff880037b5a000 [ 83.438145] FS: 0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000 [ 83.438361] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 83.438497] CR2: 0000000000000028 CR3: 000000000198e000 CR4: 00000000001406f0 [ 83.438679] Stack: [ 83.438865] 0000000000000001 0000000000000000 0000000000000000 ffff88040a00a500 [ 83.439286] ffff88040b817d60 ffffffff815822af ffff88040c82c9e8 ffff88040af11000 [ 83.439707] ffff88040c82c9e8 ffff880037b5a070 0000000000003000 ffff880400000898 [ 83.440128] Call Trace: [ 83.440270] [<ffffffff815822af>] make_request+0x47f/0xca0 [ 83.440409] [<ffffffff815a5d3d>] md_make_request+0xdd/0x220 [ 83.440543] [<ffffffff810c53ce>] ? pick_next_task_fair+0x12e/0x450 [ 83.440676] [<ffffffff81311d2e>] generic_make_request+0xce/0x1b0 [ 83.440809] [<ffffffff815a1b70>] write_dirty+0x60/0xb0 [ 83.440944] [<ffffffff810aafa7>] process_one_work+0x147/0x3d0 [ 83.441077] [<ffffffff810ab546>] worker_thread+0x46/0x440 [ 83.441212] [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0 [ 83.441396] [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0 [ 83.441602] [<ffffffff810afeb4>] kthread+0xc4/0xe0 [ 83.441785] [<ffffffff810afdf0>] ? kthread_park+0x50/0x50 [ 83.441970] [<ffffffff8170c85f>] ret_from_fork+0x3f/0x70 [ 83.442154] [<ffffffff810afdf0>] ? kthread_park+0x50/0x50 [ 83.442336] Code: bb d8 ff c6 05 c0 b1 75 00 01 e9 7c ff ff ff 0f 1f 80 00 00 00 00 55 c1 e2 09 85 f6 48 89 e5 41 56 41 55 41 89 d5 41 54 53 75 05 <3b> 57 28 74 26 83 67 10 fd 41 89 f4 81 e6 ff ff 7f 00 48 89 fb [ 83.446617] RIP [<ffffffff81309ce5>] bio_trim+0x15/0xe0 [ 83.446853] RSP <ffff88040b817c68> [ 83.447028] CR2: 0000000000000028 [ 83.447203] ---[ end trace ea4f5743ef3a93f1 ]--- [ 83.447401] BUG: unable to handle kernel paging request at ffffffffffffffd8 [ 83.447696] IP: [<ffffffff810b059c>] kthread_data+0xc/0x20 [ 83.447697] PGD 198f067 PUD 1991067 PMD 0 [ 83.447698] Oops: 0000 [#2] SMP [ 83.447698] Modules linked in: [ 83.447699] CPU: 0 PID: 3160 Comm: kworker/0:3 Tainted: G D 4.4.0-bcache+ #2 [ 83.447700] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H77 Pro4-M, BIOS P1.60 09/20/2012 [ 83.447705] task: ffff88040ae6d240 ti: ffff88040b814000 task.ti: ffff88040b814000 [ 83.447706] RIP: 0010:[<ffffffff810b059c>] [<ffffffff810b059c>] kthread_data+0xc/0x20 [ 83.447707] RSP: 0018:ffff88040b817980 EFLAGS: 00010002 [ 83.447707] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81bad900 [ 83.447708] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040ae6d240 [ 83.447708] RBP: ffff88040b817990 R08: 0000000000000000 R09: ffffffff81316701 [ 83.447708] R10: 0000000000003800 R11: ffffea0010295300 R12: ffff88041ec15180 [ 83.447709] R13: 0000000000000000 R14: ffff88040ae6d240 R15: 0000000000000000 [ 83.447709] FS: 0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000 [ 83.447709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 83.447710] CR2: 0000000000000028 CR3: 000000000198e000 CR4: 00000000001406f0 [ 83.447710] Stack: [ 83.447711] ffffffff810abb1c 0000000000015180 ffff88040b8179e8 ffffffff81708d06 [ 83.447712] 0000000000000000 ffff88040ae6d240 ffff88040a54c760 ffff88040b8179e8 [ 83.447712] ffff88040b818000 ffff88040b817a38 ffff88040b8176c8 0000000000000001 [ 83.447713] Call Trace: [ 83.447715] [<ffffffff810abb1c>] ? wq_worker_sleeping+0xc/0x90 [ 83.447716] [<ffffffff81708d06>] __schedule+0x746/0x960 [ 83.447717] [<ffffffff81708f50>] schedule+0x30/0x80 [ 83.447719] [<ffffffff8109813a>] do_exit+0x6fa/0xa80 [ 83.447721] [<ffffffff8104fd08>] oops_end+0x68/0x90 [ 83.447723] [<ffffffff810893c4>] no_context+0x104/0x360 [ 83.447725] [<ffffffff810bef00>] ? select_task_rq_fair+0x6c0/0x7c0 [ 83.447726] [<ffffffff81089690>] __bad_area_nosemaphore+0x70/0x1f0 [ 83.447728] [<ffffffff8108981e>] bad_area_nosemaphore+0xe/0x10 [ 83.447729] [<ffffffff81089a6d>] __do_page_fault+0x7d/0x3c0 [ 83.447731] [<ffffffff8114d470>] ? mempool_alloc_slab+0x10/0x20 [ 83.447731] [<ffffffff8114d552>] ? mempool_alloc+0x52/0x130 [ 83.447732] [<ffffffff81089dbc>] do_page_fault+0xc/0x10 [ 83.447734] [<ffffffff8170e1c2>] page_fault+0x22/0x30 [ 83.447735] [<ffffffff8114d492>] ? mempool_free_slab+0x12/0x20 [ 83.447736] [<ffffffff81309ce5>] ? bio_trim+0x15/0xe0 [ 83.447738] [<ffffffff815822af>] make_request+0x47f/0xca0 [ 83.447740] [<ffffffff815a5d3d>] md_make_request+0xdd/0x220 [ 83.447741] [<ffffffff810c53ce>] ? pick_next_task_fair+0x12e/0x450 [ 83.447743] [<ffffffff81311d2e>] generic_make_request+0xce/0x1b0 [ 83.447746] [<ffffffff815a1b70>] write_dirty+0x60/0xb0 [ 83.447747] [<ffffffff810aafa7>] process_one_work+0x147/0x3d0 [ 83.447748] [<ffffffff810ab546>] worker_thread+0x46/0x440 [ 83.447749] [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0 [ 83.447750] [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0 [ 83.447751] [<ffffffff810afeb4>] kthread+0xc4/0xe0 [ 83.447751] [<ffffffff810afdf0>] ? kthread_park+0x50/0x50 [ 83.447753] [<ffffffff8170c85f>] ret_from_fork+0x3f/0x70 [ 83.447753] [<ffffffff810afdf0>] ? kthread_park+0x50/0x50 [ 83.447762] Code: 91 73 81 89 f0 48 c1 e0 03 48 29 c2 48 89 d6 ba 02 00 00 00 e8 56 fa ff ff 5d c3 0f 1f 40 00 48 8b 87 e8 03 00 00 55 48 89 e5 5d <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 [ 83.447763] RIP [<ffffffff810b059c>] kthread_data+0xc/0x20 [ 83.447763] RSP <ffff88040b817980> [ 83.447763] CR2: ffffffffffffffd8 [ 83.447764] ---[ end trace ea4f5743ef3a93f2 ]--- [ 83.447764] Fixing recursive fault but reboot is needed! In this version I got the "stability patches" for bcache from linux-next, but the problem was present in stock 4.4.0. This is the bisect I have done and reverting commit 578270bfbd2803dc7b0b03fbc2ac119efbc73195 does fix the problem. git bisect start # good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3 git bisect good 6a13feb9c82803e2b815eca72fa7a9f5561d7861 # bad: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4 git bisect bad afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc # good: [f66477a0aeb77f97a7de5f791700dadc42f3f792] Merge tag 'clk-for-linus-20151104' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux git bisect good f66477a0aeb77f97a7de5f791700dadc42f3f792 # good: [56e0464980febfa50432a070261579415c72664e] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc git bisect good 56e0464980febfa50432a070261579415c72664e # good: [34258a32d9a9fc9e38fb549efe1692301cc31f85] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux git bisect good 34258a32d9a9fc9e38fb549efe1692301cc31f85 # bad: [fb7b26e47ef932a21f0cac4cf04e8f51105d77d7] Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad fb7b26e47ef932a21f0cac4cf04e8f51105d77d7 # bad: [c64410f3eca037f831e9db7cf880e600acc841af] Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc git bisect bad c64410f3eca037f831e9db7cf880e600acc841af # good: [0ec7dc8d19683ae292a4866c155824b647226103] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus git bisect good 0ec7dc8d19683ae292a4866c155824b647226103 # bad: [dcd8376c369fa8fde8269e721b14f50475dd397b] Revert "blk-flush: Queue through IO scheduler when flush not required" git bisect bad dcd8376c369fa8fde8269e721b14f50475dd397b # good: [dcfeda9d5f52bf43ae4297d8102a76d88b778d93] Merge tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty git bisect good dcfeda9d5f52bf43ae4297d8102a76d88b778d93 # good: [8aeea03195ee6e33fcb00039c414eabfc37a4eb8] mtip32xx: use formatting capability of kthread_create_on_node git bisect good 8aeea03195ee6e33fcb00039c414eabfc37a4eb8 # bad: [4ce01c518ebf6e3818abf5f4a9e1a1ef9d52f232] Merge branch 'for-linus' of git://git.kernel.dk/linux-block git bisect bad 4ce01c518ebf6e3818abf5f4a9e1a1ef9d52f232 # good: [1ec218373b8ebda821aec00bb156a9c94fad9cd4] Linux 4.4-rc2 git bisect good 1ec218373b8ebda821aec00bb156a9c94fad9cd4 # bad: [02e707424c2eadbcda68cd38876c9f4434ca8e1a] blk-merge: fix blk_bio_segment_split git bisect bad 02e707424c2eadbcda68cd38876c9f4434ca8e1a # good: [b094f89ca42fbb8ce40174d5f85ca8430e499da6] blk-mq: fix calling unplug callbacks with preempt disabled git bisect good b094f89ca42fbb8ce40174d5f85ca8430e499da6 # bad: [578270bfbd2803dc7b0b03fbc2ac119efbc73195] block: fix segment split git bisect bad 578270bfbd2803dc7b0b03fbc2ac119efbc73195 # first bad commit: [578270bfbd2803dc7b0b03fbc2ac119efbc73195] block: fix segment split
Created attachment 199711 [details] debug patch for checking source bio's segments
The bisected commit should be a false positive, and it is simple and correct. 0028 should be offset of 'bio->bi_iter.bi_size', so looks 'mbio' cloned from bio_clone_mddev() is NULL, could you test the attached patch? Then reproduce and post the log? Thanks,
Does this problem still show up in Linux v5.5 or v5.6-rc ? Thanks. Coly Li