Bug 110771 - bcache: Panic when writing data to backing device (bisected)
Summary: bcache: Panic when writing data to backing device (bisected)
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Other (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: Michael Lyle
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-13 22:28 UTC by miuvlad
Modified: 2020-02-15 17:16 UTC (History)
4 users (show)

See Also:
Kernel Version: 4.4.0
Subsystem:
Regression: No
Bisected commit-id:


Attachments
debug patch for checking source bio's segments (412 bytes, patch)
2016-01-15 08:21 UTC, Lei Ming
Details | Diff

Description miuvlad 2016-01-13 22:28:59 UTC
Hello,

I'm using a caching SSD with a backing RAID1 HDDs, writeback mode, formatted as ext4
In order to trigger this bug, fill the cache with some dirty data.
I use writeback_percent at it's default (10) and writeback_rate at 512.
When there is some dirty data, wait for the bcache_writeback to start writing it to the backing device (0-3 minutes)
Then, this happens:

[   83.436844] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[   83.437025] IP: [<ffffffff81309ce5>] bio_trim+0x15/0xe0
[   83.437128] PGD 0
[   83.437227] Oops: 0000 [#1] SMP
[   83.437373] Modules linked in:
[   83.437472] CPU: 0 PID: 3160 Comm: kworker/0:3 Not tainted 4.4.0-bcache+ #2
[   83.437524] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H77 Pro4-M, BIOS P1.60 09/20/2012
[   83.437580] Workqueue: events write_dirty
[   83.437679] task: ffff88040ae6d240 ti: ffff88040b814000 task.ti: ffff88040b814000
[   83.437731] RIP: 0010:[<ffffffff81309ce5>]  [<ffffffff81309ce5>] bio_trim+0x15/0xe0
[   83.437833] RSP: 0018:ffff88040b817c68  EFLAGS: 00010246
[   83.437884] RAX: ffff88040b852038 RBX: 0000000000000001 RCX: 000000000000560b
[   83.437936] RDX: 0000000000113000 RSI: 0000000000000000 RDI: 0000000000000000
[   83.437988] RBP: ffff88040b817c88 R08: 0000000000000000 R09: ffffffff8114d492
[   83.438041] R10: ffff88041ec182e0 R11: ffffea00102e37c0 R12: 0000000000000000
[   83.438093] R13: 0000000000113000 R14: ffff88040a00a500 R15: ffff880037b5a000
[   83.438145] FS:  0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000
[   83.438361] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.438497] CR2: 0000000000000028 CR3: 000000000198e000 CR4: 00000000001406f0
[   83.438679] Stack:
[   83.438865]  0000000000000001 0000000000000000 0000000000000000 ffff88040a00a500
[   83.439286]  ffff88040b817d60 ffffffff815822af ffff88040c82c9e8 ffff88040af11000
[   83.439707]  ffff88040c82c9e8 ffff880037b5a070 0000000000003000 ffff880400000898
[   83.440128] Call Trace:
[   83.440270]  [<ffffffff815822af>] make_request+0x47f/0xca0
[   83.440409]  [<ffffffff815a5d3d>] md_make_request+0xdd/0x220
[   83.440543]  [<ffffffff810c53ce>] ? pick_next_task_fair+0x12e/0x450
[   83.440676]  [<ffffffff81311d2e>] generic_make_request+0xce/0x1b0
[   83.440809]  [<ffffffff815a1b70>] write_dirty+0x60/0xb0
[   83.440944]  [<ffffffff810aafa7>] process_one_work+0x147/0x3d0
[   83.441077]  [<ffffffff810ab546>] worker_thread+0x46/0x440
[   83.441212]  [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0
[   83.441396]  [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0
[   83.441602]  [<ffffffff810afeb4>] kthread+0xc4/0xe0
[   83.441785]  [<ffffffff810afdf0>] ? kthread_park+0x50/0x50
[   83.441970]  [<ffffffff8170c85f>] ret_from_fork+0x3f/0x70
[   83.442154]  [<ffffffff810afdf0>] ? kthread_park+0x50/0x50
[   83.442336] Code: bb d8 ff c6 05 c0 b1 75 00 01 e9 7c ff ff ff 0f 1f 80 00 00 00 00 55 c1 e2 09 85 f6 48 89 e5 41 56 41 55 41 89 d5 41 54 53 75 05 <3b> 57 28 74 26 83 67 10 fd 41 89 f4 81 e6 ff ff 7f 00 48 89 fb
[   83.446617] RIP  [<ffffffff81309ce5>] bio_trim+0x15/0xe0
[   83.446853]  RSP <ffff88040b817c68>
[   83.447028] CR2: 0000000000000028
[   83.447203] ---[ end trace ea4f5743ef3a93f1 ]---
[   83.447401] BUG: unable to handle kernel paging request at ffffffffffffffd8
[   83.447696] IP: [<ffffffff810b059c>] kthread_data+0xc/0x20
[   83.447697] PGD 198f067 PUD 1991067 PMD 0
[   83.447698] Oops: 0000 [#2] SMP
[   83.447698] Modules linked in:
[   83.447699] CPU: 0 PID: 3160 Comm: kworker/0:3 Tainted: G      D         4.4.0-bcache+ #2
[   83.447700] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H77 Pro4-M, BIOS P1.60 09/20/2012
[   83.447705] task: ffff88040ae6d240 ti: ffff88040b814000 task.ti: ffff88040b814000
[   83.447706] RIP: 0010:[<ffffffff810b059c>]  [<ffffffff810b059c>] kthread_data+0xc/0x20
[   83.447707] RSP: 0018:ffff88040b817980  EFLAGS: 00010002
[   83.447707] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff81bad900
[   83.447708] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88040ae6d240
[   83.447708] RBP: ffff88040b817990 R08: 0000000000000000 R09: ffffffff81316701
[   83.447708] R10: 0000000000003800 R11: ffffea0010295300 R12: ffff88041ec15180
[   83.447709] R13: 0000000000000000 R14: ffff88040ae6d240 R15: 0000000000000000
[   83.447709] FS:  0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000
[   83.447709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   83.447710] CR2: 0000000000000028 CR3: 000000000198e000 CR4: 00000000001406f0
[   83.447710] Stack:
[   83.447711]  ffffffff810abb1c 0000000000015180 ffff88040b8179e8 ffffffff81708d06
[   83.447712]  0000000000000000 ffff88040ae6d240 ffff88040a54c760 ffff88040b8179e8
[   83.447712]  ffff88040b818000 ffff88040b817a38 ffff88040b8176c8 0000000000000001
[   83.447713] Call Trace:
[   83.447715]  [<ffffffff810abb1c>] ? wq_worker_sleeping+0xc/0x90
[   83.447716]  [<ffffffff81708d06>] __schedule+0x746/0x960
[   83.447717]  [<ffffffff81708f50>] schedule+0x30/0x80
[   83.447719]  [<ffffffff8109813a>] do_exit+0x6fa/0xa80
[   83.447721]  [<ffffffff8104fd08>] oops_end+0x68/0x90
[   83.447723]  [<ffffffff810893c4>] no_context+0x104/0x360
[   83.447725]  [<ffffffff810bef00>] ? select_task_rq_fair+0x6c0/0x7c0
[   83.447726]  [<ffffffff81089690>] __bad_area_nosemaphore+0x70/0x1f0
[   83.447728]  [<ffffffff8108981e>] bad_area_nosemaphore+0xe/0x10
[   83.447729]  [<ffffffff81089a6d>] __do_page_fault+0x7d/0x3c0
[   83.447731]  [<ffffffff8114d470>] ? mempool_alloc_slab+0x10/0x20
[   83.447731]  [<ffffffff8114d552>] ? mempool_alloc+0x52/0x130
[   83.447732]  [<ffffffff81089dbc>] do_page_fault+0xc/0x10
[   83.447734]  [<ffffffff8170e1c2>] page_fault+0x22/0x30
[   83.447735]  [<ffffffff8114d492>] ? mempool_free_slab+0x12/0x20
[   83.447736]  [<ffffffff81309ce5>] ? bio_trim+0x15/0xe0
[   83.447738]  [<ffffffff815822af>] make_request+0x47f/0xca0
[   83.447740]  [<ffffffff815a5d3d>] md_make_request+0xdd/0x220
[   83.447741]  [<ffffffff810c53ce>] ? pick_next_task_fair+0x12e/0x450
[   83.447743]  [<ffffffff81311d2e>] generic_make_request+0xce/0x1b0
[   83.447746]  [<ffffffff815a1b70>] write_dirty+0x60/0xb0
[   83.447747]  [<ffffffff810aafa7>] process_one_work+0x147/0x3d0
[   83.447748]  [<ffffffff810ab546>] worker_thread+0x46/0x440
[   83.447749]  [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0
[   83.447750]  [<ffffffff810ab500>] ? rescuer_thread+0x2d0/0x2d0
[   83.447751]  [<ffffffff810afeb4>] kthread+0xc4/0xe0
[   83.447751]  [<ffffffff810afdf0>] ? kthread_park+0x50/0x50
[   83.447753]  [<ffffffff8170c85f>] ret_from_fork+0x3f/0x70
[   83.447753]  [<ffffffff810afdf0>] ? kthread_park+0x50/0x50
[   83.447762] Code: 91 73 81 89 f0 48 c1 e0 03 48 29 c2 48 89 d6 ba 02 00 00 00 e8 56 fa ff ff 5d c3 0f 1f 40 00 48 8b 87 e8 03 00 00 55 48 89 e5 5d <48> 8b 40 d8 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55
[   83.447763] RIP  [<ffffffff810b059c>] kthread_data+0xc/0x20
[   83.447763]  RSP <ffff88040b817980>
[   83.447763] CR2: ffffffffffffffd8
[   83.447764] ---[ end trace ea4f5743ef3a93f2 ]---
[   83.447764] Fixing recursive fault but reboot is needed!

In this version I got the "stability patches" for bcache from linux-next, but the problem was present in stock 4.4.0.
This is the bisect I have done and reverting commit 578270bfbd2803dc7b0b03fbc2ac119efbc73195 does fix the problem.

git bisect start
# good: [6a13feb9c82803e2b815eca72fa7a9f5561d7861] Linux 4.3
git bisect good 6a13feb9c82803e2b815eca72fa7a9f5561d7861
# bad: [afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc] Linux 4.4
git bisect bad afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc
# good: [f66477a0aeb77f97a7de5f791700dadc42f3f792] Merge tag 'clk-for-linus-20151104' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
git bisect good f66477a0aeb77f97a7de5f791700dadc42f3f792
# good: [56e0464980febfa50432a070261579415c72664e] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect good 56e0464980febfa50432a070261579415c72664e
# good: [34258a32d9a9fc9e38fb549efe1692301cc31f85] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect good 34258a32d9a9fc9e38fb549efe1692301cc31f85
# bad: [fb7b26e47ef932a21f0cac4cf04e8f51105d77d7] Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad fb7b26e47ef932a21f0cac4cf04e8f51105d77d7
# bad: [c64410f3eca037f831e9db7cf880e600acc841af] Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad c64410f3eca037f831e9db7cf880e600acc841af
# good: [0ec7dc8d19683ae292a4866c155824b647226103] Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
git bisect good 0ec7dc8d19683ae292a4866c155824b647226103
# bad: [dcd8376c369fa8fde8269e721b14f50475dd397b] Revert "blk-flush: Queue through IO scheduler when flush not required"
git bisect bad dcd8376c369fa8fde8269e721b14f50475dd397b
# good: [dcfeda9d5f52bf43ae4297d8102a76d88b778d93] Merge tag 'tty-4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good dcfeda9d5f52bf43ae4297d8102a76d88b778d93
# good: [8aeea03195ee6e33fcb00039c414eabfc37a4eb8] mtip32xx: use formatting capability of kthread_create_on_node
git bisect good 8aeea03195ee6e33fcb00039c414eabfc37a4eb8
# bad: [4ce01c518ebf6e3818abf5f4a9e1a1ef9d52f232] Merge branch 'for-linus' of git://git.kernel.dk/linux-block
git bisect bad 4ce01c518ebf6e3818abf5f4a9e1a1ef9d52f232
# good: [1ec218373b8ebda821aec00bb156a9c94fad9cd4] Linux 4.4-rc2
git bisect good 1ec218373b8ebda821aec00bb156a9c94fad9cd4
# bad: [02e707424c2eadbcda68cd38876c9f4434ca8e1a] blk-merge: fix blk_bio_segment_split
git bisect bad 02e707424c2eadbcda68cd38876c9f4434ca8e1a
# good: [b094f89ca42fbb8ce40174d5f85ca8430e499da6] blk-mq: fix calling unplug callbacks with preempt disabled
git bisect good b094f89ca42fbb8ce40174d5f85ca8430e499da6
# bad: [578270bfbd2803dc7b0b03fbc2ac119efbc73195] block: fix segment split
git bisect bad 578270bfbd2803dc7b0b03fbc2ac119efbc73195
# first bad commit: [578270bfbd2803dc7b0b03fbc2ac119efbc73195] block: fix segment split
Comment 1 Lei Ming 2016-01-15 08:21:34 UTC
Created attachment 199711 [details]
debug patch for checking source bio's segments
Comment 2 Lei Ming 2016-01-15 08:23:47 UTC
The bisected commit should be a false positive, and it is simple and correct.

0028 should be offset of 'bio->bi_iter.bi_size', so looks 'mbio' cloned from
bio_clone_mddev() is NULL, could you test the attached patch? Then reproduce
and post the log?

Thanks,
Comment 3 Coly Li 2020-02-15 17:16:34 UTC
Does this problem still show up in Linux v5.5 or v5.6-rc ?

Thanks.

Coly Li

Note You need to log in before you can comment on or make changes to this bug.