Bug 207811

Summary: Regression: unable to handle kernel NULL pointer dereference (create_empty_buffers) on bcache-backed mdraid
Product: IO/Storage
Reporter: Ryan Finnie (ryan)
Component: Block Layer
Assignee: Coly Li (colyli)
Status: NEW
Severity: normal
CC: mauricio.foliveira, ryan
Priority: P1
Hardware: x86-64
OS: Linux
URL: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1867916
Kernel Version: 5.7rc6
Subsystem:
Regression: Yes
Bisected commit-id:

Description Ryan Finnie 2020-05-21 02:31:06 UTC
Downstream bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1867916

The downstream bug involves a regression from Ubuntu's 4.15.0-88 to 4.15.0-91, but I've determined that it also affects Ubuntu's 5.4 kernel line as well as HEAD. In short, the problem has been present since commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15, which was backported to Ubuntu's 4.15.0-91.

The relevant part of my block setup:

-> sd{c,d,f,g,h}: each 4TB gpt, sd{c,d,f,g,h}1: each type linux_raid_member
--> md0: raid6, sd{c,d,f,g,h}1
--> sda: 512GB gpt, sda1: type bcache
---> bcache0: md0 + sda1
----> whatadisk_crypt: LUKS on bcache0
-----> LVM VG whatadisk: whatadisk_crypt
------> multiple LVs

When I attempt to activate bcache0, I get:

[ 194.444436] bcache: bch_journal_replay() journal replay done, 3 keys in 6 entries, seq 23285862
[ 194.444622] bcache: register_cache() registered cache device sdb1
[ 194.448381] bcache: register_bdev() registered backing device md0
[ 194.602075] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 194.602100] IP: create_empty_buffers+0x29/0xf0
[ 194.602110] PGD 0 P4D 0
[ 194.602121] Oops: 0000 [#1] SMP NOPTI
[ 194.602137] Modules linked in: bcache ebtable_filter ebtables ip6table_filter ip6_tables devlink iptable_filter pps_ldisc aufs overlay cmac bnep bonding bridge stp llc arc4 snd_hda_codec_hdmi nls_iso8859_1 edac_mce_amd kvm_amd kvm irqbypass rtl8xxxu mac80211 btusb btrtl btbcm snd_hda_intel btintel snd_hda_codec bluetooth cfg80211 eeepc_wmi snd_hda_core asus_wmi joydev sparse_keymap snd_hwdep wmi_bmof input_leds snd_pcm ecdh_generic snd_timer snd soundcore ccp k10temp shpchp mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core nfsd iscsi_tcp libiscsi_tcp auth_rpcgss libiscsi nfs_acl scsi_transport_iscsi lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress algif_skcipher af_alg dm_crypt raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 194.602323] xor raid6_pq libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc mxm_wmi video ttm drm_kms_helper aesni_intel syscopyarea sysfillrect igb sysimgblt aes_x86_64 fb_sys_fops crypto_simd glue_helper dca cryptd i2c_algo_bit drm i2c_piix4 ptp ahci nvme pps_core libahci nvme_core gpio_amdpt wmi gpio_generic
[ 194.602406] CPU: 1 PID: 4403 Comm: bcache-register Not tainted 4.15.0-91-generic #92-Ubuntu
[ 194.602425] Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 4207 12/08/2018
[ 194.602447] RIP: 0010:create_empty_buffers+0x29/0xf0
[ 194.602459] RSP: 0018:ffffa833cc37f7a8 EFLAGS: 00010246
[ 194.602471] RAX: 0000000000000000 RBX: fffff6227c3f4c00 RCX: 0000000000000013
[ 194.602487] RDX: 0000000000000000 RSI: 0000000000080000 RDI: fffff6227c3f4c00
[ 194.602503] RBP: ffffa833cc37f7c0 R08: 0000000000000001 R09: dead0000000000ff
[ 194.602519] R10: ffff8976ca4f7aa0 R11: 0000000000000000 R12: 0000000000000000
[ 194.602535] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000200
[ 194.602551] FS: 00007f51c3bb2500(0000) GS:ffff89771ec40000(0000) knlGS:0000000000000000
[ 194.602569] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 194.602582] CR2: 0000000000000008 CR3: 0000000f24aea000 CR4: 00000000003406e0
[ 194.602598] Call Trace:
[ 194.602606] create_page_buffers+0x51/0x60
[ 194.602616] block_read_full_page+0x4e/0x370
[ 194.602626] ? set_init_blocksize+0x80/0x80
[ 194.602636] blkdev_readpage+0x18/0x20
[ 194.602646] do_read_cache_page+0x2a2/0x580
[ 194.602656] ? blkdev_writepages+0x40/0x40
[ 194.602667] ? update_load_avg+0x57f/0x6e0
[ 194.602676] read_cache_page+0x15/0x20
[ 194.602687] read_dev_sector+0x2d/0xd0
[ 194.602696] read_lba+0x130/0x220
[ 194.602704] ? efi_partition+0x11a/0x790
[ 194.602713] efi_partition+0x138/0x790
[ 194.602723] ? string+0x60/0x90
[ 194.602731] ? vsnprintf+0xfb/0x510
[ 194.602740] ? snprintf+0x45/0x70
[ 194.602748] ? is_gpt_valid.part.6+0x420/0x420
[ 194.602760] check_partition+0x130/0x230
[ 194.603304] ? check_partition+0x130/0x230
[ 194.603837] rescan_partitions+0xaa/0x350
[ 194.604372] bdev_disk_changed+0x53/0x60
[ 194.604895] __blkdev_get+0x34a/0x510
[ 194.605413] blkdev_get+0x129/0x320
[ 194.605930] ? wake_up_bit+0x42/0x50
[ 194.606445] ? unlock_new_inode+0x4c/0x80
[ 194.606947] ? bdget+0x108/0x120
[ 194.607432] device_add_disk+0x38b/0x490
[ 194.607909] ? bcache_dev_sectors_dirty_add+0xb0/0xb0 [bcache]
[ 194.608373] bch_cached_dev_run.part.28+0x42/0x1a0 [bcache]
[ 194.608821] ? wait_woken+0x80/0x80
[ 194.609258] bch_cached_dev_attach+0x3b5/0x4f0 [bcache]
[ 194.609681] ? vprintk_func+0x47/0xc0
[ 194.610088] ? printk+0x52/0x6e
[ 194.610479] register_bcache+0x864/0x1110 [bcache]
[ 194.610861] ? register_bcache+0x864/0x1110 [bcache]
[ 194.611227] kobj_attr_store+0x12/0x20
[ 194.611580] ? kobj_attr_store+0x12/0x20
[ 194.611925] sysfs_kf_write+0x3c/0x50
[ 194.612259] kernfs_fop_write+0x125/0x1a0
[ 194.612595] __vfs_write+0x1b/0x40
[ 194.612921] vfs_write+0xb1/0x1a0
[ 194.613245] SyS_write+0x5c/0xe0
[ 194.613565] do_syscall_64+0x73/0x130
[ 194.613886] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 194.614212] RIP: 0033:0x7f51c36b8154
[ 194.614530] RSP: 002b:00007ffd27009a58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 194.614858] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f51c36b8154
[ 194.615183] RDX: 0000000000000009 RSI: 0000560160a77260 RDI: 0000000000000003
[ 194.615506] RBP: 0000560160a77260 R08: 0000000000000000 R09: 00007f51c37163a0
[ 194.615831] R10: 00000000fffffff8 R11: 0000000000000246 R12: 00007ffd27009af0
[ 194.616157] R13: 0000000000000009 R14: 00007f51c39902a0 R15: 00007f51c398f760
[ 194.616486] Code: 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 49 89 d5 ba 01 00 00 00 48 89 fb e8 22 ff ff ff 49 89 c4 48 89 c2 eb 03 48 89 ca <48> 8b 4a 08 4c 09 2a 48 85 c9 75 f1 4c 89 62 08 48 8b 43 08 48
[ 194.617194] RIP: create_empty_buffers+0x29/0xf0 RSP: ffffa833cc37f7a8
[ 194.617557] CR2: 0000000000000008

I've bisected the regression to commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 ("block: fix an integer overflow in logical block size"). A crashdump of the affected system is available at https://www.finnie.org/stuff/lp1867916-crashdump.tar.xz. I'm happy to provide any additional information or to run tests on the system.
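
For readers following the bisect result, here is a minimal sketch of one plausible reading of the oops. It is an illustrative model in Python, not kernel code, and it rests on several assumptions: that the value 0x80000 seen in RSI is the block size handed to create_empty_buffers, that pages are 4 KiB, and that the bisected commit widened the stored logical block size from a 16-bit to a 32-bit field (as its title suggests). The helper names are hypothetical.

```python
# Illustrative model only; none of these helpers exist in the kernel.
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this x86-64 system


def store_as_u16(value: int) -> int:
    """Model storing a value in a 16-bit unsigned field (assumed old width)."""
    return value & 0xFFFF


def store_as_u32(value: int) -> int:
    """Model storing a value in a 32-bit unsigned field (assumed new width)."""
    return value & 0xFFFFFFFF


def buffers_per_page(block_size: int, page_size: int = PAGE_SIZE) -> int:
    """Count how many block-sized buffers fit in one page.

    Returns 0 when the block size is zero or larger than the page,
    i.e. when no buffer can be carved out of the page at all.
    """
    if block_size <= 0:
        return 0
    return page_size // block_size


requested = 0x80000  # assumption: RSI in the oops is the block size (512 KiB)

print(hex(store_as_u16(requested)))               # 0x0: the value wraps to zero
print(hex(store_as_u32(requested)))               # 0x80000: the value survives
print(buffers_per_page(store_as_u32(requested)))  # 0: no buffer fits in a page
```

Under these assumptions, a 512 KiB block size would have been silently truncated before the commit and preserved after it. A block size larger than a page leaves no buffers to link together, which would be consistent with a NULL buffer pointer being dereferenced at offset 8.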
Comment 1 Mauricio Faria de Oliveira 2020-06-03 18:23:20 UTC
Hi Coly,

I sent a patch for this problem for your review; hope it helps.

[PATCH] bcache: check and adjust logical block size for backing devices
https://www.spinics.net/lists/linux-bcache/msg08411.html

cheers,
Mauricio