Bug 205447 - (Suspected problem with BFQ) Frequent freeze in kernel 5.3
Summary: (Suspected problem with BFQ) Frequent freeze in kernel 5.3
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: Block Layer
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Paolo Valente
URL:
Keywords:
Duplicates: 205725
Depends on:
Blocks:
 
Reported: 2019-11-06 05:23 UTC by Patrick Dung
Modified: 2020-06-30 08:44 UTC
5 users

See Also:
Kernel Version: kernel 5.3.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Tentative fix to apply on top of my dev-bfq branch (3.67 KB, patch)
2019-11-11 09:10 UTC, Paolo Valente
Details | Diff
5.4.0 kernel with the 4 patches applied (31.34 KB, image/png)
2019-12-04 16:03 UTC, Patrick Dung
Details
fix check on oom in split, add more checks on oom merging (2.43 KB, patch)
2019-12-04 17:52 UTC, Paolo Valente
Details | Diff
Tarball of six patches (29.39 KB, application/x-compressed-tar)
2020-01-07 18:15 UTC, Chris Evich
Details
serial console output of traces from kernel (41.79 KB, text/plain)
2020-01-08 20:21 UTC, Chris Evich
Details
Tentative fix, to apply on top of all the other patches (1.82 KB, patch)
2020-01-09 16:47 UTC, Paolo Valente
Details | Diff
Tentative fix V2, to apply on top of the other patches (3.21 KB, patch)
2020-01-10 07:18 UTC, Paolo Valente
Details | Diff
New patch covering a possible bug hiding in group change (1.18 KB, patch)
2020-01-21 16:18 UTC, Paolo Valente
Details | Diff
tentative patch series to check and fix group references (4.02 KB, application/gzip)
2020-01-23 18:18 UTC, Paolo Valente
Details
This patch should fix a mistake in my last patch (2.01 KB, patch)
2020-01-24 16:08 UTC, Paolo Valente
Details | Diff
Mainline version of patches (4.64 KB, application/gzip)
2020-01-31 14:49 UTC, Paolo Valente
Details

Description Patrick Dung 2019-11-06 05:23:40 UTC
I am using Fedora 31, where the default I/O scheduler has been changed to BFQ.
I tried to perform a CPU-intensive task and the system hung. The system froze about once a day. It seems to be related to the BFQ scheduler. Previously, on Fedora 30, I was using the deadline scheduler without problems.
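
(For reference, the active I/O scheduler can be checked and changed per device via sysfs; a minimal sketch, with sda used only as an example device name:)

```
# Show the schedulers available for a device; the one in brackets is active.
cat /sys/block/sda/queue/scheduler
# e.g.: mq-deadline kyber [bfq] none

# Temporarily switch back to mq-deadline (not persistent across reboots).
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
```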

Nov 05 05:26:24 kernel: Oops: 0000 [#1] SMP PTI
Nov 05 05:26:24 kernel: CPU: 8 PID: 23976 Comm: ora_ckpt_homedb Kdump: loaded Tainted: P           OE     5.3.8-300.fc31.x86_64 #1
Nov 05 05:26:24 kernel: Hardware name: [REDACTED]
Nov 05 05:26:24 kernel: RIP: 0010:bfq_insert+0x21/0x70
Nov 05 05:26:24 kernel: Code: 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 fa 48 89 fd 53 48 8b 07 48 89 f3 48 85 c0 74 27 48 8b 7e 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
Nov 05 05:26:24 kernel: RSP: 0018:ffffa2e346f57768 EFLAGS: 00010006
Nov 05 05:26:24 kernel: RAX: 0000000000001000 RBX: ffff936208036488 RCX: 0000000000001000
Nov 05 05:26:24 kernel: RDX: ffff93587a1d30a0 RSI: ffff93587a1d30a8 RDI: 000000aa4e352b55
Nov 05 05:26:24 kernel: RBP: ffff93626803bd58 R08: ffffa2e346f57739 R09: 0000000000000004
Nov 05 05:26:24 kernel: R10: 0000000000000002 R11: ffffa2e346f57739 R12: ffff93626803bd50
Nov 05 05:26:24 kernel: R13: 0000000000000001 R14: ffff936208036400 R15: ffff936208036488
Nov 05 05:26:24 kernel: FS:  00007f64fe2823c0(0000) GS:ffff93627fa00000(0000) knlGS:0000000000000000
Nov 05 05:26:24 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 05 05:26:24 kernel: CR2: 0000000000001028 CR3: 0000002ca90a6004 CR4: 00000000001606e0
Nov 05 05:26:24 kernel: Call Trace:
Nov 05 05:26:24 kernel:  __bfq_deactivate_entity+0x125/0x1a0
Nov 05 05:26:24 kernel:  bfq_deactivate_entity+0x4f/0xc0
Nov 05 05:26:24 kernel:  bfq_del_bfqq_busy+0xac/0x160
Nov 05 05:26:24 kernel:  __bfq_bfqq_expire+0x52/0xc0
Nov 05 05:26:24 kernel:  bfq_bfqq_expire+0x363/0x8e0
Nov 05 05:26:24 kernel:  ? bfq_may_expire_for_budg_timeout+0x4b/0x190
Nov 05 05:26:24 kernel:  bfq_dispatch_request+0x1c5/0xea0
Nov 05 05:26:24 kernel:  blk_mq_do_dispatch_sched+0xc6/0x120
Nov 05 05:26:24 kernel:  blk_mq_sched_dispatch_requests+0x111/0x160
Nov 05 05:26:24 kernel:  __blk_mq_run_hw_queue+0x55/0x110
Nov 05 05:26:24 kernel:  __blk_mq_delay_run_hw_queue+0x164/0x170
Nov 05 05:26:24 kernel:  blk_mq_run_hw_queue+0x87/0x110
Nov 05 05:26:24 kernel:  blk_mq_sched_insert_requests+0x70/0xf0
Nov 05 05:26:24 kernel:  blk_mq_flush_plug_list+0x214/0x2b0
Nov 05 05:26:24 kernel:  blk_flush_plug_list+0xec/0x110
Nov 05 05:26:24 kernel:  blk_finish_plug+0x21/0x2e
Nov 05 05:26:24 kernel:  ext4_writepages+0xa39/0xe20
Nov 05 05:26:24 kernel:  ? finish_task_switch+0x108/0x2a0
Nov 05 05:26:24 kernel:  ? do_writepages+0x43/0xd0
Nov 05 05:26:24 kernel:  ? ext4_mark_inode_dirty+0x1d0/0x1d0
Nov 05 05:26:24 kernel:  do_writepages+0x43/0xd0
Nov 05 05:26:24 kernel:  ? ext4_write_end+0x14e/0x430
Nov 05 05:26:24 kernel:  __filemap_fdatawrite_range+0xbf/0x100
Nov 05 05:26:24 kernel:  file_write_and_wait_range+0x4e/0xa0
Nov 05 05:26:24 kernel:  ext4_sync_file+0x86/0x3d0
Nov 05 05:26:24 kernel:  ext4_file_write_iter+0xff/0x3b0
Nov 05 05:26:24 kernel:  ? _cond_resched+0x15/0x30
Nov 05 05:26:24 kernel:  new_sync_write+0x12d/0x1d0
Nov 05 05:26:24 kernel:  vfs_write+0xb6/0x1a0
Nov 05 05:26:24 kernel:  ksys_pwrite64+0x65/0xa0
Nov 05 05:26:24 kernel:  do_syscall_64+0x5f/0x1a0
Nov 05 05:26:24 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 05 05:26:24 kernel: RIP: 0033:0x7f64fe49821a
Nov 05 05:26:24 kernel: Code: 48 c7 c0 ff ff ff ff eb be 0f 1f 80 00 00 00 00 f3 0f 1e fa 49 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 12 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
Nov 05 05:26:24 kernel: RSP: 002b:00007ffd8d174c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
Nov 05 05:26:24 kernel: RAX: ffffffffffffffda RBX: 00007f64fd14d020 RCX: 00007f64fe49821a
Nov 05 05:26:24 kernel: RDX: 0000000000004000 RSI: 00007f64fd148000 RDI: 0000000000000100
Nov 05 05:26:24 kernel: RBP: 00007ffd8d179600 R08: 00007ffd8d1e7090 R09: 000000009dbcf778
Nov 05 05:26:24 kernel: R10: 000000000000c000 R11: 0000000000000246 R12: 0000000000004000
Nov 05 05:26:24 kernel: R13: 00000001f49df2d9 R14: 0000000000000000 R15: 00007f64fd14d228
Nov 05 05:26:24 kernel: Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper drm ipmi_devintf ipmi_msghandler vfio_iommu_type1 vfio mpt3sas raid_class mptctl mptbase vmnet(OE) vmmon(OE) ebtable_filter ebtables openvswitch nsh nf_conncount bridge stp llc parport_pc parport nf_log_ipv6 nf_log_ipv4 nf_log_common nft_log nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_chain_nat nf_nat nft_counter nft_ct nf_conntrack vmw_vsock_vmci_transport nf_defrag_ipv6 vsock nf_defrag_ipv4 nf_tables vmw_vmci nfnetlink nct6775 hwmon_vid lm92 xfs vfat fat btrfs zstd_compress zstd_decompress intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel snd_hda_codec_realtek kvm dm_raid snd_hda_codec_generic irqbypass ledtrig_audio snd_hda_intel iTCO_wdt crct10dif_pclmul iTCO_vendor_support snd_hda_codec crc32_pclmul snd_hda_core joydev ghash_clmulni_intel snd_hwdep intel_cstate snd_seq intel_uncore snd_seq_device
Nov 05 05:26:24 kernel:  intel_rapl_perf snd_pcm pcspkr i2c_i801 snd_timer snd lpc_ich ses enclosure scsi_transport_sas soundcore nfsd nfs_acl lockd auth_rpcgss grace sunrpc binfmt_misc ip_tables raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 crc32c_intel igb megaraid_sas dca i2c_algo_bit wmi target_core_mod vhost_net tun tap vhost fuse ecryptfs [last unloaded: ipmi_msghandler]
Nov 05 05:26:24 kernel: CR2: 0000000000001028
Nov 05 05:26:24 kernel: ---[ end trace ea552b29d0f8a0af ]---
Nov 05 05:26:24 kernel: RIP: 0010:bfq_insert+0x21/0x70
Nov 05 05:26:24 kernel: Code: 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 fa 48 89 fd 53 48 8b 07 48 89 f3 48 85 c0 74 27 48 8b 7e 28 eb 03 48 89 c8 <48> 8b 48 28 48 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6
Nov 05 05:26:24 kernel: RSP: 0018:ffffa2e346f57768 EFLAGS: 00010006
Nov 05 05:26:24 kernel: RAX: 0000000000001000 RBX: ffff936208036488 RCX: 0000000000001000
Nov 05 05:26:24 kernel: RDX: ffff93587a1d30a0 RSI: ffff93587a1d30a8 RDI: 000000aa4e352b55
Nov 05 05:26:24 kernel: RBP: ffff93626803bd58 R08: ffffa2e346f57739 R09: 0000000000000004
Nov 05 05:26:24 kernel: R10: 0000000000000002 R11: ffffa2e346f57739 R12: ffff93626803bd50
Nov 05 05:26:24 kernel: R13: 0000000000000001 R14: ffff936208036400 R15: ffff936208036488
Nov 05 05:26:24 kernel: FS:  00007f64fe2823c0(0000) GS:ffff93627fa00000(0000) knlGS:0000000000000000
Nov 05 05:26:24 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 05 05:26:24 kernel: CR2: 0000000000001028 CR3: 0000002ca90a6004 CR4: 00000000001606e0

Nov 05 05:27:32 kernel: watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [pmdalinux:3983]
Nov 05 05:27:32 kernel: Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper drm ipmi_devintf ipmi_msghandler vfio_iommu_type1 vfio mpt3sas raid_class mptctl mptbase vmnet(OE) vmmon(OE) ebtable_filter ebtables openvswitch nsh nf_conncount bridge stp llc parport_pc parport nf_log_ipv6 nf_log_ipv4 nf_log_common nft_log nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_masq nft_chain_nat nf_nat nft_counter nft_ct nf_conntrack vmw_vsock_vmci_transport nf_defrag_ipv6 vsock nf_defrag_ipv4 nf_tables vmw_vmci nfnetlink nct6775 hwmon_vid lm92 xfs vfat fat btrfs zstd_compress zstd_decompress intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_hdmi kvm_intel snd_hda_codec_realtek kvm dm_raid snd_hda_codec_generic irqbypass ledtrig_audio snd_hda_intel iTCO_wdt crct10dif_pclmul iTCO_vendor_support snd_hda_codec crc32_pclmul snd_hda_core joydev ghash_clmulni_intel snd_hwdep intel_cstate snd_seq intel_uncore snd_seq_device
Nov 05 05:27:32 kernel:  intel_rapl_perf snd_pcm pcspkr i2c_i801 snd_timer snd lpc_ich ses enclosure scsi_transport_sas soundcore nfsd nfs_acl lockd auth_rpcgss grace sunrpc binfmt_misc ip_tables raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 crc32c_intel igb megaraid_sas dca i2c_algo_bit wmi target_core_mod vhost_net tun tap vhost fuse ecryptfs [last unloaded: ipmi_msghandler]
Nov 05 05:27:32 kernel: CPU: 10 PID: 3983 Comm: pmdalinux Kdump: loaded Tainted: P      D    OE     5.3.8-300.fc31.x86_64 #1
Nov 05 05:27:32 kernel: Hardware name: [REDACTED]
Nov 05 05:27:32 kernel: RIP: 0010:smp_call_function_single+0x99/0x110
Nov 05 05:27:32 kernel: Code: 74 76 65 8b 05 90 a6 e9 48 a9 00 01 1f 00 75 79 85 c9 75 40 48 c7 c6 00 98 02 00 65 48 03 35 46 3e e9 48 8b 46 18 a8 01 74 09 <f3> 90 8b 46 18 a8 01 75 f7 83 4e 18 01 4c 89 c9 4c 89 c2 e8 6f fe
Nov 05 05:27:32 kernel: RSP: 0018:ffffa2e343b6fbe0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Nov 05 05:27:32 kernel: RAX: 0000000000000001 RBX: 0000000b111cf8cf RCX: 0000000000000000
Nov 05 05:27:32 kernel: RDX: 0000000000000000 RSI: ffff93627faa9800 RDI: 0000000000000003
Nov 05 05:27:32 kernel: RBP: ffffa2e343b6fc38 R08: ffffffffb703ddf0 R09: 0000000000000000
Nov 05 05:27:32 kernel: R10: ffff93626f7370c0 R11: 0000000000000000 R12: 000007aeb10c542b
Nov 05 05:27:32 kernel: R13: 0000000000000001 R14: ffff935d95042500 R15: ffff935d95042500
Nov 05 05:27:32 kernel: FS:  00007fe92cded900(0000) GS:ffff93627fa80000(0000) knlGS:0000000000000000
Nov 05 05:27:32 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 05 05:27:32 kernel: CR2: 00007f56be1b6c30 CR3: 0000002f4fdec004 CR4: 00000000001606e0
Nov 05 05:27:32 kernel: Call Trace:
Nov 05 05:27:32 kernel:  ? xattr_resolve_name+0xa0/0xc0
Nov 05 05:27:32 kernel:  ? recalibrate_cpu_khz+0x10/0x10
Nov 05 05:27:32 kernel:  ? ktime_get+0x3c/0x90
Nov 05 05:27:32 kernel:  aperfmperf_snapshot_cpu+0x40/0x50
Nov 05 05:27:32 kernel:  arch_freq_prepare_all+0x5e/0xa0
Nov 05 05:27:32 kernel:  cpuinfo_open+0xe/0x20
Nov 05 05:27:32 kernel:  proc_reg_open+0x6f/0x130
Nov 05 05:27:32 kernel:  ? proc_put_link+0x10/0x10
Nov 05 05:27:32 kernel:  do_dentry_open+0x13a/0x380
Nov 05 05:27:32 kernel:  path_openat+0x591/0x1470
Nov 05 05:27:32 kernel:  do_filp_open+0x91/0x100
Nov 05 05:27:32 kernel:  ? __check_object_size+0x136/0x147
Nov 05 05:27:32 kernel:  ? audit_alloc_name+0x8c/0xe0
Nov 05 05:27:32 kernel:  ? __alloc_fd+0x3d/0x140
Nov 05 05:27:32 kernel:  do_sys_open+0x184/0x220
Nov 05 05:27:32 kernel:  do_syscall_64+0x5f/0x1a0
Nov 05 05:27:32 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 05 05:27:32 kernel: RIP: 0033:0x7fe92daf712b
Nov 05 05:27:32 kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
Nov 05 05:27:32 kernel: RSP: 002b:00007ffe47d867f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Nov 05 05:27:32 kernel: RAX: ffffffffffffffda RBX: 0000560b18975be0 RCX: 00007fe92daf712b
Nov 05 05:27:32 kernel: RDX: 0000000000000000 RSI: 00007ffe47d86990 RDI: 00000000ffffff9c
Nov 05 05:27:32 kernel: RBP: 00007ffe47d86990 R08: 0000000000000008 R09: 0000000000000001
Nov 05 05:27:32 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 05 05:27:32 kernel: R13: 0000560b18975be0 R14: 0000000000000001 R15: 00007ffe47d87a80
Comment 1 Thorsten Schubert 2019-11-08 23:57:06 UTC
I am experiencing the same problem on an older system (i7-2600K) running Fedora 31 on shutdown (nearly always) and when stopping containers via podman (one in three attempts results in a system freeze).

The system seems to behave normally everywhere else; I did not experience random lockups otherwise, even under some I/O load.


`BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.3.8-305.polaris.fc31.x86_64 root=/dev/mapper/vg_ss_system-lv_root ro resume=/dev/mapper/cswap rd.lvm.lv=vg_ss_system/lv_root quiet mitigations=off rd.driver.blacklist=nouveau pcie_aspm=off pcie_port_pm=off`


```
general protection fault: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 204 Comm: kworker/u16:7 Tainted: P           OE     5.3.8-305.polaris.fc31.x86_64 #1
Hardware name: [REDACTED]
Workqueue: writeback wb_workfn (flush-253:4)
RIP: 0010:bfq_find_set_group+0x71/0xe0
Code: 4c 4c 8b 04 f0 4d 85 c0 0f 84 81 00 00 00 4c 89 c0 48 05 98 00 00 00 75 43 eb 5a 4c 89 c2 4d 85 c0 74 07 48 8b 90 68 ff ff ff <48> 8b 52 38 48 85 d2 74 47 48 8b 14 f2 48 85 d2 48 0f 44 d1 48 8b
RSP: 0018:ffffbb62c0303548 EFLAGS: 00010086
RAX: ffff94c7348cec98 RBX: ffff94c745a1d800 RCX: ffff94c7009ea800
RDX: 4e3eaf50cf5ba557 RSI: 000000000000004e RDI: ffff94c6fb8ac2f8
RBP: ffff94c6fb8ac200 R08: ffff94c7348cec00 R09: ffff94c73993e260
R10: 0000000000000000 R11: ffff94c6fb8ac300 R12: ffff94c745a1d800
R13: 000000000000036e R14: 0000000000000000 R15: ffff94c74302f600
FS:  0000000000000000(0000) GS:ffff94c74eb40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561798491018 CR3: 000000018740a002 CR4: 00000000000606e0

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lscpu:6007]
Modules linked in: f2fs xt_CHECKSUM nf_nat_tftp nf_conntrack_tftp tun md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs libarc4 dns_resolver fscache nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ebtable_nat ebtable_broute ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw zstd zstd_compress zstd_decompress xt_multiport iptable_security xt_nat xt_addrtype xt_mark xt_MASQUERADE xt_comment veth bridge ip_set stp llc nfnetlink ebtable_filter ebtables ip6table_filter ip6table_nat ip6_tables iptable_nat nf_nat iptable_filter xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c overlay nct6775 hwmon_vid sunrpc vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) intel_rapl_msr iTCO_wdt iTCO_vendor_support intel_rapl_common snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_intel intel_cstate snd_usb_audio
 intel_uncore snd_hda_codec intel_rapl_perf snd_hda_core snd_usbmidi_lib snd_rawmidi snd_hwdep snd_seq eeepc_wmi asus_wmi snd_seq_device wmi_bmof sparse_keymap rfkill i2c_i801 snd_pcm drm_kms_helper mc joydev snd_timer drm snd lpc_ich soundcore mei_me ipmi_devintf mei ipmi_msghandler ip_tables dm_crypt uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel mxm_wmi e1000e serio_raw video wmi fuse
CPU: 0 PID: 6007 Comm: lscpu Tainted: P      D    OE     5.3.8-305.polaris.fc31.x86_64 #1
Hardware name: [REDACTED]
RIP: 0010:smp_call_function_single+0xbc/0x160
Code: a9 00 01 1f 00 0f 85 ab 00 00 00 85 db 75 69 48 c7 c6 00 98 02 00 65 48 03 35 d8 d1 e8 5f 8b 46 18 a8 01 74 09 f3 90 8b 46 18 <a8> 01 75 f7 83 4e 18 01 44 89 e7 4c 89 f1 4c 89 ea e8 4e fe ff ff
RSP: 0018:ffffbb62c3e4fbc0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff94c74ea29800 RDI: ffffffffa0183138
RBP: ffffbb62c3e4fc28 R08: 0000000000000001 R09: ffff94c747c9bf30
R10: ffff94c71a301fd8 R11: ffff94c74ea28930 R12: 0000000000000006
R13: ffffffffa003f4b0 R14: 0000000000000000 R15: ffff94c74b2b0500
FS:  00007fdb907b6740(0000) GS:ffff94c74ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc4ae7e418 CR3: 00000003c387c005 CR4: 00000000000606f0
```
Comment 2 Jens Axboe 2019-11-09 00:10:46 UTC
Moving this to Paolo
Comment 3 Paolo Valente 2019-11-09 10:45:11 UTC
(In reply to Patrick Dung from comment #0)
> I am using Fedora 31, the default scheduler is changed to BFQ.
> I tried to perform a CPU intensive task and the system hanged. It had system
> freeze about one time everyday. Seems it is related to BFQ scheduler.
> Previously, (Fedora 30) I was using the deadline scheduler without problem.
> 

Unfortunately I cannot reproduce any hang. Would you be willing to test a dev version of BFQ, which should give a much more informative OOPS? If you are, then we have two options.

The first is that you build and install the default branch in this repo:
https://github.com/Algodev-github/bfq-mq

The second is that you apply a patch to the source tree of your kernel.

For both options, I'm willing to assist you in every step.

Thanks,
Paolo
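
(A rough sketch of the first option, assuming a standard vanilla kernel build; the configuration steps below are illustrative and not part of Paolo's instructions:)

```
git clone https://github.com/Algodev-github/bfq-mq.git
cd bfq-mq
# Start from the running kernel's configuration.
cp /boot/config-"$(uname -r)" .config
# Note: distro configs may need CONFIG_SYSTEM_TRUSTED_KEYS cleared to build.
make olddefconfig
# Make sure BFQ is enabled as a blk-mq I/O scheduler.
grep CONFIG_IOSCHED_BFQ .config
make -j"$(nproc)"
sudo make modules_install install
```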
Comment 4 Paolo Valente 2019-11-09 10:47:20 UTC
(In reply to guibs from comment #1)
> I am experiencing the same problem on an older system (i7-2600K) running
> Fedora 31 on shutdown (nearly always) and when stopping containers via
> podman (one in three attempts results in a system freeze).
> 
> The system seems to behave normally everywhere else; I did not experience
> random lockups otherwise, even under some I/O load.
> 
> 
> `BOOT_IMAGE=(hd1,gpt2)/vmlinuz-5.3.8-305.polaris.fc31.x86_64
> root=/dev/mapper/vg_ss_system-lv_root ro resume=/dev/mapper/cswap
> rd.lvm.lv=vg_ss_system/lv_root quiet mitigations=off
> rd.driver.blacklist=nouveau pcie_aspm=off pcie_port_pm=off`
> 

As I said to the other reporter, unfortunately I cannot reproduce any hang.

So I'll ask you too whether you could kindly test a dev version of BFQ (which should give a much more informative OOPS). If you can, here are the two options again.

First, you build and install the default branch in this repo:
https://github.com/Algodev-github/bfq-mq

Second, you apply a patch to the source tree of your kernel.

For both options, I'm willing to assist you in every step.

Thanks,
Paolo
Comment 5 Patrick Dung 2019-11-09 11:12:23 UTC
(In reply to Paolo Valente from comment #3)
> (In reply to Patrick Dung from comment #0)
> > I am using Fedora 31, the default scheduler is changed to BFQ.
> > I tried to perform a CPU intensive task and the system hanged. It had
> system
> > freeze about one time everyday. Seems it is related to BFQ scheduler.
> > Previously, (Fedora 30) I was using the deadline scheduler without problem.
> > 
> 
> Unfortunately I cannot reproduce any hang. Would you be willing to test a
> dev version of BFQ, which should give a much more informative OOPS? If you
> are, then we have two options.
> 
> The first is that you build and install the default branch in this repo:
> https://github.com/Algodev-github/bfq-mq
> 
> The second is that you apply a patch to the source tree of your kernel.
> 
> For both options, I'm willing to assist you in every step.
> 
> Thanks,
> Paolo

For my case, the hang is on a physical machine. I cannot afford frequent hangs/crashes on that machine, so I have updated my system settings to use the deadline scheduler.

I have a Fedora 31 VM that may be used for testing. I opt for the second option (patching the existing Fedora kernel). The kernel source can be downloaded from https://dl.fedoraproject.org/pub/fedora/linux/updates/31/Everything/SRPMS/Packages/k/
As a side note, I tried to run fio (fsync=0) in the VM in a loop for a few hours but could not make it crash.

Thanks.
Comment 6 Paolo Valente 2019-11-09 12:35:35 UTC
(In reply to Patrick Dung from comment #5)
> (In reply to Paolo Valente from comment #3)
> > (In reply to Patrick Dung from comment #0)
> > > I am using Fedora 31, the default scheduler is changed to BFQ.
> > > I tried to perform a CPU intensive task and the system hanged. It had
> > system
> > > freeze about one time everyday. Seems it is related to BFQ scheduler.
> > > Previously, (Fedora 30) I was using the deadline scheduler without
> problem.
> > > 
> > 
> > Unfortunately I cannot reproduce any hang. Would you be willing to test a
> > dev version of BFQ, which should give a much more informative OOPS? If you
> > are, then we have two options.
> > 
> > The first is that you build and install the default branch in this repo:
> > https://github.com/Algodev-github/bfq-mq
> > 
> > The second is that you apply a patch to the source tree of your kernel.
> > 
> > For both options, I'm willing to assist you in every step.
> > 
> > Thanks,
> > Paolo
> 
> For my case, the hang is on a physical machine . I could not afford frequent
> hang/crashes on that machine, so I update my system setting to use the
> deadline scheduler.
> 

I do understand

> I have a Fedora 31 VM that maybe used for testing. I opt for the second
> option (patching the existing Fedora kernel). Source of the kernel may be
> download in
> https://dl.fedoraproject.org/pub/fedora/linux/updates/31/Everything/SRPMS/
> Packages/k/
> As a side node, I tried to run fio (fsync=0) in VM with some looping for a
> few hours but can't make it crash.
> 

So, no crashes at all in your VM so far. Until you get a crash, I think there is little point in instrumenting BFQ in the VM.

Thank you,
Paolo

> Thanks.
Comment 7 Patrick Dung 2019-11-09 12:47:08 UTC
> > I have a Fedora 31 VM that maybe used for testing. I opt for the second
> > option (patching the existing Fedora kernel). Source of the kernel may be
> > download in
> > https://dl.fedoraproject.org/pub/fedora/linux/updates/31/Everything/SRPMS/
> > Packages/k/
> > As a side node, I tried to run fio (fsync=0) in VM with some looping for a
> > few hours but can't make it crash.
> > 
> 
> So in your VM no crashes at all, so far. Until you get a crash, I think it
> is little useful that you instrument BFQ in the VM.

Maybe I need to do more testing with the VM (a longer test, and a test with fsync=1); I only ran it once a few days ago.
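
(For what it's worth, a synthetic load along those lines could look like the fio job below; the directory, sizes and runtime are placeholders, not the exact job that was run:)

```
fio --name=bfq-stress --directory=/mnt/test \
    --rw=randwrite --bs=4k --size=1G --numjobs=4 \
    --ioengine=psync --fsync=1 \
    --time_based --runtime=3600 --group_reporting
```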
Comment 8 Thorsten Schubert 2019-11-10 01:48:27 UTC
(In reply to Paolo Valente from comment #4)
> 
> As I said to the other reporter, unfortunately I cannot reproduce any hang.
> 
> So I'll ask you too you could kindly test a dev version of BFQ (which should
> give a much more informative OOPS). If you can, then here are the two
> options again.
> 
> First, you build and install the default branch in this repo:
> https://github.com/Algodev-github/bfq-mq
> 
> Second, you apply a patch to the source tree of your kernel.
> 
> For both options, I'm willing to assist you in every step.
> 
> Thanks,
> Paolo

After patching the Fedora kernel with your development branch, I captured this after stopping a podman container:


kernel: ------------[ cut here ]------------
kernel: kernel BUG at block/bfq-iosched.c:5381!
kernel: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
kernel: CPU: 3 PID: 567 Comm: kworker/3:3 Tainted: P           OE     5.3.9-308.bfqdbg.fc31.x86_64 #1
kernel: Hardware name: [REDACTED]
kernel: Workqueue: cgroup_destroy css_killed_work_fn
kernel: RIP: 0010:bfq_put_queue+0x376/0x420
kernel: Code: b0 62 d0 ff 8b 5d 00 48 83 c4 18 e9 33 fd ff ff 0f 0b 48 b8 53 48 41 52 45 44 2d 00 48 89 44 24 04 e9 bc fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 48 89 ef e8 8c 6f 00 00 48 c7 c2 c0 a0 16 91 4c
kernel: RSP: 0018:ffffb52380757c18 EFLAGS: 00010002
kernel: RAX: 00000000000000d2 RBX: 0000000000000001 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: 0000000000000036 RDI: 0000000000000000
kernel: RBP: ffff9bddc68f7cf0 R08: 0000000000000000 R09: 0000000000000003
kernel: R10: 0000000000000000 R11: ffffb52380757c20 R12: ffff9bddc8da7000
kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
kernel: FS:  0000000000000000(0000) GS:ffff9bddceac0000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000c000379000 CR3: 000000031099a004 CR4: 00000000000606e0
kernel: Call Trace:
kernel:  bfq_forget_entity+0x97/0x140
kernel:  ? bfq_active_extract+0x50/0x180
kernel:  __bfq_deactivate_entity+0xac/0x1d0
kernel:  bfq_deactivate_entity+0x8a/0x2b0
kernel:  bfq_bfqq_move+0x279/0x410
kernel:  ? __bfq_deactivate_entity+0xac/0x1d0
kernel:  bfq_pd_offline+0x113/0x150
kernel:  blkg_destroy+0x72/0x200
kernel:  blkcg_destroy_blkgs+0x52/0x90
kernel:  css_killed_work_fn+0x55/0x100
kernel:  process_one_work+0x199/0x360
kernel:  worker_thread+0x50/0x3a0
kernel:  kthread+0xfb/0x130
kernel:  ? process_one_work+0x360/0x360
kernel:  ? kthread_park+0x80/0x80
kernel:  ret_from_fork+0x1f/0x40
kernel: Modules linked in: f2fs xt_CHECKSUM nf_nat_tftp nf_conntrack_tftp tun md4 sha512_ssse3 sha512_generic cmac nls_utf8 cifs libarc4 dns_resolver fscache tcp_bbr sch_fq nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw iptable_security ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables xt_conntrack iptable_filter xt_multiport xt_nat xt_addrtype xt_mark xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c veth bridge stp llc overlay zstd zstd_compress zstd_decompress nct6775 hwmon_vid sunrpc vfat fat nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(OE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support snd_hda_intel irqbypass intel_cstate
kernel:  intel_uncore intel_rapl_perf snd_hda_codec eeepc_wmi asus_wmi snd_usb_audio sparse_keymap rfkill wmi_bmof drm_kms_helper snd_hda_core i2c_i801 snd_usbmidi_lib snd_seq snd_hwdep snd_rawmidi drm mc joydev snd_seq_device lpc_ich snd_pcm ipmi_devintf snd_timer ipmi_msghandler snd syscopyarea mei_me sysfillrect sysimgblt soundcore fb_sys_fops mei ip_tables dm_crypt uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel e1000e ghash_clmulni_intel serio_raw mxm_wmi video wmi fuse
kernel: ---[ end trace bc104a582d3386f0 ]---
kernel: RIP: 0010:bfq_put_queue+0x376/0x420
kernel: Code: b0 62 d0 ff 8b 5d 00 48 83 c4 18 e9 33 fd ff ff 0f 0b 48 b8 53 48 41 52 45 44 2d 00 48 89 44 24 04 e9 bc fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 48 89 ef e8 8c 6f 00 00 48 c7 c2 c0 a0 16 91 4c
kernel: RSP: 0018:ffffb52380757c18 EFLAGS: 00010002
kernel: RAX: 00000000000000d2 RBX: 0000000000000001 RCX: 0000000000000000
kernel: RDX: 0000000000000001 RSI: 0000000000000036 RDI: 0000000000000000
kernel: RBP: ffff9bddc68f7cf0 R08: 0000000000000000 R09: 0000000000000003
kernel: R10: 0000000000000000 R11: ffffb52380757c20 R12: ffff9bddc8da7000
kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
kernel: FS:  0000000000000000(0000) GS:ffff9bddceac0000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000000c000379000 CR3: 000000031099a004 CR4: 00000000000606e0
kernel: note: kworker/3:3[567] exited with preempt_count 3
Comment 9 Paolo Valente 2019-11-11 09:09:58 UTC
(In reply to guibs from comment #8)
> After patching the Fedora kernel with your development branch, I captured
> this after stopping a podman container:
>
> [full oops quoted from comment 8; snipped]

Thank you very much, this OOPS probably allowed me to find the bug. Could you please try the fix I'm about to upload?
Comment 10 Paolo Valente 2019-11-11 09:10:50 UTC
Created attachment 285857 [details]
Tentative fix to apply on top of my dev-bfq branch
Comment 11 Thorsten Schubert 2019-11-11 22:52:10 UTC
(In reply to Paolo Valente from comment #10)
> Created attachment 285857 [details]
> Tentative fix to apply on top of my dev-bfq branch

After a short test with the patch applied, I can no longer reproduce the problem.

At least extensive starting and stopping of podman containers and multiple reboots didn't result in a single kernel OOPS.

If you provide me with a non-debug patch against the stable kernel branch, I am willing to test it for several days of daily use of the system, if that would be helpful.
Comment 12 Paolo Valente 2019-11-12 17:32:56 UTC
(In reply to guibs from comment #11)
> (In reply to Paolo Valente from comment #10)
> > Created attachment 285857 [details]
> > Tentative fix to apply on top of my dev-bfq branch
> 
> After a short test with the applied patch, I can claim not to be able to
> reproduce the problem.
> 
> At least the extensive starting and stopping of podman containers and
> multiple reboots didn't result in a single kernel OOPS.
> 
> If you provide me with a non-debug patch against the stable kernel branch, I
> am willing to test it for several days in daily use of the system if that
> should be something that is needed.

You can find the production patch, e.g., here:
https://www.spinics.net/lists/kernel/msg3313638.html

Jens already accepted it.
Comment 13 Thorsten Schubert 2019-11-12 17:46:33 UTC
(In reply to Paolo Valente from comment #12)
> You can find the production patch, e.g., here:
> https://www.spinics.net/lists/kernel/msg3313638.html
> 
> Jens already accepted it.

I've been using it for eight hours now, works flawlessly, thanks!
Comment 14 Thorsten Schubert 2019-11-13 13:34:41 UTC
While yesterday everything appeared to work fine, today I noticed that a shutdown is often stuck for several minutes.

Also, "sync" seems to hang indefinitely and some command line utilities take a really long time, e.g. sudo. Switching to mq-deadline fixes this.

However, the system was somewhat usable and file I/O *seemed* to work, except for the mentioned issues.

I noticed this on a patched Fedora 31 5.3.10 and 5.3.11 Kernel.
Comment 15 Paolo Valente 2019-11-13 13:56:40 UTC
(In reply to Thorsten Schubert from comment #14)
> While yesterday everything appeared to work fine, today I noticed that a
> shutdown is often stuck for several minutes.
> 
> Also, "sync" seems to hang indefinitely and some command line utilities take
> a really long time, e.g. sudo. Switching to mq-deadline fixes this.
> 
> However, the system was somewhat usable and file I/O *seemed* to work,
> except for the mentioned issues.
> 
> I noticed this on a patched Fedora 31 5.3.10 and 5.3.11 Kernel.

I hoped everything went well, but I'm not that surprised that this tentative fix still needs a little more work. Could you please test my dev-bfq branch for a while, with my tentative fix applied? A possible kernel OOPS from dev bfq would tell us a lot. Thanks.
Comment 16 Paolo Valente 2019-11-14 18:46:44 UTC
Fixed fix here:
https://lkml.org/lkml/2019/11/14/199
Comment 17 Paolo Valente 2019-11-14 18:47:22 UTC
Fixed fix here:
https://lkml.org/lkml/2019/11/14/199
Comment 18 Patrick Dung 2019-11-30 17:17:43 UTC
@Paolo Valente

I think the patch from Nov 14 should be included in 5.4-rc8 (https://github.com/torvalds/linux/commit/478de3380c1c7dbb0f65f545ee0185848413f3fe).

I installed a 5.4-rc8 kernel (Fedora RPM) and have been using BFQ for a few days. Just now I got a kernel crash with BFQ. Please kindly take a look, thanks.

The submitted ticket is here:
https://bugzilla.kernel.org/show_bug.cgi?id=205725
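
(As a quick cross-check, whether that commit is contained in a given kernel tag can be verified in a mainline clone, e.g.:)

```
# In a clone of torvalds/linux: list the tags containing the fix commit.
git tag --contains 478de3380c1c7dbb0f65f545ee0185848413f3fe
# Or test one tag directly:
git merge-base --is-ancestor 478de3380c1c7dbb0f65f545ee0185848413f3fe v5.4-rc8 \
    && echo "fix is in v5.4-rc8"
```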
Comment 19 Paolo Valente 2019-12-02 15:35:06 UTC
(In reply to Patrick Dung from comment #18)
> @Paolo Valente
> 
> I think the patch on Nov-14 should be included in 5.4-rc8
> (https://github.com/torvalds/linux/commit/
> 478de3380c1c7dbb0f65f545ee0185848413f3fe)
> 
> I had installed an 5.4-rc8 kernel (Fedora RPM) and use BFQ for a few days.
> Just now I got a kernel crash with BFQ. Please kindly take a look, thanks.
> 
> The submitted ticket is in here:
> https://bugzilla.kernel.org/show_bug.cgi?id=205725

Thanks for reporting this persisting failure.

Could you please try with the following branch?
https://github.com/Algodev-github/bfq-mq/tree/dev-bfq-on-5.4

It contains a dev version of bfq, plus a tentative fix, based on a guess of mine about what could still be causing this failure. If the fix doesn't work, this dev version should at least provide useful information.

If it is more convenient for you, you can also use a 5.4 tree of your own, after applying the top four commits in the above branch on top of it.

Looking forward to your feedback.
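
(One possible way to pick up those four commits onto an existing 5.4 tree; the remote name below is arbitrary and the whole snippet is only a sketch:)

```
# In your 5.4 kernel tree; "algodev" is just an arbitrary remote name.
git remote add algodev https://github.com/Algodev-github/bfq-mq.git
git fetch algodev
# Export the top four commits of the dev branch as patch files...
git format-patch -4 algodev/dev-bfq-on-5.4
# ...and apply them on top of the current branch.
git am 000*.patch
```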
Comment 20 Patrick Dung 2019-12-02 15:54:54 UTC
@Paolo Valente
Sorry to let you know that the physical machine has a special purpose and cannot run the dev-bfq branch.

Let me see if I can try the dev version of bfq, or the latter method (patching 5.4), on a VM. But I can't promise I can reproduce the problem.
Comment 21 Patrick Dung 2019-12-02 18:07:42 UTC
@Paolo Valente

For method 2 (applying the four patch files):

I got a problem when I tried to apply the patches to the Fedora kernel source RPM:

Applying: block, bfq, DEBUG: check proc refs in case of requeue
Applying: block, bfq: do not idle for queues with no proc refs
error: patch failed: block/bfq-iosched.c:3833
error: block/bfq-iosched.c: patch does not apply
Patch failed at 0043 block, bfq: do not idle for queues with no proc refs
hint: Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
error: Bad exit status from /var/tmp/rpm-tmp.tMB62V (%prep)

The source rpm/git should be:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1416758
https://src.fedoraproject.org/rpms/kernel#7009ddf5df3fe8e765d8e0743b4051e0df68f5e2

If I skip 'block, bfq: do not idle for queues with no proc refs', the other three patches apply cleanly.
Comment 22 Paolo Valente 2019-12-02 20:02:58 UTC
(In reply to Patrick Dung from comment #21)
> @Paolo Valente
> 
> For method 2 (applying the four patch files):
> 
> Got a problem when I tried perform patching on Fedora source RPM:
> 
> Applying: block, bfq, DEBUG: check proc refs in case of requeue
> Applying: block, bfq: do not idle for queues with no proc refs
> error: patch failed: block/bfq-iosched.c:3833
> error: block/bfq-iosched.c: patch does not apply
> Patch failed at 0043 block, bfq: do not idle for queues with no proc refs
> hint: Use 'git am --show-current-patch' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> error: Bad exit status from /var/tmp/rpm-tmp.tMB62V (%prep)
> 
> The source rpm/git should be:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1416758
> https://src.fedoraproject.org/rpms/
> kernel#7009ddf5df3fe8e765d8e0743b4051e0df68f5e2
> 
> If I skipped 'block, bfq: do not idle for queues with no proc refs'. The
> other three patches can be applied.

If I'm not mistaken, you applied patches in reverse order.

The right order is:
1. block, bfq, DEBUG: add logs and BUG_ONs
2. Makefile: add -bfq version suffix
3. block, bfq: do not idle for queues with no proc refs
4. block, bfq, DEBUG: check proc refs in case of requeue
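
(A minimal sketch of applying them in that order with git am; the patch file names below are hypothetical stand-ins for the four commits listed above:)

```
# Hypothetical file names, one per commit, in the order given above.
git am 0001-block-bfq-DEBUG-add-logs-and-BUG_ONs.patch
git am 0002-Makefile-add-bfq-version-suffix.patch
git am 0003-block-bfq-do-not-idle-for-queues-with-no-proc-refs.patch
git am 0004-block-bfq-DEBUG-check-proc-refs-in-case-of-requeue.patch
```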
Comment 23 Patrick Dung 2019-12-02 21:27:03 UTC
OK, the patches applied cleanly now. Thanks.
Comment 24 Patrick Dung 2019-12-04 16:03:19 UTC
From my original post, the problem occurs in the Oracle DB checkpoint process on an ext4 file system.

I compiled the kernel (5.4.0) with the four patches applied and installed it in a VM.
I also set up an Oracle DB on ext4 there. Just now I tried to put load on the DB with a Java app, and it crashed.

The VM hung and I couldn't create a crash dump. The attached file is the related screen capture.
Comment 25 Patrick Dung 2019-12-04 16:03:47 UTC
Created attachment 286175 [details]
5.4.0 kernel with the 4 patches applied
Comment 26 Patrick Dung 2019-12-04 17:27:57 UTC
Alright, managed to get the kernel log with the crash dump.

[56796.145958] ------------[ cut here ]------------
[56796.146340] kernel BUG at block/bfq-iosched.c:6875!
[56796.146699] invalid opcode: 0000 [#1] SMP PTI
[56796.146973] CPU: 3 PID: 38322 Comm: ora_rvwr_fctts0 Kdump: loaded Not tainted 5.4.0-2.fc31.x86_64 #1
[56796.147485] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[56796.148201] RIP: 0010:bfq_init_rq+0x387/0xb80
[56796.148497] Code: 00 00 00 4c 89 f7 45 31 ed e8 45 f9 ff ff 49 89 c4 48 85 c0 0f 84 b2 06 00 00 4d 8d be f0 01 00 00 4d 39 e7 0f 85 98 fd ff ff <0f> 0b 4d 8d be f0 01 00 00 45 31 ed e9 87 fd ff ff 48 b8 53 48 41
[56796.149348] RSP: 0018:ffffa23581b976f0 EFLAGS: 00010046
[56796.149625] RAX: ffff918d5027a1f0 RBX: ffff918d50185400 RCX: 0000000000000000
[56796.149886] RDX: ffffa23581b97587 RSI: 0000000000000031 RDI: 0000000000000000
[56796.150177] RBP: ffff918d52ce83c0 R08: ffffa23581b976a5 R09: 0000000000ffff0a
[56796.150467] R10: 0000000000000000 R11: 0000000000000000 R12: ffff918d5027a1f0
[56796.150772] R13: 0000000000000000 R14: ffff918d5027a000 R15: ffff918d5027a1f0
[56796.151050] FS:  00007fecbc971280(0000) GS:ffff918d5cec0000(0000) knlGS:0000000000000000
[56796.151355] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56796.151645] CR2: 00007f7b13642000 CR3: 0000000013bea005 CR4: 00000000001606e0
[56796.151931] Call Trace:
[56796.152304]  bfq_insert_requests+0x10e/0xde0
[56796.152668]  ? mempool_alloc+0x6b/0x180
[56796.153003]  ? recalibrate_cpu_khz+0x10/0x10
[56796.153369]  ? ktime_get+0x38/0x90
[56796.153689]  ? generic_make_request_checks+0x316/0x650
[56796.154013]  blk_mq_sched_insert_requests+0x65/0xf0
[56796.154436]  ? sbitmap_get+0x69/0x140
[56796.154731]  blk_mq_flush_plug_list+0x214/0x2b0
[56796.155049]  blk_flush_plug_list+0xec/0x110
[56796.155375]  blk_mq_make_request+0x25d/0x5d0
[56796.155661]  generic_make_request+0xcf/0x320
[56796.157685]  submit_bio+0x42/0x1c0
[56796.158847]  btrfs_map_bio+0x2fc/0x390 [btrfs]
[56796.159500]  btrfs_submit_bio_hook+0x8c/0x170 [btrfs]
[56796.160107]  submit_one_bio+0x31/0x50 [btrfs]
[56796.160738]  extent_readpages+0x2e7/0x320 [btrfs]
[56796.161314]  read_pages+0x6b/0x1b0
[56796.161900]  ? 0xffffffff8b000000
[56796.162416]  __do_page_cache_readahead+0x168/0x1d0
[56796.162920]  filemap_fault+0x69d/0xa80
[56796.163514]  ? page_add_file_rmap+0x19/0x220
[56796.163994]  ? alloc_set_pte+0x126/0x680
[56796.164546]  ? xas_load+0x9/0x80
[56796.164964]  ? xas_find+0x164/0x1b0
[56796.165411]  ? filemap_map_pages+0x2a5/0x400
[56796.165798]  __do_fault+0x36/0x100
[56796.166247]  __handle_mm_fault+0x105a/0x15d0
[56796.166674]  ? __switch_to+0x10d/0x440
[56796.167050]  handle_mm_fault+0xc4/0x1f0
[56796.167508]  do_user_addr_fault+0x1f9/0x450
[56796.167955]  do_page_fault+0x31/0x110
[56796.168413]  page_fault+0x3e/0x50
[56796.168829] RIP: 0033:0x7192d60
[56796.169176] Code: Bad RIP value.
[56796.169569] RSP: 002b:00007ffeec6a8338 EFLAGS: 00010246
[56796.169929] RAX: 0000000000000000 RBX: 00007fecbc971280 RCX: 00000000000000f8
[56796.170305] RDX: 00000000000000f8 RSI: 00000000b790d8d8 RDI: 00000000b790d5d8
[56796.170648] RBP: 00007ffeec6a85c0 R08: 00000000b790d5b8 R09: 00000000000009f2
[56796.171000] R10: 0000000000000060 R11: 000000000000dddc R12: 0000000000000879
[56796.171379] R13: 0000000000000002 R14: 0000000000000001 R15: 0000000000000771
[56796.171720] Modules linked in: ip6table_mangle ip6table_nat ebtable_filter ebtables ip6table_filter ip6_tables nfnetlink bluetooth ecdh_generic ecc bridge stp llc ppdev parport_pc parport cfg80211 rfkill ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vmw_vsock_vmci_transport vsock xfs intel_rapl_msr intel_rapl_common sb_edac kvm_intel kvm irqbypass btrfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xor pktcdvd zstd_compress vmw_balloon raid6_pq intel_rapl_perf libcrc32c zstd_decompress joydev pcspkr i2c_piix4 vmw_vmci nfsd auth_rpcgss nfs_acl lockd binfmt_misc grace sunrpc ip_tables vmwgfx drm_kms_helper ttm crc32c_intel mptsas serio_raw drm scsi_transport_sas mptscsih vmxnet3 mptbase ata_generic pata_acpi target_core_mod vhost_net tun tap vhost fuse
[56820.257941] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kswapd0:160]
[56820.258449] Modules linked in: ip6table_mangle ip6table_nat ebtable_filter ebtables ip6table_filter ip6_tables nfnetlink bluetooth ecdh_generic ecc bridge stp llc ppdev parport_pc parport cfg80211 rfkill ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vmw_vsock_vmci_transport vsock xfs intel_rapl_msr intel_rapl_common sb_edac kvm_intel kvm irqbypass btrfs crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xor pktcdvd zstd_compress vmw_balloon raid6_pq intel_rapl_perf libcrc32c zstd_decompress joydev pcspkr i2c_piix4 vmw_vmci nfsd auth_rpcgss nfs_acl lockd binfmt_misc grace sunrpc ip_tables vmwgfx drm_kms_helper ttm crc32c_intel mptsas serio_raw drm scsi_transport_sas mptscsih vmxnet3 mptbase ata_generic pata_acpi target_core_mod vhost_net tun tap vhost fuse
[56820.263519] CPU: 0 PID: 160 Comm: kswapd0 Kdump: loaded Tainted: G      D           5.4.0-2.fc31.x86_64 #1
[56820.264783] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[56820.266140] RIP: 0010:smp_call_function_many+0x1fe/0x260
[56820.266848] Code: e8 07 19 87 00 3b 05 65 9a 6a 01 89 c7 0f 83 9a fe ff ff 48 63 c7 49 8b 0f 48 03 0c c5 80 39 41 8c 8b 41 18 a8 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 e0 af 82 8c 4c 89 f6 89 df
[56820.269012] RSP: 0018:ffffa2358059b9f8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[56820.269765] RAX: 0000000000000003 RBX: 0000000000029a40 RCX: ffff918d5ceee280
[56820.270519] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
[56820.271288] RBP: ffffffff8b07ccc0 R08: ffff918d5bd6cc70 R09: 0000000000000003
[56820.272049] R10: ffff918d5bd6c640 R11: 0000000000000002 R12: ffffffff8c008a00
[56820.272834] R13: 0000000000000001 R14: 0000000000000080 R15: ffff918d5ce29a80
[56820.273602] FS:  0000000000000000(0000) GS:ffff918d5ce00000(0000) knlGS:0000000000000000
[56820.274322] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56820.275072] CR2: 0000000005034f10 CR3: 0000000035f04006 CR4: 00000000001606f0
[56820.275871] Call Trace:
[56820.276712]  ? flush_tlb_func_common.constprop.0+0x250/0x250
[56820.277498]  ? flush_tlb_func_common.constprop.0+0x250/0x250
[56820.278214]  on_each_cpu_mask+0x24/0x60
[56820.278947]  ? x86_configure_nx+0x40/0x40
[56820.279653]  on_each_cpu_cond_mask+0xa5/0x130
[56820.280385]  arch_tlbbatch_flush+0x44/0x90
[56820.281091]  try_to_unmap_flush+0x26/0x40
[56820.281818]  shrink_page_list+0x3d7/0xdd0
[56820.282537]  shrink_inactive_list+0x1eb/0x3c0
[56820.283192]  shrink_node_memcg+0x235/0x7f0
[56820.283848]  shrink_node+0xc0/0x410
[56820.284475]  balance_pgdat+0x2db/0x530
[56820.285052]  kswapd+0x1fb/0x3c0
[56820.285675]  ? finish_wait+0x80/0x80
[56820.286248]  kthread+0xf9/0x130
[56820.286818]  ? balance_pgdat+0x530/0x530
[56820.287346]  ? kthread_park+0x90/0x90
[56820.287852]  ret_from_fork+0x35/0x40
[56821.196533] ---[ end trace ffe87571d7a36fe8 ]---
[56821.197034] RIP: 0010:bfq_init_rq+0x387/0xb80
[56821.197519] Code: 00 00 00 4c 89 f7 45 31 ed e8 45 f9 ff ff 49 89 c4 48 85 c0 0f 84 b2 06 00 00 4d 8d be f0 01 00 00 4d 39 e7 0f 85 98 fd ff ff <0f> 0b 4d 8d be f0 01 00 00 45 31 ed e9 87 fd ff ff 48 b8 53 48 41
[56821.198917] RSP: 0018:ffffa23581b976f0 EFLAGS: 00010046
[56821.199390] RAX: ffff918d5027a1f0 RBX: ffff918d50185400 RCX: 0000000000000000
[56821.199876] RDX: ffffa23581b97587 RSI: 0000000000000031 RDI: 0000000000000000
[56821.200355] RBP: ffff918d52ce83c0 R08: ffffa23581b976a5 R09: 0000000000ffff0a
[56821.200807] R10: 0000000000000000 R11: 0000000000000000 R12: ffff918d5027a1f0
[56821.201262] R13: 0000000000000000 R14: ffff918d5027a000 R15: ffff918d5027a1f0
[56821.201698] FS:  00007fecbc971280(0000) GS:ffff918d5cec0000(0000) knlGS:0000000000000000
[56821.202150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56821.202616] CR2: 0000000007192d36 CR3: 0000000013bea005 CR4: 00000000001606e0

crash> bt
PID: 0      TASK: ffffffff8c614780  CPU: 0   COMMAND: "swapper/0"
 #0 [ffffffff8c603e18] __schedule at ffffffff8ba14832
 #1 [ffffffff8c603eb0] schedule_idle at ffffffff8ba14f98
 #2 [ffffffff8c603ec0] do_idle at ffffffff8b11651f
 #3 [ffffffff8c603f10] cpu_startup_entry at ffffffff8b1167b9
 #4 [ffffffff8c603f20] start_kernel at ffffffff8ce5f205
 #5 [ffffffff8c603f50] secondary_startup_64 at ffffffff8b0000e6
Comment 27 Paolo Valente 2019-12-04 17:52:56 UTC
Created attachment 286179 [details]
fix check on oom in split, add more checks on oom merging
Comment 28 Paolo Valente 2019-12-04 17:54:39 UTC
(In reply to Patrick Dung from comment #26)
> Alright, managed to get the kernel log with the crash dump.
>
> [full crash log and crash(8) backtrace quoted from comment 26; snipped]

Thank you very much for this test. Could you please apply also the patch "fix check on oom in split, add more checks on oom merging" which I have just attached here
https://bugzilla.kernel.org/attachment.cgi?id=286179&action=diff
and retry in the same VM with the same workload?
Comment 29 Patrick Dung 2019-12-04 18:10:56 UTC
(In reply to Paolo Valente from comment #28)
> Thank you very much for this test. Could you please apply also the patch
> "fix check on oom in split, add more checks on oom merging" which I have
> just attached here
> https://bugzilla.kernel.org/attachment.cgi?id=286179&action=diff
> and retry in the same VM with the same workload?

The patch applied cleanly, but it will take some time for me to compile the new kernel and retest with the same application load.
Comment 30 Patrick Dung 2019-12-04 20:46:09 UTC
@Paolo

I compiled the new kernel and loaded the server as before, three times. It has not crashed so far.

I will leave it running for a few days and will gather more information if it crashes.
Comment 31 Patrick Dung 2019-12-14 20:19:47 UTC
The VM with the latest patches has been running for about ten days without crashes. Looks ok now.
Comment 32 Patrick Dung 2019-12-28 13:30:45 UTC
@Paolo

Regarding the problem and crashes that I reported, is the bug fixed in the official Linux kernel 5.4/5.5 branches? Thanks.
Comment 33 Paolo Valente 2019-12-30 17:51:40 UTC
(In reply to Patrick Dung from comment #32)
> @Paolo
> 
> Regarding the problem or crashes that I had reported, isthe bug fixed in the
> official Linux kernel 5.4/5.5 branches? Thanks.

The patch has not been submitted yet; I'm on vacation, sorry. I should manage it within the first half of January.
Comment 34 Chris Evich 2020-01-07 17:13:10 UTC
I'm coming here from https://bugzilla.redhat.com/show_bug.cgi?id=1767539 (same issue), where I originally reproduced this bug reliably by running the podman integration tests (coincidence?).  I just reproduced it again after updating to 5.4.8-200.fc31.x86_64.  I'll try applying the patches recommended above and report back if the problem reproduces for me.
Comment 35 Chris Evich 2020-01-07 18:15:29 UTC
Created attachment 286665 [details]
Tarball of six patches
Comment 36 Chris Evich 2020-01-07 18:45:12 UTC
Note for the Fedora kernel: I did not use '0002-Makefile-add-bfq-version-suffix.patch' in kernel.spec
Comment 37 Chris Evich 2020-01-08 16:19:12 UTC
I'm getting a new oops using a 5.4.8 kernel I built with the 5 patches in the attached tarball (minus the Makefile one).  I'm going to double-check I booted the right kernel, and run again.  I'll report the oops message if it fails again...
Comment 38 Chris Evich 2020-01-08 20:12:17 UTC
...confirmed, it's still oops'ing for me using the latest F31 5.4.8 kernel with patches applied.  Once it finishes spewing gobbledygook on the serial console, I'll attach what I'm seeing.
Comment 39 Chris Evich 2020-01-08 20:21:05 UTC
Created attachment 286687 [details]
serial console output of traces from kernel

The last podman integration test running was 'podman exec simple working directory test'.  Command from the test at time of failure was:

Running: /var/tmp/go/src/github.com/containers/libpod/bin/podman --storage-opt vfs.imagestore=/tmp/podman/imagecachedir --root /tmp/podman_test475333727/crio --runroot /tmp/podman_test475333727/crio-run --runtime /usr/bin/crun --conmon /usr/bin/conmon --cni-config-dir /etc/cni/net.d --cgroup-manager systemd --tmpdir /tmp/podman_test475333727 --events-backend file --storage-driver vfs rm -fa
83e0d6293db0cfe3fc5a92485e9a7dee80f9bd42dd91920b66503dab7980f3f3
Comment 40 Paolo Valente 2020-01-09 08:20:31 UTC
(In reply to Chris Evich from comment #39)
> Created attachment 286687 [details]
> serial console output of traces from kernel
> 
> The last podman integration test running was 'podman exec simple working
> directory test'.  Command from the test at time of failure was:
> 
> Running: /var/tmp/go/src/github.com/containers/libpod/bin/podman
> --storage-opt vfs.imagestore=/tmp/podman/imagecachedir --root
> /tmp/podman_test475333727/crio --runroot /tmp/podman_test475333727/crio-run
> --runtime /usr/bin/crun --conmon /usr/bin/conmon --cni-config-dir
> /etc/cni/net.d --cgroup-manager systemd --tmpdir /tmp/podman_test475333727
> --events-backend file --storage-driver vfs rm -fa
> 83e0d6293db0cfe3fc5a92485e9a7dee80f9bd42dd91920b66503dab7980f3f3

Could you please also attach the full kernel log from before the first oops, and the output of uname -a?
Comment 41 Paolo Valente 2020-01-09 16:47:53 UTC
Created attachment 286717 [details]
Tentative fix, to apply on top of all the other patches

Please try again with this fix applied on top of all other patches.
Comment 42 Chris Evich 2020-01-09 21:13:29 UTC
well...err...hrmmm...yeah.  So with that patch built into the kernel (the build took 1.5 hours), the reproducer ran for a good long while.  However, it ultimately resulted in the system "hanging": I can't reconnect over ssh, and the getty on the serial console is unresponsive.

I can still see the historical contents on the serial-console, so I won't hard-reset the VM until you say so.  Looking through it myself, it appears uninteresting IMHO, mostly "normal" looking boot and effects of some of our tests running (the reproducer).  Paolo, let me know if you want me to copy paste anything, like the system boot messages or whatnot.
Comment 43 Paolo Valente 2020-01-10 06:56:18 UTC
(In reply to Chris Evich from comment #42)
> well...err...hrmmm...yeah.  So with that patch in built in the kernel (build
> took 1.5 hours), running the reproducer run for a good long while.

Great, then we may be on the right track.

>  However,
> it ultimately results in the system "hanging".  I can't re-connect over ssh,
> and the getty on the serial-console is unresponsive.
> 

The fix was likely incomplete. I'm preparing one that should cover all cases.

> I can still see the historical contents on the serial-console, so I won't
> hard-reset the VM until you say so.  Looking through it myself, it appears
> uninteresting IMHO, mostly "normal" looking boot and effects of some of our
> tests running (the reproducer).  Paolo, let me know if you want me to copy
> paste anything, like the system boot messages or whatnot.

No no, that stuff is uninteresting as you say (now that we know that the VM is running the right kernel).

I'm about to upload a new version of the last tentative fix. So the last version is to be discarded, and only the new one is to be applied.
Comment 44 Paolo Valente 2020-01-10 07:18:25 UTC
Created attachment 286731 [details]
Tentative fix V2, to apply on top of the other patches

This is a new version of my last tentative fix. This new version replaces the previous one.
Comment 45 Chris Evich 2020-01-10 14:28:51 UTC
Okay, I will start building with that applied after all prior patches.  Yesterday it took about 1.5 hours.  When it finishes, I'll install it and re-run the reproducer.  Also, I found a way to get you access to the serial-console.  Ping me on IRC for details later.
Comment 46 Chris Evich 2020-01-10 15:20:17 UTC
Uh-oh, the patch from comment 44 does not apply cleanly after applying the patch from comment 41.  I assume this is my misunderstanding (since the patch name is the same).  I'm attempting to apply it on top of the patches from the tarball instead.
Comment 47 Chris Evich 2020-01-10 19:07:07 UTC
With the latest patch applied, I'm seeing CPU soft-lockup messages over and over while the reproducer makes no progress.  However, I don't see any big oops, so small progress forward?

Jan 10 13:53:11 cevich-fedora-31-libpod-6228273469587456 systemd[1]: Started libcrun container.
[  361.511728] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [3:25003]
[  361.518569] Modules linked in: xt_multiport xt_nat xt_addrtype xt_mark ip6table_nat ip6_tables xt_conntrack iptable
_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c veth bridge s
tp llc rfkill snd_pcsp snd_seq snd_seq_device intel_rapl_msr intel_rapl_common snd_pcm sb_edac i2c_piix4 virtio_net ne
t_failover failover snd_timer pvpanic snd soundcore intel_rapl_perf ip_tables crct10dif_pclmul crc32_pclmul crc32c_int
el ghash_clmulni_intel serio_raw virtio_scsi fuse
[  361.565370] CPU: 1 PID: 25003 Comm: 3 Not tainted 5.4.8-203.bz205447.fc31.x86_64 #1
[  361.573499] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  361.583081] RIP: 0010:smp_call_function_many+0x1fe/0x260
[  361.588565] Code: e8 47 3c 87 00 3b 05 a5 98 6a 01 89 c7 0f 83 9a fe ff ff 48 63 c7 49 8b 0f 48 03 0c c5 80 69 41 9
1 8b 41 18 a8 01 74 0a f3 90 <8b> 51 18 83 e2 01 75 f6 eb c8 48 c7 c2 20 b2 82 91 4c 89 f6 89 df
[  361.608152] RSP: 0018:ffffa264814dfb28 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[  361.616338] RAX: 0000000000000003 RBX: 0000000000029a40 RCX: ffff9348f0baf2a0
[  361.623644] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000003
[  361.630963] RBP: ffffffff9007c160 R08: ffff9348eec27960 R09: ffff9348eec27208
[  361.638283] R10: ffff9348eec27960 R11: 0000000000000000 R12: 0000000000000000
[  361.646730] R13: 0000000000000001 R14: 0000000000000004 R15: ffff9348f0aa9a80
[  361.654080] FS:  00007fede19ccc00(0000) GS:ffff9348f0a80000(0000) knlGS:0000000000000000
[  361.662348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  361.668265] CR2: 00005650eef9a988 CR3: 00000003b36b0006 CR4: 00000000001606e0
[  361.675600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  361.682919] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  361.690312] Call Trace:
[  361.692951]  ? tlbflush_read_file+0x70/0x70
[  361.697319]  on_each_cpu+0x28/0x50
[  361.700897]  __purge_vmap_area_lazy+0x6d/0x6f0
[  361.705517]  _vm_unmap_aliases+0xf1/0x130
[  361.709702]  change_page_attr_set_clr+0xc5/0x1f0
[  361.714496]  set_memory_ro+0x26/0x30
[  361.718259]  bpf_int_jit_compile+0x2d2/0x32e
[  361.722705]  ? kmem_cache_alloc_trace+0x162/0x220
[  361.727580]  bpf_prog_select_runtime+0xcd/0x150
[  361.732374]  bpf_prog_load+0x3ce/0x660
[  361.736301]  __do_sys_bpf+0x105/0x1820
[  361.740236]  ? security_task_setrlimit+0x3c/0x50
[  361.754453]  ? do_prlimit+0xc4/0x1f0
[  361.758214]  do_syscall_64+0x5b/0x1a0
[  361.762152]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  361.767388] RIP: 0033:0x7fede1c8d1ad
[  361.772097] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab 5c 0c 00 f7 d8 64 89 01 48
[  361.791915] RSP: 002b:00007ffc01d3a5f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[  361.799672] RAX: ffffffffffffffda RBX: 00005650eef926e0 RCX: 00007fede1c8d1ad
[  361.806997] RDX: 0000000000000070 RSI: 00007ffc01d3a620 RDI: 0000000000000005
[  361.814494] RBP: 0000000000000006 R08: 0000000000000007 R09: 0000000000000000
[  361.821922] R10: 0000000000000002 R11: 0000000000000246 R12: 00007ffc01d3ace0
[  361.829242] R13: 0000000000000000 R14: 00007ffc01d3ace0 R15: 00007ffc01d3a620

...more of the same...

When I try to ssh in, the problem appears to be widespread; I got:

[  933.342839] CPU: 3 PID: 598 Comm: systemd-logind Tainted: G             L    5.4.8-203.bz205447.fc31.x86_64 #1
[  933.342840] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  933.342841] RIP: 0010:bfq_insert+0x2c/0x70
[  933.342842] Code: 44 00 00 48 83 7e 30 00 75 62 55 48 89 fa 48 89 fd 53 48 8b 07 48 89 f3 48 85 c0 74 27 48 8b 7e 28 eb 03 48 89 c8 48 8b 48 28 <48> 8d 70 10 48 8d 50 08 48 29 f9 48 85 c9 48 0f 4f d6 48 8b 0a 48
[  933.342842] RSP: 0000:ffffa2648010cd50 EFLAGS: 00000086
[  933.342844] RAX: ffff9348b55bb8a0 RBX: ffff9348e701af40 RCX: ffff9348b55bb0c0
[  933.342844] RDX: ffff9348b55bb8a8 RSI: ffff9348b55bb8b0 RDI: 0000004a506853b5
[  933.342845] RBP: ffff9348e7e78158 R08: ffffa2648010cd1f R09: 0000000000ffff0a
[  933.342845] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000001
[  933.342846] R13: ffff9348e7e78150 R14: 0000000000000001 R15: ffff9348e701af40
[  933.342847] FS:  00007f891a810980(0000) GS:ffff9348f0b80000(0000) knlGS:0000000000000000
[  933.342847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  933.342848] CR2: 00007f28c1a0106d CR3: 00000003e9050003 CR4: 00000000001606e0
[  933.342848] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  933.342849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  933.342849] Call Trace:
[  933.342849]  <IRQ>
[  933.342850]  __bfq_deactivate_entity+0x13f/0x1d0
[  933.342850]  bfq_deactivate_entity+0x8a/0x2b0
[  933.342851]  bfq_del_bfqq_busy+0xf2/0x190
[  933.342851]  __bfq_bfqq_expire+0xa0/0x120
[  933.342852]  bfq_bfqq_expire+0x3b0/0xb80
[  933.342852]  ? snprintf+0x49/0x60
[  933.342852]  bfq_idle_slice_timer+0xd4/0x1b0
[  933.342853]  ? bfq_add_request+0xa20/0xa20
[  933.342853]  __hrtimer_run_queues+0xf6/0x270
[  933.342854]  hrtimer_interrupt+0x10e/0x240
[  933.342854]  smp_apic_timer_interrupt+0x6c/0x130
[  933.342855]  apic_timer_interrupt+0xf/0x20
[  933.342855]  </IRQ>
[  933.342855] RIP: 0033:0x7f891b81f21f
[  933.342857] Code: 7f 44 17 c0 c5 fe 7f 47 40 c5 fe 7f 44 17 a0 c5 fe 7f 47 60 c5 fe 7f 44 17 80 48 01 fa 48 83 e2 80 48 39 d1 74 ba c5 fd 7f 01 <c5> fd 7f 41 20 c5 fd 7f 41 40 c5 fd 7f 41 60 48 81 c1 80 00 00 00
[  933.342857] RSP: 002b:00007fffd3330128 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
[  933.342858] RAX: 000055bd223376b0 RBX: 0000000000007960 RCX: 000055bd22337700
[  933.342859] RDX: 000055bd22337a00 RSI: 0000000000000000 RDI: 000055bd223376b0
[  933.342859] RBP: 00007f891b87c9e0 R08: 000055bd223376b0 R09: 000000000000041c
[  933.342860] R10: 0000000000000370 R11: 00007f891b87ca40 R12: 0000000000000360
[  933.342861] R13: 000055bd223376a0 R14: 000055bd22317d40 R15: 000055bd22337280
[  933.342863] rcu: rcu_sched kthread starved for 600017 jiffies! g102005 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=3
[  933.605054] rcu: RCU grace-period kthread stack dump:
[  933.610308] rcu_sched       R  running task        0    11      2 0x80004000
[  933.617639] Call Trace:
[  933.620269]  ? __schedule+0x2d2/0x730
[  933.624108]  schedule+0x39/0xa0
[  933.627426]  schedule_timeout+0x15e/0x300
[  933.631612]  ? __next_timer_interrupt+0xd0/0xd0
[  933.636372]  rcu_gp_kthread+0x4b9/0xa30
[  933.640419]  kthread+0xf9/0x130
[  933.643741]  ? rcu_accelerate_cbs_unlocked+0x80/0x80
[  933.648887]  ? kthread_park+0x90/0x90
[  933.654468]  ret_from_fork+0x35/0x40
Comment 48 Chris Evich 2020-01-14 14:11:51 UTC
Paolo,

Would access to the serial-port of this machine help?  Last I checked, it's still spewing out traces similar to above.
Comment 49 Chris Evich 2020-01-15 20:14:05 UTC
Paolo, kernel build 204 finished (with the customized config).  I installed it onto the reproducer and rebooted.  I caught this on boot (so I think it worked?):

[    6.252253] *************************************************************
[    6.253251] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    6.254251] **                                                         **
[    6.255251] **  IOMMU DebugFS SUPPORT HAS BEEN ENABLED IN THIS KERNEL  **
[    6.256253] **                                                         **
[    6.257251] ** This means that this kernel is built to expose internal **
[    6.258251] ** IOMMU data structures, which may compromise security on **
[    6.259250] ** your system.                                            **
[    6.260251] **                                                         **
[    6.261251] ** If you see this message and you are not debugging the   **
[    6.262251] ** kernel, report this immediately to your vendor!         **
[    6.263251] **                                                         **
[    6.265246] **     NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE    **
[    6.266250] *************************************************************

Assuming so, I'll fire off the reproducer...
Comment 50 Paolo Valente 2020-01-16 07:41:57 UTC
(In reply to Chris Evich from comment #49)
> Paolo, kernel build 204 finished (with customized config).  I installed it
> onto reproducer and rebooted.  Caught this on boot (so I think it worked?):
> 

It worked. And I found the culprit I expected; see log snippet below. Now I need to rebuild to test a fix. But the procedure is not clear to me. You suggest, as a first step:
rm -rf /root/kernel; mkdir /root/kernel
Why can't I use my existing sources and kernel.spec? Ping me when you are available on IRC.

[  560.763109] ==================================================================
[  560.771248] BUG: KASAN: use-after-free in rb_erase+0x466/0x8c0
[  560.777405] Write of size 8 at addr ffff8882910f3498 by task e2e.test/3287

[  560.786305] CPU: 1 PID: 3287 Comm: e2e.test Not tainted 5.4.8-204.bz205447.fc31.x86_64 #2
[  560.794807] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  560.804420] Call Trace:
[  560.807127]  dump_stack+0xb8/0x110
[  560.810858]  print_address_description.constprop.0+0x1b/0x240
[  560.816895]  ? rb_erase+0x466/0x8c0
[  560.820639]  ? rb_erase+0x466/0x8c0
[  560.824385]  __kasan_report.cold+0x37/0x7b
[  560.828758]  ? rb_erase+0x466/0x8c0
[  560.832514]  kasan_report+0xe/0x20
[  560.836169]  rb_erase+0x466/0x8c0
[  560.839808]  ? bfq_update_next_in_service+0xdc0/0xdc0
[  560.845165]  bfq_idle_extract+0x9d/0x1b0
[  560.849359]  bfq_activate_requeue_entity+0x3ee/0x5b0
[  560.854607]  ? bfq_entity_service_tree+0x110/0x270
[  560.859680]  ? __bfq_requeue_entity+0xd0/0xd0
[  560.864349]  bfq_activate_bfqq+0x6c/0xa0
[  560.868548]  bfq_add_bfqq_busy+0xf0/0x26d
[  560.872845]  ? bfq_del_bfqq_busy+0x280/0x280
[  560.877439]  bfq_bfqq_handle_idle_busy_switch.isra.0+0x570/0xfa0
[  560.883788]  ? bfq_set_next_ioprio_data.isra.0.cold+0x66/0x66
[  560.889766]  ? bfq_dispatch_request+0x1880/0x1880
[  560.894734]  ? bfq_init_queue+0x6d0/0x6d0
[  560.899012]  ? rb_insert_color+0x26b/0x3e0
[  560.903426]  bfq_add_request+0xd48/0x11b0
[  560.907700]  ? snprintf+0x9e/0xd0
[  560.911223]  ? bfq_bfqq_handle_idle_busy_switch.isra.0+0xfa0/0xfa0
[  560.917647]  ? bfq_insert_requests+0x723/0x16b0
[  560.922408]  bfq_insert_requests+0xba7/0x16b0
[  560.927149]  ? lock_downgrade+0x360/0x360
[  560.931435]  ? bfq_request_merged+0x330/0x330
[  560.936059]  ? lock_acquire+0xe5/0x210
[  560.940144]  ? blk_mq_sched_insert_requests+0x5/0x350
[  560.945463]  blk_mq_sched_insert_requests+0x12c/0x350
[  560.950808]  blk_mq_flush_plug_list+0x49e/0x660
[  560.955593]  ? blk_mq_insert_requests+0x2c0/0x2c0
[  560.960669]  blk_flush_plug_list+0x20f/0x250
[  560.965235]  ? rcu_read_lock_bh_held+0xc0/0xc0
[  560.969905]  ? blk_insert_cloned_request+0x1c0/0x1c0
[  560.975182]  ? kmem_cache_free+0x1a3/0x400
[  560.979568]  blk_finish_plug+0x3c/0x54
[  560.983645]  ext4_writepages+0xace/0x1cd0
[  560.988010]  ? ext4_mark_inode_dirty+0x470/0x470
[  560.993019]  ? kvm_sched_clock_read+0x14/0x30
[  560.997663]  ? sched_clock+0x5/0x10
[  561.001403]  ? mark_held_locks+0x23/0xa0
[  561.005597]  ? __call_rcu+0x4e1/0x840
[  561.010045]  ? do_writepages+0xa5/0x1a0
[  561.014089]  ? ext4_mark_inode_dirty+0x470/0x470
[  561.018977]  do_writepages+0xa5/0x1a0
[  561.022853]  ? page_writeback_cpu_online+0x10/0x10
[  561.027915]  ? invoke_rcu_core+0xe0/0xe0
[  561.032167]  __filemap_fdatawrite_range+0x193/0x1f0
[  561.037314]  ? delete_from_page_cache_batch+0x670/0x670
[  561.042968]  ? rcu_read_lock_sched_held+0xa1/0xe0
[  561.047976]  file_write_and_wait_range+0xaa/0x100
[  561.052953]  ext4_sync_file+0x14e/0x900
[  561.057155]  do_fsync+0x38/0x70
[  561.060524]  __x64_sys_fsync+0x1d/0x30
[  561.064504]  do_syscall_64+0x74/0xd0
[  561.068383]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  561.073720] RIP: 0033:0x4c03c0
[  561.077031] Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[  561.096233] RSP: 002b:000000c0004d58b0 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
[  561.104119] RAX: ffffffffffffffda RBX: 000000c000076500 RCX: 00000000004c03c0
[  561.111553] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
[  561.118973] RBP: 000000c0004d58f0 R08: 0000000000000000 R09: 0000000000000000
[  561.126380] R10: 0000000000000000 R11: 0000000000000202 R12: 00000000000000f9
[  561.133733] R13: 0000000000000000 R14: 00000000018296c1 R15: 0000000000000000

[  561.142807] Allocated by task 4543:
[  561.146645]  save_stack+0x1b/0x80
[  561.150154]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  561.155180]  bfq_pd_alloc+0x72/0xa0
[  561.158860]  blkg_alloc+0x310/0x3d0
[  561.162573]  blkg_create+0x701/0xa00
[  561.166417]  __blkg_lookup_create+0x241/0x2b0
[  561.170972]  blkg_lookup_create+0xee/0x120
[  561.175322]  bio_associate_blkg_from_css+0x91/0x150
[  561.180405]  bio_associate_blkg+0x7e/0x1d0
[  561.184722]  ext4_mpage_readpages+0xace/0x1020
[  561.189373]  read_pages+0xef/0x340
[  561.192966]  __do_page_cache_readahead+0x22b/0x310
[  561.198010]  ondemand_readahead+0x2da/0x630
[  561.202406]  generic_file_read_iter+0x964/0x1340
[  561.207257]  new_sync_read+0x282/0x3a0
[  561.211214]  vfs_read+0xf9/0x1e0
[  561.214639]  kernel_read+0x78/0xc0
[  561.218329]  prepare_binprm+0x16f/0x350
[  561.222703]  __do_execve_file.isra.0+0x8c4/0x10f0
[  561.227632]  __x64_sys_execve+0x54/0x60
[  561.231722]  do_syscall_64+0x74/0xd0
[  561.235524]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[  561.242592] Freed by task 5626:
[  561.245983]  save_stack+0x1b/0x80
[  561.249937]  __kasan_slab_free+0x12c/0x170
[  561.254619]  kfree+0x124/0x410
[  561.257898]  blkg_free.part.0+0x4f/0xc0
[  561.261962]  rcu_do_batch+0x401/0x7a0
[  561.266013]  rcu_core+0x3f7/0x600
[  561.269643]  __do_softirq+0x130/0x5c1




Comment 51 Paolo Valente 2020-01-21 16:18:31 UTC
Created attachment 286933 [details]
New patch covering a possible bug hiding in group change

Please add this patch on top of the others.
Comment 52 Paolo Valente 2020-01-23 18:18:06 UTC
Created attachment 286953 [details]
tentative patch series to check and fix group references

New patch series, which adds missing group references (the first patch is actually a revert of one of the previous tentative fixes). Please apply on top of all the other patches, and cross your fingers ...
Comment 53 Chris Evich 2020-01-23 20:21:45 UTC
great, tossing it into the mix and starting a build...
Comment 54 Chris Evich 2020-01-23 22:39:07 UTC
...build finished, installing and then will begin tests after hooking up screen on the build VM to the serial-console of the test VM.
Comment 55 Chris Evich 2020-01-24 15:40:05 UTC
Serial console output is slightly garbled, but there's certainly a trace followed by a reboot.  That's good news; if it's rebooting, that makes me think we may be able to get it to dump a core.  Let me know if that's needed and I'll try to set it up.

Jan 23 17:57:31 cevich-fedora-31-libpod-6228273469587456 NetworkManager[653]: <info>  [1579820251.2243] device (cni-podman0): carrier: link connected
[  758.761592] ------------[ cut here ]------------
[  758.766790] kernel BUG at block/bfq-cgroup.c:351!
[  758.772806] invalid opcode: 0000 [#1] SMP KASAN PTI
[  758.777908] CPU: 0 PID: 3257 Comm: e2e.test Not tainted 5.4.8-208.bz205447.fc31.x86_64 #2
[  758.786311] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[  758.796216] RIP: 0010:bfqg_put+0x42/0x50
[  758.800449] Code: 00 00 00 01 78 22 75 1e 48 8d bd b0 00 00 00 e8 94 c2 c3 ff 80 bd b0 00 00 00 00 75 0d 48 89 ef 5d e9 b2 66 c3 ff 5d c3 0f 0b <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 eb a9 66
[  758.819572] RSP: 0018:ffff8883a8e0fd98 EFLAGS: 00010202
[  758.825015] RAX: 0000000000000000 RBX: ffffffffb493d3d8 RCX: ffffffffb089c55c
[  758.832370] RDX: 1ffff11067422b16 RSI: 0000000000000008 RDI: ffff88833a1158b0
Jan 23 17:57:31 [  758.839723] RBP: ffff88833a115800 R08: ffffffffb089c545 R09: fffffbfff6797799
cevich-fedora-31[  758.848490] R10: fffffbfff6797798 R11: ffffffffb3cbbcc7 R12: ffff88833a115800
-libpod-62282734[  758.857168] R13: ffff8882686d2000 R14: ffffffffb2493060 R15: ffff8882686d25e0
69587456 kernel:[  758.865934] FS:  00007ff741484700(0000) GS:ffff8883a8e00000(0000) knlGS:0000000000000000
 ------------[ c[  758.875651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
ut here ]-------[  758.882947] CR2: 000000c0004a5000 CR3: 000000038547a001 CR4: 00000000001606f0
-----
Jan 23 17[  758.891630] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
:57:31 cevich-fe[  758.900296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
dora-31-libpod-6[  758.909001] Call Trace:
228273469587456 [  758.912976]  <IRQ>
kernel: kernel B[  758.916528]  blkg_free.part.0+0x4f/0xc0
UG at block/bfq-[  758.921903]  ? blkg_prfill_rwstat_field_recursive+0xe0/0xe0
cgroup.c:351!
J[  758.929008]  rcu_do_batch+0x401/0x7a0
an 23 17:57:31 c[  758.934221]  ? rcu_note_context_switch+0x300/0x300
evich-fedora-31-[  758.940551]  ? lockdep_hardirqs_on+0x182/0x260
libpod-622827346[  758.950342]  rcu_core+0x3f7/0x600
9587456 kernel: [  758.955188]  __do_softirq+0x130/0x5c1
invalid opcode: [  758.960817]  irq_exit+0x16b/0x170
0000 [#1] SMP KA[  758.965651]  smp_apic_timer_interrupt+0xfd/0x320
SAN PTI
Jan 23 [  758.971840]  apic_timer_interrupt+0xf/0x20
17:57:31 cevich-[  758.977455]  </IRQ>
fedora-31-libpod[  758.981207] RIP: 0010:_raw_spin_unlock_irqrestore+0x50/0x60
-622827346958745[  758.988333] Code: 05 ff f6 c7 02 75 19 48 89 df 57 9d 0f 1f 44 00 00 e8 14 20 18 ff 65 ff 0d c5 5b e8 4e 5b 5d c3 e8 55 22 18 ff 48 89 df 57 9d <0f> 1f 44 00 00 eb e5 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53
6 kernel: CPU: 0[  759.008863] RSP: 0018:ffff88839b89f798 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
 PID: 3257 Comm:[  759.020296] RAX: 0000000000000000 RBX: 0000000000000206 RCX: dffffc0000000000
 e2e.test Not ta[  759.029097] RDX: 0000000000000007 RSI: 0000000000000008 RDI: 0000000000000206
inted 5.4.8-208.[  759.037765] RBP: ffffffffb4990a18 R08: ffffffffb01f3d62 R09: fffffbfff6797799
bz205447.fc31.x8[  759.046570] R10: fffffbfff6797798 R11: ffffffffb3cbbcc7 R12: ffff8882d2b96588
6_64 #2
Jan 23 [  759.055240] R13: 1ffff11073713ef8 R14: ffffffffb215be00 R15: 0000000000000000
17:57:31 cevich-[  759.063930]  ? lockdep_hardirqs_on+0x182/0x260
fedora-31-libpod[  759.070255]  debug_object_activate+0x1ff/0x300
-622827346958745[  759.076323]  ? debug_object_assert_init+0x220/0x220
6 kernel: Hardwa[  759.082735]  ? sched_clock+0x5/0x10
re name: Google [  759.087750]  ? sched_clock_cpu+0x18/0x110
Google Compute E[  759.093301]  ? soft_offline_page.cold+0x11c/0x11c
ngine/Google Com[  759.099542]  ? free_buffer_head+0x2e/0x70
pute Engine, BIO[  759.105109]  __call_rcu+0x89/0x840
S Google 01/01/2[  759.110047]  ? lock_contended+0x680/0x680
011
Jan 23 17:5[  759.115618]  ? invoke_rcu_core+0xe0/0xe0
7:31 cevich-fedo[  759.121079]  ? _raw_spin_unlock_irqrestore+0x4b/0x60
ra-31-libpod-622[  759.127587]  ? free_buffer_head+0x2e/0x70
8273469587456 ke[  759.133712]  ? lockdep_hardirqs_on+0x182/0x260
rnel: RIP: 0010:[  759.139691]  ? free_buffer_head+0x2e/0x70
bfqg_put+0x42/0x[  759.145225]  kmem_cache_free+0x283/0x400
50
Jan 23 17:57[  759.150691]  free_buffer_head+0x2e/0x70
:31 cevich-fedor[  759.156115]  try_to_free_buffers+0x18b/0x2d0
a-31-libpod-6228[  759.161957]  ? drop_buffers+0x230/0x230
273469587456 ker[  759.167342]  ? jbd2_journal_grab_journal_head+0x92/0xe0
nel: Code: 00 00[  759.174229]  truncate_cleanup_page+0x14d/0x270
 00 01 78 22 75 [  759.180305]  truncate_inode_pages_range+0x333/0xc10
1e 48 8d bd b0 0[  759.194040]  ? generic_error_remove_page+0x60/0x60
0 00 00 e8 94 c2[  759.200378]  ? pvclock_clocksource_read+0xd9/0x190
 c3 ff 80 bd b0 [  759.206734]  ? _raw_spin_unlock_irq+0x29/0x40
00 00 00 00 75 0[  759.212598]  ? lockdep_hardirqs_on+0x182/0x260
d 48 89 ef 5d e9[  759.218628]  ext4_evict_inode+0x29c/0xb70
 b2 66 c3 ff 5d [  759.229321]  ? lock_downgrade+0x360/0x360
c3 0f 0b <0f> 0b[  759.234979]  ? var_wake_function+0x80/0x80
 66 66 2e 0f 1f [  759.240605]  ? ext4_da_write_begin+0x870/0x870
84 00 00 00 00 0[  759.246596]  evict+0x18c/0x2b0
0 90 0f 1f 44 00[  759.251211]  do_unlinkat+0x2ab/0x3f0
 00 eb a9 66
Ja[  759.256337]  ? __ia32_sys_rmdir+0x20/0x20
n 23 17:57:31 ce[  759.261880]  ? strncpy_from_user+0xd2/0x1f0
vich-fedora-31-l[  759.267613]  ? getname_flags+0x105/0x2a0
ibpod-6228273469[  759.273151]  do_syscall_64+0x74/0xd0
587456 kernel: R[  759.278260]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
SP: 0018:ffff888[  759.284868] RIP: 0033:0x4c03c0
3a8e0fd98 EFLAGS[  759.289451] Code: 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 49 c7 c2 00 00 00 00 49 c7 c0 00 00 00 00 49 c7 c1 00 00 00 00 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
: 00010202
Jan [  759.310016] RSP: 002b:000000c0004f0208 EFLAGS: 00000216 ORIG_RAX: 0000000000000107
23 17:57:31 cevi[  759.319096] RAX: ffffffffffffffda RBX: 000000c00007ca00 RCX: 00000000004c03c0
ch-fedora-31-lib[  759.327785] RDX: 0000000000000000 RSI: 000000c000042b18 RDI: 0000000000000016
pod-622827346958[  759.336586] RBP: 000000c0004f0250 R08: 0000000000000000 R09: 0000000000000000
7456 kernel: RAX[  759.345258] R10: 0000000000000000 R11: 0000000000000216 R12: 00000000000000b2
: 00000000000000[  759.353945] R13: 00000000000000b1 R14: 0000000000000200 R15: 0000000000000038
00 RBX: ffffffff[  759.362626] Modules linked in: ip6table_nat ip6_tables xt_conntrack iptable_filter xt_MASQUERADE xt_comment iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c veth bridge stp llc rfkill intel_rapl_msr snd_pcsp snd_seq snd_seq_device snd_pcm intel_rapl_common sb_edac snd_timer virtio_net snd net_failover i2c_piix4 failover pvpanic soundcore intel_rapl_perf ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_scsi serio_raw fuse
b493d3d8 RCX: ff[  759.407916] ---[ end trace 3bc1a557a5b39ddd ]---
ffffffb089c55c
[  759.413926] RIP: 0010:bfqg_put+0x42/0x50
Jan 23 17:57:31 [  759.419374] Code: 00 00 00 01 78 22 75 1e 48 8d bd b0 00 00 00 e8 94 c2 c3 ff 80 bd b0 00 00 00 00 75 0d 48 89 ef 5d e9 b2 66 c3 ff 5d c3 0f 0b <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 eb a9 66
cevich-fedora-31[  759.439721] RSP: 0018:ffff8883a8e0fd98 EFLAGS: 00010202
-libpod-62282734[  759.446470] RAX: 0000000000000000 RBX: ffffffffb493d3d8 RCX: ffffffffb089c55c
69587456 kernel:[  759.455171] RDX: 1ffff11067422b16 RSI: 0000000000000008 RDI: ffff88833a1158b0
 RDX: 1ffff11067[  759.463842] RBP: ffff88833a115800 R08: ffffffffb089c545 R09: fffffbfff6797799
422b16 RSI: 0000[  759.472513] R10: fffffbfff6797798 R11: ffffffffb3cbbcc7 R12: ffff88833a115800
000000000008 RDI[  759.481226] R13: ffff8882686d2000 R14: ffffffffb2493060 R15: ffff8882686d25e0
: ffff88833a1158[  759.489875] FS:  00007ff741484700(0000) GS:ffff8883a8e00000(0000) knlGS:0000000000000000
b0
[  759.499634] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  759.505964] CR2: 000000c0004a5000 CR3: 000000038547a001 CR4: 00000000001606f0
[  759.513391] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  759.520805] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  759.528215] Kernel panic - not syncing: Fatal exception in interrupt
[  759.536211] Kernel Offset: 0x2f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  759.547344] Rebooting in 10 seconds..
Comment 56 Paolo Valente 2020-01-24 16:08:03 UTC
Created attachment 286959 [details]
This patch should fix a mistake in my last patch

Add this patch on top of all the others. And let's cross our fingers again ...
Comment 57 Chris Evich 2020-01-24 18:03:56 UTC
...built, installed, and reproducer running.  Same as before, new log filename for build 209...
Comment 58 Chris Evich 2020-01-24 23:04:06 UTC
...tests finished, but oddly the serial port disconnected.  I marked the serial-port log with '#### hmmmm ####' at the point I reconnected.  Then I restarted a second run of tests around 18:02 EST....
Comment 59 Chris Evich 2020-01-27 14:15:57 UTC
...those finished, so I started another run.  My tests are ultimately failing with a "too many open files" error.  However, I don't think this has anything to do with the elevator bug.  I think the next step to confirm the fix is to try running a build/repro cycle without the debugging options enabled, ya?
Comment 60 Paolo Valente 2020-01-27 14:19:07 UTC
(In reply to Chris Evich from comment #59)
> ...those finished, so I started another run.  My tests are ultimately
> failing w/ a too many open files error.  However, I don't think this has
> anything to do with the elevator bug.  I think the next step to confirm the
> fix is to try running build/repro cycle with out debugging options enabled,
> ya?

Good idea!

One extra action before a new test: if, as I understand, the VM is still alive after the test (and you have not rebooted it already), then could you please paste the full output of dmesg?
Comment 61 Chris Evich 2020-01-27 16:25:16 UTC
The VM remained up all weekend and throughout three iterations of the reproducer.  Unfortunately, after the third, I could not recover enough of the leaked open files and mounts to run again, so I rebooted to increase the limit.  However, I do have a log of the serial-port output.  I'll e-mail it to you.
Comment 62 Chris Evich 2020-01-28 20:45:54 UTC
Running build 210 (all patches but w/o custom .config) seems to be passing all checks.  A 'grep trace <logfile>' gives no matches.
Comment 63 Chris Evich 2020-01-29 17:02:58 UTC
Testing of build 210 seems good to me.  I'm ready to give a consolidated draft-patch (THE fix, minus all debugging and instrumentation) a build and trial run.
Comment 64 Paolo Valente 2020-01-31 14:49:05 UTC
Created attachment 287043 [details]
Mainline version of patches

These patches are a mainline port of the patches I proposed here. I have also submitted them to the Linux kernel mailing lists (no archive link available yet). I've tested them on top of 5.5.
Comment 65 Chris Evich 2020-01-31 19:11:20 UTC
Kernel built and installed, reproducer running, no errors so far.
Comment 66 Chris Evich 2020-01-31 21:10:32 UTC
Running reproducer again.
Comment 67 Chris Evich 2020-02-27 15:13:03 UTC
Follow-up: I was not able to reproduce any problems with the patches from comment 64, which are substantially equivalent to what was applied earlier. Ref. https://lkml.org/lkml/2020/2/3/321
Comment 68 Patrick Dung 2020-03-19 13:44:46 UTC
*** Bug 205725 has been marked as a duplicate of this bug. ***
Comment 69 Patrick Dung 2020-03-19 13:46:08 UTC
I have been using 5.5.9-200.fc31.x86_64 on FC31 for about a week. It's fine so far.
Comment 70 Patrick Dung 2020-04-13 07:50:16 UTC
I moved from FC31 to FC32 beta. The kernel is now 5.6.2.
I have had two occurrences of a hard system hang since upgrading.

The hardware is not the same as the machine I created this ticket on (2019), but the symptom is very similar to before: the system freezes without any crash dump or oops (or it actually freezes before a crash dump or oops can be recorded).

What was done before the freeze/hang:
I had run dnf update on some packages. The system froze a short while (a few seconds to one or two minutes) after the command completed.

The system was unresponsive; I could not trigger a crash dump or a reboot with Magic SysRq. After a hard reset, I checked the log:

Apr 13 12:35:28 home audit: PROCTITLE proctitle=2F7573722F62696E2F73797374656D642D746D7066696C6573002D2D637265617465
Apr 13 12:35:28 home systemd[1]: Started /usr/bin/systemctl start man-db-cache-update.
Apr 13 12:35:28 home audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=run-r56bb0604d6b8466daf57738724fcf817 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 13 12:41:10 home kernel: Linux version 5.6.2-300.fc32.x86_64 (mockbuild@bkernel04.phx2.fedoraproject.org) (gcc version 10.0.1 20200328 (Red Hat 10.0.1-0.11) (GCC)) #1 SMP Thu Apr 2 18:34:20 UTC 2020

The last command (run by the system, not by me) was 'systemctl start man-db-cache-update'. Both system freezes had a similar log and command.

I tried to reproduce it with three VMs but was unable to.

The problem occurs on physical hardware, and I can't afford system hangs on that machine. I don't have evidence that the freeze is directly related to BFQ, but I am now switching back to the mq-deadline I/O scheduler to check whether the problem still occurs.
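
For reference, this is roughly how I check and change the scheduler (a minimal sketch, assuming the disk is sda and that a udev rule is acceptable for making the change persistent; adjust the device name to your setup):

========================================
# Show the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler

# Switch the running system to mq-deadline
echo mq-deadline > /sys/block/sda/queue/scheduler

# Make the change persistent across reboots with a udev rule
cat > /etc/udev/rules.d/60-io-scheduler.rules <<'EOF'
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
EOF
========================================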
Comment 71 Chris Evich 2020-04-13 20:31:19 UTC
Patrick,

Testing with mq-deadline is a smart move; please let us know if the problem does not reproduce.  Previously, IIRC, Paolo needed to turn on a few debugging configurations in the kernel; otherwise we barely got more than a squeak + whimper on the serial console.  I _REALLY_ hope, for your reliability's sake, that you are NOT reproducing the same/similar problem :|
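
For the record, the debug instrumentation in the earlier builds was roughly along these lines (a sketch reconstructed from the KASAN/lockdep/debug-objects output in the traces above; the exact .config Paolo asked for may have differed):

========================================
# From the kernel source tree, enable the debug options that
# produced the KASAN and lockdep output quoted in this bug,
# then resolve any new dependencies.
./scripts/config --enable KASAN \
                 --enable PROVE_LOCKING \
                 --enable DEBUG_LIST \
                 --enable DEBUG_OBJECTS
make olddefconfig
========================================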
Comment 72 Jordan Russell 2020-04-17 22:17:56 UTC
(In reply to Patrick Dung from comment #70)
> Moved from FC31 to FC32beta. Kernel is now 5.6.2.
> I had two occurrence of system hard hang after upgrading.

What you describe in this comment seems to be precisely what I'm seeing:
On Fedora 32 Beta, after dnf commands complete, the system sometimes freezes a few seconds later. Nothing is printed on the console, the cursor stops blinking, and there is no response to Magic SysRq.

So far, I've seen 5 freezes on 2 VMware VMs, always after a "dnf upgrade|install|remove" command. The freezes don't appear to be tied to any specific packages, because repeating the same transaction on the same VM or a different F32 VM does not reproduce the freeze.

Overall, it seems roughly 10% of dnf transactions result in a freeze.

One VM is a F31->32 upgrade running kernel 5.6.2-301, and the other is a clean install of F32 Beta (using netinstall ISO with "Minimum" software selection) running kernel 5.6.3-300. Both VMs are configured with 1 CPU and 2GB RAM.

My logs look identical to yours. Just prior to every freeze, I see "Started ... man-db-cache-update" and "SERVICE_START".

Is there already a Fedora Bugzilla entry about this? I can't seem to find one. IMO this seems serious enough to be considered a blocking bug for F32 Final. Freezes are bad, but freezes that occur right after package updates seem especially dangerous, since they might result in the loss of async writes to critical files needed to boot the system.
Comment 73 Patrick Dung 2020-04-17 23:02:25 UTC
Previously I had a ticket for Fedora 31
https://bugzilla.redhat.com/show_bug.cgi?id=1768092
a) After several patches, it seems stable on kernel 5.5.
b) Another ticket (https://bugzilla.redhat.com/show_bug.cgi?id=1767539) about BFQ was opened earlier than mine.

So I closed BZ#1768092 in Red Hat Bugzilla; BZ#1767539 remains open there.

I think the problem may be more likely to occur on kernel 5.6. I used BFQ on kernel 5.5 for about two weeks without problems, and I also upgraded from FC31 to FC32 beta without problems.

If you can reproduce the problem in a VM, that's very good. When the VM hangs, you can get a crash dump:
QEMU: https://wiki.ubuntu.com/DebuggingKernelWithQEMU
or vmss2core for VMware-based VMs:
https://kb.vmware.com/s/article/2003941

After getting the crash dump, you can get the log and backtrace.
https://docs.fedoraproject.org/en-US/Fedora/14/html/Deployment_Guide/s1-kdump-crash.html
Then we will know what is causing the problem (e.g. BFQ or another issue), and the kernel developers can fix it.
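
In practice that boils down to something like the following (a minimal sketch, assuming the matching kernel debuginfo package is installed and the dump came from kdump or vmss2core; the paths are only examples):

========================================
# Open the dump with the matching debuginfo kernel image
crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux /var/crash/*/vmcore

# Useful commands inside crash:
#   log     - dump the kernel ring buffer (the oops text)
#   bt      - backtrace of the crashing task
#   bt -a   - backtrace of the active task on every CPU
========================================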

The ticket below reports a BFQ problem on FC32 beta / kernel 5.6 RC:
https://bugzilla.redhat.com/show_bug.cgi?id=1816204
But it seems no one is working on it except the reporter. I have also had no problems with mq-deadline for a few days now.
Comment 74 Patrick Dung 2020-04-17 23:07:48 UTC
@Paolo
Please take a look at the crashes in https://bugzilla.redhat.com/show_bug.cgi?id=1816204.
The report has some logs recorded on a kernel 5.6 RC; they could be useful at the moment.
Thanks.

@Jordan
I do think this problem deserves a blocking bug for FC32 too.
Comment 75 Patrick Dung 2020-04-18 06:42:00 UTC
@Jordan
I can't reproduce it with my VM just now.
BTW, kernel 5.6.5 has two patches related to BFQ. It is worth installing to see if the issue is fixed: https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.6.5

There is a new RPM for FC32 in the updates-testing channel:
https://bodhi.fedoraproject.org/updates/FEDORA-2020-bebcd88161
Comment 76 Paolo Valente 2020-04-18 10:15:27 UTC
@Patrick
OK, then I'll wait for an update from you before checking the log.
Comment 77 Paolo Valente 2020-04-18 10:37:01 UTC
At any rate, this problem may have to do with the fact that my latest fixes have been applied only for 5.7.0:
https://lkml.org/lkml/2020/3/21/89

It is usually a little hard for me to know which commits then get backported to existing stable trees.
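
One way to check is to look at the stable tree directly (a sketch, assuming a clone of the linux-stable repository; the BFQ files are the ones touched by my fixes):

========================================
# List the BFQ-related commits that landed between 5.6 and 5.6.5
git log --oneline v5.6..v5.6.5 -- \
    block/bfq-iosched.c block/bfq-wf2q.c block/bfq-cgroup.c
========================================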
Comment 78 Patrick Dung 2020-04-18 10:45:40 UTC
@Paolo
It seems kernel 5.6.5 may not have all the patches applied.
I will wait for kernel 5.7 and test it on the physical machine.
Comment 79 Jordan Russell 2020-04-18 19:01:44 UTC
Here is a script which reliably reproduces the freeze for me in under a minute (on VMware with 5.6.3-300). The freezes appear to always occur during the sleeps, i.e. after dnf prints "Complete!".

========================================
#!/bin/bash

set -e

while true; do
        dnf --setopt=keepcache=True -y install nano
        sleep 2
        dnf -y remove nano
        sleep 2
done
========================================

I will try other kernel versions, and see if switching to mq-deadline helps.
Comment 80 Jordan Russell 2020-04-18 19:43:24 UTC
Freeze still happens under 5.6.5-300 (from updates-testing repo).

So then I tried:
  echo mq-deadline > /sys/block/sda/queue/scheduler
and my reproducer script no longer freezes (after 10 minutes).
Comment 81 Patrick Dung 2020-04-18 19:49:17 UTC
For me, my VMs (running kernel 5.6.5/5.6.2) do not crash when running those dnf install/remove commands.
Comment 82 Jordan Russell 2020-04-18 20:10:43 UTC
It looks like the bug may only be readily reproducible if the VM is configured with 1 CPU or booted with maxcpus=1. When I configure the VM with 2 CPUs, my script doesn't freeze (at least not yet).
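
For anyone else trying to reproduce this, restricting an existing install to one CPU can be done with a boot argument (a sketch assuming Fedora's grubby is available; reconfiguring the VM to 1 vCPU works just as well):

========================================
# Add maxcpus=1 to every installed kernel's boot entry
sudo grubby --update-kernel=ALL --args="maxcpus=1"
sudo reboot

# Verify after boot
nproc

# Remove the restriction once testing is done
sudo grubby --update-kernel=ALL --remove-args="maxcpus=1"
========================================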
Comment 83 Patrick Dung 2020-04-18 20:53:18 UTC
I tried running with one CPU and can't reproduce the problem after running the script for about 30 minutes.

04:15:52 AM  LINUX RESTART      (1 CPU)

04:17:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:18:01 AM     all     26.43      3.94     13.61     44.29      0.00     11.73
04:19:01 AM     all     58.55      3.42     14.42     13.20      0.00     10.41
04:20:01 AM     all     49.84      7.94     11.92     27.44      0.00      2.86
04:21:01 AM     all     47.42      8.76     13.05     30.77      0.00      0.00
04:22:01 AM     all     48.95      8.84     12.74     29.44      0.00      0.03
04:23:01 AM     all     46.75      8.91     13.04     31.28      0.00      0.02
04:24:01 AM     all     49.43      8.68     12.88     29.01      0.00      0.00
04:25:01 AM     all     47.62      8.89     12.94     30.55      0.00      0.00
04:26:01 AM     all     48.31      8.58     13.05     30.06      0.00      0.00
04:27:01 AM     all     47.35      8.90     13.36     30.39      0.00      0.00
04:28:01 AM     all     47.70      8.75     13.03     30.52      0.00      0.00
04:29:01 AM     all     48.07      8.82     13.03     30.08      0.00      0.00
04:30:01 AM     all     47.64      8.85     13.26     30.23      0.00      0.02
04:31:01 AM     all     48.25      9.04     13.43     29.27      0.00      0.00
04:32:01 AM     all     47.49      8.87     12.95     30.69      0.00      0.00
04:33:01 AM     all     48.90      8.64     12.95     29.49      0.00      0.03
04:34:01 AM     all     47.79      9.02     13.21     29.98      0.00      0.00
04:35:01 AM     all     49.45      8.37     12.99     29.19      0.00      0.00
04:36:01 AM     all     47.37      9.05     13.21     30.31      0.00      0.07
04:37:01 AM     all     48.08      8.51     12.90     30.51      0.00      0.00
04:38:01 AM     all     47.67      8.82     12.78     30.73      0.00      0.00
04:39:01 AM     all     47.80      8.86     12.78     30.53      0.00      0.03
04:40:01 AM     all     47.79      8.79     12.92     30.50      0.00      0.00
04:41:01 AM     all     47.66      8.90     13.06     30.35      0.00      0.02
04:42:01 AM     all     48.44      8.77     13.08     29.71      0.00      0.00

04:42:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
04:43:01 AM     all     47.51      8.83     12.63     15.22      0.00     15.80
04:44:01 AM     all     48.33      8.82     12.65     21.18      0.00      9.02
04:45:01 AM     all     47.46      8.90     12.98     30.64      0.00      0.02
04:46:01 AM     all     49.08      8.72     13.25     28.91      0.00      0.03
04:47:01 AM     all     47.04      8.67     13.07     31.20      0.00      0.02
Comment 84 Jordan Russell 2020-04-18 22:32:02 UTC
Argh - it's 100% reproducible for me in <1 minute as long as I boot with 1 CPU.

But: I just installed 5.7.0-rc1 from the F33 repo (using "dnf --releasever=33 upgrade kernel"), and now I cannot reproduce the freeze (after 10 minutes running the script).

So it seems this indeed may just be a matter of 5.6.x not having the necessary patches backported?
Comment 85 Paolo Valente 2020-04-19 07:25:27 UTC
I've checked my mailbox, and I see that only one of my latest fixes has been backported to 5.6.

If it would be useful for you, I can ask Sasha Levin to backport the other commits to 5.6 as well.
Comment 86 Jordan Russell 2020-04-20 21:48:44 UTC
That should be helpful, thanks.

I've also gone ahead and filed a Fedora bug on this, which hopefully gets their attention before Fedora 32's release:
https://bugzilla.redhat.com/show_bug.cgi?id=1826091
Comment 87 Jordan Russell 2020-04-21 20:51:12 UTC
I finally managed to get the Oops log out of the VMware state file (thanks Patrick for the tip).
("crash" was erroring out on the core file produced by vmss2core, but it turns out that "crash" actually now accepts VMSS files directly.)

Here is the log I get on 5.6.5. I assume this NULL dereference was already fixed by the 5.7 patches (since I can't reproduce on 5.7.0-rc1)?

[  247.748678] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  247.748970] #PF: supervisor write access in kernel mode
[  247.749267] #PF: error_code(0x0002) - not-present page
[  247.749550] PGD 0 P4D 0 
[  247.749832] Oops: 0002 [#1] SMP PTI
[  247.750127] CPU: 0 PID: 5 Comm: kworker/0:0 Kdump: loaded Not tainted 5.6.5-300.fc32.x86_64 #1
[  247.750431] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[  247.751078] Workqueue: cgroup_destroy css_killed_work_fn
[  247.751419] RIP: 0010:bfq_bfqq_move+0x15/0x140
[  247.751754] Code: 86 f8 03 00 00 e9 5f ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 49 89 d5 41 54 49 89 fc 55 48 89 f5 <83> 06 01 48 39 b7 a0 00 00 00 0f 84 04 01 00 00 48 89 ef e8 03 84
[  247.752824] RSP: 0018:ffffa20b8004bdc8 EFLAGS: 00010082
[  247.753168] RAX: 0000000000000000 RBX: ffff9075248ac120 RCX: ffffffff8a3e00d8
[  247.753501] RDX: ffff907533205000 RSI: 0000000000000000 RDI: ffff907533204000
[  247.753843] RBP: 0000000000000000 R08: ffff9074f5719ba0 R09: 0000000000000000
[  247.754174] R10: 0000000000000000 R11: ffff907538b71900 R12: ffff907533204000
[  247.754517] R13: ffff907533205000 R14: ffff9075332043f0 R15: ffff9075248ac098
[  247.754840] FS:  0000000000000000(0000) GS:ffff90753be00000(0000) knlGS:0000000000000000
[  247.755169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.755493] CR2: 0000000000000000 CR3: 0000000078fee004 CR4: 00000000003606f0
[  247.755851] Call Trace:
[  247.756188]  bfq_pd_offline+0x87/0xf0
[  247.756534]  blkg_destroy+0x52/0xf0
[  247.756870]  blkcg_destroy_blkgs+0x4f/0xa0
[  247.757218]  css_killed_work_fn+0x4d/0xd0
[  247.757557]  process_one_work+0x1b4/0x380
[  247.757906]  worker_thread+0x53/0x3e0
[  247.758244]  ? process_one_work+0x380/0x380
[  247.758597]  kthread+0x115/0x140
[  247.758940]  ? __kthread_bind_mask+0x60/0x60
[  247.759295]  ret_from_fork+0x35/0x40
[  247.759644] Modules linked in: rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6_tables nft_compat ip_set nf_tables nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel vmw_balloon intel_rapl_perf joydev vmw_vmci i2c_piix4 ip_tables xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi fuse
[  247.762493] CR2: 0000000000000000
[  247.762914] ---[ end trace 5497cbca09c4bf8f ]---
[  247.763338] RIP: 0010:bfq_bfqq_move+0x15/0x140
[  247.763754] Code: 86 f8 03 00 00 e9 5f ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 49 89 d5 41 54 49 89 fc 55 48 89 f5 <83> 06 01 48 39 b7 a0 00 00 00 0f 84 04 01 00 00 48 89 ef e8 03 84
[  247.765000] RSP: 0018:ffffa20b8004bdc8 EFLAGS: 00010082
[  247.765422] RAX: 0000000000000000 RBX: ffff9075248ac120 RCX: ffffffff8a3e00d8
[  247.765842] RDX: ffff907533205000 RSI: 0000000000000000 RDI: ffff907533204000
[  247.766246] RBP: 0000000000000000 R08: ffff9074f5719ba0 R09: 0000000000000000
[  247.766659] R10: 0000000000000000 R11: ffff907538b71900 R12: ffff907533204000
[  247.767071] R13: ffff907533205000 R14: ffff9075332043f0 R15: ffff9075248ac098
[  247.767467] FS:  0000000000000000(0000) GS:ffff90753be00000(0000) knlGS:0000000000000000
[  247.767893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  247.768295] CR2: 0000000000000000 CR3: 0000000078fee004 CR4: 00000000003606f0
Comment 88 Paolo Valente 2020-04-22 09:03:46 UTC
Yep, this stack trace matches those of the failures fixed by my last fixes.
Comment 89 Paolo Valente 2020-04-22 09:10:01 UTC
BTW, my latest fixes have been queued for 5.4.x and 5.6.x by Greg K-H:
https://lore.kernel.org/stable/20200421173603.GA1307163@kroah.com/T/#t
