The computer crashes when copying files from a local drive to a CIFS-mounted file system.
This bug has been present since kernel 6.8.4 and is still present in the latest kernel, 6.9.

Steps to reproduce, on AlmaLinux 9:
Boot with kernel 6.9.1-2 or 6.8.4.
Open a terminal and copy some files to the CIFS-mounted directory.
(The copy can be performed either with cp * ~/COMMUN or using Thunar.)
--> The computer freezes and reboots, producing an /boot/initram.....kdump.img file.

The bug is not present in kernel 5.14.0-427 or in 6.7.9.
The bug does not affect read operations from the CIFS-mounted directory.

Mount options:
//192.168.1.112/commun /home/olivier/COMMUN cifs noauto,x-systemd.automount,user,nosuid,gid=100,uid=1026,credentials=/home/olivier/passwd 0 0

-----------------------------------------------------------
/var/log/messages contains the following:

May 19 09:41:12 deyme18 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cifsd:1959]
May 19 09:41:12 deyme18 kernel: Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) nls_utf8(E) cifs(E) cifs_arc4(E) nls_ucs2_utils(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) cifs_md4(E) dns_resolver(E) netfs(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) sunrpc(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) libcrc32c(E) nfnetlink(E) snd_sof_pci_intel_skl(E) snd_sof_intel_hda_common(E) soundwire_intel(E) soundwire_generic_allocation(E) snd_sof_intel_hda_mlink(E) soundwire_cadence(E) snd_sof_intel_hda(E) snd_sof_pci(E) snd_sof_xtensa_dsp(E) snd_sof(E) snd_sof_utils(E) soundwire_bus(E) snd_soc_avs(E) snd_soc_hda_codec(E) snd_soc_skl(E) snd_soc_hdac_hda(E) snd_hda_ext_core(E) snd_soc_sst_ipc(E) snd_soc_sst_dsp(E) snd_soc_acpi_intel_match(E) snd_soc_acpi(E) snd_soc_core(E) vfat(E) fat(E) intel_rapl_msr(E) iwlmvm(E)
May 19 09:41:12 deyme18 kernel: snd_compress(E) intel_rapl_common(E) snd_pcm_dmaengine(E) ac97_bus(E) snd_hda_intel(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) intel_tcc_cooling(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hda_codec(E) coretemp(E) jc42(E) mac80211(E) kvm_intel(E) libarc4(E) uvcvideo(E) regmap_i2c(E) snd_hda_core(E) i915(E) ee1004(E) iTCO_wdt(E) btusb(E) iwlwifi(E) intel_pmc_bxt(E) kvm(E) uvc(E) videobuf2_vmalloc(E) snd_hwdep(E) iTCO_vendor_support(E) videobuf2_memops(E) btrtl(E) snd_seq(E) snd_seq_device(E) btintel(E) btbcm(E) irqbypass(E) videobuf2_v4l2(E) btmtk(E) snd_pcm(E) videodev(E) cfg80211(E) rapl(E) videobuf2_common(E) bluetooth(E) intel_cstate(E) mc(E) intel_uncore(E) mei_me(E) intel_gtt(E) snd_timer(E) drm_buddy(E) i2c_i801(E) pcspkr(E) i2c_algo_bit(E) i2c_smbus(E) drm_display_helper(E) snd(E) mei(E) soundcore(E) ttm(E) rfkill(E) intel_pch_thermal(E) joydev(E) cec(E) intel_pmc_core(E) drm_kms_helper(E) intel_xhci_usb_role_switch(E) intel_hid(E)
May 19 09:41:12 deyme18 kernel: sparse_keymap(E) intel_vsec(E) pmt_telemetry(E) acpi_pad(E) pmt_class(E) drm(E) ext4(E) mbcache(E) jbd2(E) hid_logitech_hidpp(E) hid_logitech_dj(E) rtsx_pci_sdmmc(E) mmc_core(E) ahci(E) nvme(E) crct10dif_pclmul(E) crc32_pclmul(E) libahci(E) crc32c_intel(E) polyval_clmulni(E) polyval_generic(E) nvme_core(E) libata(E) r8169(E) rtsx_pci(E) ghash_clmulni_intel(E) t10_pi(E) video(E) wmi(E) serio_raw(E) fuse(E)
May 19 09:41:12 deyme18 kernel: CPU: 2 PID: 1959 Comm: cifsd Kdump: loaded Tainted: G E 6.8.4-1.el9.elrepo.x86_64 #1
May 19 09:41:12 deyme18 kernel: Hardware name: PC Specialist LTD N2x0WU /N2x0WU , BIOS 1.07.18 02/15/2019
May 19 09:41:12 deyme18 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x72/0x2d0
May 19 09:41:12 deyme18 kernel: Code: 08 0f 92 c2 8b 45 00 0f b6 d2 c1 e2 08 30 e4 09 d0 a9 00 01 ff ff 0f 85 f2 01 00 00 85 c0 74 12 0f b6 45 00 84 c0 74 0a f3 90 <0f> b6 45 00 84 c0 75 f6 b8 01 00 00 00 66 89 45 00 5b 5d 41 5c 41
May 19 09:41:12 deyme18 kernel: RSP: 0018:ffffad7d40d3bd60 EFLAGS: 00000202
May 19 09:41:12 deyme18 kernel: RAX: 0000000000000001 RBX: ffffffffc1ede988 RCX: 0000000000000000
May 19 09:41:12 deyme18 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc1ede988
May 19 09:41:12 deyme18 kernel: RBP: ffffffffc1ede988 R08: 0000000000000000 R09: ffff9f5a4b100b40
May 19 09:41:12 deyme18 kernel: R10: ffff9f5a259eb1c8 R11: 01d9e3d51959a6a5 R12: ffff9f5a03db0038
May 19 09:41:12 deyme18 kernel: R13: ffff9f5a516f4ec0 R14: ffff9f5a259eb000 R15: 0000000000000000
May 19 09:41:12 deyme18 kernel: FS: 0000000000000000(0000) GS:ffff9f5d5ec80000(0000) knlGS:0000000000000000
May 19 09:41:12 deyme18 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 19 09:41:12 deyme18 kernel: CR2: 000055fa37ada5c0 CR3: 00000002d7a1e001 CR4: 00000000003706f0
May 19 09:41:12 deyme18 kernel: Call Trace:
May 19 09:41:12 deyme18 kernel: <IRQ>
May 19 09:41:12 deyme18 kernel: ? watchdog_timer_fn+0x261/0x2f0
May 19 09:41:12 deyme18 kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
May 19 09:41:12 deyme18 kernel: ? __hrtimer_run_queues+0x10f/0x2b0
May 19 09:41:12 deyme18 kernel: ? hrtimer_interrupt+0x106/0x240
May 19 09:41:12 deyme18 kernel: ? __sysvec_apic_timer_interrupt+0x6b/0x180
May 19 09:41:12 deyme18 kernel: ? sysvec_apic_timer_interrupt+0x9d/0xd0
May 19 09:41:12 deyme18 kernel: </IRQ>
May 19 09:41:12 deyme18 kernel: <TASK>
May 19 09:41:12 deyme18 kernel: ? asm_sysvec_apic_timer_interrupt+0x16/0x20
May 19 09:41:12 deyme18 kernel: ? native_queued_spin_lock_slowpath+0x72/0x2d0
May 19 09:41:12 deyme18 kernel: _raw_spin_lock+0x30/0x40
May 19 09:41:12 deyme18 kernel: __cifs_put_smb_ses+0x53/0x440 [cifs]
May 19 09:41:12 deyme18 kernel: smb2_find_smb_tcon+0x61/0xd0 [cifs]
May 19 09:41:12 deyme18 kernel: smb2_handle_cancelled_mid+0x42/0x90 [cifs]
May 19 09:41:12 deyme18 kernel: __release_mid+0x8a/0xb0 [cifs]
May 19 09:41:12 deyme18 kernel: cifs_demultiplex_thread+0x2fc/0x790 [cifs]
May 19 09:41:12 deyme18 kernel: ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs]
May 19 09:41:12 deyme18 kernel: kthread+0xee/0x120
May 19 09:41:12 deyme18 kernel: ? __pfx_kthread+0x10/0x10
May 19 09:41:12 deyme18 kernel: ret_from_fork+0x2d/0x50
May 19 09:41:12 deyme18 kernel: ? __pfx_kthread+0x10/0x10
May 19 09:41:12 deyme18 kernel: ret_from_fork_asm+0x1b/0x30
May 19 09:41:12 deyme18 kernel: </TASK>
May 19 09:41:23 deyme18 systemd-logind[719]: The system will reboot now!
May 19 09:41:23 deyme18 systemd-logind[719]: System is rebooting.
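For anyone checking whether their own machine hit the same soft lockup, the watchdog line in the log above can be picked out mechanically. The following is a minimal Python sketch, not part of the original report: the function name find_soft_lockups and the regular expression are illustrative assumptions, written only against the message format visible in this log.

```python
import re

# Pattern written against the watchdog message format seen in the log above.
# It is an assumption that other kernels print the message the same way.
SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft-lockup message found in the given log lines."""
    hits = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            hits.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return hits


# Two lines copied from the log in this report.
sample = [
    "May 19 09:41:12 deyme18 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [cifsd:1959]",
    "May 19 09:41:23 deyme18 systemd-logind[719]: System is rebooting.",
]
print(find_soft_lockups(sample))
```

To scan a real log, pass an open file object instead of the sample list, for example the /var/log/messages file mentioned above (reading it usually requires root).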
(In reply to dufgrinder from comment #0)
> This bug exists since kernel 6.8.4

So 6.8.3 works fine? Or did you mean 6.7.y worked fine? Then a bisection would be good (and might be required):
https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
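For readers new to bisection, the idea is a binary search between a known-good and a known-bad kernel. The sketch below is only a conceptual illustration in Python, not how git bisect is implemented: the function first_bad, the intermediate version list, and the hypothetical_first_bad value are assumptions made up for the example. Only 6.7.9 (good) and 6.8.4 (bad) come from the report.

```python
def first_bad(candidates, is_bad):
    """Binary-search an ordered list (oldest first) for the first bad entry.

    Assumes candidates[0] is known good, candidates[-1] is known bad,
    and every entry after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return candidates[hi]


# Only the endpoints are taken from the report; the versions in between
# are listed purely for illustration and were not tested by anyone here.
versions = ["6.7.9", "6.8.0", "6.8.1", "6.8.2", "6.8.3", "6.8.4"]

# Placeholder answer so the sketch runs. In a real bisection, is_bad means
# "boot this kernel, copy files to the CIFS mount, and see if it locks up".
hypothetical_first_bad = "6.8.2"


def is_bad(version):
    return versions.index(version) >= versions.index(hypothetical_first_bad)


print(first_bad(versions, is_bad))
```

With six candidates this needs only two test boots, which is why bisecting is practical even over a large commit range. The guide linked above describes the real procedure.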
According to https://elrepo.org/bugs/view.php?id=1454, one other person has reported what looks like the same issue. Kernel 6.7.9-1 works fine.
Sure, the ELRepo reporter is me. The bug appears on two of my computers that were recently updated to AlmaLinux 9.4. I'll try to do the bisection, but it will be my first on a kernel! KR, Olivier
Just for info, the other reporter on the ELRepo's bug tracker runs CentOS Stream 9.
FWIW, there is a patch [1] that has been sent to the mailing list that is a good candidate for fixing this deadlock. [1] https://lore.kernel.org/r/20240606161313.25521-1-ematsumiya@suse.de
Using the patch provided in Comment 5, I have built a kernel-ml package set, kernel-ml-6.9.3-1.1.el9.elrepo. https://toracat.org/test/kernel/bug218902/ Can you test and see if the issue is fixed?
(In reply to Akemi Yagi from comment #6)
> Using the patch provided in Comment 5, I have built a kernel-ml package set,
> kernel-ml-6.9.3-1.1.el9.elrepo.
>
> https://toracat.org/test/kernel/bug218902/
>
> Can you test and see if the issue is fixed?

I get the same crash with this kernel on CentOS Stream 9.
> I get the same crash with this kernel with CentOS Stream 9

Apparently, the candidate patch did not fix the issue. (dufgrinder also reported the same result.)
Yes, crash confirmed with the candidate fix (kernel-ml-6.9.3-1.1.el9.elrepo) on AlmaLinux 9.4.
Anyone try it on 6.10-rc3 yet?
We (elrepo) just built kernel-ml-6.10.0-0.rc3.el9.elrepo. We'll make it available for testing.
kernel-ml-6.10.0-0.rc3.el9.elrepo.x86_64 is available here: https://elrepo.org/people/akemi/testing/el9/kernel/218902/x86_64/
kernel-ml-6.10.0-0.rc3.el9.elrepo.x86_64 worked for me with CentOS Stream 9. Writing to a CIFS mount no longer crashes my system.
That's good news. The kernel-ml-6.10 GA version is expected to be out in mid-July.
Steve, you need to figure out which commit fixed the copy regression so it can be backported to v6.8.y and v6.9.y, if it hasn't been backported yet. Note that they reported two different oopses. The one from the description seems to be fixed by [1]. [1] https://lore.kernel.org/r/20240606161313.25521-1-ematsumiya@suse.de
Hi, kernel-ml-6.10.0-0.rc3.el9.elrepo.x86_64 worked for me on AlmaLinux 9.4 on two laptops (an Intel i5-8250U and an old Intel Pentium B970). Tested with the same kind of copy to the CIFS-mounted directory.
I think this can be closed. CIFS writes still work with kernel-ml-6.10.6-1.el9.elrepo.x86_64 (CentOS Stream 9).
I confirm: CIFS copy works correctly now. This bug can be closed.