Symptoms: after receiving the warning below, I continued to work, but some processes doing heavy I/O (namely g++) started to hang. I could kill them only via Ctrl+C. The reboot program hung too (no reaction at all). I was then forced to cut the power, and ended up with corrupted files.

2019-11-04T12:31:44.706915+03:00 localhost kernel: [ 263.974691] WARNING: CPU: 3 PID: 138 at fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50 [ext4]
2019-11-04T12:31:44.706932+03:00 localhost kernel: [ 263.974694] Modules linked in: psmouse rfcomm 8021q garp mrp stp ctr ccm psnap llc cmac algif_hash algif_skcipher af_alg bnep joydev coretemp hwmon hid_multitouch hid_generic x86_pkg_temp_thermal intel_powerclamp kvm_intel i2c_designware_platform iTCO_wdt iTCO_vendor_support i2c_designware_core intel_rapl_msr mei_hdcp kvm irqbypass snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc crct10dif_pclmul snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi ghash_clmulni_intel aesni_intel aes_x86_64 snd_hda_codec_hdmi crypto_simd iwlmvm snd_soc_core cryptd snd_hda_codec_conexant glue_helper snd_hda_codec_generic snd_compress intel_cstate ledtrig_audio snd_pcm_dmaengine intel_rapl_perf ac97_bus mac80211 snd_hda_intel hp_wmi input_leds sparse_keymap libarc4 wmi_bmof snd_hda_codec iwlwifi snd_hda_core uvcvideo snd_hwdep videobuf2_vmalloc i2c_i801 snd_pcm videobuf2_memops r8169 rtsx_pci_ms videobuf2_v4l2 memstick cfg80211 realtek videobuf2_common mei_me mei videodev btusb
2019-11-04T12:31:44.706938+03:00 localhost kernel: [ 263.974729] idma64 virt_dma btbcm btrtl intel_lpss_pci mc btintel intel_lpss intel_xhci_usb_role_switch processor_thermal_device roles intel_soc_dts_iosf intel_rapl_common intel_pch_thermal battery int3403_thermal int340x_thermal_zone tpm_crb tpm_tis tpm_tis_core hp_accel lis3lv02d tpm input_polldev rng_core int3400_thermal acpi_thermal_rel hp_wireless acpi_pad evdev ac mac_hid thermal pci_stub vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost tap uhid hci_vhci bluetooth ecdh_generic rfkill ecc vfio_iommu_type1 vfio uinput userio ppp_generic slhc tun loop nvram btrfs xor raid6_pq libcrc32c cuse fuse ext4 crc32c_generic crc16 mbcache jbd2 i915 intel_gtt i2c_algo_bit rtsx_pci_sdmmc drm_kms_helper mmc_core ahci syscopyarea sysfillrect libahci xhci_pci sysimgblt fb_sys_fops xhci_hcd libata crc32_pclmul drm crc32c_intel serio_raw usbcore rtsx_pci scsi_mod i2c_hid agpgart hid wmi
2019-11-04T12:31:44.706940+03:00 localhost kernel: [ 263.974772] pinctrl_sunrisepoint video pinctrl_intel button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: psmouse]
2019-11-04T12:31:44.706941+03:00 localhost kernel: [ 263.974781] CPU: 3 PID: 138 Comm: kworker/u16:2 Tainted: G O 5.3.8_2 #1
2019-11-04T12:31:44.706943+03:00 localhost kernel: [ 263.974782] Hardware name: HP HP ProBook 450 G5/837D, BIOS Q85 Ver. 01.01.00 08/19/2017
2019-11-04T12:31:44.706944+03:00 localhost kernel: [ 263.974835] Workqueue: i915 __i915_gem_free_work [i915]
2019-11-04T12:31:44.706945+03:00 localhost kernel: [ 263.974850] RIP: 0010:ext4_set_page_dirty+0x3e/0x50 [ext4]
2019-11-04T12:31:44.706945+03:00 localhost kernel: [ 263.974853] Code: 48 8b 00 a8 01 75 16 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 08 74 0d 48 8b 07 f6 c4 20 74 0f e9 32 67 bd c6 <0f> 0b 48 8b 07 f6 c4 20 75 f1 0f 0b e9 21 67 bd c6 90 0f 1f 44 00
2019-11-04T12:31:44.706947+03:00 localhost kernel: [ 263.974854] RSP: 0018:ffffa0814039bd98 EFLAGS: 00010246
2019-11-04T12:31:44.706948+03:00 localhost kernel: [ 263.974856] RAX: 017fff8000002036 RBX: ffff901a846f6000 RCX: 0000000000000000
2019-11-04T12:31:44.706949+03:00 localhost kernel: [ 263.974857] RDX: 0000000000000000 RSI: 0000000000000282 RDI: fffff783cffce540
2019-11-04T12:31:44.706950+03:00 localhost kernel: [ 263.974858] RBP: fffff783cffce540 R08: 00000000000f6798 R09: 0000000000000000
2019-11-04T12:31:44.706952+03:00 localhost kernel: [ 263.974859] R10: 0000000000000000 R11: 0000000000000004 R12: 00000000003ff395
2019-11-04T12:31:44.706953+03:00 localhost kernel: [ 263.974860] R13: ffff901a8093ed00 R14: ffff901a8343cf90 R15: 0000000000000000
2019-11-04T12:31:44.706954+03:00 localhost kernel: [ 263.974861] FS: 0000000000000000(0000) GS:ffff901a888c0000(0000) knlGS:0000000000000000
2019-11-04T12:31:44.706955+03:00 localhost kernel: [ 263.974862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-11-04T12:31:44.706956+03:00 localhost kernel: [ 263.974863] CR2: 00007f0a9e0a1d78 CR3: 00000003b800a006 CR4: 00000000003606e0
2019-11-04T12:31:44.706957+03:00 localhost kernel: [ 263.974864] Call Trace:
2019-11-04T12:31:44.706959+03:00 localhost kernel: [ 263.974899] i915_gem_userptr_put_pages+0x148/0x1d0 [i915]
2019-11-04T12:31:44.706960+03:00 localhost kernel: [ 263.974934] __i915_gem_object_put_pages+0x5e/0xa0 [i915]
2019-11-04T12:31:44.706961+03:00 localhost kernel: [ 263.974968] __i915_gem_free_objects+0x12c/0x240 [i915]
2019-11-04T12:31:44.706962+03:00 localhost kernel: [ 263.975005] __i915_gem_free_work+0x69/0x90 [i915]
2019-11-04T12:31:44.706963+03:00 localhost kernel: [ 263.975014] process_one_work+0x186/0x390
2019-11-04T12:31:44.706964+03:00 localhost kernel: [ 263.975018] worker_thread+0x50/0x3a0
2019-11-04T12:31:44.706965+03:00 localhost kernel: [ 263.975023] kthread+0xfb/0x130
2019-11-04T12:31:44.706966+03:00 localhost kernel: [ 263.975027] ? process_one_work+0x390/0x390
2019-11-04T12:31:44.706967+03:00 localhost kernel: [ 263.975029] ? kthread_park+0x80/0x80
2019-11-04T12:31:44.706967+03:00 localhost kernel: [ 263.975035] ret_from_fork+0x35/0x40
2019-11-04T12:31:44.706968+03:00 localhost kernel: [ 263.975039] ---[ end trace b7a4449c28785cdf ]---
Can you send the complete /var/log/messages and/or /var/log/kern.log from, say, a day before the hang up to the hang? These sorts of issues can be caused by I/O errors and/or other problems in the device driver and/or device mapper storage stack. So please also describe your hardware and storage configuration, and what distribution are you using. Also, is this a 5.3.8 upstream kernel which you compiled yourself or a distro-supplied kernel? If you compiled it yourself, please also send the kernel config. Thanks!
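If the full log is large, one way to cut out the requested window is to filter on each line's leading timestamp. The following is a minimal sketch, not part of any existing tool: it assumes (as in the log excerpt in the first comment) that every syslog line starts with an RFC 3339 timestamp such as 2019-11-04T12:31:44.706915+03:00, and the function name extract_window is hypothetical. It relies on datetime.fromisoformat accepting that format, which holds for Python 3.7 and later.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator


def extract_window(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield log lines whose leading timestamp lies in [start, end].

    Assumption: each record begins with an ISO 8601 / RFC 3339 timestamp
    followed by a space. start and end must be timezone-aware, because the
    parsed timestamps carry a UTC offset and Python refuses to compare
    aware and naive datetimes.
    """
    in_window = False
    for line in lines:
        first_field = line.split(" ", 1)[0]
        try:
            ts = datetime.fromisoformat(first_field)
        except ValueError:
            # No parseable timestamp (e.g. a wrapped continuation line):
            # keep it only if the last timestamped line was kept.
            if in_window:
                yield line
            continue
        in_window = start <= ts <= end
        if in_window:
            yield line
```

For example, with hang = datetime.fromisoformat("2019-11-04T12:31:44+03:00"), passing an open /var/log/messages file object together with hang - timedelta(days=1) and hang would yield the last day of messages before the warning. Streaming line by line avoids loading the whole log into memory.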
Created attachment 285781: /var/log/messages before I powered it off
> So please also describe your hardware and storage configuration, and what
> distribution are you using.

I use the Void Linux distribution. My notebook is an HP ProBook 450 G5 (https://support.hp.com/gb-en/document/c05682645).

> Also, is this a 5.3.8 upstream kernel which you compiled yourself or a
> distro-supplied kernel? If you compiled it yourself, please also send the
> kernel config

I use my distribution's kernel. I think its patches and config can be found at https://github.com/void-linux/void-packages/tree/master/srcpkgs/linux5.3/

Also, please note that the reported issue (at least the reported stack trace) does not happen on 5.2.21 (again, shipped by Void Linux), which I am currently using.
Hmm, so less than ten minutes after the system was booted, with no other interesting messages in /var/log/messages. Is this reproducible? If you boot 5.3.8 again, can you reliably reproduce the failure? And what sort of workload are you running when the system goes south? Thanks!
Yes, the original message is reproduced every time I boot 5.3.8, approximately a few minutes after boot. I just launch my DE (awesome), Firefox, Telegram, Claws Mail, and a terminal. Visually, no buggy behavior can be observed apart from the message in /var/log/messages.

The buggy behavior occurred on Saturday, when:
1. I launched the game War Thunder, which does heavy I/O (I assume it calculates checksums of files and, if they mismatch, downloads new assets), and it just hung.
2. I was sure it was a bug in the game, so I removed it. A bit later I spawned "make -j9" several times simultaneously for the same project (debug, release, and sanitizer builds), and when that hung, I finally paid attention to the original kernel message.
I'm still not sure that the two events are linked: the kernel message and the I/O hangs a few hours later.
I'm also seeing this message on Fedora 30 (5.3.8-200.fc30.x86_64). It could be the same as:
https://bugs.freedesktop.org/show_bug.cgi?id=112012
https://bugzilla.redhat.com/show_bug.cgi?id=1758948
I believe this was tracked down to a problematic change in the i915 driver (see https://bugs.freedesktop.org/show_bug.cgi?id=111601) and should be fixed by now. Closing.