Bug 205417 - Files corruption ( fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50 [ext4] )
Summary: Files corruption ( fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50 [ext4] )
Status: RESOLVED CODE_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: ext4
Hardware: All Linux
Importance: P1 normal
Assignee: fs_ext4@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-04 13:14 UTC by Ivan Baidakou
Modified: 2019-12-18 12:52 UTC

See Also:
Kernel Version: 5.3.8
Subsystem:
Regression: No
Bisected commit-id:


Attachments
/var/log/messages before I powered it off (466.38 KB, text/plain)
2019-11-04 14:52 UTC, Ivan Baidakou

Description Ivan Baidakou 2019-11-04 13:14:31 UTC
Symptoms: after receiving the warning below I continued to work, but some processes with heavy I/O (namely g++) started to hang. I could only kill them with Ctrl+C. The reboot program hung too (no reaction at all). I then had to force a power-off, and ended up with corrupted files.



2019-11-04T12:31:44.706915+03:00 localhost kernel: [  263.974691] WARNING: CPU: 3 PID: 138 at fs/ext4/inode.c:3941 ext4_set_page_dirty+0x3e/0x50 [ext4]
2019-11-04T12:31:44.706932+03:00 localhost kernel: [  263.974694] Modules linked in: psmouse rfcomm 8021q garp mrp stp ctr ccm psnap llc cmac algif_hash algif_skcipher af_alg bnep joydev coretemp hwmon hid_multitouch hid_generic x86_pkg_temp_thermal intel_powerclamp kvm_intel i2c_designware_platform iTCO_wdt iTCO_vendor_support i2c_designware_core intel_rapl_msr mei_hdcp kvm irqbypass snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc crct10dif_pclmul snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi ghash_clmulni_intel aesni_intel aes_x86_64 snd_hda_codec_hdmi crypto_simd iwlmvm snd_soc_core cryptd snd_hda_codec_conexant glue_helper snd_hda_codec_generic snd_compress intel_cstate ledtrig_audio snd_pcm_dmaengine intel_rapl_perf ac97_bus mac80211 snd_hda_intel hp_wmi input_leds sparse_keymap libarc4 wmi_bmof snd_hda_codec iwlwifi snd_hda_core uvcvideo snd_hwdep videobuf2_vmalloc i2c_i801 snd_pcm videobuf2_memops r8169 rtsx_pci_ms videobuf2_v4l2 memstick cfg80211 realtek videobuf2_common mei_me mei videodev btusb
2019-11-04T12:31:44.706938+03:00 localhost kernel: [  263.974729]  idma64 virt_dma btbcm btrtl intel_lpss_pci mc btintel intel_lpss intel_xhci_usb_role_switch processor_thermal_device roles intel_soc_dts_iosf intel_rapl_common intel_pch_thermal battery int3403_thermal int340x_thermal_zone tpm_crb tpm_tis tpm_tis_core hp_accel lis3lv02d tpm input_polldev rng_core int3400_thermal acpi_thermal_rel hp_wireless acpi_pad evdev ac mac_hid thermal pci_stub vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost tap uhid hci_vhci bluetooth ecdh_generic rfkill ecc vfio_iommu_type1 vfio uinput userio ppp_generic slhc tun loop nvram btrfs xor raid6_pq libcrc32c cuse fuse ext4 crc32c_generic crc16 mbcache jbd2 i915 intel_gtt i2c_algo_bit rtsx_pci_sdmmc drm_kms_helper mmc_core ahci syscopyarea sysfillrect libahci xhci_pci sysimgblt fb_sys_fops xhci_hcd libata crc32_pclmul drm crc32c_intel serio_raw usbcore rtsx_pci scsi_mod i2c_hid agpgart hid wmi
2019-11-04T12:31:44.706940+03:00 localhost kernel: [  263.974772]  pinctrl_sunrisepoint video pinctrl_intel button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: psmouse]
2019-11-04T12:31:44.706941+03:00 localhost kernel: [  263.974781] CPU: 3 PID: 138 Comm: kworker/u16:2 Tainted: G           O      5.3.8_2 #1
2019-11-04T12:31:44.706943+03:00 localhost kernel: [  263.974782] Hardware name: HP HP ProBook 450 G5/837D, BIOS Q85 Ver. 01.01.00 08/19/2017
2019-11-04T12:31:44.706944+03:00 localhost kernel: [  263.974835] Workqueue: i915 __i915_gem_free_work [i915]
2019-11-04T12:31:44.706945+03:00 localhost kernel: [  263.974850] RIP: 0010:ext4_set_page_dirty+0x3e/0x50 [ext4]
2019-11-04T12:31:44.706945+03:00 localhost kernel: [  263.974853] Code: 48 8b 00 a8 01 75 16 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 08 74 0d 48 8b 07 f6 c4 20 74 0f e9 32 67 bd c6 <0f> 0b 48 8b 07 f6 c4 20 75 f1 0f 0b e9 21 67 bd c6 90 0f 1f 44 00
2019-11-04T12:31:44.706947+03:00 localhost kernel: [  263.974854] RSP: 0018:ffffa0814039bd98 EFLAGS: 00010246
2019-11-04T12:31:44.706948+03:00 localhost kernel: [  263.974856] RAX: 017fff8000002036 RBX: ffff901a846f6000 RCX: 0000000000000000
2019-11-04T12:31:44.706949+03:00 localhost kernel: [  263.974857] RDX: 0000000000000000 RSI: 0000000000000282 RDI: fffff783cffce540
2019-11-04T12:31:44.706950+03:00 localhost kernel: [  263.974858] RBP: fffff783cffce540 R08: 00000000000f6798 R09: 0000000000000000
2019-11-04T12:31:44.706952+03:00 localhost kernel: [  263.974859] R10: 0000000000000000 R11: 0000000000000004 R12: 00000000003ff395
2019-11-04T12:31:44.706953+03:00 localhost kernel: [  263.974860] R13: ffff901a8093ed00 R14: ffff901a8343cf90 R15: 0000000000000000
2019-11-04T12:31:44.706954+03:00 localhost kernel: [  263.974861] FS:  0000000000000000(0000) GS:ffff901a888c0000(0000) knlGS:0000000000000000
2019-11-04T12:31:44.706955+03:00 localhost kernel: [  263.974862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-11-04T12:31:44.706956+03:00 localhost kernel: [  263.974863] CR2: 00007f0a9e0a1d78 CR3: 00000003b800a006 CR4: 00000000003606e0
2019-11-04T12:31:44.706957+03:00 localhost kernel: [  263.974864] Call Trace:
2019-11-04T12:31:44.706959+03:00 localhost kernel: [  263.974899]  i915_gem_userptr_put_pages+0x148/0x1d0 [i915]
2019-11-04T12:31:44.706960+03:00 localhost kernel: [  263.974934]  __i915_gem_object_put_pages+0x5e/0xa0 [i915]
2019-11-04T12:31:44.706961+03:00 localhost kernel: [  263.974968]  __i915_gem_free_objects+0x12c/0x240 [i915]
2019-11-04T12:31:44.706962+03:00 localhost kernel: [  263.975005]  __i915_gem_free_work+0x69/0x90 [i915]
2019-11-04T12:31:44.706963+03:00 localhost kernel: [  263.975014]  process_one_work+0x186/0x390
2019-11-04T12:31:44.706964+03:00 localhost kernel: [  263.975018]  worker_thread+0x50/0x3a0
2019-11-04T12:31:44.706965+03:00 localhost kernel: [  263.975023]  kthread+0xfb/0x130
2019-11-04T12:31:44.706966+03:00 localhost kernel: [  263.975027]  ? process_one_work+0x390/0x390
2019-11-04T12:31:44.706967+03:00 localhost kernel: [  263.975029]  ? kthread_park+0x80/0x80
2019-11-04T12:31:44.706967+03:00 localhost kernel: [  263.975035]  ret_from_fork+0x35/0x40
2019-11-04T12:31:44.706968+03:00 localhost kernel: [  263.975039] ---[ end trace b7a4449c28785cdf ]---
Comment 1 Theodore Tso 2019-11-04 14:32:37 UTC
Can you send the complete /var/log/messages and/or /var/log/kern.log from say, a day before the hang up to the hang?

These sorts of issues can be caused by I/O errors and/or other problems in the device driver and/or device mapper storage stack. So please also describe your hardware and storage configuration, and which distribution you are using.

Also, is this a 5.3.8 upstream kernel which you compiled yourself, or a distro-supplied kernel? If you compiled it yourself, please also send the kernel config.

Thanks!
Comment 2 Ivan Baidakou 2019-11-04 14:52:00 UTC
Created attachment 285781
/var/log/messages before I powered it off
Comment 3 Ivan Baidakou 2019-11-04 14:55:54 UTC
> So please also describe your hardware and storage configuration, and what
> distribution are you using.

I use the Void Linux distribution. My notebook is an HP ProBook 450 G5 ( https://support.hp.com/gb-en/document/c05682645 ).

> Also, is this a 5.3.8 upstream kernel which you compiled yourself or a
> distro-supplied kernel?   If you compiled it yourself, please also send the
> kernel config

I use my distro's kernel. I think their patches/config can be found at https://github.com/void-linux/void-packages/tree/master/srcpkgs/linux5.3/

Also, please note that the reported issue (at least the reported stack trace) does not happen on 5.2.21 (again, as shipped by Void Linux). I am currently using it.
Comment 4 Theodore Tso 2019-11-04 15:55:58 UTC
Hmm, so less than ten minutes after the system was booted, with no other interesting messages in /var/log/messages.

Is this reproducible?  If you boot 5.3.8 again, can you reliably reproduce the failure?   And what sort of workload are you running when the system goes south?

Thanks!
Comment 5 Ivan Baidakou 2019-11-04 16:11:51 UTC
Yes, the original message is reproduced every time I boot 5.3.8, roughly a few minutes after boot. I just launch my DE (awesome), Firefox, Telegram, Claws Mail, and a terminal. No visibly buggy behavior can be observed, apart from the message in /var/log/messages.

The buggy behavior showed up on Saturday:
1. I launched the WarThunder game, which does heavy I/O (I assume it calculates checksums of its files and, if they mismatch, downloads new assets). It just hung.

2. At that point I was sure it was a bug in the game, so I removed it. A bit later I ran "make -j9" several times simultaneously for the same project (debug, release, and sanitizer builds), and when that hung too, I finally paid attention to the original kernel message.
Comment 6 Ivan Baidakou 2019-11-04 16:12:57 UTC
I'm still not sure that the two events are linked: the kernel message and hangs on I/O a few hours later.
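
One way to check whether the warning and a later hang belong to the same boot is to pull every kernel WARNING line out of the log together with its wall-clock time and its uptime stamp. The following is only a minimal sketch: it assumes the syslog line format shown in the description above, and the names `WARN_RE`, `find_warnings` and `scan_file` are made up for this example.

```python
import re

# Assumed line shape (taken from the log in the description):
# <iso-time> <host> kernel: [<uptime>] WARNING: ... at <file>:<line> <func>+0x..
WARN_RE = re.compile(
    r"^(?P<wall>\S+)\s+\S+\s+kernel:\s+\[\s*(?P<uptime>\d+\.\d+)\]\s+"
    r"WARNING: .* at (?P<where>\S+:\d+)\s+(?P<func>[^\s+]+)\+"
)


def find_warnings(lines):
    """Return (wall_time, uptime_seconds, source_location, function) per WARNING line."""
    hits = []
    for line in lines:
        m = WARN_RE.match(line)
        if m:
            hits.append(
                (m.group("wall"), float(m.group("uptime")),
                 m.group("where"), m.group("func"))
            )
    return hits


def scan_file(path):
    """Scan a log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as f:
        return find_warnings(f)
```

Calling `scan_file("/var/log/messages")` returns one tuple per warning. A small uptime value next to a wall-clock time shortly before the hang would suggest both happened in the same boot, though that alone does not prove the two are related.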
Comment 7 David Noriega 2019-11-05 21:12:40 UTC
I'm also seeing this message on Fedora 30, 5.3.8-200.fc30.x86_64

Could be the same as:
https://bugs.freedesktop.org/show_bug.cgi?id=112012
https://bugzilla.redhat.com/show_bug.cgi?id=1758948
Comment 8 Jan Kara 2019-12-18 12:52:51 UTC
I believe this got tracked down to a problematic change in the i915 driver in https://bugs.freedesktop.org/show_bug.cgi?id=111601 and should be fixed by now. Closing.
