Bug 209275

Summary: Graphics freeze after WARNING: CPU: 2 PID: 156207 at fs/ext4/inode.c:3599 ext4_set_page_dirty+0x3e/0x50
Product: File System
Reporter: Dennis Wagelaar (dwagelaar)
Component: ext4
Assignee: fs_ext4 (fs_ext4)
Status: RESOLVED CODE_FIX
Severity: normal
CC: rhmcruiser
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.8.6
Subsystem:
Regression: No
Bisected commit-id:
Attachments: /var/log/messages

Description Dennis Wagelaar 2020-09-15 07:24:45 UTC
Since updating from kernel 5.7.15 to 5.8.6, I get system freezes (graphics and input only; sound continues to work) about once a day.

Sep 15 08:51:36 styx kernel: ------------[ cut here ]------------
Sep 15 08:51:36 styx kernel: WARNING: CPU: 2 PID: 156207 at fs/ext4/inode.c:3599 ext4_set_page_dirty+0x3e/0x50
Sep 15 08:51:36 styx kernel: Modules linked in: hid_steam uinput vfat fat xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter it87(OE) hwmon_vid snd_hda_codec_realtek uvcvideo snd_hda_codec_generic videobuf2_vmalloc videobuf2_memops ledtrig_audio snd_hda_codec_hdmi videobuf2_v4l2 snd_hda_intel videobuf2_common snd_intel_dspcfg snd_usb_audio snd_usbmidi_lib videodev snd_hda_codec snd_rawmidi edac_mce_amd joydev mc kvm_amd snd_hda_core snd_hwdep kvm snd_seq snd_seq_device irqbypass eeepc_wmi asus_wmi snd_pcm rapl sparse_keymap rfkill snd_timer video sp5100_tco wmi_bmof pcspkr snd
Sep 15 08:51:36 styx kernel: i2c_piix4 k10temp soundcore gpio_amdpt gpio_generic acpi_cpufreq binfmt_misc ip_tables dm_crypt hid_logitech_hidpp amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm crct10dif_pclmul drm_kms_helper crc32_pclmul crc32c_intel cec ghash_clmulni_intel drm ccp r8169 nvme uas usb_storage nvme_core hid_logitech_dj wmi pinctrl_amd fuse
Sep 15 08:51:36 styx kernel: CPU: 2 PID: 156207 Comm: gnome-shell Tainted: G           OE     5.8.6-101.fc31.x86_64 #1
Sep 15 08:51:36 styx kernel: Hardware name: System manufacturer System Product Name/PRIME B350M-A, BIOS 5603 07/28/2020
Sep 15 08:51:36 styx kernel: RIP: 0010:ext4_set_page_dirty+0x3e/0x50
Sep 15 08:51:36 styx kernel: Code: 48 8b 00 a8 01 75 16 48 8b 57 08 48 8d 42 ff 83 e2 01 48 0f 44 c7 48 8b 00 a8 08 74 0d 48 8b 07 f6 c4 20 74 0f e9 92 ec f6 ff <0f> 0b 48 8b 07 f6 c4 20 75 f1 0f 0b e9 81 ec f6 ff 90 0f 1f 44 00
Sep 15 08:51:36 styx kernel: RSP: 0018:ffffb0660e0b7b58 EFLAGS: 00010246
Sep 15 08:51:36 styx kernel: RAX: 0017ffffc0020016 RBX: ffffdf579f8b8240 RCX: 0000000000000000
Sep 15 08:51:36 styx kernel: RDX: 0000000000000000 RSI: 000055a8b037a000 RDI: ffffdf579f8b8240
Sep 15 08:51:36 styx kernel: RBP: ffff9c81a54a5bd0 R08: ffff9c8192920960 R09: 0000000000000000
Sep 15 08:51:36 styx kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffb0660e0b7ca8
Sep 15 08:51:36 styx kernel: R13: 000055a8b037b000 R14: 80000007e2e09845 R15: 000055a8b037a000
Sep 15 08:51:36 styx kernel: FS:  00007fbaa607f200(0000) GS:ffff9c81be880000(0000) knlGS:0000000000000000
Sep 15 08:51:36 styx kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 08:51:36 styx kernel: CR2: 00007fbaa9c69430 CR3: 000000025ccc4000 CR4: 00000000003406e0
Sep 15 08:51:36 styx kernel: Call Trace:
Sep 15 08:51:36 styx kernel: unmap_page_range+0xa8d/0xee0
Sep 15 08:51:36 styx kernel: unmap_vmas+0x6a/0xd0
Sep 15 08:51:36 styx kernel: exit_mmap+0x97/0x170
Sep 15 08:51:36 styx kernel: mmput+0x61/0x140
Sep 15 08:51:36 styx kernel: begin_new_exec+0x377/0x98c
Sep 15 08:51:36 styx kernel: load_elf_binary+0x13e/0x16f0
Sep 15 08:51:36 styx kernel: __do_execve_file.isra.0+0x5d7/0xb90
Sep 15 08:51:36 styx kernel: __x64_sys_execve+0x35/0x40
Sep 15 08:51:36 styx kernel: do_syscall_64+0x52/0x90
Sep 15 08:51:36 styx kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 15 08:51:36 styx kernel: RIP: 0033:0x7fbaa9d104db
Sep 15 08:51:36 styx kernel: Code: Bad RIP value.
Sep 15 08:51:36 styx kernel: RSP: 002b:00007ffe62149158 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
Sep 15 08:51:36 styx kernel: RAX: ffffffffffffffda RBX: 00007fbaa607f180 RCX: 00007fbaa9d104db
Sep 15 08:51:36 styx kernel: RDX: 000055a8ad07acf0 RSI: 000055a8b25f8160 RDI: 000055a8ada2b9e0
Sep 15 08:51:36 styx kernel: RBP: 000000000000007b R08: 0000000000000000 R09: 00007fbaa6e984e0
Sep 15 08:51:36 styx kernel: R10: 000055a8b25f8160 R11: 0000000000000246 R12: 000055a8ada2b9e0
Sep 15 08:51:36 styx kernel: R13: 000055a8b25f8160 R14: 0000000000000016 R15: 0000000000000000
Sep 15 08:51:36 styx kernel: ---[ end trace 17be478a5fbbe377 ]---
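To compare this trace with other traces in the same log, the frames can be extracted mechanically. The following is only a minimal Python sketch, assuming each frame line has the symbol+0xoffset/0xsize shape seen above. The name parse_frames is hypothetical and the sample lines are copied from this report.

```python
import re

# Matches frame lines such as "kernel: unmap_vmas+0x6a/0xd0" (symbol+offset/size).
# Lines like "kernel: RIP: ..." or "kernel: WARNING: ..." are deliberately not matched.
FRAME = re.compile(r"kernel: ([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)\s*$")


def parse_frames(lines):
    """Return (symbol, offset, size) tuples for each call-trace frame found."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match:
            symbol, offset, size = match.groups()
            frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames


# Sample lines taken from the trace in this report.
warning_trace = [
    "Sep 15 08:51:36 styx kernel: Call Trace:",
    "Sep 15 08:51:36 styx kernel: unmap_page_range+0xa8d/0xee0",
    "Sep 15 08:51:36 styx kernel: unmap_vmas+0x6a/0xd0",
    "Sep 15 08:51:36 styx kernel: __do_execve_file.isra.0+0x5d7/0xb90",
]

for symbol, offset, size in parse_frames(warning_trace):
    print(f"{symbol}: offset {offset:#x} of {size:#x}")
```

Keeping the offsets as integers makes it easy to see where two traces that pass through the same function differ.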
Comment 1 Dennis Wagelaar 2020-09-15 08:15:05 UTC
Created attachment 292507 [details]
/var/log/messages
Comment 2 Monthero Ronald 2020-09-17 14:42:15 UTC
Looking at the /var/log/messages attachment from comment 1:

The freeze-like slowness you describe is probably caused by gnome-shell dumping core, which it does frequently in your environment according to the messages log.

Sep 15 08:23:46 styx journal[67420]: clutter_actor_has_effects: assertion 'CLUTTER_IS_ACTOR (self)' failed
Sep 15 08:23:46 styx journal[67420]: Object .Gjs_GameModeIndicator (0x55a8b041c530), has been already deallocated — impossible to set any property on it. This might be caused by the object having been destroyed from C code using something such as destroy(), dispose(), or remove() vfuncs.
Sep 15 08:23:46 styx gnome-shell[67420]: == Stack trace for context 0x55a8ac63e3a0 ==
Sep 15 08:23:46 styx gnome-shell[67420]: #0   55a8b19fc6d8 i   /usr/share/gnome-shell/extensions/gamemode@christian.kellner.me/extension.js:223 (7fba10824820 @ 164)
Sep 15 08:23:46 styx gnome-shell[67420]: #1   7ffe62146c90 

...
...

Sep 15 08:23:46 styx gnome-shell[67420]: #9   55a8b19fc420 i   resource:///org/gnome/gjs/modules/overrides/Gio.js:169 (7fba6c0d3e50 @ 39)
Sep 15 08:23:46 styx gnome-shell[67420]: == Stack trace for context 0x55a8ac63e3a0 ==
Sep 15 08:23:46 styx gnome-shell[67420]: #0   55a8b19fc6d8 i   /usr/share/gnome-shell/extensions/gamemode@christian.kellner.me/extension.js:231 (7fba10824820 @ 287)
Sep 15 08:23:46 styx gnome-shell[67420]: #1   7ffe62146c90 b   self-hosted:977 (7fba45130a60 @ 413)
Sep 15 08:23:46 styx gnome-shell[67420]: #2   7ffe62146d70 b   resource:///org/gnome/gjs/modules/signals.js:135 (7fba6c0c6040 @ 376)
Sep 15 08:23:46 styx journal[67420]: Object St.Icon (0x55a8ad298110), has been already deallocated — impossible to access it. This might be caused by the object having been destroyed from C code using something such as destroy(), dispose(), or remove() vfuncs.

...

Further on, the log also shows a bad page error for the gnome-shell process:


Sep 15 08:51:36 styx kernel: BUG: Bad page map in process gnome-shell  pte:80000007e2e09845 pmd:7e54a5067
Sep 15 08:51:36 styx kernel: page:ffffdf579f8b8240 refcount:1 mapcount:-1 mapping:000000009b19d3d2 index:0x61aa
Sep 15 08:51:36 styx kernel: mapping->aops:ext4_da_aops dentry name:"system@afae2c84506c4614bd5df0c7f65c5045-0000000001d0a0ab-0005a448a7d41f69.journal"
Sep 15 08:51:36 styx kernel: flags: 0x17ffffc002001e(referenced|uptodate|dirty|lru|mappedtodisk)
Sep 15 08:51:36 styx kernel: raw: 0017ffffc002001e ffffdf579efb8208 ffffdf579f907148 ffff9c81ae253178
Sep 15 08:51:36 styx kernel: raw: 00000000000061aa 0000000000000000 00000001fffffffe ffff9c81954d4000
Sep 15 08:51:36 styx kernel: page dumped because: bad pte
Sep 15 08:51:36 styx kernel: page->mem_cgroup:ffff9c81954d4000
Sep 15 08:51:36 styx kernel: addr:000055a8b037a000 vm_flags:08100073 anon_vma:ffff9c8151ad0ec8 mapping:0000000000000000 index:55a8b037a
Sep 15 08:51:36 styx kernel: file:(null) fault:0x0 mmap:0x0 readpage:0x0
Sep 15 08:51:36 styx kernel: CPU: 2 PID: 156207 Comm: gnome-shell Tainted: G        W  OE     5.8.6-101.fc31.x86_64 #1
Sep 15 08:51:36 styx kernel: Hardware name: System manufacturer System Product Name/PRIME B350M-A, BIOS 5603 07/28/2020
Sep 15 08:51:36 styx kernel: Call Trace:
Sep 15 08:51:36 styx kernel: dump_stack+0x6d/0x90
Sep 15 08:51:36 styx kernel: print_bad_pte.cold+0x6a/0xd2
Sep 15 08:51:36 styx kernel: unmap_page_range+0x919/0xee0
Sep 15 08:51:36 styx kernel: unmap_vmas+0x6a/0xd0
Sep 15 08:51:36 styx kernel: exit_mmap+0x97/0x170
Sep 15 08:51:36 styx kernel: mmput+0x61/0x140
Sep 15 08:51:36 styx kernel: begin_new_exec+0x377/0x98c
Sep 15 08:51:36 styx kernel: load_elf_binary+0x13e/0x16f0
Sep 15 08:51:36 styx kernel: __do_execve_file.isra.0+0x5d7/0xb90
Sep 15 08:51:36 styx kernel: __x64_sys_execve+0x35/0x40
Sep 15 08:51:36 styx kernel: do_syscall_64+0x52/0x90
Sep 15 08:51:36 styx kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Sep 15 08:51:36 styx kernel: RIP: 0033:0x7fbaa9d104db
Sep 15 08:51:36 styx kernel: Code: Bad RIP value.
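
The "flags:" line in the excerpt above shows both the raw value and the names the kernel decoded from it. As a rough illustration of how the low bits map to those names, here is a small Python sketch. The bit positions are an assumption based on the usual order of enum pageflags in kernels of this era and can differ between versions and configurations. The helper name decode_page_flags is hypothetical, and the upper bits (zone and node encoding) are ignored.

```python
# Assumed bit positions; only the flags needed for this report are listed.
PAGE_FLAG_BITS = {
    0: "locked",
    1: "referenced",
    2: "uptodate",
    3: "dirty",
    4: "lru",
    17: "mappedtodisk",
}


def decode_page_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [
        name
        for bit, name in sorted(PAGE_FLAG_BITS.items())
        if (value >> bit) & 1
    ]


# Raw value taken from the "flags:" line in the log.
print("|".join(decode_page_flags(0x17FFFFC002001E)))
```

With the value from the log this reproduces the kernel's own decoding, referenced|uptodate|dirty|lru|mappedtodisk, which is a quick sanity check on the assumed bit positions.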


bash-5.0$ grep -Ri "Bad Page" messages
Sep 15 08:51:36 styx kernel: BUG: Bad page map in process gnome-shell  pte:80000007e2e09845 pmd:7e54a5067
Sep 15 08:51:36 styx kernel: BUG: Bad page state in process gnome-shell  pfn:7e2e09
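
The grep above only catches "Bad page" lines. To list kernel WARNING lines alongside BUG lines in one pass, something like the following Python sketch could be used. The pattern and the name find_kernel_problems are illustrative assumptions, and the sample lines are copied from this report rather than read from the attachment.

```python
import re

# Illustrative pattern: kernel lines that start with "BUG:" or "WARNING:".
KERNEL_PROBLEM = re.compile(r"kernel: ((?:BUG|WARNING): .*)")


def find_kernel_problems(lines):
    """Yield (line_number, message) for each kernel BUG/WARNING line."""
    for number, line in enumerate(lines, start=1):
        match = KERNEL_PROBLEM.search(line)
        if match:
            yield number, match.group(1).strip()


# Sample lines copied from this report.
sample = [
    "Sep 15 08:51:36 styx kernel: WARNING: CPU: 2 PID: 156207 at fs/ext4/inode.c:3599 ext4_set_page_dirty+0x3e/0x50",
    "Sep 15 08:51:36 styx kernel: Call Trace:",
    "Sep 15 08:51:36 styx kernel: BUG: Bad page state in process gnome-shell  pfn:7e2e09",
]

for number, message in find_kernel_problems(sample):
    print(number, message)
```

To scan the real log, pass an open file object for the messages file instead of the sample list.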
Comment 3 Dennis Wagelaar 2020-09-18 08:50:37 UTC
The core dumps for gnome-shell stop after disabling shell extensions. These core dumps can be reliably triggered for both gnome-shell with extensions and firefox by putting the system into S3 sleep and waking it back up (something I do multiple times per day).

I upgraded to kernel 5.8.9-101.fc31.x86_64 this morning. I'll report back whether this issue persists or not.
Comment 4 Dennis Wagelaar 2020-09-23 19:00:39 UTC
This bug no longer occurs under kernel 5.8.9-101.fc31.x86_64 - closing.
Comment 5 Monthero Ronald 2020-09-24 00:18:10 UTC
(In reply to Dennis Wagelaar from comment #3)
> The core dumps for gnome-shell stop after disabling shell extensions. These
> core dumps can be reliably triggered for both gnome-shell with extensions
> and firefox by putting the system into S3 sleep and waking it back up
> (something I do multiple times per day).
> 
> I upgraded to kernel 5.8.9-101.fc31.x86_64 this morning. I'll report back
> whether this issue persists or not.

Hi Dennis, 
Yes, that strongly suggests the gnome-shell hang (graphics and input) and the repeated gnome-shell core dumps were behind this behavior.
Comment 6 Monthero Ronald 2020-09-24 00:22:54 UTC
(In reply to Dennis Wagelaar from comment #4)
> This bug no longer occurs under kernel 5.8.9-101.fc31.x86_64 - closing.
Ah, good to know the issue is resolved by upgrading to the newer kernel. (Probably the issue was within gnome-shell: the stack trace from comment 2, repeated below, indicates a use-after-free sort of condition in the GNOME userspace code, causing the core dumps.)


Sep 15 08:23:46 styx gnome-shell[67420]: #9   55a8b19fc420 i   resource:///org/gnome/gjs/modules/overrides/Gio.js:169 (7fba6c0d3e50 @ 39)
Sep 15 08:23:46 styx gnome-shell[67420]: == Stack trace for context 0x55a8ac63e3a0 ==
Sep 15 08:23:46 styx gnome-shell[67420]: #0   55a8b19fc6d8 i   /usr/share/gnome-shell/extensions/gamemode@christian.kellner.me/extension.js:231 (7fba10824820 @ 287)
Sep 15 08:23:46 styx gnome-shell[67420]: #1   7ffe62146c90 b   self-hosted:977 (7fba45130a60 @ 413)
Sep 15 08:23:46 styx gnome-shell[67420]: #2   7ffe62146d70 b   resource:///org/gnome/gjs/modules/signals.js:135 (7fba6c0c6040 @ 376)
Sep 15 08:23:46 styx journal[67420]: Object St.Icon (0x55a8ad298110), has been already deallocated — impossible to access it. This might be caused by the object having been destroyed from C code using something such as destroy(), dispose(), or remove() vfuncs.