Bug 216399 - 5.19.1 CPU Stalls on MD RAID
Status: RESOLVED DUPLICATE of bug 216388
Alias: None
Product: File System
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: P1 normal
Assignee: fs_other
URL:
Keywords:
Duplicates: 216405
Depends on:
Blocks:
 
Reported: 2022-08-22 22:13 UTC by Robert Dinse
Modified: 2022-09-04 04:18 UTC (History)
CC List: 0 users

See Also:
Kernel Version: 5.19.1
Subsystem:
Regression: No
Bisected commit-id:


Attachments
The configuration file used to compile this kernel (261.96 KB, text/plain)
2022-08-22 22:13 UTC, Robert Dinse

Description Robert Dinse 2022-08-22 22:13:00 UTC
Created attachment 301633
The configuration file used to compile this kernel

[83804.579035] INFO: task jbd2/md0p5-8:1337 blocked for more than 122 seconds.
[83804.579040]       Not tainted 5.19.1 #1
[83804.579042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83804.579043] task:jbd2/md0p5-8    state:D stack:    0 pid: 1337 ppid:     2 flags:0x00004000
[83804.579046] Call Trace:
[83804.579047]  <TASK>
[83804.579050]  __schedule+0x367/0x1400
[83804.579053]  ? __submit_bio+0x6c/0x170
[83804.579056]  ? submit_bio_noacct_nocheck+0xe5/0x2d0
[83804.579058]  schedule+0x49/0xb0
[83804.579060]  io_schedule+0x46/0x80
[83804.579062]  bit_wait_io+0x11/0x70
[83804.579063]  __wait_on_bit+0x4a/0x110
[83804.579065]  ? bit_wait+0x70/0x70
[83804.579066]  out_of_line_wait_on_bit+0x8c/0xb0
[83804.579068]  ? swake_up_one+0x70/0x70
[83804.579070]  __wait_on_buffer+0x2b/0x40
[83804.579072]  jbd2_journal_commit_transaction+0x143f/0x17f0
[83804.579077]  kjournald2+0xb4/0x270
[83804.579079]  ? destroy_sched_domains_rcu+0x30/0x30
[83804.579081]  ? load_superblock.part.0+0xc0/0xc0
[83804.579083]  kthread+0xce/0xf0
[83804.579085]  ? kthread_complete_and_exit+0x20/0x20
[83804.579086]  ret_from_fork+0x1f/0x30
[83804.579089]  </TASK>
[83804.579090] INFO: task jbd2/md0p4-8:1343 blocked for more than 122 seconds.
[83804.579092]       Not tainted 5.19.1 #1
[83804.579093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[83804.579094] task:jbd2/md0p4-8    state:D stack:    0 pid: 1343 ppid:     2 flags:0x00004000
[83804.579096] Call Trace:
[83804.579096]  <TASK>
[83804.579097]  __schedule+0x367/0x1400
[83804.579099]  ? __submit_bio+0x6c/0x170
[83804.579101]  ? submit_bio_noacct_nocheck+0xe5/0x2d0
[83804.579102]  schedule+0x49/0xb0
[83804.579104]  io_schedule+0x46/0x80
[83804.579105]  bit_wait_io+0x11/0x70
[83804.579107]  __wait_on_bit+0x4a/0x110
[83804.579108]  ? bit_wait+0x70/0x70
[83804.579110]  out_of_line_wait_on_bit+0x8c/0xb0
[83804.579112]  ? swake_up_one+0x70/0x70
[83804.579113]  __wait_on_buffer+0x2b/0x40
[83804.579114]  jbd2_journal_commit_transaction+0x143f/0x17f0
[83804.579118]  kjournald2+0xb4/0x270
[83804.579120]  ? destroy_sched_domains_rcu+0x30/0x30

     I am also getting CPU stalls on KVM tasks (bug #216388), and I don't know whether the two are related.

     I apologize if this is not the right section; I wasn't sure whether this is a file system issue or a driver issue.
[83804.579122]  ? load_superblock.part.0+0xc0/0xc0
[83804.579124]  kthread+0xce/0xf0
[83804.579125]  ? kthread_complete_and_exit+0x20/0x20
[83804.579127]  ret_from_fork+0x1f/0x30
[83804.579129]  </TASK>
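
For anyone triaging a longer log, the hung-task reports in a paste like the one above can be pulled out mechanically. This is a minimal Python sketch and not part of the original report: the function name `hung_tasks` and the dict keys are made up for illustration, and the pattern assumes the exact "INFO: task ... blocked for more than N seconds." wording shown in this trace.

```python
import re

# Header line of a hung-task report, e.g.
# "[83804.579035] INFO: task jbd2/md0p5-8:1337 blocked for more than 122 seconds."
HUNG_TASK = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\.",
    re.MULTILINE,
)


def hung_tasks(log_text):
    """Return one dict per hung-task header line found in log_text."""
    return [
        {
            "timestamp": float(m["ts"]),       # seconds since boot
            "task": m["comm"],                 # task name as printed by the kernel
            "pid": int(m["pid"]),
            "blocked_secs": int(m["secs"]),
        }
        for m in HUNG_TASK.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[83804.579035] INFO: task jbd2/md0p5-8:1337 blocked for more than 122 seconds.\n"
        "[83804.579040]       Not tainted 5.19.1 #1\n"
        "[83804.579090] INFO: task jbd2/md0p4-8:1343 blocked for more than 122 seconds.\n"
    )
    for report in hung_tasks(sample):
        print(report["task"], report["pid"], report["blocked_secs"])
```

Run against this report, it would list the two jbd2 journal threads (md0p5 and md0p4), which makes it easier to see at a glance which partitions of the array are affected.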
Comment 1 Robert Dinse 2022-08-22 22:14:58 UTC
The hardware platform is an Intel i7-6700K based machine.  The operating system is Ubuntu 22.04.  This is not the kernel supplied with Ubuntu, however, but one built from kernel.org source with the attached config file and compiled with GCC 12.1.0.
Comment 2 Robert Dinse 2022-08-22 22:16:01 UTC
I wish Bugzilla had an edit function, as what I had meant to type at the end of the paste somehow ended up in the middle.
Comment 3 Artem S. Tashkinov 2022-08-26 06:45:23 UTC
Please perform git bisect:

https://docs.kernel.org/admin-guide/bug-bisect.html
https://ldpreload.com/blog/git-bisect-run
https://wiki.gentoo.org/wiki/Kernel_git-bisect

It looks like it's not a widespread or widely known issue.
Comment 4 Robert Dinse 2022-08-26 07:54:08 UTC
     I cannot, because these machines are in production serving customers.  I have a narrow maintenance window on Friday nights for kernel upgrades, and it sometimes takes 3 days for the stalls to occur.
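
If a bisect ever becomes feasible on a spare machine, the long time-to-reproduce matters: a boot with a clean log only counts as "good" once it has run long enough. The following is a rough Python sketch of that decision, offered as an illustration rather than an established procedure. The function name `bisect_verdict` and the 3-day default soak time are assumptions (the latter taken from the "sometimes 3 days" figure above), and the pattern assumes the hung-task message format seen in this report.

```python
import re

# Header line printed when a task has been blocked for too long.
HUNG_TASK = re.compile(r"INFO: task .+:\d+ blocked for more than \d+ seconds\.")


def bisect_verdict(log_text, uptime_secs, soak_secs=3 * 24 * 3600):
    """Classify one test boot as 'bad', 'good', or 'inconclusive'.

    soak_secs is an assumed threshold (3 days); adjust it to however long
    the stalls actually take to appear on the machine under test.
    """
    if HUNG_TASK.search(log_text):
        return "bad"            # a stall was logged: mark with `git bisect bad`
    if uptime_secs >= soak_secs:
        return "good"           # clean for the whole soak: `git bisect good`
    return "inconclusive"       # too early to tell; keep this kernel running
```

The verdict is returned as a string rather than acted on automatically, since each kernel bisect step needs a rebuild and reboot and the final `git bisect good` or `git bisect bad` call is best left to whoever is running the test.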
Comment 5 Robert Dinse 2022-09-04 04:17:15 UTC
     It's become fairly obvious that this is part of bug #216388, and that this bug was definitely introduced between 5.18.19, which runs clean, and 5.19.0, which does not.  These CPU stalls happen pretty much everywhere.

*** This bug has been marked as a duplicate of bug 216388 ***
Comment 6 Robert Dinse 2022-09-04 04:18:55 UTC
*** Bug 216405 has been marked as a duplicate of this bug. ***
