Bug 217158

Summary: stutter with kernel 6.2
Product: Other
Reporter: proteve
Component: Other
Assignee: other_other
Status: NEEDINFO
Severity: high
Priority: P1
CC: mario.limonciello, rafael.ristovski, regressions, vinibali1
Hardware: AMD
OS: Linux
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg output while playing the stuttering game with kernel 6.2.2

Description proteve 2023-03-07 17:05:31 UTC
After upgrading to the 6.2 kernel provided by Arch Linux, my games started to stutter a lot. My system is an HP laptop with a 4500U, so of course I use the AMD APU.

If I downgrade the kernel to 6.1.xx, the stutters go away.

The system never stuttered before 6.2, though I have also had the TPM disabled in the BIOS for quite a few months now.

On Reddit (https://www.reddit.com/r/openSUSE/comments/11fr7lb/62_update_today_desktop_stutterssluggish_dont/?sort=new) I've seen that other people are also complaining about stutters with 6.2.

Also, sorry if the bug category is wrong.
Comment 1 Artem S. Tashkinov 2023-03-08 04:58:34 UTC
It would be great if you bisected this; otherwise this bug will likely linger forever and never be acted upon.
Comment 2 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-08 06:27:49 UTC
(In reply to proteve from comment #0)
> i also had tpm disabled in bios for quite a few months now.

There is a known bug with TPM, the fix is pending:

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=8699d5244e37f8b3bf0e1deaa8e17f94130677b0

But that should happen with 6.1 as well (maybe just less often?). It might be worth disabling it again. If that doesn't help, a bisection would be really helpful.
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-08 06:35:16 UTC
Are you using Btrfs? It's just a wild guess, but maybe you see discard storms:
https://lore.kernel.org/linux-btrfs/Y%2F%2Bn1wS%2F4XAH7X1p@nz/

Maybe try whether temporarily disabling discard helps.
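To check whether the discard-storm theory can even apply, it helps to know if the root filesystem is Btrfs mounted with a `discard` option. The following is a minimal sketch, assuming bash and util-linux's `findmnt`; `root_discard_state` is a hypothetical helper that takes the filesystem type and the comma-separated option string as arguments, so it can be fed from `findmnt` or tested offline.

```bash
#!/usr/bin/env bash
# Report whether a mount (given as FSTYPE and comma-separated OPTIONS) is
# Btrfs with discard enabled. Hypothetical helper, not part of any tool.
root_discard_state() {
  local fstype=$1 options=${2:-} opt
  local -a opts
  if [[ $fstype != btrfs ]]; then
    echo "not btrfs"
    return
  fi
  # Split the option string on commas into an array.
  IFS=, read -ra opts <<< "$options"
  for opt in "${opts[@]}"; do
    case $opt in
      discard | discard=*)
        echo "discard enabled ($opt)"
        return
        ;;
    esac
  done
  echo "discard not enabled"
}

# Example: inspect the root filesystem (findmnt prints "FSTYPE OPTIONS").
# root_discard_state $(findmnt -no FSTYPE,OPTIONS /)
#
# One way to test the theory temporarily, assuming Btrfs's "nodiscard"
# mount option (see btrfs(5)); a remount does not survive a reboot:
# sudo mount -o remount,nodiscard /
```

An ext4 root, as reported in the next comment, yields `not btrfs`, which rules this theory out.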
Comment 4 user.for.forms 2023-03-08 06:40:43 UTC
Created attachment 303899
attachment-24879-0.html

I still have the TPM disabled in the BIOS, so nothing changes with the fTPM.
And my partition is ext4.
Comment 5 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-08 08:44:52 UTC
Mario, are you by any chance aware of any regressions leading to stuttering (in games, in this case) that are not caused by known issues like the fTPM issue or the btrfs discard storms (see above)? This ticket and the Reddit thread (see above) make me wonder if something (graphics-related) might be lingering somewhere.
Comment 6 Mario Limonciello (AMD) 2023-03-08 13:46:35 UTC
I'm not aware of anything else that would cause a stutter. If it's still there in 6.2.2, I think we need a bisect.
Comment 7 proteve 2023-03-08 18:10:16 UTC
6.2.2 stutters too
Comment 8 Mario Limonciello (AMD) 2023-03-08 18:12:36 UTC
OK, so next steps then:

1) Please share a full dmesg log from boot until after the stutter has occurred. We can check whether anything stands out in it.

2) Please run a bisect between v6.1 and v6.2: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html

Ideally, stick to v6.1 and v6.2 rather than 6.1.y and 6.2.y, so that patches backported from 6.3 don't complicate the result.
Comment 9 proteve 2023-03-08 18:59:24 UTC
Created attachment 303903
dmesg output while playing the stuttering game with kernel 6.2.2
Comment 10 proteve 2023-03-08 19:05:44 UTC
To make it clear: 6.1.x had no stutters for me. The last version I had was 6.1.9.
The stutters started with 6.2.1 (the first 6.2.x version I installed).

So the bisect range is probably the last 6.1.x version to 6.2.1. But I have no clue how to do this on Arch. I've tried to compile 6.2-rc1 and got an error under Arch Linux. I really don't know how this works; I understand the idea, but I don't know how to make it work.
Comment 11 Mario Limonciello (AMD) 2023-03-08 19:09:32 UTC
> to make it clear: 6.1.x had no stutters for me. the last version i had was
> 6.1.9
> stutters started with 6.2.1 (this is the first version of 6.2.x that i had installed)

Bisect works best with linear history, which is why I'm suggesting to ignore the .y's.
You can compile v6.2 and v6.1 to confirm your start/end points.

> so to bisect is probably last version of 6.1.x to 6.2.1. but i have no clue
> how to do this on arch. ive tried to compile 6.2.rc1 and had error under
> archlinux. i really don't know how this works. i understand the idea but i
> don't know how to make it work.

https://wiki.archlinux.org/title/Kernel/Arch_Build_System
I think you basically want to check out the kernel source, change your PKGBUILD each time, install the resulting package, and then record your results.
Comment 12 proteve 2023-03-09 16:38:47 UTC
I've managed to compile 6.2-rc1, and the bug is present there. I don't know how to go lower than that.
I've tried kernel 6.1.12 (from Arch), and the bug is not present there.
As I said before, I think the bug started with 6.2. I don't know exactly when, but rc1 has the bug.
Comment 13 Mario Limonciello (AMD) 2023-03-09 17:16:05 UTC
What you'll need to do is:
1) Check out your source tree
2) Add packaging stuff
3) git bisect start --term-new=broken --term-old=-fixed
4) git bisect broken v6.2
5) git bisect fixed v6.1
6) build package
7) install package/reboot
8) git bisect $ACTION (where $ACTION is "broken" or "fixed", depending on whether the stutter occurs)

Then repeat 6-8 until you have a conclusion.
Comment 14 proteve 2023-03-09 18:47:34 UTC
git bisect start --term-new=broken --term-old=-fixed
status: waiting for both good and bad commits

git bisect broken v6.2
error: Bad rev input: v6.2

git bisect fixed v6.1
error: unknown subcommand: `fixed'
...

I give up on this bisect. It really is beyond my capabilities (including the 45-60 minutes it takes to compile a kernel).
Comment 15 Mario Limonciello (AMD) 2023-03-09 18:49:42 UTC
> status: waiting for both good and bad commits

I think I had a typo with an extra dash.  The command should be:

# git bisect start --term-new=broken --term-old=fixed

> error: Bad rev input: v6.2

Your checkout might be missing tags. Hopefully it's not a shallow (--depth) checkout.
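Since `Bad rev input` usually means a revision does not exist locally, the endpoints can be checked before starting the bisect. A minimal sketch, assuming bash and git; `check_bisect_endpoints` is a hypothetical helper name. It uses `git rev-parse --is-shallow-repository` to detect a shallow clone and `git rev-parse --verify --quiet <rev>^{commit}` to test whether each tag or commit resolves in the current repository.

```bash
#!/usr/bin/env bash
# Check that each revision argument resolves to a commit locally and warn
# if the clone is shallow. Returns non-zero if any revision is missing.
check_bisect_endpoints() {
  local rev status=0
  if [[ $(git rev-parse --is-shallow-repository) == true ]]; then
    echo "warning: shallow clone; consider 'git fetch --unshallow'" >&2
  fi
  for rev in "$@"; do
    if git rev-parse --verify --quiet "$rev^{commit}" >/dev/null; then
      echo "ok: $rev"
    else
      echo "missing: $rev"
      status=1
    fi
  done
  return "$status"
}
```

Run inside the kernel checkout, for example `check_bisect_endpoints v6.1 v6.2`, it prints one `ok:` or `missing:` line per revision.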
Comment 16 proteve 2023-03-09 21:29:01 UTC
git fetch --tags  --verbose

gives
.....
 = [up to date]                v6.0.9-arch1   -> v6.0.9-arch1
 = [up to date]                v6.1-arch1     -> v6.1-arch1
 = [up to date]                v6.1.1-arch1   -> v6.1.1-arch1
 = [up to date]                v6.1.10-arch1  -> v6.1.10-arch1
.....
 = [up to date]                v6.2.2-arch1   -> v6.2.2-arch1
 = [up to date]                v6.2.2-arch2   -> v6.2.2-arch2
Comment 17 Mario Limonciello (AMD) 2023-03-09 21:46:52 UTC
Those should work too then as long as there is relatively linear history between them.
Comment 18 Balazs Vinarz 2023-03-11 05:57:33 UTC
Could you please try the ondemand governor?
I remember that at the end of last year I had some time to play, and schedutil was very laggy back then. Switching to ondemand helped avoid micro-stutters with the 2400G.
It seems to be much better than schedutil with the 5700G as well.
I'm compiling with 17 cores at the moment, and the system feels much smoother now than it did when just browsing with schedutil.
Comment 19 Balazs Vinarz 2023-03-11 05:58:15 UTC
I forgot to mention that I'm using the official 5.15.83-1-lts package at the moment.
Comment 20 proteve 2023-03-11 08:33:36 UTC
I've tried the performance and ondemand governors, but the stutters are still the same. As for compiling the kernel, I gave up.
Comment 21 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-11 15:09:21 UTC
Balazs, you seem to have a problem that happens in 5.15 (and earlier?), while proteve has one that started in 6.2. That's a strong indicator that you have a different problem. Please open a new ticket and mention it here; otherwise things get confusing.

BTW, is "5.15.83-1-lts" vanilla or close to vanilla? If not, it's something you should report to your distro.
Comment 22 proteve 2023-03-14 14:10:52 UTC
I made a new discovery: the stutters happen only on Plasma with Wayland. If I switch to X11, there are almost no stutters.
Why almost? Well, the first few tests were done under Cinnamon, and everything was OK except once, when the game stuttered like before. But after restarting the game, everything was back to normal.

But under KDE with Wayland, the game stutters every time. No problems with KDE and X11 so far.
Comment 23 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-23 14:12:47 UTC
(In reply to proteve from comment #22)
> made a new discovery. the stutters happens only on plasma with wayland. if i
> switch to X11 there is almost no stutters.
> why almost?! well the first few tests was done under cinnamon and everything
> was ok execpt 1 time when the game stutter like before. but after a restart
> of the game everything was back to normal.
> 
> but under kde with wayland the game will stutter every time. no problem witk
> kde and x11 so far

I'm still tracking this and wonder: does anyone still care, or can we consider this report settled?

If not: proteve, did you ever try a bisection? Without one it seems unlikely that anyone will look into this.
Comment 24 proteve 2023-03-23 17:59:53 UTC
As I've said before, I've tried to bisect this using Arch's makepkg system, but I always get an error at the end, which I'm guessing is about some missing folders. If I reset the source back, I get no error. So using the Arch build system with bisect is not working for me, and I have no clue about the error. I'm not that smart. My conclusion is that the bug is there on Wayland, so I'm back on X11.
Comment 25 Balazs Vinarz 2023-03-26 06:18:09 UTC
(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #21)
> Balazs, you seem to have a problem that happens in 5.15 (and earlier?),
> while 
> proteve has one that started in 6.2. That's a strong indicator that you have
> a different problem. Please open a new ticket and mention it here, otherwise
> things get confusing.
> 
> BTW, is "5.15.83-1-lts" vanilla or close to vanilla?  if not it's something
> you should report to your distro.

I just filed a new bug report.
https://bugzilla.kernel.org/show_bug.cgi?id=217249
Comment 26 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-03-27 09:58:55 UTC
(In reply to Balazs Vinarz from comment #25)

> I just filed a new bug report.
> https://bugzilla.kernel.org/show_bug.cgi?id=217249

TWIMC: in that ticket it turned out that the kernel parameter "nopat" was the cause. Might that be the problem here as well?
Comment 27 proteve 2023-03-27 14:37:07 UTC
I only have these in my conf file:
options root=PARTUUID=ff96b9eb-d86c-40e9-b45f-7e3dec84e099 rw intel_pstate=no_hwp rootfstype=ext4
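To answer the `nopat` question from the running system rather than from a config file, the kernel command line can be checked directly. A minimal sketch, assuming bash; `has_kernel_param` is a hypothetical helper that reads `/proc/cmdline` unless a command line is passed as the second argument, and it matches the parameter either bare (`nopat`) or with a value (`nopat=...`).

```bash
#!/usr/bin/env bash
# Return 0 if the kernel command line contains parameter $1, bare or as
# "$1=value". $2 overrides the command line (default: /proc/cmdline).
has_kernel_param() {
  local name=$1 cmdline word
  local -a words
  cmdline=${2-$(</proc/cmdline)}
  read -ra words <<< "$cmdline"
  for word in "${words[@]}"; do
    if [[ $word == "$name" || $word == "$name="* ]]; then
      return 0
    fi
  done
  return 1
}
```

For example, `has_kernel_param nopat && echo "nopat is set"`. Given the options listed above, it returns non-zero, since `nopat` is not among them.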
Comment 28 proteve 2023-04-13 16:22:26 UTC
I just made a new discovery:
the game I'm playing most of the time has an issue when I enable the vertical sync option (in game) under X11: it stutters.
Now I realize that under Wayland the issue seems to be reversed since this bug started for me. I used to have vsync disabled in game, and now if I enable vsync in game, the stutter bug disappears. I have to test with other games to see if it's the same.
Comment 29 proteve 2023-04-13 17:13:26 UTC
I've tried another game, and it's the same: with vsync enabled in game, the stutter is gone. I also noticed that the stutter was even more pronounced when the mouse was moved on the screen, but everything was back to normal once vsync was enabled.
Comment 30 proteve 2023-04-13 20:20:06 UTC
I've made a few more tests after downgrading the kernel to 6.1.12.

Under X11 (Cinnamon DE), there is no issue with or without vsync.
On Wayland, the games work with one issue: one game behaves the way it did on X11 with 6.2.x, in that it stutters with vsync on when the mouse is moved over objects that can be selected. Otherwise, no stutter is present.

Any changes to vsync in 6.2?
Comment 31 proteve 2023-04-15 09:49:10 UTC
I finally had some time over the Easter holidays, so I managed to bisect this.

Results:

git bisect broken
977d97f18b5b8efb7a94da84724113f15ae6cc2d is the first broken commit
commit 977d97f18b5b8efb7a94da84724113f15ae6cc2d
Author: Luben Tuikov <luben.tuikov@amd.com>
Date:   Mon Oct 24 17:26:34 2022 -0400

    drm/scheduler: Set the FIFO scheduling policy as the default
    
    The currently default Round-Robin GPU scheduling can result in starvation
    of entities which have a large number of jobs, over entities which have
    a very small number of jobs (single digit).
    
    This can be illustrated in the following diagram, where jobs are
    alphabetized to show their chronological order of arrival, where job A is
    the oldest, B is the second oldest, and so on, to J, the most recent job to
    arrive.
    
        ---> entities
    j | H-F-----A--E--I--
    o | --G-----B-----J--
    b | --------C--------
    s\/ --------D--------
    
    WLOG, assuming all jobs are "ready", then a R-R scheduling will execute them
    in the following order (a slice off of the top of the entities' list),
    
    H, F, A, E, I, G, B, J, C, D.
    
    However, to mitigate job starvation, we'd rather execute C and D before E,
    and so on, given, of course, that they're all ready to be executed.
    
    So, if all jobs are ready at this instant, the order of execution for this
    and the next 9 instances of picking the next job to execute, should really
    be,
    
    A, B, C, D, E, F, G, H, I, J,
    
    which is their chronological order. The only reason for this order to be
    broken, is if an older job is not yet ready, but a younger job is ready, at
    an instant of picking a new job to execute. For instance if job C wasn't
    ready at time 2, but job D was ready, then we'd pick job D, like this:
    
    0 +1 +2  ...
    A, B, D, ...
    
    And from then on, C would be preferred before all other jobs, if it is ready
    at the time when a new job for execution is picked. So, if C became ready
    two steps later, the execution order would look like this:
    
    ......0 +1 +2  ...
    A, B, D, E, C, F, G, H, I, J
    
    This is what the FIFO GPU scheduling algorithm achieves. It uses a
    Red-Black tree to keep jobs sorted in chronological order, where picking
    the oldest job is O(1) (we use the "cached" structure), and balancing the
    tree is O(log n). IOW, it picks the *oldest ready* job to execute now.
    
    The implementation is already in the kernel, and this commit only changes
    the default GPU scheduling algorithm to use.
    
    This was tested and achieves about 1% faster performance over the Round
    Robin algorithm.
    
    Cc: Christian König <christian.koenig@amd.com>
    Cc: Alex Deucher <Alexander.Deucher@amd.com>
    Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
    Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20221024212634.27230-1-luben.tuikov@amd.com
    Signed-off-by: Christian König <christian.koenig@amd.com>

 drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
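The three orderings in the commit message above can be reproduced with a small toy model. This is a sketch under stated assumptions, not drm scheduler code: each string in the hypothetical `entities` array is one column of the diagram (an entity's queue, oldest job first), letters encode arrival order, and `fifo_order` treats every job independently and picks the oldest ready one, as the commit message's example does; it does not model the kernel's rb-tree or per-entity bookkeeping. The optional arguments to `fifo_order` mark one job as not ready until a given pick step.

```bash
#!/usr/bin/env bash
# Toy model of the orderings in the commit message (not kernel code).
# Each string is one entity's queue, oldest job first; letters A..J
# encode arrival order, A being the oldest.
entities=(H FG ABCD E IJ)

# Round-robin: visit the entities in turn, taking one job from each
# non-empty queue per pass, until all queues are empty.
rr_order() {
  local -a queues=("${entities[@]}")
  local -a out=()
  local i progress=1
  while (( progress )); do
    progress=0
    for i in "${!queues[@]}"; do
      [[ -n ${queues[i]} ]] || continue
      out+=("${queues[i]:0:1}")
      queues[i]=${queues[i]:1}
      progress=1
    done
  done
  echo "${out[*]}"
}

# FIFO: at each pick step, run the oldest job that is ready.
# Optional: $1 is a job that is not ready before pick step $2.
fifo_order() {
  local late=${1:-} ready_at=${2:-0} step=0 pending job pick k
  local -a out=()
  # All jobs in chronological (alphabetical) order.
  pending=$(printf '%s' "${entities[@]}" | fold -w1 | LC_ALL=C sort | tr -d '\n')
  while [[ -n $pending ]]; do
    pick=""
    for (( k = 0; k < ${#pending}; k++ )); do
      job=${pending:k:1}
      if [[ $job != "$late" ]] || (( step >= ready_at )); then
        pick=$job
        break
      fi
    done
    # If nothing is ready, this step stays idle.
    if [[ -n $pick ]]; then
      out+=("$pick")
      pending=${pending/"$pick"/}
    fi
    step=$((step + 1))
  done
  echo "${out[*]}"
}
```

`rr_order` prints `H F A E I G B J C D`, `fifo_order` prints `A B C D E F G H I J`, and `fifo_order C 4` (job C not ready until pick step 4, counting from 0) prints `A B D E C F G H I J`, matching the three sequences in the commit message.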
Comment 32 The Linux kernel's regression tracker (Thorsten Leemhuis) 2023-04-17 10:40:50 UTC
(In reply to proteve from comment #31)
> i finally had some time with these eastern holidays so i manage to bisect
> this.
> 
> results:
> 
> git bisect broken
> 977d97f18b5b8efb7a94da84724113f15ae6cc2d is the first broken commit
> commit 977d97f18b5b8efb7a94da84724113f15ae6cc2d
> Author: Luben Tuikov <luben.tuikov@amd.com>
> Date:   Mon Oct 24 17:26:34 2022 -0400
> 
>     drm/scheduler: Set the FIFO scheduling policy as the default

In that case, please report this to https://gitlab.freedesktop.org/drm/amd/-/issues , as that's where the amdgpu maintainers expect reports; afterwards, please drop a link to the report here.
Comment 33 proteve 2023-04-17 10:59:21 UTC
OK, here is the link to the bug report:
https://gitlab.freedesktop.org/drm/amd/-/issues/2516