Bug 217158
| Summary: | stutter with kernel 6.2 | | |
|---|---|---|---|
| Product: | Other | Reporter: | proteve |
| Component: | Other | Assignee: | other_other |
| Status: | NEEDINFO --- | | |
| Severity: | high | CC: | mario.limonciello, rafael.ristovski, regressions, vinibali1 |
| Priority: | P1 | | |
| Hardware: | AMD | | |
| OS: | Linux | | |
| Kernel Version: | | Subsystem: | |
| Regression: | No | Bisected commit-id: | |
| Attachments: | dmesg output while playing the stuttering game with kernel 6.2.2 | | |
Description
proteve
2023-03-07 17:05:31 UTC
Would be great if you bisected, otherwise this bug will likely linger on forever and never be acted upon.

(In reply to proteve from comment #0)
> i also had tpm disabled in bios for quite a few months now.

There is a known bug with TPM, the fix is pending:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=master&id=8699d5244e37f8b3bf0e1deaa8e17f94130677b0
But that should happen with 6.1 as well (maybe just less often?). Might be worth disabling it again. If not, a bisection would be really helpful.

Are you using Btrfs? It's just a wild guess, but maybe you see discard storms:
https://lore.kernel.org/linux-btrfs/Y%2F%2Bn1wS%2F4XAH7X1p@nz/
Maybe try if temporarily disabling discard helps.

Created attachment 303899 [details]
attachment-24879-0.html

i still have TPM disabled in bios. so nothing changes with ftpm. and my partition is ext4

On Mar 8, 2023, 08:35, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217158
>
> --- Comment #3 from The Linux kernel's regression tracker (Thorsten Leemhuis) (regressions@leemhuis.info) ---
> Are you using Btrfs? It's just a wild guess, but maybe you see discard storms:
> https://lore.kernel.org/linux-btrfs/Y%2F%2Bn1wS%2F4XAH7X1p@nz/
>
> Maybe try if temporarily disabling discard helps.

Mario, are you by any chance aware of any regressions leading to stuttering (games in this case) that are not caused by known issues like the fTPM issue or the btrfs discard storms (see above)? This ticket and the reddit thread (see above) make me wonder if something (graphics related) might be lingering somewhere.

I'm not aware of anything else that would cause a stutter. If it's still there in 6.2.2, I think we need a bisect.

6.2.2 stutters too

OK, so next steps then:
1) Please share a full dmesg log from boot up until after the stutter has occurred. We can check in case anything stands out in it.
2) Please run a bisect between v6.1 and v6.2:
https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
Ideally leave it to v6.1 and v6.2, not 6.1.y and 6.2.y, to avoid the complexity of patches that were backported from 6.3 influencing the result.

Created attachment 303903 [details]
dmesg output while playing the stuttering game with kernel 6.2.2
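
The bisect between v6.1 and v6.2 requested above can be sketched roughly as follows. This is a minimal sketch, not a tested Arch workflow: the helper name `start_bisect` is hypothetical, and it assumes a full (non-shallow) clone of the kernel tree in which the `v6.1` and `v6.2` tags are present (`git fetch --tags` can bring them in).

```shell
# Hypothetical helper: start_bisect <repo> <good-rev> <bad-rev>
# Starts a bisect session; git then checks out a commit roughly halfway
# between the two revisions for you to build and test.
start_bisect() {
  git -C "$1" bisect start &&
  git -C "$1" bisect bad "$3" &&
  git -C "$1" bisect good "$2"
}

# Example (the path is an assumption):
#   start_bisect ~/src/linux v6.1 v6.2
#
# After building, installing and booting each candidate kernel, record
# the outcome from inside the tree and git picks the next commit:
#   git bisect good    # no stutter
#   git bisect bad     # stutter reproduced
# When git reports the first bad commit, leave the session with:
#   git bisect reset
```

Each round halves the remaining range, so the number of kernels to build grows only with the logarithm of the number of commits between the two tags.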
to make it clear: 6.1.x had no stutters for me. the last version i had was 6.1.9. stutters started with 6.2.1 (this is the first version of 6.2.x that i had installed). so to bisect is probably last version of 6.1.x to 6.2.1. but i have no clue how to do this on arch. i've tried to compile 6.2.rc1 and had an error under archlinux. i really don't know how this works. i understand the idea but i don't know how to make it work.

> to make it clear: 6.1.x had no stutters for me. the last version i had was
> 6.1.9. stutters started with 6.2.1 (this is the first version of 6.2.x that i had installed)

Bisect works best with linear history, which is why I'm suggesting to ignore the .y's. You can compile v6.2 and v6.1 to confirm your start/end points.

> so to bisect is probably last version of 6.1.x to 6.2.1. but i have no clue
> how to do this on arch. i've tried to compile 6.2.rc1 and had an error under
> archlinux. i really don't know how this works. i understand the idea but i
> don't know how to make it work.

https://wiki.archlinux.org/title/Kernel/Arch_Build_System
I think you basically want to check out the kernel and change your "PKGBUILD" each time, install that and then record your results.

i've managed to compile 6.2 rc1 and the bug is present there. i don't know how to go lower than that. i've tried kernel 6.1.12 (from arch) and the bug is not present there. as i said before, i think the bug started with 6.2. i don't know when but rc1 has the bug.

What you'll need to do is:
1) Check out your source tree
2) Add packaging stuff
3) git bisect start --term-new=broken --term-old=-fixed
4) git bisect broken v6.2
5) git bisect fixed v6.1
6) build package
7) install package/reboot
8) git bisect $ACTION

Then repeat 6-8 until you have a conclusion.

git bisect start --term-new=broken --term-old=-fixed
status: waiting for both good and bad commits
git bisect broken v6.2
error: Bad rev input: v6.2
git bisect fixed v6.1
error: unknown subcommand: `fixed'
...

i give up on this bisect. it really is beyond my capabilities (including the 45-60min to compile a kernel)

> status: waiting for both good and bad commits

I think I had a typo with an extra dash. The command should be:
# git bisect start --term-new=broken --term-old=fixed

> error: Bad rev input: v6.2

Your checkout might be missing tags. Hopefully it's not a shallow (--depth) checkout.

git fetch --tags --verbose gives
.....
 = [up to date] v6.0.9-arch1 -> v6.0.9-arch1
 = [up to date] v6.1-arch1 -> v6.1-arch1
 = [up to date] v6.1.1-arch1 -> v6.1.1-arch1
 = [up to date] v6.1.10-arch1 -> v6.1.10-arch1
....
 = [up to date] v6.2.2-arch1 -> v6.2.2-arch1
 = [up to date] v6.2.2-arch2 -> v6.2.2-arch2

Those should work too then, as long as there is relatively linear history between them.

could you please try the ondemand governor? i remember back at the end of last year I had some time to play and schedutil was very laggy back then. it helped to avoid micro stutters with the 2400G. it seems to be much better than schedutil with 5700G as well. I'm compiling with 17 cores at the moment and it's much better now than just browsing with schedutil.

I forgot to mention that I'm using the official 5.15.83-1-lts package at the moment.

i've tried performance and ondemand governors but still the same stutters. as for compiling kernel i gave up

Balazs, you seem to have a problem that happens in 5.15 (and earlier?), while proteve has one that started in 6.2. That's a strong indicator that you have a different problem. Please open a new ticket and mention it here, otherwise things get confusing.

BTW, is "5.15.83-1-lts" vanilla or close to vanilla? if not it's something you should report to your distro.

made a new discovery. the stutters happen only on plasma with wayland. if i switch to X11 there are almost no stutters.
why almost?! well the first few tests were done under cinnamon and everything was ok except 1 time when the game stuttered like before. but after a restart of the game everything was back to normal.

but under kde with wayland the game will stutter every time. no problem with kde and x11 so far

(In reply to proteve from comment #22)
> made a new discovery. the stutters happen only on plasma with wayland. if i
> switch to X11 there are almost no stutters.
> why almost?! well the first few tests were done under cinnamon and everything
> was ok except 1 time when the game stuttered like before. but after a restart
> of the game everything was back to normal.
>
> but under kde with wayland the game will stutter every time. no problem with
> kde and x11 so far

I'm still tracking this and wonder: does anyone still care or can we consider this report settled? If not: proteve, did you ever try a bisection? Without one it seems unlikely that anyone will look into this.

as i've said before, i've tried to bisect this using the makepkg system of arch. but i always get an error at the end (i'm guessing) about some missing folders. if i reset the source back i have no error. so using the arch build system with bisect is not working and i have no clue about the error. i'm not that smart.

the conclusion is that the bug is there on wayland. so i'm back on x11

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #21)
> Balazs, you seem to have a problem that happens in 5.15 (and earlier?),
> while
> proteve has one that started in 6.2. That's a strong indicator that you have
> a different problem. Please open a new ticket and mention it here, otherwise
> things get confusing.
>
> BTW, is "5.15.83-1-lts" vanilla or close to vanilla? if not it's something
> you should report to your distro.

I just filed a new bug report.
https://bugzilla.kernel.org/show_bug.cgi?id=217249

(In reply to Balazs Vinarz from comment #25)
> I just filed a new bug report.
> https://bugzilla.kernel.org/show_bug.cgi?id=217249

TWIMC: there it turned out the kernel parameter "nopat" causes this. Might that be a problem here as well?
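
One way to answer the "nopat" question is to look at the command line the running kernel was booted with, which Linux exposes in `/proc/cmdline`. The helper below is a small sketch; the name `has_kernel_param` is hypothetical, and the optional second argument exists only so that a command line string can be checked without booting with it.

```shell
# Hypothetical helper: has_kernel_param <param> [cmdline]
# Succeeds if <param> appears on the kernel command line, either bare
# ("nopat") or with a value ("intel_pstate=no_hwp").
# Without a second argument it reads the running kernel's /proc/cmdline.
has_kernel_param() (
  param=$1
  cmdline=${2-$(cat /proc/cmdline)}
  set -f   # disable globbing while the command line is word-split
  for word in $cmdline; do
    case $word in
      "$param" | "$param"=*) exit 0 ;;
    esac
  done
  exit 1
)

# Usage on the affected machine:
#   has_kernel_param nopat && echo "nopat is set" || echo "nopat is not set"
```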
i have only these in the conf file:
options root=PARTUUID=ff96b9eb-d86c-40e9-b45f-7e3dec84e099 rw intel_pstate=no_hwp rootfstype=ext4

i just made a new discovery. the game i'm playing most of the time has an issue when i enable the vertical sync option (in game) under x11. it stutters. now i realize that under wayland the issue seems to be reversed since this bug started for me. i used to have vsync disabled in game. so now if i enable the vsync in game the stutter bug disappears. i have to test with other games to see if it's the same

i've tried another game and the same. with vsync enabled in game the stutter is gone. i also noticed that the stutter was even more accentuated when the mouse was moved on the screen. but everything was back to normal once vsync is enabled.

i've made a few more tests after downgrading kernel to 6.1.12.
under x11 (cinnamon DE) there is no issue with or without vsync.
on wayland the games work with one issue: 1 game behaves the same as it did on x11 with 6.2.x, as it stutters with vsync on when the mouse is moved over the objects that can be selected. otherwise no stutter is present.

any changes to vsync in 6.2?

i finally had some time with these easter holidays so i managed to bisect this.

results:

git bisect broken
977d97f18b5b8efb7a94da84724113f15ae6cc2d is the first broken commit
commit 977d97f18b5b8efb7a94da84724113f15ae6cc2d
Author: Luben Tuikov <luben.tuikov@amd.com>
Date: Mon Oct 24 17:26:34 2022 -0400

    drm/scheduler: Set the FIFO scheduling policy as the default

    The currently default Round-Robin GPU scheduling can result in starvation
    of entities which have a large number of jobs, over entities which have
    a very small number of jobs (single digit).

    This can be illustrated in the following diagram, where jobs are
    alphabetized to show their chronological order of arrival, where job A is
    the oldest, B is the second oldest, and so on, to J, the most recent job to
    arrive.

        ---> entities
    j | H-F-----A--E--I--
    o | --G-----B-----J--
    b | --------C--------
    s\/ --------D--------

    WLOG, assuming all jobs are "ready", then a R-R scheduling will execute them
    in the following order (a slice off of the top of the entities' list),

    H, F, A, E, I, G, B, J, C, D.

    However, to mitigate job starvation, we'd rather execute C and D before E,
    and so on, given, of course, that they're all ready to be executed.

    So, if all jobs are ready at this instant, the order of execution for this
    and the next 9 instances of picking the next job to execute, should really
    be,

    A, B, C, D, E, F, G, H, I, J,

    which is their chronological order. The only reason for this order to be
    broken, is if an older job is not yet ready, but a younger job is ready, at
    an instant of picking a new job to execute. For instance if job C wasn't
    ready at time 2, but job D was ready, then we'd pick job D, like this:

    0 +1 +2 ...
    A, B, D, ...

    And from then on, C would be preferred before all other jobs, if it is ready
    at the time when a new job for execution is picked. So, if C became ready
    two steps later, the execution order would look like this:

    ......0 +1 +2 ...
    A, B, D, E, C, F, G, H, I, J

    This is what the FIFO GPU scheduling algorithm achieves. It uses a
    Red-Black tree to keep jobs sorted in chronological order, where picking
    the oldest job is O(1) (we use the "cached" structure), and balancing the
    tree is O(log n). IOW, it picks the *oldest ready* job to execute now.

    The implementation is already in the kernel, and this commit only changes
    the default GPU scheduling algorithm to use.

    This was tested and achieves about 1% faster performance over the Round
    Robin algorithm.

    Cc: Christian König <christian.koenig@amd.com>
    Cc: Alex Deucher <Alexander.Deucher@amd.com>
    Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
    Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20221024212634.27230-1-luben.tuikov@amd.com
    Signed-off-by: Christian König <christian.koenig@amd.com>

 drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(In reply to proteve from comment #31)
> i finally had some time with these easter holidays so i managed to bisect
> this.
>
> results:
>
> git bisect broken
> 977d97f18b5b8efb7a94da84724113f15ae6cc2d is the first broken commit
> commit 977d97f18b5b8efb7a94da84724113f15ae6cc2d
> Author: Luben Tuikov <luben.tuikov@amd.com>
> Date: Mon Oct 24 17:26:34 2022 -0400
>
> drm/scheduler: Set the FIFO scheduling policy as the default

in that case please report this to https://gitlab.freedesktop.org/drm/amd/-/issues , as that's where amdgpu maintainers expect reports; afterwards please drop a link to the report here.

ok. here is the link for the bug report
https://gitlab.freedesktop.org/drm/amd/-/issues/2516
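
The two orderings described in the bisected commit message (comment #31) can be reproduced with a toy model. This is only a sketch of the example as the commit message words it, with entities as the columns of the diagram and "pick the oldest ready job" as the FIFO rule; it is not the kernel's implementation, and the function names `round_robin` and `fifo` are made up for illustration.

```shell
# Toy model of the commit message's example; NOT the kernel scheduler.

# Round-robin: visit the entities in turn and run the job at the head of
# each queue. Entities are the diagram's columns, jobs listed oldest first.
round_robin() (
  set -- "H" "F G" "A B C D" "E" "I J"
  order=
  while [ $# -gt 0 ]; do
    n=$#
    while [ "$n" -gt 0 ]; do
      q=$1; shift; n=$((n - 1))
      order="$order ${q%% *}"             # run this entity's oldest job
      case $q in
        *' '*) set -- "$@" "${q#* }" ;;   # requeue entity if jobs remain
      esac
    done
  done
  echo "${order# }"
)

# FIFO: at every step run the oldest job that is ready.
# Optional arguments: a job name and the first step at which it is ready.
fifo() (
  late=${1-}
  ready_at=${2-0}
  set -- A B C D E F G H I J              # chronological order
  order=
  step=0
  while [ $# -gt 0 ]; do
    picked=
    for job in "$@"; do
      if [ "$job" = "$late" ] && [ "$step" -lt "$ready_at" ]; then
        continue                          # oldest job is not ready yet
      fi
      picked=$job
      break
    done
    n=$#
    while [ "$n" -gt 0 ]; do              # drop the picked job from the list
      j=$1; shift; n=$((n - 1))
      [ "$j" = "$picked" ] || set -- "$@" "$j"
    done
    if [ -n "$picked" ]; then order="$order $picked"; fi
    step=$((step + 1))
  done
  echo "${order# }"
)

round_robin   # H F A E I G B J C D
fifo          # A B C D E F G H I J
fifo C 4      # A B D E C F G H I J  (C only becomes ready at step 4)
```

The three calls print the same sequences the commit message gives for round-robin, for FIFO with all jobs ready, and for FIFO when job C becomes ready two steps late.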