Bug 219366
Summary: | [BISECTED] Performance regression caused by "mm: align larger anonymous mappings on THP boundaries" | ||
---|---|---|---|
Product: | Memory Management | Reporter: | Matthias (matthias) |
Component: | Page Allocator | Assignee: | Rik van Riel (riel) |
Status: | ASSIGNED --- | ||
Severity: | normal | CC: | regressions, riel |
Priority: | P3 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 6.7 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | efa7df3e3bb5da8e6abbe37727417f32a37fba47 |
Description
Matthias
2024-10-09 05:37:51 UTC
This seems to affect AMD only. I reproduced this performance degradation on two different Ryzen desktop PCs (Ryzen 5 and Ryzen 9), but I cannot reproduce it on my Intel PC (Lenovo X1 Carbon, Core i5).

Please perform regression testing using: https://docs.kernel.org/admin-guide/bug-bisect.html

I have never done this before. I will try. But what is the best starting point for the bisect? "bad" is certainly 6.7.1. That's the first one I know has the issue. But which 6.6 kernel was right before that? 6.6.54 is not a predecessor of 6.7.1, right?

(In reply to Matthias from comment #3)
> I have never done this before. I will try. But what is the best starting
> point for the bisect. "bad" is certainly 6.7.1. Thats the first one I know
> is having the issue. But which 6.6. kernel was right before that? 6.6.54 is
> not a predecessor of 6.7.1, right?

You could simply start with 6.6.0, as there's no direct path between a 6.6.x stable release and 6.7.0.

> 6.6.54 is not a predecessor of 6.7.1, right?
Correct.
All stable releases are separate trees.
(In reply to Matthias from comment #3)
> I have never done this before. I will try. But what is the best starting
> point for the bisect.

FWIW, the more detailed guide on bisection handles this -- and maybe other problems you might encounter: https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html

I did the bisection and ended up with a result. This is how I started:

# git bisect good v6.6
# git bisect bad v6.7

The result is:

####
╰─# git bisect bad
efa7df3e3bb5da8e6abbe37727417f32a37fba47 is the first bad commit
commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 (HEAD)
Author: Rik van Riel <riel@surriel.com>
Date:   Thu Dec 14 14:34:23 2023 -0800

    mm: align larger anonymous mappings on THP boundaries

    Align larger anonymous memory mappings on THP boundaries by going through
    thp_get_unmapped_area if THPs are enabled for the current process.

    With this patch, larger anonymous mappings are now THP aligned. When a
    malloc library allocates a 2MB or larger arena, that arena can now be
    mapped with THPs right from the start, which can result in better TLB hit
    rates and execution time.

    Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
    Link: https://lkml.kernel.org/r/20231214223423.1133074-1-yang@os.amperecomputing.com
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Christopher Lameter <cl@linux.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)
####

I reverted that commit with:

####
# git revert --no-edit efa7df3e3bb5da8e6abbe37727417f32a37fba47
[detached HEAD 48a6c81ff794] Revert "mm: align larger anonymous mappings on THP boundaries"
 Date: Fri Oct 11 17:14:23 2024 +0200
 1 file changed, 3 deletions(-)
####

And that solves the issue. But I was not able to revert that commit for the later kernel version 6.10.14.

Thx. One more question: is 6.12-rc2 still affected?
There are also quite a few hits on lore for that commit: https://lore.kernel.org/all/?q=efa7df3*+performance

Might be worth taking a closer look (and searching without the word "performance", too). It might be one of those "a lot of things get faster, a few cases got slower" changes that might or might not be considered a regression that has to be fixed.

I cannot test kernel 6.12 yet because I am a ZFS user, and ZFS is not ready for 6.12 yet.

I have managed to create a patch for 6.10.14. It applies cleanly. The kernel is currently compiling. I will post the result shortly.

The patch works for 6.10.14!

####
--- a/mm/mmap.c	2024-10-11 17:54:22.503469512 +0200
+++ b/mm/mmap.c	2024-10-11 17:54:51.254123247 +0200
@@ -1881,10 +1881,6 @@
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		/* Ensures that larger anonymous mappings are THP aligned. */
-		addr = thp_get_unmapped_area_vmflags(file, addr, len,
-						     pgoff, flags, vm_flags);
 	} else {
 		addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
 						    pgoff, flags, vm_flags);
####

With this patch applied, the time spent in the darktable pixel pipeline goes down from 4.7 s to 3.8 s on my Ryzen 9 5900X. That is a significant performance gain. I assume that other applications, like GIMP, Blender, etc., also suffer from this commit, but that is hard to measure. Luckily, darktable provides the right debug output. My Intel laptop has no issue. Maybe this is just an AMD thing.

The patch also fixes kernel 6.11.3.

The commit is from last year. It was never backported to LTS. What does that mean in terms of the importance of that commit? Is it relevant for stability or security?

Rik, please take a look.

Will this issue be fixed in the kernel, or is there any recommendation for the darktable devs? Is there anything they could do differently to mitigate the issue?

By the way, the patch also fixes the performance regression for kernel 6.11.4.
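For anyone who wants to see whether a given kernel applies the alignment described in the commit message, here is a minimal sketch (not taken from this report). It creates one private anonymous mapping and checks whether its start address falls on a THP boundary. The helper names are made up for illustration, the 2 MiB fallback is an assumption that matches x86-64, and the result of a single mapping is only a hint, not proof either way.

```python
import ctypes
import mmap

# Assumption: 2 MiB PMD-sized huge pages (x86-64) if sysfs cannot be read.
DEFAULT_THP_SIZE = 2 * 1024 * 1024
THP_SIZE_PATH = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"


def thp_size() -> int:
    """Return the THP (PMD) size in bytes, or the 2 MiB fallback."""
    try:
        with open(THP_SIZE_PATH) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return DEFAULT_THP_SIZE


def is_aligned(addr: int, boundary: int) -> bool:
    """True if addr is a multiple of boundary."""
    return addr % boundary == 0


def anon_mapping_address(length: int) -> int:
    """Create a private anonymous mapping, return its start address, unmap it."""
    m = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf = ctypes.c_char.from_buffer(m)
        addr = ctypes.addressof(buf)
        del buf  # release the buffer export so the mapping can be closed
    finally:
        m.close()
    return addr


if __name__ == "__main__":
    size = thp_size()
    addr = anon_mapping_address(4 * size)  # a "larger" mapping: 4 huge pages
    print(f"THP size: {size} bytes")
    print(f"mapping at {addr:#x}, THP aligned: {is_aligned(addr, size)}")
```

Running this on a kernel with and without the revert should show whether the mapping address changes from page aligned to THP aligned. It does not measure performance.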
Reminder:

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #9)
> It might be one of those "a lot of things get faster, a few cases got slower"
> changes that might or might not be considered a regression that has to be
> fixed.

FWIW, I am nevertheless considering forwarding it by mail, as bugzilla, as so often, is likely the wrong place for this.

Matthias, can I CC you when doing so? This would expose your email address to the public.

Yes, you can put me on CC. That’s fine.
Ohh, wait, I just noticed: this is with ZFS. I will only forward this if somebody can reproduce it with a vanilla kernel first (you are of course free to report this by mail if you want).

Back from a short vacation, I installed EndeavourOS on an external USB drive, booted from it, and reproduced the issue. It is a bare EndeavourOS installation with the linux-lts (6.6.58) and linux (6.11.5) kernels. I only added darktable to it. No ZFS, no NVIDIA, no OpenCL packages.

With kernel 6.6.58, darktable spends 3.8 s in the pixel pipeline.
With kernel 6.11.5, darktable spends 4.7 s in the pixel pipeline.

By the way, there is also a thread in the darktable forum on this topic: https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945
Some users reproduced it there as well.
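To put the two pixel pipeline timings into relative terms, here is a small sketch of the arithmetic. Only the 3.8 s and 4.7 s figures come from this report. The function name is made up for illustration.

```python
def relative_change(before_s: float, after_s: float) -> float:
    """Relative change from before_s to after_s (positive means slower)."""
    return (after_s - before_s) / before_s


GOOD_S = 3.8  # kernel 6.6.58, as reported above
BAD_S = 4.7   # kernel 6.11.5, as reported above

if __name__ == "__main__":
    slowdown = relative_change(GOOD_S, BAD_S)
    saving = -relative_change(BAD_S, GOOD_S)
    print(f"6.11.5 is {slowdown:.1%} slower than 6.6.58")
    print(f"going back to the 6.6.58 timing saves {saving:.1%} of the 6.11.5 time")
```

These are single timings from one machine, so the percentages are only as reliable as those two numbers.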