Bug 219366

Summary:	[BISECTED] Performance regression caused by "mm: align larger anonymous mappings on THP boundaries"
Product:	Memory Management	Reporter:	Matthias (matthias)
Component:	Page Allocator	Assignee:	Rik van Riel (riel)
Status:	ASSIGNED ---
Severity:	normal	CC:	regressions, riel
Priority:	P3
Hardware:	All
OS:	Linux
Kernel Version:	6.7	Subsystem:
Regression:	Yes	Bisected commit-id:	efa7df3e3bb5da8e6abbe37727417f32a37fba47

Description Matthias 2024-10-09 05:37:51 UTC

I am using a darktable benchmark and I am finding that RAW-to-JPG conversion is about 15-25 % slower with kernels 6.7-6.10. The last fast kernel series is 6.6. I also tested kernel series 6.5, and it is as fast as 6.6.

I know this sounds weird. What does darktable have to do with the kernel? But the numbers are real, and the darktable devs tell me that this is a kernel regression. The darktable GitHub issue is: https://github.com/darktable-org/darktable/issues/17397 You can find more details there.

What do I do to measure the performance?

I am executing darktable on the command line. OpenCL is disabled so that all processing happens on the CPU:

darktable-cli bench.SRW /tmp/test.jpg --core --disable-opencl -d perf -d opencl --configdir /tmp

( bench.SRW and the sidecar file can be found here: https://drive.google.com/drive/folders/1cfV2b893JuobVwGiZXcaNv5-yszH6j-N )

This will show some debug output. The line to look for is

4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)

This gives an exact figure for how much time darktable needed to convert the image. That time clearly depends on the kernel version: it is fast with kernel 6.6 and older and slow with kernel 6.7 and newer. Something must have changed between 6.6 and 6.7 that slows down darktable.

The darktable debug output shows that basically only one module is responsible for the slowdown: 'atrous'

with kernel 6.6.47:

4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)

with kernel 6.10.6:

4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
5,2151 [dev_process_export] pixel pipeline processing took 4,773 secs (102,452 CPU)

This is also being discussed here: https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945/1
And other users confirm the performance degradation.
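
For readers who want to quantify the difference, here is a minimal sketch that extracts the timings from the debug lines quoted above and computes the relative slowdown. The regular expression and the helper names are illustrative assumptions, not part of darktable; the sketch only assumes the "took X secs (Y CPU)" format and the comma decimal separator seen in this report.

```python
import re

# Matches "took <wall> secs (<cpu> CPU)" as it appears in the lines quoted in
# this report. Assumption: the decimal separator may be a comma or a dot.
TIMING = re.compile(r"took (\d+[.,]\d+) secs \((\d+[.,]\d+) CPU\)")


def parse_timing(line):
    """Return (wall_seconds, cpu_seconds) parsed from one perf line."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError("no timing found in line: " + line)
    wall, cpu = (float(value.replace(",", ".")) for value in match.groups())
    return wall, cpu


def slowdown_percent(fast_seconds, slow_seconds):
    """How much longer the slow run took, as a percentage of the fast run."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


# Lines copied from the report: (kernel 6.6.47, kernel 6.10.6).
SAMPLES = {
    "atrous": (
        "4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export] processed 'atrous' on CPU, blended on CPU",
        "4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export] processed 'atrous' on CPU, blended on CPU",
    ),
    "pixel pipeline": (
        "4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)",
        "5,2151 [dev_process_export] pixel pipeline processing took 4,773 secs (102,452 CPU)",
    ),
}

for name, (fast_line, slow_line) in SAMPLES.items():
    fast_wall, _ = parse_timing(fast_line)
    slow_wall, _ = parse_timing(slow_line)
    print(f"{name}: {slowdown_percent(fast_wall, slow_wall):.1f} % slower")
```

With the numbers quoted above, the whole pixel pipeline comes out roughly 25 % slower, while the 'atrous' module alone takes more than twice as long.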

Comment 1 Matthias 2024-10-10 16:31:47 UTC

This seems to affect AMD only. I reproduced this performance degradation on two different Ryzen desktop PCs (Ryzen 5 and Ryzen 9), but I cannot reproduce it on my Intel PC (Lenovo X1 Carbon, Core i5).

Comment 2 Artem S. Tashkinov 2024-10-10 19:54:34 UTC

Please perform regression testing using:

https://docs.kernel.org/admin-guide/bug-bisect.html

Comment 3 Matthias 2024-10-11 06:58:47 UTC

I have never done this before. I will try. But what is the best starting point for the bisect? "Bad" is certainly 6.7.1; that's the first one I know has the issue. But which 6.6 kernel was right before that? 6.6.54 is not a predecessor of 6.7.1, right?

Comment 4 Artem S. Tashkinov 2024-10-11 07:02:23 UTC

(In reply to Matthias from comment #3)
> I have never done this before. I will try. But what is the best starting
> point for the bisect. "bad" is certainly 6.7.1. Thats the first one I know
> is having the issue. But which 6.6. kernel was right before that? 6.6.54 is
> not a predecessor of 6.7.1, right?

You could simply start with 6.6.0, as there's no direct path between a 6.6.x stable release and 6.7.0.

Comment 5 Artem S. Tashkinov 2024-10-11 07:03:27 UTC

> 6.6.54 is not a predecessor of 6.7.1, right?

Correct.

All stable releases are separate trees.

Comment 6 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-11 10:45:45 UTC

(In reply to Matthias from comment #3)
> I have never done this before. I will try. But what is the best starting
> point for the bisect. 

FWIW, the more detailed guide on bisection handles this -- and maybe other problems you might encounter: https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
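
As a rough illustration of the idea behind the bisection described in the guides above: it is a binary search for the first bad commit between a known-good and a known-bad endpoint. The sketch below assumes a simple linear history and uses hypothetical commit names and a hypothetical `is_bad` test; real kernel history is a graph with merges, which `git bisect` handles itself.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    commits: list ordered oldest to newest; commits[0] is known good and
             commits[-1] is known bad (assumption: linear history).
    is_bad:  callable that builds and tests one commit, returning True if the
             regression is present.
    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        middle = (good + bad) // 2
        steps += 1
        if is_bad(commits[middle]):
            bad = middle   # regression already present: culprit is here or earlier
        else:
            good = middle  # still fast: culprit is later
    return commits[bad], steps


# Hypothetical history of 16 commits where "c11" introduced the slowdown.
history = [f"c{i}" for i in range(16)]
culprit, tests = first_bad(history, lambda commit: int(commit[1:]) >= 11)
print(culprit, "found after", tests, "tests")
```

The number of builds grows only logarithmically with the number of commits, which is why bisecting the many thousands of commits between two mainline releases stays practical.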

Comment 7 Matthias 2024-10-11 15:48:09 UTC

I did the bisection and I ended up with a result.

This is how I started: 

# git bisect good v6.6
# git bisect bad v6.7

The result is:

####
╰─# git bisect bad
efa7df3e3bb5da8e6abbe37727417f32a37fba47 is the first bad commit
commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 (HEAD)
Author: Rik van Riel <riel@surriel.com>
Date:   Thu Dec 14 14:34:23 2023 -0800

    mm: align larger anonymous mappings on THP boundaries
    
    Align larger anonymous memory mappings on THP boundaries by going through
    thp_get_unmapped_area if THPs are enabled for the current process.
    
    With this patch, larger anonymous mappings are now THP aligned.  When a
    malloc library allocates a 2MB or larger arena, that arena can now be
    mapped with THPs right from the start, which can result in better TLB hit
    rates and execution time.
    
    Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
    Link: https://lkml.kernel.org/r/20231214223423.1133074-1-yang@os.amperecomputing.com
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Christopher Lameter <cl@linux.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)
####

I did revert that commit with:

####
# git revert --no-edit efa7df3e3bb5da8e6abbe37727417f32a37fba47
[detached HEAD 48a6c81ff794] Revert "mm: align larger anonymous mappings on THP boundaries"
 Date: Fri Oct 11 17:14:23 2024 +0200
 1 file changed, 3 deletions(-)
####

And that solves the issue. But I was not able to revert that commit on the later kernel version 6.10.14.
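
To observe the behaviour the commit message describes, one can check whether a large anonymous mapping starts on a THP boundary. The following is a minimal sketch, assuming a 2 MiB THP size (as on x86-64) and using hypothetical helper names. It only reports the alignment of the returned address; it does not show whether huge pages are actually used, nor why darktable gets slower.

```python
import ctypes
import mmap

# Assumption: THP (PMD) size is 2 MiB, as on x86-64.
THP_SIZE = 2 * 1024 * 1024


def is_aligned(address, boundary=THP_SIZE):
    """True if address is a multiple of boundary."""
    return address % boundary == 0


def anonymous_mapping_address(length):
    """Create a private anonymous mapping and return its start address."""
    mapping = mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    try:
        view = ctypes.c_char.from_buffer(mapping)
        address = ctypes.addressof(view)
        del view  # release the buffer export so the mapping can be closed
    finally:
        mapping.close()
    return address


if __name__ == "__main__":
    # Compare a small mapping with mappings of 2 MiB and more.
    for size_mib in (1, 2, 4, 64):
        address = anonymous_mapping_address(size_mib * 1024 * 1024)
        print(f"{size_mib:3d} MiB mapping at {address:#x}, "
              f"2 MiB aligned: {is_aligned(address)}")
```

Running this on a kernel with and without the commit should make the difference in placement visible, although the result for any single mapping also depends on the layout of the address space at that moment.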

Comment 8 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-11 15:56:15 UTC

Thx. One more question: is 6.12-rc2 still affected?

Comment 9 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-11 16:00:36 UTC

There are also quite a few hits on lore for that commit: https://lore.kernel.org/all/?q=efa7df3*+performance Might be worth taking a closer look (and searching without the word "performance", too).

It might be one of those "a lot of things get faster, a few cases got slower" changes that might or might not be considered a regression that has to be fixed.

Comment 10 Matthias 2024-10-11 16:02:32 UTC

I can not test kernel 6.12 yet because I am a ZFS user, and ZFS is not ready yet for 6.12.

I have managed to create a patch for 6.10.14. It applies cleanly. Kernel is currently compiling. I will post the result shortly.

Comment 11 Matthias 2024-10-11 16:35:30 UTC

The patch works for 6.10.14!

####
--- a/mm/mmap.c	2024-10-11 17:54:22.503469512 +0200
+++ b/mm/mmap.c	2024-10-11 17:54:51.254123247 +0200
@@ -1881,10 +1881,6 @@
 
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		/* Ensures that larger anonymous mappings are THP aligned. */
-		addr = thp_get_unmapped_area_vmflags(file, addr, len,
-						     pgoff, flags, vm_flags);
 	} else {
 		addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
 						    pgoff, flags, vm_flags);
####

With this patch applied, the time spent in the darktable pixel pipeline goes down from 4.7 s to 3.8 s on my Ryzen 9 5900X.

That is a significant performance gain. I assume that other applications, like gimp, blender, etc., also suffer from this commit. But it is hard to measure. Luckily darktable provides the right debug output. 

My Intel laptop has no issue. Maybe this is just an AMD thing.

Comment 12 Matthias 2024-10-11 17:55:43 UTC

The patch also fixes kernel 6.11.3. 

The commit is from last year. It was never backported to LTS. What does that say about the importance of the commit? Is it relevant for stability or security?

Comment 13 Artem S. Tashkinov 2024-10-12 10:49:30 UTC

Rik, please take a look.

Comment 14 Artem S. Tashkinov 2024-10-15 08:49:00 UTC

Rik, please take a look.

Comment 15 Matthias 2024-10-18 12:55:03 UTC

Will this issue be fixed in the kernel or is there any recommendation for the darktable devs? Is there anything they could do differently to mitigate the issue?

By the way, the patch also fixes the performance regression for kernel 6.11.4.

Comment 16 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-21 10:16:20 UTC

Reminder:

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #9)
>
> It might be one of those "a lot of things get faster, a few cases got slower"
> changes that might or might not be considered a regression that has to be
> fixed.

FWIW, I am nevertheless considering forwarding it by mail, since bugzilla is, as so often, likely the wrong place for this.

Matthias, can I CC you when doing so? This would expose your email address to the public.

Comment 17 Matthias 2024-10-21 11:17:45 UTC

Yes, you can put me on CC. That’s fine.

Comment 18 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-10-21 11:25:38 UTC

Ohh, wait, I just noticed: this is with ZFS. Will only forward this if somebody can reproduce this first with a vanilla kernel (you are of course free to report this by mail if you want).

Comment 19 Matthias 2024-10-24 07:13:52 UTC

Back from a short vacation, I installed EndeavourOS on an external USB drive, booted from it, and reproduced the issue.

It is a bare EndeavourOS installation with the linux-lts (6.6.58) and linux (6.11.5) kernels. I only added darktable to it. No ZFS, no NVIDIA, no OpenCL packages.

With kernel 6.6.58, darktable spends 3.8 s in the pixel pipeline.

With kernel 6.11.5, darktable spends 4.7 s in the pixel pipeline.

Comment 20 Matthias 2024-10-24 07:19:04 UTC

By the way, there is also a thread in the darktable forum on this topic:
https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945
 
Some users reproduced it there as well.