I am using a darktable benchmark and I am finding that RAW-to-JPG conversion is about 15-25 % slower with kernels 6.7-6.10. The last fast kernel series is 6.6. I also tested kernel series 6.5, and it is as fast as 6.6.

I know this sounds weird. What does darktable have to do with the kernel? But the numbers are real, and the darktable devs tell me that this is a kernel regression. The darktable GitHub issue is: https://github.com/darktable-org/darktable/issues/17397 You can find more details there.

How do I measure the performance? I execute darktable on the command line with OpenCL disabled, so that all processing happens on the CPU:

darktable-cli bench.SRW /tmp/test.jpg --core --disable-opencl -d perf -d opencl --configdir /tmp

(bench.SRW and the sidecar file can be found here: https://drive.google.com/drive/folders/1cfV2b893JuobVwGiZXcaNv5-yszH6j-N)

This prints some debug output. The line to look for is:

4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)

It gives an exact number for how much time darktable needed to convert the image.

The time darktable needs has a clear dependency on the kernel version: it is fast with kernel 6.6 and older and slow with kernel 6.7 and newer. Something must have changed from 6.6 to 6.7 that slows down darktable.

The darktable debug output shows that essentially a single module is responsible for the slowdown: 'atrous'.

With kernel 6.6.47:

4,0548 [dev_pixelpipe] took 0,635 secs (14,597 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
4,2765 [dev_process_export] pixel pipeline processing took 3,811 secs (81,883 CPU)

With kernel 6.10.6:

4,9645 [dev_pixelpipe] took 1,489 secs (33,736 CPU) [export] processed 'atrous' on CPU, blended on CPU
...
5,2151 [dev_process_export] pixel pipeline processing took 4,773 secs (102,452 CPU)

This is also being discussed here: https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945/1 and other users confirm the performance degradation.
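If anyone wants to reproduce this without fishing the timing line out of the debug output by hand, a small wrapper along these lines should do. It is just a sketch around the exact command above; adjust the paths to wherever you put bench.SRW:

####
#!/bin/sh
# Run the benchmark a few times and print only the pipeline timing line.
# Assumes bench.SRW and its sidecar file are in the current directory.
for run in 1 2 3; do
    rm -f /tmp/test.jpg   # start every run with a fresh output file
    darktable-cli bench.SRW /tmp/test.jpg --core --disable-opencl \
        -d perf -d opencl --configdir /tmp 2>&1 \
        | grep 'pixel pipeline processing took'
done
####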
This seems to affect AMD only. I reproduced this performance degradation on two different Ryzen desktop PCs (a Ryzen 5 and a Ryzen 9), but I cannot reproduce it on my Intel laptop (Lenovo X1 Carbon, Core i5).
Please perform regression testing using: https://docs.kernel.org/admin-guide/bug-bisect.html
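For reference, the workflow from that guide boils down to roughly the following. The build and install steps are distro-dependent and simplified here; the guide covers kernel configuration and the other details:

####
cd linux                      # a clone of the mainline tree
git bisect start
git bisect good v6.6          # last known fast release
git bisect bad v6.7           # first known slow release
# for every revision git checks out:
make olddefconfig && make -j"$(nproc)"
sudo make modules_install install
# reboot into the new kernel, run the darktable benchmark, then report:
git bisect good               # or: git bisect bad
# repeat until git prints "<sha> is the first bad commit"
####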
I have never done this before. I will try. But what is the best starting point for the bisect? "bad" is certainly 6.7.1; that's the first one I know is having the issue. But which 6.6 kernel was right before that? 6.6.54 is not a predecessor of 6.7.1, right?
(In reply to Matthias from comment #3)
> But which 6.6 kernel was right before that? 6.6.54 is not a predecessor
> of 6.7.1, right?

You could simply start with 6.6.0, as there is no direct path between a 6.6.x stable release and 6.7.0.
> 6.6.54 is not a predecessor of 6.7.1, right?

Correct. Each stable series is maintained as a separate branch that forks off its mainline release, so 6.6.y and 6.7 are separate trees.
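If you want git to confirm that, the ancestry check is straightforward. This assumes a clone that carries both the mainline and the stable tags (e.g. the linux-stable repository):

####
# exits 0 only if the first tag is an ancestor of the second
git merge-base --is-ancestor v6.6.54 v6.7.1 && echo ancestor || echo not an ancestor   # prints: not an ancestor
git merge-base --is-ancestor v6.6    v6.7.1 && echo ancestor || echo not an ancestor   # prints: ancestor
####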
(In reply to Matthias from comment #3)
> I have never done this before. I will try. But what is the best starting
> point for the bisect?

FWIW, the more detailed guide on bisection handles this, and maybe other problems you might encounter: https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
I did the bisection and I ended up with a result. This is how I started:

# git bisect good v6.6
# git bisect bad v6.7

The result is:

####
╰─# git bisect bad
efa7df3e3bb5da8e6abbe37727417f32a37fba47 is the first bad commit
commit efa7df3e3bb5da8e6abbe37727417f32a37fba47 (HEAD)
Author: Rik van Riel <riel@surriel.com>
Date:   Thu Dec 14 14:34:23 2023 -0800

    mm: align larger anonymous mappings on THP boundaries

    Align larger anonymous memory mappings on THP boundaries by going through
    thp_get_unmapped_area if THPs are enabled for the current process.

    With this patch, larger anonymous mappings are now THP aligned.  When a
    malloc library allocates a 2MB or larger arena, that arena can now be
    mapped with THPs right from the start, which can result in better TLB hit
    rates and execution time.

    Link: https://lkml.kernel.org/r/20220809142457.4751229f@imladris.surriel.com
    Link: https://lkml.kernel.org/r/20231214223423.1133074-1-yang@os.amperecomputing.com
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Christopher Lameter <cl@linux.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 mm/mmap.c | 3 +++
 1 file changed, 3 insertions(+)
####

I reverted that commit with:

####
# git revert --no-edit efa7df3e3bb5da8e6abbe37727417f32a37fba47
[detached HEAD 48a6c81ff794] Revert "mm: align larger anonymous mappings on THP boundaries"
 Date: Fri Oct 11 17:14:23 2024 +0200
 1 file changed, 3 deletions(-)
####

And that solves the issue. But I was not able to revert that commit for the later kernel version 6.10.14.
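In case others want to test the same revert on a different source tree, one generic way is to export the commit as a patch and apply it in reverse. As noted above, this will not apply cleanly where the surrounding code has changed in the meantime, which is what happened with 6.10.14:

####
# in the mainline tree: export the offending commit as a patch
git format-patch -1 --stdout efa7df3e3bb5da8e6abbe37727417f32a37fba47 > thp-align.patch

# in the kernel source tree you want to test: apply it in reverse
cd /path/to/linux-6.x.y
patch -R -p1 --dry-run < /path/to/thp-align.patch   # check whether it applies
patch -R -p1 < /path/to/thp-align.patch
####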
Thx. One more question: is 6.12-rc2 still affected?
There are also quite a few hits on lore for that commit: https://lore.kernel.org/all/?q=efa7df3*+performance

Might be worth taking a closer look (and searching without the word "performance", too). It might be one of those "a lot of things get faster, a few cases got slower" changes that might or might not be considered a regression that has to be fixed.
I cannot test kernel 6.12 yet because I am a ZFS user and ZFS is not ready for 6.12 yet.

I have managed to create a patch for 6.10.14. It applies cleanly. The kernel is currently compiling; I will post the result shortly.
The patch works for 6.10.14!

####
--- a/mm/mmap.c	2024-10-11 17:54:22.503469512 +0200
+++ b/mm/mmap.c	2024-10-11 17:54:51.254123247 +0200
@@ -1881,10 +1881,6 @@
 
 	if (get_area) {
 		addr = get_area(file, addr, len, pgoff, flags);
-	} else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-		/* Ensures that larger anonymous mappings are THP aligned. */
-		addr = thp_get_unmapped_area_vmflags(file, addr, len,
-				pgoff, flags, vm_flags);
 	} else {
 		addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
 						    pgoff, flags, vm_flags);
####

With this patch applied, the time the darktable pixel pipeline needs goes down from 4.7 s to 3.8 s on my Ryzen 9 5900X. That is a significant performance gain.

I assume that other applications, like GIMP, Blender, etc., also suffer from this commit, but that is hard to measure. Luckily darktable provides the right debug output.

My Intel laptop has no issue. Maybe this is just an AMD thing.
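For what it's worth, a rough way to watch whether the alignment change actually affects the benchmark's memory layout is to look at the THP-related fields in /proc while darktable-cli is running. This is only an observation aid, not proof of the root cause, and the THPeligible field is only present on kernels built with CONFIG_TRANSPARENT_HUGEPAGE:

####
# in a second terminal while the benchmark is running
pid=$(pgrep -n darktable-cli)

# total anonymous memory currently backed by huge pages
grep AnonHugePages /proc/$pid/smaps_rollup

# number of mappings the kernel considers eligible for THP
grep -c 'THPeligible:[[:space:]]*1' /proc/$pid/smaps
####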
The patch also fixes kernel 6.11.3. The commit is from last year and was never backported to LTS. What does that mean in terms of the importance of that commit? Is it relevant for stability or security?
Rik, please take a look.
Will this issue be fixed in the kernel, or is there any recommendation for the darktable devs? Is there anything they could do differently to mitigate the issue?

By the way, the patch also fixes the performance regression for kernel 6.11.4.
Reminder:

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #9)
> It might be one of those "a lot of things get faster, a few cases got slower"
> changes that might or might not be considered a regression that has to be
> fixed.

FWIW, I nevertheless consider forwarding this by mail, as bugzilla, as so often, is likely the wrong place for this.

Matthias, can I CC you when doing so? This would expose your email address to the public.
Yes, you can put me on CC. That's fine.
Oh, wait, I just noticed: this is with ZFS. I will only forward this if somebody can first reproduce it with a vanilla kernel (you are of course free to report this by mail yourself if you want).
Back from a short vacation, I installed EndeavourOS on an external USB drive, booted from it, and reproduced the issue. It is a bare EndeavourOS installation with the linux-lts (6.6.58) and linux (6.11.5) kernels; I only added darktable to it. No ZFS, no NVIDIA, no OpenCL packages.

With kernel 6.6.58, darktable spends 3.8 s in the pixel pipeline.
With kernel 6.11.5, darktable spends 4.7 s in the pixel pipeline.
By the way, there is also a thread on this topic in the darktable forum: https://discuss.pixls.us/t/darktable-performance-regression-with-kernel-6-7-and-newer/45945

Some users have reproduced it there as well.