Bug 217022

Summary: Extremely Slow Hugepage Allocation
Product: Memory Management
Component: Page Allocator
Reporter: Yuanxi Liu (y.liu)
Assignee: Andrew Morton (akpm)
CC: mgorman
Status: NEW
Severity: normal
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 5.15
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments: perf data and kernel config

Description Yuanxi Liu 2023-02-11 06:54:45 UTC
Created attachment 303713 [details]
perf data and kernel config

We have several Ice Lake servers with 1TB of memory installed. They all ran the 5.10.x LTS kernel branch without problems. After we upgraded to the 5.15.x LTS kernel, booting became extremely slow. After some analysis, we found that the slowdown was caused by hugepage allocation. Our systems use sysctl.conf to allocate hugepages at boot time, and running "echo 960 > nr_hugepages" at runtime is just as slow. The only way we have found to allocate them quickly is the kernel command-line option "default_hugepagesz=1G hugepages=960".
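
To reproduce the slow runtime path outside of sysctl.conf, a minimal Python sketch along these lines could time a single request. It assumes 1 GB pages are exposed at the standard per-size sysfs file /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages and that it runs as root; the function name is hypothetical, and the path parameter can point at a scratch file for a dry run.

```python
import time

# Assumption: 1 GB huge pages are exposed at the standard per-size sysfs
# location on this machine; writing to it requires root.
NR_HUGEPAGES_1G = "/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages"


def time_hugepage_request(count, path=NR_HUGEPAGES_1G):
    """Request `count` huge pages and return (pages_allocated, seconds).

    The kernel allocates the pages synchronously while handling the write,
    so the elapsed time (measured after the file is closed and flushed)
    covers the allocation itself. Reading the file back gives the pool size
    actually reached, which can be lower than `count`.
    """
    start = time.monotonic()
    with open(path, "w") as f:
        f.write(f"{count}\n")
    elapsed = time.monotonic() - start
    with open(path) as f:
        allocated = int(f.read().strip())
    return allocated, elapsed
```

As root, calling time_hugepage_request(0) first releases unused pages from an existing pool; time_hugepage_request(960) then returns the pool size reached and the elapsed time, and for 1 GB pages the throughput in GB/s is simply pages divided by seconds.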

Our system is a Xeon W3375 with 1TB of memory, but the bug also occurs on a Xeon 8180 with 1.5TB of memory. Our OS is Gentoo Linux. With 5.10.x the allocation speed is around 300GB/s, while 5.15.x reaches only 30GB/s. We also tried 6.1.1, and it behaves the same as 5.15.x.
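
For scale, the reported rates can be turned into allocation time for a 960 x 1 GB pool, the size given by "hugepages=960" with "default_hugepagesz=1G". This is plain arithmetic on the numbers in the report; the function name is hypothetical, and real boot delay will also depend on the machine and pool size.

```python
def allocation_seconds(pages, page_size_gb, rate_gb_per_s):
    """Time needed to allocate `pages` huge pages at a given throughput."""
    return pages * page_size_gb / rate_gb_per_s


POOL_PAGES = 960   # from "hugepages=960" in the report
PAGE_SIZE_GB = 1   # from "default_hugepagesz=1G"

fast = allocation_seconds(POOL_PAGES, PAGE_SIZE_GB, 300)  # reported 5.10.x rate
slow = allocation_seconds(POOL_PAGES, PAGE_SIZE_GB, 30)   # reported 5.15.x / 6.1.1 rate
print(f"5.10.x: {fast:.1f} s, 5.15.x: {slow:.1f} s")  # prints "5.10.x: 3.2 s, 5.15.x: 32.0 s"
```

A tenfold drop in throughput therefore turns a step of a few seconds into roughly half a minute per boot on the 1TB machine.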

We compiled 5.10.163 and 5.15.88 with debug options and used "perf record -a -g sleep 2" to capture kernel call stacks. The two perf outputs are attached; I hope they help.
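
A capture like the one described can be scripted so that perf samples while the kernel is busy allocating. This sketch assumes `perf record` (system-wide with -a, call graphs with -g) was the subcommand used, and reuses the same assumed 1 GB sysfs knob; the function names are hypothetical.

```python
import subprocess
import threading

# Assumption: 1 GB pages via the standard per-size sysfs knob (requires root).
NR_HUGEPAGES_1G = "/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages"


def perf_record_cmd(seconds=2, output="perf.data"):
    """System-wide (-a) sampling with call graphs (-g) for `seconds` seconds."""
    return ["perf", "record", "-a", "-g", "-o", output, "--", "sleep", str(seconds)]


def profile_allocation(count, seconds=2, path=NR_HUGEPAGES_1G):
    """Issue the hugepage request in a background thread and sample it with
    perf while the write is in progress. Returns perf's exit status."""
    def request():
        # Blocks until the kernel has finished (or given up) allocating.
        with open(path, "w") as f:
            f.write(f"{count}\n")

    worker = threading.Thread(target=request)
    worker.start()
    status = subprocess.run(perf_record_cmd(seconds)).returncode
    worker.join()
    return status
```

Run profile_allocation(960) as root on each kernel (after writing 0 to release unused pages from the previous pool), then compare the call graphs with "perf report --stdio -i perf.data".
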
Comment 1 Andrew Morton 2023-02-26 05:32:59 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 11 Feb 2023 06:54:45 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=217022
> 
>             Bug ID: 217022
>            Summary: Extremely Slow Hugepage Allocation
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.15
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: y.liu@naruida.com
>         Regression: No
> 
> Created attachment 303713 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=303713&action=edit
> perf data and kernel config
> 