Bug 217022 - Extremely Slow Hugepage Allocation
Status: NEW
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: All Linux
Importance: P1 normal
Assignee: Andrew Morton
Reported: 2023-02-11 06:54 UTC by Yuanxi Liu
Modified: 2023-04-14 15:19 UTC

See Also:
Kernel Version: 5.15
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
perf data and kernel config (429.77 KB, application/x-xz)
2023-02-11 06:54 UTC, Yuanxi Liu

Description Yuanxi Liu 2023-02-11 06:54:45 UTC
Created attachment 303713
perf data and kernel config

We have several Ice Lake servers with 1TB of memory installed. They all ran the 5.10.x LTS kernel without problems. After we upgraded to the 5.15.x LTS kernel, booting became extremely slow. After some analysis, we found that the slowdown was caused by hugepage allocation. Our systems use sysctl.conf to allocate hugepages at boot time. Running "echo 960 > nr_hugepages" by hand is just as slow. The only way to get a fast allocation is the boot command-line option: "default_hugepagesz=1G hugepages=960".

Our system is a Xeon W3375 with 1TB of memory installed, but the bug also occurred on a Xeon 8180 with 1.5TB of memory. Our OS is Gentoo Linux. With 5.10.x, the allocation speed is around 300GB/s; with 5.15.x it is only 30GB/s. We also tried 6.1.1, which behaves the same as 5.15.x.
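
For anyone who wants to reproduce the numbers above, here is a rough sketch of how the allocation rate could be measured. It is not the script we used. It assumes 1 GiB pages (matching default_hugepagesz=1G), the per-size sysfs file for that page size, and root privileges; the function names are made up for illustration. At the rates quoted above, 960 pages of 1 GiB would take about 3.2 s at 300GB/s and about 32 s at 30GB/s.

```python
import time
from pathlib import Path

# Assumption: 1 GiB hugepages, so the per-size sysfs knob is used here.
# The report itself writes to "nr_hugepages" with 1G as the default size.
SYSFS_1G = Path("/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages")


def throughput_gib_per_s(pages, page_size_gib, seconds):
    """Allocation rate in GiB/s for `pages` pages allocated in `seconds`."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return pages * page_size_gib / seconds


def timed_allocation(target, path=SYSFS_1G):
    """Write `target` to the hugepage count file and time the write.

    Returns (pages_gained, elapsed_seconds). The count is read back
    afterwards because the kernel may allocate fewer pages than requested.
    """
    before = int(path.read_text())
    start = time.perf_counter()
    path.write_text(f"{target}\n")  # returns once the kernel has finished
    elapsed = time.perf_counter() - start
    after = int(path.read_text())
    return after - before, elapsed
```

Run as root, for example: gained, secs = timed_allocation(960), then throughput_gib_per_s(gained, 1.0, secs). Running the same thing on 5.10.x and 5.15.x should show the difference.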

We compiled 5.10.163 and 5.15.88 with debug options and used "perf -a -g sleep 2" to capture the kernel functions involved. The two perf outputs are attached. I hope this helps.
Comment 1 Andrew Morton 2023-02-26 05:32:59 UTC
(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 11 Feb 2023 06:54:45 +0000 bugzilla-daemon@kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=217022
> 
>             Bug ID: 217022
>            Summary: Extremely Slow Hugepage Allocation
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.15
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: y.liu@naruida.com
>         Regression: No
> 
> Created attachment 303713 [details]
>   --> https://bugzilla.kernel.org/attachment.cgi?id=303713&action=edit
> perf data and kernel config
> 
> We have some ICE lake server with 1TB memory installed. They were all running
> at 5.10.x branch LTS kernel and run fine. After we upgraded kernel to 5.15.x
> LTS kernel, the booting process were extremely slow. After some analysis, we
> realized that was caused by hugepage allocation. Our system used sysctl.conf
> to
> allocate hugepage at the boot time. In fact, "echo 960 > nr_hugepages"  had
> the
> same effect. The only way to do a fast allocation is to use boot cmd option:
> "default_hugepagesz=1G hugepages=960".
> 
> Our System is Xeon W3375 with 1TB memory installed. But this bug also occured
> with Xeon 8180 with 1.5TB memory too. Our OS is Gentoo Linux. With 5.10.x,
> the
> allocation speed is around 300GB/s, and 5.15.x only had 30GB/s. We also tried
> 6.1.1, it is the same as 5.15.x .
> 
> We compiled 5.10.163 and 5.15.88 with debug option and used "perf -a -g sleep
> 2" to catch kernel functions. Here are two perf outputs. I hope this can
> help.
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are the assignee for the bug.
