Bug 218404 - linux 6.7-rc1 and up errors on boot with swapper/coherent_pool/dma_pool
Summary: linux 6.7-rc1 and up errors on boot with swapper/coherent_pool/dma_pool
Status: ASSIGNED
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Page Allocator
Hardware: All Linux
Importance: P3 normal
Assignee: Ard Biesheuvel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-21 11:01 UTC by Tom Englund
Modified: 2024-03-17 14:21 UTC

See Also:
Kernel Version:
Subsystem:
Regression: No
Bisected commit-id:


Attachments
dma pools fails to allocate on boot with 6.7rc1 and up (3.02 KB, text/plain)
2024-01-21 11:01 UTC, Tom Englund

Description Tom Englund 2024-01-21 11:01:06 UTC
Created attachment 305736 [details]
dma pools fails to allocate on boot with 6.7rc1 and up

With Linux 6.7-rc1 and later, up to current git (20240121), I'm getting an error on boot: "swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0"

Full trace attached. I don't notice any particular issues when I just ignore it, but I can silence it by setting coherent_pool=1M. I checked /sys/kernel/debug/dma_pools/ on both 6.6.8 and the 6.7.x kernels: without the parameter, the pools end up set to the same sizes.
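
For readers decoding the failure line, here is a minimal sketch of the arithmetic behind "order:10" and of why coherent_pool=1M asks for a smaller block. It assumes 4 KiB base pages (typical on x86-64, but an assumption here); the helper names are made up for illustration and are not kernel APIs.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def order_to_bytes(order: int) -> int:
    """Size in bytes of one contiguous block of the given allocation order."""
    return PAGE_SIZE << order


def size_to_order(size: int) -> int:
    """Smallest order whose block is at least `size` bytes."""
    pages = -(-size // PAGE_SIZE)  # ceiling division
    return max(pages - 1, 0).bit_length()


# The failure line quoted in this report.
line = ("swapper/0: page allocation failure: order:10, "
        "mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0")

order = int(re.search(r"order:(\d+)", line).group(1))
failed_bytes = order_to_bytes(order)

print(f"order {order} = {failed_bytes // (1 << 20)} MiB of contiguous memory")
print(f"coherent_pool=1M needs order {size_to_order(1 << 20)}")
```

Under that assumption the failed request is a single 4 MiB contiguous block from the GFP_DMA zone, while a 1 MiB pool only needs an order-8 block.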
Comment 1 Bagas Sanjaya 2024-01-28 04:25:59 UTC
(In reply to Tom Englund from comment #0)

Then please bisect the kernel to find the exact commit that introduced the regression.
To do so, see the LKML thread at [1]:

[1]: https://lore.kernel.org/linux-doc/c763e15e-e82e-49f8-a540-d211d18768a3@leemhuis.info/
Comment 2 Tom Englund 2024-01-29 15:13:36 UTC
(In reply to Bagas Sanjaya from comment #1)

I spent some time bisecting and it ended up being this commit: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=8eace5b3555606e684739bef5bcdfcfe68235257 . I also made sure I didn't get the git bisect wrong by manually building the commit before this one, then building this one again, and yes, this is the culprit.
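
The bisect-then-verify procedure described above can be sketched as a binary search over a linear history. This is only an illustration: the commit names and the `is_bad` predicate are hypothetical stand-ins for "build this commit, boot it, check for the page allocation failure", and real git bisect also handles non-linear history.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the culprit is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical history of 100 commits with the regression introduced at c37.
commits = [f"c{i}" for i in range(100)]
tested = []


def is_bad(commit):
    tested.append(commit)  # record each simulated build-and-boot test
    return int(commit[1:]) >= 37


idx = first_bad(commits, is_bad)
steps = len(tested)

# Manual double check, as done in this comment: parent is good, culprit is bad.
parent_good = not is_bad(commits[idx - 1])
culprit_bad = is_bad(commits[idx])
print(commits[idx], steps, parent_good, culprit_bad)
```

The search needs only about log2(N) test boots, which is why the extra manual check of the parent commit is a cheap way to rule out a mislabelled step.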
Comment 3 The Linux kernel's regression tracker (Thorsten Leemhuis) 2024-01-30 08:00:40 UTC
(In reply to Tom Englund from comment #2)

> spent some time bisecting and it ended up being this commit
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=8eace5b3555606e684739bef5bcdfcfe68235257

That's "x86/boot: Omit compression buffer from PE/COFF image memory footprint" from Ard Biesheuvel; I might be mistaken, but I think he is sometimes active here, hence adding him.
Comment 4 Ard Biesheuvel 2024-01-30 08:39:14 UTC
Talking to Tom on IRC, it appears that the use of 'nokaslr' is causing the kernel to be placed as close to the start of memory as possible. The patch in question reduces the size of that allocation, so the kernel now fits into a smaller free region that overlaps with the GFP_DMA region, leaving no room for the CMA allocation.

Dropping 'nokaslr' from the kernel command line works around the problem. I intend to fix this properly by avoiding the start of DRAM entirely when allocating pages from the EFI stub.
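
The effect described above can be illustrated with a toy model. It assumes the x86 GFP_DMA zone spans the first 16 MiB of physical memory and that the failing request is a naturally aligned 4 MiB block (order 10 with 4 KiB pages). All addresses and sizes of the reserved area and the kernel image are hypothetical, chosen only to show the overlap.

```python
MiB = 1 << 20
DMA_ZONE_END = 16 * MiB  # assumption: x86 ZONE_DMA is the first 16 MiB
BLOCK = 4 * MiB          # order-10 block, assuming 4 KiB pages


def has_free_block(used, zone_end=DMA_ZONE_END, block=BLOCK):
    """True if some aligned `block`-sized range below `zone_end` is untouched.

    `used` is a list of (start, end) physical ranges, end exclusive.
    """
    for start in range(0, zone_end, block):
        end = start + block
        if all(u_end <= start or u_start >= end for u_start, u_end in used):
            return True
    return False


low_reserved = (0, 1 * MiB)  # hypothetical firmware/legacy area

# Hypothetical: larger image footprint, lowest fitting free region is higher up.
kernel_high = (256 * MiB, 296 * MiB)
# Hypothetical: smaller footprint now fits in a low region crossing the DMA zone.
kernel_low = (1 * MiB, 40 * MiB)

before = has_free_block([low_reserved, kernel_high])
after = has_free_block([low_reserved, kernel_low])
print("free 4 MiB DMA block, kernel placed high:", before)
print("free 4 MiB DMA block, kernel placed low: ", after)
```

In this model, keeping the image away from the start of DRAM leaves aligned blocks free in the DMA zone, which matches the intent of the planned fix.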
