Created attachment 305736
DMA pools fail to allocate on boot with 6.7-rc1 and up

With Linux 6.7-rc1 and later, up to current Linux from git (20240121), I'm getting an error on boot: "swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0". The full trace is attached. I don't notice any particular issues if I just ignore it, but I can silence it by setting coherent_pool=1M. I verified in /sys/kernel/debug/dma_pools/ on 6.6.8 and on the 6.7.x kernels that, without the parameter, the pools are set to the same sizes.
(In reply to Tom Englund from comment #0)
Then bisect the kernel to find the exact culprit that introduced your regression. For how to do that, see the LKML thread at [1].

[1]: https://lore.kernel.org/linux-doc/c763e15e-e82e-49f8-a540-d211d18768a3@leemhuis.info/
(In reply to Bagas Sanjaya from comment #1)
I spent some time bisecting, and it ended up being this commit: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=8eace5b3555606e684739bef5bcdfcfe68235257

I also made sure I hadn't bisected wrong by manually compiling the commit before this one and then this commit again, and yes, this is the culprit.
(In reply to Tom Englund from comment #2)
> spent some time bisecting and it ended up being this commit
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=8eace5b3555606e684739bef5bcdfcfe68235257

That's "x86/boot: Omit compression buffer from PE/COFF image memory footprint" from Ard Biesheuvel. I might be mistaken, but I think he is sometimes active here, hence adding him.
From talking to Tom on IRC, it appears that the use of 'nokaslr' causes the kernel to be placed as close to the start of memory as possible. The patch in question reduces the size of that allocation, so the kernel now fits into a smaller free region that overlaps with the GFP_DMA region, leaving no room for the CMA allocation. Dropping 'nokaslr' from the kernel command line works around the problem. I intend to fix this properly by avoiding the start of DRAM entirely when allocating pages from the EFI stub.