Bug 219061
Summary: | Memory leaks on vmalloc crash every 32 bit kernel after a commit in 6.6.24 branch | ||
---|---|---|---|
Product: | Memory Management | Reporter: | makemehappy |
Component: | Other | Assignee: | Andrew Morton (akpm) |
Status: | NEW --- | ||
Severity: | high | CC: | harshit.m.mogalapalli, makemehappy, regressions |
Priority: | P3 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | No | Bisected commit-id: | |
Attachments: |
the log of a filed machine
the config of the failed kernel and machine
This is a various boot of some working and some not working kernels
Log of a Real working 6.6.3 session
attachment-15542-0.html
attachment-17344-0.html
attachment-24185-0.html |
Description
makemehappy
2024-07-19 10:51:43 UTC
Created attachment 306585 [details]
the config of the failed kernel and machine
Created attachment 306587 [details]
This is a various boot of some working and some not working kernels
This is the log of a working session on the same machine with a working kernel (6.6.23); everything is fine: there are no vmalloc errors and memory management works perfectly.
Comment on attachment 306585 [details]
the config of the failed kernel and machine
You can see the repeated vmalloc errors, which in the end slowly kill the session.
Comment on attachment 306584 [details]
the log of a filed machine
This is the .config I normally use on my 32-bit machines. The same config works perfectly with kernels before 6.6.24. The last working kernel is 6.6.23.
Harshit Mogalapalli, please take a look.
Harshit Mogalapalli is not on this bug tracker; please send your findings to LKML and CC harshit.m.mogalapalli@oracle.com
Sorry, how can I send mail to LKML? I already sent an email to Harshit and to all the people involved in the buggy commit, with no luck.
(In reply to Artem S. Tashkinov from comment #6)
> Harshit Mogalapalli is not on this bug tracker; please send your findings to
> LKML and CC harshit.m.mogalapalli@oracle.com
Ok, sent.
Created attachment 306589 [details]
Log of a Real working 6.6.3 session
The other "working" log is in fact a log of multiple boots, with both working and non-working kernels. Before I found out that the buggy kernel was 6.6.4 and everything after it, I built and tested various kernels, so that earlier log contains many sessions, some working and some not. This one is just a working 6.6.3 session.
Comment on attachment 306587 [details]
This is a various boot of some working and some not working kernels
To discover in which kernel version the bug was introduced, I tested everything from 6.6.21 (my starting point) onward to find the last working kernel, and found that it was 6.6.23. This log contains boots and sessions with kernels that have the bug (everything after 6.6.23) and kernels without it (6.6.23 and earlier).
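A multi-boot log like this one can be summarized by message type. The following is a minimal Python sketch, not part of the original report, that counts the `vmap allocation for size N failed` messages and adds up the `alloc_vmap_area: N callbacks suppressed` counters, using the message formats quoted in this report. The function name `summarize_vmap_failures` and the choice of sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# Message formats as they appear in the kernel log lines quoted in this report.
FAIL_RE = re.compile(r"vmap allocation for size (\d+) failed")
SUPPRESSED_RE = re.compile(r"alloc_vmap_area: (\d+) callbacks suppressed")


def summarize_vmap_failures(lines):
    """Return (failures_by_size, suppressed_total) for an iterable of log lines.

    failures_by_size counts the failure messages that were actually logged,
    keyed by the requested size in bytes. suppressed_total sums the counters
    the kernel prints when rate limiting hides further messages.
    """
    by_size = Counter()
    suppressed = 0
    for line in lines:
        match = FAIL_RE.search(line)
        if match:
            by_size[int(match.group(1))] += 1
            continue
        match = SUPPRESSED_RE.search(line)
        if match:
            suppressed += int(match.group(1))
    return by_size, suppressed


# A few lines copied from this report, used here only as sample input.
sample_lines = [
    "Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 24576 failed: use vmalloc=<size> to increase size",
    "Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size",
    "Jul 24 17:04:42 debian1232vm kernel: alloc_vmap_area: 104 callbacks suppressed",
    "Jul 24 17:04:42 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size",
]

by_size, suppressed = summarize_vmap_failures(sample_lines)
for size, count in sorted(by_size.items()):
    print(f"size {size}: {count} logged failure(s)")
print(f"suppressed by rate limiting: {suppressed}")
```

To compare kernels, the same function could be run on the log of each boot separately; a working kernel would be expected to report no failures at all.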
Hi,
Note that commit 9a98ab01e3ac ("platform/x86: hp-bioscfg: Fix error handling in hp_add_other_attributes()") is not in the range 6.6.23..6.6.24; it has been present since 6.6.4, so it cannot be the cause of the regression.
Someone on LKML asked me (I was notified via e-mail) whether I had tried 6.6.40 and 6.10.0 from kernel.org. Yes, I can confirm this bug is present in EVERY kernel from 6.6.24 onward.
Regarding Harshit's commit, I can exclude it as the problem because, as he said, it has been there since version 6.6.4. I'm not a programmer, so to me this bug looks related, as someone pointed out to me, to a vmalloc() call (without a vfree()). Obviously, I confirm I'm interested in testing any future patch for this bug.
(In reply to makemehappy from comment #12)
> Regarding Harshit's commit, I can exclude it as the problem because, as he
> said, it has been there since version 6.6.4.
So 6.6.3 works fine? If it does: could you try reverting the culprit on 6.10 or the latest 6.6.y release and see if this fixes things?
Hi Thorsten. I can confirm kernel 6.6.4 is fine too, and everything after it is fine. The bug comes between 6.6.23 (fine) and 6.6.24. Any kernel after it has the problem. Reverting Harshit's patch with patch -p1 -R did not solve the problem. Reading many commits in various kernels pointed me toward that patch because I think, as I wrote in previous comments, that this is a problem with a vmalloc() call (without a vfree()), which kills the machine very quickly in many of my environments. JUST A GUESS. I'm not a kernel developer! So it could be something else. I hope I have clarified everything.
I have just enough time to push "Save Changes", and I probably can't even reboot the machine, because on this Debian 32-bit VMware virtual guest I start to get browser tab crashes and terminal hangs once I start to get tons of these:
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 24576 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:37 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:42 debian1232vm kernel: alloc_vmap_area: 104 callbacks suppressed
Jul 24 17:04:42 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:42 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:42 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
Jul 24 17:04:42 debian1232vm kernel: vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
(In reply to makemehappy from comment #14)
> The bug comes between 6.6.23 (fine) and 6.6.24. Any kernel after it has the
> problem.
So 6.6.24 is also fine, but 6.6.25 is not? If that's the case: could you bisect?
No, sorry!!! 6.6.23 IS FINE, 6.6.24 IS NOT. So if a bisect has to be done, it has to be done between 6.6.23 FINE and 6.6.24 NOT FINE. Thanks.
To recap:
kernel 6.6.23 is the LAST working kernel (no bug)
kernel 6.6.24 is the FIRST with the bug (not working, YES BUG)
Everything I tried after 6.6.24 HAS the bug, INCLUDING 6.6.41 and 6.10.1. 6.10.1 is the last one tested, just a moment ago while writing this comment; very soon (with 10 Firefox tabs open and two terminals, one running the log) it started to produce vmalloc errors and crashed the machine within minutes (browser tabs crashed, and trying to open a new terminal produced output similar to this: Failed to Open PTY, impossible to allocate memory), and so on. I had to reset the machine (and, BTW, rebooted into kernel 6.6.23 to get it working).
(In reply to makemehappy from comment #16)
> if a bisect has to be done, it has to be done between 6.6.23 FINE and 6.6.24
Would be great if you could take care of that, as I doubt anyone will look into this otherwise, as it could be caused by changes in various subsystems.
Created attachment 306615 [details]
attachment-15542-0.html
The running kernel is a brand new 32-bit 6.10.1 downloaded from kernel.org and compiled.
Created attachment 306616 [details]
attachment-17344-0.html
Hello, the point is simple: I have never done a bisect. I read the documentation about it and it does not seem that clear to me; I'm not sure I can handle it, I don't have spare time, and I'm not a developer. I reported this and spent time documenting it, and I'm also aware that there are not many 32-bit systems around anymore; I myself don't have real machines running a 32-bit OS. Still, this is a nasty bug, not something cosmetic: it crashes machines, and without a fix the kernel is broken. It would be better, in this case, to say that 32-bit no longer interests us and so we drop support.
Regards
MS
Created attachment 306617 [details]
attachment-24185-0.html
PS: I don't know if my messages ON BUGZILLA are also forwarded to this list, or whether I have to reply both there and here.
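For readers unfamiliar with the bisect requested above: a bisect is a binary search over the commits between a known-good and a known-bad version. With the stable tree checked out, the usual sequence is `git bisect start`, `git bisect good v6.6.23`, `git bisect bad v6.6.24`, then building, booting and testing each commit that git checks out and marking it with `git bisect good` or `git bisect bad` until git names the first bad commit. The Python sketch below only illustrates the underlying search and why few test builds are needed; the commit list, its length and the culprit position are hypothetical placeholders, not data from this bug.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (the usual bisect assumption).
    Returns the first bad commit and the number of commits that were tested.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1  # one build + boot + test cycle
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], steps


# Hypothetical history: 99 placeholder commits between the two releases.
commits = ["v6.6.23"] + [f"commit-{i}" for i in range(1, 100)] + ["v6.6.24"]
position = {name: index for index, name in enumerate(commits)}
HYPOTHETICAL_CULPRIT = "commit-57"  # arbitrary choice for the illustration


def shows_vmap_failures(commit):
    # Stand-in for "build this commit, boot it, and watch the log".
    return position[commit] >= position[HYPOTHETICAL_CULPRIT]


culprit, steps = first_bad(commits, shows_vmap_failures)
print(f"first bad commit: {culprit}, found after testing {steps} commits")
```

The number of test cycles grows with the logarithm of the range size, which is why bisecting a single stable point release is usually a matter of a handful of builds rather than one build per commit.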