Greetings, as reported a while back at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1062421 against kernel 6.5 (still present on kernel 6.9.12), iwlegacy oopses on iwl4965 hardware. The bug report contains a lot of auto-collected information. Please ping me if anything else is needed. Thanks! Martin-Éric
Is this reproducible with 6.10.3 or 6.6.44?
This regression was introduced around 6.5 and remains present in releases up to 6.9.12. 6.10.3 is being built on Debian as we speak. I'll know more in a few hours.
Created attachment 306668: kernel oops 6.10.3
Apparently, yes, it still applies to 6.10.3, as per attachment.
I fear no developer will look into this unless you find the change that broke things using a git bisection. Could you perform one? https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
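For readers unfamiliar with the idea, bisection is a binary search over an ordered history. The following is only a minimal Python sketch of that search, not the kernel workflow itself: `first_bad`, the version list, and the `is_bad` predicate are hypothetical stand-ins for "build, boot, and check for the oops". A real bisection runs on commits with `git bisect start`, `git bisect good`, and `git bisect bad`, as the linked guide describes.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest entry for which is_bad() is true.

    Assumptions: versions are ordered oldest to newest, the first entry is
    known good, the last entry is known bad, and once the bug appears it
    stays present in every later entry.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # bug already present: look earlier
        else:
            lo = mid  # still good: look later
    return versions[hi]


# Hypothetical test results, for illustration only.
VERSIONS = ["6.1", "6.2", "6.3", "6.4", "6.5", "6.6"]
ASSUMED_BAD = {"6.5", "6.6"}

culprit = first_bad(VERSIONS, lambda v: v in ASSUMED_BAD)
print(culprit)
```

Each step halves the remaining range, which is why bisecting even a few thousand commits usually needs only a dozen or so test builds.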
Created attachment 306803: possible fix for iwlegacy
Hi Martin-Éric, could you please test whether the attached patch fixes this? You should be able to do that by following the instructions at: https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4 (although that is hosted on Salsa which is having an outage right now).
Created attachment 306826: Trace from patched 6.11.0-rc6 kernel
Jumping in as someone with an HP Compaq 8510w with an Intel 4965 wireless adapter. My testing seems to show the patch doesn't completely fix the issue when applied to kernel-6.11.0-0.rc6.
> Jumping in as someone with an HP Compaq 8510w with an Intel 4965 wireless
> adapter. My testing seems to show the patch doesn't completely fix the issue
> when applied to kernel-6.11.0-0.rc6.
That error message shows that the target of the memcpy() is still "&out_cmd->cmd.payload" and not "payload", so the patch was not actually applied in your build.
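The check described above can be automated when comparing several test builds. This is a minimal Python sketch under stated assumptions: it only looks for the old memcpy() target string quoted in the comment, the helper name `old_target_lines` is made up for illustration, and the sample text is a placeholder rather than real kernel output.

```python
# The memcpy() target named in the warning before the patch, as quoted above.
OLD_TARGET = '"&out_cmd->cmd.payload"'


def old_target_lines(log_text: str) -> list:
    """Return the log lines that still mention the pre-patch memcpy() target."""
    return [line for line in log_text.splitlines() if OLD_TARGET in line]


# Placeholder log text (hypothetical, not copied from any attachment).
UNPATCHED_SAMPLE = 'placeholder warning mentioning "&out_cmd->cmd.payload"\n'
PATCHED_SAMPLE = 'placeholder warning mentioning "payload"\n'

print(len(old_target_lines(UNPATCHED_SAMPLE)))
print(len(old_target_lines(PATCHED_SAMPLE)))
```

Any hit means the running kernel was most likely built without the patch, whatever the build log claimed.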
(In reply to Ben Hutchings from comment #7)
> > Jumping in as someone with an HP Compaq 8510w with an Intel 4965 wireless
> > adapter. My testing seems to show the patch doesn't completely fix the issue
> > when applied to kernel-6.11.0-0.rc6.
>
> That error message shows that the target of the memcpy() is still
> "&out_cmd->cmd.payload" and not "payload", so the patch was not actually
> applied in your build.
So sorry about that! Just got around to testing with the patch _actually_ applied. Not seeing any traces or errors in the journal with the patch applied. I do still see occasional disassociation <-> reassociation events ("Reason: 2=PREV_AUTH_NOT_VALID"). It was reproducible for 3 speedtests in a row, and then suddenly got better. I see the same behavior without the patch applied, so perhaps it's not related.
(In reply to Ben Hutchings from comment #5)
> Created attachment 306803: possible fix for iwlegacy
>
> Hi Martin-Éric, could you please test whether the attached patch fixes this?
>
> You should be able to do that by following the instructions at:
> https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4
> (although that is hosted on Salsa which is having an outage right now).
The 'test-patch' command barfed. It tried configuring for flavor 'pae' instead of the expected '686-pae' flavor. Bug filed against 'devscripts' at Debian.
Created attachment 306849: dmesg with above patch
The patch indeed seems to quiet down the iwl4965 messages, but it introduced the following:
WARNING: CPU: 0 PID: 1 at arch/x86/mm/pti.c:394 pti_clone_pgtable+0x2a1/0x2dc
The trace for this appears in the above dmesg output.
(In reply to Martin-Éric Racine from comment #10)
> Created attachment 306849: dmesg with above patch
>
> The patch indeed seems to quiet down the iwl4965 messages, but it introduced
> the following:
>
> WARNING: CPU: 0 PID: 1 at arch/x86/mm/pti.c:394 pti_clone_pgtable+0x2a1/0x2dc
>
> The trace for this appears in the above dmesg output.
It can't have introduced that warning, because that was emitted before the driver was even loaded.
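The ordering argument above can be checked mechanically from a dmesg capture. This is a rough Python sketch, assuming the default dmesg line format with a bracketed seconds-since-boot timestamp. The helper names are made up for illustration, and the timestamps and the driver line in `SAMPLE` are invented rather than taken from the attachment.

```python
import re
from typing import Optional

# Default dmesg layout: "[    1.234567] message"
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_timestamp(dmesg_text: str, needle: str) -> Optional[float]:
    """Return the timestamp of the first line containing needle, if any."""
    for line in dmesg_text.splitlines():
        match = TS_RE.match(line)
        if match and needle in match.group(2):
            return float(match.group(1))
    return None


def emitted_before(dmesg_text: str, warning: str, driver: str) -> bool:
    """True if the warning appears before the driver's first message."""
    w = first_timestamp(dmesg_text, warning)
    d = first_timestamp(dmesg_text, driver)
    return w is not None and (d is None or w < d)


# Hypothetical excerpt: timestamps and the iwl4965 line are invented.
SAMPLE = """\
[    0.500000] WARNING: CPU: 0 PID: 1 at arch/x86/mm/pti.c:394 pti_clone_pgtable+0x2a1/0x2dc
[   12.000000] iwl4965: hypothetical first driver message
"""

print(emitted_before(SAMPLE, "pti_clone_pgtable", "iwl4965"))
```

A warning that precedes the driver's first log line cannot have been caused by a change confined to that driver.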
(In reply to Ben Hutchings from comment #11)
> (In reply to Martin-Éric Racine from comment #10)
> > Created attachment 306849: dmesg with above patch
> >
> > The patch indeed seems to quiet down the iwl4965 messages, but it introduced
> > the following:
> >
> > WARNING: CPU: 0 PID: 1 at arch/x86/mm/pti.c:394 pti_clone_pgtable+0x2a1/0x2dc
> >
> > The trace for this appears in the above dmesg output.
>
> It can't have introduced that warning, because that was emitted before the
> driver was even loaded.
It's indeed extremely unlikely to have introduced it. Anyhow, at this point, with two people having tested it on real hardware, I think that the patch effectively fixes the iwl4965 kernel oops. As to what causes the above PTI warnings, that's a separate issue.
PS: I welcome pointers on which module should get the bug report about the above PTI oops.
(In reply to Martin-Éric Racine from comment #13)
> PS: I welcome pointers on which module should get the bug report about the
> above PTI oops.
That was already reported in <https://lore.kernel.org/all/e541b49b-9cc2-47bb-b283-2de70ae3a359@roeck-us.net/>. The fix went into 6.11-rc3 but is not in 6.10-stable yet.
(In reply to Ben Hutchings from comment #14)
> (In reply to Martin-Éric Racine from comment #13)
> > PS: I welcome pointers on which module should get the bug report about the
> > above PTI oops.
>
> That was already reported in
> <https://lore.kernel.org/all/e541b49b-9cc2-47bb-b283-2de70ae3a359@roeck-us.net/>.
> The fix went into 6.11-rc3 but is not in 6.10-stable yet.
Actually it's in 6.10.10.
The PTI issue seems to be fixed in Debian 6.10.11-1, but the iwl4965 issue isn't.
Created attachment 306913: dmesg 6.10.11-686-pae
Created attachment 306946: dmesg 6.10.12
Still not fixed as of Debian 6.10.12-1.
Created attachment 307008: dmesg 6.11.3
Not fixed as of 6.11.3.
I really have to wonder whether the fix ever got merged at all. I still get the same kernel oops with 6.10.12 and 6.11.3 as before.
(In reply to Martin-Éric Racine from comment #20) > I really have to wonder whether the fix ever got merged at all. I still get > the same kernel oops with 6.10.12 and 6.11.3 as before. I don't believe it did. I still see the issue with 6.11.4.
The fix has been merged and will be included in v6.12-rc6. It's also under review for inclusion on the 6.1, 6.6, and 6.11 stable branches.
Merged into 6.11.7. Kernel oops is gone. Thanks. However, 6.11.7 also merged plenty of related "fixes" as a result of which the connection is now somewhat slow and unstable, but that's for a different bug report. It however should be noted that this driver worked fine until about kernel 6.4, when someone decided that refactoring a bunch of code took precedence over the old adage "if it ain't broken, don't fix it." Given this, I really have to urge caution when pondering the necessity of backporting changes to kernel 6.1, which is the only one currently giving tried and proven, reliable WiFi on this chipset.
Unless I'm mistaken, this has now been merged into every possible stable release. We can probably close this?