Bug 217635
Summary: | iwlwifi driver broken on Intel 3165 network card | ||
---|---|---|---|
Product: | Drivers | Reporter: | joey.joey586 |
Component: | network-wireless-intel | Assignee: | Default virtual assignee for network-wireless-intel (drivers_network-wireless-intel) |
Status: | RESOLVED CODE_FIX | ||
Severity: | high | CC: | bagasdotme, hkallweit1, regressions |
Priority: | P3 | ||
Hardware: | Intel | ||
OS: | Linux | ||
Kernel Version: | Subsystem: | ||
Regression: | Yes | Bisected commit-id: | |
Attachments: |
dmesg logs
git bisect log
second try of git bisect |
Description
joey.joey586
2023-07-05 16:26:00 UTC
(In reply to joey.joey586 from comment #0)
> Created attachment 304552 [details]
> dmesg logs
>
> Distro: Arch Linux
> Kernel version: 6.4.1.arch1-1
> Happens on mainline kernel? : YES (linux-mainline 6.4-1)
> Note: linux-mainline 6.4.1 is not available at time of this writing
>
> Arch linux bug:
> https://bugs.archlinux.org/task/78984
>
> Summary:
> No network access even after connecting to wifi. Websites don't load, ping
> doesn't work.
> This didn't happen on kernel 6.3.x (specifically 6.3.9, the last 6.3 kernel
> provided by Arch).
>
> Bug happens on both Arch-provided kernel and mainline kernel
>
> Steps to reproduce:
> 1) On fresh boot, connect to a wifi network
>    a) Make sure wifi password is not saved beforehand
> 2) Ping a url with terminal, or open a website with browser
> 3) Ping fails to work / website doesn't load

Can you perform bisection between v6.3 and v6.4?

I guess this is related to similar issue reported on LKML [1].

[1]: https://lore.kernel.org/lkml/CAAJw_ZueYAHQtM++4259TXcxQ_btcRQKiX93u85WEs2b2p19wA@mail.gmail.com/

(In reply to Bagas Sanjaya from comment #2)
> I guess this is related to similar issue reported on LKML [1].

FWIW, that is about mainline, this is about 6.4 -- and the problem looks different as well. So I doubt somewhat that these are the same problems. A bisection would be really helpful.

Bisecting now, might take a while

(In reply to joey.joey586 from comment #4)
> Bisecting now, might take a while

Sorry, I'm unable to bisect. After running 'git bisect bad' once, the kernel fails to build with error:

make[5]: *** No rule to make target 'zip.h', needed by '/home/poweruser/Downloads/linux-git/src/linux-torvalds/tools/bpf/resolve_btfids/libbpf/staticobjs/libbpf.o'. Stop.
make[4]: *** [Makefile:157: /home/poweruser/Downloads/linux-git/src/linux-torvalds/tools/bpf/resolve_btfids/libbpf/staticobjs/libbpf-in.o] Error 2
make[3]: *** [Makefile:63: /home/poweruser/Downloads/linux-git/src/linux-torvalds/tools/bpf/resolve_btfids//libbpf/libbpf.a] Error 2
make[2]: *** [Makefile:76: bpf/resolve_btfids] Error 2
make[1]: *** [Makefile:1440: tools/bpf/resolve_btfids] Error 2
make[1]: *** Waiting for unfinished jobs....
CALL scripts/checksyscalls.sh
make: *** [Makefile:358: __build_one_by_one] Error 2
==> ERROR: A failure occurred in build().
    Aborting...
makepkg -efs 7.42s user 2.72s system 114% cpu 8.843 total

Same error happens on every git bisect bad

I have zero experience with kernel development/building, so I have no idea what to do.

(In reply to joey.joey586 from comment #5)
> (In reply to joey.joey586 from comment #4)
>
> Sorry, I'm unable to bisect. After running 'git bisect bad' once, the kernel
> fails to build

That's not the kernel, that's the kernel tools; you don't need those to run a kernel.

> with error:
>
> make[5]: *** No rule to make target 'zip.h', needed by

You likely need a package called libzip-devel (or something like that -- whatever provides zip.h on your distro).

libzip provides zip.h, so why is it complaining about this error?

Here's my terminal output:

joey@joey ~ % pacman -Qo /usr/include/zip.h
/usr/include/zip.h is owned by libzip 1.10.0-1

From a quick search it seems there was a bug:
https://lore.kernel.org/all/ZFJ39HKzBUg64QPO@kernel.org/

But again: you don't need to build the tools, just build the kernel

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #8)
> From a quick search it seems there was a bug:
> https://lore.kernel.org/all/ZFJ39HKzBUg64QPO@kernel.org/
>
> But again: you don't need to build the tools, just build the kernel

How do I do that?
I tried replacing 'make all' with 'make vmlinux' (https://www.kernel.org/doc/makehelp.txt), but it still complains about zip.h

I'm using the Arch PKGBUILD for linux-git here:
https://aur.archlinux.org/packages/linux-git

(In reply to joey.joey586 from comment #9)
> but it still complains about zip.h

Guess that tool then is needed during build. Apologies.

Did a quick look, sadly could not find a fix for this. Try "git bisect skip", with a bit of luck it will avoid the problematic area

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #10)
> (In reply to joey.joey586 from comment #9)
> > but it still complains about zip.h
> Did a quick look, sadly could not find a fix for this. Try "git bisect
> skip", with a bit of luck it will avoid the problematic area

"git bisect skip" works, thanks!

And I think I found the bad commit:
[bd54f3c29077f23dad92ef82a78061b40be30c65] wifi: mac80211: generate EMA beacons in AP mode

Here's my terminal log:

joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[master] % git bisect start
status: waiting for both good and bad commits
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[master|bisect] % git bisect good v6.3
status: waiting for bad commit, 1 good commit known
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[master|bisect] % git bisect bad v6.4
Bisecting: 8012 revisions left to test after this (roughly 13 steps)
[d42b1c47570eb2ed818dc3fe94b2678124af109d] Merge tag 'devicetree-for-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
git bisect bad v6.4 4.53s user 1.23s system 99% cpu 5.801 total
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[v6.4-rc1~128|bisect] % git bisect skip
Bisecting: 8012 revisions left to test after this (roughly 13 steps)
[1423885c84a5b3a53b79bcf241b18124d0d7cba6] cxl/hdm: Use 4-byte reads to retrieve HDM decoder base+limit
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[v6.4-rc1~68^2~2^2~3|bisect] % git bisect skip
Bisecting: 8012 revisions left to test after this (roughly 13 steps)
[2124f79de6a909630d1a62b01ecc32db9f967181] mm: shrinkers: fix debugfs file permissions
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[v6.4-rc1~103^2~12|bisect] % git bisect skip
Bisecting: 8012 revisions left to test after this (roughly 13 steps)
[c9fa320b00cff04980b8514d497068e59a8ee131] xfrm: copy_to_user_state fetch offloaded SA packets/bytes statistics
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[v6.4-rc1~132^2~231^2~4|bisect] % git bisect skip
Bisecting: 8012 revisions left to test after this (roughly 13 steps)
[bd54f3c29077f23dad92ef82a78061b40be30c65] wifi: mac80211: generate EMA beacons in AP mode
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[v6.4-rc1~132^2~151^2~82|bisect] %

After that last "git bisect skip" the kernel compiles successfully and the wifi stopped crashing.

I'm missing something here; you referred to the last "git bisect skip" which afaics is
> rc1~132^2~231^2~4|bisect] % git bisect skip
> Bisecting: 8012 revisions left to test after this (roughly 13 steps)
> [bd54f3c29077f23dad92ef82a78061b40be30c65] wifi: mac80211: generate EMA
> beacons in AP mode
Which sounds like you need to mark bd54f3c29077f23dad92ef82a78061b40be30c65 as bad and continue.
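
For readers following along, the point above can be illustrated with a small model: a skip gives the bisection no information, so the "revisions left" count stays put (8012 in the log above), and the commit printed after a skip is only the next candidate to test, not a result. The sketch below is a simplified toy over a linear history, not git's actual algorithm (real `git bisect` also handles merges and chooses skip alternatives differently); the commit numbers, `FIRST_BAD` and `UNBUILDABLE` are hypothetical.

```python
def bisect(n_commits, verdict):
    """Toy bisect over a linear history of commits 0 .. n_commits-1.

    Commit 0 is known good and the last commit is known bad.
    verdict(c) returns "good", "bad" or "skip" for commit c.
    Returns (last_good, first_bad, log); each log entry is
    (commit_tested, verdict, commits_left_between_good_and_bad).
    """
    good, bad = 0, n_commits - 1
    skipped = set()
    log = []
    while bad - good > 1:
        mid = (good + bad) // 2
        # Unskipped commits strictly between the bounds, nearest the midpoint first.
        candidates = sorted(
            (c for c in range(good + 1, bad) if c not in skipped),
            key=lambda c: abs(c - mid),
        )
        if not candidates:
            break  # only skipped commits remain; the range cannot shrink further
        commit = candidates[0]
        result = verdict(commit)
        if result == "skip":
            skipped.add(commit)  # no information gained, bounds unchanged
        elif result == "good":
            good = commit        # culprit lies after this commit
        else:
            bad = commit         # culprit is this commit or an earlier one
        log.append((commit, result, bad - good - 1))
    return good, bad, log


FIRST_BAD = 11        # hypothetical culprit
UNBUILDABLE = {7, 8}  # hypothetical commits that fail to build


def verdict(c):
    if c in UNBUILDABLE:
        return "skip"
    return "good" if c < FIRST_BAD else "bad"


last_good, first_bad, log = bisect(16, verdict)
for entry in log:
    print(entry)
```

In this toy run the first tested commit is skipped and the remaining count does not change; it only shrinks once a commit that builds is marked good or bad.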
Created attachment 304570 [details]
git bisect log
log of the git bisect
(In reply to joey.joey586 from comment #13)
> log of the git bisect

thx for this, sorry, looked a bit odd earlier from here.

Forwarded the report to the developers:
https://lore.kernel.org/all/6f8715af-95c2-8333-2b32-206a143ebb52@leemhuis.info/

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #14)
> Forwarded the report to the developers:
> https://lore.kernel.org/all/6f8715af-95c2-8333-2b32-206a143ebb52@leemhuis.info/

Thanks, I appreciate it.

Could you please recheck your bisection? Johannes doubts it was correct:
https://lore.kernel.org/all/047c7bdc8057175f2bb78981a5f1a1aa6b493153.camel@sipsolutions.net/

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #16)
> Could you please recheck your bisection? Johannes doubts it was correct:

Alright, I'll redo the bisect.

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #16)
> Could you please recheck your bisection? Johannes doubts it was correct:

Redid the bisection, got a different result:

joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[v6.4-rc1~132^2~254|bisect] % git bisect good
5fc3f6c90cca19e4b13433621d9c2dcae875f4d7 is the first bad commit
commit 5fc3f6c90cca19e4b13433621d9c2dcae875f4d7
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Mar 18 22:50:10 2023 +0100

    r8169: consolidate disabling ASPM before EPHY access

    Now that rtl_hw_aspm_clkreq_enable() is a no-op for chip versions < 32,
    we can consolidate disabling ASPM before EPHY access in rtl_hw_start().

    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

 drivers/net/ethernet/realtek/r8169_main.c | 42 +++---------------------------
 1 file changed, 3 insertions(+), 39 deletions(-)
joey@joey ~/Desktop/linux-git/src/linux-torvalds (git)-[bisect/good-c3892e8c51d27f73341eab042afa147a7ca2b966|bisect] %

Created attachment 304610 [details]
second try of git bisect
full 'git bisect' log
r8169? that's somewhat odd as well, but who knows. Could you try to revert it on top of a kernel version you know is affected (e.g. 6.4 or 6.4.1) to verify this result?

And a shot in the dark: does blacklisting the driver change anything?

How can I do that? Sorry, I'm not familiar with git.

and I use localmodconfig to build the kernel, so r8169 driver might not even exist in the kernel

(In reply to joey.joey586 from comment #21)
> How can I do that? Sorry, I'm not familiar with git.

git checkout --detach v6.4
git revert 5fc3f6c90cca19e4b13433621d9c2dcae875f4d7 --no-edit
[build again]

(In reply to joey.joey586 from comment #22)
> and I use localmodconfig to build the kernel, so r8169 driver might not even
> exist in the kernel

In an earlier dmesg it was loaded iirc

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #20)
> And a shot in the dark: does blacklisting the driver change anything?

Blacklisting r8169 fixes the issue.

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #23)
> In an earlier dmesg it was loaded iirc

You're right, it is loaded. I didn't realize my ethernet is a Realtek. I apologize.
I'll rebuild the kernel later tonight.

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #23)
> (In reply to joey.joey586 from comment #21)
> > How can I do that? Sorry, I'm not familiar with git.
>
> git checkout --detach v6.4
> git revert 5fc3f6c90cca19e4b13433621d9c2dcae875f4d7 --no-edit
> [build again]

Reverting 5fc3f6c90cca19e4b13433621d9c2dcae875f4d7 fixes the issue!
thx for confirming, told relevant people by mail (see link above)

Please test whether the following fixes the issue:

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9445f04f8..2b3aa6b45 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2747,6 +2747,13 @@ static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
 		return;
 
 	if (enable) {
+		/* On these chip versions ASPM can harm even other
+		 * PCI devices.
+		 */
+		if (tp->mac_version == RTL_GIGA_MAC_VER_42 ||
+		    tp->mac_version == RTL_GIGA_MAC_VER_43)
+			return;
+
 		rtl_mod_config5(tp, 0, ASPM_en);
 		rtl_mod_config2(tp, 0, ClkReqEn);
 
-- 
2.41.0

(In reply to Heiner Kallweit from comment #27)
> Please test whether the following fixes the issue:

Thanks, the patch fixes the issue. :)

(In reply to Heiner Kallweit from comment #27)
> Please test whether the following fixes the issue:

Thx for this.

> + /* On these chip versions ASPM can harm even other
> + * PCI devices.

The comment makes me wonder: might this also fix or somehow be related to other ASPM related regression reports with r8169 that as of now are unfixed afaik? I mean these:

https://lore.kernel.org/all/9ebb43ee-52a1-c77d-d609-ca447a32f3e6@posteo.at/
https://lore.kernel.org/all/c3465166-f04d-fcf5-d284-57357abb3f99@freenet.de/

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #29)
> (In reply to Heiner Kallweit from comment #27)
> > Please test whether the following fixes the issue:
>
> Thx for this.
>
> > + /* On these chip versions ASPM can harm even other
> > + * PCI devices.
>
> The comment makes me wonder: might this also fix or somehow be related to
> other ASPM related regression reports with r8169 that as of now are unfixed
> afaik? I mean these:
>
> https://lore.kernel.org/all/9ebb43ee-52a1-c77d-d609-ca447a32f3e6@posteo.at/
> https://lore.kernel.org/all/c3465166-f04d-fcf5-d284-57357abb3f99@freenet.de/

It's unrelated IMO. Chip versions 42 + 43 have the same MAC, and letting this MAC version trigger a transition to a deeper ASPM state apparently can disturb the root port in a way that even communication with other PCIe devices is affected. The logic in the fix here has been there before and simply was accidentally removed by "r8169: consolidate disabling ASPM before EPHY access".

The other reports refer to chip version 49 (RTL8168h). This chip version runs fine with ASPM up to L1.1. Interestingly these reports so far are only about systems where BIOS instructs the OS not to touch ASPM settings. This needs some more analysis.

(In reply to Heiner Kallweit from comment #30)
> It's unrelated IMO. […]

Many thx for the assessment, much appreciated.

Is the fix already in the mainline kernel? I can't find it in the changelogs.

(In reply to joey.joey586 from comment #32)
> Is the fix already in the mainline kernel? I can't find it in the changelogs.

Here it is:
https://git.kernel.org/torvalds/c/162d626f3013215b82b6514ca14f20932c7ccce5

Thanks. I probably should clarify a bit.
By 'mainline' I mean the kernel.org website.
What I'm trying to say is: the fix is not present in the 6.4.5 kernel based on the changelogs for that kernel in the kernel.org website.

When will it be added to 6.4.x ?

(In reply to joey.joey586 from comment #34)
> By 'mainline' I mean the kernel.org website.

The term "mainline" normally means "Linus git" tree.

(In reply to joey.joey586 from comment #35)
> When will it be added to 6.4.x ?

Just checked, it is now queued for the next release of that series.

(In reply to The Linux kernel's regression tracker (Thorsten Leemhuis) from comment #36)
> The term "mainline" normally means "Linus git" tree.

Ah, I see.

> Just checked, it is now queued for the next release of that series.

and thanks for the info.

Fixed with kernel v6.4.7