Bug 196355 - iwlwifi 8260: after update to 4.12, dhcpcd cannot get IP
Summary: iwlwifi 8260: after update to 4.12, dhcpcd cannot get IP
Status: CLOSED CODE_FIX
Alias: None
Product: Networking
Classification: Unclassified
Component: Other (show other bugs)
Hardware: x86-64 Linux
: P1 normal
Assignee: David Ahern
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-13 10:26 UTC by Yaroslav Isakov
Modified: 2019-03-03 06:40 UTC (History)
2 users (show)

See Also:
Kernel Version: 4.12.0
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
syslog before and after suspend (9.71 KB, text/plain)
2017-07-13 14:36 UTC, Yaroslav Isakov
Details
New error (12.64 KB, text/plain)
2017-07-13 20:20 UTC, Yaroslav Isakov
Details
dhcpcd log from 4.11 (1.41 KB, text/plain)
2017-07-19 12:34 UTC, Yaroslav Isakov
Details
dhcpcd log from 4.12 (1.94 KB, text/plain)
2017-07-19 12:35 UTC, Yaroslav Isakov
Details

Description Yaroslav Isakov 2017-07-13 10:26:16 UTC
Hello! So, after boot to kernel 4.12.0, I found that my Intel 8260 WiFi cannot get IP address. I've restarted dhcpcd manually, and it worked, but after suspend-resume cycle, problem reappears - no address. I've rebooted - same problem. After reboot to 4.11.6 - no such problem, dhcpcd can get IP address.

My hardware is Thinkpad T460s
Comment 1 Yaroslav Isakov 2017-07-13 10:28:14 UTC
Here is a dmesg:
[    1.572417] iwlwifi 0000:04:00.0: enabling device (0000 -> 0002)
[    1.579121] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-30.ucode failed with error -2
[    1.579142] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-29.ucode failed with error -2
[    1.579157] iwlwifi 0000:04:00.0: Direct firmware load for iwlwifi-8000C-28.ucode failed with error -2
[    1.591271] iwlwifi 0000:04:00.0: capa flags index 3 larger than supported by driver
[    1.591746] iwlwifi 0000:04:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm
[    1.611860] iwlwifi 0000:04:00.0: Detected Intel(R) Dual Band Wireless AC 8260, REV=0x208
[    1.613747] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[    1.614840] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[    1.760741] iwlwifi 0000:04:00.0 wlp4s0: renamed from wlan0
[    4.004068] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[    4.005272] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[    4.149677] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
[    4.151074] iwlwifi 0000:04:00.0: L1 Enabled - LTR Enabled
Comment 2 Emmanuel Grumbach 2017-07-13 14:17:03 UTC
Hi, can we have the syslog?

I can't see why dhcpd works only after being restarted...
Comment 3 Yaroslav Isakov 2017-07-13 14:36:37 UTC
Created attachment 257477 [details]
syslog before and after suspend
Comment 4 Yaroslav Isakov 2017-07-13 14:38:09 UTC
Attached log file. I've used static IP address now, and the bug is worse - traffic is not working at all. I do not understand how dhcpcd is fixing it, but starting it afterwards fixed traffic again.
Comment 5 Emmanuel Grumbach 2017-07-13 14:53:00 UTC
I can't see anything related to the dhcp ok your syslog. Weird.

I'll ask tracing with -e iwlwifi -e iwlwifi_data.

I have to inform you that this will show ask the data in plaintext.
Comment 6 Yaroslav Isakov 2017-07-13 14:54:04 UTC
Yeah, in this log I'm using static IP, and the traffic is not working. Trying now udhcpc client
Comment 7 Emmanuel Grumbach 2017-07-13 15:10:13 UTC
Can you try also with power_scheme=1 passed as a module parameter to iwlmvm?

Please add this as an option in iwlwifi.conf.
Comment 8 Yaroslav Isakov 2017-07-13 15:12:09 UTC
I found the cause - it's preassoc_mac_addr=2 in wpa_supplicant.conf. When it's disabled - no problem
Comment 9 Yaroslav Isakov 2017-07-13 15:14:36 UTC
I'm not sure where this iwlwifi.conf should be placed - I'm on Gentoo, and there is no such flag. Can I just pass iwlmvm.power_scheme=1 on kernel cmdline?
Comment 10 Yaroslav Isakov 2017-07-13 15:21:16 UTC
Tried to reboot with iwlmvm.power_scheme=1 and preassoc_mac_addr=2 - same result, no address. So it's definitely a cause. But it worked in 4.11...
Comment 11 Emmanuel Grumbach 2017-07-13 15:26:21 UTC
Please check the firmware version in both kernel. We print the firmware version when iwlwifi loads.
Comment 12 Yaroslav Isakov 2017-07-13 15:40:37 UTC
They're the same:
yar@thinkpad ~ $ sudo journalctl -b | egrep "(Linux version|iwlwifi.*firmware version)"
июл 13 18:36:15 thinkpad.lan kernel: Linux version 4.12.0-gentoo (root@thinkpad.lan) (gcc version 5.4.0 (Gentoo 5.4.0-r3 p1.3, pie-0.6.5) ) #6 SMP PREEMPT Thu Jul 13 02:48:16 MSK 2017
июл 13 18:36:16 thinkpad.lan kernel: iwlwifi 0000:04:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm
yar@thinkpad ~ $ sudo journalctl -b -1 | egrep "(Linux version|iwlwifi.*firmware version)"
июл 13 18:35:24 thinkpad.lan kernel: Linux version 4.11.6-gentoo (root@thinkpad.lan) (gcc version 5.4.0 (Gentoo 5.4.0 p1.0, pie-0.6.5) ) #1 SMP PREEMPT Fri Jun 23 18:40:50 MSK 2017
июл 13 18:35:25 thinkpad.lan kernel: iwlwifi 0000:04:00.0: loaded firmware version 27.455470.0 op_mode iwlmvm
Comment 13 Emmanuel Grumbach 2017-07-13 15:48:14 UTC
All right, I'll take a look at the diff in the driver a bit later.
Comment 14 Emmanuel Grumbach 2017-07-13 17:56:09 UTC
I can't have access to my source code right now... It'll have to wait until Sunday...

What you can do in the meantime is to try our backport tree on 4.11 (it won't work on 4.12) and check what happens.
If you can repro the bug there, then it'll super easy to bisect.

The tree is here:
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/
Comment 15 Yaroslav Isakov 2017-07-13 20:20:40 UTC
Created attachment 257479 [details]
New error

I've tried to use 4.11.6 with iwlwifi-backport (master branch), but found another error
Comment 16 Emmanuel Grumbach 2017-07-16 18:21:47 UTC
Ok - that's embarrassing :)

This seems to be happening after another firmware crash, no?

iwlwifi 0000:04:00.0: Firmware error during reconfiguration - reprobe!

This happens when we have a firmware crash while recovering from another crash.
I guess I can fix this crash, but I'd prefer to see the first problem first.

Does this happen all the time?
Comment 17 Emmanuel Grumbach 2017-07-17 11:26:28 UTC
FWIW - I was able to use -27.ucode on the master branch.

But I had to load iwlwifi with led_mode=3

Can you try this and bisect?

thanks!
Comment 18 Yaroslav Isakov 2017-07-17 16:41:35 UTC
Emmanuel, you were right about error - and iwlwifi.led_mode=3 fixed this problem, but I was not able to reproduce original problem - with preassoc_mac_addr=2 everything works properly. I'm not sure how to bisect this... At least in 4.11 it works with and without backport (and in 4.12 it doesn't work, checked this again)
Comment 19 Emmanuel Grumbach 2017-07-17 17:23:07 UTC
The preassoc_mac_addr thing makes no sense to me.
It is supposed to be impacting scanning only and in your case, I can see that you succeed to connect, to the preassoc_mac_addr parameter shouldn't be impacting anything.

I'll take it with my wpa_s colleagues.

So, I understand that:

4.11 / preassoc_mac_addr=0: works
4.11 / preassoc_mac_addr=2: works
4.12 / preassoc_mac_addr=0: does NOT work
4.12 / preassoc_mac_addr=2: works


Am I correct?
Comment 20 Yaroslav Isakov 2017-07-17 17:29:47 UTC
for 4.11 - right. For 4.12 - preassoc_mac_addr=2 - problem, preassoc_mac_addr commented (I think that by default it's 0) - works.

My guess is that it happens like that:
1) wpa_supplicant is programming random mac, which is used for scanning
2) on connect, wpa_supplicant is trying to set MAC, but firmware and/or driver are still keeping old random MAC
3) But dhcpcd and wlan interface are using real mac - so packets are just not returning. Or maybe firmware is setting some kind of filter...

I'll try to do bisect on vanilla sources - 14k commits, but it's probably 14 compiles and reboots...
Comment 21 Emmanuel Grumbach 2017-07-17 17:33:02 UTC
Ok - this starts to be understandable :)
The fact that it works with 4.11 and backport means that our latest driver works.... or that the bug is elsewhere...
The best way to debug this is just to leave tracing open during suspend resume.

We'd need the data as well so:

sudo trace-cmd record -e iwlwifi -e iwlwifi_data -e mac80211 -e cfg80211

*beware* this will include tons of private data.

Our privacy notice is here:
https://wireless.wiki.kernel.org/en/users/drivers/iwlwifi/debugging#privacy_aspects
Comment 22 Yaroslav Isakov 2017-07-17 19:03:41 UTC
It's not quite reproducible on suspend-resume, but on boot I can reproduce it 100%. So I'm not quite sure that there is a way to do tracing. I'll try to do this bisecting.
Comment 23 Yaroslav Isakov 2017-07-17 21:37:38 UTC
So, I was wrong about iwlwifi - sorry!
Here is a commit which broke my wifi - https://github.com/torvalds/linux/commit/cd8966e75ed3c6b41a37047a904617bc44fa481f
rtnetlink: Do not generate notifications for CHANGEADDR event
Comment 24 Emmanuel Grumbach 2017-07-18 03:49:09 UTC
Thanks!

I am moving the bug to the right component. We'll still be CC'ed.
Comment 25 David Ahern 2017-07-18 15:28:31 UTC
Which code base processes the rtnetlink notifications? I'd like to understand why the redundant notification is needed.
Comment 26 Yaroslav Isakov 2017-07-18 23:05:01 UTC
I don;t quite understand the question. I'm using wpa_supplicant-2.6 with kernel 4.12
Comment 27 David Ahern 2017-07-18 23:08:13 UTC
Can you enable debugging for wpa_supplicant and collect the logs for 4.11.6 and 4.12? The revert is easy to do, but I want to understand why wpa_supplicant is not responding to the first rtnetlink message. Hoping that the debug logs will shed some light. Thanks.
Comment 28 Yaroslav Isakov 2017-07-19 12:31:11 UTC
So, I was probably wrong about wpa_supplicant - it looks like on 4.11 dhcpcd is getting two events on mac change, and on 4.12 - only one. But with wpa_supplicant option preassoc_mac_addr=2, MAC is changing two times - first to random (for scan) and back to real one. I'm attaching dhcpcd logs
Comment 29 Yaroslav Isakov 2017-07-19 12:34:37 UTC
Created attachment 257605 [details]
dhcpcd log from 4.11

two mac changes - correct
Comment 30 Yaroslav Isakov 2017-07-19 12:35:24 UTC
Created attachment 257607 [details]
dhcpcd log from 4.12

only one mac change - wrong
Comment 31 David Ahern 2017-07-19 17:24:11 UTC
I see why it breaks. thanks for the logs. patch sent
Comment 32 Emmanuel Grumbach 2017-07-19 18:05:29 UTC
Can you attach it here for educational purposes? :)
Comment 33 David Ahern 2017-07-19 18:10:26 UTC
The patch is just a revert of the change. The explanation of why:

"The duplicate CHANGEADDR event message is sent regardless of link
status whereas the setlink changes only generate a notification when
the link is up. Not sending a notification when the link is down breaks
dhcpcd which only processes hwaddr changes when the link is down."
Comment 34 Emmanuel Grumbach 2017-07-19 18:13:44 UTC
Thanks. Want to close the bug?
Comment 35 David Ahern 2017-07-19 18:16:07 UTC
Waiting for the patch to be picked up.

Note You need to log in before you can comment on or make changes to this bug.