Created attachment 273301
I'm using kernel 4.14.8 and linux-firmware-20171206 with an Intel Wi-Fi 7265D card. I've read that there are (or were?) some issues with kernel 4.14 and iwlwifi (mostly with the 8260), but nothing I've read there has solved my problem. If this is a duplicate, I'm sorry for wasting your time.
I can reproduce this bug with:
- Kernel 4.14.8
- linux-firmware 20171206
- iwlwifi 7265D with firmware version 29
- Connected to a 5 GHz Wi-Fi network (I don't know if that's related, and I haven't tried 2.4 GHz; I can if you need me to)
- Transferring at full speed (about 40 MB/s) for a while (less than 5 minutes), for example an rsync within my LAN
I'm attaching my dmesg and my kernel config. Feel free to ask anything you need.
Created attachment 273303
One more thing: I upgraded from kernel 4.12.12, which had no issues (same config, just updated with make oldconfig).
The same happens with kernel 4.15.0-rc5.
Created attachment 273321
dmesg for kernel 4.15
Created attachment 273323
With kernel 4.12 and firmware 29 I don't get the same speeds as with >= 4.14 and firmware 29 (28 MB/s on 4.12 vs. 44 MB/s on 4.14/4.15), but I don't get any errors.
Hardware errors like the one shown in the first report:
[ 86.216784] iwlwifi 0000:02:00.0: Hardware error detected. Restarting.
aren't fun to debug.
I can't think of anything that would improve the bandwidth either.
You run the same firmware in both configurations, and the 7265D is a fairly old device, so we don't really add new features for it.
What could help here is to run 4.12 with our backport tree installed. That way you'll have our latest driver on top of 4.12.
This will tell us if the difference comes from the kernel (PCI / Net) or from iwlwifi itself.
Our backport tree is also much more convenient to bisect.
The backport tree is here:
BTW: are you connected to an AC network?
Can you share the output of iw wlp2s0 link?
Created attachment 273325
dmesg with the backport iwlwifi
Very similar behavior, but slightly different.
I reached 44 MB/s again (more often 42 than 44, but still a noticeable difference). When the error happened, I wasn't disconnected from the Wi-Fi; instead, the transfer rate dropped to 5 MB/s.
Yes, I am connected to an AC Network. Here's the output you requested:
Connected to 80:2a:a8:c8:79:e4 (on wlp2s0)
RX: 26323473 bytes (105204 packets)
TX: 1965211233 bytes (1276643 packets)
signal: -53 dBm
tx bitrate: 866.7 MBit/s VHT-MCS 9 80MHz short GI VHT-NSS 2
bss flags: short-slot-time
dtim period: 1
beacon int: 100
Not exactly the same: now you have
iwlwifi 0000:02:00.0: HCMD skipped: index (199) 199 198
Can you please record tracing?
I am afraid I'll have to ask you to try to go back in the backport tree and find a commit that works.
You can try commit 335a37f70b86095794a048b34b70309117bf113b there.
I just hope you'll be able to compile that old commit against 4.12. Sometimes very old backport commits don't work on newer kernels, and 4.12 wasn't really out yet when we merged 335a37f70b86095794a048b34b70309117bf113b.
I'm unable to set CONFIG_IWLWIFI_TRACING. Which option do I need to enable to make it available?
Anyway, I cannot compile that commit with kernel 4.12.
I found that commit cebfea4918852b2cf9c5eec2f0a60a4280090e65 compiles with 4.12.
Let's hope that on this commit, the bug doesn't reproduce and then we'll be able to bisect.
Hi, with that commit I've sent more than 20 GB with no issues (usually the bug appeared after about 2 GB). In dmesg the only thing I see is:
[ 325.157294] userif-2: sent link down event.
[ 325.157296] userif-2: sent link up event.
but there was no reconnection (at least NetworkManager didn't show one). I'm sending the file at 40 MB/s and everything is perfect.
So, what is the next step?
Then... we can start git bisect.
git bisect start
git bisect good cebfea4918852b2cf9c5eec2f0a60a4280090e65
git bisect bad origin/master
Then for each iteration you can either say:
git bisect bad
git bisect good
After a few iterations, it'll point to a single commit; that's what we need.
Hi, these are the results:
99e6ccb18602eceec7f951fd989d5e6976f35097 is the first bad commit
Author: Emmanuel Grumbach <email@example.com>
Date: Sun Jul 16 12:28:05 2017 +0300
iwlwifi: pcie: support short Tx queues for A000 device family
This allows to modify TFD_TX_CMD_SLOTS to a power of 2
which is smaller than 256.
Note that we still need to set values to wrap at 256
into the scheduler's write pointer, but all the rest of
the code can use shorter transmit queues.
Signed-off-by: Emmanuel Grumbach <firstname.lastname@example.org>
Reviewed-by: Coelho, Luciano <email@example.com>
Reviewed-by: ec ger unix iil jenkins <EC.GER.UNIX.IIL.JENKINS@INTEL.COM>
Tested-by: ec ger unix iil jenkins <EC.GER.UNIX.IIL.JENKINS@INTEL.COM>
:040000 040000 6f8e4ca3caf27885d0db6b8f7cc00c3755053e7e 265a57860b1c1b0b0f082c64019b08044ecd2cf5 M drivers
:100644 100644 bf2b3c23ef9019d1bb5464806c24f9dd09f030bc 37124383ab7b928d9e0896d0653c507baebbc0b3 M versions
Can you please double check that the commit right before works?
This is very surprising because this patch is fairly trivial, but it may be totally bogus.
I'll look at the patch.
Hi, sure, no problem. Give me 24 hours and I'll get back to you asap :)
Created attachment 273345
It's the same output. I've attached the git commands I ran, the git output, dmesg, and so on.
Created attachment 273361
This is a selective revert of the patch you found with the bisection.
Please check if that helps; in the meantime, I'll try to understand why this commit causes you trouble.
Annoying ring index stuff.
Created attachment 273363
add debug data
I looked again at the patch you found with the bisection, and I can't understand why it's causing any issues. It should really be a no-op for your device.
Please apply the attached patch and try to reproduce while you have tracing running.
Created attachment 273365
I think I found the bug. Please apply the patch attached and let me know.
Sorry for the delay, it's been a complicated few days. The patch works perfectly. I've tried it on kernel 4.14 and transferred more than 15 GB with no issues.
I think you can close this.
Thank you so much and have a happy new year :)
Thanks for testing, and for bisecting this.
I'd have never been able to fix this bug without your bisection, it is tremendously helpful.
Happy new year to you as well.