My desktop has an integrated 1RTL8821AE 802.11ac PCIe Wireless Network Adapter, ID 0ec:8821, using the rtl8821ae driver. This occasionally causes a ~100ms latency, as measured using perf. I first noticed this as audio stuttering when Pulseaudio wasn't scheduled in time. I believe rtl8821ae is to blame, since I see no latency problems when the wifi adapter is switched off.
To track down the latency, I:
1. Ran perf sched record for a while.
2. Checked perf sched latency to spot when the latency occurred.
3. Looked in perf sched script's output to see what was going on at the time.
Here's a 100ms scheduling delay in watchdog/2 at 170597.339822:
wpa_supplicant 1168  170597.339814: sched:sched_wakeup: kworker/2:2:22637  success=1 CPU:002
wpa_supplicant 1168  170597.339820: sched:sched_stat_runtime: comm=wpa_supplicant pid=1168 runtime=692443 [ns] vruntime=4718
wpa_supplicant 1168  170597.339822: sched:sched_switch: wpa_supplicant:1168  R ==> watchdog/2:20 
Another in kworker/1 at 171226.630376:
swapper 0  171226.629871: sched:sched_wakeup: skype:3592  success=1 CPU:003
swapper 0  171226.629874: sched:sched_switch: swapper/3:0  R ==> skype:3592 
wpa_supplicant 1168  171226.630372: sched:sched_switch: wpa_supplicant:1168  R ==> migration/1:14 
migration/1 14  171226.630376: sched:sched_switch: migration/1:14  S ==> kworker/1:0:24376 
And one more in at 171604.146285 in watchdog/1:
wpa_supplicant 1168  171604.146276: sched:sched_wakeup: kworker/1:0:24376  success=1 CPU:001
wpa_supplicant 1168  171604.146283: sched:sched_stat_runtime: comm=wpa_supplicant pid=1168 runtime=50022 [ns] vruntime=14207
wpa_supplicant 1168  171604.146284: sched:sched_switch: wpa_supplicant:1168  R ==> watchdog/1:13 
watchdog/1 13  171604.146287: sched:sched_switch: watchdog/1:13  S ==> kworker/1:0:24376 
They all seem to follow wpa_supplicant.
Note - I mispasted the dev id, it's 10ec:8821.
There are two things to try. Kernel 4.2.6 included a patch that reverted an earlier patch that disabled a clearing of RTL8821AE interrupts before they are set again, which led to kernel oops on boot.
First, I would disable MSI interrupts with the following (as root):
echo "msi=0" > /etc/modprobe.d/50-rtl8821ae.conf
modprobe -rv rtl8821ae
modprobe -v rtl8821ae
Then run tests to see if you still have audio stuttering.
If that test still fails, then do the following (again as root):
echo "int_clear=0" > /etc/modprobe.d/50-rtl8821ae.conf
modprobe -rv rtl8821ae
modprobe -v rtl8821ae
Now retest. Report both sets of results.
If neither changes help, please post the exact set of perf commands you used. I have not used that utility before, and I'm having some difficulty in repeating your tests.
Shouldn't that be:
echo "options rtl8821ae msi=0" > /etc/modprobe.d/50-rtl8821ae.conf
I should point out that this has been happening since at least kernel 4.1.6, the version around when I got the machine.
I'm not much of a perf expert either, just read enough of the man pages to try and figure out how to spot scheduling latencies. (Starting point here was the PA log reporting a scheduling delay.)
The commands are as follows. First I run:
# perf sched record
for a few minutes with audio running, or until I hear a glitch. Press Ctrl+C to stop. Then I run
# perf sched latency
This gives a top-like list of the scheduling delays for various processes, their maximum delay and when that occurred. I pick out the worst (~100ms with an unconnected interface, 20-50ms when connected) and note when it happened.
Finally, I run
# perf sched script
and search for the time I noted above. (Searching is like 'less' -- hit / key and enter a pattern.) This, as far as I understand, is a trace of what the scheduler did. I inspect what's going on just before the delay.
What I generally find:
Baseline with interface OFF:
Maximum latencies of about 2ms.
When the interface is up BUT NOT CONNECTED:
I see latencies of ~100ms. This is enough to audibly glitch PA. In 'perf sched script', I see that wpa_supplicant is always scheduled just beforehand.
When the interface is CONNECTED:
I see latencies of around 20-50ms, so higher than the baseline. PA seems to be able to cope with this, so I don't hear a glitch. Also the scheduling pattern in the 'perf sched script' is different - I don't see wpa_supplicant being scheduled just beforehand. But I do see kworker/u8:3 a lot before, not sure if this is doing something for rtl8821ae.
This happens with either no module options or msi=0. I haven't tested int_clear=0 yet.
With *just* 'int_clear=0', I get the same sort of results as with no options or just 'msi=0'.
OK, interrupts are not the problem. Yes, you are correct that the driver name is needed in the options line.
For some reason, I will not be able to duplicate your tests. When I use
perf sched record
my system responds with "invalid or unsupported event: 'sched:sched_switch'". I'm still trying to solve that problem.
When the wifi driver is not connected, the only usage of wpa_supplicant should be to trigger scans. What method are you using to control the network?
Does 'grep scan /etc/wpa_supplicant.conf' yield any output?
(In reply to Larry Finger from comment #5)
> OK, interrupts are not the problem. Yes, you are correct that the driver
> name is needed in the options line.
> For some reason, I will not be able to duplicate your tests. When I use
> perf sched record
> my system responds with "invalid or unsupported event:
> 'sched:sched_switch'". I'm still trying to solve that problem.
No idea what this means. If it helps, I'm using the stock Fedora kernel which has the following perf config options:
> When the wifi driver is not connected, the only usage of wpa_supplicant
> should be to trigger scans. What method are you using to control the network?
NetworkManager-1.0.6-8.fc23.x86_64 with standard Gnome 3.18.2 UI utils. This is a pretty standard Fedora 23 install.
> Does 'grep scan /etc/wpa_supplicant.conf' yield any output?
Nothing at all.
I should also point out: I've another two F23 machines with the same versions of everything, one using ath9k and the other iwl4965. Neither have any odd latency issues, so I'm not convinced it's anything else in the wireless stack.
My configuration has
All of the PERF options are selected.
Created attachment 195661 [details]
I've attached the config file for the kernel I'm currently using, but I don't know if that'll help -- I'm otherwise completely inexpert in perf and tracking down latencies.
The machine in question has an i5-5200U processor, so 2 cores... I guess if your machine has more cores on which to schedule stuff it's less likely you'd see (or hear!) the delay.
My CPU is an i7-4600M CPU @ 2.90GHz, thus I also have 2 cores plus hyperthreading.
I did install Fedora 23 and tested it. One thing I do not understand is how you keep an interface unconnected. I know how to do it with the KDE applet, but not with Gnome. I went back to openSUSE 13.2 where I finally managed to get a version of perf that works. The problem was the package included with openSUSE could not handle the sched option; however, when I built and installed the version included with the kernel source, then it works.
The longest scheduling delays happen when scanning. It doesn't seem to matter whether the interface is connected or not. I trigger the scans with a 'sudo iwlist <iface> scan' command. The longest delays are usually 40-50 ms and happen in the scheduling of kthreadd or one of the kernel workers. Obviously, I need to investigate what happens when scanning.
(In reply to Larry Finger from comment #10)
> I did install Fedora 23 and tested it. One thing I do not understand is how
> you keep an interface unconnected. I know how to do it with the KDE applet,
Thanks for going to all that effort!
To prevent connection, assuming you've selected and set up a network: Go into WiFi Settings. There's a '*' button next to the network. Click on that and select the 'Identity' page. Uncheck 'Connect automatically' and click 'Apply'.