Bug 203659

Summary: periodic xruns while idle, no DSP load; ftrace shows misplaced 'sched_waking' event of irq/x-xhci-hcd
Product: Drivers Reporter: Goli4thus (goli)
Component: Sound(ALSA)Assignee: Jaroslav Kysela (perex)
Status: NEW ---    
Severity: normal CC: goli
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 5.1.4 Subsystem:
Regression: No Bisected commit-id:
Attachments: cpuinfo
iomem
ioports
jack_tracing.patch
kernel_config
lspci_vvv
lsusb_v
modules
scsi
Trace-R23-Analysis.png.tar.gz
xrun_trace__19-05__19-36-02.dat.tar.gz
xrun_trace__19-05__19-47-02.dat.tar.gz

Description Goli4thus 2019-05-20 19:04:15 UTC
Created attachment 282859 [details]
cpuinfo

[1.] One line summary of the problem:
Motu624: periodic xruns (jackaudio); ftrace reveals misplaced 'sched_waking' event re jackdbus by irq/42-xhci-hcd.

[2.] Full description of the problem/report:
I experience periodic, audible (when playing audio) xruns regardless of system or dsp load when using audio interface Motu624.
The emphasis lies on "periodic" as the distance between consecutive xruns is consistent (including a bit of jitter).

The only thing that seems to change the frequency of xrun occurence are ALSA settings which alter the frequency of 
soundcard interrupts (sample rate, buffer size, periods/buffer):


| buffer size | periods/buffer | sample rate | xrun distance     |
|-------------|----------------|-------------|-------------------|
| 128         | 3              | 48k         | 7 to 8:30 min     |
| 256         | 3              | 48k         | 16 to 18:30 min   |
| 32          | 3              | 48k         | 2:10 to 2:20 min  |
| 32          | 3              | 96k         | approx. every 50s |


I can reproduce the described behaviour consistently with:
- various kernel versions:
    - 4.19.31-rt18-MANJARO
    - 4.19.37-rt19-MANJARO
    - 5.0.14-rt9-MANJARO
    - 5.1.1-2-MANJARO
- different ALSA settings (sample rate, buffer size, periods/buffer)
- various jack versions: 
    - Jack2-1.9.12-9
    - jack1: git (jack-0.125.0-7)
    - jack2: git (488081d1)
- idle system state; either with graphics or at runlevel 3 (tmux session; starting jackd and tracing only)


I did put some system optimizations in place while testing:
- cpupower frequency-set -g performance
- rtprio of irq/42-xhci_hcd set to 95
- disabled pulseaudio
- disconnected all usb devices apart from Motu624, mouse and keyboard
- 


I tried multiple configurations and generated traces with ftrace for each testrun.

How I generated traces:
-----------------------
- applied the following patch to jack2: git (488081d1): attachment 'jack_tracing.patch':
    - writes 'trace_marker' on xrun occurence
    - stops tracing via 'tracing_on = 0'
- ran a custom script in parallel which:
    - started tracing 
    - waited for transition of 'tracing_on' from 1 to 0
        - extracted trace buffer 
        - restarted tracing (tracing_on = 1)

- trace commands used:
    - Realtime kernel: 'trace-cmd start -p nop -e irq -e preemptirq -e sched -e xhci-hcd'
    - Non realtime kernel: 'trace-cmd start -p nop -e irq -e sched -e xhci-hcd'
    - More detailed tracing: 'trace-cmd start -p function --max-graph-depth 2 -e sched -e xhci-hcd -P $(pgrep irq/42-xhci_hcd)'

What I've found out:
--------------------
As stated before, the xruns occur periodically with nearly equal distance (jitter).
The frequency of xruns seems to be related to the frequency of sound card interrupts (implicitely set by ALSA settings).

Analysing the traces with kernelshark I noticed a delayed 'sched_waking' event regarding jackdbus by irq/42-xhci-hcd,
which looks like the reason for jackdbus reporting a xruns.
This pattern could be observed in a multitude of different testruns.


Please see the following attachments for easier explanation:
- xrun_trace__19-05__19-36-02.dat: 'Non realtime kernel' trace (see above)
- xrun_trace__19-05__19-47-02.dat: 'More detailed tracing' (see above)
- 'Trace-R23-Analysis.png': Graphical analysis of traces (showing delayed 'sched_waking' event)

These traces have been made with the following "system configuration":
- ALSA buffer size: 64  
- ALSA periods/buffer: 3 
- ALSA sample rate: 96k 
- kernel: 5.1.1-2-MANJARO 
- jack: jack2: git (488081d1) + tracing patch (see above)




[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Linux version 5.1.1-2-MANJARO (builduser@lancaster) (gcc version 8.3.0 (GCC)) #1 SMP PREEMPT Sat May 11 12:12:50 UTC 2019

[4.2.] Kernel .config file:
See attachment 'kernel_config'.

[5.] Most recent kernel version which did not have the bug:
None.

[8.] Environment
[8.1.] Software (add the output of the ver_linux script here)
See attachment 'ver_linux'.

[8.2.] Processor information (from /proc/cpuinfo):
See attachment 'cpuinfo'.

[8.3.] Module information (from /proc/modules):
See attachment 'modules'.

[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
See attachment 'ioports' and 'iomem'.

[8.5.] PCI information ('lspci -vvv' as root)
See attachment 'lspci_vvv'.

[8.6.] SCSI information (from /proc/scsi/scsi)
See attachment 'scsi'.

[8.7.] Other information that might be relevant to the problem
       (please look in /proc and include all information that you
       think to be relevant):
See attachment 'lsusb_v'.
       
[X.] Other notes, patches, fixes, workarounds:
Comment 1 Goli4thus 2019-05-20 19:05:03 UTC
Created attachment 282861 [details]
iomem
Comment 2 Goli4thus 2019-05-20 19:05:22 UTC
Created attachment 282863 [details]
ioports
Comment 3 Goli4thus 2019-05-20 19:05:36 UTC
Created attachment 282865 [details]
jack_tracing.patch
Comment 4 Goli4thus 2019-05-20 19:05:59 UTC
Created attachment 282867 [details]
kernel_config
Comment 5 Goli4thus 2019-05-20 19:06:14 UTC
Created attachment 282869 [details]
lspci_vvv
Comment 6 Goli4thus 2019-05-20 19:06:26 UTC
Created attachment 282871 [details]
lsusb_v
Comment 7 Goli4thus 2019-05-20 19:06:38 UTC
Created attachment 282873 [details]
modules
Comment 8 Goli4thus 2019-05-20 19:06:50 UTC
Created attachment 282875 [details]
scsi
Comment 9 Goli4thus 2019-05-20 19:07:20 UTC
Created attachment 282877 [details]
Trace-R23-Analysis.png.tar.gz
Comment 10 Goli4thus 2019-05-20 19:08:56 UTC
Created attachment 282879 [details]
xrun_trace__19-05__19-36-02.dat.tar.gz
Comment 11 Goli4thus 2019-05-20 19:09:14 UTC
Created attachment 282881 [details]
xrun_trace__19-05__19-47-02.dat.tar.gz
Comment 12 Goli4thus 2019-05-25 15:15:31 UTC
I compiled kernel 5.1.4 from source today.
It gave me the same phenomenon: "snd_pcm_period_elapsed" isn't called when it presumably should be (see previously attached image for discrepancy) and therefore no "sched_waking" event regarding jackdbus occurs it seems.

If other traces with trace-cmd are needed, just tell me and I'll try to get them.