Bug 204189

Summary: poor cyclictest results with 5.0 series rt for arm64
Product: Alternate Trees Reporter: Paul Thomas (pthomas8589)
Component: rtAssignee: alt-trees_rt
Status: RESOLVED CODE_FIX    
Severity: normal CC: korg-bugzilla.bigeasy
Priority: P1    
Hardware: ARM   
OS: Linux   
Kernel Version: 5.0.3-rt1 Subsystem:
Regression: No Bisected commit-id:
Attachments: trace data for a high latency 5.0.3-rt1 run
trace data for a high latency 5.2.0-rt1

Description Paul Thomas 2019-07-16 21:47:10 UTC
Created attachment 283753 [details]
trace data for a high latency 5.0.3-rt1 run

Hello,

We have been using preempt-rt 4.18 since last fall with very good results,
and we see this carry through to 4.19.25-rt16. However, for the 5.0.3-rt1
release, cyclictest hits multi thousand microsecond latenies this
continues all the way through to 5.0.21-rt15.

For the 4.19 kernels the v11 zynqmp firmware patch is applied:
https://patchwork.kernel.org/cover/10555405/
Otherwise only a small EPROBE_AGAIN change is needed in
xilinx_uartps.c on top of the rt tag. For the 5.0 kernels there are no
c file changes. Both for 4.19 and 5.0 a custom devicetree is used.

We would like to bisect this further, but it becomes more difficult
between 4.19.25-rt16 and 5.0.3-rt1.

Attached is the tracedata output for a high latency run with the 5.0.3-rt1 kernel. Just looking at the timestamp time of cyclicte-2778 I can see the last two finish_task_switch calls are seperated by 5801 uS. Is just before that the correct place to be looking? 

Some other details: CPU frequency is 1.2 GHz, memory is 2GB, rootfs is debian 9.

For 4.19.25-rt16 we get this (histogram removed):
# ./cyclictest -S -m -n -p 99 -i 200 -h 400 -D 900
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 2.58 2.77 1.93 1/122 2595

T: 0 ( 2577) P:99 I:200 C:4499985 Min:      5 Act:    6 Avg:    6 Max:      24
T: 1 ( 2578) P:99 I:200 C:4499893 Min:      6 Act:    6 Avg:    6 Max:      13
T: 2 ( 2579) P:99 I:200 C:4499779 Min:      6 Act:    6 Avg:    6 Max:      33
T: 3 ( 2580) P:99 I:200 C:4499666 Min:      6 Act:    6 Avg:    6 Max:      26

And for 5.0.3-rt1 we get:
# ./cyclictest -S -m -n -p 99 -i 200 -h 400 -D 900
# /dev/cpu_dma_latency set to 0us
policy: fifo: loadavg: 4.37 4.28 2.88 1/129 2773

T: 0 ( 2757) P:99 I:200 C:4499958 Min:      5 Act:    9 Avg:    6 Max:      31
T: 1 ( 2758) P:99 I:200 C:4499500 Min:      5 Act:    7 Avg:    6 Max:    9045
T: 2 ( 2759) P:99 I:200 C:4498926 Min:      5 Act:    6 Avg:    6 Max:     263
T: 3 ( 2760) P:99 I:200 C:4499644 Min:      5 Act:    6 Avg:    6 Max:      34
Comment 1 Paul Thomas 2019-08-14 16:52:16 UTC
Created attachment 284391 [details]
trace data for a high latency 5.2.0-rt1
Comment 2 Paul Thomas 2019-08-15 19:21:23 UTC
Disregard 'trace data for a high latency 5.2.0-rt1' this test was invalid due to incorrectly applied patch.
Comment 3 Sebastian A. Siewior 2019-10-11 13:37:54 UTC
fixed in v5.2.9-rt3