Bug 120461 - Drop on SFC interface around 30 %
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: All
OS: Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-06-16 13:14 UTC by Otto Sabart
Modified: 2016-07-12 21:46 UTC
CC: 4 users

See Also:
Kernel Version: v4.6-10530-g28165ec
Subsystem:
Regression: No
Bisected commit-id:


Attachments
ipv4 (107.63 KB, image/png)
2016-06-16 13:14 UTC, Otto Sabart
ipv4 over vlan (106.63 KB, image/png)
2016-06-16 13:15 UTC, Otto Sabart
ipv6 (105.16 KB, image/png)
2016-06-16 13:16 UTC, Otto Sabart
ipv6 over vlan (106.21 KB, image/png)
2016-06-16 13:16 UTC, Otto Sabart
All the results pack (v4.6 vs. 4.7-rc{0,1,3}) (3.22 MB, application/x-gzip)
2016-06-17 10:05 UTC, Otto Sabart
Set interrupt affinities (2.16 KB, patch)
2016-07-06 14:06 UTC, Bert Kenward

Description Otto Sabart 2016-06-16 13:14:36 UTC
Created attachment 220251 [details]
ipv4

We see a performance drop (around 30%) with the sfc driver (SFC9020) when
performing a netperf TCP maerts test. It seems to have started with 4.7-rc0.

You can find all the results in the attachments.

Which commit/change could cause this regression? Any hints?
Comment 1 Otto Sabart 2016-06-16 13:15:40 UTC
Created attachment 220261 [details]
ipv4 over vlan
Comment 2 Otto Sabart 2016-06-16 13:16:15 UTC
Created attachment 220271 [details]
ipv6
Comment 3 Otto Sabart 2016-06-16 13:16:47 UTC
Created attachment 220281 [details]
ipv6 over vlan
Comment 4 Bert Kenward 2016-06-16 15:18:38 UTC
From the labels on your plots I'm guessing this was v4.7-rc3, not -rc0?

Do you have a v4.7-rc1 kernel available? There were no driver changes between v4.6 and v4.7-rc1, and only three driver changes after that up to v4.7-rc3.
Comment 5 Robert Stonehouse 2016-06-16 16:13:52 UTC
(In reply to Otto Sabart from comment #0)
> We see a performance drop (about ~30%) on sfc driver (SFC9020) when
> performing
> netperf TCP maerts test. It seems it started from 4.7-rc0.

Thanks for the report (and for running the tests).

Was a Solarflare network adapter used on both sides?
I ask because the report is for netperf TCP maerts. Was a netperf TCP stream test also run, and did it show any performance regression? This helps determine whether it is a TX or RX regression.
Comment 6 Otto Sabart 2016-06-17 10:04:26 UTC
Hi Bert,

(In reply to Bert Kenward from comment #4)
> From the labels on your plots I'm guessing this was v4.7-rc3, not -rc0?

Unfortunately, the regression shows up on all RCs (starting from rc0).

> Do you have a v4.7-rc1 kernel available? There were no driver changes
> between v4.6 and v4.7-rc1, and only three driver changes after that up to
> v4.7-rc3.

In the 'results-4.7-rc0-rc1-rc3.tar.gz' archive you can find all the comparisons
between v4.6 and v4.7-rc{0,1,3} (unfortunately, we do not have results for rc2).
Comment 7 Otto Sabart 2016-06-17 10:05:33 UTC
Created attachment 220431 [details]
All the results pack (v4.6 vs. 4.7-rc{0,1,3})
Comment 8 Otto Sabart 2016-06-17 10:17:51 UTC
Hi Robert,

(In reply to Robert Stonehouse from comment #5)
> Was a Solarflare network adapter used on both sides?

Yes, we have exactly the same configuration (cards, machines, ..) on both sides
(if you want more info about the machines, just let me know).

> I ask as the report is for netperf TCP maerts. Was a netperf TCP stream test
> also run? and did it show any performance regression? This helps determine
> if it is a TX or RX regression.

Yes, TCP stream tests were also run. I can see the regression only in the maerts tests.

You can find all the results in the attachment (in comment 7).


Our team uncovered another regression, in the CPU scheduler, BZ120481 [0]. Could
it be related?

[0] https://bugzilla.kernel.org/show_bug.cgi?id=120481


The strange thing here is that it affects only the Solarflare card.
Comment 9 Bert Kenward 2016-06-17 11:47:12 UTC
I don't think there's a v4.7-rc0 tag on the mainline kernel. Do you have git commit IDs for the various things you've tested? As mentioned before, there were no driver changes between v4.6 and v4.7-rc1, so it's apparently something elsewhere in the kernel. It could well relate to the scheduler issue you've spotted.
Comment 10 Otto Sabart 2016-06-20 10:20:08 UTC
(In reply to Bert Kenward from comment #9)
> I don't think there's a v4.7-rc0 tag on the mainline kernel. Do you have git
> commit IDs for the various things you've tested? As mentioned before, there
> were no driver changes between v4.6 and v4.7-rc1, so it's apparently
> something elsewhere in the kernel. It could well relate to the scheduler
> issue you've spotted.

Hi Bert,
Yes, you are right: there is no v4.7-rc0 tag on the mainline kernel. This tag
was created because we build the Fedora upstream kernel more often than once per
rc. Sorry for the confusion.

Here are the commit IDs:

Fedora upstream tag             Related mainline kernel tag
===========================================================
4.7.0-0.rc0.git8.2     ----     v4.6-10530-g28165ec
4.7.0-0.rc1.git1.2     ----     v4.7-rc1-12-g852f42a
4.7.0-0.rc3.git0.1     ----     v4.7-rc3
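For reference, the mainline names in the right-hand column follow the `git describe` convention, `<nearest-tag>-<commits-since-tag>-g<short-hash>`, so v4.6-10530-g28165ec is 10530 commits past the v4.6 tag. A small sketch that splits such a string into its parts (the `parse_describe` helper is hypothetical, not part of git):

```shell
#!/bin/sh
# parse_describe: split a `git describe` string of the form
# <tag>-<count>-g<hash> into "tag count hash" (hypothetical helper).
# The greedy .* keeps dashes inside the tag name (e.g. v4.7-rc1).
parse_describe() {
    printf '%s\n' "$1" | sed -E 's/^(.*)-([0-9]+)-g([0-9a-f]+)$/\1 \2 \3/'
}

parse_describe v4.6-10530-g28165ec      # -> v4.6 10530 28165ec
parse_describe v4.7-rc1-12-g852f42a     # -> v4.7-rc1 12 852f42a
```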
Comment 11 Bert Kenward 2016-06-20 12:05:12 UTC
Thanks for the commit IDs, Otto. Do you have a commit ID for the latest "good" revision? I hope to recreate and bisect this.
Comment 12 Otto Sabart 2016-06-21 11:51:02 UTC
(In reply to Bert Kenward from comment #11)
> Do you have a commit ID for the latest "good" revision?
Latest good revision is 'v4.6'.

Unfortunately, I have found a small bug in our test suite.

The results I have attached so far are _not_ TCP_MAERTS tests but TCP_STREAM
tests with various -M sizes [0].

The reproducer:
$ ip a show dev sfc0
3: sfc0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:0f:53:08:29:10 brd ff:ff:ff:ff:ff:ff
    inet 172.20.20.10/24 brd 172.20.20.255 scope global sfc0
       valid_lft forever preferred_lft forever
    inet6 fd60::10/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::20f:53ff:fe08:2910/64 scope link
       valid_lft forever preferred_lft forever

$ netperf -cC -t TCP_STREAM  -l 30  -L 172.20.20.10  -H 172.20.20.20 -T ,0  -T 0, -- -M $SIZE

I have re-tested everything manually to make sure there is a regression:

TCP_STREAM test with various -M sizes:
======================================
v4.6:
+--------+------+------+------+
|   -M   | 512  | 1024 | 8192 |
+--------+------+------+------+
| 1. run | 4322 | 6499 | 9329 |
| 2. run | 4463 | 6516 | 9329 |
| 3. run | 4385 | 6471 | 9326 |
+--------+------+------+------+

v4.6-10530-g28165ec:
+--------+------+------+------+
|   -M   | 512  | 1024 | 8192 |
+--------+------+------+------+
| 1. run | 3649 | 4667 | 6315 |
| 2. run | 3684 | 4801 | 6313 |
| 3. run | 3693 | 4790 | 6283 |
+--------+------+------+------+


The regression is visible in the TCP_MAERTS test too.

The reproducer:
$ netperf -cC -t TCP_MAERTS -l 30 -L 172.20.20.10 -H 172.20.20.20 -T ,0 -T 0, -- -M $SIZE
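The runs behind the tables below can be automated; a sketch, assuming that with -P0 -cC netperf prints a single data line whose fifth field is the throughput (the `throughput_of` helper is hypothetical):

```shell
#!/bin/sh
# Sketch: repeat the reproducer over several -M sizes and pull out
# the throughput column. The fifth-field assumption is worth
# re-checking against your netperf version's output format.
throughput_of() { awk 'NF >= 5 { print $5; exit }'; }

if command -v netperf >/dev/null 2>&1; then
    for size in 512 1024 8192; do
        for run in 1 2 3; do
            t=$(netperf -P0 -cC -t TCP_MAERTS -l 30 -L 172.20.20.10 \
                    -H 172.20.20.20 -T ,0 -T 0, -- -M "$size" | throughput_of)
            echo "size=$size run=$run throughput=$t"
        done
    done
fi
```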

TCP_MAERTS test with various -M sizes:
======================================
v4.6:
+--------+------+------+------+
|   -M   | 512  | 1024 | 8192 |
+--------+------+------+------+
| 1. run | 6975 | 6967 | 6966 |
| 2. run | 6958 | 6978 | 6974 |
| 3. run | 6942 | 6933 | 6958 |
+--------+------+------+------+

v4.6-10530-g28165ec:
+--------+------+------+------+
|   -M   | 512  | 1024 | 8192 |
+--------+------+------+------+
| 1. run | 6756 | 6753 | 6762 |
| 2. run | 6757 | 6741 | 6769 |
| 3. run | 6760 | 6766 | 6759 |
+--------+------+------+------+


HW information:
===============
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E5-2403 v2 @ 1.80GHz
Stepping:              4
CPU MHz:               1591.040
BogoMIPS:              3599.14
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              10240K
NUMA node0 CPU(s):     0-3


ethtool info:
=============
$ ethtool -i sfc0
driver: sfc
version: 4.0
firmware-version: 3.3.0.6298
bus-info: 0000:1b:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

[0] http://www.netperf.org/svn/netperf2/tags/netperf-2.6.0/doc/netperf.html
Comment 13 Otto Sabart 2016-07-01 14:11:29 UTC
Still valid for kernel v4.7-rc5-227-ge7bdea7.
Comment 14 The Linux kernel's regression tracker (Thorsten Leemhuis) 2016-07-02 11:38:42 UTC
Bert, this issue is listed in my regression reports for 4.7 and I wonder what
the status is. It seems nothing much happened for more than a week now, which
is a bad sign as 4.7 final seems only a week or two away.
Comment 15 Bert Kenward 2016-07-04 07:55:39 UTC
I'm hoping to bisect it in the next day or so.
Comment 16 Bert Kenward 2016-07-04 16:10:09 UTC
I've attempted to bisect this today. Initially I thought I'd successfully reproduced the regression, but what I'm actually seeing is occasional (approx 1 run in 5) lower performance, regardless of the kernel. I'm seeing performance very close to line rate normally.

Two differences:
 - I'm using a pair of machines with older but higher clocked CPUs - E3-1230 @ 3.2 GHz. Netperf is reporting quite low CPU utilisation though. I'll see if I can find a pair of machines that are more similar.
 - The NICs I'm using have newer firmware - I'll try with the older firmware shortly.
Comment 17 Bert Kenward 2016-07-04 16:37:33 UTC
A further update - with the same firmware version (3.3.0.6298) I still see performance very close to line rate with tag v4.7-rc1.
Comment 18 Otto Sabart 2016-07-06 08:45:19 UTC
(In reply to Bert Kenward from comment #16)
> I've attempted to bisect this today. Initially I thought I'd successfully
> reproduced the regression, but what I'm actually seeing is occasional
> (approx 1 run in 5) lower performance, regardless of the kernel. I'm seeing
> performance very close to line rate normally.

I provisioned our machines to test it a little bit more. For me, the higher
performance is occasional, and more often I see the lower performance:

Wed Jul  6 04:00:16 CEST 2016
1: measured throughput: 6270.64
2: measured throughput: 6274.19
3: measured throughput: 6244.32
4: measured throughput: 6252.82
5: measured throughput: 6256.34
6: measured throughput: 6244.57
7: measured throughput: 6231.55

Wed Jul  6 04:08:24 CEST 2016
1: measured throughput: 9327.26
2: measured throughput: 9326.68
3: measured throughput: 6255.88
4: measured throughput: 9326.81
5: measured throughput: 6248.88
6: measured throughput: 6251.87
7: measured throughput: 9326.16

Wed Jul  6 04:11:59 CEST 2016
1: measured throughput: 9325.41
2: measured throughput: 9327.90
3: measured throughput: 6240.21
4: measured throughput: 6245.22
5: measured throughput: 6241.08
6: measured throughput: 6254.99
7: measured throughput: 6251.38

Wed Jul  6 04:17:15 CEST 2016
1: measured throughput: 9322.18
2: measured throughput: 6239.87
3: measured throughput: 9324.77
4: measured throughput: 6250.38
5: measured throughput: 9327.27
6: measured throughput: 9328.37
7: measured throughput: 9329.61
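The bimodal pattern above is easy to tally mechanically; a sketch, assuming the "measured throughput" lines above are saved in a log file, with 9000 Mbit/s as an arbitrary cut-off between the two modes:

```shell
#!/bin/sh
# Sketch: count line-rate (~9.3 Gbit/s) vs degraded (~6.25 Gbit/s)
# runs. The 9000 threshold is an arbitrary cut between the two modes.
classify() {
    awk '/measured throughput/ {
        if ($NF + 0 > 9000) fast++; else slow++
    } END { printf "fast=%d slow=%d\n", fast, slow }'
}

# Usage: classify < runs.log
```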


> Two differences:
>  - I'm using a pair of machines with older but higher clocked CPUs - E3-1230
> @ 3.2 GHz. Netperf is reporting quite low CPU utilisation though. I'll see
> if I can find a pair of machines that are more similar.
>  - The NICs I'm using have newer firmware - I'll try with the older firmware
> shortly.

I think the problem is CPU affinity. I do not know the exact kernel
implementation, but it seems that the kernel by default pins all the sfc
interrupts mostly to core 0.

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
......
 46:        166          0          0          2  IR-PCI-MSI 14155776-edge      sfc0-0
 47:    1060610          0          0          0  IR-PCI-MSI 14155777-edge      sfc0-1
 49:    6576243          0          0          0  IR-PCI-MSI 14155778-edge      sfc0-2
 50:     772157          0          0          0  IR-PCI-MSI 14155779-edge      sfc0-3
......

But we bind netperf and netserver to core 0 at the same time (-T options):
$ netperf -P0 -cC -t TCP_STREAM -l 30 -L 172.20.20.10 -H 172.20.20.20 -T ,0 -T 0, -- -M 8192


- If I bind netperf+netserver to a different core, I cannot reproduce the lower
  performance anymore:
$ netperf -P0 -cC -t TCP_STREAM -l 30 -L 172.20.20.10 -H 172.20.20.20 -T ,3 -T 3, -- -M 8192

- If I change the affinity to handle sfc's interrupts on a different core and
  keep netperf+netserver running on core 0, I cannot reproduce the lower
  performance anymore.

$ echo 8 > /proc/irq/46/smp_affinity
$ echo 8 > /proc/irq/47/smp_affinity
$ echo 8 > /proc/irq/49/smp_affinity
$ echo 8 > /proc/irq/50/smp_affinity
$ netperf -P0 -cC -t TCP_STREAM -l 30 -L 172.20.20.10 -H 172.20.20.20 -T ,0 -T 0, -- -M 8192

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 46:        166          0          0          0  IR-PCI-MSI 14155776-edge      sfc0-0
 47:    1060610          0          0          0  IR-PCI-MSI 14155777-edge      sfc0-1
 49:    6576243          0          0          0  IR-PCI-MSI 14155778-edge      sfc0-2
 50:     772157          0          0     534538  IR-PCI-MSI 14155779-edge      sfc0-3
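The manual smp_affinity writes above can be generated for any core count; a dry-run sketch that only prints one command per sfc0 IRQ, assigning them round-robin across online CPUs (pipe its output to sh as root to actually apply it):

```shell
#!/bin/sh
# Dry run: print an `echo <mask> > smp_affinity` command per sfc0 IRQ.
# cpu_mask i ncpus -> hex affinity mask for CPU (i mod ncpus).
cpu_mask() { printf '%x' $((1 << ($1 % $2))); }

ncpus=$(getconf _NPROCESSORS_ONLN)
i=0
if [ -r /proc/interrupts ]; then
    awk -F: '/sfc0/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts |
    while read -r irq; do
        echo "echo $(cpu_mask "$i" "$ncpus") > /proc/irq/$irq/smp_affinity"
        i=$((i + 1))
    done
fi
```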

What do you think?

Thank you!
Comment 19 Bert Kenward 2016-07-06 14:06:22 UTC
Created attachment 222161 [details]
Set interrupt affinities

Thanks Otto. We sent a patch to net-next back in May that included affinity hints as part of a wider change, but I don't believe it ever got merged. The attached patch only includes the affinity hints - can you try it on your system?
Comment 20 Bert Kenward 2016-07-06 15:31:36 UTC
I see the same bimodal performance with older kernels (4.6 or earlier), so I don't think this is a regression with 4.7-rc1. Otto, do you agree?
Comment 21 Otto Sabart 2016-07-07 12:38:49 UTC
(In reply to Bert Kenward from comment #19)
> Created attachment 222161 [details]
> Set interrupt affinities
> 
> Thanks Otto. We sent a patch to net-next back in May that included affinity
> hints as part of a wider change, but I don't believe it ever got merged. The
> attached patch only includes the affinity hints - can you try it on your
> system?

With this patch applied, I can reproduce the lower rate only occasionally (about 1 run in 10).

The interrupts are spread across all cores:
           CPU0       CPU1       CPU2       CPU3
 37:   36130760          0          0          0  IR-PCI-MSI 14155776-edge      sfc0-0
 38:          0    8439395          0          0  IR-PCI-MSI 14155777-edge      sfc0-1
 39:          0          0    8611333          0  IR-PCI-MSI 14155778-edge      sfc0-2
 40:          0          0          0   12262519  IR-PCI-MSI 14155779-edge      sfc0-3


(In reply to Bert Kenward from comment #20)
> I see the same bimodal performance with older kernels (4.6 or earlier), so I
> don't think this is a regression with 4.7-rc1. Otto, do you agree?

Yes, I agree. I think we can close this bug. Thank you for the collaboration!
