Bug 201181

Summary: r8169: Enable MSI-X interrupt on RTL8106e
Product: Drivers
Reporter: jian-hong
Component: Network
Assignee: drivers_network (drivers_network)
Status: RESOLVED CODE_FIX
Severity: normal
CC: gogen, hkallweit1, kai.heng.feng, steved424
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 4.19
Subsystem:
Regression: No
Bisected commit-id:
Attachments: dmesg log
RTL8106e ethernet adapter status before suspend
RTL8106e ethernet adapter status after resume
r8169-Enable-MSI-X-on-RTL8106e.patch

Description jian-hong 2018-09-19 09:06:59 UTC
Originally, we had an issue where r8169 MSI-X interrupts were broken
after S3 suspend/resume on the RTL8106e of the ASUS X441UAR.

We did some diagnostics and reported the results: https://lists.openwall.net/netdev/2018/08/27/33

Here is the status of the Ethernet adapter before suspend:

dev@endless:~$ sudo lspci -xxxs 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
80: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 04 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
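The MSI-X capability in this dump (ID 0x11 at offset 0xb0) says where the MSI-X vector table lives. The Python sketch below is a hypothetical helper, not part of lspci or the kernel: it rebuilds config space from the `lspci -xxx` rows, walks the capability list from the pointer at offset 0x34, and decodes that capability. With the rows shown here it reports MSI-X enabled with 4 vectors, the table at offset 0 of BAR 4 and the pending-bit array at offset 0x800 of BAR 4, which is why the BAR 4 contents matter for this bug.

```python
import re

# One `lspci -xxx` row: "b0: 11 d0 03 80 ..." (an offset plus 16 hex bytes).
ROW = re.compile(r"^([0-9a-f]{2}):((?: [0-9a-f]{2}){16})\s*$", re.IGNORECASE)


def parse_lspci_dump(text):
    """Rebuild the 256-byte config header from `lspci -xxx` rows.
    Rows that are missing from `text` are left as zero."""
    cfg = bytearray(256)
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            base = int(m.group(1), 16)
            cfg[base:base + 16] = bytes(int(t, 16) for t in m.group(2).split())
    return cfg


def find_cap(cfg, cap_id):
    """Walk the capability list starting at the pointer in offset 0x34."""
    if not cfg[0x06] & 0x10:  # Status bit 4: capability list present
        return None
    ptr, seen = cfg[0x34] & 0xFC, set()
    while ptr and ptr not in seen:  # `seen` guards against a looping list
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def decode_msix(cfg):
    """Decode the MSI-X capability (ID 0x11): control word, table and PBA locations."""
    pos = find_cap(cfg, 0x11)
    if pos is None:
        return None
    ctrl = int.from_bytes(cfg[pos + 2:pos + 4], "little")
    table = int.from_bytes(cfg[pos + 4:pos + 8], "little")
    pba = int.from_bytes(cfg[pos + 8:pos + 12], "little")
    return {
        "cap_offset": pos,
        "enabled": bool(ctrl & 0x8000),
        "function_masked": bool(ctrl & 0x4000),
        "table_size": (ctrl & 0x7FF) + 1,
        "table_bir": table & 0x7,   # which BAR holds the vector table
        "table_offset": table & ~0x7,
        "pba_bir": pba & 0x7,       # which BAR holds the pending-bit array
        "pba_offset": pba & ~0x7,
    }


# Rows copied from the "before suspend" dump above (other rows omitted).
DUMP = """\
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
"""

if __name__ == "__main__":
    print(decode_msix(parse_lspci_dump(DUMP)))
```

The same decoding agrees with the "MSI-X: Enable+ Count=4 Masked-" line that `lspci -v` prints for this device.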

dev@endless:~$ sudo ~/pcimem/pcimem /sys/devices/pci0000\:00/0000\:00\:1c.4/0000\:02\:00.0/resource4 0 b*16384
[sudo] password for dev: 
/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
Target offset is 0x0, page size is 4096
mmap(0, 16384, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7f15186d1000.
0x0000: 0x38
0x0001: 0x03
0x0002: 0xE0
0x0003: 0xFE
0x0004: 0x00
...
0x0010: 0x41
0x0011: 0x72
...

The status of the Ethernet adapter after resume:

dev@endless:~$ sudo lspci -xxxs 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
80: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 04 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev@endless:~$ sudo ~/pcimem/pcimem /sys/devices/pci0000\:00/0000\:00\:1c.4/0000\:02\:00.0/resource4 0 b*16384
/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
Target offset is 0x0, page size is 4096
mmap(0, 16384, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7f8d68dd5000.
0x0000: 0xFF
...

The config space is identical, but the contents of BAR 4 are wrong after resume: every byte reads as 0xFF.
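All-0xFF reads are the usual sign that nothing is answering at that address, for example because a bridge on the path is not forwarding the range. A minimal sketch, assuming root access and the same sysfs resource file that pcimem maps, that checks a BAR for this pattern; `bar_reads_all_ones()` is a hypothetical helper. Unlike pcimem it maps the file read-only, which I am assuming is enough for reading, and Python gives no control over the access width used for the copy, so treat it as a quick diagnostic only.

```python
import mmap
import os
import sys


def bar_reads_all_ones(resource_path, length):
    """Map the first `length` bytes of a sysfs PCI resource file read-only
    and return True if every byte reads back as 0xFF."""
    fd = os.open(resource_path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, length, mmap.MAP_SHARED, mmap.PROT_READ) as bar:
            return bar[:] == b"\xff" * length
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: <resource file> <length>, e.g. the resource4 path used above and 4096.
    path, size = sys.argv[1], int(sys.argv[2], 0)
    print("all 0xFF" if bar_reads_all_ones(path, size) else "not all 0xFF")
```

For this adapter, the resource file would be `/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4`, and the length must not exceed the 16K size of BAR 4.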

At that time, the only fix we had was to fall back to MSI interrupts with commit 7bb05b85bc2d1a1b647b91424b "r8169: don't use MSI-X on RTL8106e".
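Roughly, the workaround narrows the set of interrupt types r8169 offers to the PCI core. A toy model, assuming the core behaves like `pci_alloc_irq_vectors()` and tries MSI-X, then MSI, then legacy INTx among the allowed types. `IrqType`, `allowed_types()` and the chip-name check are hypothetical stand-ins, not r8169 code.

```python
from enum import Flag, auto


class IrqType(Flag):
    LEGACY = auto()  # INTx
    MSI = auto()
    MSIX = auto()


ALL_TYPES = IrqType.LEGACY | IrqType.MSI | IrqType.MSIX
# Order in which the allowed types are tried: MSI-X first, legacy INTx last.
PREFERENCE = (IrqType.MSIX, IrqType.MSI, IrqType.LEGACY)


def allowed_types(chip, msix_workaround):
    """Hypothetical model of the workaround: keep MSI and INTx, drop MSI-X
    for the affected chip."""
    if msix_workaround and chip == "RTL8106e":
        return IrqType.LEGACY | IrqType.MSI
    return ALL_TYPES


def pick_irq_type(allowed, supported):
    """Return the first preferred type that the driver allows and the device supports."""
    for kind in PREFERENCE:
        if kind in allowed and kind in supported:
            return kind
    raise RuntimeError("no usable interrupt type")


if __name__ == "__main__":
    # The config dump shows MSI (0x50), MSI-X (0xb0) and interrupt pin INTA (0x3d).
    device = ALL_TYPES
    print(pick_irq_type(allowed_types("RTL8106e", True), device))   # workaround in place
    print(pick_irq_type(allowed_types("RTL8106e", False), device))  # workaround reverted
```

Reverting the workaround only widens the allowed set again, so the same device ends up on MSI-X.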
Comment 1 jian-hong 2018-09-19 09:12:28 UTC
However, there is now a commit that fixes drivers reading nothing but 0xFF from PCI BARs after the system resumes.
The commit is 04cb3ae895d7efdc60f0fe17182b200a3da20f09 "PCI: Reprogram bridge prefetch registers on resume" by Daniel Drake.
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/enumeration&id=04cb3ae895d7efdc60f0fe17182b200a3da20f09
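Why a bridge fix helps here: BAR 4 of this adapter is a 64-bit prefetchable memory BAR (the dword at config offset 0x20 in the dumps is 0xe000000c, where bit 3 means prefetchable and bits 2:1 = 10b mean 64-bit). The device sits behind 00:1c.4, its parent in the sysfs path above, so BAR 4 is reached through that bridge's prefetchable window. As far as I can tell from the commit title and this bug, the fix makes the PCI core write the bridge's prefetch window registers back on resume even when they already read back with the saved values. The toy model below assumes exactly that; it is not the kernel implementation, and `BridgeConfig`, `restore_range()` and `restore_bridge()` are hypothetical names.

```python
class BridgeConfig:
    """Toy model of a type 1 (bridge) config header as 16 dwords, logging writes."""

    def __init__(self, dwords):
        self.dwords = list(dwords)
        self.writes = []

    def read(self, index):
        return self.dwords[index]

    def write(self, index, value):
        self.dwords[index] = value
        self.writes.append(index)


def restore_range(dev, saved, start, end, force):
    """Write saved dwords end..start back. Without `force`, a dword that already
    reads back with its saved value is skipped."""
    for index in range(end, start - 1, -1):
        if force or dev.read(index) != saved[index]:
            dev.write(index, saved[index])


def restore_bridge(dev, saved, force_prefetch):
    # Dwords 9-11 (offsets 0x24-0x2f) hold the prefetchable memory window:
    # base/limit, base upper 32 bits, limit upper 32 bits.
    restore_range(dev, saved, 12, 15, force=False)
    restore_range(dev, saved, 9, 11, force=force_prefetch)
    restore_range(dev, saved, 0, 8, force=False)


if __name__ == "__main__":
    saved = [0x1000 + i for i in range(16)]   # arbitrary example values
    after_resume = BridgeConfig(saved)        # everything reads back "correct"
    restore_bridge(after_resume, saved, force_prefetch=False)
    print("without force:", after_resume.writes)  # nothing is rewritten
    restore_bridge(after_resume, saved, force_prefetch=True)
    print("with force:", after_resume.writes)     # dwords 11, 10, 9 are rewritten
```

The point of the model is the `force` flag: a compare-before-write restore never touches registers that look correct, which would not help if the hardware needs the write itself.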
Comment 2 jian-hong 2018-09-19 09:19:05 UTC
Created attachment 278657 [details]
dmesg log

I tested on Linux kernel 4.19-rc4 with the commit "PCI: Reprogram bridge prefetch registers on resume" applied and the commit "r8169: don't use MSI-X on RTL8106e" reverted.

The Ethernet controller now uses MSI-X interrupts:

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
	Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at e000 [size=256]
	Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
	Memory at e0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Kernel driver in use: r8169
	Kernel modules: r8169
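Besides `lspci -v`, sysfs shows which interrupt mode the driver actually got: each file in /sys/bus/pci/devices/<BDF>/msi_irqs/ is named after an allocated IRQ number and contains "msi" or "msix". A minimal sketch for checking this before suspend and after resume; `interrupt_modes()` is a hypothetical helper, and it assumes the directory is absent when no MSI/MSI-X vectors are allocated.

```python
import sys
from pathlib import Path


def interrupt_modes(dev_dir):
    """Return the set of modes ("msi" or "msix") listed in <dev_dir>/msi_irqs.
    An empty set means no MSI/MSI-X vectors are allocated (legacy INTx)."""
    msi_dir = Path(dev_dir) / "msi_irqs"
    if not msi_dir.is_dir():
        return set()
    # Each file is named after an IRQ number and contains the mode string.
    return {entry.read_text().strip() for entry in msi_dir.iterdir()}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Argument: a device directory, e.g. /sys/bus/pci/devices/0000:02:00.0
    print(interrupt_modes(sys.argv[1]))
```

For this adapter at 0000:02:00.0, it should return {"msix"} on the patched kernel above and {"msi"} with the RTL8106e workaround in place.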

The ethernet adapter works fine before suspend and after resume.

The following attachments show the status of the Ethernet adapter before suspend and after resume.
Comment 3 jian-hong 2018-09-19 09:20:17 UTC
Created attachment 278659 [details]
RTL8106e ethernet adapter status before suspend
Comment 4 jian-hong 2018-09-19 09:21:12 UTC
Created attachment 278661 [details]
RTL8106e ethernet adapter status after resume
Comment 5 jian-hong 2018-09-19 09:24:04 UTC
The difference between before suspend (attachment 278659 [details]) and after resume (attachment 278661 [details]):

dev@endless:~$ diff -Naru before\ suspend.txt after\ resume.txt 
--- "before suspend.txt"	2018-09-19 16:24:01.726978637 +0800
+++ "after resume.txt"	2018-09-19 16:24:58.655313698 +0800
@@ -21,14 +21,14 @@
 /sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
 Target offset is 0x0, page size is 4096
 mmap(0, 16384, 0x3, 0x1, 3, 0x0)
-PCI Memory mapped to address 0x7fc4f7c1b000.
+PCI Memory mapped to address 0x7f5ec6f78000.
 0x0000: 0x04
 0x0001: 0x40
 0x0002: 0xE0
 0x0003: 0xFE
 0x0004: 0x00
 ...
-0x0008: 0x24
+0x0008: 0x21
 0x0009: 0x40
 0x000A: 0x00
 ...
@@ -83,7 +83,7 @@
 0x1003: 0xFE
 0x1004: 0x00
 ...
-0x1008: 0x24
+0x1008: 0x21
 0x1009: 0x40
 0x100A: 0x00
 ...
@@ -138,7 +138,7 @@
 0x2003: 0xFE
 0x2004: 0x00
 ...
-0x2008: 0x24
+0x2008: 0x21
 0x2009: 0x40
 0x200A: 0x00
 ...
@@ -193,7 +193,7 @@
 0x3003: 0xFE
 0x3004: 0x00
 ...
-0x3008: 0x24
+0x3008: 0x21
 0x3009: 0x40
 0x300A: 0x00
 ...

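After resume, the system now reads real values from BAR 4 of the Ethernet adapter instead of 0xFF. The only differences are the bytes at 0x0008, 0x1008, 0x2008 and 0x3008 (0x24 before suspend, 0x21 after resume).

So the Ethernet adapter works correctly after the system resumes.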
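Assuming offsets 0x0000-0x000F of these dumps are MSI-X table entry 0 (the MSI-X capability's Table Offset/BIR dword at config offset 0xb4 is 0x00000004, i.e. BAR 4, offset 0), each 16-byte entry holds Message Address low, Message Address high, Message Data and Vector Control as little-endian dwords. The sketch decodes one entry; the bytes hidden behind "..." in the diff are taken as zero purely for illustration, and `decode_msix_entry()` is a hypothetical helper. Read this way, the address stays 0xfee04004 and only the low byte of Message Data changes (0x24 to 0x21). On x86 that byte is the interrupt vector, so the difference presumably reflects a vector reassignment across suspend/resume rather than a fault.

```python
import struct

MSIX_ENTRY_SIZE = 16  # address low, address high, message data, vector control


def decode_msix_entry(table, index):
    """Decode MSI-X table entry `index` from raw little-endian table bytes."""
    addr_lo, addr_hi, data, vctrl = struct.unpack_from("<IIII", table, index * MSIX_ENTRY_SIZE)
    return {
        "address": (addr_hi << 32) | addr_lo,
        "data": data,
        "vector": data & 0xFF,        # on x86, bits 7:0 of Message Data are the vector
        "masked": bool(vctrl & 0x1),  # Vector Control bit 0 = per-vector mask
    }


if __name__ == "__main__":
    # Entry 0 as visible in the diff; bytes elided by "..." are assumed to be zero.
    before = bytes([0x04, 0x40, 0xE0, 0xFE, 0, 0, 0, 0, 0x24, 0x40, 0, 0, 0, 0, 0, 0])
    after = before[:8] + bytes([0x21]) + before[9:]
    for label, raw in (("before", before), ("after", after)):
        e = decode_msix_entry(raw, 0)
        print(f"{label}: address={e['address']:#x} vector={e['vector']:#x} masked={e['masked']}")
```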
Comment 6 jian-hong 2018-09-19 09:25:32 UTC
I am going to make a patch that reverts the commit "r8169: don't use MSI-X on RTL8106e", to go on top of the commit "PCI: Reprogram bridge prefetch registers on resume".
Comment 7 Heiner Kallweit 2018-09-19 10:54:30 UTC
Thanks for adding me. I saw this PCI bridge fix too, and its commit message mentions that it also fixes the r8169 MSI-X issue for a few users.
So I think we can safely assume that both MSI-X-related workarounds can be reverted (including "r8169: don't use MSI-X on RTL8168g").
Comment 8 jian-hong 2018-09-20 03:16:52 UTC
Created attachment 278675 [details]
r8169-Enable-MSI-X-on-RTL8106e.patch

This patch needs the commit 04cb3ae895d7efdc60f0fe17182b200a3da20f09 "PCI: Reprogram bridge prefetch registers on resume" by Daniel Drake.

https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=pci/enumeration&id=04cb3ae895d7efdc60f0fe17182b200a3da20f09
Comment 9 jian-hong 2018-09-20 03:27:17 UTC
(In reply to Heiner Kallweit from comment #7)
> Thanks for adding me, I saw this PCI bridge fix too and that its commit
> message mentions it fixes also the r8169 MSI-X issue for few users.
> So I think we can safely assume that we can effectively revert both
> MSI-X-related workarounds (also "r8169: don't use MSI-X on RTL8168g").

Maybe Steve and Lou can provide test results for RTL8168g.
Comment 10 jian-hong 2018-09-20 03:34:17 UTC
Hi Steve and Lou,

Could you help test the patch from Comment 8?

First, apply the commit 04cb3ae895d7efdc60f0fe17182b200a3da20f09 "PCI: Reprogram bridge prefetch registers on resume" by Daniel Drake.

Then, apply the patch in attachment 278675 [details].

Finally, revert the commit 7c53a722459c1d6ffb0f5b2058c06ca8980b8600 "r8169: don't use MSI-X on RTL8168g".

If the ethernet adapter works fine before suspend and after resume, then the commit "PCI: Reprogram bridge prefetch registers on resume" also fixes the RTL8168g issue.
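The three steps can be written down as git commands. A sketch only: it assumes a kernel git tree as the working directory, that the pci/enumeration branch linked above still carries the PCI commit (it may have been rebased or merged since), and that the attachment is in `git format-patch` form so `git am` accepts it (use `git apply` otherwise). `msix_test_plan()` and `run()` are hypothetical helpers, and `run()` only prints unless `dry_run=False`.

```python
import subprocess

PCI_TREE = "https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git"
PCI_FIX = "04cb3ae895d7efdc60f0fe17182b200a3da20f09"  # PCI: Reprogram bridge prefetch registers on resume
RTL8168G_WORKAROUND = "7c53a722459c1d6ffb0f5b2058c06ca8980b8600"  # r8169: don't use MSI-X on RTL8168g
PATCH = "r8169-Enable-MSI-X-on-RTL8106e.patch"  # attachment 278675


def msix_test_plan(patch_file=PATCH):
    """The steps from this comment as git commands, in order."""
    return [
        ["git", "fetch", PCI_TREE, "pci/enumeration"],  # make PCI_FIX available locally
        ["git", "cherry-pick", PCI_FIX],
        ["git", "am", patch_file],
        ["git", "revert", "--no-edit", RTL8168G_WORKAROUND],
    ]


def run(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(msix_test_plan())
```

After building and booting the result, test networking before suspend and after resume as described above.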
Comment 11 Steve Dodd 2018-09-20 09:43:36 UTC
I'm a bit groggy from medication this morning so not 100% sure I have patched and built correctly, but it *seems* Daniel Drake's patch does the trick :)
Comment 12 jian-hong 2018-09-21 09:16:22 UTC
(In reply to Steve Dodd from comment #11)
> I'm a bit groggy from medication this morning so not 100% sure I have
> patched and built correctly, but it *seems* Daniel Drake's patch does the
> trick :)

Uhm, great!!! It seems we can remove the code that falls back to MSI interrupts for RTL8168g and RTL8106e once the commit "PCI: Reprogram bridge prefetch registers on resume" is in.
Comment 13 Steve Dodd 2018-09-21 09:45:17 UTC
As I said, not 100% sure - health problems flaring at the moment and
struggling to concentrate. Maybe Jan could confirm, or someone on the
Ubuntu side could roll me a .deb just to be sure?

Comment 14 Heiner Kallweit 2018-09-26 11:47:08 UTC
(In reply to jian-hong from comment #12)
> (In reply to Steve Dodd from comment #11)
> > I'm a bit groggy from medication this morning so not 100% sure I have
> > patched and built correctly, but it *seems* Daniel Drake's patch does the
> > trick :)
> 
> Uhm, great!!! It seems we can remove the falling back to MSI interrupt codes
> for RTL8168g and RTL8106e after the commit "PCI: Reprogram bridge prefetch
> registers on resume".

The PCI bridge fix is included in the latest linux-next, so the MSI-X workarounds in r8169 can be removed. Are you going to submit the change?
Comment 15 jian-hong 2018-09-26 13:10:43 UTC
(In reply to Heiner Kallweit from comment #14)

> The PCI bridge fix is included in latest linux-next. So the MSI-X
> workarounds in r8169 can be removed. Are you going to submit the change?

Sure! Going to submit attachment 278675 [details] tomorrow.

But I still can't fully confirm the RTL8168g part, because of comment 13. Still, thanks to Steve for testing. Any ideas for RTL8168g?
Comment 16 Heiner Kallweit 2018-10-19 22:37:23 UTC
RTL8168g is now back to MSI-X too (applied to the net stable tree), so I think this bug can be closed.
Comment 17 Lou Reed 2018-12-05 07:35:08 UTC
(In reply to jian-hong from comment #15)
(In reply to Heiner Kallweit from comment #16)
Works for me too, thanks.