Bug 204343 - r8169: r8169 oops since kernel 5.3-rc1
Summary: r8169: r8169 oops since kernel 5.3-rc1
Status: NEW
Alias: None
Product: Networking
Classification: Unclassified
Component: Wireless
Hardware: Intel Linux
Importance: P1 high
Assignee: networking_wireless@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-27 14:25 UTC by Masanari Iida
Modified: 2020-11-29 14:00 UTC
CC List: 13 users

See Also:
Kernel Version: 5.3-rc1
Tree: Mainline
Regression: No


Attachments
in reply to #3 dmesg lines, net kconfig and net lsmod (8.82 KB, text/plain)
2019-10-09 10:36 UTC, enometh

Description Masanari Iida 2019-07-27 14:25:31 UTC
Summary
Starting from kernel 5.3-rc1, the r8169 driver oopses during OS boot.
As a result I cannot use the network.

r8169 driver in kernel 5.2 works fine without this oops.
This symptom reproduces on my notebook every time I reboot the system.

lspci output
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c)
	Subsystem: Hewlett-Packard Company Device [103c:1946]
	Flags: bus master, fast devsel, latency 0, IRQ 18
	I/O ports at 3000 [size=256]
	Memory at d0600000 (64-bit, non-prefetchable) [size=4K]
	Memory at d0400000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-68-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Kernel driver in use: r8169
	Kernel modules: r8169

dmesg
[   14.135621] Generic PHY r8169-300:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
[   14.143239] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   14.144722] #PF: supervisor instruction fetch in kernel mode
[   14.146146] #PF: error_code(0x0010) - not-present page
[   14.147540] PGD 0 P4D 0
[   14.148523] Oops: 0010 [#1] SMP PTI
[   14.149325] CPU: 0 PID: 1090 Comm: NetworkManager Not tainted 5.3.0-rc1+ #1
[   14.150147] Hardware name: Hewlett-Packard HP ProBook 430 G1/1946, BIOS L73 Ver. 01.11 04/29/2014
[   14.150999] RIP: 0010:0x0
[   14.151843] Code: Bad RIP value.
[   14.152696] RSP: 0018:ffffb2ed805e3558 EFLAGS: 00010246
[   14.153558] RAX: ffffffff905959a0 RBX: ffff9d83dffb9800 RCX: 0000000000000000
[   14.154441] RDX: ffff9d83f1590000 RSI: 0000000000000000 RDI: ffff9d83dffb9800
[   14.155316] RBP: ffff9d83ef802000 R08: 0000000000027c40 R09: 0000000000027c00
[   14.156205] R10: ffffb2ed806c3988 R11: 0000000000000000 R12: 0000000000000a46
[   14.157101] R13: 0000000000000800 R14: ffff9d83f129a000 R15: ffff9d83ef8028c0
[   14.158008] FS:  00007ffff7f0d980(0000) GS:ffff9d83f3800000(0000) knlGS:0000000000000000
[   14.158934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.159859] CR2: ffffffffffffffd6 CR3: 000000032755a001 CR4: 00000000001606f0
[   14.160798] Call Trace:
[   14.161735]  phy_select_page+0x2f/0x60
[   14.162683]  phy_read_paged+0x14/0x50
[   14.163636]  rtl8168g_1_hw_phy_config+0x2d/0x250 [r8169]
[   14.164585]  rtl8169_init_phy+0x29/0xb0 [r8169]
[   14.165531]  rtl_open+0x3e4/0x620 [r8169]
[   14.166471]  __dev_open+0xb3/0x140
[   14.167400]  __dev_change_flags+0x191/0x1d0
[   14.168322]  dev_change_flags+0x23/0x60
[   14.169251]  do_setlink+0x324/0xcd0
[   14.170177]  ? kmem_cache_alloc+0x35/0x220
[   14.171099]  ? __nla_validate_parse+0x45/0x8c0
[   14.172014]  __rtnl_newlink+0x508/0x7e0
[   14.172920]  ? __nla_reserve+0x38/0x50
[   14.173832]  ? __kmalloc_node_track_caller+0x58/0x300
[   14.174752]  ? __kmalloc_reserve.isra.53+0x2e/0x80
[   14.175673]  ? sk_filter_trim_cap+0x35/0x2e0
[   14.176594]  ? skb_queue_tail+0x1b/0x50
[   14.177509]  ? __netlink_sendskb+0x48/0x60
[   14.178423]  ? netlink_unicast+0x1eb/0x220
[   14.179320]  rtnl_newlink+0x47/0x70
[   14.180211]  ? ns_capable_common+0x27/0x50
[   14.181104]  rtnetlink_rcv_msg+0x25e/0x340
[   14.181991]  ? _cond_resched+0x16/0x40
[   14.182870]  ? __kmalloc_node_track_caller+0x58/0x300
[   14.183755]  ? rtnl_calcit.isra.27+0x100/0x100
[   14.184635]  netlink_rcv_skb+0xbf/0xe0
[   14.185512]  netlink_unicast+0x174/0x220
[   14.186366]  netlink_sendmsg+0x2b7/0x3b0
[   14.187197]  sock_sendmsg+0x30/0x40
[   14.188001]  ___sys_sendmsg+0x2c7/0x2d0
[   14.188795]  ? proc_get_long.constprop.14+0x12f/0x1c0
[   14.189590]  ? _copy_from_user+0x31/0x60
[   14.190371]  ? do_proc_douintvec_minmax_conv+0x50/0x50
[   14.191166]  ? __sys_sendmsg+0x53/0x90
[   14.191946]  __sys_sendmsg+0x53/0x90
[   14.192752]  do_syscall_64+0x4f/0x190
[   14.193535]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   14.194336] RIP: 0033:0x7ffff69b42ed
[   14.195140] Code: b0 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[   14.196871] RSP: 002b:00007fffffffe540 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[   14.197778] RAX: ffffffffffffffda RBX: 0000555555aa0a40 RCX: 00007ffff69b42ed
[   14.198737] RDX: 0000000000000000 RSI: 00007fffffffe5d0 RDI: 000000000000000c
[   14.199636] RBP: 0000555555aa0950 R08: 0000000000000020 R09: 0000555555b35a40
[   14.200528] R10: 0000555555b35a40 R11: 0000000000000293 R12: 0000555555b35990
[   14.201415] R13: 00007fffffffe5d0 R14: 00007fffffffe744 R15: 0000555555b35990
[   14.202285] Modules linked in: nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bpfilter cmac bnep intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt intel_cstate snd_hda_codec_idt btusb iTCO_vendor_support snd_hda_codec_generic uvcvideo btrtl snd_hda_codec_hdmi btbcm ledtrig_audio btintel snd_hda_intel intel_uncore snd_hda_codec videobuf2_vmalloc bluetooth intel_rapl_perf videobuf2_memops snd_hda_core videobuf2_v4l2 snd_hwdep snd_seq videobuf2_common ecdh_generic hp_wmi sparse_keymap videodev snd_seq_device ecc joydev pcspkr snd_pcm mc rfkill wmi_bmof snd_timer lpc_ich mfd_core snd mei_me soundcore mei hp_accel lis3lv02d nfsd input_polldev auth_rpcgss hp_wireless nfs_acl lockd grace binfmt_misc sunrpc i915 i2c_algo_bit drm_kms_helper drm crc32c_intel serio_raw r8169 wmi video
[   14.208656] CR2: 0000000000000000
[   14.209825] ---[ end trace ff67c8031e86ac1e ]---
[   14.210991] RIP: 0010:0x0
[   14.212158] Code: Bad RIP value.
[   14.213257] RSP: 0018:ffffb2ed805e3558 EFLAGS: 00010246
[   14.214427] RAX: ffffffff905959a0 RBX: ffff9d83dffb9800 RCX: 0000000000000000
[   14.215570] RDX: ffff9d83f1590000 RSI: 0000000000000000 RDI: ffff9d83dffb9800
[   14.216743] RBP: ffff9d83ef802000 R08: 0000000000027c40 R09: 0000000000027c00
[   14.217927] R10: ffffb2ed806c3988 R11: 0000000000000000 R12: 0000000000000a46
[   14.219072] R13: 0000000000000800 R14: ffff9d83f129a000 R15: ffff9d83ef8028c0
[   14.220254] FS:  00007ffff7f0d980(0000) GS:ffff9d83f3800000(0000) knlGS:0000000000000000
[   14.221465] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.222646] CR2: ffffffffffffffd6 CR3: 000000032755a001 CR4: 00000000001606f0
[   16.087844] fuse: init (API version 7.31)

It seems the r8169 driver had major changes between kernel 5.2 and 5.3.
Comment 1 Heiner Kallweit 2019-07-28 15:04:06 UTC
Generic PHY r8169-300:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)

This indicates that the Realtek PHY driver is missing (the line should show "Generic Realtek PHY"); the genphy driver doesn't know how to handle Realtek paging. Please build phylib and the Realtek PHY driver and check again.
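
As an illustration of the check described above, here is a minimal sketch that extracts the PHY driver name from an "attached PHY driver" dmesg line and flags the generic fallback. The function names are hypothetical and the regular expression is only an assumption based on the log lines quoted in this report.

```python
import re

# Matches the phylib attach message as quoted in this report, e.g.
# "... attached PHY driver [Generic PHY] (mii_bus:phy_addr=...)"
ATTACH_RE = re.compile(r"attached PHY driver \[([^\]]+)\]")


def attached_phy_driver(dmesg_line):
    """Return the PHY driver name from an attach line, or None if absent."""
    match = ATTACH_RE.search(dmesg_line)
    return match.group(1) if match else None


def uses_generic_phy(dmesg_line):
    """True if the line shows the genphy fallback instead of a dedicated driver."""
    return attached_phy_driver(dmesg_line) == "Generic PHY"
```

Feeding it the first dmesg line from the description would flag the generic driver, which is the condition that precedes the oops in that log.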
Comment 2 enometh 2019-10-09 02:32:09 UTC
I have the same problem and I had already enabled the Realtek PHY in phylib:
CONFIG_PHYLIB=m
CONFIG_REALTEK_PHY=m
CONFIG_GENERIC_PHY=m

The card is         Subsystem: Micro-Star International Co., Ltd. [MSI] RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller

Both 5.3.4 and earlier kernels only report Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)

Is there something else that now needs to be included to use the
Realtek PHY driver?
Comment 3 Heiner Kallweit 2019-10-09 07:29:37 UTC
(In reply to enometh from comment #2)
> I have the same problem and I had already enabled the realtek phy in phylib
> CONFIG_PHYLIB=m
> CONFIG_REALTEK_PHY=m
> CONFIG_GENERIC_PHY=m
> 
> The card is         Subsystem: Micro-Star International Co., Ltd. [MSI]
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> 
There are dozens of different ones. Please add dmesg line with "XID".

> Both 5.3.4 and earlier kernels only report Generic PHY r8169-100:00:
> attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> 
> Is there something else that now needs to be included to use the
> realtek PHY driver?

Please attach kernel config and lsmod output.
Comment 4 enometh 2019-10-09 10:36:33 UTC
Created attachment 285417 [details]
in reply to #3 dmesg lines, net kconfig and net lsmod

Sorry for neglecting to mention that info.
I hope you didn't want me to attach the whole kernel config and lsmod
output and just wanted the `net' specific parts:
Comment 5 Heiner Kallweit 2019-10-09 10:55:36 UTC
I see no "realtek" in your lsmod output, so the PHY driver module doesn't seem to be loaded. Is there a realtek.ko in your modules directory?
What's the reported PHY ID (look for phy_id property in sysfs)?
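
A small sketch of the lsmod check asked for above, assuming the usual three-or-four column lsmod layout (name, size, use count, users) seen elsewhere in this report. The helper names are hypothetical.

```python
def loaded_modules(lsmod_output):
    """Map each module name to the list of modules that use it."""
    modules = {}
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = [user for user in users if user]
    return modules


def realtek_phy_loaded(lsmod_output):
    """True if the realtek PHY driver module appears in the lsmod text."""
    return "realtek" in loaded_modules(lsmod_output)
```

The text would typically come from running lsmod and capturing its output; how it is captured is left to the caller.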
Comment 6 enometh 2019-10-09 11:45:41 UTC
(In reply to Heiner Kallweit from comment #5)
> I see no "realtek" in your lsmod output, so the PHY driver module doesn't
> seem to be loaded. Is there a realtek.ko in your modules directory?
> What's the reported PHY ID (look for phy_id property in sysfs)?

I see there is a realtek.ko, but it isn't getting loaded.
If I ensure (via a modprobe softdep) that it gets loaded before
r8169, then I avoid the oops and lsmod looks like this:

r8169                  81920  0
libphy                 81920  2 r8169,realtek

In this situation the phy_id  is 0x001cc800

I think my problem is solved. Thank you. But perhaps depmod should
figure this out?
Comment 7 Heiner Kallweit 2019-10-09 12:37:50 UTC
Actually two things should take care that the Realtek PHY driver module gets loaded:

1. phylib when probing the PHY (based on PHY ID)
2. r8169 driver uses the following to ensure PHY driver gets loaded before:
   MODULE_SOFTDEP("pre: realtek")

Both code parts haven't changed for quite some time, therefore I think the root cause is not in the network driver but in the module loading subsystem.
I'll check whether I find any info about known issues/bugs with depmod et al.
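
To make the softdep string concrete, here is a minimal sketch that splits a spec such as the "pre: realtek" shown above into "pre" and "post" module lists. It assumes the "pre: ... post: ..." token layout; the function name is hypothetical, and the sketch only parses the string. Whether a given module loader or initramfs generator honors the hint is a separate question.

```python
def parse_softdep(spec):
    """Split a softdep spec like 'pre: realtek' into pre/post module lists."""
    deps = {"pre": [], "post": []}
    current = None
    for token in spec.split():
        if token in ("pre:", "post:"):
            # Switch the list that subsequent module names are appended to.
            current = token[:-1]
        elif current is not None:
            deps[current].append(token)
    return deps
```

For the r8169 case this yields a single "pre" entry, meaning realtek is expected to be loaded first.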
Comment 8 Heiner Kallweit 2019-10-10 18:23:30 UTC
In addition: which depmod version do you have?
It would be very helpful if you could bisect the issue. I can't reproduce it on my systems.
Comment 9 Rick Moss 2019-10-13 15:01:42 UTC
Same issue with any kernel from 5.3 onward; 5.2 and 4.19 work (Gentoo vanilla sources).

depmod -V
kmod version 25
-XZ +ZLIB -EXPERIMENTAL
Comment 10 Heiner Kallweit 2019-10-13 16:01:36 UTC
I have kmod v26 and can't reproduce the issue. Could you upgrade your kmod package and re-test?
Comment 11 Heiner Kallweit 2019-10-14 18:19:48 UTC
The maintainers of the module subsystem see a possible link to recent changes and ask to test the following:
Use latest 5.4-rc kernel and apply following patches on top:
https://lore.kernel.org/linux-modules/20191010151443.7399-1-maennich@google.com/
Comment 12 François Valenduc 2019-12-01 17:32:03 UTC
It seems these patches are included in linux 5.4 and I still get the same error.
Comment 13 Bernd Raschke 2019-12-08 12:40:54 UTC
(In reply to François Valenduc from comment #12)
> It seems these patches are included in linux 5.4 and I still get the same
> error.

I can confirm, I also see this since 5.3.x and now with 5.4.2-gentoo and Gentoo's sys-apps/kmod-26-r3.

Dec  8 12:57:45 borg kernel: Generic PHY r8169-300:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)

# lspci
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 11)
# lspci -n
03:00.0 0200: 10ec:8168 (rev 11)
Comment 14 Heiner Kallweit 2019-12-08 18:50:38 UTC
The same symptoms have occurred due to dracut not properly considering softdeps. Do you have r8169 in your initramfs?
Comment 15 François Valenduc 2019-12-08 20:35:11 UTC
I am using genkernel with gentoo and r8169 is indeed included in the initramfs. It seems that genkernel is not smart enough to also include the realtek module. Forcing its inclusion also solves the problem. Another solution is to add a file in /etc/modprobe.d with this line:
softdep r8169 pre: realtek

I do not really understand why it works with r8169 alone in the initramfs and the softdep indicated above, but anyway there are several workarounds for this problem.
Comment 16 Bernd Raschke 2019-12-09 16:05:10 UTC
(In reply to François Valenduc from comment #15)
> I am using genkernel with gentoo and r8169 is indeed included in the
> initramfs. It seems that genkernel is not smart enough to also include the
> realtek module. Forcing its inclusion also solves the problem.

Also solved it for me, adding the realtek module to genkernel's list adds realtek.ko to the initramfs and 5.4.2 boots without oopsies.
Comment 17 Thomas Deutschmann 2019-12-14 18:35:42 UTC
(In reply to François Valenduc from comment #15)
> I am using genkernel with gentoo and r8169 is indeed included in the
> initramfs. It seems that genkernel is not smart enough to also include the
> realtek module.

That is just because the r8169 module declares a _softdep_ and not a _dep_. Kernel documentation is clear about this:

> The softdep command allows you to specify soft, or optional,
> module dependencies.  modulename can be used without these
> optional modules installed, but usually with some features
> missing.

So if you get an oops without the realtek module, it's obviously not a softdep; it's required. For Gentoo we will update genkernel and add a workaround to always force the realtek module for now, but the r8169 maintainer should either fix the oops, if the realtek module really is just a softdep, or set appropriate deps.
Comment 18 Randomtsk 2020-01-22 14:52:22 UTC
Can you clarify where this is being tracked upstream? It most certainly still happens. I'm not sure the realtek module has anything to do with it, though, as I can trigger it (at least I think so) as soon as something tries to bring the link up. At first I thought it was NetworkManager, but I can boot into single-user mode and do everything up to the point of "ip link set foo up".

Full disclosure - new account, hopefully doesn't trigger any spam warnings.  Spent a few hours trying to figure this out.. To see others reporting the same thing is reassuring but a bit disturbing there's no real progress to resolution.
Comment 19 Heiner Kallweit 2020-01-22 15:33:53 UTC
(In reply to Randomtsk from comment #18)
> Can you clarify where this is being tracked upstream?  It most certainly
> still happens, not sure the realtek has anything to do with it though as I
> can trigger it (at least I think it), as soon as something tries to bring
> the link up.  At first I thought it was network manager but I can boot into
> single user and do everything up to the point of ip link set foo up.
> 
If you still face the issue then first check what has been discussed here. If r8169 is in your initramfs, make sure realtek is as well. If you see in dmesg that the generic PHY driver is loaded, then there's something wrong.

See also upstream commit f325937735498afb054a0195291bbf68d0b60be5 ("r8169: check that Realtek PHY driver module is loaded").
Comment 20 Randomtsk 2020-01-22 16:08:00 UTC
Kernel tested: 5.4.6
Board: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F5 11/05/2014

I don't see that message on newer / borked kernels.  I do however see realtek.ko in initrd

lsinitrd  /boot/initramfs-5.4.6-1.el7.elrepo.x86_64.img |grep -i realte
drwxr-xr-x   2 root     root            0 Jan 22 04:19 usr/lib/modules/5.4.6-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek
-rwxr--r--   1 root     root       131200 Jan 22 04:19 usr/lib/modules/5.4.6-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek/r8169.ko


Not entirely sure what all to include from dmesg

Jan 22 04:25:14 kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
Jan 22 04:25:14 kernel: r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
Jan 22 04:25:14 kernel: r8169 0000:03:00.0 eth0: RTL8168g/8111g at 0xffffc900018da000, fc:aa:14:76:fd:d8, XID 0c000880 IRQ 29
Jan 22 04:25:14 kernel: r8169 0000:03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Comment 21 Heiner Kallweit 2020-01-22 16:11:12 UTC
(In reply to Randomtsk from comment #20)
> Kernel tested: 5.4.6
> Board: Gigabyte Technology Co., Ltd. H97N-WIFI/H97N-WIFI, BIOS F5 11/05/2014
> 
> I don't see that message on newer / borked kernels.  I do however see
> realtek.ko in initrd
> 
> lsinitrd  /boot/initramfs-5.4.6-1.el7.elrepo.x86_64.img |grep -i realte
> drwxr-xr-x   2 root     root            0 Jan 22 04:19
> usr/lib/modules/5.4.6-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek
> -rwxr--r--   1 root     root       131200 Jan 22 04:19
> usr/lib/modules/5.4.6-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/
> realtek/r8169.ko
> 
> 
> Not entirely sure what all to include from dmesg
> 
> Jan 22 04:25:14 kernel: r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded

This log is not from a 5.4 kernel. This message has been removed long ago.

> Jan 22 04:25:14 kernel: r8169 0000:03:00.0: can't disable ASPM; OS doesn't
> have ASPM control
> Jan 22 04:25:14 kernel: r8169 0000:03:00.0 eth0: RTL8168g/8111g at
> 0xffffc900018da000, fc:aa:14:76:fd:d8, XID 0c000880 IRQ 29

Same here, this "at <address>" was removed long ago.

> Jan 22 04:25:14 kernel: r8169 0000:03:00.0 eth0: jumbo features [frames:
> 9200 bytes, tx checksumming: ko]
Comment 22 Randomtsk 2020-01-22 16:31:39 UTC
I just realized that might not be clear: there is no realtek.ko in the initramfs. However, there isn't one in the 4.4.206 initramfs either, and that kernel works (as does everything up to 4.20).

# lsinitrd  /boot/initramfs-4.4.206-1.el7.elrepo.x86_64.img |grep -i realte
drwxr-xr-x   2 root     root            0 Jan 22 07:37 usr/lib/modules/4.4.206-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek
-rwxr--r--   1 root     root       145472 Jan 22 07:37 usr/lib/modules/4.4.206-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek/r8169.ko


# find /lib/modules/|grep phy/realtek.ko
/lib/modules/3.10.0-957.el7.x86_64/kernel/drivers/net/phy/realtek.ko.xz
/lib/modules/3.10.0-1062.9.1.el7.x86_64/kernel/drivers/net/phy/realtek.ko.xz
/lib/modules/5.4.5-1.el7.elrepo.x86_64/kernel/drivers/net/phy/realtek.ko
/lib/modules/4.4.206-1.el7.elrepo.x86_64/kernel/drivers/net/phy/realtek.ko
/lib/modules/5.4.6-1.el7.elrepo.x86_64/kernel/drivers/net/phy/realtek.ko
/lib/modules/4.20.0+/kernel/drivers/net/phy/realtek.ko



This is from 5.4.6-1:
Jan 21 18:07:35 kernel: libphy: r8169: probed
Jan 21 18:07:35 kernel: r8169 0000:03:00.0 eth0: RTL8168g/8111g, fc:aa:14:76:ff:aa, XID 4c0, IRQ 35
Jan 21 18:07:35  kernel: r8169 0000:03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
Jan 21 18:07:35 kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0


There's a whole lot of spammy garbage but nothing else from the driver.
Comment 23 Heiner Kallweit 2020-01-22 18:45:40 UTC
Quite some things have been changed since 4.4. The log snippet misses the most relevant line, when the PHY driver is attached.
Comment 24 Randomtsk 2020-01-22 19:39:38 UTC
Don't think it's missing, it doesn't happen.  Kernel panics.
Comment 25 Heiner Kallweit 2020-01-22 20:08:34 UTC
OK. Then just add realtek.ko to initramfs and re-test.
Kernel 5.6 will include an initial check in r8169 whether Realtek PHY drivers are available and bail out and warn if it doesn't find them. This avoids the panic.
Comment 26 Randomtsk 2020-01-23 02:10:02 UTC
Not sure we're on the same page.  realtek.ko isn't installed on versions that work either.  It is on the rootfs.
Comment 27 Heiner Kallweit 2020-01-24 13:43:12 UTC
(In reply to Randomtsk from comment #26)
> realtek.ko isn't installed on versions
> that work either.  It is on the rootfs.

In earlier versions the genphy driver was used when r8169.ko was in the initramfs but realtek.ko was not. This may have worked for some people and not for others (because all the version-specific PHY handling is missing). It is no longer sufficient to have realtek.ko on the rootfs only while r8169.ko is in the initramfs.
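
The condition described above (r8169.ko present in the initramfs, realtek.ko absent) can be checked mechanically on a file listing. The following is a minimal sketch, assuming the listing is text in which each line ends with a path, as in the lsinitrd and "cpio -t" output quoted in this report. The function names are hypothetical.

```python
import posixpath
import re


def modules_in_listing(listing):
    """Return the set of kernel module names found in an initramfs listing."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The path is the last column in "ls -l" style output.
        base = posixpath.basename(fields[-1])
        # Accept plain and compressed modules: foo.ko, foo.ko.xz, foo.ko.gz
        match = re.fullmatch(r"(.+?)\.ko(\.\w+)?", base)
        if match:
            names.add(match.group(1))
    return names


def initramfs_missing_phy_driver(listing):
    """True when r8169 is in the listing but the realtek PHY module is not."""
    mods = modules_in_listing(listing)
    return "r8169" in mods and "realtek" not in mods
```

Note that the directory drivers/net/ethernet/realtek does not count as the PHY module; only a file named realtek.ko (optionally compressed) does.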
Comment 28 Deweloper 2020-02-24 18:48:14 UTC
Hi,

I just upgraded the kernel from 5.5.3 to 5.5.6 and immediately got a critical regression - ethernet doesn't work at all.
Particularly "realtek.ko not loaded, maybe it needs to be added to initramfs?" message appears and r8169 driver refuses to work.
Up to 5.5.3 it used to work fine; there is no "Generic PHY" in my dmesg.

libphy: r8169: probed
r8169 0000:01:00.0 eth0: RTL8168d/8111d, xx:xx:xx:xx:xx:xx, XID 281, IRQ 15
r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
RTL8211B Gigabit Ethernet r8169-100:00: attached PHY driver [RTL8211B Gigabit Ethernet] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
r8169 0000:01:00.0 eth0: Link is Down
r8169 0000:01:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Can you please explain the solution applied in f325937735498afb054a0195291bbf68d0b60be5, because I don't get it...
1. Is r8169.ko able to work without realtek.ko, or not?
2. If not, then why is this called a softdep?
3. If yes, then why does its "probe" method return -ENOENT?
Comment 29 Heiner Kallweit 2020-02-24 19:41:47 UTC
The softdep refers to a different issue. r8169.ko meanwhile has a hard dependency on realtek.ko, but as of today the kernel doesn't support expressing a hard dependency that isn't a code dependency.

What would need to be checked in your case:
- Are realtek and r8169 both modules and/or built-in?
- Is either of the two modules in initramfs?
Comment 30 Deweloper 2020-02-24 20:10:06 UTC
Thanks for quick reply. In my case both are modules and both are in initramfs. They are loaded by mdev (when requested by kernel via netlink/uevent) using modprobe $MODALIAS. Both mdev and modprobe are part of busybox 1.31.1. It seems that the version of modutils included there ignores softdep. However, when I deleted just one line "return -ENOENT;" right after the new error message, my ethernet device started working again:

<3>[    3.291857] r8169 0000:01:00.0: realtek.ko not loaded, maybe it needs to be added to initramfs?
<6>[    3.415797] libphy: r8169: probed
<6>[    3.419810] r8169 0000:01:00.0 eth0: RTL8168d/8111d, 00:05:35:f0:d2:15, XID 281, IRQ 15
<6>[    3.419824] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
<6>[    5.808601] RTL8211B Gigabit Ethernet r8169-100:00: attached PHY driver [RTL8211B Gigabit Ethernet] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
<6>[    6.049065] r8169 0000:01:00.0 eth0: Link is Down
<6>[    8.878518] r8169 0000:01:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx

"lsmod | grep -e ^realtek -e ^r8169 -e ^libphy" shows
realtek                16384  1 
r8169                  57344  0 
libphy                 49152  2 realtek,r8169
Comment 31 Heiner Kallweit 2020-02-24 20:35:41 UTC
Thanks. Then:
1. Remove r8169 from initramfs if not needed to load rootfs. Or
2. Make sure realtek.ko is loaded from initramfs before r8169.ko.
Comment 32 Raffaello D. Di Napoli 2020-03-30 03:05:51 UTC
I think I’m seeing the same issue as Deweloper, upon upgrading from 5.2.14 to 5.5.13 (on Gentoo, using vanilla source).

$ lzop -d </boot/initramfs-5.2.14-dv7.cpio.lzo | cpio -t | grep -F realtek
lib/modules/5.2.14-dv7/kernel/drivers/net/phy/realtek.ko
lib/modules/5.2.14-dv7/kernel/drivers/net/ethernet/realtek
lib/modules/5.2.14-dv7/kernel/drivers/net/ethernet/realtek/r8169.ko
$ lzop -d </boot/initramfs-5.5.13-dv7.cpio.lzo | cpio -t | grep -F realtek
lib/modules/5.5.13-dv7/kernel/drivers/net/phy/realtek.ko
lib/modules/5.5.13-dv7/kernel/drivers/net/ethernet/realtek
lib/modules/5.5.13-dv7/kernel/drivers/net/ethernet/realtek/r8169.ko

Initramfs contents did not change, as you can see. However with 5.5.13 I get this in dmesg:

[    2.440967] r8169 0000:01:00.0: realtek.ko not loaded, maybe it needs to be added to initramfs?
[    2.442368] r8169: probe of 0000:01:00.0 failed with error -2

Once booted, I can see realtek.ko was not loaded:

$ lsmod | awk '$1~/^(realtek|r8169|libphy)/'
r8169                  86016  0
libphy                 69632  1 r8169

To confirm that it’s a loading order issue as suggested by Heiner, I run:

$ sudo rmmod r8169
$ sudo modprobe realtek
$ sudo modprobe r8169

After which I see in dmesg:

[ 3453.158817] r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 3453.160787] libphy: r8169: probed
[ 3453.161098] r8169 0000:01:00.0 eth0: RTL8168evl/8111evl, 08:2e:5f:8b:0d:07, XID 2c9, IRQ 42
[ 3453.161101] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[ 3453.161469] systemd-udevd[5983]: Using default interface naming scheme 'v243'.
[ 3453.213144] RTL8211E Gigabit Ethernet r8169-100:00: attached PHY driver [RTL8211E Gigabit Ethernet] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
[ 3453.399346] r8169 0000:01:00.0 eth0: Link is Down
[ 3456.124471] r8169 0000:01:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 3456.124499] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

As for the options in comment #31, neither is viable in my case. Is there anything that can be done so that r8169 is only loaded after realtek is loaded?
Comment 33 Raffaello D. Di Napoli 2020-03-30 03:27:50 UTC
After some thinking, I found out that changing CONFIG_PHYLIB and CONFIG_REALTEK_PHY from M to Y is an “okay” workaround for me, since it forces the PHY driver to always be loaded before r8169.

However I’d still like to find a real solution to this issue, leaving non-essential drivers as modules.
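
To verify a workaround like the one above against a kernel config file, here is a minimal sketch that reads option states from config text. It assumes the usual "CONFIG_X=y", "CONFIG_X=m" and "# CONFIG_X is not set" line forms. The function names are hypothetical.

```python
def config_state(config_text, option):
    """Return the value of an option ('y', 'm', ...) or 'n' if unset/absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# %s is not set" % option:
            return "n"
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return "n"


def phy_builtin(config_text):
    """True when both phylib and the Realtek PHY driver are built in."""
    return (config_state(config_text, "CONFIG_PHYLIB") == "y"
            and config_state(config_text, "CONFIG_REALTEK_PHY") == "y")
```

A result of True only confirms the configuration; it says nothing about whether the PHY itself reports a usable ID.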
Comment 34 Heiner Kallweit 2020-03-30 10:15:04 UTC
Could you please re-check with brand-new 5.6? It includes a fix that should help with your issue.
Comment 35 François Valenduc 2020-03-30 13:44:20 UTC
The problem remains for the creation of initramfs images (see comments 15 and 17, for example). Either I need to remove r8169 from the initramfs or add the realtek module to it. I don't know if the bug is in Gentoo's genkernel or in the kernel. But if I only add r8169 (which seems to be what the default configuration does), the realtek module won't be included and the network won't work.
Comment 36 Heiner Kallweit 2020-03-30 13:53:41 UTC
Currently there is no way to express a hard dependency that is not a code dependency. Adding support for this would require extensions to the kernel and to module-init-tools.
Comment 37 Heiner Kallweit 2020-04-13 16:10:47 UTC
You can re-test with the latest kernel versions, e.g. 5.6.4.
Comment 38 Heiner Kallweit 2020-04-13 16:12:44 UTC
With regard to initramfs creation:
Most distros' initramfs generators honor softdeps, and so does initramfs-tools.
AFAIK only genkernel requires manual adjustments.
Comment 39 Dmitriy 2020-08-26 08:51:40 UTC
(In reply to Heiner Kallweit from comment #37)
> You can re-test with the latest kernel versions, e.g. 5.6.4.

Hi,

5.8.3-1.el7.elrepo.x86_64
"
network not started: 
kernel: r8169 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control
kernel: libphy: r8169: probed
kernel: r8169 0000:02:00.0: no dedicated PHY driver found for PHY ID 0x001cc800, maybe realtek.ko needs to be added to initramfs?
kernel: r8169: probe of 0000:02:00.0 failed with error -49
"

Any idea why it is not working?

On the old kernels the network periodically gets stuck:
3.10.0-1062.7.1.el7.x86_64
3.10.0-1127.18.2.el7.x86_64
I get the message
kernel: NETDEV WATCHDOG: nic0 (r8169): transmit queue 0 timed out

I know that 5.2 does not have these problems, but I can no longer find 5.2 in elrepo.
Comment 40 Heiner Kallweit 2020-08-26 09:16:26 UTC
(In reply to Dmitriy from comment #39)
> (In reply to Heiner Kallweit from comment #37)
> > You can re-test with the latest kernel versions, e.g. 5.6.4.
> 
> Hi,
> 
> 5.8.3-1.el7.elrepo.x86_64
> "
> network not started: 
> kernel: r8169 0000:02:00.0: can't disable ASPM; OS doesn't have ASPM control
> kernel: libphy: r8169: probed
> kernel: r8169 0000:02:00.0: no dedicated PHY driver found for PHY ID
> 0x001cc800, maybe realtek.ko needs to be added to initramfs?
> kernel: r8169: probe of 0000:02:00.0 failed with error -49
> "
> 
Did you check what the error message indicates? Do you have r8169 in the initramfs? Then either remove it from the initramfs (unless you have e.g. an NFS root fs) or add realtek.ko.
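
For reading the probe failures quoted in this report, here is a minimal sketch that decodes the numeric error in a "probe of ... failed with error -N" line. The small table is limited to the two values that appear here, using their Linux errno names (2 is ENOENT, 49 is EUNATCH). The function name and the table are hypothetical helpers, not part of any kernel tool.

```python
import re

# Linux errno names for the two codes that appear in this report.
LINUX_ERRNO = {2: "ENOENT", 49: "EUNATCH"}

PROBE_FAIL_RE = re.compile(r"probe of (\S+) failed with error -(\d+)")


def decode_probe_failure(dmesg_line):
    """Return (device, errno name) for a probe failure line, or None."""
    match = PROBE_FAIL_RE.search(dmesg_line)
    if not match:
        return None
    code = int(match.group(2))
    # Fall back to the raw number for codes outside the small table.
    return match.group(1), LINUX_ERRNO.get(code, "errno %d" % code)
```

Both codes show up next to a message about realtek.ko, so the line just before the probe failure is usually the more informative one.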
Comment 41 Dmitriy 2020-08-26 10:40:08 UTC
(In reply to Heiner Kallweit from comment #40)
> (In reply to Dmitriy from comment #39)
...
> Did you check what the error message indicates? Do you have r8169 in
> initramfs? Then either remove it from initramfs (except having e.g. a NFS
> root fs) or add realtek.ko.

How can I check this? I don't understand why it is not working by default :-(

find /lib/modules/|grep phy/realtek.ko
/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/net/phy/realtek.ko.xz
/lib/modules/3.10.0-957.12.2.el7.x86_64/kernel/drivers/net/phy/realtek.ko.xz
/lib/modules/3.10.0-1062.1.1.el7.x86_64/kernel/drivers/net/phy/realtek.ko.xz
/lib/modules/3.10.0-1062.7.1.el7.x86_64/kernel/drivers/net/phy/realtek.ko.xz
/lib/modules/5.8.3-1.el7.elrepo.x86_64/kernel/drivers/net/phy/realtek.ko
Comment 42 Heiner Kallweit 2020-08-26 11:03:53 UTC
(In reply to Dmitriy from comment #41)
> (In reply to Heiner Kallweit from comment #40)
> > (In reply to Dmitriy from comment #39)
> ...
> > Did you check what the error message indicates? Do you have r8169 in
> > initramfs? Then either remove it from initramfs (except having e.g. a NFS
> > root fs) or add realtek.ko.
> 
> How I can check this? I don't understand, why it not working by default :-(
> 
That's something you'd better ask in a support forum of your distro.
Some have lsinitramfs, or you can check /etc/mkinitcpio.conf.
Comment 43 Dmitriy 2020-08-26 11:29:11 UTC
(In reply to Heiner Kallweit from comment #42)
> (In reply to Dmitriy from comment #41)
> > (In reply to Heiner Kallweit from comment #40)
> > > (In reply to Dmitriy from comment #39)

Thank you for the answer!

OS CentOS 7.  
lsinitrd /boot/initramfs-5.8.3-1.el7.elrepo.x86_64.img | more
Image: /boot/initramfs-5.8.3-1.el7.elrepo.x86_64.img: 21M
========================================================================
Version: dracut-033-568.el7

Arguments: -f

dracut modules:
bash
nss-softokn
i18n
network
ifcfg
drm
plymouth
kernel-modules
rootfs-block
terminfo
udev-rules
biosdevname
systemd
usrmount
base
fs-lib
microcode_ctl-fw_dir_override
shutdown
========================================================================

lsinitrd /boot/initramfs-5.8.3-1.el7.elrepo.x86_64.img | grep realtek
drwxr-xr-x   2 root     root            0 Aug 25 23:39 usr/lib/modules/5.8.3-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek
-rwxr--r--   1 root     root       127024 Aug 25 23:39 usr/lib/modules/5.8.3-1.el7.elrepo.x86_64/kernel/drivers/net/ethernet/realtek/r8169.ko
Comment 44 Peter Draganov 2020-11-29 12:56:18 UTC
I had the same problem when I upgraded from kernel 4.19.97 to 5.4.28. After reading this bug I changed the following kernel options in order to build the modules into the kernel:

# diff /boot/config-5.4.28-gentoo /boot/config-5.4.28-gentoo.old 
2241,2242c2241,2242
< CONFIG_MDIO_DEVICE=y
< CONFIG_MDIO_BUS=y
---
> CONFIG_MDIO_DEVICE=m
> CONFIG_MDIO_BUS=m
2247c2247
< CONFIG_PHYLIB=y
---
> CONFIG_PHYLIB=m
2283c2283
< CONFIG_REALTEK_PHY=y
---
> CONFIG_REALTEK_PHY=m

and created this file:

# cat /etc/modprobe.d/r8169.conf 
softdep r8169 pre: realtek

This fixed the problem in May 2020, but I have now upgraded the kernel to 5.4.72 with "make syncconfig" and these changes no longer work for me. I also tried the following without success:

# diff /boot/config-5.4.72-gentoo /boot/config-5.4.72-gentoo.old 
2192c2192
< CONFIG_R8169=y
---
> CONFIG_R8169=m

# grep CONFIG_GENERIC_PHY= /boot/config-5.4.72-gentoo
CONFIG_GENERIC_PHY=y

# grep REALTEK /boot/config-5.4.28-gentoo
CONFIG_NET_VENDOR_REALTEK=y
CONFIG_REALTEK_PHY=y
# CONFIG_WLAN_VENDOR_REALTEK is not set
CONFIG_SND_HDA_CODEC_REALTEK=m
# CONFIG_USB_STORAGE_REALTEK is not set

dmesg:
[    2.597007] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[    2.603467] libphy: r8169: probed
[    2.604862] r8169 0000:03:00.0: realtek.ko not loaded, maybe it needs to be added to initramfs?
[    2.606372] r8169: probe of 0000:03:00.0 failed with error -49

dmesg of working kernel 5.4.28:
[    3.537325] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
[    3.547687] libphy: r8169: probed
[    3.551848] r8169 0000:03:00.0 eth0: RTL8168d/8111d, 1c:6f:65:43:f8:a0, XID 283, IRQ 29
[    3.554657] r8169 0000:03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[   23.212981] r8169 0000:03:00.0: Direct firmware load for rtl_nic/rtl8168d-2.fw failed with error -2
[   23.212992] r8169 0000:03:00.0: Unable to load firmware rtl_nic/rtl8168d-2.fw (-2)
[   23.213370] Generic PHY r8169-300:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
[   23.358023] r8169 0000:03:00.0 eth0: Link is Down
[   25.338479] r8169 0000:03:00.0 eth0: Link is Up - 100Mbps/Full - flow control off

# lspci
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)

# lspci -n|grep 03:00.0
03:00.0 0200: 10ec:8168 (rev 03)

# depmod -V
kmod version 27
+XZ +ZLIB -LIBCRYPTO -EXPERIMENTAL

I generate the initramfs with the following command:
# genkernel --install initramfs

Any idea how to make kernel 5.4.72 use my Ethernet interface?
Comment 45 Heiner Kallweit 2020-11-29 13:31:08 UTC
Please report value of /sys/class/net/eth0/phydev/phy_id under 5.4.28.
What kind of board is it? There are known issues with buggy Gigabyte boards from about 2009.
Comment 46 Peter Draganov 2020-11-29 13:44:57 UTC
0xc1071002
It is a Gigabyte GA-880GA-UD3H motherboard with a Realtek RTL8111D chip.
I never had problems with it until kernel 5.
Comment 47 Heiner Kallweit 2020-11-29 14:00:08 UTC
Thanks, this is an invalid PHY ID, the typical bug on these Gigabyte boards.
You can try to update the BIOS or enable the network boot ROM in the BIOS.
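
As a rough illustration of how the two PHY IDs reported in this bug differ, here is a minimal sketch that parses the phy_id string and compares its upper bits with the prefix of the working ID 0x001cc800. The prefix, the mask and the function names are assumptions chosen for illustration; they are not the driver's actual matching logic.

```python
# Assumption: a working Realtek PHY ID shares the upper 20 bits of 0x001cc800.
REALTEK_ID_PREFIX = 0x001CC000
REALTEK_ID_MASK = 0xFFFFF000


def parse_phy_id(text):
    """Parse the hex string found in the phy_id sysfs file, e.g. '0x001cc800'."""
    return int(text.strip(), 16)


def looks_like_realtek_phy(phy_id):
    """Heuristic only: compare the upper bits against the assumed prefix."""
    return (phy_id & REALTEK_ID_MASK) == REALTEK_ID_PREFIX


def read_phy_id(path="/sys/class/net/eth0/phydev/phy_id"):
    """Read and parse the PHY ID from sysfs (interface name may differ)."""
    with open(path) as handle:
        return parse_phy_id(handle.read())
```

With this heuristic, 0x001cc800 from comment 6 passes and 0xc1071002 from comment 46 does not, which matches the diagnosis above.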
