Bug 213485 - Broadcom genet fails to attach ethernet PHY on Raspberry Pi 4B sometimes
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Network
Hardware: ARM Linux
Importance: P1 normal
Assignee: drivers_network@kernel-bugs.osdl.org
Reported: 2021-06-18 03:49 UTC by Jian-Hong Pan
Modified: 2021-10-15 08:31 UTC

Kernel Version: 5.11+
Regression: No

Attachments
5.13-rc6 dmesg log (35.13 KB, text/plain)
2021-06-18 03:49 UTC, Jian-Hong Pan
The kernel 5.13-rc6 build config (197.30 KB, text/plain)
2021-06-18 03:50 UTC, Jian-Hong Pan
Add debug messages based on 5.13-rc6 (2.12 KB, patch)
2021-06-21 03:41 UTC, Jian-Hong Pan
The full dmesg with the debug message of comment #2 (35.91 KB, text/plain)
2021-06-21 03:44 UTC, Jian-Hong Pan
The device tree decoded from bcm2711-rpi-4-b.dtb (31.59 KB, audio/vnd.dts)
2021-06-21 03:49 UTC, Jian-Hong Pan

Description Jian-Hong Pan 2021-06-18 03:49:21 UTC
Created attachment 297443
5.13-rc6 dmesg log

I noticed that the Broadcom genet driver sometimes fails to attach the Ethernet PHY on the Raspberry Pi 4B (more than 50% of the time) when testing kernel 5.13-rc6. I also reproduced it on 5.11.

The kernel log shows these errors:

[   10.615778] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
...
[   11.585741] could not attach to PHY
[   11.585790] bcmgenet fd580000.ethernet eth0: failed to connect to PHY

The full dmesg is attached.
Comment 1 Jian-Hong Pan 2021-06-18 03:50:21 UTC
Created attachment 297445
The kernel 5.13-rc6 build config
Comment 2 Jian-Hong Pan 2021-06-21 03:41:50 UTC
Created attachment 297519
Add debug messages based on 5.13-rc6

With these debug messages added, the Ethernet interface is still not detected, as before, and the log shows:

$ dmesg | grep -E "(net|phy|PHY|mdio)"
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]
[    0.000000] arch_timer: cp15 timer(s) running at 54.00MHz (phys).
[    0.038269] audit: initializing netlink subsys (disabled)
[    0.072009] usb_phy_generic phy: supply vcc not found, using dummy regulator
[    0.223554] libphy: Fixed MDIO Bus: probed
[    0.226192] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[    0.226390] igb: Intel(R) Gigabit Ethernet Network Driver
[    0.240676] 9pnet: Installing 9P2000 support
[   10.353067] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
[   11.470402] libphy: bcmgenet MII bus: probed
[   11.509851] of_mdio_find_device: does not find ethernet-phy@1 on mdio bus
[   11.509867] of_phy_find_device: mdio does not find ethernet-phy@1
[   11.509874] bcmgenet fd580000.ethernet eth0: of_phy_connect: dn full_name ethernet-phy@1
[   11.509884] could not attach to PHY
[   11.509888] bcmgenet fd580000.ethernet eth0: failed to connect to PHY
[   11.657266] unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus

Following the debug messages through the code, I understand that genet looks up the MDIO device for its PHY (ethernet-phy@1) on the MDIO bus. The first lookup fails because unimac-mdio, from the mdio-bcm-unimac module, has not yet made that device available on the bus; its "Broadcom UniMAC MDIO bus" line only appears after the failure.
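
The ordering problem described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `toy_bus`, `toy_bus_register()`, `toy_bus_find()` and `toy_phy_connect()` are hypothetical names standing in for the MDIO bus, the registration that happens once mdio-bcm-unimac has probed, the bus lookup, and genet's PHY attach. The real kernel functions return errno values; the model simply uses 0 and -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUS_MAX 8

/* Toy stand-in for the list of devices on the MDIO bus (not a kernel type). */
struct toy_bus {
    const char *nodes[TOY_BUS_MAX];
    size_t count;
};

/* Models the MDIO driver's probe making a PHY node visible on the bus. */
int toy_bus_register(struct toy_bus *bus, const char *node)
{
    if (bus->count >= TOY_BUS_MAX)
        return -1;
    bus->nodes[bus->count++] = node;
    return 0;
}

/* Models the bus lookup: returns the node, or NULL if it is not registered yet. */
const char *toy_bus_find(const struct toy_bus *bus, const char *node)
{
    for (size_t i = 0; i < bus->count; i++)
        if (strcmp(bus->nodes[i], node) == 0)
            return bus->nodes[i];
    return NULL;
}

/* Models genet's PHY attach: 0 on success, -1 for "could not attach to PHY". */
int toy_phy_connect(const struct toy_bus *bus, const char *node)
{
    return toy_bus_find(bus, node) ? 0 : -1;
}
```

In this model, calling `toy_phy_connect()` on an empty bus fails, and the same call succeeds once `toy_bus_register()` has run, which mirrors the difference between the failing boot log and a later successful probe.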
Comment 3 Jian-Hong Pan 2021-06-21 03:44:21 UTC
Created attachment 297521
The full dmesg with the debug message of comment #2
Comment 4 Jian-Hong Pan 2021-06-21 03:49:48 UTC
Created attachment 297523
The device tree decoded from bcm2711-rpi-4-b.dtb

The dependency relationship can be found in bcm2711-rpi-4-b's device tree.

ethernet@7d580000 {
    compatible = "brcm,bcm2711-genet-v5";
...
    phy-mode = "rgmii-rxid";

    mdio@e14 {
        compatible = "brcm,genet-mdio-v5";
        reg = <0xe14 0x08>;
        reg-names = "mdio";
        #address-cells = <0x00>;
        #size-cells = <0x01>;

        ethernet-phy@1 {
            reg = <0x01>;
            phandle = <0x27>;
        };
    };
};

$ git grep "brcm,genet-mdio-v5"
Documentation/devicetree/bindings/net/brcm,bcmgenet.txt:  "brcm,genet-mdio-v3", "brcm,genet-mdio-v4", "brcm,genet-mdio-v5", the version
Documentation/devicetree/bindings/net/brcm,unimac-mdio.txt:  "brcm,genet-mdio-v3", "brcm,genet-mdio-v4", "brcm,genet-mdio-v5" or
arch/arm/boot/dts/bcm2711.dtsi:                         compatible = "brcm,genet-mdio-v5";
drivers/net/mdio/mdio-bcm-unimac.c:     { .compatible = "brcm,genet-mdio-v5", },
Comment 5 Jian-Hong Pan 2021-06-21 03:54:40 UTC
According to comments #2 to #4, either unimac-mdio comes up too late for genet, or the system loads the mdio-bcm-unimac module too late.

So I manually re-probed the genet module. This time the Ethernet PHY is detected, as the following log shows, and the interface works correctly.

[  116.210987] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000
[  116.226345] libphy: bcmgenet MII bus: probed
[  116.298480] unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus
[  116.350126] of_mdio_find_device: going to get ethernet-phy@1's mdio device
[  116.350146] of_phy_find_device: going to get ethernet-phy@1's phy_device
[  116.350154] bcmgenet fd580000.ethernet eth0: of_phy_connect: dn full_name ethernet-phy@1
[  116.350164] bcmgenet fd580000.ethernet eth0: of_phy_connect: going to connect ethernet-phy@1 directly
[  116.352460] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
[  116.355594] bcmgenet fd580000.ethernet eth0: Link is Down
[  118.402544] bcmgenet fd580000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx

The full dmesg is the one attached in comment #3.
Comment 6 Florian Fainelli 2021-06-21 17:05:46 UTC
Could you try one of these two approaches:

diff --git a/drivers/net/mdio/mdio-bcm-unimac.c b/drivers/net/mdio/mdio-bcm-unimac.c
index bfc9be23c973..53fecb53cd65 100644
--- a/drivers/net/mdio/mdio-bcm-unimac.c
+++ b/drivers/net/mdio/mdio-bcm-unimac.c
@@ -349,6 +349,7 @@ static struct platform_driver unimac_mdio_driver = {
                .name = UNIMAC_MDIO_DRV_NAME,
                .of_match_table = unimac_mdio_ids,
                .pm = &unimac_mdio_pm_ops,
+               .probe_type = PROBE_FORCE_SYNCHRONOUS,
        },
        .probe  = unimac_mdio_probe,
        .remove = unimac_mdio_remove,

or preferably:

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index fcca023f22e5..41f7f078cd27 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -4296,3 +4296,4 @@ MODULE_AUTHOR("Broadcom Corporation");
 MODULE_DESCRIPTION("Broadcom GENET Ethernet controller driver");
 MODULE_ALIAS("platform:bcmgenet");
 MODULE_LICENSE("GPL");
+MODULE_SOFTDEP("pre: mdio-bcm-unimac");


and let me know whether either works?
Comment 7 Jian-Hong Pan 2021-06-22 05:20:16 UTC
(In reply to Florian Fainelli from comment #6)
> Could you try one of these two approaches:
> 
> diff --git a/drivers/net/mdio/mdio-bcm-unimac.c
> b/drivers/net/mdio/mdio-bcm-unimac.c
> index bfc9be23c973..53fecb53cd65 100644
> --- a/drivers/net/mdio/mdio-bcm-unimac.c
> +++ b/drivers/net/mdio/mdio-bcm-unimac.c
> @@ -349,6 +349,7 @@ static struct platform_driver unimac_mdio_driver = {
>                 .name = UNIMAC_MDIO_DRV_NAME,
>                 .of_match_table = unimac_mdio_ids,
>                 .pm = &unimac_mdio_pm_ops,
> +               .probe_type = PROBE_FORCE_SYNCHRONOUS,
>         },
>         .probe  = unimac_mdio_probe,
>         .remove = unimac_mdio_remove,
The bug can still be reproduced with this approach.
Comment 8 Jian-Hong Pan 2021-06-22 05:23:04 UTC
(In reply to Florian Fainelli from comment #6)
> Could you try one of these two approaches:
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index fcca023f22e5..41f7f078cd27 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -4296,3 +4296,4 @@ MODULE_AUTHOR("Broadcom Corporation");
>  MODULE_DESCRIPTION("Broadcom GENET Ethernet controller driver");
>  MODULE_ALIAS("platform:bcmgenet");
>  MODULE_LICENSE("GPL");
> +MODULE_SOFTDEP("pre: mdio-bcm-unimac");
MODULE_SOFTDEP("pre: mdio-bcm-unimac") solves this issue.

Following the debug messages from comment #2 through the code, I traced the call path: bcmgenet_mii_probe() calls of_phy_connect() -> of_phy_find_device() -> of_mdio_find_device() -> bus_find_device_by_of_node(), and bus_find_device_by_of_node() cannot find the device on the MDIO bus.

So I went through bcm2711-rpi-4-b's device tree to find out which driver provides the MDIO device and why it was not yet ready on the MDIO bus for genet.
It turned out to be the mdio-bcm-unimac module, as mentioned in comment #4. I also noticed that "unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus" appears after "bcmgenet fd580000.ethernet eth0: failed to connect to PHY" in the log.

With these findings, I tried to modprobe the genet module again, and the Ethernet on the RPi 4B worked correctly. I also noticed that the mdio-bcm-unimac module was already loaded before I re-probed genet.
I then built mdio-bcm-unimac into the kernel image instead of as a module. With that, genet can always find the MDIO device on the bus, and Ethernet works as well.

That is how I came up with the idea of loading the mdio-bcm-unimac module before the genet module. However, I did not know about MODULE_SOFTDEP until Florian pointed it out.

I think this is like the module dependency situation addressed in commit 11287b693d03 ("r8169: load Realtek PHY driver module before r8169") [1].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11287b693d03830010356339e4ceddf47dee34fa
Comment 9 Jian-Hong Pan 2021-06-22 07:33:22 UTC
Some references for the record:

* MODULE_SOFTDEP is defined in include/linux/module.h [1]

* modprobe.d has an example: [2]
  Assume "softdep c pre: a b post: d e" is provided in
  the configuration. Running "modprobe c" is now equivalent to
  "modprobe a b c d e" without the softdep.

[1] https://elixir.bootlin.com/linux/v5.13-rc7/source/include/linux/module.h#L170
[2] https://man7.org/linux/man-pages/man5/modprobe.d.5.html
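
To make the modprobe.d example above concrete, the following is a minimal sketch of how a softdep value expands into a load order. It is a toy model, not kmod's implementation: `softdep_load_order()` and its helpers are hypothetical names, and treating words that precede any `pre:` or `post:` marker as `pre` entries is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one word to the space-separated list in out; -1 if it does not fit. */
static int append_word(char *out, size_t cap, const char *word, size_t len)
{
    size_t used = strlen(out);
    size_t sep = used ? 1 : 0;            /* separating space, if needed */

    if (used + sep + len + 1 > cap)       /* +1 for the terminating NUL */
        return -1;
    if (sep)
        out[used++] = ' ';
    memcpy(out + used, word, len);
    out[used + len] = '\0';
    return 0;
}

/* Append every module listed in the "pre:" (want_post == 0) or "post:" section. */
static int append_section(const char *softdep, int want_post, char *out, size_t cap)
{
    const char *p = softdep;
    int in_post = 0;   /* assumption: words before any marker count as "pre" */

    for (;;) {
        size_t len;

        p += strspn(p, " \t");            /* skip whitespace between words */
        len = strcspn(p, " \t");
        if (len == 0)
            return 0;                     /* end of the softdep string */
        if (len == 4 && strncmp(p, "pre:", 4) == 0)
            in_post = 0;
        else if (len == 5 && strncmp(p, "post:", 5) == 0)
            in_post = 1;
        else if (in_post == want_post && append_word(out, cap, p, len) != 0)
            return -1;
        p += len;
    }
}

/* Write "<pre modules> <name> <post modules>" into out; 0 on success, -1 if too small. */
int softdep_load_order(const char *name, const char *softdep, char *out, size_t cap)
{
    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (append_section(softdep, 0, out, cap) != 0 ||
        append_word(out, cap, name, strlen(name)) != 0 ||
        append_section(softdep, 1, out, cap) != 0)
        return -1;
    return 0;
}
```

For the man page example, name "c" with "pre: a b post: d e" gives "a b c d e". For this bug, name "genet" with "pre: mdio-bcm-unimac" gives "mdio-bcm-unimac genet", which is the ordering the soft dependency is meant to request.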
Comment 10 Jian-Hong Pan 2021-06-22 10:08:57 UTC
I can prepare a patch using the MODULE_SOFTDEP("pre: mdio-bcm-unimac") approach.

Florian, may I add your Signed-off-by to the commit message? Or would you prefer to add the SoB yourself?
Comment 11 Jian-Hong Pan 2021-07-07 06:37:14 UTC
Fixed by the commit 19938bafa7ae ("net: bcmgenet: Add mdio-bcm-unimac soft dependency").
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19938bafa7ae8fc0a4a2c1c1430abb1a04668da1
Comment 12 Vivien 2021-10-15 08:31:46 UTC
(In reply to Jian-Hong Pan from comment #11)
> Fixed by the commit 19938bafa7ae ("net: bcmgenet: Add mdio-bcm-unimac soft
> dependency").
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=19938bafa7ae8fc0a4a2c1c1430abb1a04668da1

Hello! I have the same problem on my Raspberry Pi 400, but I'm new to this and I don't understand how to fix it. How does git diff work? What does it do? Does it download something?
Thanks for your answer, and sorry to bump this topic!
