Bug 80751 - ethernet driver et131x oops on hotplug insert
Summary: ethernet driver et131x oops on hotplug insert
Status: RESOLVED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: Staging
Hardware: All Linux
Importance: P1 normal
Assignee: drivers_staging@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-20 19:21 UTC by Mark Einon
Modified: 2014-08-14 21:03 UTC
0 users

See Also:
Kernel Version: 3.15.0-rc6+
Subsystem:
Regression: No
Bisected commit-id:


Description Mark Einon 2014-07-20 19:21:05 UTC
Inserting an et131x device into the PCI Express port results in the following oops:

[   68.099018] libphy: et131x_eth_mii: probed
[   68.099025] et131x_mii_probe(3723) phydev->dev.driver           (null)
[   68.099035] phy_attach_direct(579) device d ffff8800b911b810
[   68.099653] phy_resume(704) phydev ffff8800b911b800
[   68.099660] phy_resume(705) drv ffffffffffffff48
[   68.099668] phy_resume(706) dev.driver           (null)
[   68.099687] BUG: unable to handle kernel paging request at ffffffffffffff88
[   68.099735] IP: [<ffffffffa07452cc>] phy_resume+0x73/0xa0 [libphy]
[   68.099774] PGD 1a0f067 PUD 1a11067 PMD 0 
[   68.099806] Oops: 0000 [#1] SMP 
[   68.099831] Modules linked in: et131x(C+) libphy ctr ccm joydev binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs nouveau lockd fscache sunrpc arc4 iTCO_wdt iTCO_vendor_support coretemp kvm_intel iwldvm kvm snd_hda_codec_hdmi mac80211 snd_hda_codec_realtek snd_hda_codec_generic psmouse snd_hda_intel snd_hda_controller serio_raw msi_wmi sparse_keymap pcspkr i7core_edac snd_hda_codec iwlwifi mxm_wmi ttm drm_kms_helper snd_hwdep evdev snd_pcm edac_core i2c_i801 ehci_pci drm ehci_hcd cfg80211 i2c_algo_bit snd_timer jmb38x_ms snd i2c_core rfkill usbcore memstick lpc_ich mfd_core soundcore usb_common shpchp battery ac acpi_cpufreq video wmi button processor loop fuse parport_pc ppdev lp parport autofs4 ext4 crc16 jbd2 mbcache sg sd_mod sr_mod crc_t10dif cdrom crct10dif_common crc32c_intel ahci libahci microcode libata firewire_ohci sdhci_pci sdhci r8169 firewire_core scsi_mod mmc_core mii crc_itu_t thermal thermal_sys
[   68.100580] CPU: 2 PID: 2010 Comm: systemd-udevd Tainted: G         C    3.15.0-rc6+ #18
[   68.100609] Hardware name: MICRO-STAR INTERNATIONAL CO., LTD MS-1727/MS-1727, BIOS E1727IMS.10F 05/27/2011
[   68.100643] task: ffff8800b8b6a510 ti: ffff88002f7fc000 task.ti: ffff88002f7fc000
[   68.100669] RIP: 0010:[<ffffffffa07452cc>]  [<ffffffffa07452cc>] phy_resume+0x73/0xa0 [libphy]
[   68.100712] RSP: 0018:ffff88002f7fda48  EFLAGS: 00010246
[   68.100734] RAX: 0000000000000000 RBX: ffff8800b911b800 RCX: 0000000000000001
[   68.100760] RDX: 0000000000000006 RSI: ffff8800b8b6acc8 RDI: ffff8800b8b6a510
[   68.100786] RBP: ffff88002f7fda58 R08: 0000000000000001 R09: ffffffff810743aa
[   68.100811] R10: ffffffff810743aa R11: 0000000000000000 R12: 0000000000000000
[   68.100837] R13: ffff8800b911b810 R14: 00000000fffffffb R15: 0000000000000000
[   68.100864] FS:  00007ffe49427880(0000) GS:ffff88013aa00000(0000) knlGS:0000000000000000
[   68.100892] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   68.100915] CR2: ffffffffffffff88 CR3: 00000000b93b1000 CR4: 00000000000007e0
[   68.100939] Stack:
[   68.100951]  ffff8800b911b800 ffff8800b9108000 ffff88002f7fdaa8 ffffffffa07453f1
[   68.100994]  ffff88002f7fda88 00000001813eb735 ffff8800b911b810 ffff8800b911b800
[   68.101037]  ffff8800b8ecada0 ffffffffa07579b7 ffffffffa07579b7 ffffffffa075a2b0
[   68.101080] Call Trace:
[   68.101103]  [<ffffffffa07453f1>] phy_attach_direct+0xf8/0x10a [libphy]
[   68.101135]  [<ffffffffa07579b7>] ? et131x_isr_handler+0x25c/0x25c [et131x]
[   68.101167]  [<ffffffffa07579b7>] ? et131x_isr_handler+0x25c/0x25c [et131x]
[   68.101201]  [<ffffffffa074548e>] phy_connect_direct+0x1c/0x4e [libphy]
[   68.101233]  [<ffffffffa074551e>] phy_connect+0x5e/0x78 [libphy]
[   68.101263]  [<ffffffffa0758685>] et131x_pci_setup+0x82f/0x9ca [et131x]
[   68.101295]  [<ffffffff812db041>] ? __pm_runtime_resume+0x79/0x88
[   68.101325]  [<ffffffff8123a175>] local_pci_probe+0x38/0x7e
[   68.101352]  [<ffffffff812d39b2>] ? driver_probe_device+0x308/0x308
[   68.101380]  [<ffffffff8123a341>] pci_device_probe+0xcf/0xf5
[   68.101406]  [<ffffffff812d37c8>] driver_probe_device+0x11e/0x308
[   68.101433]  [<ffffffff812d39b2>] ? driver_probe_device+0x308/0x308
[   68.101460]  [<ffffffff812d3a00>] __driver_attach+0x4e/0x6f
[   68.101485]  [<ffffffff812d1c0c>] bus_for_each_dev+0x5a/0x8c
[   68.101511]  [<ffffffff812d316b>] driver_attach+0x19/0x1b
[   68.101535]  [<ffffffff812d2e43>] bus_add_driver+0x115/0x1fa
[   68.101561]  [<ffffffff812d3fe7>] driver_register+0x87/0xbe
[   68.101587]  [<ffffffff812398f0>] __pci_register_driver+0x5d/0x62
[   68.101616]  [<ffffffffa075d000>] ? 0xffffffffa075cfff
[   68.101643]  [<ffffffffa075d01e>] et131x_driver_init+0x1e/0x1000 [et131x]
[   68.101674]  [<ffffffff810002c3>] do_one_initcall+0x9f/0x12c
[   68.101702]  [<ffffffff81034594>] ? change_page_attr_set+0x27/0x29
[   68.101729]  [<ffffffff810345e1>] ? set_memory_nx+0x2d/0x2f
[   68.101756]  [<ffffffffa075d000>] ? 0xffffffffa075cfff
[   68.101782]  [<ffffffff810a5e90>] load_module+0x1d7d/0x2091
[   68.101807]  [<ffffffff810a25a4>] ? mod_kobject_put+0x78/0x78
[   68.101838]  [<ffffffff810a630d>] SyS_finit_module+0x8e/0xa8
[   68.101866]  [<ffffffff813fe131>] ? _raw_spin_unlock+0x23/0x2f
[   68.101893]  [<ffffffff814039e2>] system_call_fastpath+0x16/0x1b
[   68.101916] Code: a0 31 c0 e8 3e 0a cb e0 48 8b 8b 20 01 00 00 31 c0 ba c2 02 00 00 48 c7 c6 c8 65 74 a0 48 c7 c7 94 69 74 a0 e8 1d 0a cb e0 31 c0 <49> 83 7c 24 88 00 74 20 48 c7 c7 ac 69 74 a0 ba c4 02 00 00 48 
[   68.102261] RIP  [<ffffffffa07452cc>] phy_resume+0x73/0xa0 [libphy]
[   68.102296]  RSP <ffff88002f7fda48>
[   68.102312] CR2: ffffffffffffff88
[   68.106475] ---[ end trace a48d9ee93fbcb599 ]---
Comment 1 Mark Einon 2014-07-20 19:39:08 UTC
Bisected to the commit that originally added the failing call. Log:

commit 1211ce53077164e0d34641d0ca5fb4d4a7574498
Author: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Date:   Fri Dec 13 10:20:28 2013 +0100

    net: phy: resume/suspend PHYs on attach/detach
    
    This ensures PHYs are resumed on attach and suspended on detach.
    
    Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
    Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

and the line it added:

@@ -624,6 +624,8 @@ static int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
        if (err)
                phy_detach(phydev);
 
+       phy_resume(phydev);
+
        return err;

N.B. This line has since been modified by:

commit b394745df2d9d4c30bf1bcc55773bec6f3bc7c67
Author: Guenter Roeck <linux@roeck-us.net>
Date:   Wed May 14 13:12:49 2014 -0700

    net: phy: Don't call phy_resume if phy_init_hw failed
Comment 2 Mark Einon 2014-07-20 20:05:13 UTC

*** This bug has been marked as a duplicate of bug 77121 ***
Comment 3 Mark Einon 2014-07-21 20:51:27 UTC
The call stack is almost identical to the one in bug 77121. However, the architecture differs (x86_64 vs. x86) and so does the last call, so I'm reopening this bug until we're sure they are the same.
Comment 4 Mark Einon 2014-08-02 15:14:31 UTC
The crash has been fixed by commit b394745df2d9, but the cause (presumably phy_init_hw() failing) is still an issue.

------------
commit b394745df2d9d4c30bf1bcc55773bec6f3bc7c67
Author: Guenter Roeck <linux@roeck-us.net>
Date:   Wed May 14 13:12:49 2014 -0700

    net: phy: Don't call phy_resume if phy_init_hw failed
    
    After the call to phy_init_hw failed in phy_attach_direct, phy_detach is called
    to detach the phy device from its network device. If the attached driver is a
    generic phy driver, this also detaches the driver. Subsequently phy_resume
    is called, which assumes without checking that a driver is attached to the
    device. This will result in a crash such as
    
    Unable to handle kernel paging request for data at address 0xffffffffffffff90
    Faulting instruction address: 0xc0000000003a0e18
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c0000000003a0e18] .phy_attach_direct+0x68/0x17c
    LR [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c
    Call Trace:
    [c0000003fc0475d0] [c0000000003a0e6c] .phy_attach_direct+0xbc/0x17c (unreliable)
    [c0000003fc047670] [c0000000003a0ff8] .phy_connect_direct+0x28/0x98
    [c0000003fc047700] [c0000000003f0074] .of_phy_connect+0x4c/0xa4
    
    Only call phy_resume if phy_init_hw was successful.
Comment 5 Mark Einon 2014-08-03 21:16:51 UTC
phy_init_hw() fails because it results in a call to et131x_mii_write(). When that function accesses adapter->phydev, the pointer is still NULL.

adapter->phydev is only set at the end of et131x_mii_probe(), but phy_connect() is called at the beginning. Other drivers that use a PHY device (stmmac, for example) follow the same ordering, so my initial suspicion is that the use of the adapter->phydev pointer in et131x_mii_write() is what's unusual.
Comment 6 Mark Einon 2014-08-06 18:15:23 UTC
Fix patch posted at http://marc.info/?l=linux-netdev&m=140727945201126&w=2
