Bug 11454
Summary: | atl1e - BUG: scheduling while atomic: modprobe/678/0x00000002 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Michael Long (harn-solo) |
Component: | Network | Assignee: | Jeff Garzik (jgarzik) |
Status: | CLOSED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | cijoml |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.27-rc5 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | dmesg log |
Description
Michael Long
2008-08-29 14:08:07 UTC
Created attachment 17534 [details]
dmesg log
Reply-To: akpm@linux-foundation.org On Fri, 29 Aug 2008 14:08:09 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11454 > > Summary: atl1e - BUG: scheduling while atomic: > modprobe/678/0x00000002 > Product: Drivers > Version: 2.5 > KernelVersion: 2.6.27-rc5 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: Network > AssignedTo: jgarzik@pobox.com > ReportedBy: harn-solo@gmx.de > > > Latest working kernel version: - > Earliest failing kernel version: 2.6.27-rc1 > Software: x86_64 > > ... > > Problem Description: Loading the module atl1e results in the following > BUG-trace: > > [ 7.657410] ATL1E 0000:02:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 > [ 7.657421] ATL1E 0000:02:00.0: setting latency timer to 64 > [ 7.660012] BUG: scheduling while atomic: modprobe/678/0x00000002 > [ 7.660014] Modules linked in: atl1e(+) sg thermal i2c_core evdev rtc > processor button > [ 7.660021] Pid: 678, comm: modprobe Not tainted 2.6.27-rc5 #1 > [ 7.660023] > [ 7.660024] Call Trace: > [ 7.660030] [<ffffffff8022f7aa>] __schedule_bug+0x62/0x67 > [ 7.660034] [<ffffffff80402606>] schedule+0xdb/0x84c > [ 7.660037] [<ffffffff80402db4>] ? thread_return+0x3d/0xcb > [ 7.660041] [<ffffffff8023bfb8>] ? __mod_timer+0xbe/0xd0 > [ 7.660044] [<ffffffff804030df>] schedule_timeout+0x8d/0xb4 > [ 7.660047] [<ffffffff8023b94c>] ? process_timeout+0x0/0xb > [ 7.660049] [<ffffffff804030da>] ? schedule_timeout+0x88/0xb4 > [ 7.660052] [<ffffffff8040311f>] > schedule_timeout_uninterruptible+0x19/0x1b > [ 7.660055] [<ffffffff8023bfde>] msleep+0x14/0x1e > [ 7.660061] [<ffffffffa002df2a>] atl1e_phy_init+0x53/0x30f [atl1e] > [ 7.660066] [<ffffffffa002f26c>] atl1e_probe+0x534/0x690 [atl1e] > [ 7.660070] [<ffffffff80302529>] ? kobject_get+0x1a/0x22 > [ 7.660073] [<ffffffff803110f2>] pci_device_probe+0x4f/0x75 > [ 7.660077] [<ffffffff8036973a>] driver_probe_device+0xc0/0x16e > [ 7.660080] [<ffffffff80369837>] __driver_attach+0x4f/0x79 > [ 7.660082] [<ffffffff803697e8>] ? __driver_attach+0x0/0x79 > [ 7.660085] [<ffffffff80368fcf>] bus_for_each_dev+0x4f/0x89 > [ 7.660087] [<ffffffff80302529>] ? kobject_get+0x1a/0x22 > [ 7.660090] [<ffffffff80369585>] driver_attach+0x1c/0x1e > [ 7.660092] [<ffffffff803688ef>] bus_add_driver+0xb7/0x201 > [ 7.660095] [<ffffffff80369a28>] driver_register+0xa8/0x128 > [ 7.660097] [<ffffffff8031136f>] __pci_register_driver+0x66/0x9f > [ 7.660102] [<ffffffffa0015000>] ? atl1e_init_module+0x0/0x20 [atl1e] > [ 7.660107] [<ffffffffa001501e>] atl1e_init_module+0x1e/0x20 [atl1e] > [ 7.660110] [<ffffffff80209047>] _stext+0x47/0x13f > [ 7.660113] [<ffffffff802562ca>] sys_init_module+0xa9/0x1b6 > [ 7.660116] [<ffffffff8020b5bb>] system_call_fastpath+0x16/0x1b > [ 7.660118] > [ 7.663008] BUG: scheduling while atomic: modprobe/678/0x00000002 Guys, this is fairly pathetic. spin_lock(&adapter->mdio_lock); atl1e_phy_init(&adapter->hw); spin_unlock(&adapter->mdio_lock); int atl1e_phy_init(struct atl1e_hw *hw) { ... msleep(...); ... msleep(...); ... msleep(...); ... msleep(...); } we spent all that time developing all that runtime-debugging infrastructure and many people are just ignoring it. Please, read Documentation/SubmitChecklist very carefully and *do what it says*. Please also fix the above bug. btw, these casts: drivers/net/atl1e/atl1e_hw.c: struct atl1e_adapter *adapter = (struct atl1e_adapter *)hw->adapter; drivers/net/atl1e/atl1e_hw.c: struct atl1e_adapter *adapter = (struct atl1e_adapter *)hw->adapter; drivers/net/atl1e/atl1e_hw.c: struct atl1e_adapter *adapter = (struct atl1e_adapter *)hw->adapter; are unneeded and undesirable. hw->adapter already has type atl1e_adapter*. Reply-To: jie.yang@Atheros.com from Jie Yang <jie.yang@atheros.com> On Saturday, August 30, 2008 5:27 AM Andrew Morton <akpm@linux-foundation.org]> > On Fri, 29 Aug 2008 14:08:09 -0700 (PDT) > bugme-daemon@bugzilla.kernel.org wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=11454 > > > > Summary: atl1e - BUG: scheduling while atomic: > > modprobe/678/0x00000002 > > Product: Drivers > > Version: 2.5 > > KernelVersion: 2.6.27-rc5 > > Platform: All > > OS/Version: Linux > > Tree: Mainline > > Status: NEW > > Severity: high > > Priority: P1 > > Component: Network > > AssignedTo: jgarzik@pobox.com > > ReportedBy: harn-solo@gmx.de > > > > > > Latest working kernel version: - > > Earliest failing kernel version: 2.6.27-rc1 > > Software: x86_64 > > > > drivers/net/atl1e/atl1e_hw.c: struct atl1e_adapter *adapter > = (struct atl1e_adapter *)hw->adapter; > drivers/net/atl1e/atl1e_hw.c: struct atl1e_adapter *adapter > = (struct atl1e_adapter *)hw->adapter; > drivers/net/atl1e/atl1e_hw.c: struct atl1e_adapter *adapter > = (struct atl1e_adapter *)hw->adapter; > > are unneeded and undesirable. hw->adapter already has type > atl1e_adapter*. > just as Matthew Wilcox <matthew@wil.cx> mentioned: > Lockdep warns about the mdio_lock taken with interrupts enabled then > later taken from interrupt context. Initially, I considered changing > these to spin_lock_irq/spin_unlock_irq, but then I looked at > atl1e_phy_init() and saw that it calls msleep(). Sleeping while > holding a spinlock is not allowed either. > > In the probe path, we haven't registered the interrupt handler, so it > can't poke at this card yet. It's before we call register_netdev(), > so I don't think any other threads can reach this card either. If I'm > right, we don't need a spinlock at all. So, just do not take mdio_lock lock in atl1e_probe, and remove the unneeded (struct atl1e_adapter *) Signed-off-by: Jie Yang <jie.yang@atheros.com> --- BTW: I do not know if this format is suitable for repling [Bugme-new], if it is not suitable, just let me know. diff --git a/drivers/net/atl1e/atl1e_hw.c b/drivers/net/atl1e/atl1e_hw.c index 949e753..8cbc1b5 100644 --- a/drivers/net/atl1e/atl1e_hw.c +++ b/drivers/net/atl1e/atl1e_hw.c @@ -397,7 +397,7 @@ static int atl1e_phy_setup_autoneg_adv(struct atl1e_hw *hw) */ int atl1e_phy_commit(struct atl1e_hw *hw) { - struct atl1e_adapter *adapter = (struct atl1e_adapter *)hw->adapter; + struct atl1e_adapter *adapter = hw->adapter; struct pci_dev *pdev = adapter->pdev; int ret_val; u16 phy_data; @@ -431,7 +431,7 @@ int atl1e_phy_commit(struct atl1e_hw *hw) int atl1e_phy_init(struct atl1e_hw *hw) { - struct atl1e_adapter *adapter = (struct atl1e_adapter *)hw->adapter; + struct atl1e_adapter *adapter = hw->adapter; struct pci_dev *pdev = adapter->pdev; s32 ret_val; u16 phy_val; @@ -525,7 +525,7 @@ int atl1e_phy_init(struct atl1e_hw *hw) */ int atl1e_reset_hw(struct atl1e_hw *hw) { - struct atl1e_adapter *adapter = (struct atl1e_adapter *)hw->adapter; + struct atl1e_adapter *adapter = hw->adapter; struct pci_dev *pdev = adapter->pdev; u32 idle_status_data = 0; diff --git a/drivers/net/atl1e/atl1e_main.c b/drivers/net/atl1e/atl1e_main.c index 7685b99..9b60352 100644 --- a/drivers/net/atl1e/atl1e_main.c +++ b/drivers/net/atl1e/atl1e_main.c @@ -2390,9 +2390,7 @@ static int __devinit atl1e_probe(struct pci_dev *pdev, } /* Init GPHY as early as possible due to power saving issue */ - spin_lock(&adapter->mdio_lock); atl1e_phy_init(&adapter->hw); - spin_unlock(&adapter->mdio_lock); /* reset the controller to * put the device in a known good starting state */ err = atl1e_reset_hw(&adapter->hw); On Sun, Aug 31, 2008 at 12:51:58PM +0800, jie.yang@atheros.com wrote: > from Jie Yang <jie.yang@atheros.com> > > On Saturday, August 30, 2008 5:27 AM > Andrew Morton <akpm@linux-foundation.org]> > > On Fri, 29 Aug 2008 14:08:09 -0700 (PDT) > > bugme-daemon@bugzilla.kernel.org wrote: > > > > > http://bugzilla.kernel.org/show_bug.cgi?id=11454 > > > > > > Summary: atl1e - BUG: scheduling while atomic: > > > modprobe/678/0x00000002 > > > Product: Drivers > > > Version: 2.5 > > > KernelVersion: 2.6.27-rc5 > > > Platform: All > > > OS/Version: Linux > > > Tree: Mainline > > > Status: NEW > > > Severity: high > > > Priority: P1 > > > Component: Network > > > AssignedTo: jgarzik@pobox.com How about taking my original patch, sent August 12th instead? That has a good changelog and proper attribution. The removal of the unnecessary casts can be a separate message. If you weren't cc'd on the patch (Jeff was), you can pick it up from netdev. Reply-To: Jie.Yang@Atheros.com On Sunday, August 31, 2008 1:09 PM Matthew Wilcox <matthew@wil.cx> wrote: > How about taking my original patch, sent August 12th instead? > That has a good changelog and proper attribution. The > removal of the unnecessary casts can be a separate message. > > If you weren't cc'd on the patch (Jeff was), you can pick it > up from netdev. > Yes, I do have the patch, should I resend with Signed-off-by: Matthew Wilcox <willy@linux.intel.com>. for that patch may have conflicts now. the origin patch: diff --git a/drivers/net/atl1e/atl1e_main.c b/drivers/net/atl1e/atl1e_main.c index 82d7be1..ba22a51 100644 --- a/drivers/net/atl1e/atl1e_main.c +++ b/drivers/net/atl1e/atl1e_main.c @@ -2389,9 +2389,7 @@ static int __devinit atl1e_probe(struct pci_dev *pdev, } /* Init GPHY as early as possible due to power saving issue */ - spin_lock(&adapter->mdio_lock); atl1e_phy_init(&adapter->hw); - spin_unlock(&adapter->mdio_lock); /* reset the controller to * put the device in a known good starting state */ err = atl1e_reset_hw(&adapter->hw); the new one: diff --git a/drivers/net/atl1e/atl1e_main.c b/drivers/net/atl1e/atl1e_main.c index 7685b99..9b60352 100644 --- a/drivers/net/atl1e/atl1e_main.c +++ b/drivers/net/atl1e/atl1e_main.c @@ -2390,9 +2390,7 @@ static int __devinit atl1e_probe(struct pci_dev *pdev, } /* Init GPHY as early as possible due to power saving issue */ - spin_lock(&adapter->mdio_lock); atl1e_phy_init(&adapter->hw); - spin_unlock(&adapter->mdio_lock); /* reset the controller to * put the device in a known good starting state */ err = atl1e_reset_hw(&adapter->hw); the line num changed from 2389 to 2390. :( Best wishes jie *** Bug 11510 has been marked as a duplicate of this bug. *** Since 2.6.27-rc6 (rc8 also tested), the LAN-adapter works great for me. Although I'm still getting two of those "BUG: scheduling while atomic" traces in dmesg. Thanks. As for 2.6.27.2 it is fixed now like described in the patch changelog. Fixed for me, thanks |