Bug 27482

Summary: WARNING: at kernel/lockdep.c:2323 trace_hardirqs_on_caller
Product: Power Management Reporter: tim blechmann (tim)
Component: OtherAssignee: power-management_other
Status: CLOSED CODE_FIX    
Severity: low CC: florian, kirjanov, maciej.rutecki, rjw, stern
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.38-rc2 Subsystem:
Regression: Yes Bisected commit-id:
Bug Depends on:    
Bug Blocks: 7216, 27352    
Attachments: dmesg

Description tim blechmann 2011-01-24 10:49:20 UTC
2.6.38-rc2 gives me the following warning:


[   59.524272] ------------[ cut here ]------------
[   59.524281] WARNING: at kernel/lockdep.c:2323 trace_hardirqs_on_caller+0x157/0x1f0()
[   59.524283] Hardware name: System Product Name
[   59.524285] Modules linked in: binfmt_misc kvm_intel kvm parport_pc ppdev snd_hda_codec_realtek snd_hda_intel snd_usb_audio snd_hda_codec snd_pcm_oss snd_hdsp snd_seq_dummy snd_mixer_oss arc4 snd_seq_oss snd_pcm snd_hwdep snd_usbmidi_lib rt61pci snd_seq_midi snd_seq_midi_event rt2x00pci rt2x00lib snd_rawmidi snd_seq snd_timer mac80211 snd_seq_device psmouse snd serio_raw cfg80211 eeprom_93cx6 soundcore snd_page_alloc i7core_edac asus_atk0110 edac_core coretemp lp parport usbhid hid firewire_ohci pata_jmicron firewire_core ahci libahci r8169 crc_itu_t mii btrfs zlib_deflate crc32c libcrc32c
[   59.524333] Pid: 2532, comm: nm-applet Not tainted 2.6.38-rc2+ #33
[   59.524335] Call Trace:
[   59.524336]  <IRQ>  [<ffffffff8106598f>] ? warn_slowpath_common+0x7f/0xc0
[   59.524344]  [<ffffffff815c3360>] ? _raw_spin_unlock_irq+0x30/0x70
[   59.524347]  [<ffffffff810659ea>] ? warn_slowpath_null+0x1a/0x20
[   59.524350]  [<ffffffff8109eda7>] ? trace_hardirqs_on_caller+0x157/0x1f0
[   59.524353]  [<ffffffff8109ee4d>] ? trace_hardirqs_on+0xd/0x10
[   59.524356]  [<ffffffff815c3360>] ? _raw_spin_unlock_irq+0x30/0x70
[   59.524360]  [<ffffffff813b85fe>] ? rpm_suspend+0x21e/0x4a0
[   59.524365]  [<ffffffff812eae84>] ? do_raw_spin_lock+0x54/0x160
[   59.524368]  [<ffffffff813b8a8b>] ? rpm_idle+0x17b/0x1e0
[   59.524371]  [<ffffffff813b8bcb>] ? __pm_runtime_idle+0x5b/0x90
[   59.524376]  [<ffffffff8142fd50>] ? usb_autopm_put_interface_async+0x30/0x40
[   59.524380]  [<ffffffffa00fa98e>] ? hid_ctrl+0x17e/0x1c0 [usbhid]
[   59.524384]  [<ffffffff8142ac54>] ? usb_hcd_giveback_urb+0x64/0xe0
[   59.524388]  [<ffffffff8144acc6>] ? uhci_giveback_urb+0xb6/0x240
[   59.524391]  [<ffffffff8144b346>] ? uhci_scan_schedule+0x486/0xac0
[   59.524395]  [<ffffffff8144cc2b>] ? uhci_irq+0x7b/0x150
[   59.524397]  [<ffffffff8142a4d5>] ? usb_hcd_irq+0x55/0xd0
[   59.524401]  [<ffffffff810d3c24>] ? handle_IRQ_event+0x64/0x1e0
[   59.524405]  [<ffffffff810d649d>] ? handle_fasteoi_irq+0x7d/0x100
[   59.524409]  [<ffffffff8100ea22>] ? handle_irq+0x22/0x40
[   59.524412]  [<ffffffff815cab5d>] ? do_IRQ+0x5d/0xe0
[   59.524415]  [<ffffffff815c3a13>] ? ret_from_intr+0x0/0x1a
[   59.524416]  <EOI>  [<ffffffff8100bffa>] ? sysret_check+0x2e/0x69
[   59.524421] ---[ end trace ed0e2a86e79a3437 ]---

it seems to happen after booting, when logging into gnome.
Comment 1 Rafael J. Wysocki 2011-01-24 23:15:22 UTC
On Monday, January 24, 2011, Alan Stern wrote:
> On Mon, 24 Jan 2011, Rafael J. Wysocki wrote:
> 
> > On Monday, January 24, 2011, Andrew Morton wrote:
> > > On Mon, 24 Jan 2011 10:49:23 GMT
> > > bugzilla-daemon@bugzilla.kernel.org wrote:
> > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=27482
> > > 
> > > post-2.6.37 PM regression.  Rafael's fault :)
> > 
> > Nah, seems like USB rather or a false-positive.  Adding Alan to CC.
> 
> It's a real bug, but it's not in USB -- it's in the no_callbacks
> addition to the runtime PM framework.  I failed to consider a
> particular type of call path.
> 
> Basically the problem is that the rpm_suspend() routine includes a
> sequence like this:
> 
>       spin_unlock_irq(&a);
>       spin_lock_irqsave(&b);
>       spin_unlock_irqsave(&b);
>       spin_lock_irq(&a);
> 
> Before it was okay, although wasteful in the way interrupts were
> enabled and disabled.  Now it's just wrong, since the code can run in
> an interrupt handler.
> 
> Replacing everything with simple spin_unlock() and spin_lock() calls 
> should fix the problem.  Does this patch work?
> 
> Alan Stern
> 
> 
> Index: usb-2.6/drivers/base/power/runtime.c
> ===================================================================
> --- usb-2.6.orig/drivers/base/power/runtime.c
> +++ usb-2.6/drivers/base/power/runtime.c
> @@ -407,12 +407,15 @@ static int rpm_suspend(struct device *de
>               goto out;
>       }
>  
> +     /* Maybe the parent is now able to suspend. */
>       if (parent && !parent->power.ignore_children && !dev->power.irq_safe) {
> -             spin_unlock_irq(&dev->power.lock);
> +             spin_unlock(&dev->power.lock);
>  
> -             pm_request_idle(parent);
> +             spin_lock(&parent->power.lock);
> +             rpm_idle(parent, RPM_ASYNC);
> +             spin_unlock(&parent->power.lock);
>  
> -             spin_lock_irq(&dev->power.lock);
> +             spin_lock(&dev->power.lock);
>       }
>  
>   out:

It seems that the Bugzilla didn't take your message, can you attach the patch
directly to the bug entry, please?
Comment 2 tim blechmann 2011-01-25 11:57:08 UTC
tested and the warning disappears!
Comment 3 Alan Stern 2011-01-25 15:06:24 UTC
On Mon, 24 Jan 2011, Rafael J. Wysocki wrote:

> On Monday, January 24, 2011, Andrew Morton wrote:
> > On Mon, 24 Jan 2011 10:49:23 GMT
> > bugzilla-daemon@bugzilla.kernel.org wrote:
> > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=27482
> > 
> > post-2.6.37 PM regression.  Rafael's fault :)
> 
> Nah, seems like USB rather or a false-positive.  Adding Alan to CC.

It's a real bug, but it's not in USB -- it's in the no_callbacks
addition to the runtime PM framework.  I failed to consider a
particular type of call path.

Basically the problem is that the rpm_suspend() routine includes a
sequence like this:

	spin_unlock_irq(&a);
	spin_lock_irqsave(&b);
	spin_unlock_irqsave(&b);
	spin_lock_irq(&a);

Before it was okay, although wasteful in the way interrupts were
enabled and disabled.  Now it's just wrong, since the code can run in
an interrupt handler.

Replacing everything with simple spin_unlock() and spin_lock() calls 
should fix the problem.  Does this patch work?

Alan Stern


Index: usb-2.6/drivers/base/power/runtime.c
===================================================================
--- usb-2.6.orig/drivers/base/power/runtime.c
+++ usb-2.6/drivers/base/power/runtime.c
@@ -407,12 +407,15 @@ static int rpm_suspend(struct device *de
 		goto out;
 	}
 
+	/* Maybe the parent is now able to suspend. */
 	if (parent && !parent->power.ignore_children && !dev->power.irq_safe) {
-		spin_unlock_irq(&dev->power.lock);
+		spin_unlock(&dev->power.lock);
 
-		pm_request_idle(parent);
+		spin_lock(&parent->power.lock);
+		rpm_idle(parent, RPM_ASYNC);
+		spin_unlock(&parent->power.lock);
 
-		spin_lock_irq(&dev->power.lock);
+		spin_lock(&dev->power.lock);
 	}
 
  out:
Comment 4 Alan Stern 2011-01-25 15:31:29 UTC
Okay, I'll send an official patch to Rafael.
Comment 5 Denis Kirjanov 2011-01-28 15:01:27 UTC
I [   96.942093] ------------[ cut here ]------------
[   96.942117] WARNING: at kernel/lockdep.c:2321
[   96.942129] Modules linked in: snd_aoa_codec_onyx snd_aoa_fabric_layout snd_aoa snd_aoa_i2sbus snd_aoa_soundbus nouveau snd_pcm snd_page_alloc snd_timer snd drm_kms_helper firewire_ohci ttm soundcore firewire_core cfbcopyarea cfbimgblt cfbfillrect crc_itu_t uninorth_agp
[   96.942266] NIP: c0000000000a4368 LR: c0000000000a434c CTR: 00000fffa02f8610
[   96.942284] REGS: c000000177d9fb20 TRAP: 0700   Not tainted  (2.6.38-rc2-00274-g1f0324c)
[   96.942300] MSR: 9000000000021032 <ME,CE,IR,DR>  CR: 24004422  XER: 00000000
[   96.942346] TASK = c00000017ab342e0[2858] 'syslog-ng' THREAD: c000000177d9c000 CPU: 3
[   96.942367] GPR00: 0000000000000000 c000000177d9fda0 c000000000854448 0000000000000001 
[   96.942402] GPR04: 0000000000000005 0000000000122c58 0000000000000000 0000000000000028 
[   96.942438] GPR08: 0000000000000000 c000000001124c0c 00000fffa02f8610 0000000000000001 
[   96.942473] GPR12: 900000000200d032 c00000000ffff780 0000000010118e24 0000000010118e2c 
[   96.942508] GPR16: 0000000042dfa400 0000000000000000 0000000010117648 0000000000000000 
[   96.942543] GPR20: 0000000000000000 0000000000000001 0000000016804f28 0000000000122c58 
[   96.942579] GPR24: 00000000100668e8 00000fffe16e0e04 00000fffa07ff020 0000000016804f28 
[   96.942614] GPR28: 0000000000000005 c000000000007540 c0000000007d57e0 c00000017ab342e0 
[   96.942658] NIP [c0000000000a4368] .trace_hardirqs_on_caller+0x198/0x1a0
[   96.942674] LR [c0000000000a434c] .trace_hardirqs_on_caller+0x17c/0x1a0
[   96.942689] Call Trace:
[   96.942700] [c000000177d9fda0] [0000000100000000] 0x100000000 (unreliable)
[   96.942725] [c000000177d9fe30] [c000000000007540] system_call_common+0xc0/0x110
[   96.942744] Instruction dump:
[   96.942756] 4bfffbc9 2fa30000 409eff24 4bffff38 481d0fc9 60000000 2fa30000 419eff28 
[   96.942805] e93e8120 80090000 2f800000 409eff18 <0fe00000> 4bffff10 7c6802a6 4bfffe5c 
[   96.942855] ---[ end trace 9413391f15683486 ]---
have the similar bug on my ppc64 box.
It happens when running perf top
Comment 6 Denis Kirjanov 2011-01-28 15:04:54 UTC
Created attachment 45372 [details]
dmesg
Comment 7 Florian Mickler 2011-01-29 11:39:31 UTC
Patch merged.
commit c3810c88788d505d4ffd786addd111b745e42161
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Tue Jan 25 20:50:07 2011 +0100

    PM / Runtime: Don't enable interrupts while running in_interrupt
Comment 8 Denis Kirjanov 2011-01-29 20:22:08 UTC
This is already applied
Comment 9 Florian Mickler 2011-03-30 21:40:33 UTC
Different backtrace. 

If you can reproduce the issue consistently, please open a new bug and post a link here. 

Thanks,
Flo