Bug 13130

Summary: "Scheduling while atomic" on SysRq-G (inteldrmfb)
Product: Drivers Reporter: Darren Salt (bugspam)
Component: Console/FramebuffersAssignee: Jesse Barnes (jbarnes)
Status: RESOLVED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: akpm, jbarnes
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 2.6.29.1, 2.6.30-rc Subsystem:
Regression: No Bisected commit-id:
Attachments: remove non-atomic calls from i915 sysrq call

Description Darren Salt 2009-04-17 22:05:42 UTC
SysRq : Restore framebuffer console
BUG: scheduling while atomic: swapper/0/0x10010000
Modules linked in: af_packet ip6t_LOG xt_tcpudp ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_mangle ip6table_filter ip6_tables x_tables microcode snd_virmidi snd_seq_virmidi snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device vfat fat mmc_block pcmcia joydev sdhci_pci sdhci yenta_socket rsrc_nonstatic mmc_core iwl3945 iwlcore

Pid: 0, comm: swapper Not tainted (2.6.30-rc2-eq-a110 #2) EQUIUM A110
EIP: 0060:[<c0235fdf>] EFLAGS: 00000282 CPU: 0
EIP is at acpi_idle_enter_bm+0x1df/0x208
EAX: c0531f94 EBX: 00001f27 ECX: 0000000a EDX: 0000000a
ESI: 00000000 EDI: f684ed88 EBP: c0531fb0 ESP: c0531f94
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
CR0: 8005003b CR2: b808cb10 CR3: 368ac000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Call Trace:
 [<c02d2443>] cpuidle_idle_call+0x57/0x8c
 [<c0101b23>] cpu_idle+0x26/0x3f
 [<c03a18e9>] rest_init+0x4d/0x4f
 [<c05328f1>] start_kernel+0x20d/0x212
 [<c05322d9>] i386_start_kernel+0x33/0x3b


I've not checked older kernels.
Comment 1 Andrew Morton 2009-04-20 21:44:58 UTC
I'm confused.  sysrq.c has:

        /* g: May be registered by ppc for kgdb */
        NULL,                           /* g */

and you're not using powerpc.
Comment 2 Darren Salt 2009-04-21 00:03:21 UTC
drivers/gpu/drm/i915/intel_fb.c has
	register_sysrq_key('g', &sysrq_intelfb_restore_op);

Anyway, more info:

SysRq : Restore framebuffer console
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2280 lockdep_trace_alloc+0x80/0xa5()
Hardware name: EQUIUM A110
Modules linked in: af_packet ip6t_LOG xt_tcpudp ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack ip6table_mangle ip6table_filter ip6_tables x_tables microcode snd_virmidi snd_seq_virmidi snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device vfat fat mmc_block pcmcia joydev sdhci_pci sdhci yenta_socket rsrc_nonstatic mmc_core iwl3945 iwlcore
Pid: 0, comm: swapper Not tainted 2.6.30-rc2-eq-a110 #1
Call Trace:
 [<c011dd85>] warn_slowpath+0x74/0x8a
 [<c033007f>] ? netpoll_send_udp+0x1f6/0x1fe
 [<c01385f8>] ? trace_hardirqs_off+0xb/0xd
 [<c03d0036>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c02b68f5>] ? write_msg+0x91/0x9c
 [<c01385f8>] ? trace_hardirqs_off+0xb/0xd
 [<c03d0036>] ? _spin_unlock_irqrestore+0x2f/0x3c
 [<c011e37e>] ? release_console_sem+0x19d/0x1a5
 [<c013bd99>] lockdep_trace_alloc+0x80/0xa5
 [<c0165a68>] __kmalloc+0x59/0x112
 [<c0276f32>] ? drm_crtc_helper_set_config+0x9d/0x543
 [<c0276f32>] drm_crtc_helper_set_config+0x9d/0x543
 [<c0289261>] intelfb_restore+0xd/0xf
 [<c028926b>] intelfb_sysrq+0x8/0xa
 [<c026417b>] __handle_sysrq+0x85/0xfc
 [<c026437f>] handle_sysrq+0x1f/0x24
 [<c025e917>] kbd_event+0x2ee/0x553
 [<c02d7791>] input_pass_event+0x71/0xa2
 [<c02d7ae9>] input_handle_event+0x327/0x330
 [<c02d8d20>] input_event+0x4b/0x5e
 [<c02dcc9b>] atkbd_interrupt+0x3ff/0x4cc
 [<c02d578c>] serio_interrupt+0x33/0x69
 [<c02d6457>] i8042_interrupt+0x1d8/0x1ec
 [<c014724f>] handle_IRQ_event+0x57/0xd4
 [<c01487a8>] handle_edge_irq+0xb0/0x109
 [<c0104306>] handle_irq+0x1a/0x24
 [<c01042af>] do_IRQ+0x33/0x70
 [<c0102f2e>] common_interrupt+0x2e/0x34
 [<c0136169>] ? tick_notify+0x17f/0x27a
 [<c0130f1e>] notifier_call_chain+0x2b/0x55
 [<c0130f6a>] __raw_notifier_call_chain+0xe/0x10
 [<c0130f78>] raw_notifier_call_chain+0xc/0xe
 [<c0135baa>] clockevents_do_notify+0x11/0x13
 [<c0135bc8>] clockevents_notify+0x1c/0x5c
 [<c0249e3c>] acpi_state_timer_broadcast+0x31/0x34
 [<c024a8b9>] acpi_idle_enter_bm+0x233/0x244
 [<c02eb9ec>] cpuidle_idle_call+0x57/0x8f
 [<c0101b28>] cpu_idle+0x2b/0x44
 [<c03c6a4d>] rest_init+0x4d/0x4f
 [<c056c914>] start_kernel+0x230/0x235
 [<c056c2d9>] i386_start_kernel+0x33/0x3b
---[ end trace 71a627eeaa896b3e ]---
BUG: scheduling while atomic: swapper/0/0x10010000
INFO: lockdep is turned off.
Modules linked in: af_packet ip6t_LOG ip6t_REJECT nf_conntrack_ipv6 xt_state nf_conntrack snd_seqhardirqs last  enabled at (491815): [<c0102f27>] common_interrupt+0x27/0x34

Pid: 0, comm: swapper Tainted: G        W  (2.6.30-rc2-eq-a110 #1) EQUIUM A110
CR0: 8005003b CR2: b7fbdc50 CR3: 3610f000 CR4: 000006d0
 [<c0130f1e>] notifier_call_chain+0x2b/0x55
 [<c0135baa>] clockevents_do_notify+0x11/0x13
 [<c024a8b9>] acpi_idle_enter_bm+0x233/0x244
 [<c056c914>] start_kernel+0x230/0x235
Comment 3 Andrew Morton 2009-04-21 00:11:24 UTC
(In reply to comment #2)
> drivers/gpu/drm/i915/intel_fb.c has
>     register_sysrq_key('g', &sysrq_intelfb_restore_op);

Well damn, that was sneaky.

That code is doing things which can't remotely be done from within hardirq
context and I suspect it just hasn't been used/tested at all.  Probably the
best fix is to delete it.

I reassigned the bug to dri-devel@lists.sourceforge.net, thanks.
Comment 4 Jesse Barnes 2009-04-21 00:21:59 UTC
No I tested it!  I'll admit to not checking the restrictions for sysrq, but we definitely want to make this work...  I guess I'll have to make it schedule some work or something instead?
Comment 5 Andrew Morton 2009-04-21 02:05:37 UTC
well... _why_ do we want it to work?  It's only a debug thing and shouldn't
be needed at all once the driver is finished?

And if there _is_ a legitimate long-term need for it then it surely should not be implemented way down
inside one specific driver?

I'm surprised that anyone even found out that sysrq-g exists - I didn't know,
and the chances are good that we'll later steal `g' from you and we won't find out
for weeks :(

Yes, a schedule_work() thing will probably be OK, although there's a new class
of deadlock around that, so some care will be needed.

This'll be our fourth sysrq-uses-schedule_work instance (at least).  I guess we should
add a sysrq_key_op.call_in_process_context and centralise the implementation..
Comment 6 Jesse Barnes 2009-04-21 15:56:24 UTC
(In reply to comment #5)
> well... _why_ do we want it to work?  It's only a debug thing and shouldn't
> be needed at all once the driver is finished?

Oh not at all; it's also about debugging bad userspace programs that mess with the display but don't crash gracefully (which would automatically restore the display).  Maybe they hang or just display a mess and you want your console back.

> And if there _is_ a legitimate long-term need for it then it surely should
> not
> be implemented way down
> inside one specific driver?

It could probably be made drm generic, but so far i915 is the only in-tree driver using the DRM mode setting code.  We could factor it out when the radeon stuff lands I guess.

> I'm surprised that anyone even found out that sysrq-g exists - I didn't know,
> and the chances are good that we'll later steal `g' from you and we won't
> find
> out
> for weeks :(

Is there a registry or something we should use instead?  I saw a few instances of sysrqs being registered in subsystems or arches, but I think I picked one that's free (and somewhat sensible)...

> Yes, a schedule_work() thing will probably be OK, although there's a new
> class
> of deadlock around that, so some care will be needed.
> 
> This'll be our fourth sysrq-uses-schedule_work instance (at least).  I guess
> we
> should
> add a sysrq_key_op.call_in_process_context and centralise the
> implementation..

Sounds good.  I'll add it to my TODO list.  Thanks.
Comment 7 Andrew Morton 2009-04-21 17:05:34 UTC
(In reply to comment #6)
> Is there a registry or something we should use instead?

powerpc put a comment into the main table in sysrq.c.  That works nicely.
Comment 8 Darren Salt 2009-04-23 23:26:04 UTC
Looks like this is UP-only.
Comment 9 Jesse Barnes 2009-05-15 21:20:53 UTC
Created attachment 21367 [details]
remove non-atomic calls from i915 sysrq call

Here's what I sent to intel-gfx recently to fix this.  There's a separate patch from Jason Wessel to change the sysrq key itself.
Comment 10 Darren Salt 2009-05-16 18:10:52 UTC
With the schedule_work() fix applied, all is fine here.
Comment 11 Jesse Barnes 2009-05-16 18:26:15 UTC
Great, thanks for testing.  The fix should make its way upstream soon.