Bug 6716

Summary: [8250] Deadlock on SysRQ
Product: Drivers Reporter: Enrico Scholz (enrico.scholz+bugzilla.kernel)
Component: SerialAssignee: Russell King (rmk)
Status: CLOSED PATCH_ALREADY_AVAILABLE    
Severity: normal CC: bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.17 Subsystem:
Regression: --- Bisected commit-id:

Description Enrico Scholz 2006-06-19 07:57:17 UTC
Most recent kernel where this bug did not occur: 2.6.17
Hardware Environment: ARM PXA270, TL16C754 UART
Problem Description:

When having the console on the 8250 UART, a SysRQ over the serial line
causes the following deadlock:

--> create break (e.g. ^A f s)

| serial8250_handle_port()
|   spin_lock(&up->port.lock);
|   receive_chars(...);
|     uart_handle_sysrq_char(...);
|       ...
|       serial8250_console_write(...)
|       spin_lock_irqsave(&up->port.lock, flags);  <<<<<<<


Applying

| --- a/drivers/serial/8250.c
| +++ b/drivers/serial/8250.c
| @@ -2241,7 +2241,7 @@ serial8250_console_write(struct console 
|  
|         touch_nmi_watchdog();
|  
| -       if (oops_in_progress) {
| +       if (oops_in_progress || up->port.sysrq) {
|                 locked = spin_trylock_irqsave(&up->port.lock, flags);
|         } else
|                 spin_lock_irqsave(&up->port.lock, flags);

performs the wanted action but creates a

| / # SysRq : <0>BUG: spinlock trylock failure on UP on CPU#0, swapper/0
|  lock: c029df84, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
| [<c00250a0>] (dump_stack+0x0/0x14) from [<c00f396c>] (spin_bug+0x90/0xd0)
| [<c00f38dc>] (spin_bug+0x0/0xd0) from [<c00f39f4>] (_raw_spin_trylock+0x48/0x54)
|  r6 = C02850D2  r5 = 00000008  r4 = 00000000 
| [<c00f39ac>] (_raw_spin_trylock+0x0/0x54) from [<c01e716c>] (_spin_trylock+0x10/0x1c)
|  r4 = C029DF84 
| [<c01e715c>] (_spin_trylock+0x0/0x1c) from [<c0122518>] (serial8250_console_write+0x58/0x174)
| [<c01224c0>] (serial8250_console_write+0x0/0x174) from [<c0035cbc>] (__call_console_drivers+0x60/0x7c)
| [<c0035c5c>] (__call_console_drivers+0x0/0x7c) from [<c0035d54>] (_call_console_drivers+0x7c/0x8c)
|  r6 = 0000174E  r5 = C022CE68  r4 = 0000174E 
| [<c0035cd8>] (_call_console_drivers+0x0/0x8c) from [<c0035fd8>] (release_console_sem+0x180/0x248)
|  r5 = 00001746  r4 = 0000174E 
| [<c0035e58>] (release_console_sem+0x0/0x248) from [<c003685c>] (vprintk+0x2bc/0x330)
| [<c00365a0>] (vprintk+0x0/0x330) from [<c00368f8>] (printk+0x28/0x30)
| [<c00368d0>] (printk+0x0/0x30) from [<c011d79c>] (__handle_sysrq+0x40/0x14c)
|  r3 = 00000007  r2 = C022CE50  r1 = C0227F5C  r0 = C01FF6C0
| [<c011d75c>] (__handle_sysrq+0x0/0x14c) from [<c011d8cc>] (handle_sysrq+0x24/0x2c)
| [<c011d8a8>] (handle_sysrq+0x0/0x2c) from [<c0123ce4>] (receive_chars+0x17c/0x2ec)
| [<c0123b68>] (receive_chars+0x0/0x2ec) from [<c0124b5c>] (serial8250_interrupt+0x7c/0x100)
| [<c0124ae0>] (serial8250_interrupt+0x0/0x100) from [<c0020e74>] (__do_irq+0x54/0x94)
| [<c0020e20>] (__do_irq+0x0/0x94) from [<c0020f68>] (do_edge_IRQ+0xb4/0x130)

stacktrace.


A better patch (which avoids this spin_bug() message but contains
races) would be

--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -411,7 +411,12 @@ uart_handle_sysrq_char(struct uart_port 
 #ifdef SUPPORT_SYSRQ
        if (port->sysrq) {
                if (ch && time_before(jiffies, port->sysrq)) {
+                       int             is_locked = spin_is_locked(&port->lock);
+                       if (is_locked)
+                               spin_unlock(&port->lock);
                        handle_sysrq(ch, regs, NULL);
+                       if (is_locked)
+                               spin_lock(&port->lock);
                        port->sysrq = 0;
                        return 1;
                }

------
Comment 1 Andrew Morton 2006-06-20 00:36:09 UTC
How about this?

--- a/drivers/serial/8250.c~serial-8250-sysrq-deadlock-fix
+++ a/drivers/serial/8250.c
@@ -2254,7 +2254,10 @@ serial8250_console_write(struct console 
 
        touch_nmi_watchdog();
 
-       if (oops_in_progress) {
+       if (port->sysrq) {
+               /* serial8250_handle_port() already took the lock */
+               locked = 0;
+       } else if (oops_in_progress) {
                locked = spin_trylock_irqsave(&up->port.lock, flags);
        } else
                spin_lock_irqsave(&up->port.lock, flags);
Comment 2 Russell King 2006-06-25 09:35:48 UTC
That should solve it - Enrico, care to test Andrew's patch?
Comment 3 Enrico Scholz 2006-06-25 10:15:35 UTC
Yes; patch works. But whole locking looks somehow unclean; e.g.

| receive_chars(struct uart_8250_port *up, int *status, struct pt_regs *regs)
| ...
|    DEBUG_INTR("handling break....");

or manual debug printk's would trigger the deadlock too. But these are debugging
situations only which can be probably ignored.
Comment 4 Russell King 2006-07-16 09:33:28 UTC
A similar fix is in -rc2.  Can this bug be closed now?
Comment 5 Russell King 2006-08-01 07:22:36 UTC
Ping?
Comment 6 Adrian Bunk 2006-08-31 12:28:02 UTC
Please reopen this bug if it's still present in 2.6.18-rc5.