Bug 8552

Summary: 2.6.21 8250_pnp module causes oops' on load with console on serial port
Product: Drivers Reporter: Joshua Hoblitt (j_kernel)
Component: SerialAssignee: Alan (alan)
Status: RESOLVED OBSOLETE    
Severity: normal CC: alan, bjorn.helgaas, kernel, suzuki
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.21 Subsystem:
Regression: No Bisected commit-id:

Description Joshua Hoblitt 2007-05-29 17:56:50 UTC
Most recent kernel where this bug did *NOT* occur: Unknown, this bug has existed
since at least 2.6.19
Distribution: Gentoo
Hardware Environment: Opterons (multiple SKUs), Tyan motherboards (multiple SKUs)
Problem Description:

When the system console is directed to a serial port and the 8250_pnp driver is
built (either static or as a module), upon init the 8250_pnp driver tries to
unregister the serial port while it is in use causing an oops.  My workaround
has been to simply not build the 8250_pnp driver (and the serial console still
works just fine) however this problem is fundementaly caused by the serial layer
not protecting ports while they are in use. Also worth noting is that that the
8250_pnp cause the oops when compiled in static, when loaded by hotplug, but NOT
if loaded by hand after the system is fully booted.

There is more information on this issue in the Gentoo bug tracker:

http://bugs.gentoo.org/show_bug.cgi?id=178821
Steps to reproduce:
Comment 1 Daniel Drake 2007-05-29 19:56:49 UTC
Joshua: we usually post all info on the upstream bug to make it easier for
developers to digest it. sorry for not pointing this out.

Here is the info:

If a serial console is in use, a kernel oops occurs immediately after the
8250_pnp module is loaded (precisely at the point when the next message is sent
through the serial console):

Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP: 
 [<ffffffff80293e82>] uart_write_room+0xb/0x19
PGD 223501067 PUD 22287a067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: 8250_pnp pcspkr forcedeth arcmsr sg tg3 e1000 nfs nfs_acl
lockd sunrpc raid10 raid1 raid0 dm_mirror dm_mod sata_nv libata sbp2 ohci1394
ieee1394 sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd
usbcore
Pid: 3346, comm: modprobe.sh Not tainted 2.6.21-gentoo #6
RIP: 0010:[<ffffffff80293e82>]  [<ffffffff80293e82>] uart_write_room+0xb/0x19
RSP: 0018:ffff810222f3de00  EFLAGS: 00010202
RAX: ffff810123eb7400 RBX: ffff810123cb7000 RCX: ffff810123cb7228
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff810123cb7000
RBP: 0000000000000020 R08: 0000000000000001 R09: ffff810222f3de58
R10: 0000000000000022 R11: 0000000000000246 R12: ffff8101281b8c00
R13: 0000000000000020 R14: 0000000000000020 R15: 0000000000000000
FS:  00002b599b677e10(0000) GS:ffff810123e2cd40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000014 CR3: 000000022327d000 CR4: 00000000000006e0
Process modprobe.sh (pid: 3346, threadinfo ffff810222f3c000, task
ffff810223bb40c0)
Stack:  ffffffff801185cf ffff8101281b8c00 ffff81022389de80 0000000000000000
 ffff810223bb40c0 ffffffff8017a7ec 0000000000000000 0000000000000000
 fffffffffffffffb 0000000000000000 ffff810223bb40c0 ffffffff8017a7ec
Call Trace:
 [<ffffffff801185cf>] write_chan+0x14f/0x32d
 [<ffffffff8017a7ec>] default_wake_function+0x0/0xe
 [<ffffffff8017a7ec>] default_wake_function+0x0/0xe
 [<ffffffff801268d7>] tty_write+0x188/0x22c
 [<ffffffff80118480>] write_chan+0x0/0x32d
 [<ffffffff80115531>] vfs_write+0xaf/0x131
 [<ffffffff80115e9a>] sys_write+0x45/0x6e
 [<ffffffff8015911e>] system_call+0x7e/0x83

Suzuki Kp identified this situation a while ago:

http://lkml.org/lkml/2006/10/3/393

Codepath:

During boot:
 uart_open()
 -> uart_get() // state->info allocated

During 8250_pnp serial_pnp_probe():
 serial8250_register_port()
 -> uart_remove_one_port() (state->info set to NULL, memory leaked)
 -> uart_add_one_port() (state->info not modified)

During serial console operation:
 tty_write()
 -> write_chan()
  -> uart_write_room() (dereferences state->info)

I'm not sure what the best approach is to fix this -- one driver replacing a
resource hosted by another isn't something you see very often.

Should 8250_pnp fail to register the new port when the old one is open? This
would make probe() fail which might also be not desirable...
Comment 2 Russell King 2007-05-30 01:39:30 UTC
On Tue, May 29, 2007 at 07:53:32PM -0700, bugme-daemon@bugzilla.kernel.org wrote:
> During boot:
>  uart_open()
>  -> uart_get() // state->info allocated
> 
> During 8250_pnp serial_pnp_probe():
>  serial8250_register_port()
>  -> uart_remove_one_port() (state->info set to NULL, memory leaked)

At this point, the port is supposed to be hung up:

        info = state->info;
        if (info && info->tty)
                tty_vhangup(info->tty);

However, because the console is open, the hangup function doesn't.  So,
IOW, there is no way for the serial layer to say to the tty layer "this
port is really going away, remove all users."

It's a tty <-> serial problem which needs sorting on the tty side first.

Unfortunately it does mean that your console will vanish until it's re-
opened but that's what has to happen when the port ownership or hardware
changes.
Comment 3 Bjorn Helgaas 2007-05-30 14:13:11 UTC
> -> uart_remove_one_port() (state->info set to NULL, memory leaked)

The memory isn't leaked; there's a kfree(info) below.

This scenario ("console=ttyS0,9600" where ttyS0 is discovered by
8250_pnp) works on ia64.  One big difference is that
serial8250_isa_init_ports() is a no-op on ia64 but not on x86.

Can you try this with current upstream, e.g., 2.6.22-rc3?
I made some changes post-2.6.21 that make x86 more like ia64
in this regard:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26
Comment 4 Joshua Hoblitt 2007-05-30 18:54:49 UTC
I just tested 2.6.22-rc3 with SERIAL_8250 support compiled in staticly (which
selects SERIAL_8250_PNP=Y) and indeed the oops is gone!  It's worth noting that
setting the SERIAL_8250 support to M, which also causes SERIAL_8250_PNP to be
built as a module now seems to disable serial console support.  So this problems
seems to that this issue has been both fixed in the code and in the way Kconfig
is selecting options.  It doesn't look anything relevant changed in
drivers/serial/Kconfig -- has something fundamental changed in the way Kconfig
works?
Comment 5 Bjorn Helgaas 2007-05-31 08:13:11 UTC
Thanks for testing that.  I think we should close this bug, since the
oops seems to be fixed.

It sounds like you think there is still another problem, but I don't
understand exactly what it is.  Is the problem that serial console
used to work with 8250 built as a module, but now it doesn't?

Can you open a new bug report and give more details of the problem?
Comment 6 Russell King 2007-05-31 12:23:55 UTC
Serial console is _never_ selectable with any modular serial driver.
Comment 7 Daniel Drake 2007-06-01 17:43:57 UTC
Thanks Bjorn. Closing as this is fixed in Linus' tree.
Comment 8 Daniel Drake 2008-05-31 03:08:43 UTC
The above patch got reverted, so this bug is back.
http://marc.info/?t=120885070900001&r=1&w=2
http://bugs.gentoo.org/show_bug.cgi?id=223489
Comment 9 Alan 2009-07-28 19:17:05 UTC
(no longer maintainer)