Most recent kernel where this bug did *NOT* occur: Unknown, this bug has existed since at least 2.6.19 Distribution: Gentoo Hardware Environment: Opterons (multiple SKUs), Tyan motherboards (multiple SKUs) Problem Description: When the system console is directed to a serial port and the 8250_pnp driver is built (either static or as a module), upon init the 8250_pnp driver tries to unregister the serial port while it is in use causing an oops. My workaround has been to simply not build the 8250_pnp driver (and the serial console still works just fine) however this problem is fundementaly caused by the serial layer not protecting ports while they are in use. Also worth noting is that that the 8250_pnp cause the oops when compiled in static, when loaded by hotplug, but NOT if loaded by hand after the system is fully booted. There is more information on this issue in the Gentoo bug tracker: http://bugs.gentoo.org/show_bug.cgi?id=178821 Steps to reproduce:
Joshua: we usually post all info on the upstream bug to make it easier for developers to digest it. sorry for not pointing this out. Here is the info: If a serial console is in use, a kernel oops occurs immediately after the 8250_pnp module is loaded (precisely at the point when the next message is sent through the serial console): Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP: [<ffffffff80293e82>] uart_write_room+0xb/0x19 PGD 223501067 PUD 22287a067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: 8250_pnp pcspkr forcedeth arcmsr sg tg3 e1000 nfs nfs_acl lockd sunrpc raid10 raid1 raid0 dm_mirror dm_mod sata_nv libata sbp2 ohci1394 ieee1394 sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore Pid: 3346, comm: modprobe.sh Not tainted 2.6.21-gentoo #6 RIP: 0010:[<ffffffff80293e82>] [<ffffffff80293e82>] uart_write_room+0xb/0x19 RSP: 0018:ffff810222f3de00 EFLAGS: 00010202 RAX: ffff810123eb7400 RBX: ffff810123cb7000 RCX: ffff810123cb7228 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff810123cb7000 RBP: 0000000000000020 R08: 0000000000000001 R09: ffff810222f3de58 R10: 0000000000000022 R11: 0000000000000246 R12: ffff8101281b8c00 R13: 0000000000000020 R14: 0000000000000020 R15: 0000000000000000 FS: 00002b599b677e10(0000) GS:ffff810123e2cd40(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000014 CR3: 000000022327d000 CR4: 00000000000006e0 Process modprobe.sh (pid: 3346, threadinfo ffff810222f3c000, task ffff810223bb40c0) Stack: ffffffff801185cf ffff8101281b8c00 ffff81022389de80 0000000000000000 ffff810223bb40c0 ffffffff8017a7ec 0000000000000000 0000000000000000 fffffffffffffffb 0000000000000000 ffff810223bb40c0 ffffffff8017a7ec Call Trace: [<ffffffff801185cf>] write_chan+0x14f/0x32d [<ffffffff8017a7ec>] default_wake_function+0x0/0xe [<ffffffff8017a7ec>] default_wake_function+0x0/0xe [<ffffffff801268d7>] tty_write+0x188/0x22c [<ffffffff80118480>] write_chan+0x0/0x32d [<ffffffff80115531>] vfs_write+0xaf/0x131 [<ffffffff80115e9a>] sys_write+0x45/0x6e [<ffffffff8015911e>] system_call+0x7e/0x83 Suzuki Kp identified this situation a while ago: http://lkml.org/lkml/2006/10/3/393 Codepath: During boot: uart_open() -> uart_get() // state->info allocated During 8250_pnp serial_pnp_probe(): serial8250_register_port() -> uart_remove_one_port() (state->info set to NULL, memory leaked) -> uart_add_one_port() (state->info not modified) During serial console operation: tty_write() -> write_chan() -> uart_write_room() (dereferences state->info) I'm not sure what the best approach is to fix this -- one driver replacing a resource hosted by another isn't something you see very often. Should 8250_pnp fail to register the new port when the old one is open? This would make probe() fail which might also be not desirable...
On Tue, May 29, 2007 at 07:53:32PM -0700, bugme-daemon@bugzilla.kernel.org wrote: > During boot: > uart_open() > -> uart_get() // state->info allocated > > During 8250_pnp serial_pnp_probe(): > serial8250_register_port() > -> uart_remove_one_port() (state->info set to NULL, memory leaked) At this point, the port is supposed to be hung up: info = state->info; if (info && info->tty) tty_vhangup(info->tty); However, because the console is open, the hangup function doesn't. So, IOW, there is no way for the serial layer to say to the tty layer "this port is really going away, remove all users." It's a tty <-> serial problem which needs sorting on the tty side first. Unfortunately it does mean that your console will vanish until it's re- opened but that's what has to happen when the port ownership or hardware changes.
> -> uart_remove_one_port() (state->info set to NULL, memory leaked) The memory isn't leaked; there's a kfree(info) below. This scenario ("console=ttyS0,9600" where ttyS0 is discovered by 8250_pnp) works on ia64. One big difference is that serial8250_isa_init_ports() is a no-op on ia64 but not on x86. Can you try this with current upstream, e.g., 2.6.22-rc3? I made some changes post-2.6.21 that make x86 more like ia64 in this regard: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26
I just tested 2.6.22-rc3 with SERIAL_8250 support compiled in staticly (which selects SERIAL_8250_PNP=Y) and indeed the oops is gone! It's worth noting that setting the SERIAL_8250 support to M, which also causes SERIAL_8250_PNP to be built as a module now seems to disable serial console support. So this problems seems to that this issue has been both fixed in the code and in the way Kconfig is selecting options. It doesn't look anything relevant changed in drivers/serial/Kconfig -- has something fundamental changed in the way Kconfig works?
Thanks for testing that. I think we should close this bug, since the oops seems to be fixed. It sounds like you think there is still another problem, but I don't understand exactly what it is. Is the problem that serial console used to work with 8250 built as a module, but now it doesn't? Can you open a new bug report and give more details of the problem?
Serial console is _never_ selectable with any modular serial driver.
Thanks Bjorn. Closing as this is fixed in Linus' tree.
The above patch got reverted, so this bug is back. http://marc.info/?t=120885070900001&r=1&w=2 http://bugs.gentoo.org/show_bug.cgi?id=223489
(no longer maintainer)