Bug 13367 (serial-tec-7000) - serial port (COM6) setup leads to Oops on Toshiba TEC-7000
Summary: serial port (COM6) setup leads to Oops on Toshiba TEC-7000
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Alias: serial-tec-7000
Product: Drivers
Classification: Unclassified
Component: Serial
Hardware: i386 Linux
Importance: P1 high
Assignee: Alan
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-23 08:31 UTC by Seryodkin Victor
Modified: 2009-05-28 11:57 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.30-rc6
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
Data for bug analysis (tgz archive) (87.81 KB, application/octet-stream)
2009-05-23 08:31 UTC, Seryodkin Victor
Bug description in one file (7.89 KB, text/plain)
2009-05-23 08:33 UTC, Seryodkin Victor

Description Seryodkin Victor 2009-05-23 08:31:06 UTC
Created attachment 21503
Data for bug analysis (tgz archive)

[1.] One Line Summary
serial port (COM6) setup leads to Oops on Toshiba TEC-7000

[2.] Full description of the problem/report:

We are using POS (Point of Sale) terminal PCs,
which usually have more than 4 serial ports.
That is why the kernel build configuration has the following options set:
  CONFIG_SERIAL_8250_NR_UARTS=10
  CONFIG_SERIAL_8250_RUNTIME_UARTS=10

One such PC is the Toshiba TEC-7000 POS.
It has 6 serial ports which must be configured in the following way:
/dev/ttyS0, Line 0, UART: 16550A, Port: 0x03f8, IRQ: 4
/dev/ttyS1, Line 1, UART: 16550A, Port: 0x02f8, IRQ: 3
/dev/ttyS2, Line 2, UART: 16550A, Port: 0x03e8, IRQ: 10
/dev/ttyS3, Line 3, UART: 16550A, Port: 0x02e8, IRQ: 11
/dev/ttyS4, Line 4, UART: 16550A, Port: 0x0128, IRQ: 7
/dev/ttyS5, Line 5, UART: 16550A, Port: 0x02f0, IRQ: 5

All 6 serial ports work fine on the Toshiba TEC-7000 up to and including 2.6.28.10.

When trying to upgrade to 2.6.29.2 I got a kernel Oops
when configuring COM6 on the Toshiba TEC-7000 with the following command:

  setserial /dev/ttyS5 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 5 port 0x2f0

The same happens with 2.6.29.4 and 2.6.30-rc6.
See the Oops message below for 2.6.30-rc6.

[3.] Keywords
serial


[4.] Kernel version

Linux version 2.6.30-rc6 (root@nc4010vvs) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP PREEMPT Fri May 22 16:57:37 MSD 2009

[5.] Output of Oops message

May 22 19:11:08 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
May 22 19:11:08 localhost kernel: IP: [<(null)>] (null)
May 22 19:11:08 localhost kernel: *pde = 00000000
May 22 19:11:08 localhost kernel: Oops: 0000 [#1] PREEMPT SMP
May 22 19:11:08 localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/class
May 22 19:11:08 localhost kernel: Modules linked in: psmouse via_rhine pata_via libata atkbd i8042
May 22 19:11:08 localhost kernel:
May 22 19:11:08 localhost kernel: Pid: 2507, comm: setserial Not tainted (2.6.30-rc6 #1) ST-700/ST-7000/M-7000
May 22 19:11:08 localhost kernel: EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
May 22 19:11:08 localhost kernel: EIP is at 0x0
May 22 19:11:08 localhost kernel: EAX: c06db304 EBX: c06db304 ECX: 00000001 EDX: 00000002
May 22 19:11:08 localhost kernel: ESI: c06db304 EDI: 00000001 EBP: fffffff4 ESP: c140bda8
May 22 19:11:08 localhost kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
May 22 19:11:08 localhost kernel: Process setserial (pid: 2507, ti=c140a000 task=c5944ec0 task.ti=c140a000)
May 22 19:11:08 localhost kernel: Stack:
May 22 19:11:08 localhost kernel:  c029955f c06db304 c029aa65 c58a8348 c06db304 c58a8348 00000001 fffffff4
May 22 19:11:08 localhost kernel:  c02980d7 c06db304 00000040 00000000 c58a8348 c02986ce 00000001 00000001
May 22 19:11:08 localhost kernel:  c5b070f8 0000541f c312f180 c5b07000 00000282 00000000 00000000 000001f4
May 22 19:11:08 localhost kernel: Call Trace:
May 22 19:11:08 localhost kernel:  [<c029955f>] ? serial8250_clear_fifos+0x19/0x36
May 22 19:11:08 localhost kernel:  [<c029aa65>] ? serial8250_startup+0xde/0x57f
May 22 19:11:08 localhost kernel:  [<c02980d7>] ? uart_startup+0x69/0x115
May 22 19:11:08 localhost kernel:  [<c02986ce>] ? uart_ioctl+0x54b/0x8f7
May 22 19:11:08 localhost kernel:  [<c0138438>] ? remove_wait_queue+0xb/0x2f
May 22 19:11:08 localhost kernel:  [<c0298cfe>] ? uart_open+0x284/0x2f4
May 22 19:11:08 localhost kernel:  [<c01230a3>] ? default_wake_function+0x0/0x8
May 22 19:11:08 localhost kernel:  [<c0273b9a>] ? tty_open+0x33a/0x378
May 22 19:11:08 localhost kernel:  [<c014fee6>] ? filemap_fault+0x98/0x372
May 22 19:11:08 localhost kernel:  [<c0298183>] ? uart_ioctl+0x0/0x8f7
May 22 19:11:08 localhost kernel:  [<c0273213>] ? tty_ioctl+0x694/0x6fc
May 22 19:11:08 localhost kernel:  [<c0272b7f>] ? tty_ioctl+0x0/0x6fc
May 22 19:11:08 localhost kernel:  [<c0179499>] ? vfs_ioctl+0x1c/0x5d
May 22 19:11:08 localhost kernel:  [<c0179936>] ? do_vfs_ioctl+0x45c/0x497
May 22 19:11:08 localhost kernel:  [<c017999d>] ? sys_ioctl+0x2c/0x42
May 22 19:11:08 localhost kernel:  [<c0102955>] ? syscall_call+0x7/0xb
May 22 19:11:08 localhost kernel: Code:  Bad EIP value.
May 22 19:11:08 localhost kernel: EIP: [<00000000>] 0x0 SS:ESP 0068:c140bda8
May 22 19:11:08 localhost kernel: CR2: 0000000000000000
May 22 19:11:08 localhost kernel: ---[ end trace cdf96902a7639d3c ]---
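
For reading a trace like the one above, here is a minimal sketch that lists the function names from the call-trace lines in order. It assumes the syslog line layout shown in this report (`[<address>] ? symbol+0xoffset/0xlength`); the name `trace_symbols` is hypothetical, and the pattern may need adjusting for other kernel versions.

```shell
#!/bin/sh
# trace_symbols: hypothetical helper, not part of any kernel tool.
# Reads kernel log text on stdin and prints the symbol name from every
# call-trace line of the form "[<c029955f>] ? symbol+0x19/0x36".
# Lines without the "[<addr>] ? symbol+0x.../0x..." shape are ignored.
trace_symbols() {
    sed -n 's|.*\[<[0-9a-f]*>] ? \([A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]*.*|\1|p'
}
```

Run as `trace_symbols < oops.txt | head -n 3` on the message above (saved to a file, here called oops.txt for illustration), the first three names are serial8250_clear_fifos, serial8250_startup and uart_startup. Entries marked with "?" are addresses found on the stack and are not guaranteed to be real callers.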

[6.] A small shell script or example program which triggers the problem

See attachment tec-7000-serial-trouble.tgz
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/serial-setup.sh

--- serial-setup.sh ------------
echo "Configuring /dev/ttyS0 - /dev/ttyS4"
setserial /dev/ttyS0 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 4 port 0x3f8
setserial /dev/ttyS1 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 3 port 0x2f8
setserial /dev/ttyS2 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 10 port 0x3e8
setserial /dev/ttyS3 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 11 port 0x2e8
setserial /dev/ttyS4 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 7 port 0x128

echo "Serial ports configuration"
setserial -g /dev/ttyS[0-4]

echo "Configuring /dev/ttyS5"
echo "Attention!!! After setserial invocation OOPS will happen."
echo "Press any key ..."
read
setserial /dev/ttyS5 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 5 port 0x2f0
--- serial-setup.sh ------------

The error happens when the
setserial /dev/ttyS5 ...
command is executed.

Steps to Reproduce:

1) Build kernel using configuration
tec-7000-serial-trouble.tgz/kernel-malfunction/2.6.30-rc6-bad/.config
2) boot on Toshiba TEC-7000
3) Execute serial-setup.sh


Actual Results:
  The serial driver Oopses (kernel NULL pointer dereference) when setserial configures /dev/ttyS5.

Expected Results:

The following configuration:
/dev/ttyS0, Line 0, UART: 16550A, Port: 0x03f8, IRQ: 4
/dev/ttyS1, Line 1, UART: 16550A, Port: 0x02f8, IRQ: 3
/dev/ttyS2, Line 2, UART: 16550A, Port: 0x03e8, IRQ: 10
/dev/ttyS3, Line 3, UART: 16550A, Port: 0x02e8, IRQ: 11
/dev/ttyS4, Line 4, UART: 16550A, Port: 0x0128, IRQ: 7
/dev/ttyS5, Line 5, UART: 16550A, Port: 0x02f0, IRQ: 5

Build Date & Platform:
Kernel built on machine
 Linux nc4010vvs 2.6.27.5-117.fc10.i686 #1 SMP Tue Nov 18 12:19:59 EST 2008 i686 i686 i386 GNU/Linux
Kernel is tested on the other machine (Toshiba TEC-7000)
See comments below


Additional Information:
Attachment tec-7000-serial-trouble.tgz content


tec-7000-serial-trouble/kernel-malfunction - contains data for malfunctioning kernel
tec-7000-serial-trouble/kernel-working     - contains data for the working kernel

tec-7000-serial-trouble/kernel-malfunction/2.6.29.2-bad/.config
Kernel build configuration for malfunctioning 2.6.29.2 kernel

tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/.config
Kernel build configuration for malfunctioning 2.6.30-rc6 kernel

tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/serial-setup.sh
Script for reproducing the bug

tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/ver_linux-2.6.30-rc6.txt
ver_linux script output


tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/1-before-fault
system data and logs before error

tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/1-before-fault/sysdata.txt
System runtime data from /proc before error


tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/2-after-fault
system data and logs after error (after executing script serial-setup.sh)
System runtime data from /proc after error


tec-7000-serial-trouble/kernel-working/2.6.28.10-ok:
.config - kernel build configuration (working fine)
dmesg - kernel log
serial-setup.sh - test script
serial.txt - output from "setserial -ga /dev/ttyS[0-5]" command
sysdata.txt - System runtime data from /proc
ver_linux-2.6.28.10.txt - ver_linux script output
Comment 1 Seryodkin Victor 2009-05-23 08:33:50 UTC
Created attachment 21504
Bug description in one file
Comment 2 Seryodkin Victor 2009-05-23 08:37:01 UTC
To clarify:

The bug was originally found in 2.6.29.2

Analysis data in attachment are from
2.6.30-rc6 (which is malfunctioning)
2.6.28.10  (which works fine)
Comment 3 Alan 2009-05-24 13:53:57 UTC
Is it always the configuration of the port to address 0x2f0 that fails?

It looks a bit odd: the segment registers seem to be corrupted, which basically "can't happen", and the trace also makes no rational sense. It's a valid code path, but one that suggests that touching 0x2f0 caused some serious hardware weirdness, or that some other event made the trace bogus.

Are you building with 4K or 8K stacks, and if you are building with 4K stacks, does it occur with 8K stacks?

Also, is the problem specifically tied to configuring that port to 0x2f0?
Comment 4 Seryodkin Victor 2009-05-25 07:01:59 UTC
As you can see in 
  tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/.config
from the tec-7000-serial-trouble.tgz attachment
the 2.6.30 faulty kernel was built with

  # CONFIG_4KSTACKS is not set 

option (the kernel uses 8K stacks).

An attempt to use another I/O port for testing purposes, by means of
 setserial /dev/ttyS5 uart 16550A baud_base 115200 irq 5 port 0x300
produces the same Oops.

Hardware specific value defined by manufacturer for COM6 is 0x2f0
Using another port value is meaningless
Comment 5 Alan 2009-05-25 08:16:01 UTC
One thing that I wanted to rule out was that it was going pop because some other hardware was also at 0x2f0 and perhaps now enabled. The fact it does the same at 0x300 nicely rules that out.
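
A minimal sketch of how such an address clash could also be checked from userspace, assuming the usual /proc/ioports line layout ("02f8-02ff : serial"). The function name `port_claimed` is hypothetical, and this only shows ranges the kernel has registered, not hardware that decodes an address without a driver claiming it.

```shell
#!/bin/sh
# port_claimed: hypothetical helper for illustration only.
# Usage: port_claimed 0x2f0 < /proc/ioports
# Reads /proc/ioports-style lines ("02f8-02ff : serial") on stdin and prints
# every range that contains the given I/O address.
# Returns 0 if at least one range matched, 1 otherwise.
port_claimed() {
    addr=$(($1))                     # accepts hex such as 0x2f0
    status=1
    while read -r range sep owner; do
        case "$range" in
            *-*) ;;                  # looks like "start-end"
            *) continue ;;           # skip anything else
        esac
        start=$((0x${range%-*}))     # text before the dash, as hex
        end=$((0x${range#*-}))       # text after the dash, as hex
        if [ "$addr" -ge "$start" ] && [ "$addr" -le "$end" ]; then
            echo "$range $owner"
            status=0
        fi
    done
    return "$status"
}
```

Enclosing bus ranges, if the file lists any, match as well, so the innermost range printed is the interesting one. On recent kernels the addresses may read as zeros unless the file is read as root.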
Comment 6 Andrew Morton 2009-05-26 22:20:12 UTC
It looks to me like the CPU jumped to 0x00000000 while running serial8250_clear_fifos(), so perhaps

#define serial_out(up, offset, value)   \
        (up->port.serial_out(&(up)->port, (offset), (value)))

port.serial_out hasn't been initialised.
Comment 7 Alan 2009-05-27 12:46:44 UTC
There are three places clear_fifos is used, and two of them reference port.serial_out before the call. It also doesn't explain how the segment register gets zapped.


        serial_out(up, UART_LCR, serial_inp(up, UART_LCR) & ~UART_LCR_SBC);
        serial8250_clear_fifos(up);


        serial_outp(up, UART_MCR, save_mcr);
        serial8250_clear_fifos(up);

The third case is reconfiguring a port which fits the description but the port->serial_out is set when the port is registered and then never touched.

All very strange
Comment 8 Seryodkin Victor 2009-05-28 05:51:27 UTC
As you can see in

  tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/1-before-fault/logs/dmesg

from the attachment, ttyS0 - ttyS4 are autodetected by the kernel at boot time:

--- dmesg chunk ----
Serial: 8250/16550 driver, 10 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250: ttyS2 at I/O 0x3e8 (irq = 10) is a 16550A
serial8250: ttyS3 at I/O 0x2e8 (irq = 11) is a 16550A
00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:0a: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:0b: ttyS2 at I/O 0x3e8 (irq = 10) is a 16550A
00:0c: ttyS3 at I/O 0x2e8 (irq = 11) is a 16550A
00:0d: ttyS4 at I/O 0x128 (irq = 7) is a 16550A 
--- dmesg chunk ----


The bug appears when an attempt is made to configure ttyS5.
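
The gap between the detected ports and the expected six can be read off the log mechanically. A minimal sketch, assuming the "ttySn at I/O 0x..." message format shown in the dmesg chunk above; the function name `missing_ttys` is hypothetical.

```shell
#!/bin/sh
# missing_ttys: hypothetical helper for illustration only.
# Usage: dmesg | missing_ttys 6
# Reads kernel log text on stdin, collects every ttySn reported by a
# "ttySn at I/O 0x..." line, and prints the names in ttyS0..ttyS(N-1)
# that were never reported.
missing_ttys() {
    count=$1
    # Unique list of port names the kernel announced, one per line.
    seen=$(sed -n 's|.*\(ttyS[0-9][0-9]*\) at I/O .*|\1|p' | sort -u)
    i=0
    while [ "$i" -lt "$count" ]; do
        # -x: whole-line match, so ttyS1 does not match ttyS10.
        printf '%s\n' "$seen" | grep -qx "ttyS$i" || echo "ttyS$i"
        i=$((i + 1))
    done
}
```

Fed the dmesg chunk above with a count of 6, this prints only ttyS5, the one port that has to be set up by hand with setserial.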
Comment 9 Alan 2009-05-28 11:57:45 UTC
I'm not sure why the 32bit traces are so odd, but I've finally managed to duplicate this on a 64bit box and get a clean trace.

With a better trace it turns out it's nice and easy to fix. Patch queued, and I will aim it at Linus asap.
