Bug 13367 (serial-tec-7000)
Summary: | serial port (COM6) setup leads to Oops on Toshiba TEC-7000 | ||
---|---|---|---|
Product: | Drivers | Reporter: | Seryodkin Victor (vvscore) |
Component: | Serial | Assignee: | Alan (alan) |
Status: | RESOLVED PATCH_ALREADY_AVAILABLE | ||
Severity: | high | CC: | akpm, alan |
Priority: | P1 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.30-rc6 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Data for bug analysis (tgz archive); Bug description in one file | ||
Created attachment 21504
Bug description in one file
To clarify: the bug was originally found in 2.6.29.2. The analysis data in the attachment are from 2.6.30-rc6 (which is malfunctioning) and 2.6.28.10 (which works fine).

Is it always the configuration of the port to address 0x2f0 that fails?

It looks a bit odd - the segment registers seem to be corrupted, which basically "can't happen", and the trace also makes no rational sense. It's a valid code path, but one that suggests that touching 0x2f0 caused some serious hardware weirdness to occur, or some other event made the trace bogus.

Are you building with 4K or 8K stacks, and if you are building with 4K stacks, does it occur with 8K stacks? Also, is the problem specifically tied to configuring that port to 0x2f0?

As you can see in tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/.config from the tec-7000-serial-trouble.tgz attachment, the faulty 2.6.30 kernel was built with the option

    # CONFIG_4KSTACKS is not set

(the kernel uses 8K stacks).

An attempt to use another I/O port for testing purposes by means of

    setserial /dev/ttyS5 uart 16550A baud_base 115200 irq 5 port 0x300

produces the same Oops. The hardware-specific value defined by the manufacturer for COM6 is 0x2f0, so using another port value is meaningless.

One thing that I wanted to rule out was that it was going pop because some other hardware was also at 0x2f0 and perhaps now enabled. The fact it does the same at 0x300 nicely rules that out.

It looks to me like the CPU jumped to 0x00000000 while running serial8250_clear_fifos(), so perhaps in

    #define serial_out(up, offset, value) \
        (up->port.serial_out(&(up)->port, (offset), (value)))

port.serial_out hasn't been initialised.

There are three places clear_fifos is used and two of them reference port.serial_out before the call. It also doesn't explain how the segment register gets zapped.

    serial_out(up, UART_LCR, serial_inp(up, UART_LCR) & ~UART_LCR_SBC);
    serial8250_clear_fifos(up);

    serial_outp(up, UART_MCR, save_mcr);
    serial8250_clear_fifos(up);

The third case is reconfiguring a port, which fits the description, but port->serial_out is set when the port is registered and then never touched. All very strange.

As you can see in tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/1-before-fault/logs/dmesg from the attachment, at boot time ttyS0 - ttyS4 are autodetected by the kernel:

--- dmesg chunk ----
    Serial: 8250/16550 driver, 10 ports, IRQ sharing enabled
    serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    serial8250: ttyS2 at I/O 0x3e8 (irq = 10) is a 16550A
    serial8250: ttyS3 at I/O 0x2e8 (irq = 11) is a 16550A
    00:09: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
    00:0a: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
    00:0b: ttyS2 at I/O 0x3e8 (irq = 10) is a 16550A
    00:0c: ttyS3 at I/O 0x2e8 (irq = 11) is a 16550A
    00:0d: ttyS4 at I/O 0x128 (irq = 7) is a 16550A
--- dmesg chunk ----

The bug appears when an attempt is made to configure ttyS5.

I'm not sure why the 32bit traces are so odd, but I've finally managed to duplicate this on a 64bit box and get a clean trace.

With a better trace it turns out it's nice and easy to fix - patch queued and will aim it at Linus asap.
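The hypothesis raised in the comments above, that port.serial_out was never initialised for the port being reconfigured, can be illustrated in user space. The following is a minimal sketch under stated assumptions, not kernel code: struct demo_port, demo_io_serial_out, demo_register_port, and demo_serial_out are hypothetical names invented for this illustration, and the NULL check shows one possible guard rather than the patch that was actually queued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's per-port structure. */
struct demo_port {
    unsigned long iobase;
    /* Register-write accessor; stays NULL until the port is registered. */
    void (*serial_out)(struct demo_port *p, int offset, int value);
    int last_offset;   /* last register offset written (for inspection) */
    int last_value;    /* last value written (for inspection) */
};

/* Stand-in accessor: records the write instead of touching hardware. */
static void demo_io_serial_out(struct demo_port *p, int offset, int value)
{
    p->last_offset = offset;
    p->last_value = value;
}

/* Plays the role of port registration: installs the accessor. */
void demo_register_port(struct demo_port *p, unsigned long iobase)
{
    assert(p != NULL);
    p->iobase = iobase;
    p->serial_out = demo_io_serial_out;
}

/*
 * Guarded dispatch (an assumption for illustration). Returns 0 on success
 * and -1 if the accessor was never installed. An unguarded call through
 * the pointer, as in the serial_out() macro quoted above, would instead
 * jump to address 0.
 */
int demo_serial_out(struct demo_port *p, int offset, int value)
{
    if (p->serial_out == NULL)
        return -1;
    p->serial_out(p, offset, value);
    return 0;
}
```

With a zero-initialised struct demo_port, demo_serial_out() returns -1 until demo_register_port() has been called. Without the guard, the same call would branch to a NULL function pointer, which would be consistent with the "EIP is at 0x0" line in the Oops if the hypothesis holds.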
Created attachment 21503
Data for bug analysis (tgz archive)

[1.] One Line Summary

serial port (COM6) setup leads to Oops on Toshiba TEC-7000

[2.] Full description of the problem/report:

We are using POS (Point of Sale) terminal PCs, which usually have more than 4 serial ports. That is why the kernel build configuration has the following options set:

    CONFIG_SERIAL_8250_NR_UARTS=10
    CONFIG_SERIAL_8250_RUNTIME_UARTS=10

One such PC is the Toshiba TEC-7000 POS. It has 6 serial ports, which must be configured as follows:

    /dev/ttyS0, Line 0, UART: 16550A, Port: 0x03f8, IRQ: 4
    /dev/ttyS1, Line 1, UART: 16550A, Port: 0x02f8, IRQ: 3
    /dev/ttyS2, Line 2, UART: 16550A, Port: 0x03e8, IRQ: 10
    /dev/ttyS3, Line 3, UART: 16550A, Port: 0x02e8, IRQ: 11
    /dev/ttyS4, Line 4, UART: 16550A, Port: 0x0128, IRQ: 7
    /dev/ttyS5, Line 5, UART: 16550A, Port: 0x02f0, IRQ: 5

All 6 serial ports work fine on the Toshiba TEC-7000 up to and including 2.6.28.10.

When trying to upgrade to 2.6.29.2 I got a kernel Oops when configuring COM6 on the Toshiba TEC-7000 with the following command:

    setserial /dev/ttyS5 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 5 port 0x2f0

The same happens with 2.6.29.4 and 2.6.30-rc6. See the Oops message below for 2.6.30-rc6.

[3.] Keywords

serial

[4.] Kernel version

Linux version 2.6.30-rc6 (root@nc4010vvs) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #1 SMP PREEMPT Fri May 22 16:57:37 MSD 2009

[5.] Output of Oops message

    May 22 19:11:08 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
    May 22 19:11:08 localhost kernel: IP: [<(null)>] (null)
    May 22 19:11:08 localhost kernel: *pde = 00000000
    May 22 19:11:08 localhost kernel: Oops: 0000 [#1] PREEMPT SMP
    May 22 19:11:08 localhost kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/class
    May 22 19:11:08 localhost kernel: Modules linked in: psmouse via_rhine pata_via libata atkbd i8042
    May 22 19:11:08 localhost kernel:
    May 22 19:11:08 localhost kernel: Pid: 2507, comm: setserial Not tainted (2.6.30-rc6 #1) ST-700/ST-7000/M-7000
    May 22 19:11:08 localhost kernel: EIP: 0060:[<00000000>] EFLAGS: 00010202 CPU: 0
    May 22 19:11:08 localhost kernel: EIP is at 0x0
    May 22 19:11:08 localhost kernel: EAX: c06db304 EBX: c06db304 ECX: 00000001 EDX: 00000002
    May 22 19:11:08 localhost kernel: ESI: c06db304 EDI: 00000001 EBP: fffffff4 ESP: c140bda8
    May 22 19:11:08 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    May 22 19:11:08 localhost kernel: Process setserial (pid: 2507, ti=c140a000 task=c5944ec0 task.ti=c140a000)
    May 22 19:11:08 localhost kernel: Stack:
    May 22 19:11:08 localhost kernel: c029955f c06db304 c029aa65 c58a8348 c06db304 c58a8348 00000001 fffffff4
    May 22 19:11:08 localhost kernel: c02980d7 c06db304 00000040 00000000 c58a8348 c02986ce 00000001 00000001
    May 22 19:11:08 localhost kernel: c5b070f8 0000541f c312f180 c5b07000 00000282 00000000 00000000 000001f4
    May 22 19:11:08 localhost kernel: Call Trace:
    May 22 19:11:08 localhost kernel: [<c029955f>] ? serial8250_clear_fifos+0x19/0x36
    May 22 19:11:08 localhost kernel: [<c029aa65>] ? serial8250_startup+0xde/0x57f
    May 22 19:11:08 localhost kernel: [<c02980d7>] ? uart_startup+0x69/0x115
    May 22 19:11:08 localhost kernel: [<c02986ce>] ? uart_ioctl+0x54b/0x8f7
    May 22 19:11:08 localhost kernel: [<c0138438>] ? remove_wait_queue+0xb/0x2f
    May 22 19:11:08 localhost kernel: [<c0298cfe>] ? uart_open+0x284/0x2f4
    May 22 19:11:08 localhost kernel: [<c01230a3>] ? default_wake_function+0x0/0x8
    May 22 19:11:08 localhost kernel: [<c0273b9a>] ? tty_open+0x33a/0x378
    May 22 19:11:08 localhost kernel: [<c014fee6>] ? filemap_fault+0x98/0x372
    May 22 19:11:08 localhost kernel: [<c0298183>] ? uart_ioctl+0x0/0x8f7
    May 22 19:11:08 localhost kernel: [<c0273213>] ? tty_ioctl+0x694/0x6fc
    May 22 19:11:08 localhost kernel: [<c0272b7f>] ? tty_ioctl+0x0/0x6fc
    May 22 19:11:08 localhost kernel: [<c0179499>] ? vfs_ioctl+0x1c/0x5d
    May 22 19:11:08 localhost kernel: [<c0179936>] ? do_vfs_ioctl+0x45c/0x497
    May 22 19:11:08 localhost kernel: [<c017999d>] ? sys_ioctl+0x2c/0x42
    May 22 19:11:08 localhost kernel: [<c0102955>] ? syscall_call+0x7/0xb
    May 22 19:11:08 localhost kernel: Code: Bad EIP value.
    May 22 19:11:08 localhost kernel: EIP: [<00000000>] 0x0 SS:ESP 0068:c140bda8
    May 22 19:11:08 localhost kernel: CR2: 0000000000000000
    May 22 19:11:08 localhost kernel: ---[ end trace cdf96902a7639d3c ]---

[6.] A small shell script or example program which triggers the problem

See attachment tec-7000-serial-trouble.tgz
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/serial-setup.sh

--- serial-setup.sh ------------
    echo "Configuring /dev/ttyS0 - /dev/ttyS4"
    setserial /dev/ttyS0 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 4 port 0x3f8
    setserial /dev/ttyS1 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 3 port 0x2f8
    setserial /dev/ttyS2 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 10 port 0x3e8
    setserial /dev/ttyS3 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 11 port 0x2e8
    setserial /dev/ttyS4 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 7 port 0x128
    echo "Serial ports configuration"
    setserial -g /dev/ttyS[0-4]
    echo "Configuring /dev/ttyS5"
    echo "Attention!!! After setserial invocation OOPS will happen."
    echo "Press any key ..."
    read
    setserial /dev/ttyS5 uart 16550A baud_base 115200 spd_normal skip_test ^fourport ^auto_irq irq 5 port 0x2f0
--- serial-setup.sh ------------

The error occurs when the setserial /dev/ttyS5 ... command is executed.

Steps to Reproduce:
1) Build the kernel using the configuration tec-7000-serial-trouble.tgz/kernel-malfunction/2.6.30-rc6-bad/.config
2) Boot on the Toshiba TEC-7000
3) Execute serial-setup.sh

Actual Results:
The serial driver subsystem is malfunctioning.

Expected Results:
The following configuration:

    /dev/ttyS0, Line 0, UART: 16550A, Port: 0x03f8, IRQ: 4
    /dev/ttyS1, Line 1, UART: 16550A, Port: 0x02f8, IRQ: 3
    /dev/ttyS2, Line 2, UART: 16550A, Port: 0x03e8, IRQ: 10
    /dev/ttyS3, Line 3, UART: 16550A, Port: 0x02e8, IRQ: 11
    /dev/ttyS4, Line 4, UART: 16550A, Port: 0x0128, IRQ: 7
    /dev/ttyS5, Line 5, UART: 16550A, Port: 0x02f0, IRQ: 5

Build Date & Platform:
The kernel was built on the machine
Linux nc4010vvs 2.6.27.5-117.fc10.i686 #1 SMP Tue Nov 18 12:19:59 EST 2008 i686 i686 i386 GNU/Linux
The kernel is tested on the other machine (Toshiba TEC-7000). See comments below.

Additional Information:
Contents of the attachment tec-7000-serial-trouble.tgz:

tec-7000-serial-trouble/kernel-malfunction
    Contains data for the malfunctioning kernels
tec-7000-serial-trouble/kernel-working
    Contains data for the working kernel
tec-7000-serial-trouble/kernel-malfunction/2.6.29.2-bad/.config
    Kernel build configuration for the malfunctioning 2.6.29.2 kernel
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/.config
    Kernel build configuration for the malfunctioning 2.6.30-rc6 kernel
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/serial-setup.sh
    Script for reproducing the bug
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/ver_linux-2.6.30-rc6.txt
    ver_linux script output
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/1-before-fault
    System data and logs before the error
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/1-before-fault/sysdata.txt
    System runtime data from /proc before the error
tec-7000-serial-trouble/kernel-malfunction/2.6.30-rc6-bad/2-after-fault
    System data and logs after the error (after executing the serial-setup.sh script)
    System runtime data from /proc after the error
tec-7000-serial-trouble/kernel-working/2.6.28.10-ok:
    .config - kernel build configuration (working fine)
    dmesg - kernel log
    serial-setup.sh - test script
    serial.txt - output from the "setserial -ga /dev/ttyS[0-5]" command
    sysdata.txt - system runtime data from /proc
    ver_linux-2.6.28.10.txt - ver_linux script output