Bug 7509
Summary: | Bluetooth related panic | ||
---|---|---|---|
Product: | Networking | Reporter: | Mikko Rapeli (mikko.rapeli) |
Component: | Other | Assignee: | Arnaldo Carvalho de Melo (acme) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | akpm, marcel, protasnb |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.22-rc5 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
kern.log
Oops screen shot with 2.6.19rc5 sane crash log with 2.6.20-rc4 rfcomm log crash log with 2.6.20-rc5 crash screen shot Keep rfcomm_dev in list until it is freed crash with 2.6.22-rc5 + "Keep rfcomm_dev in list until it is freed" patch Hangup tty's before releasing rfcomm_dev |
Description
Mikko Rapeli
2006-11-13 02:06:15 UTC
Created attachment 9483 [details]
kern.log
Congratulations - you got the messiest-looking backtrace I've seen in a year. It _looks_ like it oopsed in the bluetooth->tty management code. But it could be the socket code too. Sigh. I recall we did fix one or two things in there. If you could, it would be good if you could test 2.6.19-rc5 or -rc6 and see if enabling CONFIG_UNWIND_INFO=y CONFIG_STACK_UNWIND=y gives us a more comprehensible backtrace. Thanks. Created attachment 9492 [details]
Oops screen shot with 2.6.19rc5
How on earth did I manage to get this crash to syslog? I must have played with
magic sysreq and been very, very lucky...
Created attachment 10192 [details] sane crash log with 2.6.20-rc4 I tried 2.6.8 from Debian sarge and it crashes too. Then tried 2.6.20rc4 and it crashes. Then played kdb and I get to log the first null pointer, yeah! As I don't how to trace it back to C source and my asm sucks I just read lots of code and play with kprint. Making the dmesg buffer large and removing this tty thing helps a lot: diff -ru linux-2.6.20-rc4/drivers/char/tty_io.c linux-2.6.20-rc4-kdb/drivers/char/tty_io.c --- linux-2.6.20-rc4/drivers/char/tty_io.c 2007-01-07 07:45:51.000000000 +0200 +++ linux-2.6.20-rc4-kdb/drivers/char/tty_io.c 2007-01-25 21:23:05.000000000 +0200 @@ -2313,8 +2313,8 @@ if (!do_sleep) break; - printk(KERN_WARNING "release_dev: %s: read/write wait queue " - "active!\n", tty_name(tty, buf)); + /*printk(KERN_WARNING "release_dev: %s: read/write wait queue " + "active!\n", tty_name(tty, buf));*/ mutex_unlock(&tty_mutex); schedule(); } Still being clueless I stumble to http://git2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b6e557fbf1dbba8cfa667a25503e5dbd0e9330b7 and now there's a dumb cure: --- linux-2.6.20-rc4/net/bluetooth/rfcomm/tty.c 2007-01-07 07:45:51.000000000 +0200 +++ linux-2.6.20-rc4-kdb/net/bluetooth/rfcomm/tty.c 2007-01-25 22:26:44.000000000 +0200 @@ -659,13 +659,23 @@ static int rfcomm_tty_write(struct tty_struct *tty, const unsigned char *buf, int count) { + BT_DBG("tty %p count %d", tty, count); + + if (!tty) return 0; + struct rfcomm_dev *dev = (struct rfcomm_dev *) tty->driver_data; + + BT_DBG("dev %p", dev); + if (!dev) return 0; + struct rfcomm_dlc *dlc = dev->dlc; + + BT_DBG("dlc %p", dlc); + if (!dlc) return 0; + struct sk_buff *skb; int err = 0, sent = 0, size; - BT_DBG("tty %p count %d", tty, count); - To my non-kernel hacker eyes this is a bit dumb, since I'd hope tty, dlc and others would be protected by some lock. I hope someone knows better. -Mikko Marcel, have you seen this? Created attachment 10193 [details]
rfcomm log
Debug prints from rfcomm with the previous patch. I had rfcomm2 open with
ppp/ip, l2ping'ed my phone and did 'telinit 1'. Shows (struct rfcomm_dev *)
tty->driver_data as 0 in the end.
-Mikko
Can you please make sure that 2.6.20-rc5 or later still produces these crashes for you. The rfcomm_tty_chars_in_buffer() routine should return 0 in cases where the the DLC has been detached from the TTY. So there should be no call to rfcomm_tty_write() with an unattached DLC. Created attachment 10196 [details]
crash log with 2.6.20-rc5
This I got with almost plain 2.6.20-rc5. Only these changes included:
diff -ru linux-2.6.20-rc5/net/bluetooth/rfcomm/Makefile
linux-2.6.20-rc5-mcf/net
/bluetooth/rfcomm/Makefile
--- linux-2.6.20-rc5/net/bluetooth/rfcomm/Makefile 2007-01-12
20:54:26.0000
00000 +0200
+++ linux-2.6.20-rc5-mcf/net/bluetooth/rfcomm/Makefile 2007-01-26
12:10:27.0000
00000 +0200
@@ -2,6 +2,8 @@
# Makefile for the Linux Bluetooth RFCOMM layer.
#
+CFLAGS += -DCONFIG_BT_RFCOMM_DEBUG
+
obj-$(CONFIG_BT_RFCOMM) += rfcomm.o
rfcomm-y := core.o sock.o
diff -ru linux-2.6.20-rc5/drivers/char/tty_io.c
linux-2.6.20-rc5-mcf/drivers/cha
r/tty_io.c
--- linux-2.6.20-rc5/drivers/char/tty_io.c 2007-01-12 20:54:26.000000000
+0
200
+++ linux-2.6.20-rc5-mcf/drivers/char/tty_io.c 2007-01-26 12:33:39.000000000
+0
200
@@ -2313,8 +2313,8 @@
if (!do_sleep)
break;
- printk(KERN_WARNING "release_dev: %s: read/write wait queue "
- "active!\n", tty_name(tty, buf));
+ /*printk(KERN_WARNING "release_dev: %s: read/write wait queue "
+ "active!\n", tty_name(tty, buf));*/
mutex_unlock(&tty_mutex);
schedule();
}
Like before, with 2.6.20-rc5 this patch helps: diff -ru ../linux-2.6.20-rc5/net/bluetooth/rfcomm/tty.c net/bluetooth/rfcomm/tty .c --- ../linux-2.6.20-rc5/net/bluetooth/rfcomm/tty.c 2007-01-12 20:54:26.0000 00000 +0200 +++ net/bluetooth/rfcomm/tty.c 2007-01-26 20:48:00.000000000 +0200 @@ -660,6 +660,9 @@ static int rfcomm_tty_write(struct tty_struct *tty, const unsigned char *buf, i nt count) { struct rfcomm_dev *dev = (struct rfcomm_dev *) tty->driver_data; + BT_DBG("dev %p", dev); + if (!dev) + return 0; struct rfcomm_dlc *dlc = dev->dlc; struct sk_buff *skb; int err = 0, sent = 0, size; And logs show up like this: [ 3224.384000] rfcomm_recv_mcc: c6d7bd60 type 0x38 cr 0 [ 3224.384000] rfcomm_recv_msc: dlci 2 cr 0 v24 0x8d [ 3224.384000] rfcomm_process_dlcs: session c6d7bd60 state 1 [ 3224.384000] rfcomm_process_tx: dlc c839ef38 state 8 cfc 40 rx_credits 23 tx_c redits 7 [ 3224.388000] rfcomm_tty_ioctl: tty caf4c7f8 cmd 0x5401 [ 3224.388000] rfcomm_tty_ioctl: TCGETS is not supported [ 3224.388000] rfcomm_tty_open: tty caf4c7f8 id 2 [ 3224.392000] rfcomm_l2data_ready: cc5a5bf8 bytes 4 [ 3224.392000] rfcomm_process_rx: session c6d7bd60 state 1 qlen 1 [ 3224.392000] rfcomm_recv_ua: session c6d7bd60 state 1 dlci 2 [ 3224.392000] __rfcomm_dlc_close: dlc c839ef38 state 9 dlci 2 err 0 session c6d 7bd60 [ 3224.392000] rfcomm_dlc_clear_timer: dlc c839ef38 state 9 [ 3224.392000] rfcomm_dlc_unlink: dlc c839ef38 refcnt 1 session c6d7bd60 [ 3224.392000] rfcomm_dlc_free: c839ef38 [ 3224.392000] rfcomm_process_dlcs: session c6d7bd60 state 1 [ 3224.392000] rfcomm_session_del: session c6d7bd60 state 1 [ 3224.392000] rfcomm_send_disc: c6d7bd60 dlci 0 [ 3224.392000] rfcomm_send_frame: session c6d7bd60 len 4 [ 3224.404000] rfcomm_tty_ioctl: tty caf4c7f8 cmd 0x5401 [ 3224.404000] rfcomm_tty_ioctl: TCGETS is not supported [ 3224.404000] rfcomm_tty_ioctl: tty caf4c7f8 cmd 0x5401 [ 3224.404000] rfcomm_tty_ioctl: TCGETS is not supported [ 3224.404000] rfcomm_tty_ioctl: tty caf4c7f8 cmd 0x5402 [ 3224.404000] rfcomm_tty_ioctl: TCSETS is not supported [ 3224.404000] rfcomm_tty_set_termios: tty caf4c7f8 termios c8597e00 [ 3224.404000] rfcomm_tty_ioctl: tty caf4c7f8 cmd 0x5401 [ 3224.404000] rfcomm_tty_ioctl: TCGETS is not supported [ 3224.404000] rfcomm_tty_ioctl: tty caf4c7f8 cmd 0x5409 [ 3224.420000] rfcomm_tty_write: dev 00000000 -------------------------------------^ [ 3224.420000] rfcomm_l2state_change: cc5a5bf8 state 9 [ 3226.440000] bus bluetooth: remove device acl00605762B2A5 [ 3226.440000] PM: Removing info for bluetooth:acl00605762B2A5 I guess best we can do right now is ask Miko to test with the latest kernel (2.6.22-rc5 as of now) and confirm if the problem still there. Thanks. Created attachment 11833 [details]
crash screen shot
The screenshot above is from 2.6.22-rc5. Perhaps someone could try to explain/decipher the locking logic of net/bluetooth/rfcomm/tty.c? At least I'm not getting it... Created attachment 11835 [details]
Keep rfcomm_dev in list until it is freed
This patch fixes another kernel crash. It looks like it could also fix your case. Please apply it to 2.6.22-rc5 and try it.
Created attachment 11855 [details]
crash with 2.6.22-rc5 + "Keep rfcomm_dev in list until it is freed" patch
With the 'Keep rfcomm_dev in list until it is freed' patch the oops is a bit harder to recreate: after 'pon gprs', l2ping on another terminal and 'telinit 1' on another, I get tons of these:
Jun 23 00:12:22 localhost kernel: release_dev: rfcomm2: read/write wait queue active!
Jun 23 00:12:22 localhost last message repeated 260 times
I got bored so I entered sysreq+k which resulted in a:
Jun 23 00:12:22 localhost kernel: BUG: unable to handle kernel NULL pointer dere
ference at virtual address 00000000
Jun 23 00:12:22 localhost kernel: printing eip:
Jun 23 00:12:22 localhost kernel: 00000000
Jun 23 00:12:22 localhost kernel: *pde = 00000000
Jun 23 00:12:22 localhost kernel: Oops: 0000 [#2]
Jun 23 00:12:22 localhost kernel: Modules linked in: ppp_async crc_ccitt ppp_gen
eric slhc hci_usb binfmt_misc rfcomm l2cap bluetooth ipv6 button ac battery nls_
iso8859_1 nls_cp437 vfat fat fuse tsdev pcmcia firmware_class floppy rtc parport
_pc parport psmouse i2c_piix4 i2c_core serio_raw intel_agp shpchp pci_hotplug ag
pgart yenta_socket rsrc_nonstatic pcmcia_core evdev ipt_REJECT xt_tcpudp iptable
_filter ip_tables x_tables ext3 jbd mbcache dm_mirror dm_snapshot dm_mod ide_gen
eric ide_cd cdrom ide_disk uhci_hcd usbcore piix generic ide_core e100 mii therm
al processor fan
Jun 23 00:12:22 localhost kernel: CPU: 0
Jun 23 00:12:22 localhost kernel: EIP: 0060:[<00000000>] Not tainted VLI
Jun 23 00:12:22 localhost kernel: EFLAGS: 00010006 (2.6.22-rc5 #9)
Jun 23 00:12:22 localhost kernel: EIP is at _stext+0x3feff000/0x14
Jun 23 00:12:22 localhost kernel: eax: c9daff2c ebx: c9daff2c ecx: 00000000
edx: 00000003
Jun 23 00:12:22 localhost kernel: esi: 00000000 edi: 00000001 ebp: c6c7deb4
esp: c6c7de94
Jun 23 00:12:22 localhost kernel: ds: 007b es: 007b fs: 0000 gs: 0033 ss:
0068
Jun 23 00:12:22 localhost kernel: Process pppd (pid: 3722, ti=c6c7c000 task=c5c2
6070 task.ti=c6c7c000)
Jun 23 00:12:22 localhost kernel: Stack: c01111fc 00000000 00000003 c6d0f520 000
00000 00000000 00000292 c6c7df1c
Jun 23 00:12:22 localhost kernel: c6c7decc c0112008 00000000 00000000 c6d
0f400 00000001 00000000 c01ef270
Jun 23 00:12:22 localhost kernel: 00000000 c6c7df14 00000001 c0140768 642
f0001 c6c7df14 ca445518 c6fdc120
Jun 23 00:12:22 localhost kernel: Call Trace:
Jun 23 00:12:22 localhost kernel: [__wake_up_common+49/79] __wake_up_common+0x3
1/0x4f
Jun 23 00:12:22 localhost kernel: [__wake_up+30/38] __wake_up+0x1e/0x26
Jun 23 00:12:22 localhost kernel: [release_dev+626/1410] release_dev+0x272/0x58
2
Jun 23 00:12:22 localhost kernel: [do_wp_page+88/877] do_wp_page+0x58/0x36d
Jun 23 00:12:22 localhost kernel: [__handle_mm_fault+1590/1713] __handle_mm_fau
lt+0x636/0x6b1
Jun 23 00:12:22 localhost kernel: [sk_free+176/197] sk_free+0xb0/0xc5
Jun 23 00:12:22 localhost kernel: [tty_ioctl+0/2695] tty_ioctl+0x0/0xa87
Jun 23 00:12:22 localhost kernel: [tty_release+7/10] tty_release+0x7/0xa
Jun 23 00:12:22 localhost kernel: [__fput+57/222] __fput+0x39/0xde
Jun 23 00:12:22 localhost kernel: [filp_close+81/88] filp_close+0x51/0x58
Jun 23 00:12:22 localhost kernel: [sys_close+84/131] sys_close+0x54/0x83
Jun 23 00:12:22 localhost kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Jun 23 00:12:22 localhost kernel: [xfrm_parse_spi+58/210] xfrm_parse_spi+0x3a/0
xd2
Jun 23 00:12:22 localhost kernel: =======================
Jun 23 00:12:22 localhost kernel: Code: Bad EIP value.
Jun 23 00:12:22 localhost kernel: EIP: [<00000000>] _stext+0x3feff000/0x14 SS:ES
P 0068:c6c7de94
I'm thinking this is a race between tty and ioctl code. I whish I could understand which lock protects rfcomm_dev list and each rfcomm_dev.
Created attachment 11909 [details]
Hangup tty's before releasing rfcomm_dev
Read a kernel book, studied to code, negledged the family for a long time... And here comes a patch, which I think is a proper fix for this issue.
The core problem is that rfcomm socket layer ioctl can release rfcomm_dev struct while rfcomm tty layer is still actively using it. Calling tty_vhangup is needed for a synchronous hangup before rfcomm_dev is freed.
|