Latest working kernel version: 2.6.25-rc5 Earliest failing kernel version: Distribution: Debian unstable Hardware Environment: amd64 2x Dual Core Opteron Software Environment: Problem Description: While booting up, when wiggling the mouse at the right moment (to make hidd notice and configure it), the kernel crashes with an oops. Sometimes the bluetooth deamons just crashes, and then the kernel oopses when trying to shutdown the system or when trying to restart the bluetooth tools. The mouse is a Logitech MX900 with the MX900 bluetooth usb dongle. Steps to reproduce: Crash seems to be a race, most often it works flawless. Bisecting the problem doesn't work because I have not found out a way to reliably reproduce the crash. Relevant portions of the dmesg when crashing while booting and mouse-wiggling (via serial-console): [ 180.691199] Bluetooth: L2CAP ver 2.9 [ 180.695294] Bluetooth: L2CAP socket layer initialized [ 180.763617] Bluetooth: HIDP (Human Interface Emulation) ver 1.2 [ 181.272621] Bluetooth: RFCOMM socket layer initialized [ 181.294815] Bluetooth: RFCOMM TTY layer initialized [ 181.300399] Bluetooth: RFCOMM ver 1.8 [ 181.792272] kobject_add_internal failed for acl0007610D7C7E with -EEXIST, do. [ 181.806802] Pid: 5158, comm: btaddconn Not tainted 2.6.26-rc7-00051-g62786b96 [ 181.814798] [ 181.814799] Call Trace: [ 181.818473] [<ffffffff8035c2f7>] ? kobject_put+0x27/0x60 [ 181.825841] [<ffffffff8035c662>] kobject_add_internal+0x112/0x1c0 [ 181.833846] [<ffffffff8035c7c4>] kobject_add_varg+0x54/0x90 [ 181.841976] [<ffffffffa0293cb0>] ? :bluetooth:add_conn+0x0/0x90 [ 181.849829] [<ffffffff8035c8f9>] kobject_add+0x69/0x90 [ 181.857407] [<ffffffff80260371>] ? trace_hardirqs_on+0x121/0x170 [ 181.861516] [<ffffffff8024d802>] ? flush_cpu_workqueue+0x82/0xa0 [ 181.875179] [<ffffffff8025e369>] ? get_lock_stats+0x19/0x70 [ 181.879189] [<ffffffff8035c4aa>] ? kobject_get+0x1a/0x30 [ 181.888821] [<ffffffff803ddb17>] ? get_device+0x17/0x20 [ 181.892830] [<ffffffff803de521>] device_add+0xe1/0x630 [ 181.899005] [<ffffffff8035abf9>] ? __next_cpu+0x19/0x30 [ 181.907194] [<ffffffffa0293cb0>] ? :bluetooth:add_conn+0x0/0x90 [ 181.919084] [<ffffffffa0293cd9>] :bluetooth:add_conn+0x29/0x90 [ 181.923974] [<ffffffff8024d3e5>] run_workqueue+0xf5/0x240 [ 181.931983] [<ffffffff8024d5d7>] worker_thread+0xa7/0x120 [ 181.935951] [<ffffffff80251ac0>] ? autoremove_wake_function+0x0/0x40 [ 181.943954] [<ffffffff8024d530>] ? worker_thread+0x0/0x120 [ 181.948393] [<ffffffff80251719>] kthread+0x49/0x80 [ 181.956268] [<ffffffff8020d478>] child_rip+0xa/0x12 [ 181.964275] [<ffffffff8020cb63>] ? restore_args+0x0/0x30 [ 181.967361] [<ffffffff802516d0>] ? kthread+0x0/0x80 [ 181.972072] [<ffffffff8020d46e>] ? child_rip+0x0/0x12 [ 181.982170] [ 181.982694] add_conn: Failed to register connection device [ 181.989678] kobject_add_internal failed for acl0007610D7C7E with -EEXIST, do. [ 182.002233] Pid: 5158, comm: btaddconn Not tainted 2.6.26-rc7-00051-g62786b96 [ 182.010379] [ 182.010379] Call Trace: [ 182.014406] [<ffffffff8035c2f7>] ? kobject_put+0x27/0x60 [ 182.023722] [<ffffffff8035c662>] kobject_add_internal+0x112/0x1c0 [ 182.034150] [<ffffffff8035c7c4>] kobject_add_varg+0x54/0x90 [ 182.038164] [<ffffffffa0293cb0>] ? :bluetooth:add_conn+0x0/0x90 [ 182.046994] [<ffffffff8035c8f9>] kobject_add+0x69/0x90 [ 182.049975] [<ffffffff80260371>] ? trace_hardirqs_on+0x121/0x170 [ 182.058809] [<ffffffff8024d802>] ? flush_cpu_workqueue+0x82/0xa0 [ 182.063473] [<ffffffff8025e369>] ? get_lock_stats+0x19/0x70 [ 182.072313] [<ffffffff8035c4aa>] ? kobject_get+0x1a/0x30 [ 182.076324] [<ffffffff803ddb17>] ? get_device+0x17/0x20 [ 182.082138] [<ffffffff803de521>] device_add+0xe1/0x630 [ 182.087519] [<ffffffff8035abf9>] ? __next_cpu+0x19/0x30 [ 182.096994] [<ffffffffa0293cb0>] ? :bluetooth:add_conn+0x0/0x90 [ 182.101338] [<ffffffffa0293cd9>] :bluetooth:add_conn+0x29/0x90 [ 182.110184] [<ffffffff8024d3e5>] run_workqueue+0xf5/0x240 [ 182.114398] [<ffffffff8024d5d7>] worker_thread+0xa7/0x120 [ 182.119826] [<ffffffff80251ac0>] ? autoremove_wake_function+0x0/0x40 [ 182.128673] [<ffffffff8024d530>] ? worker_thread+0x0/0x120 [ 182.138404] [<ffffffff80251719>] kthread+0x49/0x80 [ 182.142554] [<ffffffff8020d478>] child_rip+0xa/0x12 [ 182.151788] [<ffffffff8020cb63>] ? restore_args+0x0/0x30 [ 182.155803] [<ffffffff802516d0>] ? kthread+0x0/0x80 [ 182.161261] [<ffffffff8020d46e>] ? child_rip+0x0/0x12 [ 182.170818] [ 182.172391] add_conn: Failed to register connection device [ 182.306041] BUG: unable to handle kernel NULL pointer dereference at 00000008 [ 182.450066] IP: [<ffffffff8031423a>] sysfs_find_dirent+0xa/0x40 [ 182.450066] PGD 7c0d4067 PUD 7c0d5067 PMD 0 [ 182.450066] Oops: 0000 [1] PREEMPT SMP [ 182.450066] CPU 3 [ 182.450066] Modules linked in: rfcomm hidp l2cap nls_iso8859_1 nls_cp437 vfas [ 182.450066] Pid: 7725, comm: hidd Not tainted 2.6.26-rc7-00051-g62786b9 #6 [ 182.450066] RIP: 0010:[<ffffffff8031423a>] [<ffffffff8031423a>] sysfs_find_0 [ 182.450066] RSP: 0018:ffff81007c171a48 EFLAGS: 00010296 [ 182.450066] RAX: 2222222222222222 RBX: ffffffff8057744f RCX: 2222222222222222 [ 182.450066] RDX: 2222222222222222 RSI: ffffffff8057744f RDI: 0000000000000000 [ 182.450066] RBP: ffff81007c171a58 R08: 0000000000000001 R09: 0000000000000003 [ 182.450066] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057744f [ 182.450066] R13: ffffffff805eb1e0 R14: ffff8101614b7910 R15: ffff8101614b7698 [ 182.450066] FS: 00007f720ab0f6e0(0000) GS:ffff8101614292c0(0000) knlGS:00000 [ 182.450066] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 182.450066] CR2: 0000000000000028 CR3: 000000007c04d000 CR4: 00000000000006e0 [ 182.450066] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 182.450066] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 182.450066] Process hidd (pid: 7725, threadinfo ffff81007c170000, task ffff8) [ 182.450066] Stack: ffffffff8057744f 0000000000000000 ffff81007c171a78 fffff0 [ 182.450066] 0000000000000000 ffff8101614b7760 ffff81007c171aa8 ffffffff80315 [ 182.450066] ffff8101614b7760 ffff8101614b7760 ffff8101614b7000 ffff81015d058 [ 182.450066] Call Trace: [ 182.450066] [<ffffffff80314380>] sysfs_get_dirent+0x30/0x80 [ 182.450066] [<ffffffff80315c75>] sysfs_remove_group+0x25/0xe0 [ 182.450066] [<ffffffff803e3f27>] dpm_sysfs_remove+0x17/0x20 [ 182.450066] [<ffffffff803e4652>] device_pm_remove+0x22/0x60 [ 182.450066] [<ffffffff803de24a>] device_del+0x1a/0x190 [ 182.450066] [<ffffffff803de3d1>] device_unregister+0x11/0x20 [ 182.450066] [<ffffffff80403fbb>] input_unregister_device+0x19b/0x1c0 [ 182.450066] [<ffffffffa011f94d>] :hid:hidinput_disconnect+0x3d/0x60 [ 182.450066] [<ffffffffa01220ca>] :hid:hidinput_connect+0x24fa/0x2530 [ 182.450066] [<ffffffffa04d5c99>] :hidp:hidp_add_connection+0x479/0x740 [ 182.450066] [<ffffffff802b992f>] ? fget+0x7f/0xa0 [ 182.450066] [<ffffffffa04d6c0b>] :hidp:hidp_sock_ioctl+0x20b/0x260 [ 182.450066] [<ffffffff8032e959>] ? avc_has_perm+0x59/0x80 [ 182.450066] [<ffffffff80330e35>] ? inode_has_perm+0x45/0x80 [ 182.450066] [<ffffffff80412caa>] ? sys_sendto+0xea/0x120 [ 182.450066] [<ffffffff804117f6>] sock_ioctl+0x56/0x260 [ 182.450066] [<ffffffff802c74a1>] vfs_ioctl+0x31/0xa0 [ 182.450066] [<ffffffff802c7783>] do_vfs_ioctl+0x273/0x2e0 [ 182.450066] [<ffffffff802c7889>] sys_ioctl+0x99/0xa0 [ 182.450066] [<ffffffff8020c50b>] system_call_after_swapgs+0x7b/0x80 [ 182.450066] [ 182.450066] [ 182.450066] Code: ff ff 0f 1f 80 00 00 00 00 48 8b 43 38 0f 1f 40 00 e9 64 f [ 182.450066] RIP [<ffffffff8031423a>] sysfs_find_dirent+0xa/0x40 [ 182.450066] RSP <ffff81007c171a48> [ 182.450066] CR2: 0000000000000028 [ 183.254096] ---[ end trace f38a864186b8e81d ]--- [ 590.949898] SysRq : Emergency Sync [ 591.632111] Emergency Sync complete [ 594.045739] SysRq : Emergency Remount R/O [ 594.045739] Emergency Remount complete
You sort-of imply that this is a regression, but that isn't clear?
Hmm, I think we had a similar issue earlier when we tear down connection that is not completely set up. Maybe Marcel has an idea.
Created attachment 16642 [details] Use a more unique bus name for connections
You can test this patch if it works. Also you can pull in the bluetooth-2.6 tree since it contains something like 22 pending patches.
You said 2.6.25-rc5 worked. Does 2.6.25 also work?
Two things: 1. Why kobject EXIST? I remember some csr dongles need to reset firstly before being used or will cause something like this. 2. How can we retreive the work function return value? any idea? If the device_add fails we ignore the return value now, so further bug is triggered.
> ------- Comment #5 from bunk@kernel.org 2008-06-26 14:05 ------- > You said 2.6.25-rc5 worked. > > Does 2.6.25 also work? Debian flavour 2.6.25-2 works. I only started tracking -linus very recently, but if you want I could recheck mainline 2.6.25. The problem is that it's sometimes very hard to hit: Yesterday I rebooted consecutively at least 20 times without hitting it, right this morning I've hitted it on the first boot-up.
I just saw that I screwed up the version numbers. I meant 2.6.26-rc7-git doesn't work and 2.6.26-rc5 _does_ work. So s/25-rc/26-rc/ in all my comments. Sorry for that mess-up.
> ------- Comment #4 from marcel@holtmann.org 2008-06-26 13:44 ------- > You can test this patch if it works. Also you can pull in the bluetooth-2.6 > tree since it contains something like 22 pending patches. I tested: - 2.6.26-rc8 vanilla, which crashed, - 2.6.26-rc8 + bluetooth-2.6 and your patched reverted (it did not apply to rc8), which eventually crashed and - 2.6.26-rc8 + bluetooth-2.6, which survived quite a lot of test-runs. So I think your patch does prevent the oops from happening.
While retesting mainline I capture an oops for the second case (i.e. beside the mouse everything works 'till shutdown, were presumably a process is stuck somewhere odd). This is on 2.6.26-rc8-00076-g1702b52 from -linus.
Created attachment 16670 [details] dmesg output for comment #10
Handled-By : Marcel Holtmann <marcel@holtmann.org>
On Thursday, 10 of July 2008, Daniel Vetter wrote: > Hi > > After about a dozen rebots I managed to crash 2.6.26-rc9-00005-g1b40a89 in > the same fashion. > > Yours Daniel > On Sun, Jul 06, 2008 at 01:45:53PM +0200, Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.25. Please verify if it still should be listed. > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10989 > > Subject : kernel oopses when wiggling the mouse to make it > known to hidd > > Submitter : Daniel Vetter <daniel@ffwll.ch> > > Date : 2008-06-26 10:32 (11 days old)
On Monday, 14 of July 2008, Daniel Vetter wrote: > 2.6.26 crashed right away on the first boot-up. > > On Sun, Jul 13, 2008 at 08:00:46PM +0200, Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.25. Please verify if it still should be listed. > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10989 > > Subject : kernel oopses when wiggling the mouse to make it > known to hidd > > Submitter : Daniel Vetter <daniel@ffwll.ch> > > Date : 2008-06-26 10:32 (18 days old) > > Handled-By : Marcel Holtmann <marcel@holtmann.org>
I'm closing this (fix arrived in -linus in time for 2.6.27). btw: is it the reporters duty to close a bug in a case like this?