Most recent kernel where this bug did *NOT* occur: 2.6.19
Distribution: Fedora Core 5
Hardware Environment: 4-processor IWill H8502 (2.8GHz single core Opterons) w/ 10GB of RAM installed 4/2/2/2 across 4 nodes.
Software Environment: This is from the working kernel. All have been built with the same versions of software:

Linux bubba.tuc.us.ray.com 2.6.19_00 #2 SMP PREEMPT Mon Feb 5 15:20:05 MST 2007 x86_64 x86_64 x86_64 GNU/Linux

Gnu C                  4.1.1
Gnu make               3.80
binutils               2.16.91.0.6
util-linux             2.13-pre6
mount                  2.13-pre6
module-init-tools      3.2-pre9
e2fsprogs              1.38
quota-tools            3.13.
PPP                    2.4.3
Linux C Library        > libc.2.4
Dynamic linker (ldd)   2.4
Procps                 3.2.6
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.93
udev                   084
Modules Loaded         raw1394 ohci1394 ieee1394 tg3

Problem Description:
Looks like something broke between 2.6.19 and 2.6.20-rc1. Immediately after boot, I tried to run a program that uses the bus via libraw1394, but no ports were detected. lsmod shows ohci1394 with 1 user immediately after boot, before raw1394 is loaded (normally it's 0). Additionally, the module cannot be unloaded, even with --force.

[root@bubba rcrocomb]# uptime
 16:03:14 up 2 min,  1 user,  load average: 2.03, 0.86, 0.32
[root@bubba rcrocomb]# lsmod
Module                  Size  Used by
ohci1394               41947  1
ieee1394              103984  1 ohci1394
tg3                   106884  0
[root@bubba rcrocomb]# rmmod ohci1394
ERROR: Module ohci1394 is in use

I have confirmed this with:
2.6.20-rt2
2.6.20
2.6.20-rc5
2.6.20-rc5-rt10
2.6.20-rc1

2.6.19 is okay. Kernel configs are as identical as possible.

When booting 2.6.20-rt2 (only), an Oops is displayed; behavior is otherwise identical to the other kernels.

Feb 5 14:49:01 bubba kernel: ieee1394: nodemgr and IRM functionality disabled
Feb 5 14:49:01 bubba kernel: ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 72 (level, low) -> IRQ 72
Feb 5 14:49:01 bubba kernel: stopped custom tracer.
Feb 5 14:49:01 bubba kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
Feb 5 14:49:01 bubba kernel: [<ffffffff803f3c2b>] klist_add_tail+0x39/0x49
Feb 5 14:49:01 bubba kernel: PGD 279aee067 PUD 180383067 PMD 0
Feb 5 14:49:01 bubba kernel: Oops: 0002 [1] PREEMPT SMP
Feb 5 14:49:01 bubba kernel: CPU 0
Feb 5 14:49:01 bubba kernel: Modules linked in: ohci1394 ieee1394 tg3
Feb 5 14:49:01 bubba kernel: Pid: 1344, comm: modprobe Not tainted 2.6.20-rt2_00 #2
Feb 5 14:49:01 bubba kernel: RIP: 0010:[<ffffffff803f3c2b>] [<ffffffff803f3c2b>] klist_add_tail+0x39/0x49
Feb 5 14:49:01 bubba kernel: RSP: 0018:ffff810279b3bce8 EFLAGS: 00010246
Feb 5 14:49:01 bubba kernel: RAX: ffffffff880342d0 RBX: ffffffff88034298 RCX: 0000000000000000
Feb 5 14:49:01 bubba kernel: RDX: ffff810002bf41f8 RSI: ffffffff80487511 RDI: ffffffff88034298
Feb 5 14:49:01 bubba kernel: RBP: ffff810002bf43b8 R08: ffff810037c4cf40 R09: ffffffff880337a0
Feb 5 14:49:01 bubba kernel: R10: 000000000000003b R11: ffffffff80536a40 R12: ffff810002bf41f0
Feb 5 14:49:01 bubba kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000f
Feb 5 14:49:01 bubba kernel: FS: 00002b71f7fd81f0(0000) GS:ffffffff80515100(0000) knlGS:0000000000000000
Feb 5 14:49:01 bubba kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 5 14:49:01 bubba kernel: CR2: 0000000000000000 CR3: 0000000279b8e000 CR4: 00000000000006e0
Feb 5 14:49:01 bubba kernel: Process modprobe (pid: 1344, threadinfo ffff810279b3a000, task ffff8102797a7080)
Feb 5 14:49:01 bubba kernel: Stack: ffff810037c4cf40 ffff810002bf4128 0000000000000000 ffffffff8034af87
Feb 5 14:49:01 bubba kernel: ffff810002bf4128 ffffffff8034afba ffff810002bf4430 ffff810002bf4128
Feb 5 14:49:01 bubba kernel: ffffffff880337a0 ffffffff8034a15d 0000000000000000 0000000000000000
Feb 5 14:49:01 bubba kernel: Call Trace:
Feb 5 14:49:01 bubba kernel: [<ffffffff8034af87>] device_bind_driver+0x9/0x12
Feb 5 14:49:01 bubba kernel: [<ffffffff8034afba>] device_attach+0x2a/0x5d
Feb 5 14:49:01 bubba kernel: [<ffffffff8034a15d>] bus_attach_device+0x23/0x49
Feb 5 14:49:01 bubba kernel: [<ffffffff803492b7>] device_add+0x359/0x509
Feb 5 14:49:01 bubba kernel: [<ffffffff8801e318>] :ieee1394:hpsb_alloc_host+0x20d/0x24e
Feb 5 14:49:01 bubba kernel: [<ffffffff8803b010>] :ohci1394:ohci1394_pci_probe+0x45/0x625
Feb 5 14:49:01 bubba kernel: [<ffffffff8030a92e>] pci_device_probe+0xcd/0x134
Feb 5 14:49:01 bubba kernel: [<ffffffff8034add8>] really_probe+0x87/0x10c
Feb 5 14:49:01 bubba kernel: [<ffffffff8034af52>] __driver_attach+0x46/0x6d
Feb 5 14:49:02 bubba kernel: [<ffffffff8034af0c>] __driver_attach+0x0/0x6d
Feb 5 14:49:02 bubba kernel: [<ffffffff8034a414>] bus_for_each_dev+0x43/0x6e
Feb 5 14:49:02 bubba kernel: [<ffffffff8034a70b>] bus_add_driver+0x6e/0x190
Feb 5 14:49:02 bubba kernel: [<ffffffff8030ab32>] __pci_register_driver+0x85/0xba
Feb 5 14:49:02 bubba kernel: [<ffffffff80292f74>] sys_init_module+0xab/0x169
Feb 5 14:49:02 bubba kernel: [<ffffffff8025639e>] system_call+0x7e/0x83
Feb 5 14:49:02 bubba kernel:
Feb 5 14:49:02 bubba kernel:
Feb 5 14:49:02 bubba kernel: Code: 48 89 11 48 89 4a 08 59 5b 41 5c e9 ee 75 e6 ff 41 54 49 89
Feb 5 14:49:02 bubba kernel: RIP [<ffffffff803f3c2b>] klist_add_tail+0x39/0x49
Feb 5 14:49:02 bubba kernel: RSP <ffff810279b3bce8>
Feb 5 14:49:02 bubba kernel: CR2: 0000000000000000

Steps to reproduce:
Boot any of the aforementioned kernels.
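The lsmod symptom in the report (ohci1394 with a use count of 1 but no module listed as its user) can be checked mechanically. The following is only a minimal sketch, assuming lsmod's usual "Module / Size / Used by" column layout; the function names `parse_lsmod` and `unexplained_refs` are made up for this illustration and are not part of any tool. Note that a use count above the number of listed users is not a bug in itself (an open device file also holds a reference), so the result is just a hint about where to look.

```python
def parse_lsmod(text):
    """Parse lsmod output into {module: (size, use_count, [users])}."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the "Module Size Used by" header and any malformed line.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        name, size, count = fields[0], int(fields[1]), int(fields[2])
        # The optional fourth column is a comma-separated list of users.
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[name] = (size, count, users)
    return modules


def unexplained_refs(modules):
    """Return modules whose use count exceeds the number of listed users."""
    return sorted(
        name for name, (_, count, users) in modules.items() if count > len(users)
    )


# Sample taken from the lsmod output quoted in the report.
SAMPLE = """\
Module                  Size  Used by
ohci1394               41947  1
ieee1394              103984  1 ohci1394
tg3                   106884  0
"""

if __name__ == "__main__":
    print(unexplained_refs(parse_lsmod(SAMPLE)))
```

Run against the output above, only ohci1394 is flagged: ieee1394's single reference is accounted for by ohci1394, and tg3 has none.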
Could you try with this reverted? "ieee1394: nodemgr: fix deadlock in shutdown" http://git2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8252bbb1363b7fe963a3eb6f8a36da619a6f5a65 (That one was to fix bug 6706, alas there is now bug 7792... grrr.)
Actually, that commit isn't in effect on your setup: "ieee1394: nodemgr and IRM functionality disabled" --- i.e. reverting it won't change anything AFAICS.
Testing here on IA32, 2.6.20-rc6 + latest 1394 drivers:
# modprobe ieee1394 disable_nodemgr=1
# modprobe ohci1394
-> spinlock lockup
Could you try 2.6.19.2 plus a patch from http://me.in-berlin.de/~s5r6/linux1394/updates/2.6.19.y/ ? Either v250 or v273 is OK for this purpose; v250 is almost identical to what is in 2.6.20. Did you use disable_nodemgr=1 under 2.6.19 too?
Re comment #2: Hmm, maybe it _is_ effective: hpsb_alloc_host() adds a host device and a host class device which, according to nodemgr_dev_template_host, has .driver = &nodemgr_mid_layer_driver. However, nodemgr_mid_layer_driver is registered with the driver core in init_ieee1394_nodemgr(), which is not run if disable_nodemgr=1.
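The failure mode described in this comment can be illustrated with a toy model. This is a sketch in Python, not kernel code: the classes `Driver` and `Device`, the functions `device_add` and `demo`, and the device name "host0" are all hypothetical stand-ins. It assumes that binding a device to a pre-assigned driver appends the device to a per-driver list that only exists once the driver has been registered, which seems consistent with the device_bind_driver and klist_add_tail frames in the Oops.

```python
class Driver:
    """Toy stand-in for a driver object (not the kernel's struct)."""

    def __init__(self, name):
        self.name = name
        # The device list is only set up by register(); until then it is
        # None, standing in for an uninitialised list in this model.
        self.devices = None

    def register(self):
        self.devices = []


class Device:
    """Toy device that may already name its driver, as a template would."""

    def __init__(self, name, driver=None):
        self.name = name
        self.driver = driver


def device_add(dev):
    """Add a device; if it already names a driver, bind to it directly."""
    if dev.driver is not None:
        # Raises AttributeError if the driver was never registered.
        dev.driver.devices.append(dev)


def demo(disable_nodemgr):
    """Model host allocation with and without the nodemgr init step."""
    mid_layer = Driver("nodemgr_mid_layer_driver")
    if not disable_nodemgr:
        mid_layer.register()  # stands in for what the nodemgr init would do
    host = Device("host0", driver=mid_layer)
    try:
        device_add(host)
        return "bound"
    except AttributeError:
        return "crash: driver never registered"
```

In this model `demo(False)` binds normally, while `demo(True)` fails at the append, the rough analogue of dereferencing a NULL list pointer.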
Created attachment 10304
ieee1394: fix host device registering when nodemgr disabled

Appears to fix the issue for me. Sorry, the regression was mostly my fault.
Created attachment 10306
ieee1394: fix host device registering when nodemgr disabled
will submit the fix to Linus and -stable soon
patch committed to 2.6.20-git# and proposed for 2.6.20.1