Bug 7942

Summary: ohci1394 module broken and cannot be removed
Product: Drivers
Reporter: Robert Crocombe (rcrocomb)
Component: IEEE1394
Assignee: Stefan Richter (stefanr)
Status: CLOSED CODE_FIX    
Severity: normal    
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: 2.6.20
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
ieee1394: fix host device registering when nodemgr disabled (attachment 10304)
ieee1394: fix host device registering when nodemgr disabled (attachment 10306)

Description Robert Crocombe 2007-02-05 14:58:30 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.19
Distribution: Fedora Core 5
Hardware Environment: 4-processor IWill H8502 (2.8GHz single core Opterons) w/
10GB of RAM installed 4/2/2/2 across 4 nodes.
Software Environment:

This is from the working kernel.  All have been built with the same versions of
software:

Linux bubba.tuc.us.ray.com 2.6.19_00 #2 SMP PREEMPT Mon Feb 5 15:20:05 MST 2007
x86_64 x86_64 x86_64 GNU/Linux

Gnu C                  4.1.1
Gnu make               3.80
binutils               2.16.91.0.6
util-linux             2.13-pre6
mount                  2.13-pre6
module-init-tools      3.2-pre9
e2fsprogs              1.38
quota-tools            3.13.
PPP                    2.4.3
Linux C Library        > libc.2.4
Dynamic linker (ldd)   2.4
Procps                 3.2.6
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.93
udev                   084
Modules Loaded         raw1394 ohci1394 ieee1394 tg3


Problem Description:

Looks like something broke between 2.6.19 and 2.6.20-rc1.

Immediately after boot, I tried to run a program that uses the bus via
libraw1394, but no ports are detected.  ohci1394 indicates via lsmod that it has
1 user immediately after boot, before raw1394 was loaded (normally it's 0).
Additionally, the module cannot be unloaded, even with --force.

[root@bubba rcrocomb]# uptime
 16:03:14 up 2 min,  1 user,  load average: 2.03, 0.86, 0.32
[root@bubba rcrocomb]# lsmod
Module                  Size  Used by
ohci1394               41947  1
ieee1394              103984  1 ohci1394
tg3                   106884  0
[root@bubba rcrocomb]# rmmod ohci1394
ERROR: Module ohci1394 is in use
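
For reference, the "Used by" count shown above can also be read programmatically. A minimal sketch in C, assuming the first three whitespace-separated columns of a line are module name, size and use count (as in the lsmod output above and in /proc/modules). module_use_count() is a hypothetical helper written for this illustration, not part of the kernel or module-init-tools:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in lsmod or /proc/modules layout:
 *   name size use_count ...
 * Returns the use count if the line describes `wanted`, or -1 if the line
 * belongs to another module or cannot be parsed (e.g. the lsmod header).
 */
long module_use_count(const char *line, const char *wanted)
{
    char name[64];
    unsigned long size;
    long use_count;

    assert(line != NULL && wanted != NULL);

    /* Only the first three columns matter; anything after them is ignored. */
    if (sscanf(line, "%63s %lu %ld", name, &size, &use_count) != 3)
        return -1;
    if (strcmp(name, wanted) != 0)
        return -1;
    return use_count;
}
```

Called with the ohci1394 line from the transcript and wanted set to "ohci1394", this returns 1, i.e. the unexpected use count described above.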

I have confirmed this with:

2.6.20-rt2
2.6.20
2.6.20-rc5
2.6.20-rc5-rt10
2.6.20-rc1

2.6.19 is okay.  Kernel configs are as identical as possible.

When booting 2.6.20-rt2 (only), an Oops is displayed; apart from that, behavior
is identical to the other kernels.

Feb  5 14:49:01 bubba kernel: ieee1394: nodemgr and IRM functionality disabled
Feb  5 14:49:01 bubba kernel: ACPI: PCI Interrupt 0000:02:04.0[A] -> GSI 72
(level, low) -> IRQ 72
Feb  5 14:49:01 bubba kernel: stopped custom tracer.
Feb  5 14:49:01 bubba kernel: Unable to handle kernel NULL pointer dereference
at 0000000000000000 RIP:
Feb  5 14:49:01 bubba kernel:  [<ffffffff803f3c2b>] klist_add_tail+0x39/0x49
Feb  5 14:49:01 bubba kernel: PGD 279aee067 PUD 180383067 PMD 0
Feb  5 14:49:01 bubba kernel: Oops: 0002 [1] PREEMPT SMP
Feb  5 14:49:01 bubba kernel: CPU 0
Feb  5 14:49:01 bubba kernel: Modules linked in: ohci1394 ieee1394 tg3
Feb  5 14:49:01 bubba kernel: Pid: 1344, comm: modprobe Not tainted 2.6.20-rt2_00 #2
Feb  5 14:49:01 bubba kernel: RIP: 0010:[<ffffffff803f3c2b>] 
[<ffffffff803f3c2b>] klist_add_tail+0x39/0x49
Feb  5 14:49:01 bubba kernel: RSP: 0018:ffff810279b3bce8  EFLAGS: 00010246
Feb  5 14:49:01 bubba kernel: RAX: ffffffff880342d0 RBX: ffffffff88034298 RCX:
0000000000000000
Feb  5 14:49:01 bubba kernel: RDX: ffff810002bf41f8 RSI: ffffffff80487511 RDI:
ffffffff88034298
Feb  5 14:49:01 bubba kernel: RBP: ffff810002bf43b8 R08: ffff810037c4cf40 R09:
ffffffff880337a0
Feb  5 14:49:01 bubba kernel: R10: 000000000000003b R11: ffffffff80536a40 R12:
ffff810002bf41f0
Feb  5 14:49:01 bubba kernel: R13: 0000000000000000 R14: 0000000000000000 R15:
000000000000000f
Feb  5 14:49:01 bubba kernel: FS:  00002b71f7fd81f0(0000)
GS:ffffffff80515100(0000) knlGS:0000000000000000
Feb  5 14:49:01 bubba kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb  5 14:49:01 bubba kernel: CR2: 0000000000000000 CR3: 0000000279b8e000 CR4:
00000000000006e0
Feb  5 14:49:01 bubba kernel: Process modprobe (pid: 1344, threadinfo
ffff810279b3a000, task ffff8102797a7080)
Feb  5 14:49:01 bubba kernel: Stack:  ffff810037c4cf40 ffff810002bf4128
0000000000000000 ffffffff8034af87
Feb  5 14:49:01 bubba kernel:  ffff810002bf4128 ffffffff8034afba
ffff810002bf4430 ffff810002bf4128
Feb  5 14:49:01 bubba kernel:  ffffffff880337a0 ffffffff8034a15d
0000000000000000 0000000000000000
Feb  5 14:49:01 bubba kernel: Call Trace:
Feb  5 14:49:01 bubba kernel:  [<ffffffff8034af87>] device_bind_driver+0x9/0x12
Feb  5 14:49:01 bubba kernel:  [<ffffffff8034afba>] device_attach+0x2a/0x5d
Feb  5 14:49:01 bubba kernel:  [<ffffffff8034a15d>] bus_attach_device+0x23/0x49
Feb  5 14:49:01 bubba kernel:  [<ffffffff803492b7>] device_add+0x359/0x509
Feb  5 14:49:01 bubba kernel:  [<ffffffff8801e318>]
:ieee1394:hpsb_alloc_host+0x20d/0x24e
Feb  5 14:49:01 bubba kernel:  [<ffffffff8803b010>]
:ohci1394:ohci1394_pci_probe+0x45/0x625
Feb  5 14:49:01 bubba kernel:  [<ffffffff8030a92e>] pci_device_probe+0xcd/0x134
Feb  5 14:49:01 bubba kernel:  [<ffffffff8034add8>] really_probe+0x87/0x10c
Feb  5 14:49:01 bubba kernel:  [<ffffffff8034af52>] __driver_attach+0x46/0x6d
Feb  5 14:49:02 bubba kernel:  [<ffffffff8034af0c>] __driver_attach+0x0/0x6d
Feb  5 14:49:02 bubba kernel:  [<ffffffff8034a414>] bus_for_each_dev+0x43/0x6e
Feb  5 14:49:02 bubba kernel:  [<ffffffff8034a70b>] bus_add_driver+0x6e/0x190
Feb  5 14:49:02 bubba kernel:  [<ffffffff8030ab32>] __pci_register_driver+0x85/0xba
Feb  5 14:49:02 bubba kernel:  [<ffffffff80292f74>] sys_init_module+0xab/0x169
Feb  5 14:49:02 bubba kernel:  [<ffffffff8025639e>] system_call+0x7e/0x83
Feb  5 14:49:02 bubba kernel:
Feb  5 14:49:02 bubba kernel:
Feb  5 14:49:02 bubba kernel: Code: 48 89 11 48 89 4a 08 59 5b 41 5c e9 ee 75 e6
ff 41 54 49 89
Feb  5 14:49:02 bubba kernel: RIP  [<ffffffff803f3c2b>] klist_add_tail+0x39/0x49
Feb  5 14:49:02 bubba kernel:  RSP <ffff810279b3bce8>
Feb  5 14:49:02 bubba kernel: CR2: 0000000000000000
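
The "Oops: 0002" line in the log above carries the x86 page-fault error code. A small sketch of how its low three bits decode (bit 0: protection violation vs. page not present, bit 1: write access, bit 2: fault taken in user mode). The struct and function names are made up for this illustration, and higher bits such as the instruction-fetch flag are ignored:

```c
#include <assert.h>
#include <stdio.h>

/* Decoded view of the low bits of an x86 page-fault error code. */
struct oops_fault {
    int protection; /* bit 0: 1 = protection violation, 0 = page not present */
    int write;      /* bit 1: 1 = write access, 0 = read access */
    int user;       /* bit 2: 1 = user mode, 0 = kernel mode */
};

struct oops_fault decode_oops_code(unsigned long code)
{
    struct oops_fault f;

    f.protection = (code & 0x1) != 0;
    f.write = (code & 0x2) != 0;
    f.user = (code & 0x4) != 0;
    return f;
}

/* Write a one-line description into buf; returns what snprintf() returns. */
int format_oops_code(unsigned long code, char *buf, size_t len)
{
    struct oops_fault f = decode_oops_code(code);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s %s in %s mode",
                    f.write ? "write to" : "read from",
                    f.protection ? "a protected page" : "a not-present page",
                    f.user ? "user" : "kernel");
}
```

For the code 0x0002 from this report, the description is "write to a not-present page in kernel mode", which matches CR2 being 0 and the NULL pointer dereference reported in klist_add_tail.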


Steps to reproduce:

Boot any of the aforementioned kernels.
Comment 1 Stefan Richter 2007-02-05 15:15:36 UTC
Could you try with this reverted? "ieee1394: nodemgr: fix deadlock in shutdown"
http://git2.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8252bbb1363b7fe963a3eb6f8a36da619a6f5a65
(That one was to fix bug 6706, alas there is now bug 7792... grrr.)
Comment 2 Stefan Richter 2007-02-05 15:18:44 UTC
Actually, that commit isn't in effect on your setup: "ieee1394: nodemgr and IRM
functionality disabled" --- i.e. reverting it won't change anything AFAICS.
Comment 3 Stefan Richter 2007-02-05 15:24:44 UTC
Testing here on IA32, 2.6.20-rc6 + latest 1394 drivers:
# modprobe ieee1394 disable_nodemgr=1
# modprobe ohci1394
-> spinlock lockup
Comment 4 Stefan Richter 2007-02-05 15:37:19 UTC
Could you try 2.6.19.2 plus a patch from
http://me.in-berlin.de/~s5r6/linux1394/updates/2.6.19.y/ ?
Either of v250 or v273 is OK for this purpose; v250 is almost identical to what
is in 2.6.20.

Did you use disable_nodemgr=1 under 2.6.19 too?
Comment 5 Stefan Richter 2007-02-05 16:11:54 UTC
Re comment #2: Hmm, maybe it _is_ effective:
hpsb_alloc_host() adds a host device and host class device which, according to
nodemgr_dev_template_host, has
    .driver		= &nodemgr_mid_layer_driver,

However nodemgr_mid_layer_driver is registered with the driver core in
init_ieee1394_nodemgr() which is not run if disable_nodemgr=1.
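
To illustrate the suspected failure mode: a device that points at a driver which was never registered gets bound to a device list that was never initialized. The following is a userspace toy model only, under the assumption that registration is what initializes the driver's device list. All toy_* names are hypothetical, it does not use the real driver core API, and it is not the attached patch:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's klist: a circular doubly linked list. */
struct toy_list {
    struct toy_list *next, *prev;
};

struct toy_driver {
    const char *name;
    int registered;          /* set by toy_driver_register() */
    struct toy_list devices; /* only valid once registered */
};

struct toy_device {
    const char *name;
    struct toy_driver *driver; /* preset from a template, like .driver above */
    struct toy_list node;
};

/* Registration is what makes the device list usable. */
void toy_driver_register(struct toy_driver *drv)
{
    assert(drv != NULL);
    drv->devices.next = drv->devices.prev = &drv->devices;
    drv->registered = 1;
}

/*
 * Bind a device to its preset driver.
 * Returns 0 on success (or if there is no driver to bind) and -1 if the
 * driver was never registered. Without this guard the tail insertion below
 * would write through a NULL list pointer, as in the Oops in this report.
 */
int toy_device_bind(struct toy_device *dev)
{
    struct toy_driver *drv = dev->driver;
    struct toy_list *head;

    if (drv == NULL)
        return 0;
    if (!drv->registered)
        return -1;

    /* Append the device at the tail of the driver's device list. */
    head = &drv->devices;
    dev->node.prev = head->prev;
    dev->node.next = head;
    head->prev->next = &dev->node;
    head->prev = &dev->node;
    return 0;
}
```

In this model, binding a host device before toy_driver_register() has run fails, and the same call succeeds afterwards. That corresponds to init_ieee1394_nodemgr() being skipped when disable_nodemgr=1.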
Comment 6 Stefan Richter 2007-02-05 16:34:13 UTC
Created attachment 10304 [details]
ieee1394: fix host device registering when nodemgr disabled

Appears to fix the issue for me.
Sorry, the regression was mostly my fault.
Comment 7 Stefan Richter 2007-02-05 17:13:14 UTC
Created attachment 10306 [details]
ieee1394: fix host device registering when nodemgr disabled
Comment 8 Stefan Richter 2007-02-06 10:34:17 UTC
will submit the fix to Linus and -stable soon
Comment 9 Stefan Richter 2007-02-09 15:41:39 UTC
patch committed to 2.6.20-git# and proposed for 2.6.20.1