Created attachment 24607
screenshot of panic in fwnet_write_complete

Reported at http://marc.info/?l=linux1394-devel&m=126341793319333 on 2010-01-13:

>>>
I run Archlinux x86, kernel 2.6.32.2-2 is from its repo. The system falls to kernel panic when I copy lots of files (10Gb) from laptop with rsync or other progs via firewire. This happens in 90% cases within 1 minute of copying. Moving mouse or hitting keys on the keyboard make kernel panic sooner. lesser MTU value has no effect. I had no kernel panic yet with "-maxcpus=1" boot option.

cpu is C2D E6550
--------------------
# lspci
...
04:03.0 FireWire (IEEE 1394): Agere Systems FW322/323 (rev 70)
--------------------

traces look like this (see the screenshot: http://www.postimage.org/image.php?v=Pq1AO6Dr ):
--------------------
...
Call Trace:
 [<xxxxxxxx>] ? ip_rcv_finish+0xeb/0x390
 [<xxxxxxxx>] ? close_transaction+0xda/0xf0 [firewire_core]
 [<xxxxxxxx>] ? handle_at_packet+0xa1/0x100 [firewire_ohci]
...
EIP: [<xxxxxxxx>] fwnet_write_complete+0x2c/0x160 [firewire_net] SS:ESP 0068:f7079e40
...
<<<
Created attachment 24608
new screenshot
Created attachment 24609
not a patch

Hi. Thanks for creating the bug ticket.

I followed your recommendation partially. Instead of INIT_LIST_HEAD(&ptask->pt_link) I initialize prev and next to 0, so I know the ptask is broken.

Note, the attached patch fixes nothing, just shows what is wrong.

see the new screenshot
http://bugzilla.kernel.org/attachment.cgi?id=24608
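For readers unfamiliar with the trick described above, here is a minimal userspace sketch of the idea. It is not the firewire-net code and not the attached patch. The struct mirrors the layout of the kernel's struct list_head, and mark_unlinked() and list_del_checked() are hypothetical helper names made up for this illustration. A node set up the INIT_LIST_HEAD way points at itself, so deleting it when it is not on any list goes unnoticed. A node with zeroed pointers makes the same mistake detectable (in the kernel it would oops on the NULL dereference, which is what makes the broken ptask visible).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in with the same two-pointer layout as the kernel's
 * struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): the node points at itself. */
static inline void init_list_head(struct list_head *node)
{
    node->next = node;
    node->prev = node;
}

/* Debugging alternative from the comment: zero both pointers so that a
 * node which is not on any list is recognizable. Hypothetical helper. */
static inline void mark_unlinked(struct list_head *node)
{
    node->next = NULL;
    node->prev = NULL;
}

/* Insert node right after head, like the kernel's list_add(). */
static inline void list_add(struct list_head *node, struct list_head *head)
{
    assert(head->next != NULL && head->prev != NULL);
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Unlink node, but report failure instead of dereferencing NULL when the
 * node was never linked or was already removed. Hypothetical helper; the
 * kernel's plain list_del() performs no such check. */
static inline bool list_del_checked(struct list_head *node)
{
    if (node->next == NULL || node->prev == NULL)
        return false;
    node->next->prev = node->prev;
    node->prev->next = node->next;
    mark_unlinked(node);
    return true;
}
```

With this sketch, a second list_del_checked() on the same node returns false, whereas with self-pointing initialization a stray delete would pass silently.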
Proposed patch: http://lkml.org/lkml/2010/1/18/438
Tried the patch. Problem stays, symptoms changed:

1) At first I started rsync on tty1. All worked fine until I switched to X. It showed me the desktop and the system hung. Keyboard leds not blinking.

2) started rsync in gnome-terminal. All worked fine until I switched to tty1. It flooded several screens with call traces and hung.

3) After the 3rd restart /sys/class/firewire0 not created and dmesg says nothing, although firewire_net loaded with no error messages.

...still playing.

From the beginning there was another problem; I planned to report it separately, but maybe they're connected: very often firewire_net not auto-loaded and when I try to modprobe it, or to ifconfig firewire0 or to ping, dmesg says:
firewire_ohci: isochronous cycle inconsistent
or
firewire_core: giving up on config rom for node id ffc1
and the only solution is to reboot and try again.
Reply-To: stefanr@s5r6.in-berlin.de

bugzilla-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=15077
>
> --- Comment #4 from leniviy <basinilya@gmail.com> 2010-01-19 19:20:24 ---
> Tried the patch. Problem stays, symptoms changed:
> 1)
> At first I started rsync on tty1. All worked fine until I switched to X. It
> showed me the desktop and the system hung. Keyboard leds not blinking.
>
> 2) started rsync in gnome-terminal. All worked fine until I switched to tty1.
> It flooded several screens with call traces and hung.

Could both be the same kmemcache corruption bug which I saw, or could be anything else.

> 3) After the 3rd restart /sys/class/firewire0 not created and dmesg says
> nothing, although firewire_net loaded with no error messages.

Most likely unrelated.

> ...still playing.
>
> From the beginning there was another problem; I planned to report it
> separately, but maybe they're connected: very often firewire_net not
> auto-loaded and when I try to modprobe it, or to ifconfig firewire0 or to
> ping,
> dmesg says:
> firewire_ohci: isochronous cycle inconsistent

Shouldn't be an issue if it only happens on bus topology changes. It is known to happen when the cycle master changes. It should not affect firewire-net.

> or
> firewire_core: giving up on config rom for node id ffc1
> and the only solution is to reboot and try again.

In this situation,
# modprobe -r firewire-ohci
# modprobe firewire-ohci debug=7
could give more information.

This could be unreliable hardware. The PC which gave up had the Agere FW322/323, right? What controller is on the peer?
Created attachment 24691
dmesg with comments

I think http://lkml.org/lkml/2010/1/18/438 fixes this very bug. But you were probably right about memory corruption. Fixing this bug just revealed that bug.
fwnet_write_complete fix from comment 3 was merged into 2.6.33.

Renaming this bug to the one in prio_tree_left per comment 4/comment 6 (attachment 24691).

Also note the new crash in cache_free_debugcheck per comment 5 (http://lkml.org/lkml/2010/1/18/488).
Candidate fixes by Clemens Ladisch: http://thread.gmane.org/gmane.linux.kernel.firewire.devel/14502
The patches from comment 8 work for me. firewire-net is still not performing very well and sometimes connections break down entirely (workaround: reload firewire-ohci), but there are no crashes anymore.
Fixes merged into mainline, to appear in 2.6.37-rc2, also submitted for inclusion into currently active stable series.