Bug 8704
Summary: | Ethernet over firewire 4pin<->4pin cable crashes in high traffic!! | ||
---|---|---|---|
Product: | Drivers | Reporter: | Cijoml Cijomlovic Cijomlov (cijoml) |
Component: | IEEE1394 | Assignee: | Stefan Richter (stefanr) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | stefanr |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.22.2 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: | gscanbus output |
Description
Cijoml Cijomlovic Cijomlov
2007-07-02 09:43:47 UTC
The type of cable shouldn't matter. Any combination of 4-, 6-, 9-pin plugs should work the same. The log messages from ieee1394 indicate that the nodes see each other. If you have gscanbus (low-level FireWire test tool with GUI), then this should too be able to "see" another Linux node connected. Questions: - When do the "Node suspended" messages happen? Arbitrarily while you want to use the eth1394 interface, or simply _after_ you were done with eth1394 and unplugged the cable? - Do you load the ieee1394 drivers with parameters different from their defaults? (grep -r 1394 /etc/modprobe.conf /etc/modprobe.d/) Node suspended is done "itself" - cable connected, but I see these messages. I used default params no additional parameters. Well, that's bad. It means that the connection was lost on a low IEEE 1394 level. I don't know why. Could yo please do this on both PCs after such a suspended message: $ cat /sys/bus/ieee1394/devices/fw-host0/node_count $ cat /sys/bus/ieee1394/devices/fw-host0/nodes_active Well so here is the update - I made some tests and disconnect looks like I moved cables - so now I don't see any suspended. Both nodes shows: Listik:~# cat /sys/bus/ieee1394/devices/fw-host0/nodes_active 2 Listik:~# cat /sys/bus/ieee1394/devices/fw-host0/node_count 2 notas:~# cat /sys/bus/ieee1394/devices/fw-host0/nodes_active 2 notas:~# cat /sys/bus/ieee1394/devices/fw-host0/node_count 2 But no ping/ssh/tcp/udp traffic is going through Created attachment 11936 [details]
gscanbus output
Attaching gscanbus output to see atributes of both hosts.
Hi Stefan, any updates/news in this bugreport? Michal Looks like ieee1394 and eth1394 should have all information they need. Do all of the following symlinks exist on both PCs when they are connected? /sys/bus/ieee1394/devices/00030d002558ad3d /sys/bus/ieee1394/devices/00030d002558ad3d-0 /sys/bus/ieee1394/devices/0090e600000024f7 /sys/bus/ieee1394/devices/0090e600000024f7-0 Hi, I moved to 2.6.22.2 kernel and now communication works, but under HIGH traffic it crashes. I tried ping (works) but in the moment I try scp copy from 1 host to other I gett to following problem: Laptop 1 (pcmcia card) ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[10] MMIO=[5c003000-5c0037ff] Max Packet=[2048] IR/IT contexts=[4/8] eth1394: eth1: IPv4 over IEEE 1394 (fw-host0) ieee1394: Error parsing configrom for node 0-00:1023 ieee1394: Node added: ID:BUS[0-01:1023] GUID[00030d002558ad3d] ieee1394: Host added: ID:BUS[0-00:1023] GUID[0090e600000024f7] ohci1394: fw-host0: AT dma reset ctx=0, aborting transmission ieee1394: Node suspended: ID:BUS[0-01:1023] GUID[00030d002558ad3d] ieee1394: Node resumed: ID:BUS[0-01:1023] GUID[00030d002558ad3d] Laptop 2 (onboard controller) ieee1394: The root node is not cycle master capable; selecting a new root node and resetting... ieee1394: Error parsing configrom for node 0-00:1023 ieee1394: Node changed: 0-00:1023 -> 0-01:1023 ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[0090e600000024f7] ohci1394: fw-host0: AT dma reset ctx=0, aborting transmission ohci1394: fw-host0: AT dma reset ctx=1, aborting transmission ieee1394: Node changed: 0-01:1023 -> 0-00:1023 ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0090e600000024f7] ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[0090e600000024f7] ieee1394: Node changed: 0-00:1023 -> 0-01:1023 ieee1394: Node changed: 0-01:1023 -> 0-00:1023 ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[0090e600000024f7] usb 4-5: USB disconnect, address 50 ieee1394: The root node is not cycle master capable; selecting a new root node and resetting... ieee1394: Error parsing configrom for node 0-00:1023 ieee1394: Node changed: 0-00:1023 -> 0-01:1023 ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[0090e600000024f7] notas:/home/cijoml# scp /usr/src/linux-2.6.22.2.tar.bz2 10.0.0.2:/usr/src/ Password: linux-2.6.22.2.tar.bz2 10% 4744KB 0.2KB/s - stalled -n 64 bytes from 10.0.0.2: icmp_seq=48 ttl=64 time=0.346 ms 64 bytes from 10.0.0.2: icmp_seq=49 ttl=64 time=0.346 ms 64 bytes from 10.0.0.2: icmp_seq=50 ttl=64 time=0.354 ms 64 bytes from 10.0.0.2: icmp_seq=51 ttl=64 time=0.333 ms 64 bytes from 10.0.0.2: icmp_seq=52 ttl=64 time=0.348 ms 64 bytes from 10.0.0.2: icmp_seq=53 ttl=64 time=0.342 ms 64 bytes from 10.0.0.2: icmp_seq=54 ttl=64 time=0.340 ms 64 bytes from 10.0.0.2: icmp_seq=55 ttl=64 time=0.304 ms 64 bytes from 10.0.0.2: icmp_seq=56 ttl=64 time=0.471 ms From 10.0.0.1 icmp_seq=86 Destination Host Unreachable From 10.0.0.1 icmp_seq=87 Destination Host Unreachable From 10.0.0.1 icmp_seq=88 Destination Host Unreachable From 10.0.0.1 icmp_seq=90 Destination Host Unreachable From 10.0.0.1 icmp_seq=91 Destination Host Unreachable From 10.0.0.1 icmp_seq=92 Destination Host Unreachable From 10.0.0.1 icmp_seq=94 Destination Host Unreachable From 10.0.0.1 icmp_seq=95 Destination Host Unreachable From 10.0.0.1 icmp_seq=96 Destination Host Unreachable From 10.0.0.1 icmp_seq=98 Destination Host Unreachable From 10.0.0.1 icmp_seq=99 Destination Host Unreachable From 10.0.0.1 icmp_seq=100 Destination Host Unreachable From 10.0.0.1 icmp_seq=102 Destination Host Unreachable From 10.0.0.1 icmp_seq=103 Destination Host Unreachable From 10.0.0.1 icmp_seq=104 Destination Host Unreachable top - 10:39:06 up 4:22, 1 user, load average: 1.53, 1.21, 1.21 Tasks: 111 total, 2 running, 109 sleeping, 0 stopped, 0 zombie Cpu(s): 4.0%us, 50.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 1.7%hi, 44.4%si, 0.0%st Mem: 1018616k total, 449668k used, 568948k free, 42612k buffers Swap: 698816k total, 0k used, 698816k free, 188048k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 3 root 39 19 0 0 0 R 60.7 0.0 3:28.48 ksoftirqd/0 5 root 10 -5 0 0 0 S 34.5 0.0 1:16.44 events/0 3346 root 5 -10 176m 43m 3632 S 2.3 4.4 27:39.64 Xorg 11559 cijoml 15 0 33768 15m 11m S 2.3 1.6 0:19.27 konsole 23442 cijoml 15 0 2360 1144 864 R 0.7 0.1 0:00.66 top 3891 cijoml 15 0 33868 15m 12m S 0.3 1.6 2:05.32 kicker 22574 cijoml 15 0 58812 40m 19m S 0.3 4.1 1:34.33 konqueror 1 root 15 0 2064 700 600 S 0.0 0.1 0:00.83 init 2 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 6 root 10 -5 0 0 0 S 0.0 0.0 0:00.19 khelper 35 root 10 -5 0 0 0 S 0.0 0.0 0:00.16 kblockd/0 36 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid 37 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kacpi_notify 132 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0 133 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux 134 root 10 -5 0 0 0 S 0.0 0.0 0:00.02 ksuspend_usbd 137 root 13 -5 0 0 0 S 0.0 0.0 0:00.04 khubd 139 root 10 -5 0 0 0 S 0.0 0.0 0:00.04 kseriod 159 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pdflush 160 root 15 0 0 0 0 S 0.0 0.0 0:01.13 pdflush 161 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0 212 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0 847 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 pccardd 849 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 pccardd 865 root 10 -5 0 0 0 S 0.0 0.0 0:00.05 kondemand/0 878 root 10 -5 0 0 0 S 0.0 0.0 0:02.35 kjournald 980 root 16 -4 2888 1220 368 S 0.0 0.1 0:01.13 udevd 1290 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused 1437 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khpsbpkt 1438 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 knodemgrd_0 Please do the following for me: Check which of the log files receives kernel messages. On many distros it is /var/log/messages, on some it is /var/log/syslog; there may be others. Then let a "tail -f /var/log/messages" (or whatever) running in one terminal while you generate eth1394 traffic in another terminal. As soon as traffic breaks down, check if there were kernel messages right in this moment. And what kind of machines do you have connected? (I.e. which processor horsepower do they have?) Hi, I use Debian - only logs I receive are stored in dmesg and same in syslog. laptop with internal FW card - Pentium M 1.7 GHz laptop with PCMCIA card - Pentium Celeron (P4 2.4 GHz) BTW I think problem is with this in top: 3 root 39 19 0 0 0 R 60.7 0.0 3:28.48 ksoftirqd/0 ^^^ Which disappeared when I removed pcmcia card from slot (only way howto get communication back to work) Right, the values for ksoftirqd/0 and events/0 are insane from what I can tell. Could you watch how fast the CPU time of ksoftirqd increases over a suitably long period, both while you generate traffic through eth1394 and also after that with eth1394 unloaded from the kernel? ("ps aux" shows the CPU time too.) I almost forgot: In the meantime, the Sparse checker found a bug in eth1394 regarding spinlock usage when handling incoming data. I don't know if this is in any way related to this bug here with high CPU load from softirqd and events thread. The patch is in -mm, linux1394-2.6.git, and at http://me.in-berlin.de/~s5r6/linux1394/updates/. Hi Stefan, I just tested 2.6.23 kernel and now eth code work - via SSH I moved 1.7 GB without any problem. Only transport speed 7 MB/s looks little bit small, but maybe it is caused by encryption 73%/30% CPU usage at both nodes. Thanks for staying patient and testing each current kernel. I'm glad that it works now; I'm not sure though which of the 2.6.22(.2)->2.6.23 changes did the trick... The throughput of IP over FireWire 400 (~400 Mbit/s base rate) is roughly comparable with Fast Ethernet (100 Mbit/s) due to considerably larger overhead, and perhaps lack of throughput optimization in Linux' IEEE 1394 drivers. And if a CPU intensive task is going on at the same time, IP over 1394 throughput will suffer more or less. I observed the latter also on OS X, where an older machine had a considerable drop in IP over 1394 throughput if SSH was used instead of FTP. |