Most recent kernel where this bug did not occur: N/A
Distribution: Debian testing/unstable
Hardware Environment: Intel Pentium M 1.5 GHz, RAM: 768 MB
Software Environment: Debian testing/unstable, vanilla kernel 2.6.17.11 + Software Suspend 2

Problem Description:
On resume from a hibernated state, ieee1394 networking doesn't work. The ifconfig output looks correct, but you cannot ping any host on that network. Reloading the modules doesn't help either.

Steps to reproduce:
1) Install Linux with Software Suspend 2.
2) Hibernate.
3) On resume, the ieee1394 network doesn't work.

Here are the logs. While doing a ping, the kernel logged the following messages:

ohci1394: fw-host0: Error in reception of SelfID packets [0x00030014/0x000932d1] (count: 0)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00040014/0x000932d1] (count: 1)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00050014/0x000932d1] (count: 2)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00060014/0x000932d1] (count: 3)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00070014/0x000932d1] (count: 4)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00080014/0x000932d1] (count: 5)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x000a0014/0x000932d1] (count: 0)
ohci1394: fw-host0: Error in reception of SelfID packets [0x000b0014/0x000932d1] (count: 1)
ohci1394: fw-host0: Error in reception of SelfID packets [0x000c0014/0x000932d1] (count: 2)
ohci1394: fw-host0: Error in reception of SelfID packets [0x000d0014/0x000932d1] (count: 3)
ohci1394: fw-host0: Error in reception of SelfID packets [0x000e0014/0x000932d1] (count: 4)
ohci1394: fw-host0: Error in reception of SelfID packets [0x000f0014/0x000932d1] (count: 5)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00100014/0x000932d1] (count: 6)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00110014/0x000932d1] (count: 7)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00120014/0x000932d1] (count: 8)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00130014/0x000932d1] (count: 9)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00140014/0x000932d1] (count: 10)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00150014/0x000932d1] (count: 11)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00160014/0x000932d1] (count: 12)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00170014/0x000932d1] (count: 13)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00180014/0x000932d1] (count: 14)
ohci1394: fw-host0: Error in reception of SelfID packets [0x00190014/0x000932d1] (count: 15)
ohci1394: fw-host0: Error in reception of SelfID packets [0x001a0014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x001b0014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x001c0014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x001d0014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x001e0014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ieee1394: Stopping reset loop for IRM sanity

During the module reload, the following messages were observed:

ohci1394: fw-host0: Error in reception of SelfID packets [0x001f0014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x00200014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x00210014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x00220014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x00230014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ohci1394: fw-host0: Error in reception of SelfID packets [0x00240014/0x000932d1] (count: 16)
ohci1394: fw-host0: Too many errors on SelfID error reception, giving up!
ieee1394: impossible ack_complete from node 65535 (tcode 4)
ieee1394: Current remote IRM is not 1394a-2000 compliant, resetting...
ieee1394: Stopping reset loop for IRM sanity
ieee1394: Node removed: ID:BUS[0-00:1023] GUID[00c09f00001f8a88]
ieee1394: Node removed: ID:BUS[0-00:1023] GUID[354fc0002226c838]
ieee1394: Unknown parameter `sbp2'
ieee1394: Initialized config rom entry `ip1394'
ACPI: PCI Interrupt 0000:02:07.0[A] -> Link [LNKF] -> GSI 11 (level, low) -> IRQ 11
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[11] MMIO=[e0205000-e02057ff] Max Packet=[2048] IR/IT contexts=[4/8]
ieee1394: Host added: ID:BUS[0-00:1023] GUID[00c09f00001f8a88]
ieee1394: Node added: ID:BUS[0-01:1023] GUID[354fc0002226c838]
eth1394: eth2: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
ieee1394: sbp2: Driver forced to serialize I/O (serialize_io=1)
ieee1394: sbp2: Try serialize_io=0 for better performance
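For readers trying to interpret the bracketed value pairs in the SelfID errors above, here is a minimal sketch. It assumes that the first value is the OHCI SelfIDCount register and the second is the first quadlet of the SelfID receive buffer, and it assumes the OHCI 1.1 field layout (error flag in bit 31, generation in bits 23:16, size in quadlets in bits 10:2). The function and key names are made up for illustration and are not taken from the driver.

```python
def decode_selfid_pair(self_id_count, first_quadlet):
    """Split the two values printed as [0xCOUNT/0xQUADLET] into fields.

    Assumed layout (OHCI 1.1 SelfIDCount register): bit 31 = error flag,
    bits 23:16 = generation, bits 10:2 = size in quadlets. The first
    quadlet of the receive buffer is assumed to carry the generation it
    was written for in the same bit positions.
    """
    reg_generation = (self_id_count >> 16) & 0xFF
    buf_generation = (first_quadlet >> 16) & 0xFF
    return {
        "error_flag": bool(self_id_count & 0x80000000),
        "register_generation": reg_generation,
        "size_quadlets": (self_id_count >> 2) & 0x1FF,
        "buffer_generation": buf_generation,
        "generation_mismatch": reg_generation != buf_generation,
    }


# First and last value pairs from the log above.
first = decode_selfid_pair(0x00030014, 0x000932D1)
last = decode_selfid_pair(0x00240014, 0x000932D1)
print(first)
print(last)
```

Under these assumptions the register generation keeps advancing with each reset while the buffer quadlet never changes from 0x000932d1. That would be consistent with the receive buffer no longer being updated after resume, but this is an interpretation and is not confirmed anywhere in this report.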
Apart from some platform code for PPC, ohci1394's suspend and resume hooks do not save and restore high-level configuration. (Only save and restore of PCI status has been added lately.) See OHCI 1.1 appendix A.4.2 [1]. Anything above ohci1394, i.e. ieee1394, eth1394 and so on, has not been checked for possible suspend/resume bugs. So far, ohci1394 needs to be unloaded before suspend. I don't know whether ohci1394 could instead be unloaded after resume and then be reloaded successfully. Did you really unload and reload ohci1394 to produce the second half of your log? This bug should be filed under Category Drivers, Component IEEE 1394.
[1] http://developer.intel.com/technology/1394/download/ohci_11.htm
Today I got a kernel oops too.

Sep 1 23:53:11 localhost kernel: ieee1394: Node changed: 0-01:1023 -> 0-00:1023
Sep 1 23:53:11 localhost kernel: ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[354fc0002226c838]
Sep 1 23:53:15 localhost kernel: ieee1394: Node changed: 0-00:1023 -> 0-01:1023
Sep 1 23:53:33 localhost kernel: ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[354fc0002226c838]
Sep 1 23:58:23 localhost kernel: ieee1394: Node changed: 0-01:1023 -> 0-00:1023
Sep 1 23:58:23 localhost kernel: ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[354fc0002226c838]
Sep 1 23:58:29 localhost kernel: ieee1394: The root node is not cycle master capable; selecting a new root node and resetting...
Sep 1 23:58:29 localhost kernel: ieee1394: Node changed: 0-00:1023 -> 0-01:1023
Sep 1 23:58:44 localhost kernel: ieee1394: Error parsing configrom for node 0-00:1023
Sep 2 00:00:01 localhost CRON[18992]: (pam_unix) session opened for user logcheck by (uid=0)
Sep 2 00:00:01 localhost /USR/SBIN/CRON[18993]: (logcheck) CMD ( if [ -x /usr/sbin/logcheck ]; then nice -n10 /usr/sbin/logcheck; fi)
Sep 2 00:00:16 localhost CRON[18992]: (pam_unix) session closed for user logcheck
Sep 2 00:00:21 localhost kernel: ieee1394: Node changed: 0-01:1023 -> 0-00:1023
Sep 2 00:00:25 localhost kernel: ieee1394: Node changed: 0-00:1023 -> 0-01:1023
Sep 2 00:00:41 localhost kernel: BUG: unable to handle kernel paging request at virtual address ef014000
Sep 2 00:00:41 localhost kernel: printing eip:
Sep 2 00:00:41 localhost kernel: eea78690
Sep 2 00:00:41 localhost kernel: *pde = 28367067
Sep 2 00:00:41 localhost kernel: *pte = 00000000
Sep 2 00:00:41 localhost kernel: Oops: 0000 [#1]
Sep 2 00:00:41 localhost kernel: PREEMPT
Sep 2 00:00:41 localhost kernel: Modules linked in: appletalk ax25 ipx p8023 i915 drm kqemu vmnet parport_pc parport vmmon binfmt_misc button ac battery autofs4 ipv6 pcmcia tun ipt_MASQUERADE iptable_nat ip_nat act_police sch_ingress cls_u32 sch_sfq sch_cbq xt_state ip_conntrack nfnetlink iptable_filter ip_tables x_tables fuse pcspkr cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative cpufreq_ondemand cn video speedstep_centrino freq_table sr_mod sbp2 tda9887 tuner saa7115 em28xx joydev compat_ioctl32 v4l1_compat v4l2_common ir_common videodev tveeprom mousedev tsdev evdev eth1394 psmouse serio_raw ipw2100 ieee80211 ieee80211_crypt snd_intel8x0 snd_intel8x0m rtc snd_ac97_codec snd_ac97_bus firmware_class snd_pcm_oss snd_mixer_oss snd_pcm yenta_socket rsrc_nonstatic pcmcia_core snd_timer i2c_i801 snd soundcore snd_page_alloc i2c_core hw_random shpchp pci_hotplug intel_agp agpgart sd_mod dm_mirror dm_snapshot ide_cd cdrom ide_disk 8139too usbhid usb_storage scsi_mod piix ohci1394 ieee1394 8139cp
Sep 2 00:00:41 localhost kernel: ii generic ehci_hcd uhci_hcd usbcore thermal processor fan
Sep 2 00:00:41 localhost kernel: CPU: 0
Sep 2 00:00:41 localhost kernel: EIP: 0060:[<eea78690>] Tainted: P VLI
Sep 2 00:00:41 localhost kernel: EFLAGS: 00010293 (2.6.17-my-patches #4)
Sep 2 00:00:41 localhost kernel: EIP is at csr1212_parse_keyval+0x57/0x1ee [ieee1394]
Sep 2 00:00:41 localhost kernel: eax: ef01303c ebx: 00000000 ecx: 000003f0 edx: 00000020
Sep 2 00:00:41 localhost kernel: esi: fffffff4 edi: 00000020 ebp: ef015000 esp: ed7cff1c
Sep 2 00:00:41 localhost kernel: ds: 007b es: 007b ss: 0068
Sep 2 00:00:41 localhost kernel: Process knodemgrd_0 (pid: 1369, threadinfo=ed7ce000 task=dff49a50)
Sep 2 00:00:41 localhost kernel: Stack: ef01303c 000003f0 0000ffff 03fcdff4 00000000 ef015000 ef013000 ef017000
Sep 2 00:00:41 localhost kernel: eea78bf6 f0000414 0000ffff 00000014 00000004 fffffffc ffffffff eea859ec
Sep 2 00:00:41 localhost kernel: ef015000 ef011000 fffffffc ef01303c ef013000 00000014 00000000 ef01303c
Sep 2 00:00:41 localhost kernel: Call Trace:
Sep 2 00:00:41 localhost kernel: <eea78bf6> _csr1212_read_keyval+0x3cf/0x40b [ieee1394] <eea78daa> csr1212_parse_csr+0x178/0x1b8 [ieee1394]
Sep 2 00:00:41 localhost kernel: <eea76a8d> nodemgr_host_thread+0x354/0x8a7 [ieee1394] <eea76739> nodemgr_host_thread+0x0/0x8a7 [ieee1394]
Sep 2 00:00:41 localhost kernel: <c0101005> kernel_thread_helper+0x5/0xb
Sep 2 00:00:41 localhost kernel: Code: 24 08 8a 45 00 3c 02 0f 84 53 01 00 00 3c 03 0f 85 8a 01 00 00 31 f6 c7 44 24 04 00 00 00 00 e9 29 01 00 00 8b 4c 24 04 8b 04 24 <8b> 54 88 04 85 d2 0f 84 12 01 00 00 89 d1 8b 5d 20 0f c9 89 c8
Sep 2 00:00:41 localhost kernel: EIP: [<eea78690>] csr1212_parse_keyval+0x57/0x1ee [ieee1394] SS:ESP 0068:ed7cff1c
Sep 2 00:02:40 localhost kernel: <6>eth2: Reseting on mode change.
Sep 2 00:02:41 localhost kernel: ADDRCONF(NETDEV_UP): eth2: link is not ready
Sep 2 00:02:41 localhost avahi-daemon[6994]: New relevant interface eth2.IPv4 for mDNS.
Sep 2 00:02:41 localhost avahi-daemon[6994]: Joining mDNS multicast group on interface eth2.IPv4 with address 192.168.1.1.
Sep 2 00:02:41 localhost avahi-daemon[6994]: Registering new address record for 192.168.1.1 on eth2.
Sep 2 00:02:41 localhost kernel: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
Sep 2 00:02:52 localhost kernel: eth2: no IPv6 routers present
Sep 2 00:06:57 localhost kernel: ieee1394: ether1394 rx: sender nodeid lookup failure: 0-00:1023
Sep 2 00:07:28 localhost last message repeated 21 times
I'm not sure whether FireWire hubs are available on the market, but I believe the severity of this bug needs to be raised. Machine-A and Machine-B are connected to each other via FireWire. Machine-A is a workstation which stays online almost all the time. Machine-B is a laptop which is frequently suspended and resumed. The bad part about this bug is that when both machines are connected and you resume the laptop, the kernel's FireWire information on both machines gets screwed up. If you then connect a third laptop to Machine-A via FireWire and boot it normally, it won't be able to use the FireWire network with Machine-A. The kernel oops message in my previous comment is from Machine-A, which I use almost like a workstation. Hence I feel the severity of this bug should be raised, because it affects not just the resuming machine but also other machines on the network that use FireWire. If you're convinced, please raise the severity. Thanks, Ritesh
I think you just found at least one other independent bug ("unable to handle paging request" in csr1212_parse_keyval). I filed it as bug 7098. After Machine-B has suspended and resumed, will Machine-A and Machine-C be unable to communicate every time, or only after such an oops? (Assuming the oops does not happen every time...) With FireWire, every node which has more than one port is a hub. Often the PHY (physical interface chip) is separate from the LLC (link layer controller or just "link", e.g. the OHCI FireWire-to-PCI bus bridge). But sometimes they are integrated and/or the PHY is enabled or disabled via the LLC. So if you daisy-chained A--B--C and suspended (and possibly resumed) B, B's PHY might still be disabled. With C--A--B, communication between A and C should be possible unless B floods the bus with resets or other bogus signals.
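To make the two cabling cases concrete, here is a small sketch that models the bus as a graph in which every node repeats traffic between its ports unless its PHY is disabled. This is a deliberately simplified model written for illustration: it does not represent bus resets, and it ignores the case where B floods the bus with bogus signals. The function name and data layout are hypothetical.

```python
from collections import deque


def reachable(links, start, disabled_phys=frozenset()):
    """Return the nodes that `start` can reach over the cable links.

    A node whose PHY is disabled is treated as neither receiving nor
    repeating traffic (a simplification of real PHY behaviour).
    """
    if start in disabled_phys:
        return set()
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen and nxt not in disabled_phys:
                seen.add(nxt)
                queue.append(nxt)
    return seen


daisy_chain = [("A", "B"), ("B", "C")]  # A--B--C
a_in_middle = [("C", "A"), ("A", "B")]  # C--A--B

print(sorted(reachable(daisy_chain, "A", disabled_phys={"B"})))  # ['A']
print(sorted(reachable(a_in_middle, "A", disabled_phys={"B"})))  # ['A', 'C']
```

In the daisy chain, A is cut off from C as long as B's PHY stays disabled; with A in the middle, A and C can still talk.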
This is how I came to the conclusion that this bug affects other machines on the network. I have two laptops which I currently network using FireWire. The old laptop (Machine-A) is used almost as a workstation, as it stays up for long periods (sometimes days). My new laptop (Machine-B) is the one I carry with me, and I use suspend/resume on it. So here's how it goes:
* I boot up both Machine-A and Machine-B, and they talk to each other.
* I then hibernate Machine-B and go out for a coffee. During this time Machine-A stays on.
* I come back and resume Machine-B. At this moment the FireWire network gets corrupted. Both machines' kernels log lots of messages to syslog, and sometimes they oops too. At this point, unloading and reloading the modules doesn't help either.
* Then I reboot Machine-B (including unplugging and replugging the cable). This should be equivalent to attaching another machine to the network.
* Still no help. Once the interface is ifup'ed, both machines' kernels again start logging lots of messages.
* The only solution at that point is to reboot both machines.
Then this might be a third bug. Here is a hypothesis: All FireWire nodes with an LLC are identifiable by a persistent GUID, also known as EUI-64. Once Machine-B has been rebooted, it will come back under its old GUID, as it is supposed to. Machine-A's eth1394 interface to B's GUID, however, might still be inoperable due to the mess that B's resume created before. I suppose the currently necessary workaround is to unload ohci1394 on a machine that is about to be suspended. Alas, I know little about eth1394 and am quite busy right now, so I hesitate to assign this bug to myself. More diagnosis from you, especially regarding eth1394, would be appreciated though. Thanks.
PS: In case you also come to the conclusion that there may be a bug in eth1394 (concerning Machine-A's inability to get going again), please open another bug with the respective logs. That way we can concentrate bug 7072 on ohci1394 alone.
Currently, on Machine-B, I've configured hibernation so that the FireWire interface is brought down and the ohci1394 module is removed. But that still doesn't help. Upon resume, when I ifup the FireWire interface, it is unusable. Doing a ping results in the following kernel messages:
ohci1394: fw-host0: Unrecoverable error!
ohci1394: fw-host0: Async Req Tx Context died: ctrl[000088066] cmdptr[f7073048]
Does this message occur on B or on A? What if you unload all IEEE 1394 drivers on B before suspend?
Support for hibernate/suspend2disk is not implemented yet in ohci1394, so you currently have to unload this module before hibernating. This can of course be a big problem if you find yourself in a situation where that is difficult or even impossible, e.g. if bugs in some other modules which depend on ohci1394 prevent it from being unloaded; that is only one of many possible reasons. I have now taken some initiative on this and had a first success, but as of this moment the implementation is not yet ready to be merged. Read the thread at: http://sourceforge.net/mailarchive/forum.php?thread_id=30474986&forum_id=5389
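As an illustration of the "unload before hibernating" workaround, here is a minimal sketch that only builds and prints the commands a pre-hibernate hook might run; it executes nothing. The interface name eth2 and the module names come from the logs in this report. The unload order (dependent drivers first, then ohci1394, then ieee1394) and the helper names are assumptions, and this thread does not confirm that the sequence is sufficient to avoid the problem.

```python
def pre_hibernate_commands(interface="eth2",
                           modules=("eth1394", "sbp2", "ohci1394", "ieee1394")):
    """Commands to run before hibernating: interface down, then unload
    the IEEE 1394 drivers, dependents first (assumed order)."""
    commands = [["ifdown", interface]]
    commands += [["modprobe", "-r", module] for module in modules]
    return commands


def post_resume_commands(interface="eth2", modules=("ohci1394", "eth1394")):
    """Commands to run after resume: reload drivers, then interface up.
    modprobe pulls in ieee1394 as a dependency."""
    commands = [["modprobe", module] for module in modules]
    commands.append(["ifup", interface])
    return commands


# Dry run: print what would be executed, one command per line.
for command in pre_hibernate_commands() + post_resume_commands():
    print(" ".join(command))
```

Note that the interface may come back under a different name after a reload (one later log in this report shows eth1 instead of eth2), so a real hook would need to check the name rather than hard-code it.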
Could you test with all of the following patches applied on top of Linux 2.6.18?

CONFIG_PM=n slim: drivers/ieee1394/ohci1394.c
http://www.kernel.org/git/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff_plain;h=2a874182842c6a70f245b7f1ad859f9152517951

set power state of firewire host during suspend
http://www.kernel.org/git/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff_plain;h=f0645e7720e0baacbde61d7d1f0180309451c695

ieee1394: ohci1394: check for errors in suspend or resume
http://www.kernel.org/git/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff_plain;h=ea6104c22468239083857fa07425c312b1ecb424

ieee1394: ohci1394: steps to implement suspend/resume
http://www.kernel.org/git/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff_plain;h=c914d3cf8c54ec472b93eed03dc515435aaf29d0

ieee1394: ohci1394: suspend/resume cosmetics
http://www.kernel.org/git/?p=linux/kernel/git/ieee1394/linux1394-2.6.git;a=commitdiff_plain;h=6429e0e94a9185b92b33f7b1bb0195eb54022a89

You can also get these patches as part of my patchkits v161 or later for Linux 2.6.16.x and 2.6.18:
http://me.in-berlin.de/~s5r6/linux1394/updates/
Does the lack of response mean the patches fix the issue?
Not necessarily. I couldn't test the patches because I was ill. Will test it soon (probably on the weekend) and update.
Okay! I tested with the patches you mentioned. ohci1394 doesn't spit out messages now, but network connectivity is still broken after hibernate and resume. I also tried to ifdown and ifup the interface, but it still won't ping, and it doesn't log any kernel messages either. I'm attaching the dmesg output, which contains the logs from before and after hibernation. I've also attached the lspci output.
Created attachment 9402: dmesg output
Created attachment 9403: lspci output
Thanks for the reports. The last log line from ohci1394 looks promising. After resume, try the following (each one alone and, if necessary, in combination):
- replug the FireWire cable
- unload and reload eth1394
- issue a bus reset with gscanbus (GUI program, menu item "force bus reset") or with 1394commander (command line program, command "br")
If they are not available as binary packages, here are the sources:
http://gscanbus.berlios.de/
http://www.ict.tuwien.ac.at/ieee1394/opensource.html
1394commander from tuwien is easier to compile. gscanbus requires gtk1 and, if you have a newer gcc, also needs a patch from the patch tracker on the project page at berlios. (Hmm, maybe ieee1394 should get a sysfs attribute to inject bus resets without those tools...)
Hi Stefan,

Here's what you asked for.

Upon cable replug on the server laptop:
ieee1394: Node changed: 0-01:1023 -> 0-00:1023
ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[354fc0002226c838]
ieee1394: Node changed: 0-00:1023 -> 0-01:1023
ohci1394: fw-host0: AT dma reset ctx=0, aborting transmission
ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Error parsing configrom for node 0-01:1023
ieee1394: Node suspended: ID:BUS[0-01:1023] GUID[00c09f00001f8a88]
ieee1394: Node suspended: ID:BUS[0-00:1023] GUID[354fc0002226c838]
ieee1394: Error parsing configrom for node 0-00:1023
ieee1394: Node resumed: ID:BUS[0-01:1023] GUID[00c09f00001f8a88]

Upon eth1394 reload on the server laptop:
eth1394: eth1: IEEE-1394 IPv4 over 1394 Ethernet (fw-host0)
ieee1394: ether1394 rx: sender nodeid lookup failure: 0-00:1023
ieee1394: Node resumed: ID:BUS[0-00:1023] GUID[354fc0002226c838]

Upon gscanbus on the server laptop:
No messages were seen.

I also tried the same steps on the other laptop that connects to the server laptop and got the same messages for cable replug and eth1394 reload.
On running gscanbus, I did get the following messages: geeKISSexy:/var/www# gscanbus 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length 1/0x0000fffff0000400: wrong bus info block length Also to my observation came that when I pinged from the server laptop to the client laptop (where ping failed), on the client laptop I got the following messages: ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: Error parsing configrom for node 0-01:1023 ieee1394: ether1394 
rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 ieee1394: ether1394 rx: sender nodeid lookup failure: 0-01:1023 Hope this helps. PS: The patches you mentioned were applied only to the server laptop, and only the server laptop was suspended. The client laptop's kernel doesn't have the patches applied and it wasn't suspended (suspend is broken on it atm :-( )
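A side note on the node identifiers that appear throughout these logs, as a minimal sketch. An IEEE 1394 node ID is 16 bits wide: a 10-bit bus ID followed by a 6-bit physical ID, where bus ID 1023 means "the local bus" and physical ID 63 is the broadcast address. The parser for the "0-01:1023" notation assumes the fields are host, physical ID and bus ID in that order; that reading of the log format is an assumption, and the helper names are hypothetical.

```python
import re

LOCAL_BUS = 1023    # 10-bit bus ID with all bits set: the local bus
BROADCAST_PHY = 63  # 6-bit physical ID with all bits set: broadcast


def split_node_id(node_id):
    """Split a 16-bit IEEE 1394 node ID into (bus_id, physical_id)."""
    return (node_id >> 6) & 0x3FF, node_id & 0x3F


def parse_log_node(text):
    """Parse the 'H-NN:BBBB' notation from the ieee1394 log lines.

    Assumed field order: host number, physical ID, bus ID.
    """
    match = re.fullmatch(r"(\d+)-(\d+):(\d+)", text)
    if match is None:
        raise ValueError(f"unexpected node notation: {text!r}")
    host, physical_id, bus_id = (int(part) for part in match.groups())
    return {"host": host, "physical_id": physical_id, "bus_id": bus_id}


# "node 65535" from the "impossible ack_complete" messages earlier in this report
print(split_node_id(65535))         # (1023, 63)
print(parse_log_node("0-01:1023"))
```

Read this way, "node 65535" in the earlier "impossible ack_complete" messages is the broadcast ID on the local bus rather than a real peer, and "0-01:1023" is physical ID 1 on the local bus of host 0.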
Thanks for the tests. Which one is the GUID of the card in the suspending/resuming server? Is it 00c09f00001f8a88?
(answering myself) Yes, it is, according to the dmesg output from comment #15.
Thinking out loud: It appears the remote node cannot read the resumed server's Configuration ROM. Software on the server should be able to read it (and apparently is indeed able to do so) because local read and write requests bypass the hardware.
Yes, and here are some details taken from gscanbus for the card on the suspending/resuming server.

SelfID Info
-----------
Physical ID: 0
Link active: Yes
Gap Count: 63
PHY Speed: S400
PHY Delay: <=144ns
IRM Capable: Yes
Power Class: None
Port 0: Not connected
Init. reset: Yes

CSR ROM Info
------------
GUID: 0x00C09F00001F8A88
Node Capabilities: 0x000083C0
Vendor ID: 0x0000C09F
Unit Spec ID: 0x0000005E
Unit SW Version: 0x00000001
Model ID: 0x00000000
Nr. Textual Leafes: 1
Vendor: QUANTA COMPUTER, INC.
Textual Leafes: Linux - ohci1394

AV/C Subunits
-------------
N/A

PS: The Port 0 status is "Not connected" because I've removed the cable and put it on the shelf.
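The GUID and Vendor ID fields in this output are related: a FireWire GUID is an EUI-64, whose upper 24 bits are the vendor's company ID and whose lower 40 bits are assigned by the vendor. The following short sketch splits the GUID reported above; the function name is made up for illustration.

```python
def split_guid(guid):
    """Split a 64-bit GUID (EUI-64) into (vendor_id, chip_id).

    The vendor ID is the upper 24 bits, the chip ID the lower 40 bits.
    """
    if not 0 <= guid < 1 << 64:
        raise ValueError("GUID must fit in 64 bits")
    return guid >> 40, guid & ((1 << 40) - 1)


vendor, chip = split_guid(0x00C09F00001F8A88)
print(f"vendor 0x{vendor:06X}, chip 0x{chip:010X}")
# prints: vendor 0x00C09F, chip 0x00001F8A88
```

The vendor part, 0x00C09F, matches the "Vendor ID: 0x0000C09F" line that gscanbus read from the configuration ROM.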
Status note: The mentioned ohci1394 suspend/resume patches were merged upstream for Linux 2.6.20-rc1. We still need to implement the proper restoration of the configuration ROM and updating of the bus generation, and to test and fix all high-level drivers as far as found necessary.
Created attachment 10021: patch to let external nodes rediscover the resuming node

This fixes the bug as far as I could test:
- gscanbus on a remote node is now able to fetch the config ROM of the resuming node.
- sbp2 on the resuming node is now able to log back in to a FireWire disk which stayed connected (and powered) during the suspend cycle. The SCSI device is the same as before suspend.
I did *not* test eth1394 or any other application yet. You can also get the patch as part of my patchset v243 or later at http://me.in-berlin.de/~s5r6/linux1394/updates/.
PS: All of the previously mentioned patches are required too. If you have an older kernel which doesn't contain them, take my patchset from me.in-berlin.de.
Patch committed to linux1394-2.6.git; I will send it to Linus after 2.6.20 has been released, i.e. for 2.6.21-rc1. Please reopen this bug entry if eth1394 still doesn't work after resume.