Bug 6735
Summary: | network connection does not survive APM suspend and resume | | |
---|---|---|---|
Product: | Drivers | Reporter: | Robert Dyck (rob.dyck) |
Component: | Network | Assignee: | Francois Romieu (romieu) |
Status: | REJECTED INVALID | | |
Severity: | normal | CC: | stefan |
Priority: | P2 | | |
Hardware: | i386 | | |
OS: | Linux | | |
Kernel Version: | 2.6.17 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Description
Robert Dyck
2006-06-22 16:38:24 UTC
bugme-daemon@bugzilla.kernel.org wrote:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6735
>
> Summary: network connection does not survive APM suspend and
> resume
>
> ...
>
> Steps to reproduce:Push button to suspend ( APM ), push button again to resume,
> network connection is lost.
>
> Problem Description:The network connection will not survive suspend and resume.
> Works correctly on 2.6.16.1. "/etc/rc.d/init.d/network restart" will restore
> the network. I tried compiling with the file via-rhine.c from 2.6.16.1 but it
> still fails. No error messages.

This is a post-2.6.16 regression.

It's probably unrelated to the device driver itself.

Can anyone suggest where we should be looking?

Thanks.

How in the heck did I get on the CC list for this? ;-)

Lee
On Thu, 2006-06-22 at 16:54 -0700, Andrew Morton wrote:
> bugme-daemon@bugzilla.kernel.org wrote:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=6735
> >
> > Summary: network connection does not survive APM suspend and
> > resume
> >
> > ...
> >
> > Steps to reproduce:Push button to suspend ( APM ), push button again to resume,
> > network connection is lost.
> >
> > Problem Description:The network connection will not survive suspend and resume.
> > Works correctly on 2.6.16.1. "/etc/rc.d/init.d/network restart" will restore
> > the network. I tried compiling with the file via-rhine.c from 2.6.16.1 but it
> > still fails. No error messages.
>
> This is a post-2.6.16 regression.
>
> It's probably unrelated to the device driver itself.
>
> Can anyone suggest where we should be looking?
>
> Thanks.
>
I have verified that the reported problem does not exist with 2.6.16.22.

2.6.16.22 will not tell much regarding the step in the 2.6.17 branch where the driver broke.

Can you try a bit bissect to find the culprit ?

--
Ueimor

Francois:
[...]
> Can you try a bit bissect to find the culprit ?
                ^^^
I meant a "git bisect".
--
Ueimor
I can give it a try. Do not hold your breath. Steep learning curve ahead.

Thank you for steering me toward the handy git tool. Here is the commit that breaks my system.

[root@fatboy linux-2.6]# git bisect bad
b00055aacdb172c05067612278ba27265fcd05ce is first bad commit
commit b00055aacdb172c05067612278ba27265fcd05ce
Author: Stefan Rompf <stefan@loplof.de>
Date:   Mon Mar 20 17:09:11 2006 -0800

    [NET] core: add RFC2863 operstate

    this patch adds a dormant flag to network devices, RFC2863 operstate derived from these flags and possibility for userspace interaction. It allows drivers to signal that a device is unusable for user traffic without disabling queueing (and therefore the possibility for protocol establishment traffic to flow) and a userspace supplicant (WPA, 802.1X) to mark a device unusable without changes to the driver.

    It is the result of our long discussion. However I must admit that it represents what Jamal and I agreed on with compromises towards Krzysztof, but Thomas and Krzysztof still disagree with some parts. Anyway I think it should be applied.

    Signed-off-by: Stefan Rompf <stefan@loplof.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 eca50252b33b4e6a903d25adc9864108a4300ce9 9c71b9401fe94cc51e61964fe7a0e83f69410bcd M include
:040000 040000 6706946206b8f7e0a06c574db554fdb7f5553ec1 8079fa648434108983078adb9cd6019d3e6d56e2 M net

I have just tried 2.6.18-rc3 and the problem remains.

Referring to Comment #7: Have you notified Stefan Rompf of the problem?

I assumed ( perhaps wrongly ) kernel.org had a process whereby bug reports would be passed to someone who had some familiarity with the subject matter. This bug was assigned to Francois Romieu and I have tried to contact him directly but I have not received a reply. I will try contacting Stefan who is the author of the commit that breaks the Via Rhine driver.

Hello Rob,

thanks for notifying me offline, I've just created a bugzilla account.
Can you add some details to the report:

-network related dmesg output just after resume
-Are you using stuff like 802.1X?
-The contents of link_mode, carrier, dormant and operstate in /sys/class/net/eth0 (assuming the ethernet device is named eth0 ;) before and after suspend
-Does it help to unplug and replug the cable after resume?

Stefan

Excuse my ignorance, if 802.1X is a reference to wireless or security, the answer is no. I saw nothing unusual in dmesg. Here are some excerpts.

ide-disk 0.0: suspend
platform floppy.0: suspend
platform pcspkr: suspend
pci 0000:01:00.0: suspend
via-rhine 0000:00:12.0: suspend
VIA 82xx Audio 0000:00:11.5: suspend
VIA_IDE 0000:00:11.1: suspend
pci 0000:00:11.0: suspend
.
.
pci 0000:00:11.0: resuming
VIA_IDE 0000:00:11.1: resuming
VIA 82xx Audio 0000:00:11.5: resuming
PCI: Enabling device 0000:00:11.5 (0000 -> 0001)
PCI: Found IRQ 4 for device 0000:00:11.5
IRQ routing conflict for 0000:00:11.5, have irq 3, want irq 4
via-rhine 0000:00:12.0: resuming
pci 0000:01:00.0: resuming
platform pcspkr: resuming
platform floppy.0: resuming
ide-disk 0.0: resuming

Here is /sys/class/net/eth0 before suspend

[root@fatboy eth0]# cat link_mode
0
[root@fatboy eth0]# cat operstate
unknown
[root@fatboy eth0]# cat dormant
0
[root@fatboy eth0]# cat carrier
1

After resume

[root@fatboy eth0]# ping jb
connect: Network is unreachable
[root@fatboy eth0]# cat link_mode
0
[root@fatboy eth0]# cat operstate
down
[root@fatboy eth0]# cat dormant
cat: dormant: Invalid argument
[root@fatboy eth0]# cat carrier
cat: carrier: Invalid argument

Restoring network

[root@fatboy eth0]# /etc/rc.d/init.d/network restart
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]
[root@fatboy eth0]# cat dormant
0
[root@fatboy eth0]# cat carrier
1
[root@fatboy eth0]# cat link_mode
0
[root@fatboy eth0]# cat operstate
unknown

Almost forgot. Unplugging and replugging the cable does not restore the connection.

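For anyone repeating the before/after comparison above, the four sysfs attributes can be collected in one go. The following is only a sketch: the function name dump_link_state is made up for illustration, and the default path assumes the interface is called eth0. An attribute that cannot be read (for example the "Invalid argument" seen on carrier and dormant after resume) is reported as unreadable instead of aborting.

```shell
# Hypothetical helper (not part of any distribution script).
# Prints link_mode, carrier, dormant and operstate for one interface.
# $1: sysfs directory of the interface, defaults to /sys/class/net/eth0.
dump_link_state() {
    dir=${1:-/sys/class/net/eth0}
    for attr in link_mode carrier dormant operstate; do
        # cat fails (e.g. with EINVAL) for some attributes while the
        # interface is down, so the failure is recorded, not fatal.
        if value=$(cat "$dir/$attr" 2>/dev/null); then
            printf '%s=%s\n' "$attr" "$value"
        else
            printf '%s=<unreadable>\n' "$attr"
        fi
    done
}
```

Running `dump_link_state > before.txt` prior to suspend and `dump_link_state > after.txt` after resume, then comparing the two files, gives the same information as the manual cat commands.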
The most interesting point in these results is the EINVAL returned for cat dormant / carrier. This only happens if the interface has been brought administratively down. The network unreachable during ping supports this guess.

However, from a grep over the 2.6.17 sources I've seen no place where the kernel shuts down an interface on its own, without a preceding ifconfig down or ip link down coming from userspace. Possibly a DHCP client?

To find out whether we have a kernel/userspace interaction, can you boot the system into single user mode, configure the network with a static IP address, make sure that no udev bloat is running and try to suspend?

I cannot reproduce your problem here, actually I'm using (ACPI) suspend on my notebook (Amilo 7400 with b44 and ipw2200) all the time without any networking problems.

Stefan

The box having the trouble has always had a static IP address. I booted into single user mode as you suggested and killed udevd. Suspend/resume yields the same result ( network unreachable ). The contents of /sys/class/net/eth0 were the same as before. I do not know what the significance of the operstate is but intuitively "unknown" does not seem right for an interface that is up.

Problem solved. Your comments about the interface being taken down in userland got me on the right track. As a result of your code changes the output of the command "/sbin/ip -o link show" also changes. The apm script parses the output looking for interfaces that are up. It uses the output to create a temporary script which will be run when a resume is done. The apm script takes the interfaces down and the temporary script brings them back. The trouble is, no interfaces were found to be up. This problem would likely only affect apm and to be more specific, Fedora Core 4. The line which gave all the trouble is "eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000\ link/ether 00:11:2f:e7:f6:61 brd ff:ff:ff:ff:ff:ff".
Awk was looking for "UP>" which no longer was present. It was replaced by "UP,10000>". The loopback interface was also not being restored. I am sorry about the bother. Thank you for your patience.

Good to see this is sorted out. Can you close the bug? I don't have the access to do so.

Stefan

This is not a kernel bug. As a result of a code change some text output also changed. Scripts that rely on this text output may need to be rewritten.
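
For scripts that have to be rewritten, one way to avoid depending on "UP" being the last flag is to split the flag list on commas and compare each entry exactly. This is a sketch only, not the Fedora apm script: the function name list_up_interfaces is invented for illustration, and the input is assumed to be in the one-line format of "/sbin/ip -o link show", with or without a leading interface index.

```shell
# Hypothetical replacement for a match on "UP>".
# Reads "ip -o link show" style lines on stdin and prints the name of
# every interface whose flag list contains the exact entry UP.
list_up_interfaces() {
    awk '
    {
        # Locate the <FLAG,FLAG,...> field; the name is the field before it.
        for (f = 2; f <= NF; f++) {
            if (substr($f, 1, 1) == "<") {
                name = $(f - 1)
                sub(/:$/, "", name)          # drop the trailing colon
                flags = $f
                gsub(/[<>]/, "", flags)      # drop the angle brackets
                n = split(flags, flag, ",")
                for (i = 1; i <= n; i++) {
                    if (flag[i] == "UP") {   # exact match, any position
                        print name
                        break
                    }
                }
                break
            }
        }
    }'
}
```

Typical use would be `/sbin/ip -o link show | list_up_interfaces`. Because each entry is compared as a whole, additional entries after UP, such as the "10000" seen above, do not change the result.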