Bug 5149

Summary: Wake-on-lan broken using Intel e100 driver
Product: Drivers
Reporter: Serge van den Boom (serge+bugzilla.kernel.org)
Component: Network
Assignee: Jesse Brandeburg (jbrandeb)
Status: CLOSED OBSOLETE
Severity: normal
CC: alan, bruce.w.allan, bunk, david.graham, florian.hinzmann, greg, jbrandeb, jeffrey.t.kirsher, mattilinnanvuori, protasnb, undertakingyou
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.28
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
    Output of lspci
    Current .config
    Diff between 2.26 & 2.27 e100.c

Description Serge van den Boom 2005-08-29 10:55:31 UTC
Most recent kernel where this bug did not occur: 2.6.11
Distribution: Custom (ex-Slackware 8.0)
Hardware Environment: AMD Athlon K7, Via VT82C686A PCIset, Intel PRO/100+ PCI 
Management Adapter
Software Environment: GCC 3.3.4, ethtool 2
Problem Description: Wake-on-lan has ceased to work somewhere in between 2.6.11 
and 2.6.12.5. ACPI is disabled (as it was when WOL still worked). ethtool 
reports wake-on magic packet is on, and when the system is turned off with 'init 
0' the relevant lights on the switch will be on. But sending a magic packet does 
not cause the system to wake up. This happens when the e100 driver is loaded as 
a module as well as when it is built into the kernel.
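For reference, the magic packet mentioned above has a simple layout: six 0xFF bytes followed by the target MAC address repeated sixteen times. The sketch below is an illustration only and is not taken from this report; the function names are hypothetical, and sending to UDP port 9 on the broadcast address is a common convention assumed here, not something the reporter stated.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte magic packet payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronisation stream (6 x 0xFF) followed by 16 copies of the MAC.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port and address are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Calling send_magic_packet() with the sleeping machine's MAC address from another host on the same segment reproduces the test described in this report.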

ethtool output:
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x00000007 (7)
        Link detected: yes
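A small sketch of how the two Wake-on fields in output like the above could be checked from a script. The helper names are hypothetical; the only assumption about ethtool is that the letter 'g' in the "Wake-on" field means wake on magic packet, as in the output shown here.

```python
import re


def parse_wol(ethtool_output: str) -> dict:
    """Extract the 'Supports Wake-on' and 'Wake-on' flags from ethtool output."""
    fields = {}
    for line in ethtool_output.splitlines():
        match = re.match(r"\s*(Supports Wake-on|Wake-on):\s*(\S+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def magic_packet_enabled(ethtool_output: str) -> bool:
    """True if the current Wake-on flags include 'g' (magic packet)."""
    return "g" in parse_wol(ethtool_output).get("Wake-on", "")
```

Note that this only reflects what the driver reports; as this report shows, a reported 'g' does not guarantee that the machine actually wakes up.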
Comment 1 Adrian Bunk 2006-01-29 12:34:48 UTC
Is this problem still present in kernel 2.6.16-rc1?
Comment 2 Serge van den Boom 2006-01-29 15:13:52 UTC
Alas, it's still broken.
Comment 3 Jesse Brandeburg 2006-04-21 14:17:19 UTC
please send the output of lspci -vvv -d :1229 after performing your test and 
restarting, but before loading the e100 module.  We're looking to see if the 
PME signal is being asserted by e100 card but blocked by the chipset.

so if you have link it is likely that e100 is not powering down the phy at 
shutdown time.  I'm currently suspecting something else.  Can you try the e100 
driver from 2.6.11 on 2.6.16.y?
Comment 4 Serge van den Boom 2006-04-22 02:52:25 UTC
Good to hear someone is looking at this. I was afraid I'd be using 2.6.11 for a 
long time.

While performing your tests (on a 2.6.16.9 kernel), I've found something 
interesting. If I don't load the module, but let the system go down before 
that, WOL works. Even more, if I load the module, and then immediately 
shutdown, it also works. It's only after the command
'ifconfig eth0 192.168.21.2 broadcast 192.168.21.255 netmask 255.255.255.0'
that WOL stops working.
And if I unload the e100 module again before shutdown, WOL will work again. So 
now I've got a workaround: explicitly unload the e100 module in the shutdown 
script.
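The workaround could be sketched as a small pre-shutdown hook like the one below. This is a hypothetical illustration, not the reporter's actual shutdown script: the function names are made up, and it assumes `modprobe -r` is available to unload the module.

```python
import subprocess


def unload_commands(module: str = "e100") -> list:
    """Commands to run before shutdown so the driver is unloaded first."""
    return [["modprobe", "-r", module]]


def run_pre_shutdown(commands, runner=subprocess.run) -> None:
    """Run each command in order, stopping at the first failure.

    `runner` is injectable so the hook can be exercised without root.
    """
    for cmd in commands:
        runner(cmd, check=True)
```

A shutdown script would call run_pre_shutdown(unload_commands()) just before invoking the halt or poweroff step.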

The output of 'lspci -vvv -d :1229' remains the same all along (before module 
load, after, and after the ifconfig). I'm attaching the output.

My ifconfig is version 1.42, from net-tools 1.60.

Trying to load the module of the 2.6.11 kernel on 2.6.16.9 fails (incompatible).
Trying to compile the 2.6.11 e100.c with the 2.6.16.9 kernel by copying it into 
a separate dir, and then running
'make -C /usr/src/linux-2.6.16.9 M=$PWD modules' fails too, with an error
    /home/svdb/c/tests/e100/e100.c: In function `e100_suspend':
    /home/svdb/c/tests/e100/e100.c:2326: error: incompatible type for argument 2 of `pci_choose_state'
    /home/svdb/c/tests/e100/e100.c: At top level:
    /home/svdb/c/tests/e100/e100.c:2354: warning: initialization from incompatible pointer type
It seems that the type of the 'state' argument has changed.
Replacing the e100_suspend function by the one from 2.6.16.9 results in a 
working driver, with which the WOL problem does not exist.
Comment 5 Serge van den Boom 2006-04-22 02:54:02 UTC
Created attachment 7935 [details]
Output of lspci
Comment 6 Jesse Brandeburg 2006-06-16 13:40:15 UTC
we have a patch for this now, what kernel would you like me to generate it
against?
Comment 7 Serge van den Boom 2006-06-16 13:51:17 UTC
Ah, nice.

I'm currently still running 2.6.16.9 but I can upgrade to the latest stable 
release if you want.

Comment 8 Jesse Brandeburg 2006-06-16 14:28:32 UTC
I think I didn't read closely enough before.  The newer driver in 2.6.16 has an
e100_shutdown function that should get called (via the generic .shutdown
handler) if the system is shutting down, and apparently it is not called during
your shutdown process?  This function works as far as we know.

The newer driver also *disables* wake on lan when the interface is "UP" to
prevent another bug where continuous assertion of PME causes system slowdowns. 
This in turn explains why your "workaround" to unload the driver works, because
WoL is re-enabled in e100_close.
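The behaviour described here, together with the observations in comment 4, can be summarised as a small state model. This is only a toy sketch and not code from e100.c: the class `WolModel`, its method names, and `wakes_after` are hypothetical, and it assumes the card is already armed for wake-up before the driver configures it (which comment 4 suggests, since WOL worked when the module was never loaded).

```python
class WolModel:
    """Toy model of the WoL arming described in this thread (hypothetical)."""

    def __init__(self):
        # Assumption: the card is armed before the driver touches it.
        self.armed = True
        self.interface_up = False

    def open(self):
        """Interface brought UP (e.g. by ifconfig): the driver disables WoL."""
        self.interface_up = True
        self.armed = False

    def close(self):
        """Interface brought down or module unloaded: WoL is re-enabled."""
        self.interface_up = False
        self.armed = True


def wakes_after(events):
    """Replay a list of event names and report whether WoL ends up armed."""
    model = WolModel()
    for event in events:
        getattr(model, event)()
    return model.armed
```

In this model wakes_after(["open"]) is false while wakes_after(["open", "close"]) is true, which matches the reported failure after ifconfig and the module-unload workaround.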

In general, the shutdown scripts for "other" distros do things like ifconfig
down the interface as part of a runlevel change to runlevel 6 (shutdown), AFAIK

I actually think that the driver is working correctly and that your shutdown
scripts may be at fault?
Comment 9 Serge van den Boom 2006-06-16 15:16:10 UTC
I see.
Well, my shutdown script (pretty much the original (old) Slackware script) doesn't 
explicitly bring down interfaces. This used to work, and while I realise that "used 
to work" doesn't mean that it is actually guaranteed to remain working, it would 
have been nice if this change in behaviour had been documented.
That said, while I recognise that it is good practice for whatever handles 
initialisation to handle uninitialisation (userland scripts bringing interfaces up 
and down in this case), something could be said for having a driver on termination 
leave a device in the state it reports it was in in the first place. That is, if 
the driver delays enabling WOL, it would make sense if it would perform this action 
before the driver doesn't have the chance to do so anymore (like it does on module 
unload).

Comment 10 Jesse Brandeburg 2006-06-23 11:25:58 UTC
even if it doesn't call close, the shutdown handler being called should actually
take care of it for you.

I'm wondering if your system lacks some piece that would broadcast the
internal system message (netlink?), and whether that is preventing the shutdown
handler from ever being called by
drivers/base/power/shutdown.c

CC'ing GregKH to see if he knows why the shutdown handler for a device might
*not* be called when shutting down.
Comment 11 Adrian Bunk 2006-07-10 12:53:40 UTC
It seems the Cc of Greg didn't work, so let's add it now.

Greg, please read comment #10.
Comment 12 Greg Kroah-Hartman 2006-07-11 16:09:45 UTC
Sorry, I don't know why that callback is not getting called, it should be.
Comment 13 Jesse Brandeburg 2006-07-31 09:07:53 UTC
I suggest this bug be closed, because I believe this is a behavior of the OS
scripts (maybe Slackware's) not giving the driver any warning that the system is
shutting down.  Without the ->shutdown handler being called, no driver can be
expected to enable wake on lan correctly.

I believe e100 is working correctly in this case.
Comment 14 Serge van den Boom 2006-07-31 09:46:01 UTC
Why is it up to the shutdown scripts to warn the driver that the system is 
going down? How would it give this warning anyhow? Are you talking about 
bringing the interface down before calling shutdown? It seems unlikely that 
that's got anything to do with shutdown being called or not, as shutdown isn't 
a network interface specific function.
Comment 15 Serge van den Boom 2006-08-01 12:55:34 UTC
Ok, I just added some printk()s, and it appears that e100_shutdown() is being 
called after all; it just isn't helping.
Comment 16 Jesse Brandeburg 2006-08-01 13:44:08 UTC
okay, can you add something to your shutdown script that will show the output of
lspci -vvv -d :1229 after your scripts would (cause a) call to e100_shutdown.

Also, please take a look if you can at the lights on the back of the adapter and
let us know if they flash when you send the magic packet?

What we may want to do to help debug this is a quick driver change to do the
"shutdown" style tasks at module unload so we still have a system that is up
with an adapter that is in D3.
Comment 17 Auke Kok 2006-08-01 14:05:26 UTC
can you paste your .config pls?
Comment 18 Serge van den Boom 2006-08-01 14:39:37 UTC
What the script does to cause the call to e100_shutdown() is merely execute the 
'shutdown' binary.

The lights do indeed flicker when the packet is sent.
Comment 19 Serge van den Boom 2006-08-01 14:47:50 UTC
Created attachment 8671 [details]
Current .config

Added the .config file.

Something I should mention for completeness' sake: Some hardware (main board,
CPU, VGA, sound card) has been replaced since I originally created this report.
It's now a Pentium III on a mainboard with an Intel 815EP chipset. This has not
affected the problem I reported here.
Comment 20 Serge van den Boom 2006-08-01 15:00:24 UTC
Some clarification: The .config file will show CONFIG_ACPI, but as I said in the 
original report, ACPI is disabled (by "acpi=off" as boot parameter).
It was necessary with my old hardware. It probably isn't anymore now, but I'll leave 
it alone for the moment.
Comment 21 Auke Kok 2006-09-07 15:00:46 UTC
Fixes to the e100 driver's WoL handling were made and sent to 2.6.18. Please try
2.6.18-rc5 or later.
Comment 22 Serge van den Boom 2006-09-17 09:11:15 UTC
Still does not boot after shutting down while running 2.6.18-rc7.
Comment 23 Auke Kok 2006-09-18 07:28:07 UTC
More patches are on the way - I'll submit those this week to jgarzik/netdev-2.6.
Sorry for the delay.
Comment 24 Auke Kok 2006-11-29 14:16:20 UTC
The final fix made it into linus' tree after 2.6.18 was released. It should be
available in 2.6.19rc4 (possibly rc3 or rc2) and up. Look for commit ID
975b366af66280ed5b852a1a0446586ce71e306e.

Please verify it works and let us know.
Comment 25 Serge van den Boom 2006-11-30 07:05:06 UTC
It still does not work. Running 2.6.19-rc6 now.
Comment 26 Natalie Protasevich 2007-10-11 00:52:26 UTC
Serge,
Can you confirm the problem still exists with latest kernels?
Thanks.
Comment 27 Serge van den Boom 2007-10-23 07:49:18 UTC
Still the same with 2.6.23.
Comment 28 Matti Linnanvuori 2008-01-30 02:36:46 UTC
*** Bug 9336 has been marked as a duplicate of this bug. ***
Comment 29 Roger 2008-10-22 18:54:04 UTC
Wake On LAN was working here on my Intel 10/100 PCI NIC during 2.26 release.  Then, as 2.27 rolled-out here, WOL totally stopped.

I performed a diff of the 2.26 and 2.27 e100.c code and submit the following diff demonstrating the differences.  I'll probably spend a little more time figuring out which of the three lines broke WOL here (since I've got it nailed down to this).  (No changes noticed in /proc/acpi/wakeup or ethtool eth0 either.)
Comment 30 Roger 2008-10-22 18:56:05 UTC
Created attachment 18407 [details]
Diff between 2.26 & 2.27 e100.c

Diff between 2.26 & 2.27 e100.c

One of these three lines probably broke WOL for the e100 driver.  Basically, when wakeonlan sends the magic packet remotely, the lights blink on the server NIC, but the machine doesn't wake up.
Comment 31 Roger 2008-10-22 19:11:02 UTC
Sorry.  Incorrectly stated version numbers above & for attachment.

2.26 == 2.6.26
2.27 == 2.6.27
Comment 32 Roger 2008-10-22 20:27:36 UTC
Just built using the 2.6.26 old e100.c file, but required the additional pointer @ line 1790 to avoid a pointer error:

+   if (pci_dma_mapping_error(nic->pdev, rx->dma_addr))

WOL still doesn't work.  Either the problem lies here or it's deeper.

Should I start a new e100 WOL bug for this to avoid hijacking this one?
Comment 33 Will Smith 2009-07-16 07:13:59 UTC
Issue still present in 2.6.28 as of July 16, 2009. Any update on this one? It hasn't seen much activity in a while.
Comment 34 Alan 2012-05-12 13:10:27 UTC
Closing as obsolete, if this is wrong feel free to re-open