Bug 5149
Summary: | Wake-on-lan broken using Intel e100 driver | ||
---|---|---|---|
Product: | Drivers | Reporter: | Serge van den Boom (serge+bugzilla.kernel.org) |
Component: | Network | Assignee: | Jesse Brandeburg (jbrandeb) |
Status: | CLOSED OBSOLETE | ||
Severity: | normal | CC: | alan, bruce.w.allan, bunk, david.graham, florian.hinzmann, greg, jbrandeb, jeffrey.t.kirsher, mattilinnanvuori, protasnb, undertakingyou |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.28 | Subsystem: | |
Regression: | Yes | Bisected commit-id: | |
Attachments: | Output of lspci; Current .config; Diff between 2.26 & 2.27 e100.c | ||
Description
Serge van den Boom
2005-08-29 10:55:31 UTC
Is this problem still present in kernel 2.6.16-rc1?

Alas, it's still broken.

please send the output of lspci -vvv -d :1229 after performing your test and restarting, but before loading the e100 module. We're looking to see if the PME signal is being asserted by e100 card but blocked by the chipset.

so if you have link it is likely that e100 is not powering down the phy at shutdown time. I'm currently suspecting something else. Can you try the e100 driver from 2.6.11 on 2.6.16.y?

Good to hear someone is looking at this. I was afraid I'd be using 2.6.11 for a long time.

While performing your tests (on a 2.6.16.9 kernel), I've found something interesting. If I don't load the module, but let the system go down before that, WOL works. Even more, if I load the module, and then immediately shutdown, it also works. It's only after the command 'ifconfig eth0 192.168.21.2 broadcast 192.168.21.255 netmask 255.255.255.0' that WOL stops working. And if I unload the e100 module again before shutdown, WOL will work again. So now I've got a workaround: explicitly unload the e100 module in the shutdown script.

The output of 'lspci -vvv -d :1229' remains the same all along (before module load, after, and after the ifconfig). I'm attaching the output. My ifconfig is version 1.42, from net-tools 1.60.

Trying to load the module of the 2.6.11 kernel on 2.6.16.9 fails (incompatible). Trying to compile the 2.6.11 e100.c with the 2.6.16.9 kernel by copying it into a separate dir, and then running 'make -C /usr/src/linux-2.6.16.9 M=$PWD modules' fails too, with an error

    /home/svdb/c/tests/e100/e100.c: In function `e100_suspend':
    /home/svdb/c/tests/e100/e100.c:2326: error: incompatible type for argument 2 of `pci_choose_state'
    /home/svdb/c/tests/e100/e100.c: At top level:
    /home/svdb/c/tests/e100/e100.c:2354: warning: initialization from incompatible pointer type

It seems that the type of the 'state' argument has changed. Replacing the e100_suspend function by the one from 2.6.16.9 results in a working driver, with which the WOL problem does not exist.

Created attachment 7935 [details]
Output of lspci
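For anyone repeating the PME check requested above, here is a rough Python sketch that pulls the power-management status line out of saved `lspci -vvv` output. The sample text is illustrative only and is not the contents of the attachment; the function name `pme_state` and the returned dictionary keys are made up for this sketch, and it assumes the usual `Status: D0 ... PME-Enable- ... PME-` layout that lspci prints for the Power Management capability.

```python
import re

# Illustrative sample only; NOT the contents of the attachment in this report.
SAMPLE = """\
Capabilities: [dc] Power Management version 2
	Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
	Status: D0 PME-Enable+ DSel=0 DScale=2 PME-
"""

# Matches only the power-management "Status:" line, which starts with a
# D-state. The generic PCI "Status: Cap+ ..." line is skipped.
STATUS_RE = re.compile(
    r"Status:\s+(?P<state>D[0-3])\b.*?PME-Enable(?P<enable>[+-])"
    r".*?\bPME(?P<status>[+-])\s*$",
    re.MULTILINE,
)


def pme_state(lspci_text):
    """Return power state and PME flags from lspci -vvv text (hypothetical helper).

    Returns None when no power-management status line is found.
    """
    match = STATUS_RE.search(lspci_text)
    if match is None:
        return None
    return {
        "power_state": match.group("state"),
        "pme_enabled": match.group("enable") == "+",   # PME-Enable+
        "pme_asserted": match.group("status") == "+",  # trailing PME+
    }


if __name__ == "__main__":
    print(pme_state(SAMPLE))
```

Running this on output captured before and after bringing the interface up would show whether `PME-Enable` flips. The thread reports that the lspci output stayed the same throughout.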
we have a patch for this now, what kernel would you like me to generate it against?

Ah, nice. I'm currently still running 2.6.16.9 but I can upgrade to the latest stable release if you want.

I think I didn't read closely enough before. The newer driver in 2.6.16 has an e100_shutdown function that should get called if the system is shutting down (via the generic .shutdown handler), and apparently is not called during your shutdown process? This function works as far as we know. The newer driver also *disables* wake on lan when the interface is "UP" to prevent another bug where continuous assertion of PME causes system slowdowns. This in turn explains why your "workaround" to unload the driver works, because WoL is re-enabled in e100_close. In general, the shutdown scripts for "other" distros do things like ifconfig down the interface as part of a runlevel change to runlevel 6 (shutdown), AFAIK. I actually think that the driver is working correctly and that your shutdown scripts may be at fault?

I see. Well, my shutdown script (pretty much the original (old) Slackware script) doesn't explicitly bring down interfaces. This used to work, and while I realise that "used to work" doesn't mean that it is actually guaranteed to remain working, it would have been nice if this change in behaviour had been documented. That said, while I recognise that it is good practice for whatever handles initialisation to handle uninitialisation (userland scripts bringing interfaces up and down in this case), something could be said for having a driver on termination leave a device in the state it reports it was in in the first place. That is, if the driver delays enabling WOL, it would make sense if it would perform this action before the driver doesn't have the chance to do so anymore (like it does on module unload).

even if it doesn't call close, the shutdown handler being called should actually take care of it for you.
I'm wondering if your system doesn't have some piece that would broadcast the internal system message (netlink?) that would prevent the shutdown handler from ever being called by drivers/base/power/shutdown.c

CC'ing GregKH to see if he knows why the shutdown handler for a device might *not* be called when shutting down.

It seems the Cc of Greg didn't work, so let's add it now. Greg, please read comment #10.

Sorry, I don't know why that callback is not getting called, it should be.

suggest this bug be closed because I believe this to be a behavior of (maybe slackware) the OS scripts not giving the driver any warning that the system is shutting down. Without the ->shutdown handler being called no driver should work correctly to enable wake on lan. I believe e100 is working correctly in this case.

Why is it up to the shutdown scripts to warn the driver that the system is going down? How would it give this warning anyhow? Are you talking about bringing the interface down before calling shutdown? It seems unlikely that that's got anything to do with shutdown being called or not, as shutdown isn't a network interface specific function.

Ok, I just added some printk()s, and it appears that e100_shutdown() is being called after all; it just isn't helping.

okay, can you add something to your shutdown script that will show the output of lspci -vvv -d :1229 after your scripts would (cause a) call to e100_shutdown. Also, please take a look if you can at the lights on the back of the adapter and let us know if they flash when you send the magic packet? What we may want to do to help debug this is a quick driver change to do the "shutdown" style tasks at module unload so we still have a system that is up with an adapter that is in D3.

can you paste your .config pls?

What the script does to cause the call to e100_shutdown() is merely execute the 'shutdown' binary. The lights do indeed flicker when the packet is sent.

Created attachment 8671 [details]
Current .config
Added the .config file.
Something I should mention for completeness' sake: Some hardware (main board,
CPU, VGA, sound card) has been replaced since I originally created this report.
It's now a Pentium III on a mainboard with an Intel 815EP chipset. This has not
affected the problem I reported here.
Some clarification: The .config file will show CONFIG_ACPI, but as I said in the original report, ACPI is disabled (by "acpi=off" as boot parameter). It was necessary with my old hardware. It probably isn't anymore now, but I'll leave it alone for the moment.

fixes were made into the e100/WoL driver and sent to 2.6.18. Please try 2.6.18rc5 or up.

Still does not boot after shutting down while running 2.6.18-rc7.

More patches are on the way - I'll submit those this week to jgarzik/netdev-2.6. Sorry for the delay.

The final fix made it into linus' tree after 2.6.18 was released. It should be available in 2.6.19rc4 (possibly rc3 or rc2) and up. Look for commit ID 975b366af66280ed5b852a1a0446586ce71e306e. Please verify it works and let us know.

It still does not work. Running 2.6.19-rc6 now.

Serge, Can you confirm the problem still exists with latest kernels? Thanks.

Still the same with 2.6.23.

*** Bug 9336 has been marked as a duplicate of this bug. ***

Wake On LAN was working here on my Intel 10/100 PCI NIC during 2.26 release. Then, as 2.27 rolled-out here, WOL totally stopped. I performed a diff of the 2.26 and 2.27 e100.c code and submit the following diff demonstrating the differences. I'll probably spend a little more time figuring out which of the three lines broke WOL here (since I've got it nailed down to this). (No changes noticed in /proc/acpi/wakeup or ethtool eth0 either.)

Created attachment 18407 [details]
Diff between 2.26 & 2.27 e100.c
One of these three lines probably broke WOL for the e100 driver. Basically, when wakeonlan sends the magic packet remotely, the lights blink on the server NIC, but the machine doesn't wake up.
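For reference, the magic packet that tools like wakeonlan send is just six 0xFF bytes followed by the target MAC address repeated 16 times, usually carried in a UDP broadcast. Below is a minimal Python sketch of that, not the tool's actual code. The MAC address is a placeholder, UDP port 9 is an assumed conventional choice, the broadcast address is the one from the ifconfig command earlier in this report, and the function names are made up for illustration.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("MAC address must contain 12 hex digits: %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronisation stream (6 x 0xFF) followed by 16 copies of the MAC.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac, broadcast="192.168.21.255", port=9):
    """Send the packet as a UDP broadcast (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Placeholder MAC; substitute the address of the e100 NIC under test.
    send_magic_packet("00:11:22:33:44:55")
```

Blinking lights on the NIC, as described above, suggest that the packet reaches the adapter, so the failure is more likely on the wake-up side than in how the packet is built or sent.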
Sorry. Incorrectly stated version numbers above & for attachment.
2.26 == 2.6.26
2.27 == 2.6.27

Just built using the 2.6.26 old e100.c file, but required the additional pointer @ line 1790 to avoid a pointer error:

    + if (pci_dma_mapping_error(nic->pdev, rx->dma_addr))

WOL still doesn't work. Either the problem lies here or it's deeper. Should I start a new e100 WOL bug for this to avoid hijacking this one?

Issues still present in 2.6.28 as of July 16, 2009.

Any update on this one, it hasn't seen a lot of activity in a while.

Closing as obsolete, if this is wrong feel free to re-open