Bug 33872
Summary: | e1000e runtime suspend breaks shutdown - ThinkPad T510, X60, X61s, X201 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Diego Viola (diego.viola) |
Component: | Power-Off | Assignee: | Len Brown (lenb) |
Status: | CLOSED DUPLICATE | ||
Severity: | normal | CC: | acpi-bugzilla, alexander.h.duyck, anezch, ant_978red, bruce.w.allan, bwat47, carolyn.wyborny, chandramouli.narayanan, christian.jann, dan.j.williams, diego.viola, emil.s.tantilov, eric.dumazet, feng.tang, fengguang.wu, fenghua.yu, ismail, jardiamj, jbarnes, jbrandeb, jeffrey.t.kirsher, kernel, kianseong, lenb, mario, mein_mail1, ming.m.lin, mrlhwliberty, oleg.smirnov, rosenbavm, rui.zhang, shaohua.li, smconvey, suresh.b.siddha, thomas.jarosch, tomas.winkler, tushar.n.dave, vldmr, wey-yi.w.guy, yinghai, younes.m, youquan.song |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 2.6.38-ARCH | Subsystem: | |
Regression: | No | Bisected commit-id: | |
Attachments: | Shutdown crash; Shutdown Crash 2; BIOS IRQ Config; cpuinfo; dmesg; interrupts; uname; lspci; BIOS Overview; BIOS Display Settings; archlinux rc.conf; laptop-mode.conf; shutdown_initcall_debug.jpg; modprobe -l (list of modules in the system); dmesg; interrupts; lspci; failed shutdown picture; success shutdown picture; ThinkPad T510 shutdown with maxcpus=1 |
Description
Diego Viola
2011-04-23 14:04:14 UTC
Created attachment 55132 [details]
Shutdown crash
Created attachment 55142 [details]
Shutdown Crash 2
Created attachment 55152 [details]
BIOS IRQ Config
Created attachment 55162 [details]
cpuinfo
Created attachment 55172 [details]
dmesg
Created attachment 55182 [details]
interrupts
Created attachment 55192 [details]
uname
Created attachment 55202 [details]
lspci
Created attachment 55212 [details]
BIOS Overview
Created attachment 55222 [details]
BIOS Display Settings
Created attachment 55232 [details]
archlinux rc.conf
Created attachment 55242 [details]
laptop-mode.conf
Please see the 'Shutdown Crash 2' image; this is where the system crashes on shutdown and the "Disabling IRQ #19" messages appear.

[diego@myhost ~]$ pacman -Q kernel26
kernel26 2.6.38.3-1
[diego@myhost ~]$

I should also say that I tried changing the PCI IRQs from 11 to "Auto", but that made no difference; the kernel still crashes on shutdown.

I can reproduce this every time if I have laptop-mode enabled in my init scripts; if I take that out, the system shuts down just fine. But I think it's an issue with the kernel, probably with CPU frequency scaling, since I was able to reproduce the problem without laptop-mode at all, simply by lowering the CPU frequencies. With laptop-mode it always happens, so I have disabled that service for now.

Any ideas about this issue and when it will be fixed please?

This seems to be an issue with 2.6.39 still, using the vanilla kernel from kernel.org.

As reported here: https://bbs.archlinux.org/viewtopic.php?pid=939232#p939232

does this problem go away if you boot with "intel_idle.max_cstate=0" to disable the intel_idle driver?

Any change if you exclude the "ips" driver from the kernel?

(In reply to comment #19)
> does this problem go away if you boot with "intel_idle.max_cstate=0"
> to disable the intel_idle driver?
>
> Any change if you exclude the "ips" driver from the kernel?

I tried this in Archlinux with kernel 2.6.38.6-2:

A. with intel_idle.max_cstate:
* boot with intel_idle.max_cstate=0 in /etc/sysctl.conf
* enable laptop-mode
* pm-suspend
* resume
* poweroff
* --------- hang

B. with intel_idle.max_cstate and ips blacklisted:
* boot with added 'blacklist ips' in /etc/modprobe.d/modprobe.conf and intel_idle.max_cstate=0 in /etc/sysctl.conf
* enable laptop-mode
* pm-suspend
* resume
* poweroff
* --------- hang

My hardware is Asus A42JC:
Intel Core i3 M370 2.40 GHz
2 GB DDR3 RAM
NVidia GeForce 310M with Optimus

Also, I noticed that my laptop runs about 5 degrees Celsius warmer than when I run Windows 7.
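The test above sets intel_idle.max_cstate=0 in /etc/sysctl.conf. That file holds sysctl settings, whereas intel_idle.max_cstate is a kernel command-line parameter passed by the boot loader, so the setting may not have been in effect for those runs. A minimal Python sketch for checking what the running kernel was actually booted with follows; the helper names (parse_cmdline, read_cmdline, missing_params) are made up for this illustration and are not part of any tool mentioned in this report.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Bare flags such as 'initcall_debug' map to None. Quoted values that
    contain spaces are not handled in this sketch.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


def missing_params(cmdline, wanted):
    """Return the names in `wanted` that do not appear on `cmdline`."""
    present = parse_cmdline(cmdline)
    return [name for name in wanted if name not in present]
```

On the affected machine, `missing_params(read_cmdline(), ["intel_idle.max_cstate"])` returning an empty list would suggest the parameter really reached the kernel; a non-empty list would suggest it has to be added to the boot loader's kernel line instead.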
My machine also hangs with "Disabling IRQ #19" when booting the kernel with intel_idle.max_cstate=0. I tried excluding the ips/intel_ips driver from the kernel as well, but I get the same "Disabling IRQ #19" issue.

> intel_idle.max_cstate=0 causes hang
so this issue exists in both intel_idle and acpi_idle. can you try intel_idle.max_cstate=1 or 2, 3?
also please add the kernel parameter 'initcall_debug' but not 'intel_idle.max_cstate=x', do 'echo 8 > /proc/sys/kernel/printk' before shutdown, try to reproduce the issue and attach a picture here.

I tried booting the kernel with intel_idle.max_cstate=0, 1, 2, 3, and my system hung every time on shutdown with "Disabling IRQ #19". Please see the attached image; it was taken when I shut down with 'initcall_debug' and 'echo 8 > /proc/sys/kernel/printk'. The image file name is 'shutdown_initcall_debug.jpg'. Please let me know if you need anything else.

I should have said: try intel_idle.max_cstate=0, or intel_idle.max_cstate=1, or intel_idle.max_cstate=2 ...
I didn't see the picture; did you forget to attach it?

Created attachment 60532 [details]
shutdown_initcall_debug.jpg
Attached it.

In reply to comment #25. I tried with:
intel_idle.max_cstate=0
intel_idle.max_cstate=1
intel_idle.max_cstate=2
intel_idle.max_cstate=3
But it didn't make any difference. The kernel/system hung every time during shutdown with "Disabling IRQ #19".

how about kernel parameter 'pci=nomsi'?

Kernel still hangs with 'pci=nomsi', with:
"Disabling IRQ #17"
"Disabling IRQ #19"

try disabling as many drivers as possible (maybe just leave harddisk and keyboard) and see if you can still reproduce the issue.

Should I recompile the latest kernel from kernel.org for that, or will blacklisting modules do?

maybe blacklisting some modules is enough if all drivers are modules, like usb, iwlagn, hda_intel, ips, firewire_ohci, sdhci.

can you please tell me which modules I should disable from this list? this is my /proc/modules:

aesni_intel 45657 1 - Live 0xffffffffa0539000
cryptd 7661 1 aesni_intel, Live 0xffffffffa0531000
aes_x86_64 7436 1 aesni_intel, Live 0xffffffffa052a000
aes_generic 26066 2 aesni_intel,aes_x86_64, Live 0xffffffffa051d000
ipv6 277189 14 - Live 0xffffffffa04c5000
uvcvideo 60799 0 - Live 0xffffffffa04af000
btusb 11185 0 - Live 0xffffffffa01d1000
videodev 65175 1 uvcvideo, Live 0xffffffffa048c000
snd_hda_codec_hdmi 22378 1 - Live 0xffffffffa049e000
bluetooth 55409 1 btusb, Live 0xffffffffa02d3000
snd_hda_codec_conexant 41714 1 - Live 0xffffffffa0277000
v4l2_compat_ioctl32 6716 1 videodev, Live 0xffffffffa0136000
joydev 9799 0 - Live 0xffffffffa0168000
snd_hda_intel 21738 2 - Live 0xffffffffa0152000
arc4 1402 2 - Live 0xffffffffa0106000
ecb 2033 2 - Live 0xffffffffa0087000
iwlagn 385983 0 - Live 0xffffffffa040d000
snd_hda_codec 73739 3 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel, Live 0xffffffffa03f8000
i915 629309 7 - Live 0xffffffffa0348000
snd_hwdep 6134 1 snd_hda_codec, Live 0xffffffffa00a2000
snd_pcm 71032 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec, Live 0xffffffffa0263000
drm_kms_helper 26624 1 i915, Live 0xffffffffa01b4000
sdhci_pci 8202 0 - Live 0xffffffffa011d000
thinkpad_acpi 59799 0 - Live 0xffffffffa01de000
iwlcore 103302 1 iwlagn, Live 0xffffffffa0223000
sdhci 17061 1 sdhci_pci, Live 0xffffffffa0341000
ehci_hcd 39209 0 - Live 0xffffffffa0329000
snd_timer 18992 1 snd_pcm, Live 0xffffffffa010e000
drm 173492 3 i915,drm_kms_helper, Live 0xffffffffa0286000
mac80211 202190 2 iwlagn,iwlcore, Live 0xffffffffa02e3000
mmc_core 63886 1 sdhci, Live 0xffffffffa0317000
snd 55132 12 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,thinkpad_acpi,snd_timer, Live 0xffffffffa02c3000
usbcore 134859 4 uvcvideo,btusb,ehci_hcd, Live 0xffffffffa0200000
soundcore 5986 1 snd, Live 0xffffffffa007f000
cfg80211 141484 3 iwlagn,iwlcore,mac80211, Live 0xffffffffa023e000
tpm_tis 7961 0 - Live 0xffffffffa0141000
sg 24917 0 - Live 0xffffffffa01f7000
processor 23608 0 - Live 0xffffffffa01c9000
tpm 11237 1 tpm_tis, Live 0xffffffffa0109000
nvram 5669 1 thinkpad_acpi, Live 0xffffffffa00a9000
tpm_bios 4953 1 tpm, Live 0xffffffffa002e000
firewire_ohci 28073 0 - Live 0xffffffffa01d5000
i2c_algo_bit 5063 1 i915, Live 0xffffffffa0121000
psmouse 52944 0 - Live 0xffffffffa01a5000
firewire_core 47790 1 firewire_ohci, Live 0xffffffffa0128000
intel_agp 10480 1 i915, Live 0xffffffffa0118000
intel_gtt 13943 3 i915,intel_agp, Live 0xffffffffa0100000
intel_ips 10885 0 - Live 0xffffffffa009d000
i2c_i801 7987 0 - Live 0xffffffffa0073000
serio_raw 4222 0 - Live 0xffffffffa002a000
snd_page_alloc 7017 2 snd_hda_intel,snd_pcm, Live 0xffffffffa01c5000
e1000e 137919 0 - Live 0xffffffffa017c000
crc_itu_t 1321 1 firewire_core, Live 0xffffffffa0076000
evdev 9178 9 - Live 0xffffffffa01a0000
pcspkr 1843 0 - Live 0xffffffffa01bf000
iTCO_wdt 11053 0 - Live 0xffffffffa016d000
video 10996 1 i915, Live 0xffffffffa0172000
i2c_core 18740 6 videodev,i915,drm_kms_helper,drm,i2c_algo_bit,i2c_i801, Live 0xffffffffa0161000
rfkill 14810 3 bluetooth,thinkpad_acpi,cfg80211, Live 0xffffffffa014c000
iTCO_vendor_support 1857 1 iTCO_wdt, Live 0xffffffffa015b000
thermal 7631 0 - Live 0xffffffffa0144000
button 4794 1 i915, Live 0xffffffffa0124000
battery 10545 0 - Live 0xffffffffa013c000
ac 3193 0 - Live 0xffffffffa0139000
wmi 8083 0 - Live 0xffffffffa0114000
ext4 333040 2 - Live 0xffffffffa00ac000
mbcache 5649 1 ext4, Live 0xffffffffa00a5000
jbd2 69632 1 ext4, Live 0xffffffffa008a000
crc16 1321 1 ext4, Live 0xffffffffa0084000
sr_mod 14247 0 - Live 0xffffffffa0067000
cdrom 35657 1 sr_mod, Live 0xffffffffa005c000
sd_mod 25668 4 - Live 0xffffffffa0021000
ahci 20441 3 - Live 0xffffffffa0078000
libahci 17966 1 ahci, Live 0xffffffffa006c000
libata 167694 2 ahci,libahci, Live 0xffffffffa0031000
scsi_mod 123250 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffa0000000

please tell me which modules I should keep from that list, so I could disable all the other modules.

disable iwlagn sdhci e1000e intel_ips firewire_ohci ehci_hcd sdhci_pci snd_hda_intel; there might be dependencies though.

Created attachment 60552 [details]
modprobe -l (list of modules in the system)
I tried shutting down the machine about 5 times without the modules mentioned, and the machine shut down successfully every time. Does this mean that there is a bug in one of those modules?

Here is some interesting info: I tried disabling modules in pairs. Here is what I found:

MODULES DISABLED:
iwlagn + intel_ips = hung 4 times during shutdown.
sdhci + intel_ips = hung 2 times during shutdown.
e1000e + intel_ips = shutdown fine 2 times.
firewire_ohci + intel_ips = hung 2 times during shutdown.
ehci_hcd + intel_ips = hung 2 times during shutdown.
sdhci_pci + intel_ips = hung 2 times during shutdown -- shutdown fine 1 time.
snd_hda_intel + intel_ips = hung 2 times during shutdown.

I have a Thinkpad T410 with a Core i7 620M processor using the Intel IGP. I run Arch Linux with a custom kernel (which is essentially the pf kernel compiled with a minimal amount of modules for my system). My system also invariably hangs during reboot/halting with the message "Disabling IRQ #19" under the following procedure: Power on, Suspend (using uswsusp's s2ram), Resume, Halt (shutdown -h now). If I do not suspend/resume, I do not get the error. I haven't done a significant amount of testing, but based on Diego Viola's findings I tried following the exact same process as above while removing only the e1000e module before halting (I did not remove intel_ips). My laptop successfully halted 4/4 times.

so this could be a problem in the drivers. please isolate the issue to a specific driver. Diego, please check whether blacklisting e1000e resolves your issue.

OK so I ran a new test and tried blacklisting the modules one by one (~32 shutdowns in total). Here are the results:

intel_ips (blacklisted): shutdown #1: halted fine. shutdown #2: hangs. shutdown #3: hangs. shutdown #4: hangs.
iwlagn (blacklisted): shutdown #1: hangs. shutdown #2: hangs. shutdown #3: hangs. shutdown #4: hangs.
sdhci (blacklisted): shutdown #1: halted fine. shutdown #2: hangs. shutdown #3: hangs. shutdown #4: halted fine.
e1000e (blacklisted): shutdown #1: halted fine. shutdown #2: halted fine. shutdown #3: halted fine. shutdown #4: halted fine.
firewire_ohci (blacklisted): shutdown #1: hangs. shutdown #2: halted fine. shutdown #3: halted fine. shutdown #4: halted fine.
ehci_hcd (blacklisted): shutdown #1: hangs. shutdown #2: hangs. shutdown #3: hangs. shutdown #4: hangs.
sdhci_pci (blacklisted): shutdown #1: halted fine. shutdown #2: hangs. shutdown #3: hangs. shutdown #4: hangs.
snd_hda_intel (blacklisted): shutdown #1: hangs. shutdown #2: hangs. shutdown #3: hangs. shutdown #4: halted fine.

so yeah, when I blacklisted the e1000e module, I couldn't reproduce the hang, but still, I wouldn't call that a fix. I just can't reproduce the problem when I blacklist this module.

Also, please note that I have laptop-mode-tools running on my system with the default settings. I test with laptop-mode-tools enabled because for some reason having it enabled exposes/triggers this problem more than having it disabled. But the problem is also present with laptop-mode-tools disabled. From my experience, laptop-mode-tools just triggers it more often.

sure, this is not a fix, but it isolates the problem to e1000e. Need help from the e1000e guys here.

I'm running Arch Linux with kernel 2.6.39 on a Thinkpad X61s. My system also hangs on shutdown, but without the "Disabling IRQ" messages. Unloading the e1000e module before shutdown also solves the problem, but I figured out that a simple "ifconfig eth0 down" does the trick, too.

@Shaohua: correct, I'm glad we are making progress! Thanks for your help with this.

OK so more good news and results! :D I've shut down my laptop 50 times (yes 50 times for real!) with e1000e blacklisted, and the machine halted perfectly all 50 times. So I think we found the real culprit here... e1000e. Thanks everyone for your help.

I can still reproduce the issue with "ifconfig eth0 down".
However, removing the e1000e module from the kernel makes the problem go away.

Removing the e1000e module is irrelevant for my case, since my laptop uses a JMicron JMC250 ethernet controller instead of Intel. So there must be a general issue in the e1000e module that exists in other modules too.

Maybe this is a problem with the ACPI subsystem itself and not with the drivers? Since the shutdown issue happens very randomly. Won't a shutdown trace help with this?

What an annoying problem. Jeez.

After some experimenting, I found a workaround for my hardware. Unloading the ehci_hcd module before shutdown ensures that the system shuts down properly.

(In reply to comment #53)
> After some experiment, I found a workaround for my hardware. Unloading the
> ehci_hcd module before shutdown ensures the system to shutdown properly.
so we are still far from root-causing the problem. can you post your lspci, dmesg and interrupt data? can you do the same test mentioned in Comment #23?

Created attachment 60602 [details]
dmesg
dmesg
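The one-by-one blacklist results reported earlier in this thread (eight modules, four shutdowns each) can be tallied to see which blacklisted module never coincided with a hang. The Python sketch below only transcribes those reported outcomes; the function names are made up for this illustration, and four runs per module is too small a sample to prove causation.

```python
# Outcomes transcribed from the one-by-one blacklist test reported above:
# "ok" = halted fine, "hang" = hung on shutdown, in shutdown order #1..#4.
RESULTS = {
    "intel_ips":     ["ok", "hang", "hang", "hang"],
    "iwlagn":        ["hang", "hang", "hang", "hang"],
    "sdhci":         ["ok", "hang", "hang", "ok"],
    "e1000e":        ["ok", "ok", "ok", "ok"],
    "firewire_ohci": ["hang", "ok", "ok", "ok"],
    "ehci_hcd":      ["hang", "hang", "hang", "hang"],
    "sdhci_pci":     ["ok", "hang", "hang", "hang"],
    "snd_hda_intel": ["hang", "hang", "hang", "ok"],
}


def hang_counts(results):
    """Return {module: number of hangs seen with that module blacklisted}."""
    return {mod: runs.count("hang") for mod, runs in results.items()}


def never_hung(results):
    """Return the modules whose blacklisting coincided with zero hangs."""
    return [mod for mod, n in hang_counts(results).items() if n == 0]


if __name__ == "__main__":
    for mod, n in sorted(hang_counts(RESULTS).items(), key=lambda kv: kv[1]):
        print(f"{mod:15s} {n}/{len(RESULTS[mod])} hangs")
```

With the data as reported, e1000e is the only entry with no hangs, which matches the conclusion drawn in the thread.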
Created attachment 60612 [details]
interrupts
Created attachment 60622 [details]
lspci
Created attachment 60632 [details]
failed shutdown picture
I agree with Shaohua on Comment #54. Blacklisting 'e1000e' seems to be just a temporary workaround, and based on the experience of other people it does not seem to be the cause. So please ignore my comments above where I got excited about the workaround. I hope we'll find the root cause and fix it. Thanks everyone for your help. Let me know if I can provide anything else to help with this.

Created attachment 60642 [details]
success shutdown picture
Notice the difference from the failed shutdown picture, this time there are 2 messages from ehci_hcd.
I must use my Ethernet device, so I removed this driver from the blacklist, but my system still hangs from time to time. Are there any Intel engineers who could help us solve this problem once and for all, please?

this driver = e1000e module.

I think I have the same problem but on an AMD desktop system. I have to rmmod cx8800 cx88_dvb cx24116 cx22702 cx8802 cx88_vp3054_i2c cx88_alsa cx88xx to shut down.
https://bugzilla.redhat.com/show_bug.cgi?id=630420

When will this issue be fixed please? It's annoying as hell.

Sorry for my Comment #64. I deeply regret it. Sorry. I will refrain from doing such a thing again.

I'm running arch linux 64 bit, with 2.6.39 kernel. My laptop is an Asus U52F (core i5 460m, ironlake graphics, 4gb ram). On my machine adding this command to my rc.local.shutdown script fixed this: "rmmod ehci_hcd"

(In reply to comment #60)
> Created an attachment (id=60642) [details]
> success shutdown picture
>
> Notice the difference from the failed shutdown picture, this time there are 2
> messages from ehci_hcd.
Looks like your problem is different from Diego's.

In Diego's case, the shutdown hangs when doing device shutdown, while in your case, the shutdown hangs when calling into ACPI shutdown (device shutdown is actually finished).

For my case, using Arch on X61, blacklisting e1000e (which was not used anyway) worked.

(In reply to comment #67)
> (In reply to comment #60)
> > Created an attachment (id=60642) [details]
> > success shutdown picture
> >
> > Notice the difference from the failed shutdown picture, this time there are 2
> > messages from ehci_hcd.
> Looks your problem is different as Diego's.
>
> In Diego's case, the shutdown hangs when doing device shutdown.
> while in your case, the shutdown hangs when calling into ACPI shutdown (device
> shutdown is finished actually)
So my problem is related to the ACPI subsystem. Thanks for pointing it out, and sorry if my posts above are irrelevant to this bug report.
(In reply to comment #69)
> (In reply to comment #67)
> > (In reply to comment #60)
> > > Created an attachment (id=60642) [details]
> > > success shutdown picture
> > >
> > > Notice the difference from the failed shutdown picture, this time there are 2
> > > messages from ehci_hcd.
> > Looks your problem is different as Diego's.
> >
> > In Diego's case, the shutdown hangs when doing device shutdown.
> > while in your case, the shutdown hangs when calling into ACPI shutdown (device
> > shutdown is finished actually)
>
> So my problem is related to the ACPI subsystem, thanks for pointing it out and
> sorry if my posts above are irrelevant to this bug report.
I don't think it's irrelevant. It would be nice to get all those problems fixed, regardless of whether it's the same bug or not.

An update to my comment at #53 about rmmod-ing ehci_hcd before shutdown: if I put it in /etc/rc.local.shutdown (shutdown script), the issue still occurs, but less frequently. It was working for me for 3 days (about 6 poweroffs) and then the issue suddenly occurred again. Now I always manually rmmod ehci_hcd before issuing poweroff, to leave some time between the rmmod and the ACPI shutdown. So far I have 4 clean poweroffs. This issue is likely to occur if I turn on the laptop on battery power with laptop mode tools enabled, then plug in the AC power adapter and power off.

I couldn't get my machine to shutdown yet by having e1000e on blacklist, I have shut down a lot over the past 5 days and it hasn't hung yet since I removed e1000e.

Sorry, I wanted to say that I couldn't get my machine to hang yet with e1000e on the blacklist; it's halting perfectly now that I have removed e1000e. My machine has been halting perfectly for the past 5 days now that I removed/blacklisted the e1000e module.

Please note that when I have the 'e1000e' module loaded, the system would hang/freeze on shutdown and also on reboot.

I am having the same(?) shutdown problem with my EEEPC R051PEM, Atom N550, 2GB RAM:

blacklist rt2800pci
blacklist rt2800lib
blacklist rt2x00usb
blacklist rt2x00pci
blacklist rt2x00lib

has solved my shutdown problem. WLAN still works using rt2860sta. None of the other solutions available on the net (blacklisting e1000e, adding rmmod snd_intel_hda to the shutdown script; ehci_hcd is not in use) has solved the problem. Hope that helps to solve the problem. If any further information is required, please ask with instructions on how to obtain it; I am just a user.

lsb_release -a
LSB Version: core-2.0-ia32:core-2.0-noarch:core-3.0-ia32:core-3.0-noarch:core-3.1-ia32:core-3.1-noarch:core-3.2-ia32:core-3.2-noarch:core-4.0-ia32:core-4.0-noarch
Distributor ID: Ubuntu
Description: Ubuntu 10.10
Release: 10.10
Codename: maverick

uname -a
Linux eeepc01 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 19:00:26 UTC 2011 i686 GNU/Linu

lspci
00:00.0 Host bridge: Intel Corporation N10 Family DMI Bridge (rev 02)
00:02.0 VGA compatible controller: Intel Corporation N10 Family Integrated Graphics Controller (rev 02)
00:02.1 Display controller: Intel Corporation N10 Family Integrated Graphics Controller (rev 02)
00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 02)
00:1c.3 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation NM10 Family LPC Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 02)
01:00.0 Ethernet controller: Atheros Communications AR8132 Fast Ethernet (rev c0)
02:00.0 Network controller: RaLink RT3090 Wireless 802.11n 1T/1R PCIe

I also have the same problem with a new Dell Optiplex 990 workstation; the device has a UEFI BIOS, so that might be a problem.

I played around with some printk statements and found out that it hangs in the napi_disable() function defined in include/linux/netdevice.h, when called from e1000e_down() in drivers/net/e1000e/netdev.c. I hope this is helpful.

(In reply to comment #77)
> I also have the same problem with a new Dell Optiplex 990 workstation, device
> has a UEFI Bios so that might be a problem.
Hallo
We have a lot of Dell computers running Fedora 13 with kernel 2.6.33.3-85.fc13.x86_64 smp. Without having analysed the behaviour in detail, I think that I have observed the following (the computers are for simulations and should run without shutdown):
Optiplex GX620 hang on shutdown,
Optiplex 745, 755, 760, 780 and PowerEdge T710 shut down without problems
hope that helps

Hey;
(In reply to comment #79)
> (In reply to comment #77)
> We are having a lot of Dell computers running Fedora 13 with kernel
> 2.6.33.3-85.fc13.x86_64 smp. Without having analysed the behaviour in detail I
> think that I have observed the following behaviour (the computers are for
> simulations and should run without shutdown):
> Optiplex GX620 hang on shutdown,
> Optiplex 745, 755, 760, 780 and PowerEdge T710 shutdown without problems
>
> hope that helps
AFAIK OptiPlex 990 is the first version to have SandyBridge; I am not sure about UEFI. If you guys need any details let me know.
I tried contacting the e1000-devel mailing list here:
https://lists.sourceforge.net/lists/listinfo/e1000-devel

And Thomas Jarosch replied with the following comment:
http://sourceforge.net/mailarchive/message.php?msg_id=27666243

He suggests that we compile the kernel with "lockup detection" and provide a trace. He also states that "this problem mustn't be related to e1000e at all, maybe unloading the module beforehand just makes the problem less likely to occur."

I have the problem reported here on my Lenovo X60 running Arch Linux. `rmmod e1000e` before shutdown works for me. It's the only workaround I know of.

I can also verify here that running kernel 2.6.38 on mageia on an identical laptop is fine. verified 2.6.39.2 on Arch linux

I do confirm that rmmod e1000e fixes the shutdown problem. I'm using an HP 8510w with the Arch Linux 2.6.39 kernel.

re: comment #78
> it hangs in the napi_disable() function
> defined in include/linux/netdevice.h,
> when called from e1000e_down() in drivers/net/e1000e/netdev.c.

/**
 * napi_disable - prevent NAPI from scheduling
 * @n: napi context
 *
 * Stop NAPI from being scheduled on this context.
 * Waits till any outstanding processing completes.
 */
static inline void napi_disable(struct napi_struct *n)
{
	set_bit(NAPI_STATE_DISABLE, &n->state);
	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
		msleep(1);
	clear_bit(NAPI_STATE_DISABLE, &n->state);
}

I don't think lockup detection will flag this while loop.

void e1000e_down(struct e1000_adapter *adapter)
...
	/* disable transmits in the hardware */
	tctl = er32(TCTL);
	tctl &= ~E1000_TCTL_EN;
	ew32(TCTL, tctl);
	/* flush both disables and wait for them to finish */
	e1e_flush();
	usleep_range(10000, 20000);

	napi_disable(&adapter->napi);
	e1000_irq_disable(adapter);

	del_timer_sync(&adapter->watchdog_timer);
	del_timer_sync(&adapter->phy_info_timer);

BTW. I always suspect magic numbers like usleep_range above. What if you increase them both, say by a factor of 4?
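The napi_disable() loop quoted above keeps sleeping until it can take NAPI_STATE_SCHED itself, so it only returns once whoever currently holds that bit clears it. The Python sketch below is a toy model of that logic, not kernel code: the state is a plain set of flag names, `poll_done_at` is a hypothetical knob standing in for "the outstanding poll completes after this many 1 ms sleeps", and `max_sleeps` exists only so the model can report a hang instead of looping forever. It makes no claim about why the bit would stay set on the affected machines.

```python
def napi_disable(state, poll_done_at=None, max_sleeps=100):
    """Model of the quoted napi_disable() wait loop.

    state        -- set of flag names, e.g. {"SCHED"} if a poll is outstanding
    poll_done_at -- sleeps after which the model clears SCHED (None = never)
    max_sleeps   -- give up after this many sleeps and report a hang

    Returns the number of 1 ms sleeps taken, or None if the loop would
    still be spinning after max_sleeps.
    """
    state.add("DISABLE")                      # set_bit(NAPI_STATE_DISABLE)
    sleeps = 0
    while True:
        # Model of the outstanding poll finishing and releasing SCHED.
        if poll_done_at is not None and sleeps >= poll_done_at:
            state.discard("SCHED")
        was_set = "SCHED" in state            # test_and_set_bit(NAPI_STATE_SCHED)
        state.add("SCHED")
        if not was_set:
            break                             # we now own SCHED; loop ends
        if sleeps >= max_sleeps:
            return None                       # nothing ever cleared SCHED
        sleeps += 1                           # msleep(1)
    state.discard("DISABLE")                  # clear_bit(NAPI_STATE_DISABLE)
    return sleeps
```

In this model, an idle context returns immediately, a context whose poll completes returns after a bounded wait, and a context whose SCHED bit is never released never returns. Because each iteration sleeps rather than busy-waits, that last case looks like an indefinite sleep, which is consistent with the remark above that lockup detection may not flag it.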
Also, e1000e_down is called when unloading the driver before shutdown and that works -- so what is special about when it is called in the shutdown path that makes it fail?

Re: "Disabling IRQ #19" message
On Diego's box, ips and EHCI are on IRQ 19. This message indicates that they shut down properly. The real problem, of course, is e1000e that follows, and it is not on IRQ 19, but is on IRQ 20/MSI 41:

[ 6.189628] e1000e: Intel(R) PRO/1000 Network Driver - 1.2.20-k2
[ 6.189633] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[ 6.189687] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 6.189706] e1000e 0000:00:19.0: setting latency timer to 64
[ 6.190059] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X

FWIW I can at least reboot with the reboot=pci kernel parameter; I didn't try shutdown though. This is with the 3.0 kernel.

Ismail, Please open a new bug report, unless your machine behaves exactly like the original reporter's (Diego).

Diego, I'd be interested to know if booting with "maxcpus=1" makes any difference.

Created attachment 67282 [details]
ThinkPad T510 shutdown with maxcpus=1
ThinkPad T510 shutdown with maxcpus=1
Len, I tried booting with "maxcpus=1" but I get the same problem; the system still hangs on shutdown unless I do a "rmmod e1000e" before shutdown. See the last image attached.

I have the same problem with my Toshiba Tecra M3 laptop when I close the laptop lid.

Same here on a Thinkpad T60; just add modprobe -r e1000e to /etc/rc.local.shutdown on 2.6.39-ARCH, otherwise it freezes/hangs on shutdown/reboot.

I think I reproduced this on my X60, after much fussing with arch's peculiarities. I believe the issue is because the device is in D3 due to laptop-mode-tools and no ethernet link. kernel 3.0.0 (arch)

I believe the issue (at least tied to e1000e) is strongly related to the runtime power management in the kernel. I also have a T60 with a similar chipset that we can try to repro on, but I think the driver wake code that makes sure the D3->D0 device power management transition is done correctly before shutdown is where I'm going to investigate if I have time. I do not have access to a T510 but don't believe we need one.

Jesse, Thanks for your reply. Just providing the steps to reproduce the problem:
1- install Arch Linux 64-bit and $(pacman -Syu laptop-mode-tools).
2- enable "laptop-mode" in /etc/rc.conf (DAEMONS array).
3- make sure the "e1000e" kernel module is loaded.
4- reboot, then shut down.
At this point the system should crash.

I have a T510, so let me know if you can provide us with a patch so we can try to reproduce the issue, verify that the problem is fixed and report back. Thank you.

I have a ThinkPad X220i and ArchLinux x86_64. I don't use laptop-mode-tools.
$ lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
When I enable runtime power management for the ethernet card
echo auto > /sys/devices/pci0000:00/0000:00:19.0/power/control
the notebook hangs during reboot or shutdown. Adding "rmmod e1000e" to rc.local.shutdown solved this problem for me.

Has there been any progress with this?

I'm glad bugzilla is back.
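Several comments above tie the hang to runtime power management of the Ethernet device, which the kernel exposes through the per-device sysfs file power/control ("auto" allows runtime suspend, "on" keeps the device powered). As a sketch of a possible alternative to unloading the driver, the Python helpers below read that file and switch a device back to "on". The helper names are made up for this illustration, and whether forcing "on" before shutdown avoids the hang on the affected kernels is an assumption that this thread does not confirm.

```python
from pathlib import Path

VALID_MODES = ("auto", "on")  # values accepted by the sysfs power/control file


def get_runtime_pm(device_dir):
    """Return the current runtime PM mode of the device at device_dir."""
    return (Path(device_dir) / "power" / "control").read_text().strip()


def set_runtime_pm(device_dir, mode):
    """Write `mode` to the device's power/control file (needs root on sysfs)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    (Path(device_dir) / "power" / "control").write_text(mode + "\n")
```

For the X220i described above, the device directory would be /sys/devices/pci0000:00/0000:00:19.0, so `set_runtime_pm("/sys/devices/pci0000:00/0000:00:19.0", "on")` would undo the `echo auto` from that comment.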
I haven't had this issue in a while. I'm not sure which kernel update ended up fixing it, but I'm currently on 3.1.x and it shuts down fine on a fresh install of arch without having to use any workarounds.

Hi, Diego, can you please verify if the problem still exists in the latest upstream kernel?

I still get the same problem with kernel 3.2.1 (Arch Linux x86_64). I get the issue when laptop-mode-tools (laptop-mode) is active, and when the module "e1000e" is loaded. If I blacklist e1000e and disable laptop-mode all is fine. This is the output I get when the system hangs at shutdown (withe e1000e/laptop-mode is loaded):
http://dl.dropbox.com/u/6005119/kernel/P1140657.JPG

s/withe/when/

Apparently I have another issue now also, with those "udev timeout: killing" lines. It seems to be related to TPM.
http://dl.dropbox.com/u/6005119/kernel/P1140654.JPG
https://bbs.archlinux.org/viewtopic.php?id=130228
Any ideas?

I think I have the same problem that Diego describes in this bug. I'm using Debian Wheezy, kernel 3.1.0 on a Thinkpad T420. Unloading the ehci_hcd module fixes the problem; I have been using that fix for a couple of weeks and my laptop shuts down OK. I hope someone can get to the root of this problem. Cheers!

lautriv from ##linux (freenode) provided some insight about this issue. here's the log:
10:55 < lautriv> diegoviola, that issue relates to WakeOnLan you can't remove the IRQ without breaking wakeup stuff.
10:57 < lautriv> diegoviola, i did just read the headlines from you link and remembered that case on my diskless boxes.
11:01 < lautriv> diegoviola, that is because the module claims resources. if i remember right, that did not happen if built-in but that way was unable to use WoL
11:03 < lautriv> diegoviola, you may try to recompile with e1000 built-in or simply tell your shutdown to call a script ( K99removenic or something else)
11:07 < lautriv> diegoviola, it's bad behaviour but the module does it right. still holding resorces while in use.
can anyone please verify if this is a duplicate of bug #36132?

(In reply to comment #104)
> can anyone please verify if this is a duplicate of bug #36132?
I can confirm that it seems to be the same issue.

Okay. As this problem is caused by the e1000e runtime suspend, let's move discussion to bug #36132.

*** This bug has been marked as a duplicate of bug 36132 ***

It appears this issue was fixed in 3.3.1 (or possible the first release of 3.3), I've tested 3.3.2 and it's fine too. When I say that it's fine, I mean that I can't reproduce the hang anymore. Shutdown/reboot works as expected now. Please give it a try and share your results in Bug 36132. Thanks.

s/possible/possibly/ |