Bug 33872 - e1000e runtime suspend breaks shutdown - ThinkPad T510, X60, X61s, X201
Status: CLOSED DUPLICATE of bug 36132
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Off
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-23 14:04 UTC by Diego Viola
Modified: 2012-06-05 03:07 UTC
CC List: 42 users

See Also:
Kernel Version: 2.6.38-ARCH
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Shutdown crash (593.09 KB, image/jpeg)
2011-04-23 14:07 UTC, Diego Viola
Shutdown Crash 2 (539.84 KB, image/jpeg)
2011-04-23 14:17 UTC, Diego Viola
BIOS IRQ Config (861.77 KB, image/jpeg)
2011-04-23 14:18 UTC, Diego Viola
cpuinfo (3.29 KB, text/plain)
2011-04-23 14:24 UTC, Diego Viola
dmesg (61.35 KB, text/plain)
2011-04-23 14:25 UTC, Diego Viola
interrupts (1.97 KB, text/plain)
2011-04-23 14:26 UTC, Diego Viola
uname (141 bytes, text/plain)
2011-04-23 14:43 UTC, Diego Viola
lspci (2.44 KB, text/plain)
2011-04-23 14:53 UTC, Diego Viola
BIOS Overview (954.22 KB, image/jpeg)
2011-04-23 15:03 UTC, Diego Viola
BIOS Display Settings (845.46 KB, image/jpeg)
2011-04-23 15:04 UTC, Diego Viola
archlinux rc.conf (3.36 KB, text/plain)
2011-04-23 15:37 UTC, Diego Viola
laptop-mode.conf (11.14 KB, text/plain)
2011-04-23 15:38 UTC, Diego Viola
shutdown_initcall_debug.jpg (908.08 KB, image/jpeg)
2011-06-02 06:20 UTC, Diego Viola
modprobe -l (list of modules in the system) (110.52 KB, text/plain)
2011-06-02 08:56 UTC, Diego Viola
dmesg (78.44 KB, text/plain)
2011-06-03 08:52 UTC, Agustianes Suwardi
interrupts (2.31 KB, text/plain)
2011-06-03 08:56 UTC, Agustianes Suwardi
lspci (2.43 KB, text/plain)
2011-06-03 08:57 UTC, Agustianes Suwardi
failed shutdown picture (488.08 KB, image/jpeg)
2011-06-03 09:02 UTC, Agustianes Suwardi
success shutdown picture (471.02 KB, image/jpeg)
2011-06-03 09:08 UTC, Agustianes Suwardi
ThinkPad T510 shutdown with maxcpus=1 (974.45 KB, image/jpeg)
2011-07-31 18:03 UTC, Diego Viola

Description Diego Viola 2011-04-23 14:04:14 UTC
My System:

* ThinkPad T510 (model 4313CTO)
* Arch Linux x86-64
* Linux 2.6.38-ARCH
* Intel Core i7-620M Processor (2.66GHz, 4MB L3, 1066MHz FSB)
* 4 GB PC3-10600 DDR3 SDRAM 1333MHz SODIMM Memory (1 DIMM)

Steps to Reproduce:

* Install Arch Linux x86-64
* Install laptop-mode-tools and enable it in DAEMONS in /etc/rc.conf
* Reboot and then shut down.

Actual Results:

Whenever I shut down the machine with laptop-mode enabled in rc.conf, the system locks up with "Disabling IRQ #19" and similar messages. I thought this was a problem with my hardware until I discovered that other people on the Arch Linux forums are having the same problem. See here: https://bbs.archlinux.org/viewtopic.php?id=113985

People on the Arch Linux forums have reported this problem to the laptop-mode-tools maintainer, but the maintainer says that this is a kernel bug. See here: http://mailman.samwel.tk/pipermail/laptop-mode/2011-April/000433.html

I believe this is a kernel bug, since I was able to reproduce the issue without laptop-mode-tools at all, simply by changing the CPU governor to ondemand and then shutting down. Doing that produced the same hang.

I have tried upgrading the BIOS on my ThinkPad from version 1.41 to 1.43, with no success. See the logs and pictures attached to this report.

I have laptop-mode-tools enabled with the default modules; no additional configuration has been made. I also noticed that the discussion here looks related to this problem: https://lkml.org/lkml/2011/4/4/121

Expected Results:

System shuts down fine.
Comment 1 Diego Viola 2011-04-23 14:07:58 UTC
Created attachment 55132 [details]
Shutdown crash
Comment 2 Diego Viola 2011-04-23 14:17:00 UTC
Created attachment 55142 [details]
Shutdown Crash 2
Comment 3 Diego Viola 2011-04-23 14:18:05 UTC
Created attachment 55152 [details]
BIOS IRQ Config
Comment 4 Diego Viola 2011-04-23 14:24:56 UTC
Created attachment 55162 [details]
cpuinfo
Comment 5 Diego Viola 2011-04-23 14:25:34 UTC
Created attachment 55172 [details]
dmesg
Comment 6 Diego Viola 2011-04-23 14:26:45 UTC
Created attachment 55182 [details]
interrupts
Comment 7 Diego Viola 2011-04-23 14:43:24 UTC
Created attachment 55192 [details]
uname
Comment 8 Diego Viola 2011-04-23 14:53:59 UTC
Created attachment 55202 [details]
lspci
Comment 9 Diego Viola 2011-04-23 15:03:06 UTC
Created attachment 55212 [details]
BIOS Overview
Comment 10 Diego Viola 2011-04-23 15:04:04 UTC
Created attachment 55222 [details]
BIOS Display Settings
Comment 11 Diego Viola 2011-04-23 15:37:56 UTC
Created attachment 55232 [details]
archlinux rc.conf
Comment 12 Diego Viola 2011-04-23 15:38:24 UTC
Created attachment 55242 [details]
laptop-mode.conf
Comment 13 Diego Viola 2011-04-23 15:45:44 UTC
Please see the 'Shutdown Crash 2' image; this is where the system crashes on shutdown and the "Disabling IRQ #19" messages appear.
Comment 14 Diego Viola 2011-04-23 15:51:54 UTC
[diego@myhost ~]$ pacman -Q kernel26
kernel26 2.6.38.3-1
[diego@myhost ~]$
Comment 15 Diego Viola 2011-04-23 15:57:04 UTC
I should also mention that I tried changing the PCI IRQs from 11 to "Auto", but that made no difference; the kernel still crashes on shutdown.
Comment 16 Diego Viola 2011-04-23 18:19:17 UTC
I can reproduce this every time if I have laptop-mode enabled in my init scripts; if I take it out, the system shuts down just fine.

But I think it's an issue with the kernel, probably with CPU frequency scaling, since I was able to reproduce the problem without laptop-mode at all, simply by lowering the CPU frequencies.

With laptop-mode it always happens, so I have disabled that service for now.
Comment 17 Diego Viola 2011-05-04 21:56:16 UTC
Any ideas about this issue and when it will be fixed please?
Comment 18 Diego Viola 2011-05-26 16:34:19 UTC
This still seems to be an issue with 2.6.39, using the vanilla kernel from kernel.org, as reported here: https://bbs.archlinux.org/viewtopic.php?pid=939232#p939232
Comment 19 Len Brown 2011-06-01 04:36:09 UTC
does this problem go away if you boot with "intel_idle.max_cstate=0"
to disable the intel_idle driver?

Any change if you exclude the "ips" driver from the kernel?
Comment 20 Agustianes Suwardi 2011-06-01 05:29:55 UTC
(In reply to comment #19)
> does this problem go away if you boot with "intel_idle.max_cstate=0"
> to disable the intel_idle driver?
> 
> Any change if you exclude the "ips" driver from the kernel?

I tried this in Archlinux with kernel 2.6.38.6-2:

A. with intel_idle.max_cstate:
   * boot with intel_idle.max_cstate=0 in /etc/sysctl.conf
   * enable laptop-mode
   * pm-suspend
   * resume
   * poweroff
   * --------- hang

B. with intel_idle.max_cstate and ips blacklisted:
   * boot with added 'blacklist ips' in /etc/modprobe.d/modprobe.conf
     and intel_idle.max_cstate=0 in /etc/sysctl.conf
   * enable laptop-mode
   * pm-suspend
   * resume
   * poweroff
   * --------- hang

My hardware is Asus A42JC:
Intel Core i3 M370 2.40 GHz
2 GB DDR3 RAM
NVidia GeForce 310M with Optimus

I also noticed that my laptop runs about 5 degrees Celsius warmer than when I run Windows 7.
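Note that intel_idle.max_cstate is a kernel boot parameter, read from the command line the bootloader passes to the kernel, whereas /etc/sysctl.conf only sets values under /proc/sys. Tests A and B above may therefore not have applied the setting at all. A minimal sketch of a check that a parameter really reached the running kernel follows; the function name cmdline_has is made up for illustration.

```sh
# Return success if the kernel command line contains the given parameter.
# $1: parameter to look for, e.g. intel_idle.max_cstate=0
# $2: command line text (optional; defaults to the running kernel's /proc/cmdline)
cmdline_has() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    # Pad with spaces so only whole, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After a reboot, something like `cmdline_has intel_idle.max_cstate=0 && echo applied` would show whether the option was picked up. On kernels that expose it, /sys/module/intel_idle/parameters/max_cstate should also show the value in effect.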
Comment 21 Diego Viola 2011-06-01 06:09:14 UTC
My machine also hangs with "Disabling IRQ #19" when booting the kernel with: intel_idle.max_cstate=0
Comment 22 Diego Viola 2011-06-01 07:27:50 UTC
I tried excluding the ips/intel_ips driver from the kernel as well but I get the same issue "Disabling IRQ #19".
Comment 23 Shaohua 2011-06-02 01:08:02 UTC
>intel_idle.max_cstate=0 causes hang
so this issue exists in both intel_idle and acpi_idle, can you try intel_idle.max_cstate=1 or 2, 3?

also please add kernel parameter 'initcall_debug' but not 'intel_idle.max_cstate=x', do 'echo 8 > /proc/sys/kernel/printk' before shutdown, try to reproduce the issue and attach picture here.
Comment 24 Diego Viola 2011-06-02 06:16:24 UTC
I tried booting the kernel with intel_idle.max_cstate=0, 1, 2, 3, and my system hung every time on shutdown with "Disabling IRQ #19".

Please see the attached image; it was taken when I shut down with 'initcall_debug' and 'echo 8 > /proc/sys/kernel/printk'.

The image file name is 'shutdown_initcall_debug.jpg'.

Please let me know if you need anything else.
Comment 25 Shaohua 2011-06-02 06:19:20 UTC
i should say try intel_idle.max_cstate=0 or intel_idle.max_cstate=1, or intel_idle.max_cstate=2 ...

didn't see the picture, forgot to attach it?
Comment 26 Diego Viola 2011-06-02 06:20:47 UTC
Created attachment 60532 [details]
shutdown_initcall_debug.jpg
Comment 27 Diego Viola 2011-06-02 06:21:12 UTC
Attached it.
Comment 28 Diego Viola 2011-06-02 06:23:48 UTC
In reply to comment #25.

I tried with:

intel_idle.max_cstate=0
intel_idle.max_cstate=1
intel_idle.max_cstate=2
intel_idle.max_cstate=3

But it didn't make any difference. The kernel/system hung every time during shutdown with "Disabling IRQ #19".
Comment 29 Shaohua 2011-06-02 06:54:36 UTC
how about kernel parameter 'pci=nomsi'?
Comment 30 Diego Viola 2011-06-02 07:33:23 UTC
Kernel still hangs with 'pci=nomsi', with:

"Disabling IRQ #17"
"Disabling IRQ #19"
Comment 31 Shaohua 2011-06-02 08:12:24 UTC
try disabling as many drivers as possible (maybe just leave harddisk and keyboard) and see if you can still reproduce the issue.
Comment 32 Diego Viola 2011-06-02 08:18:08 UTC
Should I recompile with the latest kernel from kernel.org for that, or will blacklisting modules do?
Comment 33 Shaohua 2011-06-02 08:21:44 UTC
maybe blacklisting some modules is enough if all drivers are modules, like usb, iwlagn, hda_intel, ips, firewire_ohci, sdhci.
Comment 34 Diego Viola 2011-06-02 08:48:40 UTC
can you please tell me which modules I should disable from this list, this is my /proc/modules:

aesni_intel 45657 1 - Live 0xffffffffa0539000
cryptd 7661 1 aesni_intel, Live 0xffffffffa0531000
aes_x86_64 7436 1 aesni_intel, Live 0xffffffffa052a000
aes_generic 26066 2 aesni_intel,aes_x86_64, Live 0xffffffffa051d000
ipv6 277189 14 - Live 0xffffffffa04c5000
uvcvideo 60799 0 - Live 0xffffffffa04af000
btusb 11185 0 - Live 0xffffffffa01d1000
videodev 65175 1 uvcvideo, Live 0xffffffffa048c000
snd_hda_codec_hdmi 22378 1 - Live 0xffffffffa049e000
bluetooth 55409 1 btusb, Live 0xffffffffa02d3000
snd_hda_codec_conexant 41714 1 - Live 0xffffffffa0277000
v4l2_compat_ioctl32 6716 1 videodev, Live 0xffffffffa0136000
joydev 9799 0 - Live 0xffffffffa0168000
snd_hda_intel 21738 2 - Live 0xffffffffa0152000
arc4 1402 2 - Live 0xffffffffa0106000
ecb 2033 2 - Live 0xffffffffa0087000
iwlagn 385983 0 - Live 0xffffffffa040d000
snd_hda_codec 73739 3 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel, Live 0xffffffffa03f8000
i915 629309 7 - Live 0xffffffffa0348000
snd_hwdep 6134 1 snd_hda_codec, Live 0xffffffffa00a2000
snd_pcm 71032 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec, Live 0xffffffffa0263000
drm_kms_helper 26624 1 i915, Live 0xffffffffa01b4000
sdhci_pci 8202 0 - Live 0xffffffffa011d000
thinkpad_acpi 59799 0 - Live 0xffffffffa01de000
iwlcore 103302 1 iwlagn, Live 0xffffffffa0223000
sdhci 17061 1 sdhci_pci, Live 0xffffffffa0341000
ehci_hcd 39209 0 - Live 0xffffffffa0329000
snd_timer 18992 1 snd_pcm, Live 0xffffffffa010e000
drm 173492 3 i915,drm_kms_helper, Live 0xffffffffa0286000
mac80211 202190 2 iwlagn,iwlcore, Live 0xffffffffa02e3000
mmc_core 63886 1 sdhci, Live 0xffffffffa0317000
snd 55132 12 snd_hda_codec_hdmi,snd_hda_codec_conexant,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,thinkpad_acpi,snd_timer, Live 0xffffffffa02c3000
usbcore 134859 4 uvcvideo,btusb,ehci_hcd, Live 0xffffffffa0200000
soundcore 5986 1 snd, Live 0xffffffffa007f000
cfg80211 141484 3 iwlagn,iwlcore,mac80211, Live 0xffffffffa023e000
tpm_tis 7961 0 - Live 0xffffffffa0141000
sg 24917 0 - Live 0xffffffffa01f7000
processor 23608 0 - Live 0xffffffffa01c9000
tpm 11237 1 tpm_tis, Live 0xffffffffa0109000
nvram 5669 1 thinkpad_acpi, Live 0xffffffffa00a9000
tpm_bios 4953 1 tpm, Live 0xffffffffa002e000
firewire_ohci 28073 0 - Live 0xffffffffa01d5000
i2c_algo_bit 5063 1 i915, Live 0xffffffffa0121000
psmouse 52944 0 - Live 0xffffffffa01a5000
firewire_core 47790 1 firewire_ohci, Live 0xffffffffa0128000
intel_agp 10480 1 i915, Live 0xffffffffa0118000
intel_gtt 13943 3 i915,intel_agp, Live 0xffffffffa0100000
intel_ips 10885 0 - Live 0xffffffffa009d000
i2c_i801 7987 0 - Live 0xffffffffa0073000
serio_raw 4222 0 - Live 0xffffffffa002a000
snd_page_alloc 7017 2 snd_hda_intel,snd_pcm, Live 0xffffffffa01c5000
e1000e 137919 0 - Live 0xffffffffa017c000
crc_itu_t 1321 1 firewire_core, Live 0xffffffffa0076000
evdev 9178 9 - Live 0xffffffffa01a0000
pcspkr 1843 0 - Live 0xffffffffa01bf000
iTCO_wdt 11053 0 - Live 0xffffffffa016d000
video 10996 1 i915, Live 0xffffffffa0172000
i2c_core 18740 6 videodev,i915,drm_kms_helper,drm,i2c_algo_bit,i2c_i801, Live 0xffffffffa0161000
rfkill 14810 3 bluetooth,thinkpad_acpi,cfg80211, Live 0xffffffffa014c000
iTCO_vendor_support 1857 1 iTCO_wdt, Live 0xffffffffa015b000
thermal 7631 0 - Live 0xffffffffa0144000
button 4794 1 i915, Live 0xffffffffa0124000
battery 10545 0 - Live 0xffffffffa013c000
ac 3193 0 - Live 0xffffffffa0139000
wmi 8083 0 - Live 0xffffffffa0114000
ext4 333040 2 - Live 0xffffffffa00ac000
mbcache 5649 1 ext4, Live 0xffffffffa00a5000
jbd2 69632 1 ext4, Live 0xffffffffa008a000
crc16 1321 1 ext4, Live 0xffffffffa0084000
sr_mod 14247 0 - Live 0xffffffffa0067000
cdrom 35657 1 sr_mod, Live 0xffffffffa005c000
sd_mod 25668 4 - Live 0xffffffffa0021000
ahci 20441 3 - Live 0xffffffffa0078000
libahci 17966 1 ahci, Live 0xffffffffa006c000
libata 167694 2 ahci,libahci, Live 0xffffffffa0031000
scsi_mod 123250 4 sg,sr_mod,sd_mod,libata, Live 0xffffffffa0000000
Comment 35 Diego Viola 2011-06-02 08:49:17 UTC
please tell me which modules I should keep from that list, so I could disable all the other modules.
Comment 36 Shaohua 2011-06-02 08:55:26 UTC
try disabling iwlagn sdhci e1000e intel_ips firewire_ohci ehci_hcd sdhci_pci snd_hda_intel; there might be dependencies though.
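On the dependency point: in /proc/modules the fourth column lists the modules that currently use the module named in the first column, so those have to be unloaded first. A small sketch that reads /proc/modules-style text on stdin and prints the users of one module; the function name module_users is hypothetical.

```sh
# Print, one per line, the modules that use module $1.
# Input: /proc/modules-format text on stdin
# (name size refcount users state address; users is "-" or "a,b,").
module_users() {
    awk -v m="$1" '
        $1 == m && $4 != "-" {
            n = split($4, users, ",")
            for (i = 1; i <= n; i++)
                if (users[i] != "") print users[i]
        }
    '
}
```

Run as `module_users sdhci < /proc/modules`. With the list from comment #34, sdhci is used by sdhci_pci, so sdhci_pci would have to be removed before sdhci.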
Comment 37 Diego Viola 2011-06-02 08:56:33 UTC
Created attachment 60552 [details]
modprobe -l (list of modules in the system)
Comment 38 Diego Viola 2011-06-02 09:25:34 UTC
I tried shutting down the machine about 5 times without the modules mentioned and the machine shut down successfully every time.
Comment 39 Diego Viola 2011-06-02 09:29:45 UTC
Does this mean that there is a bug in one of those modules?
Comment 40 Diego Viola 2011-06-02 10:42:44 UTC
Here is some interesting info: I tried disabling modules in pairs. Here is what I found:

MODULES DISABLED:

iwlagn + intel_ips = hung 4 times during shutdown.
sdhci + intel_ips = hung 2 times during shutdown.
e1000e + intel_ips = shut down fine 2 times.
firewire_ohci + intel_ips = hung 2 times during shutdown.
ehci_hcd + intel_ips = hung 2 times during shutdown.
sdhci_pci + intel_ips = hung 2 times during shutdown -- shut down fine 1 time.
snd_hda_intel + intel_ips = hung 2 times during shutdown.
Comment 41 Ryan Rosenbaum 2011-06-02 20:44:18 UTC
I have a Thinkpad T410 with a Core i7 620M processor using the Intel IGP. I run Arch Linux with a custom kernel (which is essentially the pf kernel compiled with a minimal amount of modules for my system).

My system also invariably hangs during reboot/halting with the message "Disabling IRQ #19" under the following procedure:

Power on, Suspend (using uswsusp's s2ram), Resume, Halt (shutdown -h now)

If I do not suspend/resume, I do not get the error.

I haven't done a significant amount of testing, but based on Diego Viola's findings I tried following the exact same process as above while removing only the e1000e module before halting (I did not remove intel_ips).  My laptop successfully halted 4/4 times.
Comment 42 Shaohua 2011-06-03 00:28:06 UTC
so this could be a problem in the drivers. please isolate the issue to a specific driver.
Diego, please check if blacklisting e1000e resolves your issue.
Comment 43 Diego Viola 2011-06-03 01:53:58 UTC
OK so I ran a new test and tried blacklisting the modules one by one (~32 shutdowns in total.)

Here are the results:

intel_ips (blacklisted)

shutdown #1: halted fine.
shutdown #2: hangs.
shutdown #3: hangs.
shutdown #4: hangs.

iwlagn (blacklisted)

shutdown #1: hangs.
shutdown #2: hangs.
shutdown #3: hangs.
shutdown #4: hangs.

sdhci (blacklisted)

shutdown #1: halted fine.
shutdown #2: hangs.
shutdown #3: hangs.
shutdown #4: halted fine.

e1000e (blacklisted)

shutdown #1: halted fine.
shutdown #2: halted fine.
shutdown #3: halted fine.
shutdown #4: halted fine.

firewire_ohci (blacklisted)

shutdown #1: hangs.
shutdown #2: halted fine.
shutdown #3: halted fine.
shutdown #4: halted fine.

ehci_hcd (blacklisted)

shutdown #1: hangs.
shutdown #2: hangs.
shutdown #3: hangs.
shutdown #4: hangs.

sdhci_pci (blacklisted)

shutdown #1: halted fine.
shutdown #2: hangs.
shutdown #3: hangs.
shutdown #4: hangs.

snd_hda_intel (blacklisted)

shutdown #1: hangs.
shutdown #2: hangs.
shutdown #3: hangs.
shutdown #4: halted fine.
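The per-module results above are easier to compare as counts. A minimal sketch that tallies a log written in the same layout as this comment (a "<module> (blacklisted)" line followed by "shutdown #N: hangs." or "shutdown #N: halted fine." lines); the function name tally_shutdowns and the output format are made up for illustration.

```sh
# Summarise shutdown results per blacklisted module.
# Input on stdin: "<module> (blacklisted)" headers, each followed by
# result lines ending in ": hangs." or ": halted fine.".
tally_shutdowns() {
    awk '
        function emit() {
            if (mod != "") printf "%s hangs=%d fine=%d\n", mod, hangs, fine
        }
        / \(blacklisted\)$/ { emit(); mod = $1; hangs = 0; fine = 0; next }
        /: hangs\.$/        { hangs++ }
        /: halted fine\.$/  { fine++ }
        END { emit() }
    '
}
```

In the figures reported above, e1000e is the only module whose blacklisting gave four clean halts out of four; every other module shows at least one hang.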
Comment 44 Diego Viola 2011-06-03 01:58:33 UTC
So yes, when I blacklisted the e1000e module, I couldn't reproduce the hang, but I still wouldn't call that a fix. I just can't reproduce the problem when this module is blacklisted.
Comment 45 Diego Viola 2011-06-03 02:02:54 UTC
Also, please note that I have laptop-mode-tools running on my system with the default settings. I test this with laptop-mode-tools enabled because for some reason having it enabled exposes/triggers this problem more than having it disabled.

But this problem is also present when disabling laptop-mode-tools. From my experience, laptop-mode-tools just triggers this problem more.
Comment 46 Shaohua 2011-06-03 02:04:56 UTC
sure, this is not a fix, it just isolates the problem to e1000e. Need help from the e1000e guys here.
Comment 47 Mario Prausa 2011-06-03 02:21:06 UTC
I'm running Arch Linux with kernel 2.6.39 on a Thinkpad X61s. My system also hangs on shutdown, but without the "Disabling IRQ" messages.

Unloading the e1000e module before shutdown also solves the problem, but I figured out that a simple "ifconfig eth0 down" does the trick, too.
Comment 48 Diego Viola 2011-06-03 03:14:43 UTC
@Shaohua: correct, I'm glad we are making progress!

Thanks for your help with this.
Comment 49 Diego Viola 2011-06-03 04:01:09 UTC
OK so more good news and results! :D

I've shut down my laptop 50 times (yes, 50 times for real!) with e1000e blacklisted, and the machine halted perfectly all 50 times.

So I think we found the real culprit here... e1000e.

Thanks everyone for your help.
Comment 50 Diego Viola 2011-06-03 04:06:07 UTC
I can still reproduce the issue with "ifconfig eth0 down".

However, removing the e1000e module from the kernel makes the problem go away.
Comment 51 Agustianes Suwardi 2011-06-03 06:10:03 UTC
Removing the e1000e module is irrelevant in my case, since my laptop uses a JMicron JMC250 ethernet controller instead of an Intel one. So there must be a general issue in the e1000e module that exists in other modules too.
Comment 52 Diego Viola 2011-06-03 06:21:48 UTC
Maybe this is a problem with the ACPI subsystem itself and not with the drivers? Since the shutdown issue happens very randomly.

Won't a shutdown trace help with this?

What an annoying problem. Jeez.
Comment 53 Agustianes Suwardi 2011-06-03 06:54:09 UTC
After some experimenting, I found a workaround for my hardware. Unloading the ehci_hcd module before shutdown ensures that the system shuts down properly.
Comment 54 Shaohua 2011-06-03 08:05:03 UTC
(In reply to comment #53)
> After some experiment, I found a workaround for my hardware. Unloading the
> ehci_hcd module before shutdown ensures the system to shutdown properly.
so we are still far from finding the root cause of the problem.

can you post your lspci, dmesg and interrupt data?

can you do the same test mentioned in Comment #23?
Comment 55 Agustianes Suwardi 2011-06-03 08:52:55 UTC
Created attachment 60602 [details]
dmesg

dmesg
Comment 56 Agustianes Suwardi 2011-06-03 08:56:13 UTC
Created attachment 60612 [details]
interrupts
Comment 57 Agustianes Suwardi 2011-06-03 08:57:54 UTC
Created attachment 60622 [details]
lspci
Comment 58 Agustianes Suwardi 2011-06-03 09:02:54 UTC
Created attachment 60632 [details]
failed shutdown picture
Comment 59 Diego Viola 2011-06-03 09:07:06 UTC
I agree with Shaohua on Comment #54.

Blacklisting 'e1000e' seems to be just a temporary workaround, and based on other people's experience it does not seem to be the cause. So please ignore my comments above where I got excited about the workaround.

I hope we'll find the root cause and fix it.

Thanks everyone for your help.

Let me know if I can provide anything else to help with this.
Comment 60 Agustianes Suwardi 2011-06-03 09:08:14 UTC
Created attachment 60642 [details]
success shutdown picture

Notice the difference from the failed shutdown picture, this time there are 2 messages from ehci_hcd.
Comment 61 Diego Viola 2011-06-04 20:28:34 UTC
I need to use my Ethernet device, so I removed this driver from the blacklist, and my system still hangs from time to time.

Could any Intel engineers help us solve this problem once and for all, please?
Comment 62 Diego Viola 2011-06-04 20:29:12 UTC
this driver = e1000e module.
Comment 63 Christian Jann 2011-06-05 12:51:15 UTC
I think I have the same problem but on an AMD desktop system, I have to 
rmmod cx8800 cx88_dvb cx24116 cx22702 cx8802 cx88_vp3054_i2c cx88_alsa cx88xx
to shutdown.
https://bugzilla.redhat.com/show_bug.cgi?id=630420
Comment 64 Diego Viola 2011-06-06 06:19:23 UTC
When this issue will be fixed please? It's annoying as hell.
Comment 65 Diego Viola 2011-06-06 20:14:26 UTC
Sorry for my comment #64. I deeply regret it and will refrain from doing such a thing again.
Comment 66 Brandon Watkins 2011-06-06 20:59:16 UTC
I'm running arch linux 64 bit, with 2.6.39 kernel. My laptop is an Asus U52F (core i5 460m, ironlake graphics, 4gb ram,)

On my machine adding this command to my rc.local.shutdown script fixed this:
"rmmod ehci_hcd"
Comment 67 Shaohua 2011-06-07 01:57:34 UTC
(In reply to comment #60)
> Created an attachment (id=60642) [details]
> success shutdown picture
> 
> Notice the difference from the failed shutdown picture, this time there are 2
> messages from ehci_hcd.
Looks like your problem is different from Diego's.

In Diego's case, the shutdown hangs when doing device shutdown.
while in your case, the shutdown hangs when calling into ACPI shutdown (device shutdown is finished actually)
Comment 68 Low Kian Seong 2011-06-07 03:48:42 UTC
For my case using Arch on X61 blacklisting e1000e (which was not used anyway) worked.
Comment 69 Agustianes Suwardi 2011-06-07 05:17:13 UTC
(In reply to comment #67)
> (In reply to comment #60)
> > Created an attachment (id=60642) [details]
> > success shutdown picture
> > 
> > Notice the difference from the failed shutdown picture, this time there are 2
> > messages from ehci_hcd.
> Looks your problem is different as Diego's.
> 
> In Diego's case, the shutdown hangs when doing device shutdown.
> while in your case, the shutdown hangs when calling into ACPI shutdown (device
> shutdown is finished actually)

So my problem is related to the ACPI subsystem, thanks for pointing it out and sorry if my posts above are irrelevant to this bug report.
Comment 70 Diego Viola 2011-06-07 05:49:56 UTC
(In reply to comment #69)
> (In reply to comment #67)
> > (In reply to comment #60)
> > > Created an attachment (id=60642) [details]
> > > success shutdown picture
> > > 
> > > Notice the difference from the failed shutdown picture, this time there are 2
> > > messages from ehci_hcd.
> > Looks your problem is different as Diego's.
> > 
> > In Diego's case, the shutdown hangs when doing device shutdown.
> > while in your case, the shutdown hangs when calling into ACPI shutdown (device
> > shutdown is finished actually)
> 
> So my problem is related to the ACPI subsystem, thanks for pointing it out and
> sorry if my posts above are irrelevant to this bug report.

I don't think it's irrelevant. It would be nice to get all those problems fixed, regardless of whether it's the same bug or not.
Comment 71 Agustianes Suwardi 2011-06-08 00:23:16 UTC
An update to my comment #53 about rmmod-ing ehci_hcd before shutdown: if I put it in /etc/rc.local.shutdown (the shutdown script), the issue still occurs, but less frequently. It worked for me for 3 days (about 6 poweroffs) and then the issue suddenly occurred again.

Now I always manually rmmod ehci_hcd before issuing poweroff, to leave some time between the rmmod and the ACPI shutdown. So far I have had 4 clean poweroffs.

The issue is likely to occur if I turn on the laptop on battery power with laptop-mode-tools enabled, then plug in the AC power adapter and power off.
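The rmmod-before-shutdown workaround discussed in this thread can be sketched as a small helper for a shutdown script. This is only an illustration under assumptions: the function name unload_cmds is made up, and it prints the commands instead of running them so the result can be reviewed first.

```sh
# Print an rmmod command for each named module that is currently loaded.
# $1: path to a /proc/modules-format file (normally /proc/modules)
# remaining arguments: module names to unload
unload_cmds() {
    list=$1
    shift
    for mod in "$@"; do
        # A loaded module appears as "<name> " at the start of a line.
        if grep -q "^$mod " "$list"; then
            echo "rmmod $mod"
        fi
    done
}
```

In a shutdown script such as /etc/rc.local.shutdown, `unload_cmds /proc/modules e1000e ehci_hcd | sh` would run the printed commands as root. As the comment above reports, the delay between unloading and the final power-off also seemed to matter on that machine, so this may reduce the hangs without removing them.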
Comment 72 Diego Viola 2011-06-08 04:28:05 UTC
I couldn't get my machine to shutdown yet by having e1000e on blacklist, I have shutdown a lot on the past 5 days and it hasn't hanged yet since I removed e1000e.
Comment 73 Diego Viola 2011-06-08 04:30:00 UTC
Sorry, I meant to say that I haven't been able to get my machine to hang with e1000e blacklisted; it has been halting perfectly since I removed e1000e.
Comment 74 Diego Viola 2011-06-08 05:07:34 UTC
My machine has been halting perfectly for the past 5 days, now that I have removed/blacklisted the e1000e module.
Comment 75 Diego Viola 2011-06-08 07:28:53 UTC
Please note that when I have the 'e1000e' module loaded, the system hangs/freezes on both shutdown and reboot.
Comment 76 ich007 2011-06-10 19:33:51 UTC
I am having the same (?) shutdown problem with my EEEPC R051PEM, Atom N550, 2GB RAM:
blacklist rt2800pci
blacklist rt2800lib
blacklist rt2x00usb
blacklist rt2x00pci
blacklist rt2x00lib
has solved my shutdown problem.
WLAN still works using rt2860sta.
None of the other solutions available on the net (blacklisting e1000e, adding rmmod snd_hda_intel to the shutdown script; ehci_hcd is not in use) has solved the problem.

Hope that helps to solve the problem. If any further information is required, please ask, with instructions on how to obtain it; I am just a user.

lsb_release -a
LSB Version:    core-2.0-ia32:core-2.0-noarch:core-3.0-ia32:core-3.0-noarch:core-3.1-ia32:core-3.1-noarch:core-3.2-ia32:core-3.2-noarch:core-4.0-ia32:core-4.0-noarch
Distributor ID:    Ubuntu
Description:    Ubuntu 10.10
Release:    10.10
Codename:    maverick
uname -a
Linux eeepc01 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 19:00:26 UTC 2011 i686 GNU/Linu
lspci
00:00.0 Host bridge: Intel Corporation N10 Family DMI Bridge (rev 02)
00:02.0 VGA compatible controller: Intel Corporation N10 Family Integrated Graphics Controller (rev 02)
00:02.1 Display controller: Intel Corporation N10 Family Integrated Graphics Controller (rev 02)
00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 2 (rev 02)
00:1c.3 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation NM10 Family LPC Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 02)
01:00.0 Ethernet controller: Atheros Communications AR8132 Fast Ethernet (rev c0)
02:00.0 Network controller: RaLink RT3090 Wireless 802.11n 1T/1R PCIe
Comment 77 Ismail Donmez 2011-06-15 13:11:30 UTC
I also have the same problem with a new Dell Optiplex 990 workstation, device has a UEFI Bios so that might be a problem.
Comment 78 Mario Prausa 2011-06-16 10:32:54 UTC
I played around with some printk statements and found out that it hangs in the napi_disable() function defined in include/linux/netdevice.h when called from e1000e_down() in drivers/net/e1000e/netdev.c.

I hope this is helpful.
Comment 79 ich007 2011-06-16 12:38:49 UTC
(In reply to comment #77)
> I also have the same problem with a new Dell Optiplex 990 workstation, device
> has a UEFI Bios so that might be a problem.
Hello,
We have a lot of Dell computers running Fedora 13 with kernel 2.6.33.3-85.fc13.x86_64 smp. Without having analysed the behaviour in detail, I think I have observed the following (the computers are for simulations and should run without shutdown):
Optiplex GX620 hangs on shutdown,
Optiplex 745, 755, 760, 780 and PowerEdge T710 shut down without problems

hope that helps
Comment 80 Ismail Donmez 2011-06-16 12:43:46 UTC
Hey;

(In reply to comment #79)
> (In reply to comment #77)
> We are having a lot of Dell computers running Fedora 13 with kernel
> 2.6.33.3-85.fc13.x86_64 smp. Without having analysed the behaviour in detail I
> think that I have observed the following behaviour (the computers are for
> simulations and should run without shutdown):
> Optiplex GX620 hang on shutdown,
> Optiplex 745, 755, 760, 780 and PowerEdge T710 shutdown without problems
> 
> hope that helps

AFAIK OptiPlex 990 is the first version to have SandyBridge, I am not sure about UEFI. If you guys need any details let me know.
Comment 81 Diego Viola 2011-06-18 05:58:06 UTC
I tried contacting the e1000-devel mailing list here: https://lists.sourceforge.net/lists/listinfo/e1000-devel

And Thomas Jarosch replied with the following comment:

http://sourceforge.net/mailarchive/message.php?msg_id=27666243

He suggests that we compile the kernel with "lockup detection" and we provide a trace. He also states that "this problem mustn't be related to e1000e at all, maybe unloading the module beforehand just makes the problem less likely to occur."
Comment 82 Dave Cohen 2011-06-28 17:49:20 UTC
I have the problem reported here, on my Lenovo X60 running Arch Linux.

`rmmod e1000e` before shutdown works for me.  It's the only workaround I know of.
Comment 83 Low Kian Seong 2011-06-29 02:29:30 UTC
I can also verify here that running kernel 2.6.38 on mageia on an identical laptop is fine.
Comment 84 Liberty 2011-07-09 16:37:57 UTC
verified 2.6.39.2 on Arch linux
Comment 85 Oleg Smirnov 2011-07-31 14:51:38 UTC
I can confirm that rmmod e1000e fixes the shutdown problem. I'm using an HP 8510w with Arch Linux and a 2.6.39 kernel.
Comment 86 Len Brown 2011-07-31 16:45:32 UTC
re: comment #78

> it hangs in the napi_disable() function
> defined in include/linux/netdevice.h,
> when called from e1000e_down() in drivers/net/e1000e/netdev.c.

/**
 *      napi_disable - prevent NAPI from scheduling
 *      @n: napi context
 *
 * Stop NAPI from being scheduled on this context.
 * Waits till any outstanding processing completes.
 */
static inline void napi_disable(struct napi_struct *n)
{
        set_bit(NAPI_STATE_DISABLE, &n->state);
        while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
                msleep(1);
        clear_bit(NAPI_STATE_DISABLE, &n->state);
}

I don't think lockup detection will flag this while loop.

void e1000e_down(struct e1000_adapter *adapter)
...
        /* disable transmits in the hardware */
        tctl = er32(TCTL);
        tctl &= ~E1000_TCTL_EN;
        ew32(TCTL, tctl);
        /* flush both disables and wait for them to finish */
        e1e_flush();
        usleep_range(10000, 20000);

        napi_disable(&adapter->napi);
        e1000_irq_disable(adapter);

        del_timer_sync(&adapter->watchdog_timer);
        del_timer_sync(&adapter->phy_info_timer);


BTW, I am always suspicious of magic numbers like the usleep_range values above.
What if you increase them both, say by a factor of 4?

Also, e1000e_down is called when unloading the driver before
shutdown and that works -- so what is special about when
it is called in the shutdown path that makes it fail?

Re: "Disabling IRQ #19" message
On Diego's box, ips and EHCI are on IRQ 19.
This message indicates that they shut down properly.
The real problem, of course, is e1000e that follows,
and it is not on IRQ 19, but is on IRQ 20/MSI 41:

[    6.189628] e1000e: Intel(R) PRO/1000 Network Driver - 1.2.20-k2
[    6.189633] e1000e: Copyright(c) 1999 - 2011 Intel Corporation.
[    6.189687] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[    6.189706] e1000e 0000:00:19.0: setting latency timer to 64
[    6.190059] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
Comment 87 Ismail Donmez 2011-07-31 16:50:24 UTC
FWIW I can at least reboot with reboot=pci kernel parameter, didn't try shutdown though. This is with the 3.0 kernel.
Comment 88 Len Brown 2011-07-31 16:58:30 UTC
Ismail,
Please open a new bug report, unless your machine behaves
exactly like the original reporter's (Diego's).

Diego,
I'd be interested to know if booting with "maxcpus=1"
makes any difference.
Comment 89 Diego Viola 2011-07-31 18:03:42 UTC
Created attachment 67282 [details]
ThinkPad T510 shutdown with maxcpus=1
Comment 90 Diego Viola 2011-07-31 18:04:48 UTC
Len,

I tried booting with "maxcpus=1" but I get the same problem: the system still hangs on shutdown unless I run "rmmod e1000e" beforehand.

See the last image attached.
Comment 91 Michael 2011-08-02 02:56:55 UTC
I have the same problem with my Toshiba Tecra M3 Laptop when I close the laptop lid.
Comment 92 ant_978red 2011-08-07 01:15:12 UTC
Same here on a ThinkPad T60 with 2.6.39-ARCH. Just add "modprobe -r e1000e" to /etc/rc.local.shutdown; otherwise it freezes/hangs on shutdown/reboot.
Comment 93 Jesse Brandeburg 2011-08-11 23:29:15 UTC
I think I reproduced this on my X60, after much fussing with arch's peculiarities.

I believe the issue is because the device is in D3 due to laptop-mode-tools and no ethernet link.

kernel 3.0.0 (arch)

I believe the issue (at least tied to e1000e) is strongly related to the runtime power management in the kernel.

I also have a T60 with a similar chipset that we can try to reproduce on. If I have time, I am going to investigate the driver wake code that should make sure the D3->D0 device power management transition completes correctly before shutdown.

I do not have access to a T510 but don't believe we need one.
Comment 94 Diego Viola 2011-08-13 21:42:48 UTC
Jesse,

Thanks for your reply. Just providing the steps to reproduce the problem:

1- install Arch Linux 64-bit and $(pacman -Syu laptop-mode-tools).
2- enable "laptop-mode" in /etc/rc.conf (DAEMONS array).
3- make sure "e1000e" kernel module is loaded.
4- reboot then shutdown.

At this point the system should crash. I have a T510, so let me know if you can provide us with a patch; we can then try to reproduce the issue, verify that the problem is fixed, and report back. Thank you.
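
Before testing step 4, it may help to confirm that the preconditions actually hold. The following is a minimal sketch, not something taken from this thread: the function names are hypothetical, and the default device path is an assumption based on the 0000:00:19.0 address shown in the dmesg output in comment 86. It only reads /proc/modules and the standard runtime-PM sysfs attribute power/runtime_status.

```shell
#!/bin/sh
# Sketch only: check the reproduction preconditions before testing shutdown.
# DEV is an assumed default (the 00:19.0 address from this report's dmesg);
# override it on machines where the NIC sits elsewhere.
DEV="${DEV:-/sys/devices/pci0000:00/0000:00:19.0}"

# module_loaded NAME [FILE]: succeed if NAME is listed in /proc/modules (or FILE).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# runtime_status DEVDIR: print power/runtime_status, or "unknown" if unreadable.
runtime_status() {
    if [ -r "$1/power/runtime_status" ]; then
        cat "$1/power/runtime_status"
    else
        echo unknown
    fi
}

# report_state: hypothetical helper that prints both facts on one line.
report_state() {
    if module_loaded e1000e; then loaded=yes; else loaded=no; fi
    echo "e1000e loaded: $loaded, runtime status: $(runtime_status "$DEV")"
}
```

Calling report_state right before shutting down shows whether the module is loaded and whether the device is runtime-suspended. A "suspended" status would be consistent with the D3 theory from comment 93, but it does not prove it.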
Comment 95 Vladimir 2011-09-02 17:49:21 UTC
I have a ThinkPad X220i running Arch Linux x86_64. I don't use laptop-mode-tools.

$ lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)

When I enable runtime power management for the Ethernet card with
echo auto > /sys/devices/pci0000:00/0000:00:19.0/power/control
the notebook hangs during reboot or shutdown.

Adding "rmmod e1000e" to rc.local.shutdown solved this problem for me.
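
For anyone who wants the workaround to run only when runtime power management is enabled, here is a rough sketch of what could go into rc.local.shutdown. The function names are made up for illustration, and the device path is the one from this comment, so it is an assumption for any other machine. It relies only on the power/control sysfs attribute, which reads "auto" when runtime PM is allowed and "on" when it is not.

```shell
#!/bin/sh
# Sketch only: unload e1000e at shutdown when the NIC is allowed to
# runtime-suspend. SYSFS_DEV defaults to the path quoted in this comment.
SYSFS_DEV="${SYSFS_DEV:-/sys/devices/pci0000:00/0000:00:19.0}"

# runtime_pm_mode DEVDIR: print power/control ("auto" or "on"), or "unknown".
runtime_pm_mode() {
    if [ -r "$1/power/control" ]; then
        cat "$1/power/control"
    else
        echo unknown
    fi
}

# needs_workaround DEVDIR: succeed if runtime PM may suspend the device.
needs_workaround() {
    [ "$(runtime_pm_mode "$1")" = "auto" ]
}

# apply_workaround: hypothetical hook; unloads the driver only when needed.
apply_workaround() {
    if needs_workaround "$SYSFS_DEV"; then
        rmmod e1000e
    fi
}
```

Calling apply_workaround at the end of rc.local.shutdown gives the same effect as the unconditional rmmod when power/control is "auto" and does nothing otherwise. Whether limiting the workaround this way is safe on every affected machine has not been verified here.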
Comment 96 Diego Viola 2012-01-17 10:28:56 UTC
Has there been any progress with this?

I'm glad bugzilla is back.
Comment 97 Brandon Watkins 2012-01-17 16:17:16 UTC
I haven't had this issue in a while. I'm not sure which kernel update fixed it, but I'm currently on 3.1.x and a fresh install of Arch shuts down fine without any workarounds.
Comment 98 Zhang Rui 2012-01-18 01:33:44 UTC
Hi, Diego,

can you please verify if the problem still exists in the latest upstream kernel?
Comment 99 Diego Viola 2012-01-19 07:14:02 UTC
I still get the same problem with kernel 3.2.1 (Arch Linux x86_64).

I get the issue when laptop-mode-tools (laptop-mode) is active, and when the module "e1000e" is loaded.

If I blacklist e1000e and disable laptop-mode all is fine.

This is the output I get when the system hangs at shutdown (withe e1000e/laptop-mode is loaded):

http://dl.dropbox.com/u/6005119/kernel/P1140657.JPG
Comment 100 Diego Viola 2012-01-19 07:15:58 UTC
s/withe/when/
Comment 101 Diego Viola 2012-01-19 11:58:42 UTC
Apparently I have another issue now also, with those "udev timeout: killing" lines. It seems to be related to TPM.

http://dl.dropbox.com/u/6005119/kernel/P1140654.JPG

https://bbs.archlinux.org/viewtopic.php?id=130228

Any ideas?
Comment 102 Jardiamj 2012-01-19 22:03:39 UTC
I think I have the same problem that Diego describes in this bug. I'm using Debian Wheezy, kernel 3.1.0 on a Thinkpad T420.
Unloading the module ehci_hcd fixes the problem. I have been using that fix for a couple of weeks and my laptop shuts down OK.
I hope someone can get to the root of this problem.

Cheers!
Comment 103 Diego Viola 2012-01-23 15:27:48 UTC
lautriv from ##linux (freenode) provided some insight about this issue.

here's the log:

10:55 < lautriv> diegoviola, that issue relates to WakeOnLan you can't remove the IRQ without breaking wakeup stuff.
10:57 < lautriv> diegoviola, i did just read the headlines from you link and remembered that case on my diskless boxes.
11:01 < lautriv> diegoviola, that is because the module claims resources. if i remember right, that did not happen if built-in but that way was unable to use WoL
11:03 < lautriv> diegoviola, you may try to recompile with e1000 built-in or simply tell your shutdown to call a script ( K99removenic or something else)
11:07 < lautriv> diegoviola, it's bad behaviour but the module does it right. still holding resorces while in use.
Comment 104 Zhang Rui 2012-02-02 06:18:15 UTC
can anyone please verify if this is a duplicate of bug #36132?
Comment 105 Diego Viola 2012-02-02 07:43:37 UTC
(In reply to comment #104)
> can anyone please verify if this is a duplicate of bug #36132?

I can confirm that it seems to be the same issue.
Comment 106 Zhang Rui 2012-02-03 01:29:40 UTC
Okay.
As this problem is caused by the e1000e runtime suspend, let's move discussion to bug #36132.

*** This bug has been marked as a duplicate of bug 36132 ***
Comment 107 Diego Viola 2012-04-18 07:04:36 UTC
It appears this issue was fixed in 3.3.1 (or possible the first release of 3.3), I've tested 3.3.2 and it's fine too.

When I say that it's fine, I mean that I can't reproduce the hang anymore. Shutdown/reboot works as expected now.

Please give it a try and share your results in Bug 36132.

Thanks.
Comment 108 Diego Viola 2012-04-18 07:21:50 UTC
s/possible/possibly/
