Bug 14667
Description
Federico Chiacchiaretta
2009-11-22 20:42:13 UTC
Please attach the dmesg output after resume.

Please attach the acpidump output of this laptop.

Created attachment 23918
acpidump
Created attachment 23919
dmesg before suspending
Created attachment 23920
dmesg after suspend and resume
What was the last working kernel?

Kernel 2.6.31 (the default in Kubuntu) works perfectly. Then I tried 2.6.32-rc7 (with the zen patchset) and 2.6.32-rc8 (vanilla, compiled for Ubuntu), and neither of them works. Further info:
1 - Since the fglrx module does not yet support kernel 2.6.32, I managed to build it with this fix from Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=554401
2 - When attaching dmesg-after-suspending, I saw that fglrx prints an error during the suspend procedure. I tried blacklisting it and rebooted using radeon instead, but on resume the screen doesn't turn on anymore; it stays black as if it were powered off, so I can't tell whether fglrx is actually the problem.

I have the same problem. I am using Gentoo with the radeon X driver, and I am not using fglrx. It works with 2.6.31 but not with 2.6.32-rc7 and -rc8. Kernel 2.6.32-rc2 is the first one in which the problem appears.

So you mean 2.6.32-rc1 works well? Could you please run git-bisect to find out which commit introduces this bug?

I tested with 2.6.32-rc1 (but it seems that it was tagged 2.6.32-rc2). It seems that all 2.6.32 kernels have the same problem. I will try to find which commit in 2.6.32-rc1 (compared to 2.6.31) causes it.

The result of running git bisect between 2.6.31 and 2.6.32-rc1 is:

commit 6a63b06f3c494cc87eade97f081300bda60acec7
Author: Alexey Starikovskiy <astarikovskiy@suse.de>
Date:   Fri Aug 28 23:29:44 2009 +0400

    ACPI: EC: use BURST mode only for MSI notebooks

    Signed-off-by: Alexey Starikovskiy <astarikovskiy@suse.de>
    Signed-off-by: Len Brown <len.brown@intel.com>

Alexey, can you look at this issue, please?

I confirmed that with this commit removed, suspend works again on my Dell laptop (Studio XPS 16).

Please attach the dmidecode output of your laptop.

Created attachment 24087
Dmidecode from a Dell Studio XPS 16 laptop
*** Bug 14735 has been marked as a duplicate of this bug. ***

Could you please uncomment "#define DEBUG" in drivers/acpi/ec.c and attach dmesg here?

Created attachment 24293
dmesg with ec.c (with DEBUG)
Created attachment 24294
dmesg with ec.c (with DEBUG) after resume (Suspend)
Created attachment 24295
dmesg with modified ec.c (with DEBUG)
Created attachment 24296
dmesg with modified ec.c (with DEBUG) after resume (Suspend)
I had this very issue when updating from 2.6.31 to 2.6.32-rc1. Commenting out those two if lines in ec.c fixed it on my Dell Studio 1537. If you need another pair of debug log dumps like Alexandre's, just ping me.

Created attachment 24645
dmesg output with #define DEBUG uncommented for Clevo M720R

(In reply to comment #17)
> Could you please uncomment "#define DEBUG" in drivers/acpi/ec.c and attach
> dmesg here?

I'll add one here as well, since I'm the reporter of bug 14735 (which is a duplicate of this one). Apologies for the length. See bug 14735 for the notebook's acpidump.

I've got a Lenovo U330 here. After upgrading from kernel 2.6.31 to kernel 2.6.32 (and even 2.6.33-rc5), both my sleep and power-off buttons stopped working. :( Unfortunately I don't have the time to recompile a kernel with the ifs commented out in ec.c, so I'm assuming this is a similar bug. I'll attach a dmidecode for the laptop below.

Created attachment 24765
dmidecode of Lenovo Ideapad U330
What makes you think this issue is related to the $subject bug?

(In reply to comment #26)
> What makes you think this issue is related to the $subject bug?

I have very similar symptoms to the ones described in bug 14735, which was closed as a duplicate of this bug. That's why I reported it here: I didn't want to open a new bug. How can I verify it without rebuilding the kernel?

I guess the only real test would be to revert commit 6a63b06f3c494cc87eade97f081300bda60acec7 and see if the problem goes away, but this cannot be done without rebuilding the kernel.

First-Bad-Commit : 6a63b06f3c494cc87eade97f081300bda60acec7

(In reply to comment #0)
> Hi
> I'm experiencing this problem on my dell laptop: on first boot temperatures
> are correctly detected, fan starts correctly when temperature goes up. If I
> suspend (to ram) the laptop and then resume, I get zero values by all three
> thermal zones (0 °C), and fan never starts, I have to reboot to fix this.
> I'm using kernel 2.6.32-rc8 from
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.32-rc8/ , which AFAIK is
> kernel vanilla compiled for ubuntu.
> I also tried to build kernel with zen patchset (version 2.6.32-rc7) and the
> issue is present as well.
>
> Best Regards
> Federico Chiacchiaretta

This bug is still present in the trunk (5/02/2010) on a Dell XPS 1640. It is dangerous because the laptop overheats (the fan never starts after resume).

Could anyone able to reproduce this problem retest with the patch from http://bugzilla.kernel.org/attachment.cgi?id=24904 applied, please?

The patch applied fine to 2.6.32 after changing acpi_ec_query_unlocked to acpi_ec_query in it. Sadly, though, it does not solve the problem for me on a Dell Studio 1537. The fan stays off after resuming, although it was running at top speed before suspending due to a stress test. Maybe it needs a kernel newer than .32 to work? Commenting out those two if lines as suggested before, however, does fix it for me.
OK, let's try one more thing.

Created attachment 24928
Execute _WAK after enabling interrupts during resume
Please retest with this patch applied and report back.
Sorry to say, but your patches from comments #31 and #34, either alone or together, do not fix it on .32. I can't work with the very latest git tree for now, as it has a firmware issue with my Radeon R600 graphics chip, but that's another issue. However, I came up with a rather hacky "solution", if you will: change ec_dmi_table[] in ec.c to check for the BIOS vendor "Dell Inc." instead of "Micro-Star" and drop the check for the chassis vendor, and there it goes: the fan starts again after resuming from S3. So adding yet another hardware vendor to the list maybe isn't the best solution for now, but it "works for me"...

If the patches above don't help, I think for now the only thing we can do is use the DMI information to identify the systems that need burst mode.

Well then, will you need another set of dmidecode or acpidump output to distinguish the systems that need burst mode? For now it seems to be limited to Dell laptops that aren't checked for yet.

I'm still not sure about the Lenovo box reported in comment #25. Michał, have you tried reverting commit 6a63b06f3c494cc87eade97f081300bda60acec7 to see if the problem goes away?

I would also like to report that this exact behavior happens on my Toshiba A505. I will hopefully try the patches tonight and post my findings.

The patches in comments #31 and #34 do not work for me. I also tried the hack in comment #35 with "INSYDE" instead of "Dell, Inc" as the vendor, and that did not work for me either. A couple of side notes: in addition to the fan, I also lose events for the laptop lid and my touchpad enable/disable button when returning from S3. Oddly enough, if I add 'acpi_osi="Linux"' to my boot line, it returns from S3 correctly ONCE: the fan fires up as normal and all the bells and whistles come back to life. But the fan is dead after resuming from S3 a second time or any time after that. I'll attach my dmidecode.

Created attachment 24935
Toshiba Satellite A505 dmidecode output
Does reverting commit 6a63b06f3c494cc87eade97f081300bda60acec7 help?

(In reply to comment #37)
> Well then, will you need another set of dmidecode or acidump to distinguish
> the systems that need burst mode? For now it seems like it's limited to Dell
> laptops that aren't checked for yet.

In bug 14735 there is an affected Clevo box, so evidently the problem is not really limited to MSI and Dell. However, I'd like to understand why burst mode actually helps. shinydoofy@gmail.com, could you please attach a dmesg log generated after suspend/resume with the patch from comment #31 applied?

@Rafael: Unfortunately, reverting commit 6a63b06f3c494cc87eade97f081300bda60acec7 does not help. I even went back to the 2.6.30 kernel and the problem still exists for me. I can attach dmesg debug output if needed.

This means the problem is different in your case. Please file a separate bug report and add my address to its CC list. Thanks!

I have the same problem, apparently starting from every 2.6.32 version on Gentoo (vanilla, gentoo, tuxonice). I've tried the unmodified kernel 2.6.32.8. I'm currently using 2.6.31, which suspends and resumes correctly, but I need to patch it as described here: http://en.gentoo-wiki.com/wiki/Dell_Studio_1555#ACPI Sorry, I'm no kernel developer, but if you need something just ask. Regards.

Created attachment 25105
dmesg for suspending on a Dell Studio 1537
Sorry it took so long, but here goes...
Thanks! Unfortunately it doesn't give us any new information.

I still need to comment out the if()s to enable EC burst mode on my Clevo notebook in 2.6.33. Is there anything I can test to help?

James,
Could you please uncomment "#define DEBUG" in drivers/acpi/ec.c, enable kernel timestamps, and attach dmesg here (or the kernel log, if dmesg appears too short)?

Created attachment 25294
dmesg dell 1555 with debug ec.c and timestamps
Hi, I'm attaching dmesg with debug output for ec.c as requested. This one is for a Dell Studio 1555.

Federico, the debug output looks quite sane... the EC seems to be working... Could you please show the output of "grep . /proc/acpi/battery/*/*"?

Created attachment 25297
output of "grep . /proc/acpi/battery/*/*"
Here it is, output of "grep . /proc/acpi/battery/*/*".
The battery output looks fine as well... Could you please show the output of "grep . /proc/acpi/thermal_zone/*/*" too?

Created attachment 25298
output of "grep . /proc/acpi/thermal_zone/*/*"
Of course, here it is.
Temperatures are not zero. What is the bug again?

Created attachment 25299
output of "grep . /proc/acpi/thermal_zone/*/*"
Oops, sorry, I picked the wrong file. Here is the correct one, taken after suspend and resume.
Created attachment 25301
dmesg from 2.6.33 with DEBUG enabled in ec.c (Clevo M720R)

(In reply to comment #51)
> James,
> Could you please uncomment "#define DEBUG" in drivers/acpi/ec.c, kernel
> timestamps, and attach dmesg here (or kernel log, if dmesg appears too
> short)?

Done. [Just as a reminder: mine is the Clevo notebook from bug 14735. Its trouble is that there are no sleep/lid events unless ec.c is tweaked to enable burst mode. I also see other "anomalies", e.g. broken AC status after suspend/resume; once I saw cpufreq stuck in the lowest state, and messed-up interrupts that caused choppy audio.]

Could you please check whether the last patch in bug #14733 changes anything?

(In reply to comment #60)
> Could you please check if last patch in bug #14733 changes anything?

No, it didn't (applied to a clean 2.6.33). An additional observation: something seems to persist across reboots. If I run a working kernel and then reboot into either an unmodified one or one with the last patch from bug #14733, it works OK, but if I boot it from cold (including removing the battery), it doesn't work.

James,
Could you please enable "#define DEBUG" in drivers/acpi/ec.c and printk timestamps, and produce kernel logs for the broken and working cases?

(In reply to comment #62)
> James,
> Could you please enable "#define DEBUG" in drivers/acpi/ec.c and timestamps
> in printk and produce kernel logs for broken and working cases?

I'm now building a "working" kernel and will attach the output shortly. Attachment 25301 above (comment #59) is a dmesg taken from a broken case without the bug 14733 patch; however, there's so much output that the beginning is chopped off. Is that what you need? I'll see if I can get an earlier copy...

Created attachment 25359
Full dmesg from unmodified 2.6.33 with DEBUG and timestamps
Created attachment 25360
Full dmesg from working 2.6.33 ("if (EC_FLAGS_MSI)" commented out) with DEBUG and timestamps
OK, the requested logs are attached (one enlarged message buffer later ;) ). I hope these are what you need. The machine was powered down and the battery temporarily disconnected between tests.

I am seeing the same problem on a Dell Studio 1555 on Debian unstable with 2.6.33-0.slh.9-sidux-amd64. I will try the proposed fix and report back. I think #15425 and #12231 could be possible dupes.

I can confirm that reverting the commit mentioned in #11 allows me to suspend and resume without the laptop overheating.

*** Bug 15425 has been marked as a duplicate of this bug. ***

Hi! I was looking around and found the other bug report. I have a Dell Studio 1555 running 2.6.34-rc1 (in Arch Linux) and I experience the same issue. Do you need my acpidump, logs or anything? If there is something to test, please let my computer be your test bunny!

Created attachment 25485
Allow multibyte access to EC
Please check whether this patch makes any difference.
It works perfectly here with kernel 2.6.33 + the archlinux patchset + allow_multibyte_access_to_EC.patch on a Dell Studio 1555. Thanks very much, Alexey!

It does indeed make a difference; the fan works again after suspending to RAM. Thanks a bunch!

(In reply to comment #71)
> Created an attachment (id=25485)
> Allow multibyte access to EC

Clever. :-) Hopefully it won't break things for anyone.

That's a yes from me, too. Lid and sleep button now seem to work OK on the Clevo M720R with the multibyte access patch.

I confirm that the patch fixes this problem. The fan is on and the temperature is under control after several suspend and resume cycles. Thank you very much :)

I tried patching yesterday's git tree and it works perfectly on my Studio 1555. Is this going to hit the stable tree soon?

Hmm... I started to see a few "ACPI: EC: GPE storm detected, transactions will use polling mode" messages after applying the multibyte patch; I didn't see these before with just the naive burst-mode-enable tweak.

Could you catch a debug dmesg from around such a point?

The patch in comment #71 is in the acpi test tree, queued for 2.6.34. The same patch will be needed for 2.6.32.stable and 2.6.33.stable.

The patch in comment #71 has been dropped, pending a refresh.

Created attachment 25517
Allow multibyte access to EC, take #2
Allow more than 64-bit access (up to 32 bytes, i.e. 256 bits).
The patch in comment #82 has been added to the acpi test tree, pending test results.

I confirm that the latest patch (comment #82) solves the suspend/resume problem on my laptop, a Studio XPS 16. It also seems to solve a lot of other problems that were apparently ACPI-related, like the brightness issue (the backlight would turn red, blue, or yellow when the brightness was changed on battery power) and the loss of the touchpad (serio1 of 8042 was missing the input functions). Thanks for the patch.

It seems this patch doesn't apply to the latest stable 2.6.33.1. Do I have to apply some other patch from the acpi test tree?

It is certainly possible to rebase the patch on top of 2.6.33.1, but it would be very helpful to us if you tested it on top of 2.6.34-rc1 too.

Created attachment 25531
Reviewed patch
This is a reviewed patch incorporating Bob's and Lin's suggestions.
That doesn't build on top of the current -git tree. ACPI_ROUND_UP is the problem.

Created attachment 25538
Reviewed patch, fixed
sorry for the typo...
I am trying to compile 2.6.34-rc1, but it is failing. In the meantime, I noticed that the eject key no longer works; when I press it, dmesg says: "dell-wmi: Unknown key 0 pressed". This might be unrelated; I'll try an older unpatched kernel and see if it works there.

I've tried the eject key and noticed the same issue as Islam, using 2.6.34-rc1 with the first patch.

This is most likely a separate issue, but it would be helpful to verify that. If it turns out to be unrelated, please file a separate bug report for it.

Finally got it to compile. Now running 2.6.34-rc1 with the latest patch in #89, and after several suspend and resume cycles the fan is working and the temperature is under control. I tried several older kernels and the eject key works in none of them, so it is unrelated. I'll open a separate bug.

The patch in the acpi tree is now refreshed with the patch in comment #89.

Would it be possible to rebase the patch on top of 2.6.33.x? I'd like to file a bug on the Arch Linux bug tracker for its inclusion in the next 2.6.33.x release in the Arch repos. Thank you.

I've tried the patch in comment 89 on my Dell Studio 1555 and it doesn't work. I still get 0° thermal zones when resuming with version 2.6.34-rc3. I'm using Gentoo's vanilla-sources package. Since I'm going to do the same with sources taken directly from kernel.org (just to be sure), which version exactly is this patch supposed to be applied to?

Created attachment 26030
bugzilla-14667-15749.patch

The patch in comment #89 shipped in Linux 2.6.34-rc4. However, it caused the regression described in bug 15749. So if you are running 2.6.34-rc4, you need the patch in bug 15749. If you are running something older, you need the patch here plus that one, which I've combined into a single patch here.

This bug report is closed. Bug 15749 will be closed when the second patch goes upstream.

I found that this bug still appears on my Dell Studio 1555 when no battery is inserted while suspending/resuming.
Temperatures are all 0°C and the fan is not working. But everything is fine again when I reinsert the battery: after a few seconds, the temperatures and the fan speed work again. I'm running the latest Debian-unstable kernel 2.6.38 (it also happened on 2.6.37), and I found others reporting the same issue on Ubuntu: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575030/comments/12

(In reply to comment #98)
> Found this bug still appears on my Dell Studio 1555, when there's no battery
> inserted while suspending/resuming.
> Temeratures are all 0°C and the fan is not working.
> But everything is fine again, when I insert the battery again - after a few
> seconds the temperatures and the speed are working again.
>
> I'm running the latest Debian-unstable Kernel 2.6.38 - also happened on
> 2.6.37, found others reporting the same issue on Ubuntu:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575030/comments/12

I can confirm this with basically any kernel from 2.6.32 to 2.6.37, except maybe 2.6.33 (I've never deployed that one). I tried this on Gentoo with many flavors (vanilla, gentoo, tuxonice) and even some Ubuntu kernels. I was waaay too busy to report it. Besides, not many people care about keeping the battery in a notebook all the time, even though that shortens battery life considerably. It's weird, because suspend-to-disk has been OK since 2.6.34, as far as I remember. It's "just" suspend-to-RAM...

Guys, could you please not hijack existing bugs to report new issues? The "temperature readings are never correct after resume" problem is different from the "temperature readings are not correct after resume if the battery is not present while suspending/resuming" one. Please create a new entry for the new issue.