Bug 9163
Description
Ciprian Dorin Craciun
2007-10-14 07:16:21 UTC
Created attachment 13151 [details]
lspci output
Created attachment 13152 [details]
dmesg output for the suspend/resume operation
I just tried another suspend/resume and this time the suspend took about half a minute... Also my built-in keyboard and touch pad are dead. It needs a reboot. Ciprian. CCed Tejun and Alan, perhaps they can help. Ciprian, can you also attach your .config, please? Created attachment 13154 [details]
config file for v2.6.23.1
Thanks. Please also attach your working 2.6.22.9 .config, if possible. Please also try to boot with libata.noacpi=0 and see if that helps. Created attachment 13156 [details]
config file for v2.6.22.9
I have tried the libata.noacpi=0 but it didn't help. I also have found something else... When resuming the following happens: 1) I press the power button or open the laptop's lid. 2) The processor's fan starts to spin. 3) Nothing else happens until... (I have waited for almost 2 minutes...) 4) I press any key on the keyboard. And after this, 5) The HDD led starts to blink, 6) After a few seconds it comes out of suspend. It seems thus that it blocks until I press a key -- which I think happened every time but I haven't payed attention. And every time the time difference is between those two lines. Also it happened that this morning -- but only once -- the resume worked just fine. (Or maybe I have pressed a key just after opening the lid.) Ciprian. Well, let's add some more people to the CC list. So, Ciprian, if you hit key early during resume, resume is fast? You see, here it's strange. The laptop starts spinning the processor fan, and sometimes the led from the wireless card blinks. But there is no hard disk activity until I press the any key. And after this it resumes. You can see from the timings: [ 0.275398] PCI: Setting latency timer of device 0000:00:1e.2 to 64 [ 18.437412] PM: Writing back config space on device 0000:00:1f.2 at offset 1 (was 2b00005, writing 2b80005) Here I think that only after 18 seconds I pressed a key. This also happens sometimes at suspend. For example the sleep-led (there is a special sleep led on my laptop) starts blinking when I initialize the suspend procedure and after a few seconds it stops blinking and the laptop suspends -- there is no fan spinning. But sometimes the fan still works and the led blinks until I press any key. Ciprian. P.S.: I am conscious that a laptop could also resume after a key press, and that on some laptops it is not sufficient to open the lid. But mine works just fine -- without any key pressing, just opening the lid -- in v2.6.22.9 -- and any kernel starting with v2.6.17 (or 18?) Ciprian, can you try the current mainline kernel with CONFIG_NO_HZ and CONFIG_HIGH_RES_TIMERS unset, please? Unfortunately I do not own the IBM laptop anymore, but on the new laptop -- a Benq S31V -- with kernel 2.6.23.8 and almost the same configuration (including CONFIG_HO_HZ and CONFIG_HIGH_RES_TIMERS set) the problem doesn't exist anymore. If you want me to carry some tests on this Laptop I will gladly do so. Ciprian. Well, if the problem is not visible on your new laptop, no further testing is needed, IMO. I'm closing this bug as unreproducible. Please reopen if necessary, thanks. I have a similar laptop (t43p) with similar problems; it only resumes correctly if I boot with highres=off,nohz=off. Problems started with 2.6.21, which produced hangs wiht nohz. I didn't really investigate. From 2.6.22 the resume problems started occurring, including on 2.6.24-rc4. After suspending two or three times (I did not notice specific differences between s2disk and s2ram) resume starts taking ages. The process can only be speeded up by pressing some buttons. By the way I had problems with hdaps too (see bug 8757) which I 'solved' by not loading hdaps anymore. I'd be happy to test, so I'll reopen the bug and reboot tonight (2.6.24-rc5) to see what happens. One note, I'm quite unfamiliar with testing stuff like this, so please tell me what info you need. I did not replicate all tests you asked from Ciprian. I'll add config and lspci output anyway. I'm on Slackware12, with kernel 2.6.23.9. Created attachment 14010 [details]
lspci for t43p
Created attachment 14011 [details]
My kernel config - t43p
Sorry, I'm no good with bugzilla and it shows, I'm unable to reopen this bug... How should I continue? Rebooted with 2.6.24-rc5, nohz, highres. I could s2ram without problem for 10 times. I then suspended2disk once, also without problem. After that however, resuming after s2ram was very slow. I did not let it finish but hit a few buttons, without noticeable effect. I then hit fn-f7 a few times, and that did the trick. I rebooted and s2disk-ed a few times. First time no problem, second time I let it resume and it took a couple of minutes, first time I pressed a few any, non-specific, buttons and resume proceeded as normal. I have to mention that my config also contained the hdaps module compiled into the kernel (I mean statically and not as a module). And I also confirm that I have experienced the same behavior as PJBrs encountered... So I would propose to transfer the ownership (the reporter) of the bug to PJBrs -- if such a thing is possible -- as he has the hardware and the same problem, and thus he could help. Ciprian. I'm not sure if we can change the reporter, but that shouldn't matter (if you don't mind receiving emails from the bugzilla, that is). I'm currently on 2.6.24-rc5, and booted with nohz=off, highres=off, and I've had no problems so far (booted about three days ago, s2disk and s2ram several times. As I said, I did have troubles when leaving out nohz=off, highres=off. I'm attaching three episodes of s2disk. The first is ok, the second is not ok but I let resume complete uninterruptedly, the third as not ok and interrupted with keypresses to speed things up. Created attachment 14090 [details]
First time s2disk - no errors (2.6.24-rc5; NO_HZ=y,HIGH_RES_TIMERS=y)
Created attachment 14091 [details]
Second time s2disk - with error, completed uninterruptedly
Created attachment 14092 [details]
Third time s2disk - with error, interrupted with keypresses to speed things up
For good measure I also include dmesg with s2disk and then s2ram without problems, after three days of uptime on kernel 2.6.24-rc5, booted with nohz=off highres=off. I haven't kept count of how many times I suspended before recording this dmesg output. One last thing, either NO_HZ=y or HIGH_RES_TIMERS=y is enough to produce the delays at resume. Both need to be switched off at boot to be able to suspend and resume normally. Please let me know if there's anything more I can test and do to help. Created attachment 14102 [details]
First s2disk, then s2ram; 2.6.24-rc5; nohz=off highres=off; No errors
I just tried 2.6.24-rc6. All is still fine with nohz=off highres=off on the kernel commandline, but without this the problems still remain. I did notice ata1.01: _GTF evaluation failed (AE 0x1001) and ata2.01: _GTF evaluation failed (AE 0x1001) with dmesg, so I'm attaching a dmesg record of s2disk with errors (NO_HZ=y, HIGH_RES_TIMERS=y). Created attachment 14246 [details]
s2disk with errors (2.6.24-rc6; NO_HZ=y,HIGH_RES_TIMERS=y)
Can you please try "hpet=disable" on the kernel command line (with nohz & highres active) ? I tried with hpet=disabled (2.6.24-rc6), and I saw no big differences, save one; This time, the wait happened at suspend instead of resume. Everything went normal until the "s2disk: snapshotting system..." message appeared. At that point I had the slow-blinking (slow as in takes one minute for one blink) cursor, and I pressed some keys to speed things up. This made the next message appear (something like "s2disk: snapshot ready..."), and suspend proceeded as normal. I haven't rebooted (yet) to test whether this was a one-time behaviour, or that booting with hpet=disabled makes the wait occur on suspend instead of resume. I also did s2ram a few times, which had problems as well, but on resume. I did not notice anything that I hadn't noticed without hpet=disabled before. By now, however, I am sure that the wait on resume-from-ram can only be shortened by pressing Fn-{some special thinkpad function key}-combo's, not the "any"-key. Furthermore, in some cases (but this is difficult to reproduce), s2ram only occurs after one more keypress, irrespective which key. In sum: hpet=disabled did not cause major differences, except the wait occurring at suspend instead of resume with s2disk. By the way, I also tried with "hpet=disabled nohz=off", and then everything is fine (couple of s2disk, s2ram, no problems). I'll try 2.6.24-rc7 next. > In sum: hpet=disabled did not cause major differences, except the wait
> occurring at suspend instead of resume with s2disk. By the way, I also tried
> with "hpet=disabled nohz=off", and then everything is fine (couple of s2disk,
> s2ram, no problems).
You said earlier that you have to add "nohz=off" _and_ "highres=off"
to make the problem go away.
With "hpet=disable" you only need "nohz=off". Or did you disable
highres in the kernel config ?
Thanks,
tglx
$ cat /proc/cmdline root=/dev/sda1 ro vga=791 printk.time=Y hpet=disabled nohz=off $ zgrep -e NO_HZ -e HIGH_RES -e HPET /proc/config.gz CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y # CONFIG_HPET is not set Apparently I can substitute "highres=off" with "hpet=disabled". > Apparently I can substitute "highres=off" with "hpet=disabled".
Can you please provide the output of
# cat /proc/timer_list
with "hpet=disable" on the command line ?
Thanks,
tglx
Created attachment 14358 [details]
Contents of /proc/timer_list; 2.6.24-rc6, "hpet=disabled" on cmdline
Hmm, I notice that the correct syntax is "hpet=disable", not "hpet=disabled" (with an extra "d" after disable). I assume, however, that they make no difference...? > Hmm, I notice that the correct syntax is "hpet=disable", not "hpet=disabled"
> (with an extra "d" after disable). I assume, however, that they make no
> difference...?
No difference, the match is on "disable".
Can you please add "nolapic_timer" to the kernel command line instead
of "hpet=disable(d)" ?
Thanks,
tglx
No problems with "nolapic_timer"! I tested by doing s2disk, s2ram, s2ram, s2ram, s2disk, s2disk, s2disk; no problems so far. Need any further info? Can you please provide full boot logs with and without "nolapic_timer" on the kernel command line? Please add "apic=verbose" in both cases. Thanks, tglx > Can you please provide full boot logs with and without "nolapic_timer" on the
> kernel command line? Please add "apic=verbose" in both cases.
I try to collect the info we have so far into one place. Please
correct me if I confused something.
1.) If you only do s2ram, then the problem never happens
2.) It's never the first s2disk cycle, but any consequent s2disk or s2ram cycle
3.) Problem goes away with one of the following command line option combos:
A) "nohz=off highres=off"
B) "hpet=disable nohz=off"
C) "nolapic_timer"
Thanks,
tglx
Created attachment 14360 [details]
Bootlog with nolapic_timer on the cmdline; 2.6.24-rc6
Created attachment 14361 [details]
Bootlog without nolapic_timer on the cmdline; 2.6.24-rc6
> 1.) If you only do s2ram, then the problem never happens Yes. I haven't tried this with all different options, but I have noticed anything different either. > 2.) It's never the first s2disk cycle, but any consequent s2disk or s2ram > cycle Sometimes but rarely it does happen the first time, I didn't notice this at first. I'm not sure whether it actually worked twice in a row but I don't think so. I don't know whether it matters if it is a reboot or a normal boot either. > 3.) Problem goes away with one of the following command line option combos: > A) "nohz=off highres=off" > B) "hpet=disable nohz=off" > C) "nolapic_timer" This is entirely correct. On Tue, 8 Jan 2008, bugme-daemon@bugzilla.kernel.org wrote: > > 3.) Problem goes away with one of the following command line option combos: > > A) "nohz=off highres=off" > > B) "hpet=disable nohz=off" > > C) "nolapic_timer" > > This is entirely correct. And in all cases you can continue by pressing keys - if I understand correctly it needs to be some special function key, right ? After that the kernel works fine (no strange behaviour, slowness, ...) ? Thanks, tglx (In reply to comment #46) > And in all cases you can continue by pressing keys - if I understand > correctly it needs to be some special function key, right ? No, only in the case of s2ram it needs to be a special key (Fn+F7, or Fn+F3, or Fn+{WhatHaveYou}; a combo of your average pair of special laptop keys). The wait with s2disk can be shortened by pressing a couple of 'any' keys. > After that the kernel works fine (no strange behaviour, slowness, ...) ? One thing I have sometimes noticed (also occurred with nolapic_timer, and some other combinations of kernel-parameters I don't quite remember) is that the special thinkpad fancontrol script seems to lose control over the fan, but restarting it makes it work again. But that's probably too particular to be of any use. I admit that the symptoms are quite varied, and I don't always remember all situations and symptoms. But I try to be as precise as possible :-) (In reply to comment #47) > (In reply to comment #46) > > After that the kernel works fine (no strange behaviour, slowness, ...) ? I meant to say - in general, no, but I don't tax this box very much, so I might have missed it. > One thing I have sometimes noticed (also occurred with nolapic_timer, and > some > other combinations of kernel-parameters I don't quite remember) is that the > special thinkpad fancontrol script seems to lose control over the fan, but > restarting it makes it work again. But that's probably too particular to be > of > any use. To clarify, this script comes from thinkwiki.org, it's not officially delivered by lenovo with the laptop. > > > After that the kernel works fine (no strange behaviour, slowness, ...) ? > > I meant to say - in general, no, but I don't tax this box very much, so I > might > have missed it. At least it is nothing obvious. > > One thing I have sometimes noticed (also occurred with nolapic_timer, and > some > > other combinations of kernel-parameters I don't quite remember) is that the > > special thinkpad fancontrol script seems to lose control over the fan, but > > restarting it makes it work again. But that's probably too particular to be > of > > any use. > > To clarify, this script comes from thinkwiki.org, it's not officially > delivered > by lenovo with the laptop. Yeah, that's a separate issue probably, but it might be worthwile to disable the thinkpad stuff for a test. Thanks, tglx I did some further testing. First, I switched to 2.6.24-rc7, and I loaded thinkpad_acpi without the fan_control option and without the fancontrol script. Some results: I tested with X loaded but no users logged in, and I could s2disk twice without problem. I then did s2ram -> problem. I then wanted to know whether the problem was with X, so I rebooted to runlevel 3 (multiuser, network, no X), and could s2disk without problem three times, but the fourth time had the wait. Apparently, whether the problem occurs depends on the number of programs running, amount of memory used, or something like that. From all tests until now I'm only entirely confident that: The problem goes away with one of the following command line option combos: A) "nohz=off highres=off" B) "hpet=disable nohz=off" C) "nolapic_timer" With regard to which key to press; I think, but I'm not sure, that I once needed to press some combo of Fn+{$SpecialThinkpadKey} on s2disk, whereas before the rule seemed to be that: 1) the wait on s2disk could be shortened by pressing 'any' keys 2) the wait on s2ram could be shortened by pressing Fn+{$SpecialThinkpadKey} It is still the rule, but I can't rule out the possibility that it has exceptions. Another test. I was able to reproduce the problem at the first time s2disk by loading 'many' documents (21 pdfs, 3 docs in open office, 1 doc in ms-office, crossover, kontact konqueror and then some). I don't think system load was very high, but memory load was about 1 GB (this machine has 1GB of RAM and 1,5 GB of swap). Last test for today; you wrote:
> 1.) If you only do s2ram, then the problem never happens
This still holds. Opened 61 pdfs and compiled a programme while doing s2ram two times - no problem. A total of about 10 times s2ram in a row, still no problem.
In sum, to get all info in one place :-)
1.) If you only do s2ram, then the problem never happens
2.) Apparently, the lower the memory use, the higher the chances of s2disk succeeding normally
3.) Problem goes away with one of the following command line option combos:
A) "nohz=off highres=off"
B) "hpet=disable nohz=off"
C) "nolapic_timer"
4.) After the problems have started, normal operation still continues without obvious performance problems.
5.) I'm _almost_ sure that:
a) the wait on s2disk can be shortened by pressing 'any' keys (3 to 4 times)
b) the wait on s2ram can be shortened by pressing Fn+{$SpecialThinkpadKey} only, also several times (I did not count).
Switched to 2.6.24-rc8, I only tested s2disk twice. Unfortunately, I still had trouble, which occurred when resuming for the second time. I didn't test in detail, I'll wait for the next release to retry. Now on 2.6.24, still no change. Is there anything else I might do to test? Or should I wait for 2.6.25-rc1 and see what that brings? Please apply patches from: http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.24/patches/ on top of 2.6.24 and see if that changes anything. Hi Rafael, I tried it, but had troubles after second time s2disk. Any hopes for trying the -rt patch-series? And could you suggest any debugging options that might yield useful information? By the way, I'm currently only testing whether I can reproduce the problem, I'm not testing all details as I reported in comment #52. Anyway, thanks for the suggestion :-) Please keep alerting me when you need further testing. In the configuration as per Comment #55, please do: # echo core > /sys/power/pm_test # echo mem > /sys/power/state (this will test the suspend/resume sequence with busy-waiting for 5 sec. instead of suspending) # echo mem > /sys/power/state and see if you have problems after that. No problems after that. I had to recompile with CONFIG_PM_DEBUG=y CONFIG_PM_VERBOSE=y and now have huge logs. Do you need any dmesg output? Hm. CONFIG_PM_VERBOSE shouldn't be necessary. I'm starting to think that your box's BIOS wants us to save/restore the ACPI NVS area during hibernation, which we're not doing at the moment. This functionality is going to be added at some point, though. (In reply to comment #59) > Hm. CONFIG_PM_VERBOSE shouldn't be necessary. In fact it wasn't, but I thought I'd be better off safe than sorry :-) I also have _GTF evaluation failed, using T42. Feb 3 19:43:41 alon1 PCI: Setting latency timer of device 0000:00:1f.1 to 64 Feb 3 19:43:41 alon1 ata1.01: _GTF evaluation failed (AE 0x1001) Feb 3 19:43:41 alon1 PM: Writing back config space on device 0000:00:1f.5 at offset 1 (was 2900007, writing 2900003) Feb 3 19:43:41 alon1 ACPI: PCI Interrupt 0000:00:1f.5[B] -> Link [LNKB] -> GSI 5 (level, low) -> IRQ 5 Feb 3 19:43:41 alon1 PCI: Setting latency timer of device 0000:00:1f.5 to 64 Feb 3 19:43:41 alon1 ata2.01: _GTF evaluation failed (AE 0x1001) Feb 3 19:43:41 alon1 ata1.00: ACPI cmd ef/02:00:00:00:00:a0 succeeded Feb 3 19:43:41 alon1 ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out Feb 3 19:43:41 alon1 ata1.00: ACPI cmd ef/03:45:00:00:00:a0 filtered out Feb 3 19:43:41 alon1 ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 filtered out Tested 2.6.25-rc3 today. It appeared harder to reproduce the problem, I did s2disk, s2disk, s2disk, and then s2ram. Problems only occurred on resume from that s2ram. Did not test further, but I did notice the following in dmesg: [ 945.521152] ACPI: EC: missing OBF confirmation, don't expect it any longer. [ 945.605157] ACPI: EC: missing write data confirmation, don't expect it any lo nger. ... [ 956.809345] ACPI: EC: acpi_ec_wait timeout, status = 40, expect_event = 2 [ 956.809349] ACPI: EC: input buffer is not empty, aborting transaction Incidentally, this was also the first time that I had the delays on resume. I'm attaching full logs, for good measure :-) Created attachment 14983 [details]
2.6.25-rc3 dmesg with sequence of s2disk s2disk s2ram s2disk
And another test, 2.6.25-rc8; still the same as -rc3. As with -rc3, at the first time resume is problematic, I find the "ACPI: EC: missing OBF confirmation, don't expect it any longer. ACPI: EC: missing write data confirmation, don't expect it any longer." messages in my logs. Oh well, still hoping it helps :-) One more thing; previously (2.6.24 and earlier) I could reproduce the problem with doing s2disk twice. After I noted that the problem seemed related to memory use, I could reproduce it by doing s2disk the first time, after having loaded a lot of stuff. With 2.6.25-rc*, I wasn't able to do that. I issued s2disk four times in a row and couldn't reproduce the problem. One time I suspended to disk with only 600MB swap left, and s2disk needed ~450MB for its suspend image, still no problems. It seems that now I can only reproduce it by first issuing s2disk (for a couple of times?), then issuing s2ram, and the problem is then reproduced upon resume (from s2_ram_). Created attachment 15581 [details]
test patch
There is a test patch, which might appear in next kernel release. It would be great if you give it a try and check if it helps or not.
Hi Alexey, the patch seems quite promising! I did a string of about 8 s2disk and s2ram, with lots of open pdf's, and could not reproduce the problem! It was a bit disappointing that after about the eighth resume I got a complete system hang, but it only occurred after "[drm] Loading R300 Microcode", so I'm pretty sure that's xorg-related :-) I cannot give you dmesg-output right now, because of that hang, but I did a "grep EC /var/log/messages" , and I found the following. First, without your patch: [ 0.180890] ACPI: EC: EC description table is found, configuring boot EC [ 0.195535] ACPI: EC: non-query interrupt received, switching to interrupt mode [ 0.221563] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62 [ 0.221576] ACPI: EC: driver started in poll mode [ 0.236115] ACPI: EC: non-query interrupt received, switching to interrupt mode [ 4.591992] thinkpad_acpi: ThinkPad BIOS 1YET65WW (1.29 ), EC 1YHT29WW-1.06 ... and this is happens upon resume: [ 1532.037513] ACPI: EC: missing OBF confirmation, don't expect it any longer. [ 1532.053513] ACPI: EC: missing write data confirmation, don't expect it any longer. This is with your patch: [ 0.164259] ACPI: EC: EC description table is found, configuring boot EC [ 0.170870] ACPI: EC: non-query interrupt received, switching to interrupt mode [ 0.192164] ACPI: EC: GPE = 0x1c, I/O: command/status = 0x66, data = 0x62 [ 0.192170] ACPI: EC: driver started in poll mode [ 0.198634] ACPI: EC: non-query interrupt received, switching to interrupt mode [ 4.143958] thinkpad_acpi: ThinkPad BIOS 1YET65WW (1.29 ), EC 1YHT29WW-1.06 And each following line occurs upon successful resume, irrespective of whether it was from ram or disk: [15122.865041] ACPI: EC: non-query interrupt received, switching to interrupt mode [15217.449190] ACPI: EC: non-query interrupt received, switching to interrupt mode [15434.263007] ACPI: EC: non-query interrupt received, switching to interrupt mode [16918.370743] ACPI: EC: non-query interrupt received, switching to interrupt mode [16994.838165] ACPI: EC: non-query interrupt received, switching to interrupt mode [17130.493470] ACPI: EC: non-query interrupt received, switching to interrupt mode [17220.770876] ACPI: EC: non-query interrupt received, switching to interrupt mode [17432.865081] ACPI: EC: non-query interrupt received, switching to interrupt mode I think its pretty safe to assume that this patch fixes the problem. Thanks a lot!! Let me know if there's any more testing you'd like me to do, or information you'd like to have. Hi, I was wondering what the current status of this patch is. Just for your interest, I'm happy to say that I haven't had resume problems since applying it to 2.6.25-rc8! So will it be included in a final 2.6.25? Or do you need any further testing to be done? No, it is considered too big for 2.6.25-rc8... I hope it manages into 2.6.26-rc1. Hi! Unfortunately, after several days without problems, I finally hit a resume problem again, with 2.6.25-rc8, and your patch applied. It is more or less, but not exactly the same problem. This time, I resumed from ram and hit the non-blinking cursor. I started hitting buttons, and then special thinkpad Fn+F? combo's, but this time without any effect (except the screen getting brighter / darker, in the case of fn+pgup/pgdn ;-) ). I did wake it in the end, by hitting the power button several times, after which it woke up and then immediately shut down. I found the following in /var/log/syslog: [132249.337459] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b0=1" [132249.337464] ACPI: EC: read timeout, command = 130 [132252.775383] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" [132252.775387] ACPI: EC: write_cmd timeout, command = 128 [132252.775394] thinkpad_acpi: fan watchdog: error 5 while enabling fan, will try again later... [132255.592950] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" [132255.592954] ACPI: EC: write_cmd timeout, command = 128 [132258.141983] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" [132258.141987] ACPI: EC: finish-write timeout, command = 131 [132258.141995] ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl] [20070126] [132258.142009] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LID_._LID] (Node f780fb70), AE_TIME [132260.100499] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" [132260.100503] ACPI: EC: write_cmd timeout, command = 128 [132260.100510] thinkpad_acpi: fan watchdog: error 5 while enabling fan, will try again later... Apparently, this is much harder to hit. I never had this situation, and the patch corrected all prior "species" of this bug. Let me know if you need more info or testing :-) (but only after the weekend, am gone for a few days). P.S., the shutdown was seemingly due only to the power-button having been pressed, and shutdown proceeded as usual.Apr 16 22:02:50 slacktop kernel: [132249.337459] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b0=1" Apr 16 22:02:50 slacktop kernel: [132249.337464] ACPI: EC: read timeout, command = 130 Apr 16 22:02:50 slacktop kernel: [132252.775383] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" Apr 16 22:02:50 slacktop kernel: [132252.775387] ACPI: EC: write_cmd timeout, command = 128 Apr 16 22:02:50 slacktop kernel: [132252.775394] thinkpad_acpi: fan watchdog: error 5 while enabling fan, will try again later... Apr 16 22:02:50 slacktop kernel: [132255.592950] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" Apr 16 22:02:50 slacktop kernel: [132255.592954] ACPI: EC: write_cmd timeout, command = 128 Apr 16 22:02:50 slacktop kernel: [132258.141983] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" Apr 16 22:02:50 slacktop kernel: [132258.141987] ACPI: EC: finish-write timeout, command = 131 Apr 16 22:02:50 slacktop kernel: [132258.141995] ACPI Exception (evregion-0420): AE_TIME, Returned by Handler for [EmbeddedControl] [20070126] Apr 16 22:02:50 slacktop kernel: [132258.142009] ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.LID_._LID] (Node f780fb70), AE_TIME Apr 16 22:02:50 slacktop kernel: [132260.100499] ACPI: EC: acpi_ec_wait timeout, status = 0x29, event = "b1=0" Apr 16 22:02:50 slacktop kernel: [132260.100503] ACPI: EC: write_cmd timeout, command = 128 Apr 16 22:02:50 slacktop kernel: [132260.100510] thinkpad_acpi: fan watchdog: error 5 while enabling fan, will try again later... Managed to capture full dmesg output with resume problem on 2.6.25 with your patch applied. This time, I did not try to speed up resume by repeatedly pressing the power button. It took a few minutes to resume, after which everything worked as it should. I'll attach the dmesg output. Next thing I'll try will probably be 2.6.26-rc2 or rc3. Reproducing the problem takes a long time, uptime was 5 days (including s2disk / s2ram several times) before I hit the problem again. By the way, should I keep using this bug report to report on this resume problem, or open a new one? The point is that all the symptoms remain the same, only the type of button pushed to speed things up has changed. Created attachment 16097 [details]
dmesg output - 2.6.25 with patch
While trying 2.6.26-rc? I noticed that I still had one not-so-handy command in rc.local, i.e.: echo enable,0xffff > /proc/acpi/ibm/hotkey I removed this and rebooted 2.6.25 with patch, and have not had any problems since, uptime says 14 days up, with multiple s2ram and/or s2disk each day. So I think the bug should be marked as verified. Apologies for the mix-up, I should have tested without any tampering with thinkpad-acpi module options :-) commit 223883b7aafa02410ed2e571d6032c876d0b23b8 Author: Alexey Starikovskiy <astarikovskiy@suse.de> Date: Fri Mar 21 17:07:21 2008 +0300 ACPI: EC: Switch off GPE mode during suspend/resume shipped in 2.6.26-rc1 closed |