This is really odd.

1) COLD boot (into kernel 2.6.17.1). Log in at VT 1.
2) cat /proc/acpi/battery/BAT0/state
3) Now, you can't write to any of /proc/acpi/, and /dev/nvram no longer reacts to the buttons being pressed. A cold reboot is required to recover.

BUT

1) COLD boot (into kernel 2.6.17.1)
2) Press Fn-F12...which triggers acpid...which causes a write to /proc/acpi/ibm/light
3) cat /proc/acpi/battery/BAT0/state
4) All is still fine.

You must press the physical button; merely writing to /proc/acpi/ibm/light is insufficient. It seems that *any* software read from BAT0/state will mess up acpi/nvram unless at least one hardware write to acpi (a button press) has previously occurred!

The machine is a ThinkPad X20, with kernel 2.6.17.1.

In the "broken" state, writing to /proc/acpi/ibm/foo will just time out, with no effect. Furthermore, `cat /dev/nvram` no longer changes in response to the volume buttons etc. being pressed. We're doing something horrid to the BIOS, I think.

Please let me know if there is anything else I can help with. I'm still a kernel newbie :-)
Also tested with:
- kernel 2.6.16.20 (self-compiled, from kernel.org) - which fails in the same way
- kernel 2.6.12.22 (Mandriva distro) - which works OK.

On the thinkpad mailing list, at least one other person (owning a T22) has a similar issue. However, it does not occur on an A22p.
Please try the latest kernel - 2.6.18-rc2 is available now.
Boot the kernel with ec_intr=0 and check whether the problem is still there; it disappears at least on the T22.
ec_intr=1 doesn't help at all - sorry. I'll compile 2.6.18 when it's released - I'm afraid I'm somewhat short of time right now. (unless there's anywhere I can download a pre-compiled kernel just for testing)
Why ec_intr=1? Is it a typo? ec_intr is available in 2.6.17 too; its default value is 1, and we need 0!
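To rule out a typo in the boot line, the running kernel's command line can be checked directly. The following is only a small sketch; the helper name has_boot_param is made up for illustration and is not part of any tool mentioned in this thread.

```shell
# has_boot_param PARAM [FILE]
# Hypothetical helper: succeeds if PARAM appears as a whole word on the
# kernel command line. FILE defaults to /proc/cmdline; passing another
# file is only meant for testing.
has_boot_param() {
    # One parameter per line, then look for an exact, literal match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}
```

For example, `has_boot_param ec_intr=0 && echo "ec_intr=0 is set"` prints the message only when the kernel was really booted with that option.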
Yes, that was a typo! I meant ec_intr=0. However, it makes no difference.

One further piece of information: the testcase I initially gave is for booting up into runlevel 3. However, the results are different for runlevel 1.

Test a) (as before) Default runlevel is 3. Cold boot.
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #fails

Test b) (as before) Default runlevel is 3. Cold boot.
  Press Fn-F12
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #works

Test c) Boot with Linux single
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #works

Test d) Boot with Linux single
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #works
  echo off > /proc/acpi/ibm/light     #works
  init 3
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #fails!

Test e) Boot with Linux single
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #works
  echo off > /proc/acpi/ibm/light     #works
  init 3
  Press Fn-F12
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #works

[The thinklight and the volume buttons seem to both fail at the same time and the same way. It's easier to test with the light though.]

Test f) chkconfig --del acpi; chkconfig --del acpid
  Boot into runlevel 3
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #fails!
  [Note that the acpi modules are already modprobed in rc.sysinit]

Test g) chkconfig --del cpufreq (acpi,acpid still disabled)
  Boot into runlevel 3
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #WORKS!!

Test h) cpufreq disabled, acpi,acpid enabled. runlevel 3.
  cat /proc/acpi/battery/BAT0/state   #works
  echo on > /proc/acpi/ibm/light      #works

Thus, we have the following failure path.
  i) Start the cpufreq service.
  ii) Don't press a button (Fn-F12)
  iii) Read from the battery.
Now, we've found the bug! Trying to get a better handle on it:

Starting cpufreq (the system service) is the cause of the problems. cpufreq loads the following kernel modules:
  cpufreq_ondemand
  cpufreq_conservative
  cpufreq_powersave
  speedstep_smi
  speedstep_lib
  freq_table

So, this diagnostic may help.

Cold boot, runlevel 3, acpi,acpid on, cpufreq not started
  cd /proc/acpi
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  modprobe freq_table
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  modprobe speedstep_lib
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  modprobe speedstep_smi
  cat battery/BAT0/state   #works
  echo on > ibm/light      #FAILS !!!

----

Cold boot, runlevel 3, acpi,acpid on, cpufreq not started
  cd /proc/acpi
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  modprobe cpufreq_ondemand
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  modprobe cpufreq_conservative
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  modprobe cpufreq_powersave
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works

CONCLUSION: speedstep_smi is to blame.

Further tests:

Runlevel 3, modprobe everything except speedstep_smi
  modprobe speedstep_smi
  Press Fn-F12
  cat battery/BAT0/state   #works
  echo on > ibm/light      #works
  echo off > ibm/light     #works
  rmmod speedstep_smi
  modprobe speedstep_smi
  [DON'T Press Fn-F12]
  cat battery/BAT0/state   #works
  echo on > ibm/light      #FAILS!

rmmod speedstep_smi does NOT allow recovery except with cold-boot.

So, it's the speedstep_smi module which is to blame, and this only manifests itself in the (common) case where the battery is read before a button is pressed.

I think that's as far as I can get...

Richard
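The load-one-module-then-check routine above could be wrapped in a small helper for anyone repeating it on another machine. This is only a sketch: first_bad_module and the LOADER variable are hypothetical names, and the check command is left to the caller, because whether the light really came on has to be judged by eye (or by some other test of your own choosing).

```shell
# Command used to load a module; overridable so the loop can be dry-run.
LOADER=${LOADER:-modprobe}

# first_bad_module CHECK MODULE...
# Hypothetical helper: loads each MODULE in order and runs CHECK (the name
# of a command or shell function) after each load. Prints the first module
# after which CHECK fails and returns 1, or prints "none" and returns 0.
first_bad_module() {
    check=$1
    shift
    for mod in "$@"; do
        $LOADER "$mod" || { echo "load-failed:$mod"; return 2; }
        if ! $check; then
            echo "$mod"
            return 1
        fi
    done
    echo "none"
    return 0
}
```

With a caller-supplied function light_ok (for instance one that toggles ibm/light and asks you whether the light came on), `first_bad_module light_ok freq_table speedstep_lib speedstep_smi` would repeat the first sequence above. Since recovery needs a cold boot, the loop stops at the first failure.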
> Yes, that was a typo! I meant ec_intr=0. However, it makes no difference. Are you sure that you boot the kernel with ec_intr=0? Please repeat it (booting and the sequence of commands which result failure) and attach the result and whole
Re #8: By "FAILS", I mean that nothing happens, e.g. echo on > /proc/acpi/ibm/light just doesn't turn the light on. The command takes slightly longer than usual to complete (about 2 sec, rather than nearly instant), then the prompt returns with no error message. [I didn't check whether there was an exit code.]

I'm not sure that the T22 had the same problem - although I think it is the same symptom. On an A22p, I don't experience this bug, even with speedstep_smi.

Yes, I'm sure it was booted with ec_intr=0. I'm repeating the experiment now.

1) Linux2.6.17.1 single
2) cat /proc/cmdline
   BOOT_IMAGE=Linux2.6.17.1 root=301 inotify splash=verbose panic=60 ec_intr=0 single
3) init 3
   #Services: acpi,acpid on, cpufreq not started
   cd /proc/acpi
   cat battery/BAT0/state   #works
   echo on > ibm/light      #works. Exit code 0
   echo off > ibm/light     #works. Exit code 0
   modprobe speedstep-smi
   cat battery/BAT0/state   #works. Exit code 0
   echo on > ibm/light      #FAILS. Light doesn't come on. Command takes
                            #2 secs to return. Exit code 0.
   dmesg > dmesg.txt
   lsmod > lsmod.txt
   lspci -vv > lspcivv.txt
   cat /var/log/kernel/errors > vlke.txt

Will attach these in a moment. I hope that's what you need. Let me know if there is anything else I can help with.
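Since the failing write still returns exit code 0 and only the roughly 2-second delay gives it away, it may help to record both values for every write. A minimal sketch follows; timed_write is a made-up helper name, and it relies on `date +%s`, which is widely available but an extension to POSIX.

```shell
# timed_write VALUE PATH
# Hypothetical helper: writes VALUE to PATH, then prints the exit code of
# the write and the elapsed time in whole seconds, e.g. "exit=0 secs=2".
timed_write() {
    start=$(date +%s)
    rc=0
    echo "$1" > "$2" || rc=$?
    end=$(date +%s)
    echo "exit=$rc secs=$((end - start))"
}
```

For example, `timed_write on /proc/acpi/ibm/light` run before and after `modprobe speedstep-smi` would show whether the delay appears. The resolution is one second, so a nearly instant write normally reports secs=0.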
Created attachment 8674: Dmesg and a few other logfiles (tarball). See comment #9.
Thanks for the info. I see that the T22 has another problem, so I cannot reproduce your situation. Please perform your experiment without reading the battery files, i.e.

  rmmod battery
  modprobe speedstep-smi
  echo on > ibm/light
  ...

Also, please perform your experiment without reading the battery files, but while reading some other /proc/acpi files (thermal files, for example).
Also, please attach the dmesg in both cases - with and without failure as separate files.
OK - here are a few more tests. Always, ec_intr=0, and cold reboots.

Experiment A)
  Linux2.6.17.1 single
  init 3
  rmmod battery
  cd /proc/acpi
  modprobe speedstep-smi
  echo on > ibm/light    #works
  dmesg > dmesgA.txt

Experiment B)
  Linux2.6.17.1 single
  init 3
  rmmod battery
  cd /proc/acpi
  modprobe speedstep-smi
  cat ibm/thermal        #works
  echo on > ibm/light    #fails
  dmesg > dmesgB.txt

Experiment C)
  Linux2.6.17.1 single
  init 3
  rmmod battery
  cd /proc/acpi
  #modprobe speedstep-smi   #NOTE: I did not load speedstep-smi.
  cat ibm/thermal        #works
  echo on > ibm/light    #works
  dmesg > dmesgC.txt

Experiment D)
  Linux2.6.17.1 single
  init 3
  rmmod battery
  cd /proc/acpi
  modprobe speedstep-smi
  echo on > ibm/light    #works
  echo off > ibm/light   #works
  cat ibm/thermal        #works
  echo on > ibm/light    #fails
  dmesg > dmesgD.txt

Experiment E)
  Linux2.6.17.1 single
  init 3
  rmmod battery
  cd /proc/acpi
  modprobe speedstep-smi
  Press Fn-F12. This triggers acpid, which responds to /etc/acpi/events/fn-f12 by running /etc/acpi/actions/fn-f12.sh, which flashes the thinklight 3 times.   #works
  cat ibm/thermal        #works
  echo on > ibm/light    #works
  dmesg > dmesgE.txt

Thus, it really does appear that loading speedstep-smi MUST be followed by a physical button press before anything in /proc/acpi is read. Otherwise, any subsequent writes to /proc/acpi/ will fail.

Attachment to follow (is there any way to add an attachment to a comment?).
Created attachment 8683: dmesgs (x5). See comment 13.
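To compare a failing log against a working one (for example dmesgB.txt against dmesgA.txt), something like the following could be used to list the messages that appear only in the failing case. It is a rough sketch: both function names are made up, and the timestamp stripping assumes the optional "[ seconds ]" prefix that the kernel adds when printk timestamps are enabled.

```shell
# strip_ts FILE
# Hypothetical helper: prints FILE with any leading "[ 12.345678] " style
# printk timestamp removed from each line.
strip_ts() {
    sed 's/^\[ *[0-9][0-9.]*\] //' "$1"
}

# lines_only_in_first FAILING WORKING
# Hypothetical helper: prints the lines of FAILING that do not occur
# anywhere in WORKING, comparing whole lines literally.
lines_only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    strip_ts "$1" > "$a"
    strip_ts "$2" > "$b"
    # grep exits non-zero when nothing differs; that is not an error here.
    grep -Fxv -f "$b" "$a" || true
    rm -f "$a" "$b"
}
```

Running `lines_only_in_first dmesgB.txt dmesgA.txt` would then print only the kernel messages unique to the failing run, if there are any.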
The latest experiment shows that this is not a "power battery" bug. The bug is assigned to component "Other".
Is it possible to reproduce the failure without the ibm_acpi module loaded?
Please reopen this bug if: - it is still present with kernel 2.6.22 and - you can provide the requested information.