Bug 6847 - speedstep_smi vs. ibm_acpi
Summary: speedstep_smi vs. ibm_acpi
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: Drivers
Classification: Unclassified
Component: Platform
Hardware: i386 Linux
Importance: P2 normal
Assignee: acpi_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-07-16 20:54 UTC by Richard Neill
Modified: 2010-10-08 18:17 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.17.1 (mainline)
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
Dmesg and a few other logfiles. (tarball) (11.50 KB, application/octet-stream)
2006-08-02 11:32 UTC, Richard Neill
dmesgs (x5) (5.86 KB, application/octet-stream)
2006-08-02 15:24 UTC, Richard Neill

Description Richard Neill 2006-07-16 20:54:58 UTC
This is really odd.

1)COLD boot (into kernel 2.6.17.1). Log in at VT 1.
2)cat /proc/acpi/battery/BAT0/state
3)Now, you can't write to any of /proc/acpi/, and /dev/nvram no longer reacts to
the buttons being pressed. A cold reboot is required to recover.

BUT

1)COLD boot (into kernel 2.6.17.1)
2)Press Fn-F12...which triggers acpid...which causes a write to /proc/acpi/ibm/light
3)cat /proc/acpi/battery/BAT0/state
4)All is still fine.

You must press the physical button; merely writing to /proc/acpi/ibm/light
is insufficient. It seems that *any* software read from BAT0/state will mess up
acpi/nvram unless at least one hardware write to acpi (a button press) has
previously occurred!

The machine is a thinkpad X20, with kernel 2.6.17.1

In the "broken" state, writing to /proc/acpi/ibm/foo  will just time out, with
no effect. Furthermore, `cat /dev/nvram` no longer changes in response to the
volume buttons etc being pressed.

We're doing something horrid to the BIOS, I think.

Please let me know if there is anything else I can help with. I'm still a kernel
newbie :-)
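
The failing sequence above can be written down as a small script, which may make it easier for others to repeat the test. This is only a sketch: ACPI_ROOT and the function names are assumptions made for illustration, not anything provided by the kernel or by ibm_acpi. The script cannot tell whether the light really came on, so someone still has to look at it.

```shell
#!/bin/sh
# ACPI_ROOT is an assumed override so the steps can be tried against a
# scratch directory; on the affected machine it would be /proc/acpi.
ACPI_ROOT="${ACPI_ROOT:-/proc/acpi}"

# Step 2 of the report: read the battery state once.
read_battery() {
    cat "$ACPI_ROOT/battery/BAT0/state" > /dev/null
}

# Step 3: try to switch the ThinkLight by writing "on" or "off".
set_light() {
    echo "$1" > "$ACPI_ROOT/ibm/light"
}

# Run the failing sequence: battery read first, then the light write.
repro() {
    read_battery || { echo "battery read failed"; return 1; }
    set_light on || { echo "light write failed"; return 1; }
    echo "write returned; check whether the light is actually on"
}

# Usage after a cold boot, before any button is pressed:
#   repro
```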
Comment 1 Richard Neill 2006-07-17 19:09:06 UTC
Also tested with:
  - kernel 2.6.16.20 (self-compiled, from kernel.org) - which fails in the same way
  - kernel 2.6.12.22 (Mandriva distro) - which works OK.


On the thinkpad mailing list, at least one other person (owning a T22) has a
similar issue. However, it does not occur on an A22p.
Comment 2 Vladimir Lebedev 2006-07-21 02:25:44 UTC
Please try the latest kernel - 2.6.18-rc2 is available now.
Comment 3 Vladimir Lebedev 2006-07-28 09:02:01 UTC
Boot the kernel with ec_intr=0 and check whether the problem is still there.
With that option the problem disappears, at least on a T22.

Comment 4 Richard Neill 2006-07-28 18:35:11 UTC
ec_intr=1 doesn't help at all - sorry.

I'll compile 2.6.18 when it's released - I'm afraid I'm somewhat short of time
right now. (unless there's anywhere I can download a pre-compiled kernel just
for testing)
Comment 5 Vladimir Lebedev 2006-07-29 04:56:18 UTC
Why ec_intr=1? Is it a typo?
The ec_intr option is available in 2.6.17 too. Its default value is 1; we need 0!
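
To rule out a mistake on the boot line, the running kernel's command line can be checked directly. The helper below is a sketch; has_boot_param is a made-up name, and it only looks for an exact word in /proc/cmdline (or in another file, if one is given).

```shell
# has_boot_param PARAM [FILE]: succeed if PARAM appears as a whole word
# in the kernel command line. FILE defaults to /proc/cmdline.
has_boot_param() {
    # Put each word on its own line, then look for an exact, literal match.
    tr -s ' \t' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage:
#   has_boot_param ec_intr=0 && echo "booted with ec_intr=0"
```

An exact match is used so that ec_intr=0 is not confused with ec_intr=1 or with a longer parameter that happens to contain the same text.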
Comment 6 Richard Neill 2006-07-29 12:41:23 UTC
Yes, that was a typo! I meant ec_intr=0. However, it makes no difference.

One further piece of information: the testcase I initially gave is for booting
up into runlevel 3. However, the results are different for runlevel 1.

Test a) (as before)  
  Default runlevel is 3. 
  Cold boot. 
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #fails
 
Test b) (as before)
  Default runlevel is 3. 
  Cold boot.
  Press Fn-F12 
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #works

Test c)
  Boot with Linux single
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #works

Test d)
  Boot with Linux single
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #works
  echo off > /proc/acpi/ibm/light       #works
  init 3
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #fails!

Test e)
  Boot with Linux single
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #works  
  echo off > /proc/acpi/ibm/light       #works
  init 3
  Press Fn-F12 
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #works


[The thinklight and the volume buttons seem to both fail at the same time and
the same way. It's easier to test with the light though.]

Test f)
  chkconfig --del acpi; chkconfig --del acpid
  Boot into runlevel 3
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #fails!
[Note that the acpi modules are already modprobed in rc.sysinit]

Test g)
  chkconfig --del cpufreq (acpi,acpid still disabled)
  Boot into runlevel 3
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #WORKS!!


Test h)
  cpufreq disabled, acpi,acpid enabled. 
  runlevel 3.
  cat /proc/acpi/battery/BAT0/state     #works
  echo on > /proc/acpi/ibm/light        #works


Thus, we have the following failure path.

   i) Start the cpufreq service.
   ii) Don't press a button (fn-f12)
   iii)Read from the battery.
Comment 7 Richard Neill 2006-07-29 13:01:19 UTC
Now, we've found the bug!  Trying to get a better handle on it:
Starting cpufreq (the system service) is the cause of the problems.

cpufreq loads the following kernel modules:
   cpufreq_ondemand 
   cpufreq_conservative
   cpufreq_powersave
   speedstep_smi
   speedstep_lib
   freq_table
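
Before each test it may help to record which of these modules are actually present. The sketch below reads /proc/modules; the function names are assumptions for illustration, and anything built into the kernel rather than loaded as a module would not show up there.

```shell
# module_loaded NAME [FILE]: succeed if NAME is listed in FILE
# (default /proc/modules). Dashes are mapped to underscores because
# /proc/modules lists speedstep-smi as speedstep_smi.
module_loaded() {
    name=$(printf '%s\n' "$1" | sed 's/-/_/g')
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# report_cpufreq_modules [FILE]: print one "loaded"/"absent" line for each
# module from the list above.
report_cpufreq_modules() {
    for m in cpufreq_ondemand cpufreq_conservative cpufreq_powersave \
             speedstep_smi speedstep_lib freq_table; do
        if module_loaded "$m" "${1:-/proc/modules}"; then
            echo "$m loaded"
        else
            echo "$m absent"
        fi
    done
}
```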


So, this diagnostic may help.

 Cold boot, runlevel 3, acpi,acpid on, cpufreq not started
 cd /proc/acpi
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works

 modprobe freq_table
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works

 modprobe speedstep_lib
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works

 modprobe speedstep_smi
 cat battery/BAT0/state    #works
 echo on > ibm/light       #FAILS !!!

----
 Cold boot, runlevel 3, acpi,acpid on, cpufreq not started
 cd /proc/acpi
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works

 modprobe cpufreq_ondemand 
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works

 modprobe cpufreq_conservative
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works

 modprobe cpufreq_powersave
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works
 echo off > ibm/light      #works


CONCLUSION: speedstep_smi is to blame.
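
The module-by-module procedure above could be automated roughly as follows. This is a sketch under assumptions: first_bad_module, light_works and LOAD_CMD are hypothetical names, and the check asks the person at the keyboard because a write that does nothing may still look successful to the shell. It stops at the first failure, since later results would not mean much without a cold boot.

```shell
# LOAD_CMD is an assumed hook so that the loader can be replaced when
# trying the script out; by default it is modprobe (needs root).
LOAD_CMD="${LOAD_CMD:-modprobe}"

# Example check for interactive use: switch the light on and ask whether
# it really came on.
light_works() {
    echo on > /proc/acpi/ibm/light
    printf 'Did the light come on? [y/n] ' >&2
    read -r answer
    echo off > /proc/acpi/ibm/light
    [ "$answer" = "y" ]
}

# first_bad_module CHECK MODULE...: load each module in turn and run CHECK
# after each load. Prints the first module after which CHECK fails.
# Returns 1 if CHECK never fails, 2 if a module could not be loaded.
first_bad_module() {
    check="$1"; shift
    for mod in "$@"; do
        $LOAD_CMD "$mod" || { echo "could not load $mod" >&2; return 2; }
        if ! $check; then
            echo "$mod"
            return 0
        fi
    done
    return 1
}

# Usage (as root, after a cold boot):
#   first_bad_module light_works freq_table speedstep_lib speedstep_smi
```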

Further tests:

  Runlevel 3, modprobe everything except speedstep_smi

  modprobe speedstep_smi
  Press Fn-F12
  cat battery/BAT0/state    #works
  echo on > ibm/light       #works
  echo off > ibm/light      #works

  rmmod speedstep_smi
  modprobe speedstep_smi
  [DON'T Press Fn-F12]
  cat battery/BAT0/state    #works
  echo on > ibm/light       #FAILS!
  
  rmmod speedstep_smi does NOT allow recovery except with cold-boot.


So, it's the speedstep_smi module which is to blame, and this only manifests
itself in the (common) case where the battery is read before a button is pressed. 

I think that's as far as I can get...

Richard


Comment 8 Vladimir Lebedev 2006-07-30 13:10:39 UTC
> Yes, that was a typo! I meant ec_intr=0. However, it makes no difference.
Are you sure that you booted the kernel with ec_intr=0?
Please repeat it (the boot and the sequence of commands which results in the
failure) and attach the result and whole 
Comment 9 Richard Neill 2006-08-02 11:30:32 UTC
Re #8, 
By "FAILS", I mean that nothing happens, e.g.
   echo on >/proc/acpi/ibm/light
just doesn't turn the light on. The command takes slightly longer than usual to
complete (about 2 sec, rather than nearly instant), then the prompt returns with
no error message. [I didn't check whether there was an exit code].
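
For future runs, the exit status and the delay could be captured with a small wrapper like the one below. It is a sketch only; timed_write is a made-up name, and date +%s gives whole seconds, which should still be enough to tell a roughly 2 second delay from a nearly instant return.

```shell
# timed_write VALUE FILE: write VALUE to FILE, then print the exit status
# of the write and the elapsed wall-clock time in whole seconds.
timed_write() {
    start=$(date +%s)
    echo "$1" > "$2"
    status=$?
    end=$(date +%s)
    echo "exit=$status elapsed=$((end - start))s"
}

# Usage:
#   timed_write on /proc/acpi/ibm/light
```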

I'm not sure that the T22 has the same problem, although I think it shows the same
symptom. On an A22p, I don't experience this bug, even with speedstep_smi loaded.

Yes I'm sure it was booted with ec_intr=0. I'm repeating the experiment now.

1)Linux2.6.17.1 single

2)cat /proc/cmdline 
   BOOT_IMAGE=Linux2.6.17.1 root=301 inotify splash=verbose panic=60 ec_intr=0 single

3)init 3
 #Services: acpi,acpid on, cpufreq not started

 cd /proc/acpi
 cat battery/BAT0/state    #works
 echo on > ibm/light       #works. Exit code 0
 echo off > ibm/light      #works. Exit code 0

 modprobe speedstep-smi
 cat battery/BAT0/state    #works. Exit code 0
 echo on > ibm/light       #FAILS. Light doesn't come on. Command takes
                           #2 secs to return. Exit code 0.

 dmesg > dmesg.txt
 lsmod > lsmod.txt
 lspci -vv > lspcivv.txt
 cat /var/log/kernel/errors > vlke.txt

Will attach these in a moment.

I hope that's what you need. Let me know if there is anything else I can help with.
Comment 10 Richard Neill 2006-08-02 11:32:46 UTC
Created attachment 8674 [details]
Dmesg and a few other logfiles. (tarball)

See comment #9
Comment 11 Vladimir Lebedev 2006-08-02 13:48:36 UTC
Thanks for the info.

I see that the T22 has a different problem, so I cannot reproduce your situation.

Please perform your experiment without reading the battery files, i.e.
 rmmod battery 
 modprobe speedstep-smi
 echo on > ibm/light
...

Also, please perform your experiment without reading the battery files, but while
reading some other /proc/acpi files (the thermal files, for example).
Comment 12 Vladimir Lebedev 2006-08-02 13:53:59 UTC
Also, please attach the dmesg in both cases - with and without failure as 
separate files.
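
Once both files exist, the lines that appear only in the failing run can be picked out as sketched below. The function names are assumptions for illustration; timestamps are stripped in case the kernel was built to print them, and the output is sorted rather than chronological.

```shell
# strip_stamps FILE: print FILE without a leading "[   12.345678] "
# timestamp on each line, if one is present.
strip_stamps() {
    sed 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*] //' "$1"
}

# dmesg_new_lines GOOD BAD: print the lines that occur in BAD (the failing
# run) but not in GOOD (the working run).
dmesg_new_lines() {
    good=$(mktemp)
    bad=$(mktemp)
    strip_stamps "$1" | sort -u > "$good"
    strip_stamps "$2" | sort -u > "$bad"
    comm -13 "$good" "$bad"    # column 3 suppressed, column 1 suppressed
    rm -f "$good" "$bad"
}

# Usage:
#   dmesg_new_lines dmesg-ok.txt dmesg-fail.txt
```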
Comment 13 Richard Neill 2006-08-02 15:22:33 UTC
OK - here are a few more tests. Always, ec_intr=0, and cold reboots.


Experiment A)

Linux2.6.17.1 single
init 3
rmmod battery
cd /proc/acpi
modprobe speedstep-smi
echo on > ibm/light     #works
dmesg > dmesgA.txt


Experiment B)

Linux2.6.17.1 single
init 3
rmmod battery
cd /proc/acpi
modprobe speedstep-smi
cat ibm/thermal         #works
echo on > ibm/light     #fails
dmesg > dmesgB.txt


Experiment C)

Linux2.6.17.1 single
init 3
rmmod battery
cd /proc/acpi
#modprobe speedstep-smi #NOTE: I did not load speedstep-smi.
cat ibm/thermal         #works
echo on > ibm/light     #works
dmesg > dmesgC.txt


Experiment D)

Linux2.6.17.1 single
init 3
rmmod battery
cd /proc/acpi
modprobe speedstep-smi 
echo on > ibm/light     #works
echo off > ibm/light    #works
cat ibm/thermal         #works
echo on > ibm/light     #fails
dmesg > dmesgD.txt


Experiment E)

Linux2.6.17.1 single
init 3
rmmod battery
cd /proc/acpi
modprobe speedstep-smi 
Press Fn-F12. 
   This triggers acpid, which responds to /etc/acpi/events/fn-f12
   by running /etc/acpi/actions/fn-f12.sh, which flashes 
   the thinklight 3 times.   #works
cat ibm/thermal         #works
echo on > ibm/light     #works
dmesg > dmesgE.txt

Thus, it really does appear that speedstep-smi MUST be followed by a physical
button press before anything in /proc/acpi is read. Otherwise, any subsequent
writes to /proc/acpi/ will fail.
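
For reference, an action script of the kind used in Experiment E might look roughly like this. The real /etc/acpi/actions/fn-f12.sh is not shown in this report, so this is only an assumed reconstruction; LIGHT and flash_light are names chosen for illustration.

```shell
#!/bin/sh
# LIGHT is an assumed override; on the ThinkPad it is the ibm_acpi
# light control file.
LIGHT="${LIGHT:-/proc/acpi/ibm/light}"

# flash_light [COUNT] [DELAY]: switch the light on and off COUNT times
# (default 3), pausing DELAY seconds (default 1) after each change.
flash_light() {
    count="${1:-3}"
    delay="${2:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo on > "$LIGHT"
        sleep "$delay"
        echo off > "$LIGHT"
        sleep "$delay"
        i=$((i + 1))
    done
}

# Usage, e.g. from an acpid action:
#   flash_light 3 1
```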


Attachment to follow (is there any way to add an attachment to a comment?).
Comment 14 Richard Neill 2006-08-02 15:24:23 UTC
Created attachment 8683 [details]
dmesgs (x5)

See comment 13
Comment 15 Vladimir Lebedev 2006-08-03 06:20:07 UTC
The latest experiment shows that this is not a "power battery" bug.
The bug is reassigned to component "Other".
Comment 16 Len Brown 2006-08-09 23:59:35 UTC
Is it possible to reproduce the failure without the ibm_acpi module loaded?
Comment 17 Adrian Bunk 2007-10-06 12:47:12 UTC
Please reopen this bug if:
- it is still present with kernel 2.6.22 and
- you can provide the requested information.
