Bug 3191 - SMBus unhiding in drivers/pci/quirks.c causes ACPI to use wrong IO ports and fail
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Config-Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: Shaohua
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-11 12:46 UTC by Eric Valette
Modified: 2004-10-21 17:28 UTC
5 users

See Also:
Kernel Version: 2.6.8-rc4-mm1 2.6.8.1-mm1 2.6.8.1-mm4
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
acpidump output bzipped (10.64 KB, application/octet-stream)
2004-08-12 02:23 UTC, Eric Valette
2.6.8-rc3-mm1 dmesg (15.14 KB, text/plain)
2004-08-12 02:25 UTC, Eric Valette
lspci -vv for ASUS L3800C (7.94 KB, text/plain)
2004-08-13 03:28 UTC, Eric Valette
Fix the temperature problem for L3C on mm tree (440 bytes, patch)
2004-08-18 04:05 UTC, Eric Valette
Ioport configuration with ACPI=OFF boot parameter and ISA and I2C_I801 not configured (969 bytes, text/plain)
2004-08-19 08:57 UTC, Eric Valette
workaround patch (939 bytes, patch)
2004-08-19 19:20 UTC, Shaohua
io ports with ACPI_OFF_SMB_ON_I2C_I801_ON_ISA_ON (994 bytes, text/plain)
2004-08-20 02:02 UTC, Eric Valette
ioports with ACPI_ON_SMB_ON_I2C_I801_ON_ISA_ON (1.21 KB, text/plain)
2004-08-20 02:04 UTC, Eric Valette
/proc/interrupts on 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration (538 bytes, text/plain)
2004-08-20 02:58 UTC, Eric Valette
/proc/ioports on 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration (1.21 KB, text/plain)
2004-08-20 03:00 UTC, Eric Valette
dmesg for 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration (15.13 KB, text/plain)
2004-08-20 03:02 UTC, Eric Valette
HP compaq nc8000 also broken (2.44 KB, text/plain)
2004-08-20 12:34 UTC, Eric Valette
Manual (setpci) L3C SMBus enabling + SMBus initial IO port base value (2.79 KB, text/plain)
2004-08-20 13:05 UTC, Eric Valette
debug patch (1.20 KB, patch)
2004-08-22 22:34 UTC, Shaohua
dmesg with trace for PCI iospace allocation (15.13 KB, text/plain)
2004-08-23 01:11 UTC, Eric Valette
ioports with ISA and BIOS PNP (1.04 KB, text/plain)
2004-08-23 02:30 UTC, Eric Valette
proposed patch (1.70 KB, patch)
2004-08-23 03:02 UTC, Shaohua
Correct fix for ASUS L3C (1.76 KB, patch)
2004-08-23 04:07 UTC, Eric Valette
Ioports with fixed patch (1.26 KB, text/plain)
2004-08-23 04:12 UTC, Eric Valette
lspci output (1.23 KB, text/plain)
2004-08-23 04:14 UTC, Eric Valette
Patch updated to fix HP Compaq nc8000 (1.87 KB, patch)
2004-08-23 05:37 UTC, Eric Valette
Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP Compaq nc8000 (2.69 KB, patch)
2004-08-23 07:51 UTC, Eric Valette
Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP Compaq nc8000 (3.77 KB, patch)
2004-08-24 02:42 UTC, Eric Valette
requested dmesg, after S3 and swsusp (13.99 KB, text/plain)
2004-10-19 14:24 UTC, Karol Kozimor

Description Eric Valette 2004-08-11 12:46:09 UTC
Distribution: Debian sid
Hardware Environment: Asus L3800C
Software Environment: gcc 3.4 compiler
Problem Description:

On the same machine, the dmesg output for 2.6.8-rc3-mm1 says:

ACPI : Thermal Zone [THRM] (53 C) 

which is correct, while 2.6.8-rc4-mm1 prints

ACPI : Thermal Zone [THRM] (-123 C)

This is obviously incorrect. Furthermore, it makes the fan so noisy that,
even if booting finished, I could not stand it for long.





Steps to reproduce:
Comment 1 Wang, Zhenyu Z 2004-08-11 20:00:28 UTC
Please revert the two patches below and try.

http://linux-acpi.bkbits.net:8080/linux-acpi-test-2.6.8/cset@1.1731.1.10
http://linux-acpi.bkbits.net:8080/linux-acpi-test-2.6.8/cset@1.1731.1.9

But I can't see how these patches would cause trouble...

-zhen
Comment 2 Eric Valette 2004-08-12 00:57:26 UTC
The problem is that those patches are already in 2.6.8-rc3-mm1, as shown below:

patch -p1 --dry-run -i ../patchThermal1.patch 
patching file drivers/acpi/thermal.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
5 out of 5 hunks ignored -- saving rejects to file drivers/acpi/thermal.c.rej
valette@tri-yann:/usr/src/linux$ patch -p1 --dry-run -i ../patchThermal2.patch 
patching file drivers/acpi/thermal.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
3 out of 3 hunks ignored -- saving rejects to file drivers/acpi/thermal.c.rej
valette@tri-yann:/usr/src/linux$ uname -a
Linux tri-yann 2.6.8-rc3-mm1 #22 Sun Aug 8 23:59:53 CEST 2004 i686 GNU/Linux
valette@tri-yann:/usr/src/linux$ 
Comment 3 Wang, Zhenyu Z 2004-08-12 01:10:14 UTC
Yeah, these patches are also in 2.6.8-rc4-mm1, right?
So try to revert them with "patch -R -p1 -i xxx.patch", then recompile, reboot
and see if the result changes.

-zhen
Comment 4 Eric Valette 2004-08-12 01:21:43 UTC
As the two patches are in both kernels, how do you think they could be the cause
of the problem? The hardware is the same and the problem appears only on
2.6.8-rc4-mm1.

Each time I boot 2.6.8-rc4-mm1, I trash my filesystems, so I would prefer to
avoid booting for nothing... A check for a reasonable temperature range would
avoid trashing the system...
Comment 5 Wang, Zhenyu Z 2004-08-12 02:03:16 UTC
Fine, please attach your acpidump output and dmesg after boot.
Comment 6 Wang, Zhenyu Z 2004-08-12 02:07:57 UTC
Does 2.6.8-rc4 have this strange issue?
Comment 7 Eric Valette 2004-08-12 02:23:23 UTC
Created attachment 3490
acpidump output bzipped
Comment 8 Eric Valette 2004-08-12 02:25:18 UTC
Created attachment 3491
2.6.8-rc3-mm1 dmesg
Comment 9 Eric Valette 2004-08-12 02:28:17 UTC
As explained, I cannot simply boot 2.6.8-rc4-mm1. It freezes before I have a
chance to store my dmesg somewhere (even with init=/bin/sh). I get the prompt
but cannot type anything...

I'm currently compiling rc4 and will try applying only the ACPI-related patches.
Comment 10 Eric Valette 2004-08-12 02:35:17 UTC
Plain rc4 works. I will manually apply only the ACPI patches contained in
2.6.8-rc4-mm1, that is:

bk-acpi.patch
remove-unconditional-pci-acpi-irq-routing.patch

Note that I had already reverted the last one, without success, on the original
2.6.8-rc4-mm1...
Comment 11 Wang, Zhenyu Z 2004-08-12 02:58:34 UTC
This is the Asus P4_L3CS. Another person on acpi-devel suffers from this too.

What's the output of "cat /proc/acpi/thermal_zone/THRM/*"?
Comment 12 Eric Valette 2004-08-12 03:09:03 UTC
On 2.6.8-rc3-mm1

cat /proc/acpi/thermal_zone/THRM/*
cooling mode:   active
<polling disabled>
state:                   active[0]
temperature:             49 C
critical (S5):           95 C
passive:                 90 C: tc1=1 tc2=1 tsp=100 devices=0xeffe4ba8 
active[0]:               40 C: devices=0xeffd9c28

Next boot I will give the values for 2.6.8-rc4-mm1
Comment 13 Eric Valette 2004-08-12 03:17:48 UTC
values for 2.6.8-rc4 + bk-acpi.patch +
remove-unconditional-pci-acpi-irq-routing.patch

cat /proc/acpi/thermal_zone/THRM/*
cooling mode:   active
<polling disabled>
state:                   active[0]
temperature:             51 C
critical (S5):           95 C
passive:                 90 C: tc1=1 tc2=1 tsp=100 devices=0xeff2e7a8 
active[0]:               40 C: devices=0xeff3a828 

So where do we go from here? The latest ACPI code by itself does not seem to be the
only culprit. So how can the value be corrupted? Could a safety test be added to
the code so that negative and > 100
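Eric's proposed safety test could look roughly like the C sketch below. It assumes, as the ACPI specification defines for _TMP, that the raw value is in tenths of a kelvin; the 0 to 100 C plausibility window and the function names are hypothetical choices for illustration, not code from drivers/acpi/thermal.c.

```c
#include <stdbool.h>

/* Hypothetical plausible range for a laptop thermal zone, in degrees C.
 * These bounds are an assumption, not values from ACPI or thermal.c. */
#define THRM_MIN_C 0
#define THRM_MAX_C 100

/* Convert an ACPI _TMP value (tenths of a kelvin) to whole degrees
 * Celsius, rounding half away from zero. */
static long deci_kelvin_to_celsius(long dk)
{
    long tenths = dk - 2732;            /* tenths of a degree Celsius */
    return tenths >= 0 ? (tenths + 5) / 10 : (tenths - 5) / 10;
}

/* Store the temperature and return true if the reading looks sane;
 * return false for readings outside [THRM_MIN_C, THRM_MAX_C]. */
static bool thermal_reading_plausible(long dk, long *celsius)
{
    long c = deci_kelvin_to_celsius(dk);

    if (c < THRM_MIN_C || c > THRM_MAX_C)
        return false;
    *celsius = c;
    return true;
}
```

With such a check, the bogus -123 C reading (a raw value of 1502) would be rejected while 53 C (raw 3262) passes. Rejecting the value only hides the symptom; it says nothing about where the bad reading comes from.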
Comment 14 Wang, Zhenyu Z 2004-08-12 20:10:33 UTC
Good effort on testing!
What about the thermal zone message at boot?
Is it still wrong with 2.6.8-rc4+bk-acpi.patch?
I am just wondering why rc3-mm1 does not encounter this...
Comment 15 Eric Valette 2004-08-13 01:00:48 UTC
The boot-time thermal zone message is correct on 2.6.8-rc4 + bk-acpi.patch. So my
question is: where do you get the value from, and how can it be corrupted...

Comment 16 Wang, Zhenyu Z 2004-08-13 01:37:01 UTC
Temperature is read by evaluating the _TMP method inside the ThermalZone.
For your dsdt, it runs:
   Store(\_SB.PCI0.PX40.RTMP(), Local0)
   Return(\_TZ_.KELV(Local0)) // seems it uses Celsius, then converts
In RTMP(), it calculates the average temperature value after 3 probes.
Could you attach your "lspci -vv" output? I want to know what is on PCI0.PX40
Maybe you can revert bk-pci.patch from rc4-mm1 and try. It has a quirk for the
Asus L3C.

-zhen
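To make the _TMP path described in comment 16 concrete, here is a C model of what the AML is said to do: average three probe readings in RTMP(), then convert Celsius to the tenths-of-kelvin unit that _TMP returns via KELV. The AML itself is not quoted in this report, so the integer averaging, the probe callback and all helper names are assumptions.

```c
/* Hypothetical stand-in for one raw sensor read, in whole degrees C. */
typedef int (*probe_fn)(void *ctx);

/* Average three probes, as RTMP() is described as doing. Integer
 * division is an assumption; the real AML is not shown here. */
static int rtmp_average(probe_fn probe, void *ctx)
{
    int sum = 0;

    for (int i = 0; i < 3; i++)
        sum += probe(ctx);
    return sum / 3;
}

/* Celsius to tenths of a kelvin, the unit ACPI _TMP returns. */
static int kelv(int celsius)
{
    return celsius * 10 + 2732;
}

/* Demo helper: returns successive values from an int array. */
struct sample_seq {
    const int *vals;
    int pos;
};

static int next_sample(void *ctx)
{
    struct sample_seq *s = ctx;
    return s->vals[s->pos++];
}
```

The relevance to this bug: if the probes read registers that are no longer where the AML expects them, the average is garbage, and KELV faithfully converts that garbage.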
Comment 17 Eric Valette 2004-08-13 03:28:13 UTC
Created attachment 3498
lspci -vv for ASUS L3800C
Comment 18 Eric Valette 2004-08-13 03:32:08 UTC
Well, I'm curious because 2.6.8-rc3-mm1 and 2.6.8-rc4-mm1 are so close in the
bk-acpi BitKeeper tree... At least it now seems certain that bk-acpi.patch by
itself is not the _single_ root cause of the problem, as applied alone it works.
Maybe other PCI, ACPI or IRQ tweaks are also problematic...

--eric
Comment 19 Eric Valette 2004-08-17 03:14:36 UTC
This morning I tested 2.6.8.1-mm1, with the same result regarding the temperature
but slightly different overall usability: the system is very slow but alive.
I guess I'm now in the case Karol Kozimor reported on LKML.

NB: 2.6.8.1 + acpi-20040715-2.6.8.diff is OK.
Comment 20 Wang, Zhenyu Z 2004-08-17 20:32:53 UTC
Do you see the temperature issue if you apply 2.6.8.1 + acpi-patch + bk-pci.patch?
That was my question in comment 16.
Comment 21 Eric Valette 2004-08-18 00:13:40 UTC
Zen>Do you encounter temperature issue if apply 2.6.8.1+ acpi-patch +
Zen> bk-pci.patch? as my question in comment 16

I guess you mean 2.6.8.1 - bk-pci.patch + acpi-patch, right?
Comment 22 Wang, Zhenyu Z 2004-08-18 00:47:19 UTC
bk-acpi.patch in 2.6.8.1-mm1 is just a small build fix.
My aim is to ask for your help in trying 2.6.8.1 + bk-acpi.patch + bk-pci.patch.
bk-acpi.patch and bk-pci.patch are both at
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-mm1/broken-out/
I've seen some quirks for the Asus L3C in mm's bk-pci.patch; I want to see
if that quirk breaks ACPI.

thanks,
-zhen
Comment 23 Eric Valette 2004-08-18 01:49:33 UTC
Except that it does not work, as bk-acpi.patch depends on linus.patch, which
contains the new ACPI code. So I will try linus.patch + bk-acpi.patch +
bk-pci.patch. OK?
Comment 24 Eric Valette 2004-08-18 02:38:28 UTC
OK zhen, your suspicion was right: it's the bk-pci.patch contained in 2.6.8.1-mm1 that
breaks the thermal support on my ASUS L3C (and makes it totally unresponsive).
I've seen that part of the ASUS L3C SMBus fixup was done by Karol
Kozimor, who owns an L3C himself, so maybe he broke it or, more likely, it is
completely unrelated, as he has the hardware to test his modifications.

I will try to back out the L3C-related SMBus fixup just to be sure...
Comment 25 Karol Kozimor 2004-08-18 02:51:14 UTC
Both 2.6.7 and 2.6.8-rc3-mm1 worked fine with my fixup; the code it touches is
actually quite straightforward (it simply enables the SMBus bridge that ASUS
otherwise hides).
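A user-space model of what "enabling the SMBus bridge that ASUS otherwise hides" amounts to: clearing one hide bit in the LPC bridge's configuration space. The register offset (0xF2) and bit (0x8) are assumptions used for illustration; check them against the ICH datasheet and the actual quirk in drivers/pci/quirks.c. The model also shows that the unhide itself touches nothing else, so any later move of the SMBus I/O range would have to come from PCI resource assignment.

```c
#include <stdint.h>

/* Simulated 256-byte PCI configuration space of the LPC bridge. */
struct fake_cfg {
    uint8_t bytes[256];
};

static uint16_t cfg_read16(const struct fake_cfg *c, unsigned off)
{
    return (uint16_t)(c->bytes[off] | (c->bytes[off + 1] << 8));
}

static void cfg_write16(struct fake_cfg *c, unsigned off, uint16_t v)
{
    c->bytes[off] = (uint8_t)(v & 0xff);
    c->bytes[off + 1] = (uint8_t)(v >> 8);
}

/* ASSUMPTION: offset 0xF2, bit 3 models the "hide SMBus function"
 * control. Verify against the datasheet before relying on it. */
#define LPC_HIDE_REG   0xF2
#define LPC_HIDE_SMBUS 0x0008

/* Clear the hide bit, preserving the other bits in the register.
 * Returns 1 if the SMBus function is now visible, 0 otherwise. */
static int unhide_smbus(struct fake_cfg *c)
{
    uint16_t val = cfg_read16(c, LPC_HIDE_REG);

    if (val & LPC_HIDE_SMBUS)
        cfg_write16(c, LPC_HIDE_REG, (uint16_t)(val & ~LPC_HIDE_SMBUS));
    return !(cfg_read16(c, LPC_HIDE_REG) & LPC_HIDE_SMBUS);
}
```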
Comment 26 Eric Valette 2004-08-18 03:06:08 UTC
Unfortunately, it is also exactly what breaks 2.6.8.1-mm1 as far as temperature
is concerned.

NB: on another ASUS motherboard (an old A7V that is still my desktop because of
its powerful SCSI controllers), the SMBus IO space was also wrongly reserved
exclusively by ACPI, and thus the I2C code trying to claim the SMBus IO space
was failing... See <http://bugme.osdl.org/show_bug.cgi?id=3049>

Just removing the line for L3C in drivers/pci/quirks.c fixes the problem. Try it
yourself.
Comment 27 Karol Kozimor 2004-08-18 03:47:35 UTC
So that's the quirk + the rest of bk-pci that breaks it (resource management 
code?). I'll try to post /proc/ioports and iomem when I get to the laptop. 
Comment 28 Eric Valette 2004-08-18 04:05:55 UTC
Created attachment 3518
Fix the temperature problem for L3C on mm tree

The following patch makes the whole of 2.6.8.1-mm1 functional again.
Comment 29 Karol Kozimor 2004-08-18 04:19:27 UTC
Yup, that's a workaround. The root cause lies somewhere in the resource 
management, and that's why I need the ioports and iomem output (since my 
kernel at least does boot). 
Comment 30 Eric Valette 2004-08-18 04:22:06 UTC
Karol, what is your patch supposed to achieve? Make i2c-i801 work, I guess. But
something in drivers/pci/quirks.c, when the variable is set, badly breaks the
thermal.c code, which in turn makes the whole laptop unresponsive... So I think the
most urgent thing is to back out the patch until the other part is fixed: either the
function to unhide the bus is badly broken or the way thermal.c assumes it can
read the THRM value is wrong. You are both in contact now, and I THINK I'VE DONE
MY PART OF THE DEBUGGING.

Thanks for your work and help anyway,

Do you want the values of iomem and ioports? With or without the kludge? Both?
Comment 31 Shaohua 2004-08-19 06:56:51 UTC
Eric, please attach your /proc/ioports with ACPI disabled and SMBus enabled. I
want to check it. Thanks, David.
Comment 32 Karol Kozimor 2004-08-19 06:59:50 UTC
From http://hell.org.pl/~sziwan/asus/l3c/quirk.report-acpioff: 
 
0000-001f : dma1 
0020-0021 : pic1 
0040-005f : timer 
0060-006f : keyboard 
0070-0077 : rtc 
0080-008f : dma page reg 
00a0-00a1 : pic2 
00c0-00df : dma2 
00f0-00ff : fpu 
0100-010f : pcmcia_socket0 
  0100-0107 : serial 
  0108-010f : serial 
0170-0177 : ide1 
01f0-01f7 : ide0 
0290-0297 : pnp 00:11 
02f8-02ff : serial 
0376-0376 : ide1 
03c0-03df : vga+ 
03f0-03f1 : pnp 00:11 
03f6-03f6 : ide0 
03f8-03ff : serial 
0cf8-0cff : PCI conf1 
4000-40ff : PCI CardBus #03 
4400-44ff : PCI CardBus #03 
4800-48ff : PCI CardBus #07 
4c00-4cff : PCI CardBus #07 
8400-840f : 0000:00:1f.1 
  8400-8407 : ide0 
  8408-840f : ide1 
a800-a8ff : 0000:02:05.0 
  a800-a8ff : 8139too 
b400-b41f : 0000:00:1d.1 
  b400-b41f : uhci_hcd 
b800-b81f : 0000:00:1d.0 
  b800-b81f : uhci_hcd 
d000-dfff : PCI Bus #01 
  d800-d8ff : 0000:01:00.0 
    d800-d8ff : radeonfb 
e000-e0ff : 0000:00:1f.5 
  e000-e0ff : Intel 82801CA-ICH3 
e100-e13f : 0000:00:1f.5 
  e100-e13f : Intel 82801CA-ICH3 
e200-e2ff : 0000:00:1f.6 
e300-e37f : 0000:00:1f.6 
e400-e47f : pnp 00:11 
  e400-e47f : 0000:00:1f.0 
e800-e81f : 0000:00:1f.3 
ec00-ec3f : pnp 00:11 
  ec00-ec3f : 0000:00:1f.0 
Comment 33 Eric Valette 2004-08-19 07:37:07 UTC
Karol, while an ioports dump for any possible config is useful, yours differs from mine
because:
     1) I do not have PNP enabled. And PNP also does things on the SMBus, as it is
behind the ISA bridge...
     2) While you want to enable the SMBus, you do not have the SMBus driver (i2c-i801).

The problem is that I scratched my working tree to try 2.6.8.1-mm2, so be patient...
Just for the __fun__, this kernel hangs misdetecting ttyS1...

Bad luck these days. Luckily, due to unemployment, I have plenty of time...
Comment 34 Eric Valette 2004-08-19 08:23:38 UTC
New problems: without ISA PnP but with i2c-i801 and ISA support, the kernel does not
boot, probably because the IRQ is not correctly configured... Will try to
rebuild my original .config that booted with acpi=off

Comment 35 Karol Kozimor 2004-08-19 08:26:53 UTC
Hmm, as for PNP, I get the point. However, bear in mind that ACPI breaks 
before i2c-i801 gets any chance to load. 
Comment 36 Eric Valette 2004-08-19 08:54:03 UTC
Well, not sure: I managed to boot once and see the ioports allocated by i2c-i801;
unfortunately, the laptop became unmanageable before I got a chance to save the ioports
configuration. The ioport zone was wrong, of course.
Comment 37 Eric Valette 2004-08-19 08:57:08 UTC
Created attachment 3531
Ioport configuration with ACPI=OFF boot parameter and ISA and I2C_I801 not configured

You can see there that the SMBus is indeed visible and owns e800-e81f (that
is exactly the same region as ACPI)...
Comment 38 Shaohua 2004-08-19 19:20:49 UTC
Created attachment 3534
workaround patch

Eric & Karol,
I don't think it's a resource conflict. Please note the ACPI motherboard driver
just reserves the io ports. I guess the bug appears even if you remove
motherboard.c (comment out motherboard.o in acpi/Makefile).
I'm not familiar with SMBus, but I guess just enabling SMBus in the LPC is not
sufficient. Maybe the BIOS doesn't initialize SMBus, and if the OS enables it, the OS is
responsible for initializing it. Please try the workaround, and please also check whether
removing motherboard.c helps. Sorry for asking you to try so much.
Thanks, David.
Comment 39 Eric Valette 2004-08-20 00:24:59 UTC
I do not think it is an IO conflict error either. I do think it is a side
effect of making the device visible on the PCI bus, and that later the
pci_enable_device code potentially remaps the IO ports and IRQ when detecting a new
device. My current analysis is that, although not visible, the SMBus is indeed
used on the L3C for ACPI hardware monitoring, and that unhiding it causes the
reinitialization of some of its firmware-configured PCI register values by the
default PCI resource management code, possibly including the IO port range and
the IRQ. But as the default values are hardcoded in the DSDT, the ACPI code
fails...

Concerning the additional initialization code, of course I will try it, but I
would like to recall that manually enabling the device (using setpci) and
bypassing the PCI initialization code for the device does not break the machine.
I will try to do it and manually load the i2c-i801 driver as a module to see
what happens... If it still works, it means that PCI resource management causes
the problem.

More later...
Comment 40 Eric Valette 2004-08-20 02:01:22 UTC
OK, I managed to boot correctly this time with I2C-I801 ON, all possible sensors,
ISA ON but not ISA PNP, _AND_ ACPI ON or OFF.

Here is the diff concerning the IOPORTS:

>diff ioports_ACPI_ON_SMB_ON_I2C_I801_ON_ISA_ON ioports_ACPI_OFF_SMB_ON_I2C_I801_ON_ISA_ON
19,20d18
< 1000-101f : 0000:00:1f.3
<   1000-1007 : i801-smbus
44,51c42,43
<   e400-e47f : motherboard
<     e400-e403 : PM1a_EVT_BLK
<     e404-e405 : PM1a_CNT_BLK
<     e408-e40b : PM_TMR
<     e410-e415 : ACPI CPU throttle
<     e428-e42b : GPE0_BLK
<     e42c-e42f : GPE1_BLK
< e800-e81f : motherboard
---
> e800-e81f : 0000:00:1f.3
>   e800-e807 : i801-smbus
53d44
<   ec00-ec3f : motherboard

So it is crystal clear that the SMBus IO zone is relocated by the PCI code
compared to non-ACPI mode. The i2c-i801 driver is operational, so we can
expect the chipset to be properly configured this time, although I have found
no supported sensors on it. Karol, what were you looking for behind this bus?

The full ioports will be attached.
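The ioports diff in this comment can be checked mechanically. A small C sketch that parses the "start-end" prefix of /proc/ioports lines (hex, inclusive ends) and tests two ranges for overlap; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* An I/O port range with an inclusive end, as /proc/ioports prints it. */
struct io_range {
    unsigned long start;
    unsigned long end;
};

/* Parse the leading "start-end" (hex) of a /proc/ioports line; the
 * indentation used for nested entries is skipped. Returns false if the
 * line does not begin with a valid range. */
static bool parse_ioports_line(const char *line, struct io_range *r)
{
    return sscanf(line, " %lx-%lx", &r->start, &r->end) == 2 &&
           r->start <= r->end;
}

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool io_ranges_overlap(struct io_range a, struct io_range b)
{
    return a.start <= b.end && b.start <= a.end;
}
```

Fed the lines from the diff, "e800-e81f : 0000:00:1f.3" overlaps "e800-e81f : motherboard", while the relocated "1000-101f : 0000:00:1f.3" does not, which is consistent with the PCI core having moved the SMBus out of the way of the ACPI reservation.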
Comment 41 Eric Valette 2004-08-20 02:02:47 UTC
Created attachment 3535
io ports with ACPI_OFF_SMB_ON_I2C_I801_ON_ISA_ON
Comment 42 Eric Valette 2004-08-20 02:04:03 UTC
Created attachment 3536
ioports with ACPI_ON_SMB_ON_I2C_I801_ON_ISA_ON
Comment 43 Shaohua 2004-08-20 02:11:01 UTC
Eric, do you reconfigure SMBus's IO BAR? After boot, can your thermal zone 
work?
Comment 44 Eric Valette 2004-08-20 02:56:01 UTC
OK, I tried your patch on a working 2.6.8.1-mm2 (with the L3C trick removed). I
reapplied the L3C trick (back to the original 2.6.8.1-mm2) and then your patch. It
still fails to get the correct temperature, as you will see in the files attached next.

This time, I've had enough. I propose to simply remove the patch because:
      1) It obviously breaks the L3C,
      2) Karol did not provide any hint on its possible usage on this machine,
      3) I have configured all possible sensors for the SMBus but failed to
detect any, so what is the final use of the SMBus on this particular machine?

Note that the problem is more general, as the L5C is also broken by the attempt to unhide
the SMBus <http://bugme.osdl.org/show_bug.cgi?id=3233>. So unhiding the SMBus
on ASUS machines is probably a bad idea, as they provide a description for it in
the DSDT, and with the PCI resource management changes and ACPI enhancements it seems
to break.

I still do not know why PCI reconfigures the chipset IO base and would be glad
to understand it without reading all the PCI code...
Comment 45 Eric Valette 2004-08-20 02:58:23 UTC
Created attachment 3537
/proc/interrupts on 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration
Comment 46 Eric Valette 2004-08-20 03:00:13 UTC
Created attachment 3538
/proc/ioports on 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration
Comment 47 Eric Valette 2004-08-20 03:02:14 UTC
Created attachment 3539
dmesg for 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration
Comment 48 Karol Kozimor 2004-08-20 06:05:45 UTC
Hmm, that leaves me puzzled. I don't really know much about sensors, I just 
thought that if the patch was useful enough for M2400N, it might be for L3C as 
well. Anyway, it would still be good to know if M2N is also broken. 
Comment 49 Eric Valette 2004-08-20 06:23:38 UTC
Well, as I see it, SMBus is mainly used for accessing sensor chips. It works
well on my ASUS A7V once you figure out which sensors you have on the bus (never
given in the docs). I have no clue about the type of hardware used to monitor the
temperature or its interface on the L3C.

Here is the kind of output you get:

as99127f-i2c-2-2d
Adapter: SMBus Via Pro adapter at e800
VCore:     +1.81 V  (min =  +1.66 V, max =  +1.82 V)
+3.3V:     +3.52 V  (min =  +3.20 V, max =  +3.54 V)
+5V:       +5.05 V  (min =  +4.73 V, max =  +5.24 V)
+12V:     +12.34 V  (min = +10.82 V, max = +13.19 V)
-12V:     -12.33 V  (min = -13.22 V, max = -10.74 V)
-5V:       -5.15 V  (min =  -5.25 V, max =  -4.74 V)
fan2:     6887 RPM  (min = 2836 RPM, div = 2)                     (beep)
M/B Temp:    +49
Comment 50 Eric Valette 2004-08-20 12:34:10 UTC
Created attachment 3540
HP compaq nc8000 also broken
Comment 51 Eric Valette 2004-08-20 12:49:15 UTC
Changed the title, as it was misleading.
Comment 52 Eric Valette 2004-08-20 13:05:08 UTC
Created attachment 3541
Manual (setpci) L3C SMBus enabling + SMBus initial IO port base value

I think this clearly shows that the SMBus IO port space is not allocated where the
firmware (and the ACPI DSDT code) expects it.
Comment 53 Shaohua 2004-08-22 22:34:35 UTC
Created attachment 3546
debug patch

From my understanding, the problem is that the BIOS allocates resources for the SMBus
(e800-e81f), but the BIOS disables access to SMBus devices, so the SMBus's IO base
isn't initialized. The ACPI thermal zone will access e800-e81f, and it will die if
the address changes, since it uses a hardcoded address. If SMBus is enabled,
since its IOBAR isn't initialized, the PCI core will try to allocate a new address,
but the PCI core doesn't know the pre-defined address, so the new address will be
different. At this stage, the thermal zone uses the wrong base address and fails.
Does this make sense to you, Eric? Could you apply the debug patch to catch
some info so we can confirm it? If the assumption is right, there is no
way to fix the problem except hiding the SMBus, since ACPI and PCI don't know
about each other.
Thanks,
Shaohua
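Shaohua's explanation reduces to a simple check: the thermal AML uses a fixed port, while the controller decodes wherever its BAR currently points. The C sketch below models that. The 0xE800 base and the 32-byte size come from this report; the structure, function names and register offsets are hypothetical.

```c
#include <stdbool.h>

/* Where the SMBus controller currently decodes: base port and size. */
struct smbus_bar {
    unsigned base;
    unsigned size;
};

/* Fixed base baked into the DSDT on this machine (from this report). */
#define DSDT_SMBUS_BASE 0xE800u

/* Does an AML access to DSDT_SMBUS_BASE + reg reach the controller? */
static bool aml_access_hits_controller(struct smbus_bar bar, unsigned reg)
{
    unsigned port = DSDT_SMBUS_BASE + reg;
    return port >= bar.base && port < bar.base + bar.size;
}

/* Relocation-safe alternative: derive the port from the live BAR. */
static unsigned bar_relative_port(struct smbus_bar bar, unsigned reg)
{
    return bar.base + reg;
}
```

With the BIOS placement ({0xE800, 0x20}) the fixed access hits the controller; after the move to {0x1000, 0x20} it does not. The second helper is the relocation-safe style Eric argues for in comment 74.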
Comment 54 Eric Valette 2004-08-23 00:46:30 UTC
1) I think our analyses of the problem are now _about_ the same. And contrary
to what you said in previous comments, there is indeed an IO region overlap
conflict, but the real bug is providing access to the SMBus via ACPI through fixed
regions and not through the SMBus IOBAR (grrr ASUS). I checked with XP SP2 and
indeed on Windows there is also no SMBus...
2) I would be surprised if the IOBAR were not initialized, as we get the expected
value when ACPI=off (see the previous ioports dump attachment) and SMBus is working
fine. Could this be just luck?
3) Can we find a way to avoid reallocation of the IO space via additional
quirks.c tricks? NB: this is only valuable if unhiding the SMBus gives us more
functionality, but the sensor located on the SMBus, although detected, is not yet
managed by the current stock kernel (I loaded each chip module one by one). The
lm90 module detects the chip but says it is not managed (vendor AXIM but
chip_id not managed). I will try to add additional sensor chips present in
lmsensors-2.8.7.
4) The SMBus is only one part of the ioports reserved by the function 3 device.
What other functionality provided by this chip could be
useful? (I guess some of you have access to the relevant docs :-))

Trace to come.

And anyway, thanks a lot for the time you spent on this problem. It was not wasted,
as we found other broken laptops (Asus but also others)...

Comment 55 Eric Valette 2004-08-23 01:11:50 UTC
Created attachment 3547
dmesg with trace for PCI iospace allocation

As expected, the IOBAR is correctly set (which I was already sure of, as the correct
IO range is allocated when ACPI=off). And indeed it gets relocated by the PCI code,
breaking ACPI...
Comment 56 Eric Valette 2004-08-23 01:17:48 UTC
Here is the _extra_ printed information (full dmesg already attached):

SMBus IO base: e800
...
PCI: Cannot allocate resource region 4 of device 0000:00:1f.3
Re-assign resources for 0000:00:1f.3, 1000 - 101f

So indeed it sounds like an IO zone conflict, although ACPI did clear the
IORESOURCE_BUSY flag...
 
Comment 57 Shaohua 2004-08-23 01:30:58 UTC
OK, thanks. It looks like ACPI motherboard.c reserved the IO ports too early, so they
can't be reserved by PCI devices any more, and the PCI core allocates new IO ports
for the device. I will try to fix it. Thanks, David.
Comment 58 Eric Valette 2004-08-23 01:49:05 UTC
I guess the change that makes the IO APIC/power management timer work on my laptop
(great to know the motherboard hardware design is not broken) has unexpected side
effects then...

I will test any patch you propose (provided it has no chance of destroying my
personal laptop...)

-- eric
Comment 59 Eric Valette 2004-08-23 02:30:06 UTC
Created attachment 3548
ioports with ISA and BIOS PNP

As requested by email
Comment 60 Shaohua 2004-08-23 03:02:22 UTC
Created attachment 3549
proposed patch

Well, the BIOS doesn't report the IO region in PNP, but it does in ACPI :(. The
only workaround I can think of is to add another quirk: pre-reserve the IOBAR
for the SMBus. Eric, could you please test it? I didn't test the patch, so you
may need to change it a little. Many thanks, Shaohua.
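Why pre-reserving the IOBAR helps, as a toy C model: whoever claims e800-e81f first keeps it, and a later conflicting claim is moved to a free window. The allocation policy here (scan upward from 0x1000 in steps of the region size) is an assumption chosen to match the "Re-assign resources for 0000:00:1f.3, 1000 - 101f" line in comment 56; it is not the PCI core's real algorithm, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* An I/O range with an inclusive end, e.g. e800-e81f. */
struct res {
    unsigned start;
    unsigned end;
};

#define MAX_RES 16

/* Toy I/O space: a flat list of ranges already claimed. */
struct io_space {
    struct res claimed[MAX_RES];
    size_t n;
};

static bool res_overlap(struct res a, struct res b)
{
    return a.start <= b.end && b.start <= a.end;
}

static bool io_is_free(const struct io_space *s, struct res r)
{
    for (size_t i = 0; i < s->n; i++)
        if (res_overlap(s->claimed[i], r))
            return false;
    return true;
}

/* Claim 'want' if it is free. Otherwise (assumed policy) scan upward
 * from 0x1000 in steps of the region size for the first free window.
 * Returns false if nothing could be claimed; *got receives the range. */
static bool claim_or_reassign(struct io_space *s, struct res want,
                              struct res *got)
{
    unsigned size = want.end - want.start + 1;
    struct res r = want;

    if (s->n >= MAX_RES)
        return false;
    if (!io_is_free(s, r)) {
        bool found = false;
        for (unsigned base = 0x1000; base + size - 1 <= 0xffff; base += size) {
            struct res cand = { base, base + size - 1 };
            if (io_is_free(s, cand)) {
                r = cand;
                found = true;
                break;
            }
        }
        if (!found)
            return false;
    }
    s->claimed[s->n++] = r;
    *got = r;
    return true;
}
```

If the motherboard reservation comes first, the SMBus claim lands at 1000-101f; if the SMBus BAR is claimed first, it stays at e800-e81f, where the DSDT expects it. The toy model ignores how the ACPI motherboard reservation must then coexist with the PCI one.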
Comment 61 Eric Valette 2004-08-23 04:07:05 UTC
Created attachment 3550
Correct fix for ASUS L3C

I _corrected_ the patch for the L3C and 2.6.8.1-mm4:
       1) The proposed patch does not apply (a lot of changes occurred in
bk-pci.patch),
       2) The device on which it was applied is wrong for this machine,
       3) The size computation was wrong (1 byte of IO missing).

Anyway, the idea WAS GOOD.
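Eric does not say which expression lost the byte, so this is only an illustration of the two conventions that are easy to mix up: /proc/ioports prints inclusive ranges, so a 32-port BAR at e800 ends at e81f, not e820. The function names are hypothetical.

```c
/* Inclusive end of an I/O region from its base and size in bytes. */
static unsigned long io_end_from_size(unsigned long start, unsigned long size)
{
    return start + size - 1;
}

/* Size in bytes of an inclusive range such as e800-e81f. */
static unsigned long io_size_from_range(unsigned long start, unsigned long end)
{
    return end - start + 1;
}
```

Dropping the "+ 1" when computing the size from an inclusive range yields 0x1f instead of 0x20, which is exactly a one-port shortfall.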
Comment 62 Eric Valette 2004-08-23 04:12:50 UTC
Created attachment 3551
Ioports with fixed patch
Comment 63 Eric Valette 2004-08-23 04:14:40 UTC
Created attachment 3552
lspci output

The SMBus is there... Just need to find the correct sensor code now :-)
Comment 64 Shaohua 2004-08-23 05:03:57 UTC
Great, Eric. I used the base kernel and the ICH4 PCI ID, so it failed. Maybe we should
also fix the HP Compaq nc8000 case.
Comment 65 Eric Valette 2004-08-23 05:15:29 UTC
For fixing bugs, I always prefer using the -mm tree, as it is closer to the various bk
trees. If I maintained something, my views would probably be different.

So, remaining:
        1) Please mark your patch as incorrect (at least the IO zone size
computation is wrong),
        2) We indeed should fix the L5C and nc8000.

The problem is that I do not know which SMBus they contain. Will ask...
Comment 66 Shaohua 2004-08-23 05:22:35 UTC
Comment on attachment 3549
proposed patch

The patch isn't correct, please refer to Eric's.
Comment 67 Eric Valette 2004-08-23 05:37:45 UTC
Created attachment 3553
Patch updated to fix HP Compaq nc8000

On the HP Compaq nc8000 we have PCI_DEVICE_ID_INTEL_82801DB_3,
so here is an updated patch.
Comment 68 Eric Valette 2004-08-23 07:51:09 UTC
Created attachment 3554
Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP Compaq nc8000

Checked with L5C and nc8000 owners that the patch works.
Comment 69 Shaohua 2004-08-23 18:26:10 UTC
Eric, please note the numbers (0x20, 4) in 'asus_smbus_resources' are
ICH-specific. Do they apply to the sis96x? I suppose they are different devices, so
the IOBAR index is different. Thanks, Shaohua.
Comment 70 Eric Valette 2004-08-24 00:39:04 UTC
I should not watch the Olympic Games while coding :-( Of course this is plain
wrong for the SiS SMBus. The base address is 0x4 in PCI config space and I do not
fully understand the resource parameter index used in pci_claim_resource. Help
appreciated.

I think we should end up coding a generic function with those two parameters
as arguments, plus device-specific functions that call the generic one with
the two parameters set.

Will code that. What do you need to give me the correct resource index?
Comment 71 Shaohua 2004-08-24 00:54:26 UTC
I agree; we should provide a generic routine. Actually, only one parameter is
needed: addr = PCI_BASE_ADDRESS_0 + (index << 2); The index parameter identifies
which BAR holds the resource (a PCI device has 6 BARs). Please go ahead and clean
it up. I don't have the material about the SiS SMBus, but lspci -vv should help you
find it (it will display something like "Region 4: I/O ports at ..
[size=..]").
Thanks,
Shaohua.
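Shaohua's formula, plus decoding of a raw I/O BAR value, as a standalone C sketch. The constants follow the standard PCI configuration header layout (first BAR at 0x10, bit 0 set for I/O space, low two bits masked off for I/O BARs), which Linux also defines under these macro names; drop the local copies if you compile against kernel headers. NUM_STD_BARS, bar_offset and decode_io_bar are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of standard PCI config-header values. */
#define PCI_BASE_ADDRESS_0        0x10
#define PCI_BASE_ADDRESS_SPACE_IO 0x01
#define PCI_BASE_ADDRESS_IO_MASK  (~0x03u)
#define NUM_STD_BARS              6

/* Config-space offset of standard BAR 'index' (0..5); each BAR is
 * 4 bytes wide. Returns -1 for an out-of-range index. */
static int bar_offset(unsigned index)
{
    if (index >= NUM_STD_BARS)
        return -1;
    return PCI_BASE_ADDRESS_0 + (int)(index << 2);
}

/* Decode a raw 32-bit BAR value read from config space. Returns false
 * if the BAR describes memory space rather than I/O space. */
static bool decode_io_bar(uint32_t raw, uint32_t *port_base)
{
    if (!(raw & PCI_BASE_ADDRESS_SPACE_IO))
        return false;
    *port_base = raw & PCI_BASE_ADDRESS_IO_MASK;
    return true;
}
```

Index 4 gives offset 0x20, exactly the (0x20, 4) pair Shaohua points out in comment 69, so a generic routine really only needs the index.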
Comment 72 Eric Valette 2004-08-24 02:42:57 UTC
Created attachment 3556
Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP Compaq nc8000

I hope this one is the final version. NB: by luck the patch was indeed working
for the L5C with the SiS96x SMBus bridge, because the IO BAR index is the same as on
the i801 (I asked an L5C owner to test it before). Anyway, this version is much
more generic and opens the door for SMBus unhiding cases not yet fixed or still to come...
Comment 73 Alex Williamson 2004-08-24 15:50:04 UTC
RE: "Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP Compaq nc8000"

   It's better than what's in the kernel now, but this is still wrong.  I hit
this problem on my nc6000.  ACPI is well within its rights to claim this
resource; it's not a "buggy BIOS" issue.  Until we unhide this PCI function, the
SMBus is completely hidden.  ACPI has reported that the range is in use via the
_CRS method on the motherboard node.  At this point, ACPI firmware has exclusive
access to the device, which is required for the Operation Region it uses to
access it.

   By unhiding this device, and effectively stealing the resource, we're
exporting the device to any kernel level driver that wants it.  IMHO, the kernel
has no right to take this device from ACPI and hand it off to a sensor driver. 
How would we deal with both AML and a sensor driver poking the SMBus controller at
the same time?  This is a firmware owned device, sorry sensors.  I see two options:

1. Only unhide the device on systems where it's known not to be used or those
without ACPI enabled.
2. Look for resource conflicts w/ ACPI and re-hide the device if ACPI claims
ownership.
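Alex's second option could be sketched as follows: after unhiding, compare the SMBus BAR with the I/O ranges ACPI claimed through the motherboard _CRS, and hide the device again on any overlap. How those ranges are obtained and how the device is re-hidden are left to caller-supplied data and a hypothetical callback; the motherboard entries from comment 40 (e400-e47f, e800-e81f, ec00-ec3f) make a natural example input.

```c
#include <stdbool.h>
#include <stddef.h>

/* An I/O range with an inclusive end. */
struct io_range {
    unsigned start;
    unsigned end;
};

/* Hypothetical hook that hides the SMBus function again. */
typedef void (*rehide_fn)(void *ctx);

/* True if 'bar' overlaps any range ACPI reported as in use. */
static bool acpi_claims(const struct io_range *acpi, size_t n,
                        struct io_range bar)
{
    for (size_t i = 0; i < n; i++)
        if (acpi[i].start <= bar.end && bar.start <= acpi[i].end)
            return true;
    return false;
}

/* Re-hide the device if ACPI owns its range.
 * Returns true if the device stays visible. */
static bool keep_smbus_visible(const struct io_range *acpi, size_t n,
                               struct io_range bar,
                               rehide_fn rehide, void *ctx)
{
    if (acpi_claims(acpi, n, bar)) {
        rehide(ctx);
        return false;
    }
    return true;
}

/* Example hook for illustration: counts how often it was called. */
static void count_rehide(void *ctx)
{
    ++*(int *)ctx;
}
```

On the L3C, the BIOS-assigned BAR e800-e81f overlaps the ACPI claim, so the device would be hidden again and the thermal AML would keep exclusive access.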
Comment 74 Eric Valette 2004-08-25 00:48:08 UTC
I still think this is a BIOS bug, because of the way the instructions to
access the sensor via a hardcoded address are given in the DSDT. If access was
coded indirectly via an offset from the IO base register, we would be able to relocate
the IO region without problems.

Regarding the question of sharing a device between two different kernel entities,
this one is more serious. The question is the benefit if some more information
can be provided to userland. See <http://fobie.net/nc8000/#i2c> to see what I mean.

Regarding Linus' patch versus mine, Linus' one may have more side effects but also
more benefit if we find this type of conflict for things other than the SMBus.

I'm just curious why Greg did not mention this patch, since he was CC'd on the
whole discussion.

-- eric
Comment 75 Alex Williamson 2004-08-25 08:49:05 UTC
Eric, the SMBus device is hidden for a reason.  ACPI needs exclusive access to
it.  Once it's hidden, the OS shouldn't be able to move it, so firmware doesn't
need to dynamically determine where it lives.  Now you've exposed it and moved
it, and all the assumptions firmware was able to make are broken.  IMHO, hiding
the SMBus controller to solve the exclusive access problem is a reasonable
solution.  Exposing it so we can dink with the controller, potentially getting
it very confused and breaking ACPI thermal management, is NOT a reasonable solution.

I really think the default should be to not expose hidden SMBus controllers when
ACPI is present.  The little bit of extra info the sensors are able to get by
poking this interface is not worth the potential thermal problems that could
result.  This is dangerous.
Comment 76 Eric Valette 2004-08-25 09:13:38 UTC
Well, first, I did _not_ unhide the bus. Someone else did. I just wanted to have
my laptop functional again and, as written in a previous mail, I have not found
the correct sensor driver for this laptop anyway, so I couldn't care less about
making the SMBus visible.

BUT I _do_ care if my laptop becomes unusable (kacpid taking 100% of the CPU),
which is still the case with 2.6.8.1-mm4 and 2.6.9-rc1. I will manually apply
Linus' fix to see if it really solves the problem when the SMBus driver tries to
get ownership of the I/O region...

Comment 77 Alex Williamson 2004-08-25 09:30:31 UTC
Eric,

   Sorry for the implication.  I should have read the URL in your update more
thoroughly.  Please let me know if you still have kacpid issues with Linus'
fix.  My nc6000 is behaving nicely now, and I'd expect the nc8000 to as well (as
long as the sensor modules aren't loaded).  I don't work in the laptop group,
but I'd be happy to work with you if you're still seeing problems.  Thanks,  Alex
Comment 78 Karol Kozimor 2004-08-25 12:07:37 UTC
Linus' patch also fixes the problem:
e400-e47f : 0000:00:1f.0
  e400-e47f : motherboard
    e400-e403 : PM1a_EVT_BLK
    e404-e405 : PM1a_CNT_BLK
    e408-e40b : PM_TMR
    e410-e415 : ACPI CPU throttle
    e428-e42b : GPE0_BLK
    e42c-e42f : GPE1_BLK
e800-e81f : motherboard
  e800-e81f : 0000:00:1f.3
Comment 79 Shaohua 2004-10-11 19:51:35 UTC
A final fix for this kind of problem has been merged in 2.6.9-rc4. Closing.
Comment 80 Karol Kozimor 2004-10-18 14:18:34 UTC
Ouch. I've finally verified this bug has not been squashed completely. Steps 
to reproduce with 2.6.9-rc4 vanilla: 
1. Do a full S3 suspend / resume cycle. 
2. Do a full swsusp cycle (both platform and shutdown trigger the bug). 
 
After the machine resumes and a thermal event is triggered, the aforementioned 
mutex loop starts again. 
 
Comment 81 Shaohua 2004-10-19 01:46:34 UTC
Karol, do the I/O ports conflict now? Can you clarify the current problem? Is it
the thermal problem or the SMBus problem? If it's the SMBus problem, I guess
the driver must set the config register to re-enable the SMBus.
Comment 82 Karol Kozimor 2004-10-19 02:16:59 UTC
Well, it's not that easy, as /proc/ioports doesn't show any change and I'm not
using any sensor drivers at that point.

Basically, it seems to me that the SMBus I/O BAR is somehow reprogrammed on
resume; why it happens only after S3 followed by S4 is beyond me. Anyway,
since OperationRegion SMB0 is hardcoded in the DSDT, once the BAR is
reprogrammed any subsequent _L00 GPE makes kacpid spin in WTSB() (or at least
it would seem so from my limited understanding).
Comment 83 Shaohua 2004-10-19 02:44:38 UTC
As far as I can tell, the enable/disable bit of the SMBus in ICH4 is in LPC
bridge register 0xF2. The current PCI code does not save/restore 0xF2, so the
BIOS hides the SMBus again after S3. That is possibly the reason. Could you send
me the dmesg after S4?
Comment 84 Karol Kozimor 2004-10-19 14:24:08 UTC
Created attachment 3858 [details]
requested dmesg, after S3 and swsusp

Log attached. Additionally, some funnies in the PCI config space of the LPC
chip:
[after fresh boot]
00:1f.0 ISA bridge: Intel Corp. 82801CAM ISA Bridge (LPC) (rev 02)
	Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
	Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
00: 86 80 8c 24 0f 01 80 02 02 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 01 e4 00 00 10 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 01 ec 00 00 10 00 00 00
60: 05 0b 0b 0b d0 00 00 00 80 80 80 80 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: ff 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: aa 03 00 00 00 00 00 00 0d 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 81 06 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 04 20 00 00 02 0f 00 00 04 00 00 00 00 00 00 00
e0: 10 00 00 80 00 00 0f 1c 33 22 00 00 00 00 67 45
f0: 0f 00 01 84 00 00 00 00 47 0f 0f 00 00 00 80 00
[after S3; the temperature still reads, kacpid doesn't spin]
[...]
f0: 0f 00 09 84 00 00 00 00 47 0f 0f 00 00 00 80 00
[after subsequent swsusp; kacpid spinning, temperature at -129
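The two `f0:` rows above can be decoded mechanically. A minimal sketch, assuming (as the asus_hides_smbus_lpc quirk in drivers/pci/quirks.c appears to do) that bit 3 (0x0008) of the 16-bit register at 0xF2 of 00:1f.0 disables, i.e. hides, the SMBus function. `read_word_from_row()` and `smbus_hidden()` are hypothetical helpers that parse one row of the hex dump format shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: bit 3 of the 16-bit LPC register at 0xF2 hides 00:1f.3. */
#define LPC_FUNC_DIS_REG  0xF2u
#define SMBUS_DISABLE_BIT 0x0008u

/* Parse one hex dump row such as "f0: 0f 00 01 84 ..." and return the
 * little-endian 16-bit value at config offset `off`, or -1 on error. */
static int read_word_from_row(const char *row, unsigned off)
{
    unsigned base, b[16];
    int n = sscanf(row,
                   "%x: %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x %x",
                   &base, &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &b[6],
                   &b[7], &b[8], &b[9], &b[10], &b[11], &b[12], &b[13],
                   &b[14], &b[15]);
    if (n != 17 || off < base || off + 1 >= base + 16)
        return -1;
    return (int)(b[off - base] | (b[off - base + 1] << 8));
}

/* True when the function-disable bit for the SMBus is set. */
static bool smbus_hidden(uint16_t func_dis)
{
    assert(LPC_FUNC_DIS_REG == 0xF2u);
    return (func_dis & SMBUS_DISABLE_BIT) != 0;
}
```

Under that assumption, the fresh-boot row yields 0x8401 (bit 3 clear, SMBus visible) and the after-S3 row yields 0x8409 (bit 3 set, SMBus hidden again), which matches the explanation in the next comment.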
Comment 85 Shaohua 2004-10-19 17:44:17 UTC
Thanks, Karol. The lspci and dmesg confirm my assumption. After S3, the SMBus
is disabled by the BIOS. After S4, the SMBus can be enabled again, since S4
re-invokes pci_fixup.

A workaround is to save and restore 1f.0's 0xF2 config register across
sleep/wakeup. The final solution is for Linux to provide a bridge (P2P/LPC ...) driver.
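The save/restore workaround can be sketched against a simulated config space. Everything here is hypothetical: `struct fake_pci_dev`, `cfg_read16()`/`cfg_write16()` stand in for the kernel's `pci_read_config_word()`/`pci_write_config_word()`, and `bios_s3_resume()` only mimics the behaviour observed in this bug (the BIOS setting bit 3 of 0xF2 on S3 resume). Where such hooks would live in the real suspend path is not specified in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 256-byte config space of 00:1f.0. */
struct fake_pci_dev {
    uint8_t cfg[256];
};

#define LPC_FUNC_DIS_REG  0xF2u
#define SMBUS_DISABLE_BIT 0x0008u

static uint16_t cfg_read16(const struct fake_pci_dev *d, unsigned off)
{
    assert(off < 255);
    return (uint16_t)(d->cfg[off] | (d->cfg[off + 1] << 8));
}

static void cfg_write16(struct fake_pci_dev *d, unsigned off, uint16_t v)
{
    assert(off < 255);
    d->cfg[off] = (uint8_t)(v & 0xff);
    d->cfg[off + 1] = (uint8_t)(v >> 8);
}

/* Value captured before sleeping (one LPC bridge, so one slot). */
static uint16_t saved_func_dis;

/* Before sleep: remember 0xF2, which generic PCI code does not save. */
static void lpc_suspend(const struct fake_pci_dev *lpc)
{
    saved_func_dis = cfg_read16(lpc, LPC_FUNC_DIS_REG);
}

/* Mimics the observed BIOS behaviour: SMBus hidden again on S3 resume. */
static void bios_s3_resume(struct fake_pci_dev *lpc)
{
    uint16_t v = cfg_read16(lpc, LPC_FUNC_DIS_REG);
    cfg_write16(lpc, LPC_FUNC_DIS_REG, (uint16_t)(v | SMBUS_DISABLE_BIT));
}

/* After wakeup: put back exactly what the OS had before sleeping. */
static void lpc_resume(struct fake_pci_dev *lpc)
{
    cfg_write16(lpc, LPC_FUNC_DIS_REG, saved_func_dis);
}
```

Restoring the whole saved word, rather than just clearing bit 3, keeps whatever state the OS had chosen: if the SMBus was left hidden before sleeping, it stays hidden after resume.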

Greg, what's the plan for providing a Linux PCI bridge driver?

Could you please open a new tracker for the new issue? It's completely different
from the original problem.
Comment 86 Greg Kroah-Hartman 2004-10-20 16:57:56 UTC
No one has sent me a pci bus driver :)
Comment 87 Karol Kozimor 2004-10-21 07:10:59 UTC
I filed a tracker at #3609, please close this one. 
