Distribution: Debian sid
Hardware Environment: Asus L3800C
Software Environment: gcc 3.4 compiler
Problem Description: On the same machine, the dmesg output for 2.6.8-rc3-mm1 says:
ACPI : Thermal Zone [THRM] (53 C)
which is correct, while 2.6.8-rc4-mm1 prints:
ACPI : Thermal Zone [THRM] (-123 C)
which is obviously incorrect. Furthermore, this makes the fan so noisy that, even if the machine finished booting, I could not stand it for long.
Steps to reproduce:
Please reverse the two patches below and try again:
http://linux-acpi.bkbits.net:8080/linux-acpi-test-2.6.8/cset@1.1731.1.10
http://linux-acpi.bkbits.net:8080/linux-acpi-test-2.6.8/cset@1.1731.1.9
That said, I can't see how these patches would cause trouble.
-zhen
Problem is that those patches are already in 2.6.8-rc3-mm1 as shown below:
patch -p1 --dry-run -i ../patchThermal1.patch
patching file drivers/acpi/thermal.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
5 out of 5 hunks ignored -- saving rejects to file drivers/acpi/thermal.c.rej
valette@tri-yann:/usr/src/linux$ patch -p1 --dry-run -i ../patchThermal2.patch
patching file drivers/acpi/thermal.c
Reversed (or previously applied) patch detected! Assume -R? [n]
Apply anyway? [n]
Skipping patch.
3 out of 3 hunks ignored -- saving rejects to file drivers/acpi/thermal.c.rej
valette@tri-yann:/usr/src/linux$ uname -a
Linux tri-yann 2.6.8-rc3-mm1 #22 Sun Aug 8 23:59:53 CEST 2004 i686 GNU/Linux
valette@tri-yann:/usr/src/linux$
Yeah, these patches are also in 2.6.8-rc4-mm1, right? So try to reverse them by "patch -R -p1 -i xxx.patch", then recompile, reboot and see if the result changes. -zhen
Since the two patches are in both kernels, how could they be the cause of the problem, given that the hardware is the same and the problem appears only on 2.6.8-rc4-mm1? Each time I boot 2.6.8-rc4-mm1, I trash my filesystems, so I would prefer to avoid booting for nothing... A check for a reasonable temperature range would avoid trashing the system...
Fine, please attach your acpidump output and the dmesg after boot.
Does 2.6.8-rc4 have this strange issue?
Created attachment 3490 [details] acpidump output bzipped
Created attachment 3491 [details] 2.6.8-rc3-mm1 dmesg
As explained, I cannot simply boot 2.6.8-rc4-mm1. It freezes before I have a chance to store my dmesg somewhere (even with init=/bin/sh). I get the prompt but cannot type anything... I'm currently compiling rc4 and will try applying only the ACPI-related patches.
Plain rc4 works. I will manually apply all, but only, the ACPI patches contained in 2.6.8-rc4-mm1, that is:
bk-acpi.patch
remove-unconditional-pci-acpi-irq-routing.patch
Note that I already reverted the last one without success on the original 2.6.8-rc4-mm1...
This is Asus P4_L3CS. Another person on acpi-devel suffers from this too. What's the output of "cat /proc/acpi/thermal_zone/THRM/*"?
On 2.6.8-rc3-mm1:
cat /proc/acpi/thermal_zone/THRM/*
cooling mode: active
<polling disabled>
state: active[0]
temperature: 49 C
critical (S5): 95 C
passive: 90 C: tc1=1 tc2=1 tsp=100 devices=0xeffe4ba8
active[0]: 40 C: devices=0xeffd9c28
Next boot I will give the values for 2.6.8-rc4-mm1
Values for 2.6.8-rc4 + bk-acpi.patch + remove-unconditional-pci-acpi-irq-routing.patch:
cat /proc/acpi/thermal_zone/THRM/*
cooling mode: active
<polling disabled>
state: active[0]
temperature: 51 C
critical (S5): 95 C
passive: 90 C: tc1=1 tc2=1 tsp=100 devices=0xeff2e7a8
active[0]: 40 C: devices=0xeff3a828
So where are we going now? The latest ACPI code by itself does not seem to be the only culprit. So how can the value be corrupted? Could a safety test be added to the code so that negative and > 100
Good effort on the testing! How about the thermal zone message at boot? Is it still wrong with 2.6.8-rc4 + bk-acpi.patch? I am just wondering why rc3-mm1 does not encounter this...
The boot message for the thermal zone is correct on 2.6.8-rc4 + bk-acpi.patch. So my question is: where do you get the value from, and how can it be corrupted...
The temperature is read by evaluating the _TMP method inside the ThermalZone. For your DSDT, it runs:
Store(\_SB.PCI0.PX40.RTMP(), Local0)
Return(\_TZ_.KELV(Local0)) //seems it uses celsius..then convert
In RTMP(), it calculates the average temperature value after 3 probes. Could you attach your "lspci -vv" output? I want to know what is on PCI0.PX40. Maybe you can revert bk-pci.patch from rc4-mm1 and try. It has some quirk for the Asus L3C.
-zhen
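The range check requested earlier in the thread could look roughly like the sketch below. It is written in Python purely for illustration and is not the kernel's thermal.c code. It relies on the fact that ACPI _TMP returns tenths of a kelvin; the function names and the 0 to 100 C plausibility bounds are assumptions made for this example.

```python
# Illustrative sketch only: names and bounds are hypothetical, not kernel code.

def deci_kelvin_to_celsius(raw: int) -> float:
    """Convert an ACPI _TMP value (tenths of a kelvin) to degrees Celsius."""
    return (raw - 2732) / 10.0


def is_plausible(celsius: float, low: float = 0.0, high: float = 100.0) -> bool:
    """Reject readings outside an assumed sane operating range."""
    return low <= celsius <= high


good = deci_kelvin_to_celsius(3262)  # the 53 C reading from 2.6.8-rc3-mm1
bad = deci_kelvin_to_celsius(1502)   # the -123 C reading from 2.6.8-rc4-mm1

for reading in (good, bad):
    verdict = "ok" if is_plausible(reading) else "rejected"
    print(f"{reading:.1f} C: {verdict}")
```

A check like this would only stop the bogus value from driving the cooling policy; it would not explain why the value is wrong in the first place.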
Created attachment 3498 [details] lspci -vv for ASUS L3800C
Well, I'm curious because 2.6.8-rc3-mm1 and 2.6.8-rc4-mm1 are so close as far as the bk-acpi BitKeeper tree is concerned... At least it now seems certain that bk-acpi.patch by itself is not the _single_ root cause of the problem, since it works when applied alone. Maybe other PCI, ACPI, IRQ tweaks are also problematic... --eric
I tested 2.6.8.1-mm1 this morning with the same result regarding the temperature but a slightly different one regarding overall usability: the system is very slow but alive. I guess I'm now in the case Karol Kosimor reported on LKML. NB: 2.6.8.1 + acpi-20040715-2.6.8.diff is OK.
Do you encounter the temperature issue if you apply 2.6.8.1 + acpi-patch + bk-pci.patch? This is the same as my question in comment 16.
Zen>Do you encounter temperature issue if apply 2.6.8.1+ acpi-patch + Zen> bk-pci.patch? as my question in comment 16 I guess you mean 2.6.8.1 - bk-pci.patch + acpi-patch right?
bk-acpi.patch in 2.6.8.1-mm1 is just a little build fix. My aim is asking for your help to try 2.6.8.1 + bk-acpi.patch + bk-pci.patch. bk-acpi.patch & bk-pci.patch are all at http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-mm1/broken-out/ I've seen some quirks of Asus L3C in mm's bk-pci.patch, I wanna see if that quirk breaks ACPI. thanks, -zhen
Except it does not work, as bk-acpi.patch depends on linus.patch, which contains the new ACPI code. So I will try with linus.patch + bk-acpi.patch + bk-pci.patch. OK?
OK zhen, your suspicion was right: it's the bk-pci.patch contained in 2.6.8.1 that breaks the thermal support on my ASUS L3C (and makes it totally unresponsive). I've seen that part of the ASUS L3C SMBus fixup modifications were done by Karol Kosimor, who owns an L3C himself, so maybe he broke it or, more likely, it is completely unrelated, as he has the hardware to test his modifications. I will try to back out the L3C-related SMBus fixup modification just to be sure...
Both 2.6.7 and 2.6.8-rc3-mm1 worked fine with my fixup, the code it touches is actually quite straightforward (it simply enables the SMBus bridge that ASUS otherwise hides).
Unfortunately, it is also exactly what breaks 2.6.8.1-mm1 as far as temperature is concerned. NB: on another ASUS MB (an old A7V, but still my desktop because of its powerful SCSI controllers) the SMBus IO space was also wrongly reserved exclusively by ACPI, and thus the I2C code trying to claim the SMBus IO space was failing... See <http://bugme.osdl.org/show_bug.cgi?id=3049>. Just removing the line for L3C in drivers/pci/quirks.c fixes the problem. Try it yourself.
So that's the quirk + the rest of bk-pci that breaks it (resource management code?). I'll try to post /proc/ioports and iomem when I get to the laptop.
Created attachment 3518 [details] Fix the temperature problem for L3C on mm tree
The following patch makes the whole 2.6.8.1-mm1 functional again.
Yup, that's a workaround. The root cause lies somewhere in the resource management, and that's why I need the ioports and iomem output (since my kernel at least does boot).
Karol, what is your patch supposed to achieve? Make i2c-i801 work, I guess. But something in drivers/pci/quirks.c, when the variable is set, badly breaks the thermal.c code, which in turn makes the whole laptop unresponsive... So I think the most urgent thing is to back out the patch until the other part is fixed: either the function to unhide the bus is badly broken, or the way thermal.c assumes it can read the THRM value is wrong. You are both in contact now and I THINK I'VE DONE MY PART OF THE DEBUGGING. Thanks for your work and help anyway. Do you want the values of iomem and ioports? With or without the kludge? Both?
Eric, please attach your /proc/ioports with acpi disabled and SMbus enabled. I want to check it. Thanks, David.
From http://hell.org.pl/~sziwan/asus/l3c/quirk.report-acpioff:
0000-001f : dma1
0020-0021 : pic1
0040-005f : timer
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
0100-010f : pcmcia_socket0
0100-0107 : serial
0108-010f : serial
0170-0177 : ide1
01f0-01f7 : ide0
0290-0297 : pnp 00:11
02f8-02ff : serial
0376-0376 : ide1
03c0-03df : vga+
03f0-03f1 : pnp 00:11
03f6-03f6 : ide0
03f8-03ff : serial
0cf8-0cff : PCI conf1
4000-40ff : PCI CardBus #03
4400-44ff : PCI CardBus #03
4800-48ff : PCI CardBus #07
4c00-4cff : PCI CardBus #07
8400-840f : 0000:00:1f.1
8400-8407 : ide0
8408-840f : ide1
a800-a8ff : 0000:02:05.0
a800-a8ff : 8139too
b400-b41f : 0000:00:1d.1
b400-b41f : uhci_hcd
b800-b81f : 0000:00:1d.0
b800-b81f : uhci_hcd
d000-dfff : PCI Bus #01
d800-d8ff : 0000:01:00.0
d800-d8ff : radeonfb
e000-e0ff : 0000:00:1f.5
e000-e0ff : Intel 82801CA-ICH3
e100-e13f : 0000:00:1f.5
e100-e13f : Intel 82801CA-ICH3
e200-e2ff : 0000:00:1f.6
e300-e37f : 0000:00:1f.6
e400-e47f : pnp 00:11
e400-e47f : 0000:00:1f.0
e800-e81f : 0000:00:1f.3
ec00-ec3f : pnp 00:11
ec00-ec3f : 0000:00:1f.0
Karol, while any ioports dump for a possible config is useful, yours differs from mine because:
1) I do not have PNP enabled. And PNP also does things on the SMBus, as it is behind the ISA bridge...
2) While you want to enable the SMBus, you do not have the SMBus driver (i2c-i801).
Problem is that I scratched my working tree to try 2.6.8.1-mm2, so be patient... Just for the __fun__, this kernel hangs misdetecting ttyS1... Bad luck these days. Fortunately, due to unemployment, I have plenty of time...
New problems: without ISA PnP but with i2c-i801 and ISA support, the kernel does not boot, probably because the IRQ is not correctly configured... I will try to rebuild my original .config that booted with acpi=off.
Hmm, as for PNP, I get the point. However, bear in mind that ACPI breaks before i2c-i801 gets any chance to load.
Well, not sure: I managed to boot once and see the ioports allocated by i2c-i801. Unfortunately, the laptop became unmanageable before I got a chance to save the ioports configuration. The ioport zone was wrong, of course.
Created attachment 3531 [details] Ioport configuration with ACPI=OFF boot parameter and ISA and I2C_I801 not configured
You can see there that the SMBus is indeed visible and owns e800-e81f (that is exactly the same region as ACPI)...
Created attachment 3534 [details] workaround patch
Eric & Karol, I don't think it's a resource conflict. Please note that the ACPI motherboard driver just reserves the IO ports. I guess the bug appears even if you remove motherboard.c (comment out motherboard.o in acpi/Makefile). I'm not familiar with SMBus, but I guess just enabling SMBus in the LPC is not sufficient. Maybe the BIOS doesn't initialize SMBus, and if the OS enables it, the OS is responsible for initializing it. Please try the workaround. And please check whether removing motherboard.c helps. Sorry for asking you to try so much. Thanks, David.
I do not think it is an IO conflict error either. I do think it is a side effect of making the device visible on the PCI bus: later, the pci_enable_device code potentially remaps the IO ports and IRQ when detecting a new device. My current analysis is that, although not visible, the SMBus is indeed used on the L3C for the ACPI hardware monitoring, and that unhiding it causes some of its firmware-configured PCI register values to be reinitialized by the default PCI resource management code, possibly including the IO port range and the IRQ. But as the default values are hardcoded in the DSDT, the ACPI code fails... Concerning the additional initialization code, of course I will try it, but I would like to recall that manually enabling the device (using setpci) and bypassing the PCI initialization code for the device does not break the machine. I will try to do it and manually load the i2c-i801 driver as a module to see what happens... If it still works, it means that PCI resource management causes the problem. More later...
OK, I managed to boot correctly this time with I2C-I801 ON, all possible sensors, ISA ON but not ISA PNP _AND_ ACPI ON or OFF. Here is the diff concerning the IOPORTS:
>diff ioports_ACPI_ON_SMB_ON_I2C_I801_ON_ISA_ON ioports_ACPI_OFF_SMB_ON_I2C_I801_ON_ISA_ON
19,20d18
< 1000-101f : 0000:00:1f.3
< 1000-1007 : i801-smbus
44,51c42,43
< e400-e47f : motherboard
< e400-e403 : PM1a_EVT_BLK
< e404-e405 : PM1a_CNT_BLK
< e408-e40b : PM_TMR
< e410-e415 : ACPI CPU throttle
< e428-e42b : GPE0_BLK
< e42c-e42f : GPE1_BLK
< e800-e81f : motherboard
---
> e800-e81f : 0000:00:1f.3
> e800-e807 : i801-smbus
53d44
< ec00-ec3f : motherboard
So it is crystal clear that the SMBus IO zone is relocated by the PCI code compared to the non-ACPI mode. The I2C-i801 driver is operational, thus we can expect the chipset to be properly configured this time, although I have found no supported sensors on it. Karol, what were you looking for behind this bus? The full ioports will be attached.
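The same comparison can be automated. The sketch below (Python, for illustration; the helper name regions_for is made up) pulls the port range owned by a given device out of /proc/ioports-style text and compares the ACPI-on and ACPI-off cases. The sample lines are copied from the diff above; the indentation of the child entries is assumed.

```python
# Illustrative sketch: regions_for is a hypothetical helper, not an existing tool.

def regions_for(ioports_text: str, owner: str) -> list[tuple[int, int]]:
    """Return the (start, end) port ranges whose owner field equals `owner`."""
    found = []
    for line in ioports_text.splitlines():
        # Tolerate leading indentation and diff markers ("<", ">").
        span, sep, name = line.strip().lstrip("<> ").partition(" : ")
        if sep and name.strip() == owner:
            start, _, end = span.partition("-")
            found.append((int(start, 16), int(end, 16)))
    return found


ACPI_ON = """\
1000-101f : 0000:00:1f.3
  1000-1007 : i801-smbus
e800-e81f : motherboard
"""

ACPI_OFF = """\
e800-e81f : 0000:00:1f.3
  e800-e807 : i801-smbus
"""

SMBUS = "0000:00:1f.3"
with_acpi = regions_for(ACPI_ON, SMBUS)
without_acpi = regions_for(ACPI_OFF, SMBUS)
relocated = with_acpi != without_acpi
print(f"SMBus with ACPI: {with_acpi}, without: {without_acpi}, relocated: {relocated}")
```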
Created attachment 3535 [details] io ports with ACPI_OFF_SMB_ON_I2C_I801_ON_ISA_ON
Created attachment 3536 [details] ioports with ACPI_ON_SMB_ON_I2C_I801_ON_ISA_ON
Eric, do you reconfigure SMBus's IO BAR? After boot, can your thermal zone work?
OK, I tried your patch on a working 2.6.8-mm2 (with the L3C trick removed). I reapplied the L3C trick (back to the original 2.6.8-mm2) and then your patch. It still fails to get the correct temperature, as you will see in the attached files to come. This time, I have had enough. I propose to simply remove the patch because:
1) It obviously breaks the L3C,
2) Karol did not provide any hint about its possible use on this machine,
3) I have configured all possible sensors for the SMBus but failed to detect any, so what is the final use of the SMBus on this particular machine?
Note that the problem is more general, as L5C machines are also broken by trying to unhide the SMBus <http://bugme.osdl.org/show_bug.cgi?id=3233>. So unhiding the SMBus on ASUS machines is probably a bad idea, as they provide a description for it in the DSDT, and it seems to break due to the PCI resource management change and the ACPI enhancements. I still do not know why PCI reconfigures the chipset IO base and would be glad to understand it without reading all the PCI code...
Created attachment 3537 [details] /proc/interrupts on 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration
Created attachment 3538 [details] /proc/ioports on 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration
Created attachment 3539 [details] dmesg for 2.6.8.1-mm2 with the L3C SMBus enabling and IOBAR and interrupt reconfiguration
Hmm, that leaves me puzzled. I don't really know much about sensors, I just thought that if the patch was useful enough for M2400N, it might be for L3C as well. Anyway, it would still be good to know if M2N is also broken.
Well, as I see it, SMBus is mainly used for accessing sensor chips. It works well on my ASUS A7V once you figure out what sensors you have on the bus (never given in the docs). I have no clue of the type of hardware used to monitor the temperature and its interface on the L3C. Here is the kind of output you get:
as99127f-i2c-2-2d
Adapter: SMBus Via Pro adapter at e800
VCore: +1.81 V (min = +1.66 V, max = +1.82 V)
+3.3V: +3.52 V (min = +3.20 V, max = +3.54 V)
+5V: +5.05 V (min = +4.73 V, max = +5.24 V)
+12V: +12.34 V (min = +10.82 V, max = +13.19 V)
-12V: -12.33 V (min = -13.22 V, max = -10.74 V)
-5V: -5.15 V (min = -5.25 V, max = -4.74 V)
fan2: 6887 RPM (min = 2836 RPM, div = 2) (beep)
M/B Temp: +49
Created attachment 3540 [details] HP compaq nc8000 also broken
Changed title as it is misleading
Created attachment 3541 [details] Manual (setpci) L3C SMBus enabling + SMBus initial IO port base value
I think this clearly shows that the SMBus IO port space is not allocated where the firmware (and the ACPI DSDT code) expects it.
Created attachment 3546 [details] debug patch
From my understanding, the problem is that the BIOS allocates resources for the SMBus, e800-e81f, but disables access to SMBus devices, so the SMBus's IO base isn't initialized. The ACPI thermal zone will access e800-e81f, and it will die if the address changes, since it uses a hardcoded address. If the SMBus is enabled, since its IOBAR isn't initialized, the PCI core will try to allocate a new address, but the PCI core doesn't know the pre-defined address. The new address will be different. At this stage, the thermal zone will use the wrong base address and fail. Does this make sense to you, Eric? Could you apply the debug patch to capture some info so we can confirm it? If the assumption is right, there is no way to fix the problem other than hiding the SMBus, since ACPI and PCI don't know about each other. Thanks, Shaohua
1) I think our analyses of the problem are now _about_ the same. And contrary to what you said in previous comments, there is indeed an overlapping IO region conflict, but the real bug is providing access to the SMBus via ACPI through fixed regions and not through the SMBus IOBAR (grrr ASUS). I checked with XP SP2 and indeed on Windows there is no SMBus either...
2) I would be surprised if the IOBAR were not initialized, as we get the expected value when ACPI=off (see the previous ioport dump attachment) and SMB is working fine. Could this be just luck?
3) Can we find a way to avoid reallocation of the IO space via supplemental quirks.c tricks? NB: this is only valuable if unhiding the SMBus gives us more functionality, but the sensor located on the SMBus is detected yet not managed by the current stock kernel (I loaded each chip module one by one). The lm90 module detects the chip but says it is not managed (Vendor AXIM but chip_id not managed). I will try to add the additional sensor chips present in lmsensors-2.8.7.
4) The SMBus is only one part of the ioports reserved by the function 3 device. What other possibly useful functionality does this chip provide? (I guess some of you have access to relevant docs :-))
Trace to come. And anyway, thanks a lot for the time spent on this problem. It was not wasted, as we found other broken laptops (ASUS but also others)...
Created attachment 3547 [details] dmesg with trace for PCI iospace allocation
As expected, the IOBAR is correctly set (which I was also certain of, as the correct IO range is allocated when ACPI=off). And indeed it gets relocated by the PCI code, breaking ACPI...
Here is the _extra_ printed information (full dmesg already attached):
SMBus IO base: e800
...
PCI: Cannot allocate resource region 4 of device 0000:00:1f.3
Re-assign resources for 0000:00:1f.3, 1000 - 101f
So indeed it sounds like an IO zone conflict, although ACPI did clear the IORESOURCE_BUSY flags...
Ok, thanks. It looks like ACPI motherboard.c reserves the IO ports too early, so they can no longer be reserved by PCI devices, and the PCI core allocates new IO ports for the device. I will try to fix it. Thanks, David.
I guess the change that makes the IO APIC/power management timer work on my laptop (great to know the motherboard hardware design is not broken) has unexpected side effects then... I will test any patch you propose (provided it has no chance of destroying my personal laptop...) -- eric
Created attachment 3548 [details] ioports with ISA and BIOS PNP As requested by email
Created attachment 3549 [details] proposed patch Well, the BIOS doesn't report the IO region in PNP, but it does in ACPI :(. The only workaround I can think of is to add another quirk - pre-reserve the IOBAR for SMBus. Eric, could you please test it? I didn't test the patch, so maybe you need change it a little if possible. Many thanks, Shaohua.
Created attachment 3550 [details] Correct fix for ASUS L3C
I _corrected_ the patch for L3C and 2.6.8.1-mm4:
1) The proposed patch does not apply (a lot of changes occurred in bk-pci.patch),
2) The device to which it was applied is wrong for this machine,
3) The size computation was wrong (1 byte of IO missing).
Anyway, the idea WAS GOOD.
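The "1 byte of IO missing" point is the usual inclusive-range off-by-one. The attachments are not reproduced here, so the sketch below (Python, for illustration; region_size is a made-up name) only shows the arithmetic for the e800-e81f range quoted in this thread, on the assumption that the faulty computation amounted to end minus start.

```python
# Illustrative arithmetic only; the exact expression in the patch is not shown in the thread.

def region_size(start: int, end: int) -> int:
    """Size in bytes of an IO region whose start and end ports are both inclusive."""
    return end - start + 1


start, end = 0xE800, 0xE81F
naive = end - start               # one byte short
correct = region_size(start, end)

print(f"naive: {naive:#x} bytes, correct: {correct:#x} bytes")
```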
Created attachment 3551 [details] Ioports with fixed patch
Created attachment 3552 [details] lspci output The SMBus is there... Just need to find the correct sensor code now :-)
Great, Eric. I used the base kernel and the ICH4 PCI ID, so it failed. Maybe we should also fix the HP Compaq nc8000 case.
For fixing bugs, I always prefer using the -mm tree as it is closer to the various bk trees. If I maintained something, my views would probably be different. So, remaining:
1) Please mark your patch as incorrect (at least the IO zone size computation is wrong),
2) We indeed should fix the L5C and nc8000. Problem is that I do not know what SMBus they contain. Will ask...
Comment on attachment 3549 [details] proposed patch The patch isn't correct, please refer to Eric's.
Created attachment 3553 [details] Patch updated to fix HP CompaQ nc8000
On the HP Compaq nc8000 we have PCI_DEVICE_ID_INTEL_82801DB_3, so here is an updated patch.
Created attachment 3554 [details] Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP CompaQ nc8000 Checked that the patch works with L5C and nc8000 owners.
Eric, please note the number (0x20, 4) in 'asus_smbus_resources' is ICH specific. Does this apply for sis96x? I suppose they are different devices, so the IOBAR index is different. Thanks, Shaohua.
I should not watch the Olympic Games while coding :-( Of course this is plain wrong for the SiS SMBus. The base address is 0x4 in PCI config space, and I do not fully understand the resource parameter index used in pci_claim_resource. Help appreciated. I think we should end up coding a generic function with those two parameters as arguments, and make device-specific functions that call the generic one with the two parameters set. Will code that. What do you need in order to give me the correct resource index?
I agree, we should provide a generic routine. Actually only one parameter is needed:
addr = PCI_BASE_ADDRESS_0 + (index << 2);
The index parameter indicates which BAR the resource is (a PCI device has 6 BARs). Please go ahead and clean it up. I don't have the documentation for the SiS SMBus, but lspci -vv should help you find it out (it will display something like "Region 4: I/O ports at .. [size=..]"). Thanks, Shaohua.
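As a worked example of that formula (Python, for illustration; bar_config_offset is a made-up name): PCI_BASE_ADDRESS_0 is offset 0x10 in PCI configuration space and each BAR is 4 bytes wide, so BAR index 4 lands at 0x20, which matches the (0x20, 4) pair mentioned for the ICH case above.

```python
# Illustrative sketch of addr = PCI_BASE_ADDRESS_0 + (index << 2).

PCI_BASE_ADDRESS_0 = 0x10  # config-space offset of BAR 0
NUM_BARS = 6               # a normal PCI device has BARs 0..5


def bar_config_offset(index: int) -> int:
    """Return the configuration-space offset of BAR number `index`."""
    if not 0 <= index < NUM_BARS:
        raise ValueError(f"BAR index out of range: {index}")
    return PCI_BASE_ADDRESS_0 + (index << 2)


offsets = [bar_config_offset(i) for i in range(NUM_BARS)]
print([hex(o) for o in offsets])
```

Passing only the index keeps the two values from drifting apart, which is the point of the single-parameter suggestion.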
Created attachment 3556 [details] Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP CompaQ nc8000
Hope this one is the final version. NB: by luck, the previous patch was indeed working for the L5C with the SiS96x SMBus bridge because the IO BAR index is the same as on the i801 (I asked an L5C owner to test it before). Anyway, this version is much more generic and opens the door to SMBus unhiding cases that are not yet fixed or still to come...
RE: "Patch to FIX SMBus unhiding for ASUS L3C, L5C, HP CompaQ nc8000"
It's better than what's in the kernel now, but this is still wrong. I hit this problem on my nc6000. ACPI is well within its rights to claim this resource; it's not a "buggy BIOS" issue. Until we unhide this PCI function, the SMBus is completely hidden. ACPI has reported that the range is in use via the _CRS method on the motherboard node. At this point, ACPI firmware has exclusive access to the device, which is required for the Operation Region it uses to access it. By unhiding this device, and effectively stealing the resource, we're exporting the device to any kernel-level driver that wants it. IMHO, the kernel has no right to take this device from ACPI and hand it off to a sensor driver. How would we deal with both AML and a sensor driver poking the SMBus controller at the same time? This is a firmware-owned device, sorry sensors. I see two options:
1. Only unhide the device on systems where it's known not to be used, or on those without ACPI enabled.
2. Look for resource conflicts with ACPI and re-hide the device if ACPI claims ownership.
I still think this is a BIOS bug because of the way the instructions to access the sensor are given in the DSDT, via a hardcoded address. If access were coded indirectly, via an offset from the IO base register, we would be able to relocate the IO region without problems. The question of sharing a device between two different kernel entities is more serious. The question is the benefit, if some more information can be provided to userland. See <http://fobie.net/nc8000/#i2c> to see what I mean. Regarding Linus' patch versus mine, Linus' one may have more side effects but also more benefit if we find this type of conflict for things other than the SMBus. I'm just curious why Greg did not mention this patch, because he was copied on the whole discussion. -- eric
Eric, the SMBus device is hidden for a reason. ACPI needs exclusive access to it. Once it's hidden, the OS shouldn't be able to move it, so firmware doesn't need to dynamically determine where it lives. Now you've exposed it and moved it, and all the assumptions firmware was able to make are broken. IMHO, hiding the SMBus controller to solve the exclusive access problem is a reasonable solution. Exposing it so we can dink with the controller, potentially getting it very confused and breaking ACPI thermal management, is NOT a reasonable solution. I really think the default should be to not expose hidden SMBus controllers when ACPI is present. The little bit of extra info the sensors are able to get by poking this interface is not worth the potential thermal problems that could result. This is dangerous.
Well, first, I did _not_ unhide the bus. Someone else did. I just wanted to have my laptop functional again and, as written in a previous mail, I have not found the correct sensor driver for this laptop anyway, so I couldn't care less about making the SMBus visible. BUT I _do_ care if my laptop becomes unusable (kacpid taking 100% of the CPU), which is still the case with 2.6.8.1-mm4 and 2.6.9-rc1. I will manually apply Linus' fix to see if it really solves the problem when the SMBus driver tries to get ownership of the IO region...
Eric, sorry for the implication. I should have read the URL in your update more thoroughly. Please let me know if you still have kacpid issues with Linus' fix. My nc6000 is behaving nicely now, and I'd expect the nc8000 to as well (as long as the sensor modules aren't loaded). I don't work in the laptop group, but I'd be happy to work with you if you're still seeing problems. Thanks, Alex
Linus' patch also fixes the problem:
e400-e47f : 0000:00:1f.0
e400-e47f : motherboard
e400-e403 : PM1a_EVT_BLK
e404-e405 : PM1a_CNT_BLK
e408-e40b : PM_TMR
e410-e415 : ACPI CPU throttle
e428-e42b : GPE0_BLK
e42c-e42f : GPE1_BLK
e800-e81f : motherboard
e800-e81f : 0000:00:1f.3
A final fix for such kind of problems has been merged in 2.6.9-rc4. Close it.
Ouch. I've finally verified this bug has not been squashed completely. Steps to reproduce with 2.6.9-rc4 vanilla: 1. Do a full S3 suspend / resume cycle. 2. Do a full swsusp cycle (both platform and shutdown trigger the bug). After the machine resumes and a thermal event is triggered, the aforementioned mutex loop starts again.
Karol, do the ioports conflict now? Can you clarify the current problem? Is it the thermal problem or the SMBus problem? If it's the SMBus problem, I guess the driver must set the config register to re-enable the SMBus.
Well, it's not that easy as /proc/ioports doesn't show any change and I'm not using any sensor drivers at that point. Basically, it seems to me that the smbus' IO BAR is somehow reprogrammed on resume; why it happens only after S3 followed by S4 is beyond me. Anyway, since the OperationRegion SMB0 is hardcoded in the DSDT, when the BAR is reprogrammed any subsequent _L00 GPE makes kacpid spin at WTSB() (or at least it would seem so from my limited understanding).
As far as I can tell, the enable/disable bit of the SMBus in ICH4 is in the LPC bridge register 0xF2. The current PCI code does not save/restore 0xF2. This causes the SMBus to be hidden by the BIOS after S3. Possibly that's the reason. Could you send me the dmesg after S4?
Created attachment 3858 [details] requested dmesg, after S3 and swsusp
Log attached. Additionally, some funnies in the PCI config space of the LPC chip:
[after fresh boot]
00:1f.0 ISA bridge: Intel Corp. 82801CAM ISA Bridge (LPC) (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
00: 86 80 8c 24 0f 01 80 02 02 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 01 e4 00 00 10 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 01 ec 00 00 10 00 00 00
60: 05 0b 0b 0b d0 00 00 00 80 80 80 80 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: ff 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: aa 03 00 00 00 00 00 00 0d 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 81 06 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 04 20 00 00 02 0f 00 00 04 00 00 00 00 00 00 00
e0: 10 00 00 80 00 00 0f 1c 33 22 00 00 00 00 67 45
f0: 0f 00 01 84 00 00 00 00 47 0f 0f 00 00 00 80 00
[after S3; the temperature still reads, kacpid doesn't spin]
[...]
f0: 0f 00 09 84 00 00 00 00 47 0f 0f 00 00 00 80 00
[after subsequent swsusp; kacpid spinning, temperature at -129
Thanks, Karol. The lspci output and dmesg confirm my assumption. After S3, the SMBus is disabled by the BIOS. After S4, the SMBus can be enabled again, since S4 re-invokes pci_fixup. A workaround is to save/restore 1f.0's 0xf2 config register on sleep/wakeup. A final solution is for Linux to provide a bridge (P2P/LPC ...) driver. Greg, what's the plan for providing a Linux PCI bridge driver? Could you please open a new tracker entry for the new issue? It's completely different from the original problem.
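The change can be read straight off the two f0 rows in Karol's dump. The sketch below (Python, for illustration; config_byte is a made-up helper) extracts byte 0xF2 from each row and shows which bit flipped. Treating mask 0x08 as the SMBus disable bit is an assumption based on the 01 to 09 change and on the statement above that the enable/disable bit lives in 0xF2; check the ICH datasheet before relying on it.

```python
# Illustrative sketch: config_byte is hypothetical; SMBUS_DISABLE is an assumed bit mask.

def config_byte(dump: str, offset: int) -> int:
    """Read one byte from an lspci-style hex dump ("f0: 0f 00 01 ...")."""
    for line in dump.splitlines():
        label, sep, data = line.partition(":")
        if not sep:
            continue
        try:
            base = int(label.strip(), 16)
        except ValueError:
            continue  # not a hex-dump row
        values = data.split()
        if base <= offset < base + len(values):
            return int(values[offset - base], 16)
    raise KeyError(hex(offset))


FRESH_BOOT = "f0: 0f 00 01 84 00 00 00 00 47 0f 0f 00 00 00 80 00"
AFTER_S3 = "f0: 0f 00 09 84 00 00 00 00 47 0f 0f 00 00 00 80 00"
SMBUS_DISABLE = 0x08  # assumption: bit 3 of register 0xF2

before = config_byte(FRESH_BOOT, 0xF2)
after = config_byte(AFTER_S3, 0xF2)
flipped = before ^ after

print(f"0xF2 before: {before:#04x}, after S3: {after:#04x}, flipped bits: {flipped:#04x}")
print("SMBus hidden after S3:", bool(after & SMBUS_DISABLE))
```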
No one has sent me a pci bus driver :)
I filed a tracker at #3609, please close this one.