Hi, this is probably a BIOS problem with my laptop and not a kernel bug; I've googled a lot without finding anything similar. Anyway, having identified the patch which introduces the problem, I'm posting here to make this info public.

Most recent kernel where this bug did not occur: 2.6.15
Distribution: Gentoo
Hardware Environment: Toshiba Tecra M2 laptop

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 2.00GHz
stepping : 6
cpu MHz : 600.000
cache size : 2048 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss tm pbe up est tm2
bogomips : 1198.08

# lspci
00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 21)
00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 21)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 83)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 03)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 03)
01:00.0 VGA compatible controller: nVidia Corporation NV34M [GeForce FX Go5200 32M/64M] (rev a1)
02:05.0 Network controller: Intel Corporation PRO/Wireless 2200BG Network Connection (rev 05)
02:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:09.0 Ethernet controller: Intel Corporation 82540EP Gigabit Ethernet Controller (LOM) (rev 03)
02:0b.0 CardBus bridge: Toshiba America Info Systems ToPIC100 PCI to Cardbus Bridge with ZV Support (rev 32)
02:0b.1 CardBus bridge: Toshiba America Info Systems ToPIC100 PCI to Cardbus Bridge with ZV Support (rev 32)
02:0d.0 System peripheral: Toshiba America Info Systems SD TypA Controller (rev 03)

# sensors
adm1032-i2c-0-4c
Adapter: SMBus I801 adapter at d880

Software Environment: lm_sensors-2.10.0

Problem Description:
Unable to resume after power down: pressing the power button turns the power LED on, but the laptop doesn't start. I have to keep the power button pressed for 5 seconds to power off and then press it once again to restart.
When rebooting, the power goes off after showing a blinking cursor for a few seconds, before the GRUB screen is shown. The same procedure is required to restart.
When suspending to RAM I get a "failure to resume" message from the BIOS. The same procedure is required to restart.

Steps to reproduce: load the lm90 kernel module in 2.6.16 or 2.6.17-rc1 and halt or reboot.

Using git-bisect (great feature) I've been able to identify the offending patch:

[PATCH] i2c: i2c-i801 explicitly enables/disables PEC

This patch tweaks i2c-i801.c so that the driver always sets the SMBAUXCTL
register (which enables/disables PEC) explicitly before each transaction.

Signed-off-by: Mark M. Hoffman <mhoffman at lightlink.com>
Signed-off-by: Jean Delvare <khali at linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh at suse.de>

---
commit 2e3e13f8e9d9b2111404cdccaa4e1b988b70acce
tree de95ee215c2189cbfb98829e32e7fb117c94a160
parent 46f25dffbaba48c571d75f5f574f31978287b8d2
author Mark M. Hoffman <mhoffman at lightlink.com> Sun, 06 Nov 2005 23:04:51 +0100
committer Greg Kroah-Hartman <gregkh at suse.de> Thu, 05 Jan 2006 22:16:20 -0800

 drivers/i2c/busses/i2c-i801.c | 6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c
index ac3eafa..1c752dd 100644
--- a/drivers/i2c/busses/i2c-i801.c
+++ b/drivers/i2c/busses/i2c-i801.c
@@ -468,8 +468,7 @@ static s32 i801_access(struct i2c_adapte
 		return -1;
 	}
 
-	if (hwpec)
-		outb_p(1, SMBAUXCTL);	/* enable hardware PEC */
+	outb_p(hwpec, SMBAUXCTL);	/* enable/disable hardware PEC */
 
 	if(block)
 		ret = i801_block_transaction(data, read_write, size, hwpec);
@@ -478,9 +477,6 @@ static s32 i801_access(struct i2c_adapte
 		ret = i801_transaction();
 	}
 
-	if (hwpec)
-		outb_p(0, SMBAUXCTL);	/* disable hardware PEC */
-
 	if(block)
 		return ret;
 	if(ret)

If I restore these two lines

	if (hwpec)
		outb_p(0, SMBAUXCTL);	/* disable hardware PEC */

the problem disappears.
If I disable HW PEC via sysfs after loading the sensors modules and run the sensors program at least once, reboot is OK too.

Let me know if I should post more info.
Thanks in advance.
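To see why the patch quoted above changes what the BIOS finds at shutdown or resume, the two versions of the PEC handling can be compared in a small user-space model. This is only a sketch under stated assumptions: `mock_smbauxctl`, `mock_outb`, `mock_transaction`, `access_2_6_15`, `access_2_6_16` and `register_after` are hypothetical names standing in for the real register and driver code, and the transaction is assumed to always succeed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the SMBAUXCTL register; the real driver writes it
 * through outb_p(), which cannot be called from user space. */
uint8_t mock_smbauxctl;

void mock_outb(uint8_t value) { mock_smbauxctl = value; }

/* Stand-in for i801_transaction()/i801_block_transaction(); assumed to
 * always succeed in this sketch. */
int mock_transaction(void) { return 0; }

/* 2.6.15 behaviour: set the PEC bit only when needed, clear it afterwards. */
int access_2_6_15(int hwpec)
{
    assert(hwpec == 0 || hwpec == 1);
    if (hwpec)
        mock_outb(1);   /* enable hardware PEC */
    int ret = mock_transaction();
    if (hwpec)
        mock_outb(0);   /* disable hardware PEC */
    return ret;
}

/* 2.6.16 behaviour: write the bit before every transaction and never
 * clear it afterwards. */
int access_2_6_16(int hwpec)
{
    assert(hwpec == 0 || hwpec == 1);
    mock_outb((uint8_t)hwpec);  /* enable/disable hardware PEC */
    return mock_transaction();
}

/* Value left in the mock register once the given access function returns. */
uint8_t register_after(int (*access)(int), int hwpec)
{
    mock_smbauxctl = 0;
    access(hwpec);
    return mock_smbauxctl;
}
```

In this model `register_after(access_2_6_15, 1)` leaves the register at 0, while `register_after(access_2_6_16, 1)` leaves it at 1, which matches the observation that restoring the two removed lines makes the problem disappear. It is also consistent with the sysfs workaround: once PEC is disabled, the next transaction (for example one triggered by running sensors) writes 0 to the register.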
Thanks for the very complete bug report and the detailed analysis. So it looks like your system won't resume properly if the i801 PEC bit was set when the suspend happened. Please undo your own change and try the following patch instead. I don't use suspend myself, and my ADM1032 chip is not on my i801 adapter anyway, so I can't test it. I don't know much about suspend and resume either, so this is a first try, which may or may not work. Please let me know.
Created attachment 7882: Fix i2c-i801 resume when PEC is enabled
Created attachment 7883: Fix i2c-i801 suspend and shutdown when PEC is enabled

Thank you very much for the prompt answer. Your patch works correctly for the suspend/resume case, and I've extended it to cover shutdown and reboot. I'm not a hacker, so I hope I've used the correct hooks in the pci_driver struct: I've added similar code to the shutdown and remove functions to cover both the built-in and the module configurations. Tested on 2.6.17-rc1, it works both built into the kernel and as a module. Thank you again for your help.
Daniele, maybe you can still report the problem to Toshiba? They really should fix their BIOS code.
Created attachment 7896: Clear PEC bit after every transaction

After some discussion, this simpler fix seems to be preferred. It should work just as well, and is also more robust with regard to unclean reboots. Please test.
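The attachment itself is not reproduced in this thread, so the following is only a user-space sketch of what "clear PEC bit after every transaction" could look like, not the actual patch. All names here (`fake_auxctl`, `fake_outb`, `fake_transaction`, `fake_transaction_result`, `access_with_pec_cleared`) are hypothetical stand-ins for the SMBAUXCTL register and the driver's access path.

```c
#include <assert.h>
#include <stdint.h>

uint8_t fake_auxctl;          /* hypothetical stand-in for SMBAUXCTL */
int fake_transaction_result;  /* result the fake transaction reports */

void fake_outb(uint8_t value) { fake_auxctl = value; }

/* Stand-in for the real transaction; nonzero means failure. */
int fake_transaction(void) { return fake_transaction_result; }

/* Write the PEC bit before the transaction, as 2.6.16 does, then force it
 * back to 0 whether or not the transaction succeeded, so the bit is never
 * left set between transactions. */
int access_with_pec_cleared(int hwpec)
{
    assert(hwpec == 0 || hwpec == 1);
    fake_outb((uint8_t)hwpec);  /* enable/disable hardware PEC */
    int ret = fake_transaction();
    if (hwpec)
        fake_outb(0);           /* disable hardware PEC again */
    return ret ? -1 : 0;
}
```

Because the register is back at 0 as soon as each call returns, nothing has to run at suspend, shutdown or module removal, which is presumably why this approach copes better with unclean reboots than hooks that only fire on an orderly shutdown.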
Hi Jean, that's exactly the first patch I tried, and it worked. I'll test it again and let you know. I'm also trying to find out how to report the problem to Toshiba, but there's no obvious way to do it. Thanks again, Daniele
I confirm that the last patch works; tested on 2.6.17-rc1.
I've tested this patch and it solves the identical problem I had on my Toshiba Satellite A40. But more importantly, it also solves the problem I've been having since 2.6.16: my fan no longer started automatically when the processor heated up, allowing it to overheat dangerously. See http://bugzilla.kernel.org/show_bug.cgi?id=6315. This patch magically restores fan/temperature control to what I was used to with 2.6.15. Please push this patch through for both 2.6.17 and 2.6.16.
Frans, does the Tecra M2 have automatic fan speed regulation, as Frans described for the Satellite A40?
Hi Jean, I suppose you're asking me. Yes, the Tecra M2 also has automatic fan control. In fact I had noticed some strange fan behaviour, but in my case the fan didn't turn off at low temperatures, so I didn't worry too much. It also seems that this no longer happens after patching.
Oops, yes that was a question for you, Daniele. So this explains why the SMBus was hidden on both laptops. Toshiba seem to know their business. I think we will have to drop both quirks and leave the SMBus hidden again on these laptops. I'm sorry about that, but the current situation is unsafe, and we don't want users to burn their hardware.
Jean, the users don't want to fry their hardware either :-). I'm sad because the quirk for the Tecra M2 was my first and only patch to the Linux kernel, but I realize it probably should be dropped until a safe solution is found. I've searched a lot without finding any documentation; do you think that you, as a kernel developer, could ask Toshiba for some clarification? Cheers, Daniele
No, I have no technical contact at Toshiba. I'll not remove the quirks right away. Let's take some time to discuss the alternatives and try a few hacks first. Maybe the ACPI folks will have an idea.
As was discussed in bug #6315, it is now clear that on these Toshiba laptops (Satellite A40 and Tecra M2) the thermal management is done by SMM code, which can access the SMBus at any time. Thus the only safe option is to remove the quirk that was unhiding the SMBus. I am sorry about that. I understand that you liked that quirk, and I agree that the risk of something actually going wrong with it is slim, but it is still a risk I am not willing to take.