Subject    : 2.6.24-rc4 hwmon it87 probe fails
Submitter  : Mike Houston <mikeserv@bmts.com>
References : http://lkml.org/lkml/2007/12/4/466
Patch is available: http://lkml.org/lkml/2007/12/9/186
I've shown Jean Delvare another system where it87 now fails and the proposed patch does not work.
At Jean's request I'll add the info about my system there (it has been failing in recent Fedora kernels as well as in -mm kernels).
Created attachment 14062 [details] GA-K8N-Ultra-9 lspci
Created attachment 14063 [details] GA-K8N-Ultra-9 system logs
Created attachment 14064 [details] GA-K8N-Ultra-9 ioports
According to khali:
> (13:42:32) khali: nim-nim: the faulty patch was backported to 2.6.23.10 :(
So the regression is propagating.
Gigabyte M56S-S3 w/ F3 BIOS also has the issue with 2.6.23.11. Undoing the changes from http://kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fincr%2Fpatch-2.6.23.9-10.bz2;z=34 works around the issue.
Created attachment 14132 [details] Gigabyte GA-965G-DS3 (F9) DSDT table as per request

On Wednesday December 19 2007 07:45:14 pm Carlos Corbacho wrote:
> On Thursday 20 December 2007 00:20:21 Bjorn Helgaas wrote:
> > I suspect the manufacturers would say "Oh, the sensors? The BIOS
> > isn't broken, you're just supposed to use WMI or some (undocumented)
> > ACPI device to get at those."
>
> It's quite possible - can we have DSDTs for the boards in question so we
> can quickly check if this is a possibility? (Basically, to see if they have
> PNP0C14 devices - if they don't, then I'm afraid it's nothing to do with
> WMI).
>
> -Carlos

DSDT for my GA-965G-DS3 which is affected by this issue.
There is no ACPI-WMI mapper device (PNP0C14) on this board, so WMI is, at least in this case, not the solution to this bug.
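The check Carlos describes (does the DSDT declare a PNP0C14 device?) can be sketched roughly as below. This is only an illustration: it assumes the DSDT has already been disassembled to text, and the helper name `has_wmi_mapper` and the sample snippet are hypothetical, not taken from any of the attached tables.

```python
import re

# Hardware ID of the ACPI-WMI mapper device discussed above.
WMI_MAPPER_HID = "PNP0C14"

# Matches _HID declarations such as:
#   Name (_HID, EisaId ("PNP0303"))
#   Name (_HID, "PNP0C14")
HID_RE = re.compile(r'_HID\s*,\s*(?:EisaId\s*\(\s*)?"([A-Z0-9]+)"')


def has_wmi_mapper(dsl_text: str) -> bool:
    """Return True if any _HID in the disassembled DSDT names PNP0C14."""
    return any(hid == WMI_MAPPER_HID for hid in HID_RE.findall(dsl_text))


# Hypothetical snippet, only to exercise the helper.
sample = '''
Device (PS2K) { Name (_HID, EisaId ("PNP0303")) }
Device (WMI1) { Name (_HID, "PNP0C14") }
'''
print(has_wmi_mapper(sample))
```

A board whose DSDT yields False here has no ACPI-WMI mapper, so WMI cannot be the intended route to the sensors on that board.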
Created attachment 14136 [details] GA-K8N-Ultra-9 DSDT
Final F8 BIOS; F9 never went out of beta and is broken in many ways.
Please attach the output of "dmidecode" for the affected motherboards.
Created attachment 14146 [details] GA-K8N-Ultra-9 dmidecode
Created attachment 14147 [details] dmidecode for Gigabyte M56S-S3 board /w F3 BIOS
Created attachment 14267 [details] it87 request only environment controller ports

Here's the patch I propose to fix this problem. We haven't quite got a consensus that this is the right approach, but Mike did verify that it works for him: http://lkml.org/lkml/2007/12/21/197.
(In reply to comment #15)
> Created an attachment (id=14267) [details]
> it87 request only environment controller ports
>
> Here's the patch I propose to fix this problem.

This patch works on my system.
Works for 2.6.23.11?
(In reply to comment #16)
> (In reply to comment #15)
> > Created an attachment (id=14267) [details]
> > it87 request only environment controller ports
> >
> > Here's the patch I propose to fix this problem.
>
> This patch works on my system

My system being 2.6.24-rc5-mm1 + GA-K8N-Ultra-9.
For 2.6.23, you might as well upgrade to 2.6.23.12, where the "faulty" patch was reverted. (The patch itself is correct, it just happens to reveal a problem that was hidden so far.)
Hi Jean, Bjorn,

In 2.6.24-rc7-git3 I expect that the it87 driver on these boards is still not loading: Shaohua's workaround from comment #1 isn't applied, nor is Bjorn's workaround from comment #17 -- while the initial PNP patch that sparked this is still present (as it must be).

(Indeed, one could argue that the PNP fix should again be pushed into 2.6.23.stable, since it fixes far scarier potential system failures than the it87 driver failing to load. But with side effects such as this, any such change may exceed .stable's risk tolerance...)

I've read through the thread and I concur with what Bjorn wrote about how ACPI works. It is likely that the BIOS is reserving this device for its own use, and unlikely that the BIOS will ever declare a device for the OS to bind a native driver to this hardware.

While it is never a good bet to assume a BIOS writer is doing something correctly, one could argue that by not loading it87 on these boards, we are obeying what the BIOS writer asked us to do.

Do you think that this problem is widespread and will affect virtually all boards where it87 used to work, or do you think it is a small subset?

Does anybody know if a native driver for it87 exists on Windows and if it loads on the systems at hand?
Interesting idea but somewhat unacceptable. When running Linux I'd like to be able to use all hardware on my board. I contacted Gigabyte over this matter via http://ggts.gigabyte.com.tw/. I got a beta BIOS F4a for my board. Will test with that version.
In all affected BIOSes, the I/O port range(s) declared for the IT87xxF device are plain wrong, and that's why the it87 driver fails to load. It doesn't have anything to do with the BIOS reserving the device for its own use. It is very frequent that motherboard BIOSes declare an I/O area for the hardware monitoring chip, and in my experience it doesn't correlate with the BIOS making use of the device in question. There are many boards out there (probably the majority) where the I/O area of the hardware monitoring chip is declared in the BIOS but the BIOS doesn't make any use of it (outside of the BIOS setup screen, that is.)

I don't expect the problem at hand to be widespread. It only affects motherboards where the BIOS declares a wrong I/O area for the IT87xxF chip. As far as I know, only 3 models are affected, all by Gigabyte: K8N Ultra-9, 965G-DS3 and M56S-S3. I have one board using the it87 driver here, incidentally a Gigabyte one as well, and it works just fine. The IT87xxF chips are very popular and we would probably know by now if (many) additional models were affected.

I fail to see how Windows monitoring applications would matter here. I have no idea how I/O resources are managed on Windows, if they are at all.
(In reply to comment #15)
> Created an attachment (id=14267) [details]
> it87 request only environment controller ports

I don't like this patch much. While it is fine to only request_region() the ports the driver really uses (i.e. 0x295-0x296), the platform device resource is supposed to match the ports that the chip actually decodes, to let the user know that they should not attempt to use these ports. Despite what is written in the various IT87xxF datasheets, these chips do decode the full 0x290-0x297 I/O range (for older ones) or at least 0x294-0x297 (for recent incarnations), and not just 0x295-0x296. Your proposed patch will make it look like the surrounding ports are free, while they are not.

I don't think this is right, but OTOH I have to admit that it is unlikely that users will attempt to make use of the I/O ports in question, so in practice the badness should be limited. At least it works around the problem at hand, and while not the way it should have been, it has the merit of being relatively simple and not too intrusive.

In my opinion the best fix would be quirks that fix or discard the broken I/O port range declarations made by the BIOS of the affected motherboards. However I don't have the time (nor the knowledge) to do this myself, so if nobody is going to do it, I guess that we have to take your patch for the time being. But it should be updated to clearly document that the driver now declares fewer ports than the chip actually decodes.
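To make the objection concrete, here is a small worked example of which ports would look free if the driver declared only 0x295-0x296. The port ranges are the ones quoted in the comment above; the helper names `port_set` and `undeclared` are made up for this illustration.

```python
def port_set(start: int, end: int) -> set:
    """All I/O ports in the inclusive range start..end."""
    return set(range(start, end + 1))


def undeclared(decoded: set, requested: set) -> list:
    """Ports the chip decodes but the driver would no longer declare."""
    return sorted(decoded - requested)


decoded_older = port_set(0x290, 0x297)   # older IT87xxF chips
decoded_recent = port_set(0x294, 0x297)  # recent incarnations
requested = port_set(0x295, 0x296)       # what the proposed patch requests

print([hex(p) for p in undeclared(decoded_older, requested)])
print([hex(p) for p in undeclared(decoded_recent, requested)])
```

For older chips, six ports end up decoded but undeclared; for recent ones, two.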
> There are many boards out there (probably the majority)
> where the I/O area of the hardware monitoring chip is
> declared in the BIOS but the BIOS doesn't make any use of it
> (outside of the BIOS setup screen, that is.)

Yes, I'd believe that. This is why I asked if a native hw monitoring driver/application works on these boards in Windows. Per the comments on the list, Windows is seeing the reservations the same way Linux is. (Though who knows if those reservations are honored by a platform-specific driver -- probably not, since a platform driver knows all.)

> I don't expect the problem at hand to be widespread.

Good, then maybe a DMI-based BIOS workaround is viable. However, I agree with Bjorn that "pnpacpi=off" would be hitting this problem with too big a stick -- even if limited to a finite list of boards. If Bjorn's driver patch even w/o DMI doesn't break anything, maybe that is the most pragmatic way to go?
The Gigabyte F4a beta BIOS for the M56S-S3 board does not fix the issue for a 2.6.23.11 kernel, so I had to revert the patch I mentioned above again. I again referred Gigabyte support to this discussion. Please do the same if you see this problem in your setup.
(In reply to comment #24)
> good, then maybe a DMI based BIOS workaround is viable.
> However, I agree with Bjorn that "pnpacpi=off" would
> be hitting this problem with too big a stick -- even
> if limited to a finite list of boards. If Bjorn's driver
> patch even w/o DMI doesn't break anything, maybe that
> is the most pragmatic way to go?

I agree that it is less correct than a DMI-based BIOS workaround but also more pragmatic, so let's just do that for now. The DMI-based BIOS workaround can always be implemented later if someone finds the time for that or if additional issues require it.
(In reply to comment #25)
> The Gigabyte F4a beta bios for M56S-S3 board does not fix the issue for a
> 2.6.23.11 kernel so I had to revert the patch I mentioned above again.
> I again referenced Gigabyte support to this discussion.
> Please do the same if you see this problem in your setup.

I probably won't bother. When I contacted them ~6 months ago to fix the CK804 HPET declaration, they started by denying HPET existed, before admitting the board was not recent enough for them to expend any more work on it. So now that's hpet=force on the boot line for me.
(In reply to comment #22)
> I fail to see how Windows monitoring applications would matter here. I have no
> idea how I/O resources are managed on Windows, if they are at all.

Most motherboard manufacturers have their own overclocking/temp monitoring GUI applet. Since it's vendor-specific I suppose it can carry its own mobo/resource table, and ignore whatever the BIOS says. Having the BIOS reserve resources would have the bonus side-effect of preventing any third-party generic tool from running, and force overclockers to use the vendor-approved tool.
Where are we with this problem? (http://bugzilla.kernel.org/show_bug.cgi?id=9514)

I think (correct me if I'm misremembering), we started reserving more motherboard resources, and then we started seeing conflicts between some of those resources and something it87 needs.

We can't fix this by reserving fewer motherboard resources. We really want to reserve *all* the motherboard resources to prevent conflicts.

We could fiddle with the PNP system driver to make it ignore resources that overlap other resources (http://lkml.org/lkml/2007/12/9/186). The system device has 0x290-0x29f and 0x290-0x294 ranges, and this patch ignores the second. I'm reluctant to do this because it just seems like a hack in the system driver. Also, Mike Houston found that Windows lists both, and I think we should do the same (http://lkml.org/lkml/2007/12/9/170).

The second option I see is to use my patch (http://bugzilla.kernel.org/attachment.cgi?id=14267&action=view) to make it87 request only the ports it uses. Jean rightly believes a platform device should reflect all the ports a chip decodes, and my patch goes the other direction. But in an ACPI system, the BIOS has the responsibility of listing all the address space that is in use, so I don't think we really should *have* platform devices unless they come from ACPI.

Since we don't know how to get the it87 functionality the "correct" way (i.e., via some ACPI device), we have to kludge things a bit, and I think a reasonable start is to rely on ACPI to tell us what address space is in use and change the it87 driver to request only the ports it uses.

Whatever we do, it'd be nice to have a fix in 2.6.24, and I think my patch is the least evil for now. Possibly the situation could actually be improved slightly by removing the platform device stuff from it87 altogether, at least for ACPI systems, though I think this is a post-2.6.24 question.

Jean, you own the it87 driver, so do you want to chime in? Any other possibilities for a 2.6.24 fix?
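The conflict described above can be illustrated with a toy model of a nested resource tree. This is a deliberately simplified sketch, not the kernel's implementation: the `Resource` class and this `request_region` function are hypothetical, and the model assumes that the PNP system driver's reservations are recorded as non-busy placeholders that a driver may claim a range *inside* of, but not a range that straddles their boundaries. The two reserved ranges are the ones quoted above (0x290-0x29f and 0x290-0x294).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    name: str
    start: int
    end: int                      # inclusive
    busy: bool = True             # busy = owned by a driver, cannot be nested into
    children: List["Resource"] = field(default_factory=list)

    def contains(self, start: int, end: int) -> bool:
        return self.start <= start and end <= self.end

    def overlaps(self, start: int, end: int) -> bool:
        return self.start <= end and start <= self.end


def request_region(root: Resource, name: str, start: int, end: int,
                   busy: bool = True) -> Optional[Resource]:
    """Try to claim start..end; descend into non-busy reservations.

    Returns the new Resource on success, or None on conflict.
    """
    parent = root
    while True:
        if not parent.contains(start, end):
            return None           # request straddles the enclosing reservation
        conflict = next((c for c in parent.children
                         if c.overlaps(start, end)), None)
        if conflict is None:
            res = Resource(name, start, end, busy)
            parent.children.append(res)
            return res
        if conflict.busy:
            return None           # somebody already owns these ports
        parent = conflict         # placeholder reservation: try inside it


ioports = Resource("ioports", 0x0000, 0xFFFF, busy=False)
request_region(ioports, "pnp system", 0x290, 0x29F, busy=False)
request_region(ioports, "pnp system", 0x290, 0x294, busy=False)

old_style = request_region(ioports, "it87", 0x290, 0x297)   # full decoded range
patched = request_region(ioports, "it87", 0x295, 0x296)     # only ports used
print(old_style is None, patched is not None)
```

In this model the 8-port request overlaps the inner 0x290-0x294 reservation without fitting inside it, so it is refused, while the 2-port request falls entirely between reservations and succeeds.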
Reply-To: mikeserv@bmts.com

On Thu, 17 Jan 2008 13:38:57 -0700 Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:
> Where are we with this problem?
> (http://bugzilla.kernel.org/show_bug.cgi?id=9514)

> The second option I see is to use my patch
> (http://bugzilla.kernel.org/attachment.cgi?id=14267&action=view)
> to make it87 request only the ports it uses.

> Whatever we do, it'd be nice to have a fix in 2.6.24, and I think my
> patch is the least evil for now.

From my perspective it sure seems to be the least evil, because it only touches what needs to be touched for my situation. I of course can't say how it would affect others. I'm still using your patch (on 2.6.24-rc8 now) but I'll test whatever other solutions you folks may come up with.

Mike Houston
It would be great if the resource manager could automatically merge the reserved resources when we reserve them. But I agree we can use Bjorn's patch for now.
Bjorn, I agree with your analysis in comment #29. Except that I do not own the it87 driver. I am not maintaining this driver specifically and I am no longer the hwmon subsystem maintainer either. It's really up to Mark M. Hoffman to decide whether he takes your patch or not. Me, all I can say is that I am OK if he does.
I've forwarded Bjorn's patch to Linus - "least evil" is just about right. Consumer-grade mainboards have crappy BIOS, news at 11. There is no standard support for hardware monitoring built into Windows; thus the BIOS writers have no incentive to implement that correctly. So Bjorn: I have to disagree with you about one thing - the BIOS is emphatically *not* to be trusted here (again, at least with consumer-grade hardware). Nicolas' guess in comment #28 is spot on: I've stepped through some of those vendor-supplied apps w/ a debugger and that's exactly how they work.
Appeared in the mainline as commit 87b4b6634ac112ddfe7b92aae50eb4bf7b128d1a
Gigabyte replied:
> Our bios engineer has open the resource for 0X290-0x29F reserved for I/O.
(and attached a test BIOS)

Would this be enough?
With the new testbios and 2.6.23.11 I get:
(...)
0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
01f0-01f7 : ide0
0290-0297 : it87
0290-0294 : pnp 00:02
0295-0296 : pnp 00:02
0376-0376 : 0000:00:09.0
0378-037a : parport0
(...)
Is this better?

On 2.6.24 I get:
0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
01f0-01f7 : ide0
0290-0294 : pnp 00:02
0295-0296 : it87
0295-0296 : pnp 00:02
0295-0296 : it87
0376-0376 : 0000:00:09.0
0378-037a : parport0

2.6.24 works.
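For comparing listings like these across kernels, a small parser can answer "who claims port X?". The sketch below assumes the usual `start-end : owner` line format; the names `IoRange`, `parse_ioports` and `owners_of` are invented for this example, and the sample text is an excerpt modelled on the 2.6.24 listing above, not a full dump.

```python
import re
from typing import List, NamedTuple


class IoRange(NamedTuple):
    start: int
    end: int      # inclusive
    owner: str


# One entry per line: hex start, hex end, then the owner name.
LINE_RE = re.compile(r'^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$')


def parse_ioports(text: str) -> List[IoRange]:
    """Parse ioports-style text, skipping lines that don't match."""
    ranges = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ranges.append(IoRange(int(m.group(1), 16),
                                  int(m.group(2), 16),
                                  m.group(3)))
    return ranges


def owners_of(ranges: List[IoRange], port: int) -> List[str]:
    """Names of every entry whose range covers the given port."""
    return [r.owner for r in ranges if r.start <= port <= r.end]


sample = """\
(...)
0290-0294 : pnp 00:02
0295-0296 : pnp 00:02
0295-0296 : it87
0376-0376 : 0000:00:09.0
"""
ranges = parse_ioports(sample)
print(owners_of(ranges, 0x295))
print(owners_of(ranges, 0x297))
```

Port 0x297 has no owner in this excerpt even though the chip is said to decode it, which is the information loss Jean objected to earlier in the thread.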
Udo, I'm not sure what exactly Gigabyte changed (I don't know what the PNP resources were like with the previous BIOS) but it still doesn't look correct to me: they are declaring a resource at 0290-0294, which doesn't make sense for the IT87xxF. That being said, these declarations would at least make it possible to declare 0290-0297 as the IT87xxF device resource and only request ports 0295-0296 when the driver attaches. That would be a mix between the 2.6.23 and 2.6.24 variants of the it87 driver, which IMHO makes more sense than what we did in 2.6.24. But OTOH there are at least 2 other boards out there that don't have a newer BIOS available. Until Gigabyte releases a public BIOS update for each affected board, we have to leave the it87 driver as it is now.
Gigabyte wrote me: our bios open only the resource for 0295 to 0296 not 0X290-0x29F. They cannot open reserve 0X290-0X29, this will cause the system to run unstable. According to bios engineer, the best thing you can do is to open the resource for IT87 is 295 to 0296. Whereas according to your information the problem can be solve.

On 2.6.24 I get:
0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
01f0-01f7 : ide0
0290-0294 : pnp 00:02
0295-0296 : it87
0295-0296 : pnp 00:02
0295-0296 : it87
0376-0376 : 0000:00:09.0
0378-037a : parport0

Any comments? http://ggts.gigabyte.com for your direct communication about this issue.
The BIOS indeed doesn't reserve 0x290-0x29f, but it reserves 0x290-0x294, for no good reason as far as I can see. "This will cause the system to run unstable" is another way to say that they have no idea what they are talking about and did not understand your request at all. At this point it is probably better to just give up on them and take good note that Gigabyte support is responsive but clueless.