Bug 9514 - 2.6.24-rc4 hwmon it87 probe fails
Summary: 2.6.24-rc4 hwmon it87 probe fails
Status: CLOSED CODE_FIX
Alias: None
Product: Drivers
Classification: Unclassified
Component: PNP
Hardware: All Linux
Importance: P1 normal
Assignee: Mark M. Hoffman
URL:
Keywords:
Depends on:
Blocks: 9243
Reported: 2007-12-06 17:10 UTC by Rafael J. Wysocki
Modified: 2008-02-25 00:17 UTC

See Also:
Kernel Version: 2.6.24-rc4
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
GA-K8N-Ultra-9 lspci (26.95 KB, text/plain)
2007-12-16 05:18 UTC, Nicolas Mailhot
GA-K8N-Ultra-9 system logs (35.26 KB, text/plain)
2007-12-16 05:19 UTC, Nicolas Mailhot
GA-K8N-Ultra-9 ioports (1.81 KB, text/plain)
2007-12-16 05:20 UTC, Nicolas Mailhot
Gigabyte GA-965G-DS3 (F9) DSDT table as per request (20.53 KB, application/octet-stream)
2007-12-19 18:09 UTC, Elvis Pranskevichus
GA-K8N-Ultra-9 DSDT (19.06 KB, application/octet-stream)
2007-12-20 14:30 UTC, Nicolas Mailhot
GA-K8N-Ultra-9 dmidecode (13.59 KB, text/plain)
2007-12-22 03:50 UTC, Nicolas Mailhot
dmidecode for Gigabyte M56S-S3 board /w F3 BIOS (15.44 KB, text/plain)
2007-12-22 04:06 UTC, udo
it87 request only environment controller ports (3.89 KB, patch)
2008-01-02 12:30 UTC, Bjorn Helgaas

Description Rafael J. Wysocki 2007-12-06 17:10:51 UTC
Subject         : 2.6.24-rc4 hwmon it87 probe fails
Submitter       : Mike Houston <mikeserv@bmts.com>
References      : http://lkml.org/lkml/2007/12/4/466
Comment 1 Rafael J. Wysocki 2007-12-11 16:25:48 UTC
Patch is available: http://lkml.org/lkml/2007/12/9/186
Comment 2 Nicolas Mailhot 2007-12-13 07:23:21 UTC
I've shown Jean Delvare another system where it87 now fails and the proposed patch does not work.
Comment 3 Nicolas Mailhot 2007-12-16 05:10:09 UTC
At Jean's request I'll add the info about my system here (it has been failing in both recent Fedora kernels and -mm kernels)
Comment 4 Nicolas Mailhot 2007-12-16 05:18:44 UTC
Created attachment 14062 [details]
GA-K8N-Ultra-9 lspci
Comment 5 Nicolas Mailhot 2007-12-16 05:19:17 UTC
Created attachment 14063 [details]
GA-K8N-Ultra-9 system logs
Comment 6 Nicolas Mailhot 2007-12-16 05:20:03 UTC
Created attachment 14064 [details]
GA-K8N-Ultra-9 ioports
Comment 7 Nicolas Mailhot 2007-12-16 05:26:46 UTC
According to khali

> (13:42:32) khali: nim-nim: the faulty patch was backported to 2.6.23.10 :(

So the regression is propagating
Comment 8 udo 2007-12-16 05:38:31 UTC
Gigabyte M56S-S3 w/ F3 BIOS also has the issue with 2.6.23.11.
Undoing the changes from http://kernel.org/diff/diffview.cgi?file=%2Fpub%2Flinux%2Fkernel%2Fv2.6%2Fincr%2Fpatch-2.6.23.9-10.bz2;z=34 works around the issue.
Comment 9 Elvis Pranskevichus 2007-12-19 18:09:11 UTC
Created attachment 14132 [details]
Gigabyte GA-965G-DS3 (F9) DSDT table as per request

On Wednesday December 19 2007 07:45:14 pm Carlos Corbacho wrote:
> On Thursday 20 December 2007 00:20:21 Bjorn Helgaas wrote:
> > I suspect the manufacturers would say "Oh, the sensors?  The BIOS
> > isn't broken, you're just supposed to use WMI or some (undocumented)
> > ACPI device to get at those."
>
> It's quite possible - can we have DSDTs for the boards in question so we
> can quickly check if this is a possibility? (Basically, to see if they have
> PNP0C14 devices - if they don't, then I'm afraid it's nothing to do with
> WMI).
>
> -Carlos

DSDT for my GA-965G-DS3 which is affected by this issue.
Comment 10 Carlos Corbacho 2007-12-19 18:17:43 UTC
There is no ACPI-WMI mapper device (PNP0C14) on this board, so WMI is, at least in this case, not the solution to this bug.
Comment 11 Nicolas Mailhot 2007-12-20 14:30:47 UTC
Created attachment 14136 [details]
GA-K8N-Ultra-9 DSDT

Final F8 BIOS, F9 never went out of beta and is broken in many ways
Comment 12 Jean Delvare 2007-12-22 03:08:41 UTC
Please attach the output of "dmidecode" for the affected motherboards.
Comment 13 Nicolas Mailhot 2007-12-22 03:50:51 UTC
Created attachment 14146 [details]
GA-K8N-Ultra-9 dmidecode
Comment 14 udo 2007-12-22 04:06:45 UTC
Created attachment 14147 [details]
dmidecode for Gigabyte M56S-S3 board /w F3 BIOS
Comment 15 Bjorn Helgaas 2008-01-02 12:30:08 UTC
Created attachment 14267 [details]
it87 request only environment controller ports

Here's the patch I propose to fix this problem.  We haven't quite got a consensus that this is the right approach, but Mike did verify that it works for him: http://lkml.org/lkml/2007/12/21/197.
Comment 16 Nicolas Mailhot 2008-01-02 14:53:09 UTC
(In reply to comment #15)
> Created an attachment (id=14267) [details]
> it87 request only environment controller ports
> 
> Here's the patch I propose to fix this problem.

This patch works on my system
Comment 17 udo 2008-01-03 08:45:43 UTC
Works for 2.6.23.11?
Comment 18 Nicolas Mailhot 2008-01-03 08:48:35 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > Created an attachment (id=14267) [details] [details]
> > it87 request only environment controller ports
> > 
> > Here's the patch I propose to fix this problem.
> 
> This patch works on my system

My system being 2.6.24-rc5-mm1 + GA-K8N-Ultra-9
Comment 19 Jean Delvare 2008-01-03 08:51:49 UTC
For 2.6.23, you might as well upgrade to 2.6.23.12, where the "faulty" patch was reverted.

(The patch itself is correct, it just happens to reveal a problem that was hidden so far.)
Comment 20 Len Brown 2008-01-11 20:09:01 UTC
Hi Jean, Bjorn,

In 2.6.24-rc7-git3 I expect that the it87 driver on these boards
is still not loading.  For Shaohua's workaround from comment #1 isn't
applied, nor is Bjorn's workaround from comment #17 -- while
the initial PNP patch that sparked this is still present (as it must be).
(Indeed, one could argue that the PNP fix should again be pushed
 into 2.6.23.stable, since it fixes far scarier potential system failures
 than the it87 driver failing to load.  But with side effects such
 as this, any such change perhaps may exceed .stable's risk tolerance...)

I've read through the thread and I concur with the things
that Bjorn wrote about how ACPI works.  It is likely that the
BIOS is reserving this device for its own use, and unlikely
that the BIOS will ever declare a device for the purpose of the OS
to bind a native driver to this hardware.  While it is never a good bet
to assume a BIOS writer is doing something correctly, one could
argue that by not loading it87 on these boards, we are obeying
what the BIOS writer asked us to do.

Do you think that this problem is widespread and will affect
virtually all boards where it87 used to work, or do you think
it is a small subset?  Does anybody know if a native driver for it87
exists on Windows and if it loads on the systems at hand?
Comment 21 udo 2008-01-12 00:02:11 UTC
Interesting idea but somewhat unacceptable.
When running Linux I'd like to be able to use all hardware on my board.

I contacted gigabyte over this matter via http://ggts.gigabyte.com.tw/.
I got a beta bios F4a for my board.
Will test with that version.
Comment 22 Jean Delvare 2008-01-12 00:14:26 UTC
In all affected BIOSes, the I/O port range(s) declared for the IT87xxF device are plain wrong, and that's why the it87 driver fails to load. It doesn't have anything to do with the BIOS reserving the device for its own use.

It is very frequent that motherboard BIOS declare an I/O area for the hardware monitoring chip, and in my experience it doesn't correlate with the BIOS making use of the device in question. There are many boards out there (probably the majority) where the I/O area of the hardware monitoring chip is declared in the BIOS but the BIOS doesn't make any use of it (outside of the BIOS setup screen, that is.)

I don't expect the problem at hand to be widespread. It only affects motherboards where the BIOS declares a wrong I/O area for the IT87xxF chip. As far as I know, only 3 models are affected, all by Gigabyte: K8N Ultra-9, 965G-DS3 and M56S-S3. I have one board using the it87 driver here, incidentally a Gigabyte one as well, and it works just fine. The IT87xxF chips are very popular and we would probably know by now if (many) additional models were affected.

I fail to see how Windows monitoring applications would matter here. I have no idea how I/O resources are managed on Windows, if they are at all.
Comment 23 Jean Delvare 2008-01-12 00:38:54 UTC
(In reply to comment #15)
> Created an attachment (id=14267) [details]
> it87 request only environment controller ports

I don't like this patch much. While it is fine to only request_region() the ports the driver really uses (i.e. 0x295-0x296), the platform device resource is supposed to match the ports that the chip actually decodes, to let the user know that they should not attempt to use these ports. Despite what is written on the various IT87xxF datasheets, these chips do decode the full 0x290-0x297 I/O range (for older ones) or at least 0x294-0x297 (for recent incarnations), and not just 0x295-0x296.

Your proposed patch will make it look like the surrounding ports are free, while they are not. I don't think this is right, but OTOH I have to admit that it is unlikely that users will attempt to make use of the I/O ports in question, so in practice the badness should be limited. At least it works around the problem at hand, and while not the way it should have been, it has the merit of being relatively simple and not too intrusive.

In my opinion the best fix would be quirks that fix or discard the broken I/O port range declarations made by the BIOS of the affected motherboards. However I don't have the time (nor the knowledge) to do this myself, so if nobody is going to do it, I guess that we have to take your patch for the time being. But it should be updated to clearly document that the driver now declares fewer ports than the chip actually decodes.
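
As a rough illustration of the trade-off described in this comment, the following userspace sketch (not driver code) counts how many ports the chip decodes but the narrowed request no longer declares. The port numbers come from the comment above; the type `struct port_range` and the helpers `range_len()` and `ports_hidden()` are hypothetical names chosen for the example.

```c
/* Inclusive I/O port range. Hypothetical helper type, not a kernel struct. */
struct port_range {
    unsigned int start;
    unsigned int end;
};

/* Ranges quoted in the comment above. */
static const struct port_range decoded_old = { 0x290, 0x297 }; /* older IT87xxF chips */
static const struct port_range decoded_new = { 0x294, 0x297 }; /* recent incarnations */
static const struct port_range requested   = { 0x295, 0x296 }; /* ports the driver uses */

/* Number of ports covered by an inclusive range. */
unsigned int range_len(struct port_range r)
{
    return r.end - r.start + 1;
}

/*
 * Ports the chip decodes that the narrowed request leaves undeclared.
 * Assumes req lies entirely within decoded.
 */
unsigned int ports_hidden(struct port_range decoded, struct port_range req)
{
    return range_len(decoded) - range_len(req);
}
```

Under these assumptions, `ports_hidden(decoded_old, requested)` gives 6 and `ports_hidden(decoded_new, requested)` gives 2: those are the ports that would look free in the resource listing even though the chip responds to them.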
Comment 24 Len Brown 2008-01-12 00:58:10 UTC
> There are many boards out there (probably the majority)
> where the I/O area of the hardware monitoring chip is
> declared in the BIOS but the BIOS doesn't make any use of it
> (outside of the BIOS setup screen, that is.)

Yes, I'd believe that.
This is why I asked if a native hw monitoring driver/application
works on these boards in Windows.  Per the comments on the
list, Windows is seeing the reservations the same way Linux is.
(Though who knows if those reservations are honored by a platform
 specific driver -- probably not since a platform driver knows all)

> I don't expect the problem at hand to be widespread.

good, then maybe a DMI based BIOS workaround is viable.
However, I agree with Bjorn that "pnpacpi=off" would
be hitting this problem with too big a stick -- even
if limited to a finite list of boards.  If Bjorn's driver
patch even w/o DMI doesn't break anything, maybe that
is the most pragmatic way to go?
Comment 25 udo 2008-01-12 01:21:27 UTC
The Gigabyte F4a beta bios for M56S-S3 board does not fix the issue for a 2.6.23.11 kernel so I had to revert the patch I mentioned above again.
I again referenced Gigabyte support to this discussion.
Please do the same if you see this problem in your setup.
Comment 26 Jean Delvare 2008-01-12 02:00:27 UTC
(In reply to comment #24)
> good, then maybe a DMI based BIOS workaround is viable.
> However, I agree with Bjorn that "pnpacpi=off" would
> be hitting this problem with too big a stick -- even
> if limited to a finite list of boards.  If Bjorn's driver
> patch even w/o DMI doesn't break anything, maybe that
> is the most pragmatic way to go?

I agree that it is less correct than a DMI-based BIOS workaround but also more pragmatic, so let's just do that for now. The DMI-based BIOS workaround can always be implemented later if someone finds the time for that or if additional issues require it.
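
For reference, the board-matching half of a DMI-based workaround could look roughly like the userspace sketch below. It is only an assumption-laden model: the function `board_needs_quirk()` and the table `affected_boards` are hypothetical, the model names are copied from comment 22, and the strings a real BIOS reports may be spelled differently (the attached dmidecode outputs would be the place to check). An in-kernel version would use the kernel's DMI matching facilities rather than `strstr()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Model names as listed in comment 22; actual DMI strings may differ. */
static const char *const affected_boards[] = {
    "K8N Ultra-9",
    "965G-DS3",
    "M56S-S3",
};

/* Hypothetical check: does the reported board name mention an affected model? */
bool board_needs_quirk(const char *dmi_board_name)
{
    size_t n = sizeof(affected_boards) / sizeof(affected_boards[0]);

    for (size_t i = 0; i < n; i++) {
        if (strstr(dmi_board_name, affected_boards[i]) != NULL)
            return true;
    }
    return false;
}
```

Note that a plain substring match is fragile: a name spelled "K8N-Ultra-9" with a hyphen would not match the "K8N Ultra-9" entry, which is one reason such a table has to be built from the strings the boards actually report.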
Comment 27 Nicolas Mailhot 2008-01-12 02:25:54 UTC
(In reply to comment #25)
> The Gigabyte F4a beta bios for M56S-S3 board does not fix the issue for a
> 2.6.23.11 kernel so I had to revert the patch I mentioned above again.
> I again referenced Gigabyte support to this discussion.
> Please do the same if you see this problem in your setup.

I probably won't bother, when I contacted them ~ 6 months ago to fix the CK804 HPET declaration, they started by denying HPET existed, before admitting the board was not recent enough for them to expend any more work on it. So now that's hpet=force on the boot line for me
Comment 28 Nicolas Mailhot 2008-01-12 02:33:07 UTC
(In reply to comment #22)

> I fail to see how Windows monitoring applications would matter here. I have
> no
> idea how I/O resources are managed on Windows, if they are at all.

Most motherboard manufacturers have their own overclocking/temp monitoring GUI applet. Since it's vendor-specific I suppose it can carry its own mobo/resource table, and ignore whatever the BIOS says. Having the BIOS reserve resources would have the bonus side-effect of preventing any third-party generic tool from running, and force overclockers to use the vendor-approved tool.
Comment 29 Bjorn Helgaas 2008-01-17 12:38:41 UTC
Where are we with this problem?
(http://bugzilla.kernel.org/show_bug.cgi?id=9514)

I think (correct me if I'm misremembering), we started reserving more
motherboard resources, and then we started seeing conflicts between
some of those resources and something it87 needs.

We can't fix this by reserving fewer motherboard resources.  We
really want to reserve *all* the motherboard resources to prevent
conflicts.

We could fiddle with the PNP system driver to make it ignore
resources that overlap other resources (http://lkml.org/lkml/2007/12/9/186).
The system device has 0x290-0x29f and 0x290-0x294 ranges, and this
patch ignores the second.  I'm reluctant to do this because it just
seems like a hack in the system driver.  Also, Mike Houston found
that Windows lists both, and I think we should do the same
(http://lkml.org/lkml/2007/12/9/170).

The second option I see is to use my patch
(http://bugzilla.kernel.org/attachment.cgi?id=14267&action=view)
to make it87 request only the ports it uses.

Jean rightly believes a platform device should reflect all the ports
a chip decodes, and my patch goes the other direction.  But in an
ACPI system, the BIOS has the responsibility of listing all the address
space that is in use, so I don't think we really should *have* platform
devices unless they come from ACPI.  Since we don't know how to get the
it87 functionality the "correct" way (i.e., via some ACPI device),
we have to kludge things a bit, and I think a reasonable start is to
rely on ACPI to tell us what address space is in use and change the
it87 driver to request only the ports it uses.

Whatever we do, it'd be nice to have a fix in 2.6.24, and I think my
patch is the least evil for now.

Possibly the situation could actually be improved slightly by removing
the platform device stuff from it87 altogether, at least for ACPI
systems, though I think this is a post-2.6.24 question.

Jean, you own the it87 driver, so do you want to chime in?  Any other
possibilities for a 2.6.24 fix?
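
The conflict summarized in this comment can be modeled with a small userspace sketch. It is a deliberate simplification, not the kernel's resource code: it assumes a request succeeds only if every existing reservation it touches fully contains it, so partial overlap is a conflict. The names `struct port_range`, `overlaps()`, `contains()` and `request_ok()` are hypothetical; the port ranges are the ones quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive I/O port range. Hypothetical helper type, not a kernel struct. */
struct port_range {
    unsigned int start;
    unsigned int end;
};

/* True if the two inclusive ranges share at least one port. */
static bool overlaps(struct port_range a, struct port_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* True if inner lies entirely within outer. */
static bool contains(struct port_range outer, struct port_range inner)
{
    return outer.start <= inner.start && inner.end <= outer.end;
}

/*
 * Simplified rule (an assumption of this sketch): a request may nest
 * inside a reservation that fully contains it, but a reservation it
 * only partially overlaps is a conflict.
 */
bool request_ok(const struct port_range *reserved, size_t n,
                struct port_range req)
{
    for (size_t i = 0; i < n; i++) {
        if (overlaps(reserved[i], req) && !contains(reserved[i], req))
            return false;
    }
    return true;
}
```

With the system device ranges 0x290-0x29f and 0x290-0x294 as the reservations, a request for 0x290-0x297 is rejected in this model because it only partially overlaps 0x290-0x294, while a request for 0x295-0x296 nests inside 0x290-0x29f and does not touch 0x290-0x294 at all.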
Comment 30 Anonymous Emailer 2008-01-17 14:00:15 UTC
Reply-To: mikeserv@bmts.com

On Thu, 17 Jan 2008 13:38:57 -0700
Bjorn Helgaas <bjorn.helgaas@hp.com> wrote:

> Where are we with this problem?
> (http://bugzilla.kernel.org/show_bug.cgi?id=9514)

> The second option I see is to use my patch
> (http://bugzilla.kernel.org/attachment.cgi?id=14267&action=view)
> to make it87 request only the ports it uses.

> Whatever we do, it'd be nice to have a fix in 2.6.24, and I think my
> patch is the least evil for now.

From my perspective it sure seems to be the least evil, because it
only touches what needs to be touched for my situation. I of course
can't say how it would affect others.

I'm still using your patch (on 2.6.24-rc8 now) but I'll test whatever
other solutions you folks may come up with.

Mike Houston
Comment 31 Shaohua 2008-01-17 17:19:20 UTC
It would be great if the resource manager could automatically merge the reserved resources when we reserve them. But I agree we can use Bjorn's patch for now.
Comment 32 Jean Delvare 2008-01-18 13:31:14 UTC
Bjorn, I agree with your analysis in comment #29. Except that I do not own the it87 driver. I am not maintaining this driver specifically and I am no longer the hwmon subsystem maintainer either. It's really up to Mark M. Hoffman to decide whether he takes your patch or not. Me, all I can say is that I am OK if he does.
Comment 33 Mark M. Hoffman 2008-01-22 05:07:39 UTC
I've forwarded Bjorn's patch to Linus - "least evil" is just about right.

Consumer-grade mainboards have crappy BIOS, news at 11.  There is no standard support for hardware monitoring built into Windows; thus the BIOS writers have no incentive to implement that correctly.  So Bjorn: I have to disagree with you about one thing - the BIOS is emphatically *not* to be trusted here (again, at least with consumer-grade hardware).  Nicolas' guess in comment #28 is spot on: I've stepped through some of those vendor-supplied apps w/ a debugger and that's exactly how they work.
Comment 34 Rafael J. Wysocki 2008-01-23 15:01:28 UTC
Appeared in the mainline as commit 87b4b6634ac112ddfe7b92aae50eb4bf7b128d1a
Comment 35 udo 2008-01-27 03:18:33 UTC
Gigabyte replied:

Our bios engineer has open the resource for 0X290-0x29F reserved for I/O.
(and attached a test BIOS)

Would this be enough?
Comment 36 udo 2008-01-27 03:52:12 UTC
With the new testbios and 2.6.23.11 I get:

(...)
0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
  01f0-01f7 : ide0
0290-0297 : it87
  0290-0294 : pnp 00:02
  0295-0296 : pnp 00:02
0376-0376 : 0000:00:09.0
0378-037a : parport0
(...)

Is this better?

On 2.6.24 I get:

0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
  01f0-01f7 : ide0
0290-0294 : pnp 00:02
0295-0296 : it87
  0295-0296 : pnp 00:02
    0295-0296 : it87
0376-0376 : 0000:00:09.0
0378-037a : parport0

2.6.24 works.
Comment 37 Jean Delvare 2008-01-28 01:40:44 UTC
Udo, I'm not sure what exactly Gigabyte changed (I don't know what the PNP resources were like with the previous BIOS) but it still doesn't look correct to me: they are declaring a resource at 0290-0294 while it doesn't make sense for the IT87xxF. That being said, these declarations would at least make it possible to declare 0290-0297 as the IT87xxF device resource and only request ports 0295-0296 when the driver attaches. That would be a mix between the 2.6.23 and 2.6.24 variants of the it87 driver, which IMHO makes more sense than what we did in 2.6.24.

But OTOH there are at least 2 other boards out there that don't have a newer BIOS available. Until Gigabyte release a public BIOS update for each affected board, we have to leave the it87 driver as it is now.
Comment 38 udo 2008-02-23 10:36:34 UTC
Gigabyte wrote me:

our bios open only the resource for 0295 to 0296 not 0X290-0x29F.
They cannot open reserve 0X290-0X29, this will cause the system to run unstable.
According to bios engineer, the best thing you can do is to open the resource for IT87 is
295 to 0296.
Whereas according to your information the problem can be solve.

On 2.6.24 I get:
0170-0177 : 0000:00:09.0
01f0-01f7 : 0000:00:09.0
  01f0-01f7 : ide0
0290-0294 : pnp 00:02
0295-0296 : it87
  0295-0296 : pnp 00:02
    0295-0296 : it87
0376-0376 : 0000:00:09.0
0378-037a : parport0

Any comments?
http://ggts.gigabyte.com for your direct communication about this issue.
Comment 39 Jean Delvare 2008-02-25 00:17:44 UTC
The BIOS indeed doesn't reserve 0x290-0x29f, but it reserves 0x290-0x294, for no good reason as far as I can see.

"This will cause the system to run unstable" is another way to say that they have no idea what they are talking about and did not understand your request at all. At this point it is probably better to just give up on them and take good note that Gigabyte support is responsive but clueless.
