Bug 11715

Summary: Blacklist HP 6715s - buggy BIOS
Product: ACPI
Reporter: Nicolas Tritz (tritz.nicolas)
Component: Power-Thermal
Assignee: herrmann.der.user
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla, herrmann.der.user, mingo, stefan.friesel, trenn
Priority: P1
Hardware: All
OS: Linux
Kernel Version: 2.6.27_rc9
Subsystem:
Regression: Yes
Bisected commit-id:
Attachments:
  Diff to blacklist HP 6715s
  PCI based quirk vs 2.6.27-rc9
  dmesg after deprioritize DMI quirks patch
  lspci
  dmidecode
  lspci -nnvvxxxx
  lspci -nnvvxxxx (with 2.6.26.6)
  Patch is against v2.6.27-3977-gc166ab7 (Linus git as of today).
  dmesg-2.6.27-git4 with patch
  _real_ dmesg-2.6.27-git4 with patch

Description Nicolas Tritz 2008-10-07 13:11:51 UTC
Latest working kernel version: 2.6.26*
Earliest failing kernel version: 2.6.27*
Distribution: Gentoo
Hardware Environment: HP Compaq 6715s with AMD Turion 64 x2

Problem Description:

Same as bug 11516: the fans are always on and the CPU is always at 800 MHz. My first workaround was to put pci=noacpi on the kernel command line.

When I saw the patch "x86 ACPI: Blacklist two HP machines with buggy BIOSes" (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e84956f92a846246b09b34f2a728329c386d250f), I added my 6715s to arch/x86/kernel/acpi/boot.c in the same way as the 6715b, and it works.

So could you please add this laptop as well? I don't like the pci=noacpi workaround because I don't really know what it does on my computer.

Steps to reproduce: Grab an HP 6715s and try the 2.6.27 (rc9 or not) on it.

(Please be gentle, it's my first bug. :))
Comment 1 Nicolas Tritz 2008-10-07 14:17:02 UTC
Created attachment 18196 [details]
Diff to blacklist HP 6715s

Here is the small diff I made to add my computer to the blacklist, in case it wasn't clear earlier.
Comment 2 Len Brown 2008-10-07 14:56:53 UTC
Created attachment 18199 [details]
PCI based quirk vs 2.6.27-rc9

Rather than expanding the DMI blacklist,
please try this patch from Andreas Herrmann <andreas.herrmann3@amd.com>,
which is checked into Ingo's tree.
Comment 3 Nicolas Tritz 2008-10-07 15:21:51 UTC
Created attachment 18200 [details]
dmesg after deprioritize DMI quirks patch

Sorry, it doesn't work, but it seems I don't have an SB450, so maybe the patch can't work on my system.

I'll post lspci and dmidecode, please tell me if you want other outputs or tests.
Comment 4 Nicolas Tritz 2008-10-07 15:23:35 UTC
Created attachment 18201 [details]
lspci
Comment 5 Nicolas Tritz 2008-10-07 15:24:41 UTC
Created attachment 18202 [details]
dmidecode
Comment 6 Nicolas Tritz 2008-10-07 22:39:14 UTC
I've just tried the "x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC" patch (http://marc.info/?l=linux-kernel&m=122341129802026&w=2), both without and with your previous patch by Ingo, but the results are the same: fans on and CPU "low".

(A grep on /var/log/messages shows neither the KERN_INFO message from this patch nor the KERN_ERR message from Ingo's patch.)
Comment 7 herrmann.der.user 2008-10-10 03:31:55 UTC
WRT comment #3 and #4

You have an SB600. That means that the patch for SB450 won't work.

I'll come up with a patch to address this old issue (on new hardware).

BTW, I've tried to get our chipset folks to tell HP to correct
their broken BIOSes.
But I am not sure how successful this will be.
Comment 8 Nicolas Tritz 2008-10-12 11:04:27 UTC
Hi,

Thanks for your help.

I've found a similar problem in bug 11516 (comments 38 and 43) so I'll wait for .27-rc* before new tests.
Comment 9 herrmann.der.user 2008-10-14 07:03:01 UTC
Nicolas, can you please attach the output of "lspci -nnvvxxxx" to this bug?
Thanks, Andreas
Comment 10 Nicolas Tritz 2008-10-14 09:04:03 UTC
Created attachment 18303 [details]
lspci -nnvvxxxx

lspci -nnvvxxxx with kernel 2.6.27 and pci=noacpi.

(I don't know if this last option can change the output, but I can retry with a previous kernel if you want.)
Comment 11 herrmann.der.user 2008-10-14 10:57:49 UTC
So here we go.
(With some explanation for those who are interested.)

The chipset is configured such that the timer interrupt goes to IOAPIC INT0.
FYI, this is bit 14 at offset 0x64 in the PCI config space of device 14.0
(the SMBus device). The register reference for the SB600 can be found at
http://www.coreboot.org/AMD_Public_Documents

That means that the ACPI INT_SRC_OVR for IRQ0 (see your dmesg)

 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
 ...
 ACPI: IRQ0 used by override.
 ACPI: IRQ2 used by override.

is bogus.

Furthermore, INT2 of the IOAPIC is connected to the output of the PIC,
and Linux has to explicitly unmask IRQ0 in the PIC to get the timer
interrupt at all:

 ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
 ..MP-BIOS bug: 8254 timer not connected to IO-APIC
 ...trying to set up timer (IRQ0) through the 8259A ...
 ..... (found apic 0 pin 2) ...
 ....... works.

The best solution here is to ignore the bogus INT_SRC_OVR for IRQ0
if the "swap bit" is not set.
(Setting the "swap bit" is not an option, as it changes the
thermal trip point on those HP laptops...)
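
For anyone who wants to check their own machine against the rule above, here is a minimal userspace sketch in Python. It is not the kernel patch, only an illustration of the check as described in this comment: bit 14 of the 32-bit register at offset 0x64 in the config space of the SMBus device, combined with an INT_SRC_OVR entry for IRQ0 in dmesg. All function and constant names are hypothetical, and the config-space bytes in the usage note are made up for the example.

```python
import re

# Values taken from the comment above; the names are hypothetical.
SWAP_REG_OFFSET = 0x64  # register offset in PCI config space of device 14.0
SWAP_BIT = 14           # the "swap bit"

# Matches the dmesg format quoted above, e.g.
# "ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)"
OVERRIDE_RE = re.compile(
    r"INT_SRC_OVR \(bus (\d+) bus_irq (\d+) global_irq (\d+)"
)


def read_dword(config: bytes, offset: int) -> int:
    """Return the little-endian 32-bit value at `offset` in a config-space dump."""
    if offset + 4 > len(config):
        raise ValueError("config-space dump is too short for this offset")
    return int.from_bytes(config[offset:offset + 4], "little")


def swap_bit_set(config: bytes) -> bool:
    """True if the "swap bit" (bit 14 at offset 0x64) is set."""
    return bool(read_dword(config, SWAP_REG_OFFSET) & (1 << SWAP_BIT))


def irq0_override_target(dmesg: str):
    """Return the global_irq that IRQ0 is overridden to, or None if no override."""
    for match in OVERRIDE_RE.finditer(dmesg):
        bus_irq, global_irq = int(match.group(2)), int(match.group(3))
        if bus_irq == 0:
            return global_irq
    return None


def should_skip_irq0_override(config: bytes, dmesg: str) -> bool:
    """Apply the rule from the comment: ignore the IRQ0 override if the swap bit is clear."""
    return irq0_override_target(dmesg) is not None and not swap_bit_set(config)
```

With an all-zero 256-byte dump (swap bit clear) and the INT_SRC_OVR line from the dmesg above, should_skip_irq0_override() returns True. On a live system the bytes could be read from /sys/bus/pci/devices/0000:00:14.0/config, assuming the SMBus device sits at 00:14.0 as on this laptop; offsets beyond the first 64 bytes normally require root.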

Patch for testing follows.
Comment 12 Nicolas Tritz 2008-10-14 12:06:49 UTC
Created attachment 18304 [details]
lspci -nnvvxxxx (with 2.6.26.6)

Ok, thanks for the information, I'm sure somebody will understand you... :)

I ran the same lspci command on 2.6.26.6 too, and there are some differences, such as pins routed to different IRQs and changes in the hex dumps. I'm attaching it as well because that is my latest working kernel (apart from NO_HZ not working with it).
Comment 13 herrmann.der.user 2008-10-14 12:09:38 UTC
Created attachment 18306 [details]
Patch is against v2.6.27-3977-gc166ab7 (Linus git as of today).

Please give it a try! Thanks.
Comment 14 Nicolas Tritz 2008-10-14 13:11:45 UTC
Created attachment 18312 [details]
dmesg-2.6.27-git4 with patch

I have tried it against 2.6.27-git4 (I don't know how to retrieve a specific git version...): it works!

No more loud fans, the desktop is as responsive as before, and NO_HZ seems to work (no more 500 or 1000+ wakeups-from-idle per second in powertop)...

I get the "hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000" flood in dmesg, but I think that's a separate bug.

Many thanks for your help. I can test it against the latest git version if you give me a link explaining how to do it.
Comment 15 Nicolas Tritz 2008-10-14 13:13:42 UTC
Created attachment 18313 [details]
_real_ dmesg-2.6.27-git4 with patch

oops...
Comment 16 Shaohua 2008-10-14 20:31:10 UTC
please push the patch to base.
Comment 17 Len Brown 2008-10-16 12:48:42 UTC
patch in comment #13 applied to acpi-test
Comment 18 Thomas Renninger 2008-10-19 15:43:56 UTC
This patch looks safe enough for the .27 stable kernel?
Andreas, would you mind sending this one to stable@kernel.org and asking for inclusion?
That way I would not need to backport/add this for SLE11, and other distributions would also benefit.

I wonder how this BIOS bug could be flagged as such by marking a message with the new FW_BUG string. The problem is that at early SB600 detection the MADT info, including the wrong source override, is not available yet. An additional variable, sb600_detected, could be introduced and marked as __initdata.
Hmm, I wanted to provide an example patch, but I can't find the patch in the acpi-test tree:

(after a successful git pull):
git checkout --track -b test origin/test

cat .git/FETCH_HEAD
...
branch 'test' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6

git branch
  release
* test

It's not there? Am I missing something?
Comment 19 herrmann.der.user 2008-10-22 02:39:54 UTC
Thomas,
I didn't find the patch in acpi/test either.

I've sent it to Ingo and expect that he will apply it asap.
I'll ping him again to do so. Then there is no need to bring this
upstream via linux-acpi.
Furthermore, I've already sent the patch to stable, but I think
it won't be applied as long as it is not in Linus' git.

To sum it up,
I'd like to see the following patches upstream and in 2.6.27-stable:

(1) x86: SB450: skip IRQ0 override if it is not routed to INT2 of IOAPIC
   (already in Linus' git and in stable tree.)
(2) x86: SB600: skip IRQ0 override if it is not routed to INT2 of IOAPIC
   (not yet in any git-tree but sent to Ingo)
(3) Remove dmi-quirks for HP Laptops that advertise bogus override for IRQ0
   (not yet in any git-tree but sent to Ingo, see attachment #18338 in
    bug #11516)

Thus (2) and (3) are still not upstream. Both should go
into 2.6.27-stable, too.
Comment 20 Len Brown 2008-10-24 22:50:13 UTC
shipped in linux-2.6.28-rc1
closed

commit 26adcfbf00e0726b4469070aa2f530dcf963f484
Author: Andreas Herrmann <andreas.herrmann3@amd.com>
Date:   Tue Oct 14 21:01:15 2008 +0200

    x86: SB600: skip ACPI IRQ0 override if it is not routed to INT2 of IOAPIC