Bug 1435 - ACPI trouble on IBM blade servers, SvrWrks CSB6
Status: CLOSED WILL_NOT_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: BIOS
Hardware: i386 Linux
Importance: P2 normal
Assignee: Len Brown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-10-27 06:23 UTC by Olaf Hering
Modified: 2008-07-22 00:16 UTC
CC List: 2 users

See Also:
Kernel Version: 2.4.23-pre8
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
BladeCenter_HS20_8832-31Z.tar.gz new blade server data (52.59 KB, application/x-gunzip)
2003-10-27 06:24 UTC, Olaf Hering
debug patch to hard-code PCI link on IRQ14 to edge/high (784 bytes, patch)
2003-11-03 14:27 UTC, Len Brown

Description Olaf Hering 2003-10-27 06:23:20 UTC
Distribution: SuSE Linux Enterprise Server 8
Hardware Environment:
IBM Blade Server.
Software Environment:
Problem Description:

Newer blades (or newer BIOS revisions) require acpi=off to boot.
This happens with the SLES8 kernel, with plain 2.4.23-pre, and with 2.6.0-testN.

Older blades (or older BIOS revisions) do not require any acpi boot options.
Maybe these older systems are blacklisted somewhere.

I will attach various outputs from a new blade server; this one
does not boot without acpi=off. IDE doesn't work: with acpi=on it gets DMA timeouts.
I was able to use nfsroot to gather the data.

2.4.23pre8 does not use the second CPU; this worked OK with earlier 23pre kernels.
Comment 1 Olaf Hering 2003-10-27 06:24:16 UTC
Created attachment 1215 [details]
BladeCenter_HS20_8832-31Z.tar.gz new blade server data

Note the *acpi_off* and *acpi_on* files.
Comment 2 Len Brown 2003-10-30 13:31:26 UTC
There are 2 issues here. 
 
#1: Why doesn't ACPI mode work on this system?
#2: Should this system be blacklisted, and does the list need updating?
 
#1: ACPI mode 
It appears that in ACPI/APIC mode the IDE interrupt is set to IO-APIC-level,
whereas in non-ACPI mode it is set to IO-APIC-edge.
 
Please boot the system with ACPI enabled and the non-optimal "noapic"
on the cmdline as an experiment, to see whether ACPI actually works and
whether the setting of the IDE interrupt is what causes the problems.
(Please attach dmesg and /proc/interrupts.)
 
The dmesg for the failure shows that we set IRQ14 to level/low (Mode:1/Active:1),
as we're hard-coded to do for all PCI Interrupt Link device interrupts:
 
ACPI: PCI Interrupt Link [LPID] (IRQs *14) 
ACPI: PCI Interrupt Link [LPID] enabled at IRQ 14 
IOAPIC[0]: Set PCI routing entry (14-14 -> 0x99 -> IRQ 14 Mode:1 Active:1) 
00:00:0f[B] -> 14-14 -> IRQ 14 
 
Hmmm, I've seen PCI Interrupt Link devices used for PIRQ routers in 
PIC mode, but I've never seen them used in APIC mode. 
Surely we have this case hard-coded to level/low...
 
The AML for this machine is quite curious. 
_PRT in PIC mode returns {} -- nothing. 
_PRT in APIC mode returns entries with PCI link devices: 
 
The code for LPID is quite unusual. 
            Method (_PRS, 0, NotSerialized) 
            { 
                Store (0x81, IOPT) 
                Return (_CRS ()) 
            } 
 
It is curious to have a programmable device whose only possible setting is the current one...
 
#2: blacklist 
Please attach the dmidecode output and the dmesg from the (old) working system/BIOS.
 
Please see whether the (new) failing system/BIOS boots with "acpi=ht".
 
Yes, there is a dmi_scan.c blacklist entry that probably matches the old system: 
 
        { force_acpi_ht, "IBM Bladecenter", { 
                        MATCH(DMI_BOARD_VENDOR, "IBM"), 
                        MATCH(DMI_BOARD_NAME, "IBM eServer BladeCenter HS20"), 
                        NO_MATCH, NO_MATCH }}, 
 
If it matches, you'll see in the dmesg: 
IBM Bladecenter detected: force use of acpi=ht 
 
The DMI info you attached suggests that IBM may have renamed some fields:
 
        BIOS Information 
                Vendor: IBM 
                Version: -[BSE105DUS-1.01]- 
                Release Date: 07/30/2003 
 
        System Information 
                Manufacturer: IBM 
                Product Name: IBM eServer BladeCenter HS20 -[883231Z]- 
 
        Base Board Information 
                Manufacturer: IBM 
                Product Name: Server Blade 
 
#3: Other comments ;-)
There are lots of USB messages in the dmesg; I guess you've got some USB debugging
enabled? Is there an actual problem with USB, or just with IDE?
 
Comment 3 Len Brown 2003-10-30 22:36:59 UTC
Looking closer at the AML for link device LPID...
_PRS returns _CRS: the current IRQ is the only possible one.
_SRS is a no-op: trying to set this IRQ to anything does nothing.
_CRS does actually ask an I/O port (PIDP) what the IRQ number is:
 
            Method (_CRS, 0, NotSerialized)
            {
                Store (0x83, IOPT)  /* just a debug line: writes 0x83 to port 80 */
                Name (RRET, ResourceTemplate () 
                { 
                    IRQ (Level, ActiveLow, Shared) {0} 
                }) 
                CreateWordField (RRET, 0x01, RINT) 
                Store (PIDP, Local0) 
                If (LEqual (Local0, 0x00)) 
                { 
                    Store (0x00, RINT) 
                } 
                Else 
                { 
                    ShiftLeft (One, Local0, RINT) 
                } 
 
                Return (RRET) 
            } 
 
But _CRS is hard-coded to return (Level, ActiveLow, Shared), which
is exactly how Linux set up the interrupt. I.e., the AML is telling us to
program the IOAPIC for IDE as level-triggered, not edge-triggered.
 
Indeed, I don't know why this link device exists, except for the purpose of 
specifying level/low. 
--- 
Re: the 2nd CPU.
This is a dual-Xeon HT-capable system, yes?
If booting with acpi=ht does not bring up all 4 logical processors, try increasing
CONFIG_NR_CPUS from 4 to 8.
 
Comment 4 Len Brown 2003-11-03 14:27:19 UTC
Created attachment 1335 [details]
debug patch to hard-code PCI link on IRQ14 to edge/high

This debug patch should ignore the level/low specification for a link device on
IRQ14 and hard-code it to edge/high. Please try it out on the blade server in IOAPIC
mode, and attach the resulting dmesg and /proc/interrupts.

If this patch works as intended and the system functions properly, this
confirms that the issue is the IRQ14 setting that results from this BIOS link device.
Comment 5 Len Brown 2003-11-06 13:24:47 UTC
Leah tested the patch and confirmed that if Linux ignores the BIOS then IDE works. 
http://bugzilla.suse.de/show_bug.cgi?id=32567 
 
Closing as WILL_NOT_FIX, since Linux is correctly doing exactly what the BIOS asks:
setting IRQ14 to level/low.
 
The distros can add this system to their version of the "acpi=ht" blacklist 
until the BIOS is fixed. 
 
