Bug 10883 - Kernel panic (BUG at drivers/acpi/osl.c:460) linux-2.6.25.3
Summary: Kernel panic (BUG at drivers/acpi/osl.c:460) linux-2.6.25.3
Status: REJECTED INSUFFICIENT_DATA
Alias: None
Product: ACPI
Classification: Unclassified
Component: ACPICA-Core
Hardware: All Linux
Importance: P1 blocking
Assignee: Venkatesh Pallipadi
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-07 11:24 UTC by Michal Zimen
Modified: 2008-09-29 08:00 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.25.3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
AML version of acpidump (6.89 KB, application/octet-stream)
2008-06-23 12:48 UTC, Michal Zimen
RAW version of acpidump (19.50 KB, text/plain)
2008-06-23 12:49 UTC, Michal Zimen
debug patch (974 bytes, patch)
2008-08-15 23:12 UTC, Andi Kleen
Make acpi_os_{read/write}_port() return error rather than panic if BIOS reports invalid port width (2.03 KB, patch)
2008-09-29 07:56 UTC, Alex Zeffertt

Description Michal Zimen 2008-06-07 11:24:49 UTC
Latest working kernel version: 2.6.24.x
Earliest failing kernel version: 
Distribution: Ubuntu, but own kernel
Hardware Environment: x86, Ali ide chipset
Software Environment: Ubuntu 8.04
Problem Description:

My system runs reliably only with the "acpi=off" parameter on the kernel
command line. I have been hitting this bug since 2.6.25-rcX, but did not
have enough time to investigate it.
Below is a brief dump from the console:
kernel BUG at drivers/acpi/osl.c:460!
invalid opcode: 0000 [#1] DEBUG_PAGEALLOC
Modules linked in: af_packet arc4 ecb crypto_blkcipher cryptomgr crypto_algapi rt2500pci rt2x00pci.......
Pid: 0, comm: swapper Not tainted (2.6.25.3-dbg #4)
EIP: ..
EIP is at acpi_os_read_port+0x40/0x4b
ESI:
 DS:
Process swapper
Call Trace:
   acpi_hw_low_level_read
   acpi_hw_register_read
   acpi_hw_register_write
   acpi_set_register
   acpi_idle_enter_simple
   acpi_set_register
   acpi_idle_enter_bm
   cpuidle_idle_call
   cpuidle_idle_call
   rest_init
...
EIP: acpi_os_read_port+....
... [end trace ...]

Steps to reproduce: 

 Unfortunately, the bug seems to occur randomly, but when it does the machine always locks up.
(Perhaps under higher load?)
Comment 1 Zhang Rui 2008-06-09 19:48:51 UTC
Please attach the acpidump output.
Comment 2 Michal Zimen 2008-06-12 00:26:14 UTC
I'm sorry, but I have to report that yesterday my laptop went to sleep and never woke up, so I can no longer get an acpidump.

I would like to close (or postpone) this bug, since I cannot help with a dead laptop.
Comment 3 Len Brown 2008-06-12 13:59:24 UTC
acpi_status acpi_os_read_port(acpi_io_address port, u32 * value, u32 width)
{
        u32 dummy;

        if (!value)
                value = &dummy;

        *value = 0;
        if (width <= 8) {
                *(u8 *) value = inb(port);
        } else if (width <= 16) {
                *(u16 *) value = inw(port);
        } else if (width <= 32) {
                *(u32 *) value = inl(port);
        } else {
460:             BUG();
        }

        return AE_OK;
}

Strange, that means we are being called to read an IO port
of width > 32 bits.

acpi_os_read_port() is unchanged since the (working) 2.6.24,
so something else above must have changed.

If the machine comes back to life, in addition to acquiring
the acpidump output, it would be good to try
"processor.max_cstate=2" and if that doesn't work then
boot with "idle=poll" and make sure the system is at least
sane in that basic configuration.
Then try building with CONFIG_CPU_IDLE=n.

Closing as unreproducible for now; if the machine comes back to life,
please re-open.  Note that sometimes removing the AC and the
battery can wake a machine that has permanently fallen asleep.
Comment 4 Michal Zimen 2008-06-23 12:45:01 UTC
My laptop has suddenly come back to life, so I can upload the acpidump files.
(The cure: unplugging the battery and plugging it back in once more.)
Comment 5 Michal Zimen 2008-06-23 12:48:39 UTC
Created attachment 16588
AML version of acpidump
Comment 6 Michal Zimen 2008-06-23 12:49:16 UTC
Created attachment 16589
RAW version of acpidump
Comment 7 Michal Zimen 2008-06-26 00:03:47 UTC
As you recommended, I ran the tests, with these results:
   processor.max_cstate=2  -  lockup occurred after a while
   idle=poll  -  no lockups
   without CONFIG_CPU_IDLE  -  no lockups
Comment 8 Venkatesh Pallipadi 2008-07-03 11:16:55 UTC
Looks like all callers of acpi_os_read_port() pass a static width of 8, 16 or 32, so there should not be a call with width > 32 in normal execution. Perhaps we have a corrupted stack or something like that?
Comment 9 Zhang Rui 2008-08-14 01:40:06 UTC
Hi, venki,
what's the status of this bug?
Comment 10 Michal Zimen 2008-08-15 01:10:02 UTC
So, it seems some unknown hardware problem is causing this. That is supported by the fact that I am the only one who has reported this behaviour.
I suggest we close this bug if no one else reports a similar problem in the near future.
Comment 11 Andi Kleen 2008-08-15 23:12:28 UTC
Created attachment 17271
debug patch

Could you perhaps run with this debug patch and see whether you get any

ACPI: invalid read port width

messages in dmesg? If so, please post the full dmesg.

I also removed the BUG() so it won't kill your boot anymore, so you
have to check in dmesg with grep after boot.
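
For readers without access to the attachment, the warn-and-continue approach described above might look roughly like the following user-space sketch. It is not the attached patch: the mock_in*() stubs, the invalid_width_count counter, the function name debug_read_port() and the extra width/port details in the message are all assumptions made so the example can run outside the kernel. Only the message prefix "ACPI: invalid read port width" comes from this comment.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t acpi_status;      /* stand-in for the kernel type */
typedef uint64_t acpi_io_address;  /* stand-in for the kernel type */
#define AE_OK 0u

/* Hypothetical stubs: real inb()/inw()/inl() need port I/O privileges. */
static uint8_t  mock_inb(acpi_io_address port) { return (uint8_t)port; }
static uint16_t mock_inw(acpi_io_address port) { return (uint16_t)port; }
static uint32_t mock_inl(acpi_io_address port) { return (uint32_t)port; }

/* Hypothetical counter so a test can see how often the bad path was hit. */
static unsigned invalid_width_count;

/* Same width dispatch as acpi_os_read_port(), but an invalid width is
 * logged instead of triggering BUG(), and the call still reports AE_OK. */
acpi_status debug_read_port(acpi_io_address port, uint32_t *value,
                            uint32_t width)
{
        uint32_t dummy;

        if (!value)
                value = &dummy;

        *value = 0;
        if (width <= 8) {
                *value = mock_inb(port);
        } else if (width <= 16) {
                *value = mock_inw(port);
        } else if (width <= 32) {
                *value = mock_inl(port);
        } else {
                invalid_width_count++;
                fprintf(stderr,
                        "ACPI: invalid read port width %u (port 0x%llx)\n",
                        (unsigned)width, (unsigned long long)port);
        }

        return AE_OK;
}
```

With this variant the caller cannot tell that the read was skipped: it receives AE_OK and a value of 0, which is why the log has to be checked by hand.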
Comment 12 Zhang Rui 2008-09-23 20:43:18 UTC
Rejecting this bug, as there has been no response from the bug reporter.
Michal, please reopen it if you can test the patch in comment #11 and report the results here.
Comment 13 Alex Zeffertt 2008-09-29 07:56:08 UTC
Created attachment 18103
Make acpi_os_{read/write}_port() return error rather than panic if BIOS reports invalid port width

Also make drivers/acpi/processor_throttling.c test the return value after calling these routines.  (All the other callers already do this!)
Comment 14 Alex Zeffertt 2008-09-29 08:00:13 UTC
Please could you reopen this bug.  We have seen this cause crashes on some of our machines too.

The patch I have just posted is based on the one in comment #11, but I have made it return an error code rather than just generate a warning, and ensured that all callers check this code.

We have verified that this fixes the bug on our machines.
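
As an illustration only (the attachment is the authoritative version), the return-an-error approach described above could be sketched in user space as follows. The mock_in*() stubs, the function names checked_read_port() and read_throttle_status(), and the choice of AE_BAD_PARAMETER as the error code are assumptions; the numeric status values are placeholders, not taken from the patch.

```c
#include <stdint.h>

typedef uint32_t acpi_status;      /* stand-in for the kernel type */
typedef uint64_t acpi_io_address;  /* stand-in for the kernel type */
#define AE_OK            0x0000u
#define AE_BAD_PARAMETER 0x1001u   /* assumed error code for this sketch */

/* Hypothetical stubs: real inb()/inw()/inl() need port I/O privileges. */
static uint8_t  mock_inb(acpi_io_address port) { return (uint8_t)port; }
static uint16_t mock_inw(acpi_io_address port) { return (uint16_t)port; }
static uint32_t mock_inl(acpi_io_address port) { return (uint32_t)port; }

/* Width dispatch that reports an invalid width to the caller
 * instead of calling BUG(). */
acpi_status checked_read_port(acpi_io_address port, uint32_t *value,
                              uint32_t width)
{
        uint32_t dummy;

        if (!value)
                value = &dummy;

        *value = 0;
        if (width <= 8)
                *value = mock_inb(port);
        else if (width <= 16)
                *value = mock_inw(port);
        else if (width <= 32)
                *value = mock_inl(port);
        else
                return AE_BAD_PARAMETER;

        return AE_OK;
}

/* Hypothetical caller: checks the status and leaves *out untouched
 * when the read fails. Returns 0 on success, -1 on failure. */
int read_throttle_status(acpi_io_address port, uint32_t width, uint32_t *out)
{
        uint32_t raw = 0;

        if (checked_read_port(port, &raw, width) != AE_OK)
                return -1;

        *out = raw;
        return 0;
}
```

The point of the design is that a bogus width from the BIOS becomes an ordinary failure the caller can handle, rather than a panic in the idle path.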
