Bug 1164

Summary: Oops at boottime with ACPI enabled - VIA694
Product: ACPI
Reporter: Stian Jordet (stian_web)
Component: BIOS
Assignee: Len Brown (lenb)
Status: CLOSED CODE_FIX
Severity: normal
CC: acpi-bugzilla
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.0-test4
Subsystem:
Regression: ---
Bisected commit-id:
Attachments:
The oops
dmidecode output
acpidmp output
dmesg from 2.6.0-test7 with acpi and aml_relaxed
/proc/interrupts from 2.6.0-test7 with acpi and relax_aml
/proc/interrupts from 2.6.0-test7 with acpi, relax_aml and pci=noacpi
fixed DSDT
dmesg from 2.6.0-test8 with pci_irq.c patch and fixed dsdt
patch to get info
dmesg latest bk, acpi_debug and patch to get more info
fixed dsdt
new /proc/interrupts
dmesg
Screenshot of Device Manager in Windows XP
Screenshot from Windows XP showing irq assignments.
dmesg with io-apic patch
patch for the error

Description Stian Jordet 2003-08-28 16:51:48 UTC
Distribution: Debian Woody
Hardware Environment: Rioworks SDVIA Dual P3 motherboard
Software Environment:
Problem Description: Oops at boot-time with ACPI-enabled

Steps to reproduce:
If I boot with ACPI enabled, I get an oops right after "ACPI: Subsystem
20030813". I have written down the oops by hand (the only way, since I have no
serial console and the computer locks up). I will also attach the output from
dmidecode and acpidmp. I don't doubt for a second that my DSDT is all screwed up,
but it shouldn't oops, should it?
Comment 1 Stian Jordet 2003-08-28 16:52:22 UTC
Created attachment 759 [details]
The oops
Comment 2 Stian Jordet 2003-08-28 16:53:07 UTC
Created attachment 760 [details]
dmidecode output
Comment 3 Stian Jordet 2003-08-28 16:53:41 UTC
Created attachment 761 [details]
acpidmp output
Comment 4 Luming Yu 2003-09-03 02:08:22 UTC
In the disassembled DSDT, I found:
DefinitionBlock ("DSDT.aml", "DSDT", 1, "VIA694", "AWRDACPI", 4096)

Would you please look at bug #10? The workaround patch there could solve your problem.

Thanks a lot!
Comment 5 Stian Jordet 2003-09-03 06:34:26 UTC
I will indeed try it, but it has to wait about a week (I won't have access to
that computer again before September 11th or 12th). But thanks a lot for looking
into this :)
Comment 6 Stian Jordet 2003-09-11 17:30:37 UTC
I have now tried it, but the patch from bug #10 didn't help, sorry. Do you
have any other ideas? Btw. does the DSDT look sane? As I wrote earlier, I guess
my DSDT is totally screwed, but it still shouldn't oops the kernel at boot-time
with no errors. I will have access to this computer until Tuesday (September
16th). After that I don't know when I will get to it again, so if you have any
ideas as to what I should try, please give them to me now :)

Thanks again :)

Stian
Comment 7 Shaohua 2003-09-11 18:58:53 UTC
I doubt this is an ACPI bug.
>[<c0106e6c>] cpu_idle+0x30/0x40
>[<c04716aa>] <6>ACPI: Interpreter enabled
                 ^ this is just a printk string. It seems that cpu_idle ran
before printk finished executing, and then it oopsed.
Did you enable APM?
Comment 8 Stian Jordet 2003-09-11 19:25:16 UTC
Hmm. I don't have APM. And it does disappear when I boot with acpi=off... But
you might be right. *sigh* :(
Comment 9 Stian Jordet 2003-10-17 19:43:17 UTC
Well, with latest bk from 2003-10-17, it still panics at boot. But just for fun
I enabled the "Relax AML"-option, and now it works :-) But weird things are
still happening.

1. The pc shouldn't panic whatever weird programming my DSDT might have.
2. When I enable the acpi debug option, it panics again, even with relax_aml.
3. It doesn't print any message about "relax_aml" being used.

ACPI gets my IRQ-routing all wrong :-( But pci=noacpi makes this computer
work quite well. Attached is dmesg from 2.6.0-test7 with acpi enabled and
relax_aml enabled. Also /proc/interrupts from the same kernel with and without
pci=noacpi.

As I said, this has never worked better, actually it is working quite well right
now, but it would be nice if you could make the irq-routing work and perhaps
keep it from panicking when I enable debug... :-)

Thanks!
Comment 10 Stian Jordet 2003-10-17 19:44:46 UTC
Created attachment 1071 [details]
dmesg from 2.6.0-test7 with acpi and aml_relaxed
Comment 11 Stian Jordet 2003-10-17 19:47:18 UTC
Created attachment 1072 [details]
/proc/interrupts from 2.6.0-test7 with acpi and relax_aml
Comment 12 Stian Jordet 2003-10-17 19:48:15 UTC
Created attachment 1073 [details]
/proc/interrupts from 2.6.0-test7 with acpi, relax_aml and pci=noacpi
Comment 13 Shaohua 2003-10-19 19:56:34 UTC
Please help gather some info by applying the patch below and booting with acpi
and relax_aml:
--- pci_irq.c   2003-10-09 03:24:04.000000000 +0800
+++ pci_irq.c.new       2003-10-20 10:55:44.000000000 +0800
@@ -146,10 +146,10 @@
        else
                entry->link.index = prt->source_index;

-       ACPI_DEBUG_PRINT_RAW((ACPI_DB_INFO,
-               "      %02X:%02X:%02X[%c] -> %s[%d]\n",
+       printk(
+               "      %02X:%02X:%02X[%c] -> %s[%d], counts %d\n",
                entry->id.segment, entry->id.bus, entry->id.device,
-               ('A' + entry->pin), prt->source, entry->link.index));
+               ('A' + entry->pin), prt->source, entry->link.index, acpi_prt.count);

        /* TBD: Acquire/release lock */
        list_add_tail(&entry->node, &acpi_prt.entries);
Comment 14 Shaohua 2003-10-19 20:05:15 UTC
Created attachment 1107 [details]
fixed DSDT

Also try this DSDT, which avoids 'Store(local0, local0)'.
Comment 15 Stian Jordet 2003-10-20 17:50:27 UTC
Hmm. I patched pci_irq.c, but didn't get any printk's in dmesg :( Also used your
fixed dsdt (Thanks), but couldn't see any difference. Sorry. I'll attach dmesg
from kernel with the patch and your dsdt.

Btw. I wrote earlier that it worked fine with pci=noacpi, but it didn't. USB
(uhci) and ACPI (!) didn't get interrupts. If I booted with noapic, both ACPI
and USB (uhci) worked fine. Anyway, test7 (and test8) is a big step forward :)
I do understand that this motherboard perhaps never will get irq-routing
working with acpi, but it would be nice if acpi didn't make the box oops when
enabling debug/disabling relaxed_aml. (Kinda weird that relaxed_aml makes a
difference, when it never prints any warning?)

Anyway, thank you very much for looking into this. I only have access to this
computer about a weekend each month, so I can't test anything the next three or
four weeks, but if you have any ideas, please tell :)
Comment 16 Stian Jordet 2003-10-20 18:01:58 UTC
Created attachment 1119 [details]
dmesg from 2.6.0-test8 with pci_irq.c patch and fixed dsdt
Comment 17 Shaohua 2003-10-31 01:52:35 UTC
Created attachment 1301 [details]
patch to get info

No, I think your BIOS has no error. Maybe it's an ACPI bug. I want to get more
info with the above patch. And if possible, please try the latest kernel. Thanks.
Comment 18 Stian Jordet 2003-10-31 04:21:28 UTC
Thank you very much :) I'll try the patch as soon as I get home (probably two
and a half weeks from now). I just have one question; in my dmesg I have these lines:

ACPI: ACPI tables contain no PCI IRQ routing entries
PCI: Invalid ACPI-PCI IRQ routing table

Is that correct? Or does my DSDT contain IRQ routing entries? Anyway, thank you
very much again. I'll owe you a beer if you ever come to Norway :)
Comment 19 Len Brown 2003-10-31 11:02:01 UTC
Wow, plenty of items to work on for this system;-) 
 
I don't see how the RELAXED_AML patch would affect this system. 
If it ran, you'd see warnings: 
 
                                        ACPI_REPORT_WARNING(( 
                                                "The ACPI AML in your computer contains errors, " 
                                                "please nag the manufacturer to correct it.\n")); 
                                        ACPI_REPORT_WARNING(( 
                                                "Allowing relaxed access to fields; " 
                                                "turn on CONFIG_ACPI_DEBUG for details.\n")); 
 
Can you verify that the oops is gone with the latest bk tree and no 
RELAXED_AML used? 
 
Luming, can you post the text difference for the custom DSDT you attached? 
 
Re: CONFIG_ACPI_DEBUG panic -- this one is fixed in the latest tree. 
 
>ACPI: ACPI tables contain no PCI IRQ routing entries 
>PCI: Invalid ACPI-PCI IRQ routing table 
 
Yes, your original DSDT does contain _PRT entries -- for both PIC and APIC modes. 
 
Re: IRQ routing screwed up. 
working on this, maybe will have some more VIA fixes by the time 
you get back to this system;-) 
 
 
Comment 20 Stian Jordet 2003-10-31 11:31:28 UTC
I'm very aware that the RELAXED_AML patch shouldn't affect this system; as I said
in point 3 of comment #9:

3. It doesn't print any message about "relax_aml" being used.

But that was the only way to make the oops go away :-) With test8, that was.
I'll try the latest bk-tree next time I'm home (in about two and a half weeks).

Anyway, if Shaohua is right - and my DSDT is correct - there seem to be some
bugs left. Both because of the "ACPI: ACPI tables contain no PCI IRQ routing
entries", which is wrong, and because of the oops :-)

Anyway, I'm very impressed with your work. I had an ACPI problem on another
SMP-board half a year ago, and Andy fixed that in a couple of days. Now I've had
three of you guys looking at this bug, it's really appreciated :-) Beers for
everyone, if you take a holiday in Norway :-)

You'll hear from me in a couple of weeks :-)
Comment 21 Shaohua 2003-11-05 19:40:47 UTC
>CONFIG_ACPI_DEBUG panic -- this one is fixed in the latest tree.
Len, I guess this one is different from the one you mentioned. Stian, please
attach the oops with CONFIG_ACPI_DEBUG and the dmesg from when it oopses.
>can you post the text difference for the custom DSDT you attached? 
My DSDT just avoids 'Store(local0, local0)'; in recent kernels, that is not 
necessary.
>ACPI: ACPI tables contain no PCI IRQ routing entries
I guess we can get more info with CONFIG_ACPI_DEBUG.
Comment 22 Stian Jordet 2003-11-19 14:57:36 UTC
Somewhere between Oct. 20th and today, this has been fixed. Now the kernel boots
without RELAXED_AML as well :):) And it boots with ACPI_DEBUG.

Just to be sure I haven't been crazy, I tested again with my bk-snapshot from
20031020, and it oopsed again without RELAXED_AML. Anyway, that part solved :)

I have attached dmesg from a boot with ACPI_DEBUG and with Shaohua's patch
to get more info. Hope this helps. If you have anything you want me to try, I
will be here until mid-day Monday. After that I won't have access to this box
before Christmas. Thanks :)
Comment 23 Stian Jordet 2003-11-19 14:59:19 UTC
Created attachment 1478 [details]
dmesg latest bk, acpi_debug and patch to get more info

Btw. should this be a separate bug, perhaps? Since it no longer oopses? ACPI
just doesn't understand my DSDT's irq-routing, even though Shaohua says the DSDT
is correct.
Comment 24 Shaohua 2003-11-19 16:23:50 UTC
Created attachment 1482 [details]
fixed dsdt

Yep, your DSDT does have an error indeed. Please try this fixed DSDT.
Comment 25 Stian Jordet 2003-11-19 17:09:37 UTC
And now it's working perfect :) Thank you :) (Come get some beer :) If I should
try to bug the motherboard manufacturer to fix the BIOS, what should I tell
them? What did you do with the DSDT? Anyway, thanks a lot :)
Comment 26 Stian Jordet 2003-11-19 17:10:55 UTC
Created attachment 1483 [details]
new /proc/interrupts

Btw. doesn't this look a bit weird? With uhci_hcd having irq 27? I'm just
curious :)
Comment 27 Stian Jordet 2003-11-19 17:12:33 UTC
Created attachment 1484 [details]
dmesg

Here's the dmesg as well, in case you care :) Feel free to close this bug, but
please tell me what to tell Rioworks to do first.
Comment 28 Shaohua 2003-11-19 18:25:24 UTC
>Btw. doesn't this look a bit weird? With uhci_hcd having irq 27?
It's not weird, because you are using the IO-APIC.
>What did you do with the DSDT?
Just this:
--- dsdt.dsl	2003-11-20 10:20:03.000000000 +0800
+++ dsdt.fix	2003-11-20 10:19:52.000000000 +0800
@@ -1344,33 +1344,33 @@
                 Package (0x04)
                 {
                     0x0007FFFF, 
-                    0x00, 
-                    0x00, 
-                    \_SB.PCI0.LNKA
+                    0x00,  
+                    \_SB.PCI0.LNKA,
+		    0x00
                 }, 
 
                 Package (0x04)
                 {
                     0x0007FFFF, 
-                    0x01, 
-                    0x00, 
-                    \_SB.PCI0.LNKB
+                    0x01,  
+                    \_SB.PCI0.LNKB,
+                    0x00
                 }, 
 
                 Package (0x04)
                 {
                     0x0007FFFF, 
-                    0x02, 
-                    0x00, 
-                    \_SB.PCI0.LNKC
+                    0x02,  
+                    \_SB.PCI0.LNKC,
+                    0x00
                 }, 
 
                 Package (0x04)
                 {
                     0x0007FFFF, 
-                    0x03, 
-                    0x00, 
-                    \_SB.PCI0.LNKD
+                    0x03,  
+                    \_SB.PCI0.LNKD,
+                    0x00
                 }, 
 
                 Package (0x04)
Comment 29 Stian Jordet 2003-11-19 18:32:55 UTC
Ahh, ok. Great. I'll forward it to them. Hope they will release a new bios.
Anyway, I just found a new bug. The power-button doesn't generate any ACPI-event
anymore. It does when I boot with noapic. But everything else works fine with
your DSDT (usb, eth0, etc.). Any hint on that?
Comment 30 Shaohua 2003-11-19 18:54:46 UTC
I think this one can be closed. Please open a new bug if you have other 
problems.
Comment 31 Stian Jordet 2003-11-19 19:07:27 UTC
Ok. Here's a new one for you all :)

http://bugme.osdl.org/show_bug.cgi?id=1563

Thanks for your great help :)
Comment 32 Len Brown 2003-11-20 14:00:57 UTC
I wonder what Windows did on this box -- couldn't have possibly run in IOAPIC mode 
using ACPI with that broken DSDT.  Must have either given up on IOAPIC mode 
and run in PIC mode, or disabled ACPI and run in legacy mode.  I wonder if 
Microsoft gave the vendor a Windows Logo... 
 
if you get a chance to run windows, it would be interesting to see: 
1. does ACPI run -- eg. does power button sleep system etc? 
2. does winmsd show similar IRQ assignments as we have, or are they all < 16? 
 
I'd be interested to see the dmesg after this additional patch is applied: 
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.0-test9/20031107182100-print_IO_APIC.patch 
 
thanks, 
-Len 
 
Comment 33 Stian Jordet 2003-11-20 17:04:27 UTC
Since you guys are so fantastic, I will actually install Windows XP on it
tomorrow, just to test this :) Will also get a dmesg with that patch. You'll
hear from me then :) Thanks again :)
Comment 34 Stian Jordet 2003-11-20 18:16:51 UTC
Created attachment 1500 [details]
Screenshot of Device Manager in Windows XP

I was eager to see how this went, so I installed XP right away. It's in the
middle of the night here, but I hope to get some feedback from you today :)

This is a screenshot of the Device Manager in Windows XP. It clearly shows that
my system is detected as SMP ACPI, and Fan, Button etc. is detected. When I
press powerbutton, the system goes to sleep. Everything works as expected.

I'll attach a screenshot of winmsd showing interrupts in a moment. (Which I
guess isn't good news for you, since it's obviously using IO-APIC...)
Comment 35 Stian Jordet 2003-11-20 18:17:46 UTC
Created attachment 1501 [details]
Screenshot from Windows XP showing irq assignments.
Comment 36 Stian Jordet 2003-11-20 18:43:13 UTC
Created attachment 1503 [details]
dmesg with io-apic patch

Here it is. This is also with a patch from 

http://bugzilla.kernel.org/show_bug.cgi?id=1563

made by Shaohua Li. My powerbutton still doesn't work without noapic, but my
interrupts look good :) I noticed that I got a * on all LNKA-LNKD now, which I
didn't without his patch. Guess that's a good thing. If just that stupid
powerbutton would work, everything would be perfect for the first time on this
board...
Comment 37 Shaohua 2003-11-20 18:58:31 UTC
>guess isn't good news for you, since it's obviously using IO-APIC...)
Maybe ACPI should be more robust.
I will check whether ACPI can tolerate this error.
Comment 38 Stian Jordet 2003-11-20 19:11:41 UTC
Well, that's up to you :) It really isn't that important, I can live with
compiling in my own DSDT. But I hate that my powerbutton doesn't work (If you
want something to do, I mean :p )
Comment 39 Shaohua 2003-11-20 22:23:04 UTC
Created attachment 1508 [details]
patch for the error

Please try the patch without fixed DSDT. thanks.
Comment 40 Len Brown 2003-11-21 00:07:48 UTC
The broken links (in apic mode) are these: 
                Package (0x04) { 0x0007FFFF, 0x00, 0x00, \_SB.PCI0.LNKA }, 
                Package (0x04) { 0x0007FFFF, 0x01, 0x00, \_SB.PCI0.LNKB }, 
                Package (0x04) { 0x0007FFFF, 0x02, 0x00, \_SB.PCI0.LNKC }, 
                Package (0x04) { 0x0007FFFF, 0x03, 0x00, \_SB.PCI0.LNKD }, 
lspci shows that they are these devices: 
 
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Mobile South] (rev 23) 
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235  
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 11) (prog-if 00 [UHCI]) 
00:07.3 Host bridge: VIA Technologies, Inc. VT82C596 Power Management (rev 30) 
 
If we ignore the broken entries, then we'd not be able to detect or set the interrupts for these 
devices.  Life may go on if there are no interrupts for pin A -- the ISA interrupts already have their 
own pins.  pin B is IDE, which is also hard-coded, so you may not notice if that PIRQ were not 
set up.  Pin C is your UHCI USB controllers; and I don't know what pin D is. 
 
So if we didn't patch the DSDT, and we followed the ACPI spec and tossed the bogus _PRT 
entries, then I'd expect UHCI USB to stop working.  Was USB working when you booted XP? 
 
thanks, 
-Len 
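
As a side note on why those four entries line up with the 00:07.x devices in the lspci listing: the first element of a _PRT entry uses the same encoding as _ADR, with the device number in the high word and the function in the low word, and 0xFFFF in the low word meaning "any function". The second element is the pin, 0 through 3 for INTA through INTD. A minimal C sketch of that decoding follows; the helper names are hypothetical and are not kernel or ACPICA functions.

```c
#include <stdint.h>

/* Device number: high 16 bits of the _PRT Address field. */
static unsigned prt_device(uint32_t address)
{
    return (address >> 16) & 0xFFFF;
}

/* Function number: low 16 bits of the _PRT Address field. */
static unsigned prt_function(uint32_t address)
{
    return address & 0xFFFF;
}

/* A low word of 0xFFFF means the entry applies to every function. */
static int prt_matches_any_function(uint32_t address)
{
    return prt_function(address) == 0xFFFF;
}

/* Pin 0..3 maps to 'A'..'D'; anything else is reported as '?'. */
static char prt_pin_name(unsigned pin)
{
    return pin < 4 ? (char)('A' + pin) : '?';
}
```

For 0x0007FFFF this gives device 7, any function, so the four packages cover pins A through D of device 00:07.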
 
 
Comment 41 Stian Jordet 2003-11-21 07:13:08 UTC
Shaohua:

With this patch, and the one from the other bug:
http://bugme.osdl.org/attachment.cgi?id=1499&action=view

I don't need the patched DSDT :) Very good work. Will this get into the kernel,
or is it too much of a hack?

Len:
USB worked fine with XP, both on the UHCI- and OHCI/EHCI controller. But with
Shaohua's patches it works fine in Linux as well (He's a clever man, methinks :)
Comment 42 Len Brown 2003-11-30 20:29:42 UTC
The CLOSED state means that the correct fix is shipping in the release. 
RESOLVED means there is a patch available, which may or may not be correct. 
 
While the patch in this bug report illustrates and addresses the problem, 
it is unlikely to ever ship in the release.  I believe that the correct fix will be 
to detect the bad DSDT and set what today we call "pci=noacpi" automatically. 
 
So I'm changing this back to RESOLVED until we have the correct fix 
shipping in the release. 
 
thanks, 
-Len 
 
Comment 43 Stian Jordet 2003-12-01 07:37:22 UTC
Uhm, I can understand why you won't include the patch in the release, since my
DSDT is not following the standards. But is it doing any harm? The fix you
proposed, detecting the broken DSDT and proceeding with pci=noacpi, wouldn't work,
since usb does not work without noapic on this box. Acpi works with this patch,
but I understand that it is your decision. Anyway, thanks for all help :)
Comment 44 Shaohua 2003-12-01 18:07:40 UTC
pci=noacpi doesn't mean noapic. Did your box work with pci=noacpi?
Comment 45 Stian Jordet 2003-12-01 18:12:07 UTC
No, that was what I was trying to say :) Sorry if I expressed myself poorly.
I'll try again:

noapic: everything works fine

acpi (with fixed dsdt or your excellent patch):  everything works fine

without acpi or pci=noacpi: no onboard usb.

That's why I really wanted to get acpi going (which I now have, thanks to you.
Of course I would like to have your patch in the kernel (especially since Windows
obviously understands my DSDT), but I'm quite happy now).

Anyway, pci=noacpi is not an option for me.
Comment 46 Stian Jordet 2003-12-01 20:06:31 UTC
Btw. Len, here http://bugme.osdl.org/show_bug.cgi?id=1563#c15 you say "If
Windows doesn't need a fixed DSDT to run on this box, Linux shouldn't
either...". That's not true anymore? :) (yeah, I'm a pain in the ass, sorry).
Comment 47 Len Brown 2003-12-01 20:42:19 UTC
Stian,  
The original issue -- the oops -- is gone, so I'm reversing myself and re-closing this bug.  
The subsequent bug #1563 has two fixes -- one to fix the IRQ27 issue, 
and another to fix the ACPI interrupt.  Both are specific to ACPI mode. 
 
That leaves the broken _PRT entries in the DSDT... 
 
Yes, I see on a careful re-read that pci=noacpi is not going to work here -- 
it screws up both USB _and_ the ACPI interrupt. 
 
No, I don't advocate people running with patched DSDTs -- I believe that BIOS bugs should be 
fixed by the platform vendor that shipped the BIOS. 
 
Shaohua's _PRT swizzle patch in attachment 1508 [details] is not an easy call. 
If we find lots of systems suffer from this issue we may have no choice 
but to break compatibility with the ACPI spec and be bug compatible with 
what Windows apparently does.  Doing so may cause Linux to diverge 
from Intel's ACPI compliant ACPICA implementation, which would have its 
own issues.  So I'm going to defer that decision until we have more information. 
 
Thanks for all your help! 
-Len 
 
Comment 48 Stian Jordet 2003-12-01 20:54:54 UTC
Ok, I of course respect that decision, and thank you for a long and very good
explanation :) And I'm quite impressed with the progress on these bugs. With
-test8 I just got a kernel panic at boot time, now everything works (or at least
we know why it won't work :) So you have fixed every issue, which I really am
thankful for. Great work! :)

Just one last question, is it ok if I cc you on my mail to Rioworks, begging
them to release a fixed BIOS, and referring to you?
Comment 49 Stian Jordet 2004-01-27 09:00:44 UTC
Len: Just one thing. I just tried kernel 2.4.25-pre7 just for fun, and that
panics on my box, even with the fixed bios. Kernel 2.6.x is working perfectly
now, but 2.4.25-pre7 is having a similar panic to the one I reported here. I do
not care about this bug, I haven't used 2.4.x since before 2.5.33 on this box,
and I will never use it again, I just wanted to let you know :) Anyway, thanks
for perfectly working acpi in 2.6 :)