Bug 1164
Summary: | Oops at boottime with ACPI enabled - VIA694 | ||
---|---|---|---|
Product: | ACPI | Reporter: | Stian Jordet (stian_web) |
Component: | BIOS | Assignee: | Len Brown (lenb) |
Status: | CLOSED CODE_FIX | ||
Severity: | normal | CC: | acpi-bugzilla |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.0-test4 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
The oops
dmidecode output
acpidmp output
dmesg from 2.6.0-test7 with acpi and aml_relaxed
/proc/interrupts from 2.6.0-test7 with acpi and relax_aml
/proc/interrupts from 2.6.0-test7 with acpi, relax_aml and pci=noacpi
fixed DSDT
dmesg from 2.6.0-test8 with pci_irq.c patch and fixed dsdt
patch to get info
dmesg latest bk, acpi_debug and patch to get more info
fixed dsdt
new /proc/interrupts
dmesg
Screenshot of Device Manager in Windows XP
Screenshot from Windows XP showing irq assignments.
dmesg with io-apic patch
patch for the error |
Description
Stian Jordet
2003-08-28 16:51:48 UTC
Created attachment 759 [details]
The oops
Created attachment 760 [details]
dmidecode output
Created attachment 761 [details]
acpidmp output
From the disassembled DSDT, I found: DefinitionBlock ("DSDT.aml", "DSDT", 1, "VIA694", "AWRDACPI", 4096). Would you please look at bug #10? The workaround patch there could solve your problem.

Thanks a lot! I will indeed try it, but it will have to wait about a week (I won't have access to that computer again before September 11th or 12th). But thanks a lot for looking into this :)

I have now tried it, but the patch from bug #10 didn't help me at all, sorry. Do you have any other ideas? Btw. does the DSDT look sane? As I wrote earlier, I guess my DSDT is totally screwed, but it still shouldn't oops the kernel at boot time with no errors. I will have access to this computer until Tuesday (September 16th). After that I don't know when I will get to it again, so if you have any ideas as to what I should try, please give them to me now :) Thanks again :) Stian

I doubt it's an ACPI bug.
>[<c0106e6c>] cpu_idle+0x30/0x40
>[<c04716aa>] <6>ACPI: Interpreter enabled
^ This is just a printk string. It seems that cpu_idle ran before the printk had finished
executing, and then it oopsed.
Did you enable APM?
Hmm. I don't have APM. And it does disappear when I boot with acpi=off... But you might be right. *sigh* :(

Well, with the latest bk from 2003-10-17, it still panics at boot. But just for fun I enabled the "Relax AML" option, and now it works :-) But weird things are still happening:
1. The PC shouldn't panic, whatever weird programming my DSDT might have.
2. When I enable the ACPI debug option, it panics again, even with relax_aml.
3. It doesn't print any message about "relax_aml" being used.

ACPI gets my IRQ routing all wrong :-( But pci=noacpi makes this computer work quite well. Attached is dmesg from 2.6.0-test7 with ACPI and relax_aml enabled, plus /proc/interrupts from the same kernel with and without pci=noacpi. As I said, this has never worked better; actually it is working quite well right now, but it would be nice if you could make the IRQ routing work and perhaps not let it panic when I enable debug... :-) Thanks!

Created attachment 1071 [details]
dmesg from 2.6.0-test7 with acpi and aml_relaxed
Created attachment 1072 [details]
/proc/interrupts from 2.6.0-test7 with acpi and relax_aml
Created attachment 1073 [details]
/proc/interrupts from 2.6.0-test7 with acpi, relax_aml and pci=noacpi
Please help gather some info using the code below, with acpi and relax_aml:

--- pci_irq.c	2003-10-09 03:24:04.000000000 +0800
+++ pci_irq.c.new	2003-10-20 10:55:44.000000000 +0800
@@ -146,10 +146,10 @@
         else
                 entry->link.index = prt->source_index;
 
-        ACPI_DEBUG_PRINT_RAW((ACPI_DB_INFO,
-                " %02X:%02X:%02X[%c] -> %s[%d]\n",
+        printk(
+                " %02X:%02X:%02X[%c] -> %s[%d], counts %d\n",
                 entry->id.segment, entry->id.bus, entry->id.device,
-                ('A' + entry->pin), prt->source, entry->link.index));
+                ('A' + entry->pin), prt->source, entry->link.index, acpi_prt.count);
 
         /* TBD: Acquire/release lock */
         list_add_tail(&entry->node, &acpi_prt.entries);

Created attachment 1107 [details]
fixed DSDT
And try this DSDT, which avoids 'Store(local0, local0)'.
Hmm. I patched pci_irq.c, but didn't get any printk's in dmesg :( I also used your fixed DSDT (thanks), but couldn't see any difference. Sorry. I'll attach dmesg from a kernel with the patch and your DSDT.

Btw. I wrote earlier that it worked fine with pci=noacpi, but it didn't. USB (uhci) and ACPI (!) didn't get interrupts. If I booted with noapic, both ACPI and USB (uhci) worked fine. Anyway, test7 (and test8) is a big step forward :) I do understand that this motherboard may never get IRQ routing working with ACPI, but it would be nice if ACPI didn't make the box oops when enabling debug/disabling relaxed_aml. (Kinda weird that relaxed_aml makes a difference, when it never prints any warning?) Anyway, thank you very much for looking into this. I only have access to this computer about one weekend each month, so I can't test anything the next three or four weeks, but if you have any ideas, please tell me :)

Created attachment 1119 [details]
dmesg from 2.6.0-test8 with pci_irq.c patch and fixed dsdt
Created attachment 1301 [details]
patch to get info
No, I don't think your BIOS has an error; maybe it's an ACPI bug. I want to get more
info with the above patch. And if possible, please try the latest kernel. Thanks.
Thank you very much :) I'll try the patch as soon as I get home (probably two and a half weeks from now). I just have one question; in my dmesg I have these lines:
ACPI: ACPI tables contain no PCI IRQ routing entries
PCI: Invalid ACPI-PCI IRQ routing table
Is that correct? Or does my DSDT contain IRQ routing entries? Anyway, thank you very much again. I'll owe you a beer if you ever come to Norway :)

Wow, plenty of items to work on for this system;-)
I don't see how the RELAXED_AML patch would affect this system.
If it ran, you'd see these warnings:
ACPI_REPORT_WARNING((
"The ACPI AML in your computer contains errors, "
"please nag the manufacturer to correct it.\n"));
ACPI_REPORT_WARNING((
"Allowing relaxed access to fields; "
"turn on CONFIG_ACPI_DEBUG for details.\n"));
Can you verify that the oops is gone with the latest bk tree and no
RELAXED_AML used?
Luming, can you post the text difference for the custom DSDT you attached?
Re: CONFIG_ACPI_DEBUG panic -- this one is fixed in the latest tree.
>ACPI: ACPI tables contain no PCI IRQ routing entries
>PCI: Invalid ACPI-PCI IRQ routing table
Yes, your original DSDT does contain _PRT entries -- for both PIC and APIC modes.
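As a rough illustration of the point about two sets of _PRT entries: a DSDT commonly carries two routing tables, and its _PRT method returns one of them depending on what the OS passed to _PIC (0 for the PIC model, 1 for the APIC model, per the ACPI spec), so a defect confined to one table only shows up in that mode. The C sketch below models that selection plus a device/pin lookup. The struct layout and function names are hypothetical, this is not the kernel's pci_irq.c code, and real firmware makes the choice in AML rather than C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values the OS passes to the _PIC method (ACPI spec). */
enum irq_model { IRQ_MODEL_PIC = 0, IRQ_MODEL_APIC = 1 };

/* Hypothetical, simplified decoded _PRT entry. */
struct prt_entry {
    uint32_t address;       /* high word: device, low word: function (0xFFFF = all) */
    uint8_t pin;            /* 0 = INTA ... 3 = INTD */
    const char *source;     /* link device path, or NULL for a hard-wired GSI */
    uint32_t source_index;  /* GSI when source == NULL */
};

struct prt_table {
    const struct prt_entry *entries;
    size_t count;
};

/* Mimics a _PRT that returns one of two packages depending on the
 * interrupt model the OS selected through _PIC. */
static const struct prt_table *select_prt(enum irq_model model,
                                          const struct prt_table *pic_tbl,
                                          const struct prt_table *apic_tbl)
{
    assert(pic_tbl != NULL && apic_tbl != NULL);
    return model == IRQ_MODEL_APIC ? apic_tbl : pic_tbl;
}

/* Find the routing entry for a PCI device number and interrupt pin. */
static const struct prt_entry *prt_lookup(const struct prt_table *tbl,
                                          uint16_t device, uint8_t pin)
{
    for (size_t i = 0; i < tbl->count; i++) {
        const struct prt_entry *e = &tbl->entries[i];
        if ((uint16_t)(e->address >> 16) == device && e->pin == pin)
            return e;
    }
    return NULL;  /* no routing information for this device/pin */
}
```

Booting with noapic keeps the OS in the PIC model, so a lookup like this would consult the other table, which would be consistent with the earlier observation that noapic made USB and ACPI interrupts work.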
Re: IRQ routing screwed up.
working on this, maybe will have some more VIA fixes by the time
you get back to this system;-)
I'm very aware that the RELAXED_AML patch shouldn't affect this system; as I said in comment #3:
3. It doesn't print any message about "relax_aml" being used.
But that was the only way to make the oops go away :-) With test8, that was. I'll try the latest bk tree next time I'm home (in about two and a half weeks). Anyway, if Shaohua is right, and my DSDT is correct, there seem to be some bugs left: both the "ACPI: ACPI tables contain no PCI IRQ routing entries" message, which is wrong, and the oops :-) Anyway, I'm very impressed with your work. I had an ACPI problem on another SMP board half a year ago, and Andy fixed that in a couple of days. Now I've had three of you guys looking at this bug; it's really appreciated :-) Beers for everyone, if you take a holiday in Norway :-) You'll hear from me in a couple of weeks :-)

>CONFIG_ACPI_DEBUG panic -- this one is fixed in the latest tree.
Len, I guess this one is different from what you said. So please attach the oops with CONFIG_ACPI_DEBUG and the dmesg when it oopses.
>can you post the text difference for the custom DSDT you attached?
My DSDT just avoids 'Store(local0, local0)'; in recent kernels it's not necessary.
>ACPI: ACPI tables contain no PCI IRQ routing entries
I guess we can get more info with CONFIG_ACPI_DEBUG.

Somewhere between Oct. 20th and today, this has been fixed. Now the kernel boots without RELAXED_AML as well :):) And it boots with ACPI_DEBUG. Just to be sure I haven't been crazy, I tested again with my bk snapshot from 20031020, and it oopsed again without RELAXED_AML. Anyway, that part is solved :) Attached is dmesg from a boot with ACPI_DEBUG and with Shaohua's patch to get more info. Hope this helps. If you have anything you want me to try, I will be here until midday Monday. After that I won't have access to this box before Christmas. Thanks :)

Created attachment 1478 [details]
dmesg latest bk, acpi_debug and patch to get more info
Btw. should this be a separate bug, perhaps? Since it no longer oopses? ACPI
just doesn't understand my DSDT's IRQ routing, even though Shaohua says the DSDT
is correct.
Created attachment 1482 [details]
fixed dsdt
Yep, your DSDT does have an error indeed. Please try this fixed DSDT.
And now it's working perfectly :) Thank you :) (Come get some beer :) If I should try to bug the motherboard manufacturer to fix the BIOS, what should I tell them? What did you do with the DSDT? Anyway, thanks a lot :)

Created attachment 1483 [details]
new /proc/interrupts
Btw. doesn't this look a bit weird? With uhci_hcd having irq 27? I'm just
curious :)
Created attachment 1484 [details]
dmesg
Here's the dmesg as well, in case you care :) Feel free to close this bug, but
please tell me what to tell Rioworks to do first.
>Btw. doesn't this look a bit weird? With uhci_hcd having irq 27?
It's not weird, because you use the IO-APIC.
>What did you do with the DSDT?
Just this:

--- dsdt.dsl	2003-11-20 10:20:03.000000000 +0800
+++ dsdt.fix	2003-11-20 10:19:52.000000000 +0800
@@ -1344,33 +1344,33 @@
                     Package (0x04)
                     {
                         0x0007FFFF,
-                        0x00,
-                        0x00,
-                        \_SB.PCI0.LNKA
+                        0x00,
+                        \_SB.PCI0.LNKA,
+                        0x00
                     },
                     Package (0x04)
                     {
                         0x0007FFFF,
-                        0x01,
-                        0x00,
-                        \_SB.PCI0.LNKB
+                        0x01,
+                        \_SB.PCI0.LNKB,
+                        0x00
                     },
                     Package (0x04)
                     {
                         0x0007FFFF,
-                        0x02,
-                        0x00,
-                        \_SB.PCI0.LNKC
+                        0x02,
+                        \_SB.PCI0.LNKC,
+                        0x00
                     },
                     Package (0x04)
                     {
                         0x0007FFFF,
-                        0x03,
-                        0x00,
-                        \_SB.PCI0.LNKD
+                        0x03,
+                        \_SB.PCI0.LNKD,
+                        0x00
                     },
                     Package (0x04)

Ahh, ok. Great. I'll forward it to them. Hope they will release a new BIOS. Anyway, I just found a new bug. The power button doesn't generate any ACPI event anymore. It does when I boot with noapic. But everything else works fine with your DSDT (usb, eth0, etc.). Any hint on that?

I think this one can be closed. Please open a new bug if you have other problems.

Ok. Here's a new one for you all :) http://bugme.osdl.org/show_bug.cgi?id=1563 Thanks for your great help :)

I wonder what Windows did on this box -- couldn't have possibly run in IOAPIC mode using ACPI with that broken DSDT. Must have either given up on IOAPIC mode and run in PIC mode, or disabled ACPI and run in legacy mode. I wonder if Microsoft gave the vendor a Windows Logo... If you get a chance to run Windows, it would be interesting to see:
1. Does ACPI run -- e.g. does the power button sleep the system, etc.?
2. Does winmsd show similar IRQ assignments to ours, or are they all < 16?
I'd be interested to see the dmesg after this additional patch is applied:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/test/2.6.0-test9/20031107182100-print_IO_APIC.patch
thanks,
-Len

Since you guys are so fantastic, I will actually install Windows XP on it tomorrow, just to test this :) Will also get a dmesg with that patch.
You'll hear from me then :) Thanks again :) Created attachment 1500 [details]
Screenshot of Device Manager in Windows XP
I was eager to see how this went, so I installed XP right away. It's the
middle of the night here, but I hope to get some feedback from you today :)
This is a screenshot of the Device Manager in Windows XP. It clearly shows that
my system is detected as SMP ACPI, and that Fan, Button etc. are detected. When I
press the power button, the system goes to sleep. Everything works as expected.
I'll attach a screenshot of winmsd showing interrupts in a moment. (Which I
guess isn't good news for you, since it's obviously using IO-APIC...)
Created attachment 1501 [details]
Screenshot from Windows XP showing irq assignments.
Created attachment 1503 [details]
dmesg with io-apic patch

Here it is. This is also with a patch from http://bugzilla.kernel.org/show_bug.cgi?id=1563 made by Shaohua Li. My power button still doesn't work without noapic, but my interrupts look good :) I noticed that I got a * on all of LNKA-LNKD now, which I didn't without his patch. Guess that's a good thing. If only that stupid power button worked, everything would be perfect for the first time on this board...

>guess isn't good news for you, since it's obviously using IO-APIC...)
Maybe ACPI should be more robust.
I will check whether ACPI can tolerate this error.
Well, that's up to you :) It really isn't that important; I can live with compiling in my own DSDT. But I hate that my power button doesn't work (if you want something to do, I mean :p )

Created attachment 1508 [details]
patch for the error
Please try the patch without the fixed DSDT. Thanks.
The broken links (in APIC mode) are these:

Package (0x04) { 0x0007FFFF, 0x00, 0x00, \_SB.PCI0.LNKA },
Package (0x04) { 0x0007FFFF, 0x01, 0x00, \_SB.PCI0.LNKB },
Package (0x04) { 0x0007FFFF, 0x02, 0x00, \_SB.PCI0.LNKC },
Package (0x04) { 0x0007FFFF, 0x03, 0x00, \_SB.PCI0.LNKD },

lspci shows that they are these devices:

00:07.0 ISA bridge: VIA Technologies, Inc. VT82C596 ISA [Mobile South] (rev 23)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 11) (prog-if 00 [UHCI])
00:07.3 Host bridge: VIA Technologies, Inc. VT82C596 Power Management (rev 30)

If we ignore the broken entries, then we'd not be able to detect or set the interrupts for these devices. Life may go on if there are no interrupts for pin A -- the ISA interrupts already have their own pins. Pin B is IDE, which is also hard-coded, so you may not notice if that PIRQ were not set up. Pin C is your UHCI USB controllers; and I don't know what pin D is. So if we didn't patch the DSDT, and we followed the ACPI spec and tossed the bogus _PRT entries, then I'd expect UHCI USB to stop working. Was USB working when you booted XP?
thanks,
-Len

Shaohua: With this patch, and the one from the other bug: http://bugme.osdl.org/attachment.cgi?id=1499&action=view I don't need the patched DSDT :) Very good work. Will this get into the kernel, or is it too much of a hack? Len: USB worked fine with XP, on both the UHCI and the OHCI/EHCI controllers. But with Shaohua's patches it works fine in Linux as well (he's a clever man, methinks :)

The CLOSED state means that the correct fix is shipping in the release. RESOLVED means there is a patch available, which may or may not be correct. While the patch in this bug report illustrates and addresses the problem, it is unlikely to ever ship in the release. I believe that the correct fix will be to detect the bad DSDT and set what today we call "pci=noacpi" automatically.
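A minimal sketch of that kind of detection, assuming the ACPI-spec layout of each _PRT package: Address (device number in the high word, 0xFFFF in the low word meaning any function, so 0x0007FFFF covers 00:07.x), Pin (0 = INTA ... 3 = INTD), Source (a link-device reference, or 0 for a hard-wired interrupt) and SourceIndex (an integer). In the entries listed above, Source and SourceIndex are swapped. The types and functions here are hypothetical; this is not Shaohua's attachment 1508, and `allow_swizzle` merely stands in for the policy choice between repairing such entries and rejecting the table so the caller could fall back the way pci=noacpi does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tagged value for one element of a _PRT package. */
enum elem_kind { ELEM_INTEGER, ELEM_REFERENCE };

struct raw_elem {
    enum elem_kind kind;
    uint64_t integer;   /* meaningful when kind == ELEM_INTEGER */
    const char *name;   /* meaningful when kind == ELEM_REFERENCE */
};

/* Package (0x04) { Address, Pin, Source, SourceIndex } */
struct raw_prt {
    struct raw_elem elem[4];
};

enum prt_status { PRT_OK, PRT_SWAPPED, PRT_INVALID };

/* PCI device number from the Address element (high word). */
static unsigned prt_device(const struct raw_prt *p)
{
    return (unsigned)(p->elem[0].integer >> 16);
}

/* Spec order: Source is a reference or the integer 0, SourceIndex an integer. */
static enum prt_status prt_check(const struct raw_prt *p)
{
    const struct raw_elem *src = &p->elem[2];
    const struct raw_elem *idx = &p->elem[3];

    if (p->elem[0].kind != ELEM_INTEGER || p->elem[1].kind != ELEM_INTEGER)
        return PRT_INVALID;
    if (idx->kind == ELEM_INTEGER &&
        (src->kind == ELEM_REFERENCE || src->integer == 0))
        return PRT_OK;
    /* Pattern in this DSDT: Source is 0, SourceIndex holds the link device. */
    if (src->kind == ELEM_INTEGER && src->integer == 0 &&
        idx->kind == ELEM_REFERENCE)
        return PRT_SWAPPED;
    return PRT_INVALID;
}

/* Illustrative repair: put Source and SourceIndex back in spec order. */
static void prt_swizzle(struct raw_prt *p)
{
    struct raw_elem tmp = p->elem[2];
    p->elem[2] = p->elem[3];
    p->elem[3] = tmp;
    assert(prt_check(p) == PRT_OK);
}

/* Returns 0 if the table is usable (after optional repair), or -1 if the
 * caller should ignore it and fall back to non-ACPI IRQ routing. */
static int prt_fixup_table(struct raw_prt *tbl, int n, int allow_swizzle)
{
    for (int i = 0; i < n; i++) {
        enum prt_status s = prt_check(&tbl[i]);
        if (s == PRT_INVALID || (s == PRT_SWAPPED && !allow_swizzle))
            return -1;  /* nothing has been modified yet */
    }
    for (int i = 0; i < n; i++)
        if (prt_check(&tbl[i]) == PRT_SWAPPED)
            prt_swizzle(&tbl[i]);
    return 0;
}
```

The whole table is checked before anything is modified, so a rejected table is left untouched. Calling it with allow_swizzle = 0 corresponds to following the spec strictly, which, per the analysis above, would likely cost this board its device 7 routing, including UHCI USB.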
So I'm changing this back to RESOLVED until we have the correct fix shipping in the release.
thanks,
-Len

Uhm, I understand why you can't include the patch in the release, since my DSDT is not following the standards. But is it doing any harm? The fix you proposed, detecting the broken DSDT and proceeding with pci=noacpi, wouldn't work, since USB does not work without noapic on this box. ACPI works with this patch, but I understand that it is your decision. Anyway, thanks for all the help :)

pci=noacpi doesn't mean noapic; did your box work with pci=noacpi?

No, that was what I was trying to say :) Sorry if I expressed myself poorly. I'll try again:
noapic: everything works fine
acpi (with fixed DSDT or your excellent patch): everything works fine
without acpi, or with pci=noacpi: no onboard USB
That's why I really wanted to get ACPI going (which I now have, thanks to you). Of course I would like to have your patch in the kernel (especially since Windows obviously understands my DSDT), but I'm quite happy now. Anyway, pci=noacpi is not an option for me. Btw. Len, here http://bugme.osdl.org/show_bug.cgi?id=1563#c15 you say "If Windows doesn't need a fixed DSDT to run on this box, Linux shouldn't either...". That's not true anymore? :) (yeah, I'm a pain in the ass, sorry).

Stian,
The original issue -- the oops -- is gone, so I'm reversing myself and re-closing this bug. The subsequent bug #1563 has two fixes -- one to fix the IRQ27 issue, and another to fix the ACPI interrupt. Both are specific to ACPI mode. That leaves the broken _PRT entries in the DSDT... Yes, I see on a careful re-read that pci=noacpi is not going to work here -- it screws up both USB _and_ the ACPI interrupt. No, I don't advocate people running with patched DSDTs -- I believe that BIOS bugs should be fixed by the platform vendor that shipped the BIOS. Shaohua's _PRT swizzle patch in attachment 1508 [details] is not an easy call.
If we find lots of systems suffer from this issue we may have no choice but to break compatibility with the ACPI spec and be bug-compatible with what Windows apparently does. Doing so may cause Linux to diverge from Intel's ACPI-compliant ACPICA implementation, which would have its own issues. So I'm going to defer that decision until we have more information. Thanks for all your help! -Len

Ok, I of course respect that decision, and thank you for a long and very good explanation :) And I'm quite impressed with the progress on these bugs. With -test8 I just got a kernel panic at boot time; now everything works (or at least we know why it won't work :) So you have fixed every issue, which I really am thankful for. Great work! :) Just one last question: is it ok if I cc you on my mail to Rioworks, begging them to release a fixed BIOS, and referring to you?

Len: Just one thing. I just tried kernel 2.4.25-pre7 just for fun, and that panics on my box, even with the fixed BIOS. Kernel 2.6.x is working perfectly now, but 2.4.25-pre7 has a similar panic to the one I reported here. I do not care about this bug; I haven't used 2.4.x since before 2.5.33 on this box, and I will never use it again, I just wanted to let you know :) Anyway, thanks for perfectly working ACPI in 2.6 :) |