Distribution: Debian Sid Hardware Environment: Dual P3, d6vaa elitegroup motherboard, 256 mb ram, ide discs Problem Description: I have this oops when I enable acpi in 2.4.23-rc3. It started happening at least in 2.4.22. Sorry for not reporting it before, but usually I don't use acpi (my bios seems to be screwed so usually I just turn it off) and when it works alsa stops working (the sound card uses another interrupt and esd doesn't seem to work very well with whatever changes there're), so I sent this hoping it can solve problems to other people. PCI: Using configuration type 1. tbxface-0117[03] acpi_load_tables :ACPI tables sucesfully acquired Parsing all control methods (lots of points) Table[DSDT] (id F004) - 383 objects with 31 devices 83 methods 29 regions ACPI Namespace sucesfully loaded at root c0369cfc IOAPIC[0]: Set PCI routing entry (2-11 -> 0x81 -> IRQ11 Mode: 0 Active: 0) evxfvent-0093 [04] acpi_enable: Transition to ACPI mode sucesfull evgpeblk-0748 [06] ev_create_gpe_back: GPE-00 to 15 [_GPE] 2 regs at 0000000000004020 on int 11 ksymoops 2.4.9 on i686 2.6.0-test9-mm5. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.6.0-test9-mm5/ (default) -m /boot/System.map-2.4.23-rc3 (specified) Error (regular_file): read_ksyms stat /proc/ksyms failed No modules in ksyms, skipping objects No ksyms, skipping lsmod Unable to handle kernel NULL pointer dereference at virtual address 00000018 00000018 *pde = 00000000 EIP: 0010:[<00000018>] Not Tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010246 eax: 00000000 ebx: c01053d0 ecx: 00000010 edx: dffe8000 esi: dffe8000 edi: dffe8000 ebp: ffffe000 esp: dffe9fb0 Stack: c0105492 00000002 00000000 00000000 00000000 c011bf45 00000c93 00000282 ffffffff 34000000 00000001 00000000 00000001 c035c12a 00000246 00000029 000000ff c011c1bf 00000000 00000400 Call trace: [<c0105492>] [<c011bf45>] [<c011c1bf>] Code: Bad EIP value >>EIP; 00000018 Before first symbol <===== >>ebx; c01053d0 <default_idle+0/50> Trace; c0105492 <cpu_idle+52/70> Trace; c011bf45 <call_console_drivers+65/120> Trace; c011c1bf <printk+14f/180> <0>Kernel Panic: Attempted to kill idle task! 1 error issued. Results may not be reliable.
Created attachment 1545 [details] acpidmp output
Created attachment 1546 [details] dmi decode output
Created attachment 1547 [details] dmesg under 2.4 with acpi disabled (CONFIG_ACPI_BOOT enabled, though)
Created attachment 1548 [details] /proc/interrupt output under 2.4 with acpi disabled
Created attachment 1549 [details] dmesg output under 2.6 with acpi disabled
Created attachment 1550 [details] /proc/interrupts output under 2.6 with acpi disabled
Created attachment 1551 [details] dmesg output under 2.6 with acpi enabled
Created attachment 1552 [details] /proc/interrupts under 2.6 with acpi enabled
I forgot to say that the bug still happens if I disable the acpi debug stuff.
same problem as Bug 1164, please refer to it
> 2.6.0 -- acpi enabled: > rscreate-0346 [10] rs_create_pci_routing_: (PRT[1C].source_index) Need Integer, found Reference > pci_irq-0217 [07] acpi_pci_irq_add_prt : Error evaluating _PRT [AE_BAD_DATA] Yes, this BIOS has malformed _PRT entries: Package (0x04) { 0x0007FFFF, 0x00, 0x00, \_SB.PCI0.LNKA }, Do 2.4.23 and 2.6.0 boot and run correctly if ACPI is booted with pci=noacpi? Current thinking is that enabling that option automatically is the "fix" when we see a BIOS error like this. But that doesn't explain why 2.6 boots and 2.4 takes an oops. The oops in bug #1164 was causd by the DEBUG printk, and is different here. Can you try a clean 2.4 build? cp .config /tmp make distclean clean cp /tmp/.config . make oldconfig make depend make bzImage modules thanks, -Len
(Sorry for the delay, the monitor broke) I built a fresh (distclean, etc) 2.4.23, CONFIG_ACPI_DEBUG=y, pci=noacpi. Same hang ;( Yes, 2.6 with acpi enabled and pci=noacpi boots and works - attaching dmesg and /proc/interrupts. Hell, even esd works well with pci=noacpi :) acpid, however, fails (acpid-1.0.2, debian sid) estel:/home/diego/wget/pmtools-20010730# acpid acpid: can't open /proc/acpi/event: Device or resource busy strace: old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40161000 set_thread_area({entry_number:-1 -> 6, base_addr:0x401612a0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 munmap(0x40018000, 64916) = 0 geteuid32() = 0 open("/proc/acpi/event", O_RDONLY) = -1 EBUSY (Device or resource busy) write(2, "acpid: can\'t open /proc/acpi/eve"..., 60acpid: can't open /proc/acpi/event: Device or resource busy ) = 60 exit_group(1) = ? Don't know if it's related, but it happens in 2.4 too.
Created attachment 1579 [details] dmesg-2.6-acpi enabled pci=noacpi
Created attachment 1580 [details] /proc/interrupts 2.6-acpi enabled pci=noacpi
Let me know if I've got this right... The primary issue at hand is that 2.4.23 takes an oops at boot and hangs. This still occurs with a clean build, with !CONFIG_ACPI_DEBUG, and with pci=noacpi. It does not occur for 2.4.23 with acpi=off, or acpi=ht (just CONFIG_ACPI_BOOT) It does not occur on 2.6.0. Does the oops look the same with the latest 2.4.23 build? Can you capture the failing console with a serial line? One unusual thing in this system: ACPI: INT_SRC_OVR (bus[0] irq[0xb] global_irq[0xb] polarity[0x0] trigger[0x0]) IOAPIC[0]: Set PCI routing entry (2-11 -> 0x81 -> IRQ11 Mode: 0 Active: 0) I'd be surprised if ACPI events (such as power button) register interrupts when you do get the system up on 2.6 in ACPI mode. Please try the patch to bug #1351 and see if ACPI events start coming through.
Created attachment 1586 [details] enable verbose ACPI debugging output Please apply this patch to the failing 2.4.23 kernel, built with CONFIG_ACPI_DEBUG, boot with acpi enabled and report the last thing that comes out before the hang. thanks, -Len
I think at least you should try if fixed DSDT can work in your box. (fix _PRT table).
OK, I tested Len's patch "Created an attachment (id=1586) enable verbose ACPI debugging output" Sorry, I don't have serial console ;( I've caught two different messages (the first one isn't wrong, it seems the oops got mixed with the messages). Surprisingly, after booting several times with your patch (to check how different were the messages), I rebooted once more, and then it _booted_, no hang (I attach dmesg anc /proc/interrupts, of course the printk buffer was too small to store the whole boot).
Created attachment 1596 [details] verbose dmesg output 1
Created attachment 1597 [details] verbose dmesg output 2
Created attachment 1598 [details] verbose dmesg output - NO hang
Created attachment 1599 [details] /proc/interrupts with the "verbose patch" - No hang
Re: the 2.4 oops -- how did you capture this w/o a serial console? evgpeblk-0296 [08] ev_get_gpe_xrupt_block:---- Entry utalloc-0409 [09] utcallocate :---- Entry 00000040 utalloc-0432 [09] utcallocate :---- Exit- dffea220 utalloc-0677 <1> unable to handle kernel NULL pointer dereference ----Entry---Enfeatr20dffea220 CPU:1 EIP:0010[<00000018>] Not tainted EFLAGS:00010246 osl-0930 eax:00000000 ebx:c01053d0 ecx:00000010 edx:dffe4000 osl-0732 [09] os_acquire_lock :esi:dffe4000 edi:dffe4000 ebp:ffffe000 esp:dffe5ff0 [09] os_release_lock: Process swapper (pid 0, stackpage=dffe5000)[09] os_release_lock :00000002 evgpeblk-0349 00000000 [08] os_acquire_lock :00000282 osl-0742 ffffffff----Exit- 34000000 00000001 ----Exit- drivers/acpi/events/evgpeblk.c is the same in 2.4 and 2.6 -- that would be too easy;-) nsaccess-0288 [07] ns_lookup :---- Entry nsaccess-0310 [07] ns_lookup :Null scope prefix using root node (c035983c) nsaccess-0375 [07] ns_lookup :Path absolute from root[c035983c] nsaccess-0486 [07] ns_lookup :Simple pathname (1 segment, Flags=2) nsdump-0081 [07] ns_print_name :[_GPE] The 2nd one simply hanged here? So every time you repeat the boot it stops in a different place? Can you try booting 2.4.23 with "nosmp", and if it works try it a few times just to make sure? I think we could study many issues on this system... Applying the patch from bug #1351 to fix the ACPI interrupt is moot because linux chokes on the _PRT and ignores it -- effectively coming up in pci=noacpi mode. If you're game to trying to get the system all the way up in ACPI mode, you'll need also Shaohua's patch from bug 1164 to hopefully have the kernel not discard the _PRT. thanks, -Len
Yes, the second one halted at that point. I can get different messages, too; but most of them are like the first one (a couple of lines more/less), sometimes without printk, sometimes with it. Sometimes they're different to #1 and #2. Yes, nosmp works! 4/4 times I tested it just booted. I'll try nosmp + Shaoua's patch now.
This patch: http://bugme.osdl.org/attachment.cgi?id=1508&action=view + nosmp boots and works with acpi (dmesg and interrupts attached) but still I can't cat /proc/acpi/event. Should I try http://bugme.osdl.org/attachment.cgi?id=1570&action=view too?
Created attachment 1610 [details] dmesg under 2.4.23 with acpi enabled and Shaohua's patch (http://bugme.osdl.org/attachment.cgi?id=1508)
Created attachment 1611 [details] /proc/interrupts in 2.4.23, acpi enabled and shaohua's patch (http://bugme.osdl.org/attachment.cgi?id=1508)
Created attachment 1614 [details] proposed patch to fix Invalid reference to IRQ 0 can you try the patch, it should fix the below error: IOAPIC[0]: Invalid reference to IRQ 0/n<7>00:00:07[A] -> 2-10 -> IRQ 0
Lets keep this bug focused on the primary issue -- the oops with the 2.4 ACPI+SMP kernel. Turns out that "nosmp" also disables IOAPIC initialization -- something that ACPI didn't expect. (and yet another bug;-) Don't bother with that "IRQ 0" patch, it will be insufficient to properly overcome the lack of IOAPIC initialization. I think it would be a worthwhile exercise to file an additional bug to see if we can get the uni-processor kernel running on this box in ACPI+APIC mode. To do so you'll want to build with CONFIG_X86_UP_APIC=y CONFIG_X86_UP_IOAPIC=y CONFIG_SMP=n CONFIG_ACPI=y You'll need (at least;-) the bug 1164 patch for the garbled _PRT entry and you'll need the bug 1351 patch for the ACPI interrupt in particular. That said, now to noodle on why 2 processors crash here when 2.6 works...
Created attachment 1615 [details] patch to enable verbose3 messages for thread tracing Please remove the previous "verbose ACPI debugging output" patch and apply this one. Boot it a few times with the SMP kernel and capture the dmesg -- it may point out a multi-threaded issue. thanks, -Len
Ok For the comment #29: I applied http://bugzilla.kernel.org/attachment.cgi?id=1570 and http://bugzilla.kernel.org/attachment.cgi?id=1508 (i think they're those). It boots, but still I can't cat /proc/acpi/event. dmesg and /proc/interrupts attached, as usual
Created attachment 1656 [details] dmesg in 2.4, acpi enabled, attachements 108 + 1570
Created attachment 1657 [details] /proc/interrupts, in 2.4 with acpi enabled, attachements 1508 + 1570 (previous send needs s/108/1508/
Now for 30: I applied http://bugzilla.kernel.org/attachment.cgi?id=1615 (without any other patches, compiled from 0 as usual). I booted 3 times. In the third attempt it booted (I'll attach the dmesg and /proc/interrupts if needed, but it shouldn't be different than other already posted) I've took a photo of the log. I can understand it pretty; I hope you can see it well too. If not, I'll write down the whole (I don't mind doing it if needed). I've taken 3 photos of each oops anyway. pictures are here (i didn't want to put so much rubbish here) 1 oops: http://www.terra.es/personal/diegocg/oops1-1.jpg http://www.terra.es/personal/diegocg/oops1-2.jpg http://www.terra.es/personal/diegocg/oops1-3.jpg 2 oops: http://www.terra.es/personal/diegocg/oops2-1.jpg http://www.terra.es/personal/diegocg/oops2-2.jpg http://www.terra.es/personal/diegocg/oops2-3.jpg
Well, I just tried 2.4.26-pre1 and it just works beautifully! Whatever you've done, THANKS, I owe you a beer :) (I know it has been a long time since my last report, but 2.6 works so well... ) I attach /proc/interrupts and dmesg. Sound still doesn't works properly (both with alsa and OSS drivers) but I assume that's a different problem. (just for curiosity, stracing esd i get: select(6, NULL, [5], NULL, {0, 100}) = 1 (out [5], left {0, 10000}) write(5, "\0\0\0\0\0\0\304b\0\0Oj\0\0\250\17\0\0\214\246\0\0\16\220"..., 4096) = 4096 select(6, NULL, [5], NULL, {0, 100}) = 1 (out [5], left {0, 10000}) write(5, "\0\0)M\0\0002\337\0\0\207\217\0\0\277\247\0\0{\21\0\0\22"..., 4096) = 4096 select(6, NULL, [5], NULL, {0, 100}) = 0 (Timeout) select(6, NULL, [5], NULL, {0, 100}) = 0 (Timeout) select(6, NULL, [5], NULL, {0, 100}) = 0 (Timeout) select(6, NULL, [5], NULL, {0, 100}) = 0 (Timeout) select(6, NULL, [5], NULL, {0, 100}) = 0 (Timeout) select(6, NULL, [5], NULL, {0, 100}) = 0 (Timeout) (period)
Created attachment 2283 [details] working 2.4.26-pre1 dmesg
Created attachment 2284 [details] /proc/interrupts fomr the 2.4.26-pre1 working kernel
I guess I'll close this one....