If an OOPS occurs in the idle process, the kernel is supposed to panic. However, even though dmesg says:

<0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing

the system is still usable. To verify this behaviour, I hardcoded a division by zero into acpi_processor_idle:

ACPI: Processor [CPU0] (supports C1 C2 C3, 8 throttling states)
divide error: 0000 [#1]
PREEMPT
CPU: 0
EIP: 0060:[<d05bf2df>] Not tainted
EFLAGS: 00010282 (2.6.5-rc2)
EIP is at acpi_processor_idle+0x24/0x208 [processor]
eax: 0000000a ebx: c0482000 ecx: ffffffff edx: 00000000
esi: 00099100 edi: ced1e0c4 ebp: c0483fd4 esp: c0483fbc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0482000 task=c03f9a20)
Stack: d05bf2bb 00000000 0000007b c0482000 00099100 c04a7b60 c0483fe0 c01050e4
       c0482000 c0483ff8 c04847e6 c03a4b18 c0484540 c04a7b80 00000816 0051d007
       c010019f
Call Trace:
 [<d05bf2bb>] acpi_processor_idle+0x0/0x208 [processor]
 [<c01050e4>] cpu_idle+0x34/0x40
 [<c04847e6>] start_kernel+0x146/0x170
 [<c0484540>] unknown_bootoption+0x0/0x120
Code: f7 7d ec 50 68 82 21 5c d0 e8 d3 d0 b5 ef fa 6b 47 14 48 5b
 <0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing

... but I am still writing this bug report without having rebooted.
acpi_processor_idle() crash: moving bug to the ACPI sub-system
Len, please re-read the description: I only used the ACPI idle handler as an example, and deliberately added a division by zero to _prove_ a different(!) bug. Whenever an OOPS occurs in the idle handler, panic() should be called and lock the system. panic() is indeed called, but the system is still usable afterwards. This is a process-management bug, not an ACPI bug.
How does this behave with recent kernels? Isn't there a flag that controls whether the kernel panics on an oops? Thanks.
With kernel 2.6.20, this bug does not reproduce. I changed arch/i386/kernel/process.c as follows:

diff -u arch/i386/kernel/process.c.orig arch/i386/kernel/process.c
--- arch/i386/kernel/process.c.orig	2007-10-12 00:22:38.000000000 +0400
+++ arch/i386/kernel/process.c	2007-10-12 00:19:55.000000000 +0400
@@ -94,12 +94,25 @@
 
 EXPORT_SYMBOL(enable_hlt);
 
+int codedot_oops = 0;
+EXPORT_SYMBOL(codedot_oops);
+
 /*
  * We use this if we don't have any better
  * idle routine..
  */
 void default_idle(void)
 {
+	volatile struct s {
+		int s;
+	} *s = NULL;
+
+	if (codedot_oops) {
+		codedot_oops = 0;
+		printk(KERN_EMERG "I am going to crash\n");
+		++s->s;
+	}
+
 	if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
 		current_thread_info()->status &= ~TS_POLLING;
 		/*

After the kernel had booted, I loaded my own module, which sets the codedot_oops flag. The kernel immediately reported an oops and a panic:

debian:~/oops# insmod oops.ko
[ 77.012000] Oops module is being installed
debian:~/oops# [ 77.054000] I am going to crash
[ 77.056000] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 77.056000] printing eip:
[ 77.057000] c0101915
[ 77.057000] *pde = 00000000
[ 77.057000] Oops: 0000 [#1]
[ 77.057000] PREEMPT
[ 77.057000] Modules linked in: oops
[ 77.057000] CPU: 0
[ 77.057000] EIP: 0060:[<c0101915>] Not tainted VLI
[ 77.057000] EFLAGS: 00000282 (2.6.20 #8)
[ 77.057000] EIP is at default_idle+0x35/0x70
[ 77.057000] eax: 00000026 ebx: 00000800 ecx: c02eb420 edx: c02eb420
[ 77.057000] esi: 00099100 edi: c02ff000 ebp: 0036a007 esp: c0305fe0
[ 77.057000] ds: 007b es: 007b ss: 0068
[ 77.057000] Process swapper (pid: 0, ti=c0304000 task=c02e7380 task.ti=c0304000)
[ 77.057000] Stack: c02b90d8 c010119a c0306720 c02b6fe6 c0306250 00000000 c032cc40 00000000
[ 77.057000] Call Trace:
[ 77.057000] [<c010119a>] cpu_idle+0x3a/0x70
[ 77.057000] [<c0306720>] start_kernel+0x280/0x300
[ 77.057000] [<c0306250>] unknown_bootoption+0x0/0x250
[ 77.057000] =======================
[ 77.057000] Code: 75 16 a1 2c ca 32 c0 85 c0 75 09 80 3d 05 1f 30 c0 00 75 24 f3 90 58 c3 31 c0 a3 24 ca 32 c0 c7 04 24 d8 90 2b c0 e8 eb 37 01 00 <a1> 00 00 00 00 40 a3 00 00 00 00 eb ca 89 e0 25 00 e0 ff ff 83
[ 77.057000] EIP: [<c0101915>] default_idle+0x35/0x70 SS:ESP 0068:c0305fe0
[ 77.057000] <0>Kernel panic - not syncing: Attempted to kill the idle task!
[ 77.059000]

The system froze after that. Can this bug be closed now?
Thanks, Anton. This is clear proof that we can close this bug now.