Bug 2349 - panic caused by idle process is ignored
Summary: panic caused by idle process is ignored
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Process Management
Classification: Unclassified
Component: Other
Hardware: i386 Linux
Importance: P2 normal
Assignee: process_other
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-03-22 01:05 UTC by Dominik Brodowski
Modified: 2007-10-11 14:24 UTC

See Also:
Kernel Version: 2.6.5-rc2
Subsystem:
Regression: ---
Bisected commit-id:



Description Dominik Brodowski 2004-03-22 01:05:52 UTC
If an OOPS occurs in the idle process, the kernel intends to panic -- but even
though dmesg says:

 <0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing

the system is still usable. To verify this behaviour, I hardcoded a division by
zero into acpi_processor_idle:

ACPI: Processor [CPU0] (supports C1 C2 C3, 8 throttling states)
divide error: 0000 [#1]
PREEMPT
CPU:    0
EIP:    0060:[<d05bf2df>]    Not tainted
EFLAGS: 00010282   (2.6.5-rc2)
EIP is at acpi_processor_idle+0x24/0x208 [processor]
eax: 0000000a   ebx: c0482000   ecx: ffffffff   edx: 00000000
esi: 00099100   edi: ced1e0c4   ebp: c0483fd4   esp: c0483fbc
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0482000 task=c03f9a20)
Stack: d05bf2bb 00000000 0000007b c0482000 00099100 c04a7b60 c0483fe0 c01050e4
       c0482000 c0483ff8 c04847e6 c03a4b18 c0484540 c04a7b80 00000816 0051d007
       c010019f
Call Trace:
 [<d05bf2bb>] acpi_processor_idle+0x0/0x208 [processor]
 [<c01050e4>] cpu_idle+0x34/0x40
 [<c04847e6>] start_kernel+0x146/0x170
 [<c0484540>] unknown_bootoption+0x0/0x120
 
Code: f7 7d ec 50 68 82 21 5c d0 e8 d3 d0 b5 ef fa 6b 47 14 48 5b
 <0>Kernel panic: Attempted to kill the idle task!
In idle task - not syncing

... but I'm still writing this bug report without having rebooted.
Comment 1 Len Brown 2004-03-22 10:14:21 UTC
acpi_processor_idle() crash -- moving bug to ACPI subsystem
Comment 2 Dominik Brodowski 2004-03-22 10:20:12 UTC
Len, please re-read the description: I only used the ACPI idle handler as an
example, and deliberately added a division by zero to _prove_ that there's a
different(!) BUG: whenever an OOPS occurs in the idle handler, a panic() should
occur and lock the system. The panic() call is made, but the system remains
usable afterwards. It's a process management bug, not an ACPI bug.
Comment 3 Natalie Protasevich 2007-09-04 19:03:35 UTC
How is it working now, with recent kernels? Isn't there a flag that tells whether to panic on oops?
Thanks.
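
The flag asked about here is presumably the panic_on_oops sysctl, which kernels that support it expose as /proc/sys/kernel/panic_on_oops; that path is an assumption, since this report never names the flag. As a minimal userspace sketch, the current setting could be read as follows. The helper names parse_sysctl_flag and read_sysctl_flag are hypothetical and exist only for this illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the flag meant in the comment above is this sysctl.
 * The path is not taken from the bug report itself. */
#define PANIC_ON_OOPS_PATH "/proc/sys/kernel/panic_on_oops"

/* Hypothetical helper: parse sysctl text such as "0\n" or "1\n".
 * Returns the non-negative value, or -1 if the text is malformed. */
int parse_sysctl_flag(const char *text)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return -1;

    value = strtol(text, &end, 10);
    if (end == text)
        return -1;              /* no digits were consumed */

    /* Allow trailing whitespace only (procfs appends a newline). */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;
    if (*end != '\0' || value < 0 || value > INT_MAX)
        return -1;

    return (int)value;
}

/* Hypothetical helper: read a single integer flag from a procfs file.
 * Returns the value, or -1 if the file is missing or unreadable. */
int read_sysctl_flag(const char *path)
{
    char buf[32];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_sysctl_flag(buf);
}
```

Calling read_sysctl_flag(PANIC_ON_OOPS_PATH) returns 0 when an oops does not force a panic, a non-zero value when it does, and -1 when the file cannot be read. Note that the oops discussed in this report already reaches panic() through the "Attempted to kill the idle task!" path, so this flag would not by itself explain the original behaviour.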
Comment 4 Anton Salikhmetov 2007-10-11 13:34:50 UTC
With kernel 2.6.20, I could not reproduce this bug. I changed arch/i386/kernel/process.c as follows:

diff -u arch/i386/kernel/process.c.orig arch/i386/kernel/process.c
--- arch/i386/kernel/process.c.orig     2007-10-12 00:22:38.000000000 +0400
+++ arch/i386/kernel/process.c  2007-10-12 00:19:55.000000000 +0400
@@ -94,12 +94,25 @@

 EXPORT_SYMBOL(enable_hlt);

+int codedot_oops = 0;
+EXPORT_SYMBOL(codedot_oops);
+
 /*
  * We use this if we don't have any better
  * idle routine..
  */
 void default_idle(void)
 {
+       volatile struct s {
+               int s;
+       } *s = NULL;
+
+       if (codedot_oops) {
+               codedot_oops = 0;
+               printk(KERN_EMERG "I am going to crash\n");
+               ++s->s;
+       }
+
        if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
                current_thread_info()->status &= ~TS_POLLING;
                /*

After the kernel had booted, I loaded my own module, which sets the codedot_oops flag. The kernel immediately reported an oops and a panic:

debian:~/oops# insmod oops.ko
[   77.012000] Oops module is being installed
debian:~/oops# [   77.054000] I am going to crash
[   77.056000] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
[   77.056000]  printing eip:
[   77.057000] c0101915
[   77.057000] *pde = 00000000
[   77.057000] Oops: 0000 [#1]
[   77.057000] PREEMPT
[   77.057000] Modules linked in: oops
[   77.057000] CPU:    0
[   77.057000] EIP:    0060:[<c0101915>]    Not tainted VLI
[   77.057000] EFLAGS: 00000282   (2.6.20 #8)
[   77.057000] EIP is at default_idle+0x35/0x70
[   77.057000] eax: 00000026   ebx: 00000800   ecx: c02eb420   edx: c02eb420
[   77.057000] esi: 00099100   edi: c02ff000   ebp: 0036a007   esp: c0305fe0
[   77.057000] ds: 007b   es: 007b   ss: 0068
[   77.057000] Process swapper (pid: 0, ti=c0304000 task=c02e7380 task.ti=c0304000)
[   77.057000] Stack: c02b90d8 c010119a c0306720 c02b6fe6 c0306250 00000000 c032cc40 00000000
[   77.057000] Call Trace:
[   77.057000]  [<c010119a>] cpu_idle+0x3a/0x70
[   77.057000]  [<c0306720>] start_kernel+0x280/0x300
[   77.057000]  [<c0306250>] unknown_bootoption+0x0/0x250
[   77.057000]  =======================
[   77.057000] Code: 75 16 a1 2c ca 32 c0 85 c0 75 09 80 3d 05 1f 30 c0 00 75 24 f3 90 58 c3 31 c0 a3 24 ca 32 c0 c7 04 24 d8 90 2b c0 e8 eb 37 01 00 <a1> 00 00 00 00 40 a3 00 00 00 00 eb ca 89 e0 25 00 e0 ff ff 83
[   77.057000] EIP: [<c0101915>] default_idle+0x35/0x70 SS:ESP 0068:c0305fe0
[   77.057000]  <0>Kernel panic - not syncing: Attempted to kill the idle task!
[   77.059000]

The system froze after that.

Can this bug be closed already?
Comment 5 Natalie Protasevich 2007-10-11 14:24:28 UTC
Thanks, Anton. This is definite proof that we can close this bug now.
