Bug 15375 - BUG: scheduling while atomic - acpi_idle_enter_bm - Dell Laptops
Summary: BUG: scheduling while atomic - acpi_idle_enter_bm - Dell Laptops
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Processor
Hardware: All Linux
Importance: P1 normal
Assignee: ykzhao
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-23 04:31 UTC by Len Brown
Modified: 2010-09-29 02:17 UTC
CC List: 3 users

See Also:
Kernel Version: 2.6.27-2.6.32
Subsystem:
Regression: No
Bisected commit-id:


Attachments
attach one debug patch that prints the stack backtrace (833 bytes, patch)
2010-02-23 09:29 UTC, ykzhao

Description Len Brown 2010-02-23 04:31:49 UTC
According to kerneloops.org, the most frequently reported ACPI-related
oops occurs only on Dell laptops:

http://www.kerneloops.org/searchweek.php?search=acpi_idle_enter_bm
Comment 1 ykzhao 2010-02-23 09:21:24 UTC
     It seems that this kernel oops is not caused by the CPU idle driver. Instead, it occurs when schedule() is called, explicitly or implicitly, while executing a hardware interrupt handler or a softirq (for example, schedule() may be called from msleep() or while acquiring a mutex).
    In the kernel, every task has its own preempt_count, which records, among other things, whether the task is running in hardware IRQ or softirq context.
    > BUG: scheduling while atomic: swapper/0/0x00000100 (0x00000100 indicates that this happens while executing a softirq.)
    > BUG: scheduling while atomic: swapper/0/0x10010000 (0x10010000 indicates that this happens while executing a hardware interrupt.)
    
     The backtrace points into the idle driver. The following steps explain why the oops is reported there, even though the real cause is schedule() being called, explicitly or implicitly, from hardware or software IRQ context.
     1. Before entering a deep C-state, local interrupts are disabled.
     2. The CPU is woken from the C-state when a hardware interrupt is triggered.
     3. The hardware ISR is serviced once local interrupts are re-enabled (this happens in acpi_idle_enter_bm()/acpi_idle_enter_simple()).
     4. The hardware ISR may raise a softirq to defer lengthy work. After the ISR finishes, the kernel checks whether a softirq is pending and, if so, runs it (in do_softirq()).
     5. The hardware ISR or the softirq may try to acquire a mutex. If the mutex cannot be obtained, schedule() is called implicitly, and schedule() in turn calls schedule_debug() to check whether the task switch is happening in hardware IRQ or softirq context.
     6. If preempt_count shows that it is, schedule_debug() prints the "scheduling while atomic" backtrace that ends up on www.kerneloops.org. Because this happens in interrupt context, the backtrace only shows the stack as it was before the interrupt arrived, i.e. the idle driver.


I will attach a debug patch that prints the stack backtrace of the code that calls schedule() explicitly or implicitly.

thanks.
Comment 2 ykzhao 2010-02-23 09:29:49 UTC
Created attachment 25171
attach one debug patch that prints the stack backtrace

Could someone try the debug patch? It prints the stack backtrace when schedule() is called, explicitly or implicitly, in hardware or software IRQ context.

When the backtrace appears again, please attach the dmesg output.

thanks.
Comment 3 Len Brown 2010-03-30 01:39:16 UTC
Some of these sightings may be fixed by the patch to the dell-laptop driver.
That patch went upstream in 2.6.34-rc1.

https://bugzilla.redhat.com/show_bug.cgi?id=572827

It will also need to be back-ported to distro releases which
shipped the dell-laptop input key code before upstream did.

Re-open this bug if the oops is seen with an upstream-based kernel
newer than 2.6.34-rc1.
