Bug 5825 - soft lockup detected on CPU#0
Summary: soft lockup detected on CPU#0
Status: CLOSED CODE_FIX
Alias: None
Product: ACPI
Classification: Unclassified
Component: Power-Sleep-Wake (show other bugs)
Hardware: i386 Linux
: P2 normal
Assignee: Shaohua
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-01-04 09:21 UTC by Oleksij Rempel (fishor)
Modified: 2006-02-07 17:29 UTC (History)
1 user (show)

See Also:
Kernel Version: 2.6.15-mm3 >
Subsystem:
Regression: ---
Bisected commit-id:


Attachments
lspci (9.37 KB, text/plain)
2006-01-04 09:23 UTC, Oleksij Rempel (fishor)
Details
cpuinfo (554 bytes, text/plain)
2006-01-04 09:29 UTC, Oleksij Rempel (fishor)
Details
debug patch (654 bytes, patch)
2006-01-04 17:21 UTC, Shaohua
Details | Diff
dmesg (15.29 KB, text/plain)
2006-01-06 04:09 UTC, Oleksij Rempel (fishor)
Details
dmesg 2.6.15-mm3 (21.43 KB, text/plain)
2006-01-11 14:01 UTC, Oleksij Rempel (fishor)
Details
disassm (5.21 KB, text/plain)
2006-01-17 05:42 UTC, Oleksij Rempel (fishor)
Details
dmesg with debug patch (15.19 KB, text/plain)
2006-01-17 23:26 UTC, Oleksij Rempel (fishor)
Details
dmesg with agp=off (15.18 KB, text/plain)
2006-01-17 23:39 UTC, Oleksij Rempel (fishor)
Details
dmesg without agp and snd (15.07 KB, text/plain)
2006-01-17 23:40 UTC, Oleksij Rempel (fishor)
Details
patch for time resume (2.04 KB, patch)
2006-01-18 18:29 UTC, Shaohua
Details | Diff
dmesg with fix patch (15.30 KB, text/plain)
2006-01-19 00:21 UTC, Oleksij Rempel (fishor)
Details

Description Oleksij Rempel (fishor) 2006-01-04 09:21:49 UTC
Most recent kernel where this bug did not occur: 2.6.15 > this  is not regression.
Distribution:ubuntu breezy 64-bit
Hardware Environment: Board ASUS P5LD2-VM (i945g + Celeron D)



With S1 it's generate falowing output with oops

echo 1 ? > /proc/acpi/sleep 
Stopping tasks: ==========|
[ACPI Debug]  String: [0x04] "SIOS"
ACPI: PCI interrupt for device 0000:04:00.0 disabled
ACPI: PCI interrupt for device 0000:00:1d.7 disabled
acpi_bus-0201 [02] bus_set_power         : Device is not power manageable
ACPI: PCI interrupt for device 0000:00:1d.3 disabled
ACPI: PCI interrupt for device 0000:00:1d.2 disabled
ACPI: PCI interrupt for device 0000:00:1d.1 disabled
ACPI: PCI interrupt for device 0000:00:1d.0 disabled
ACPI: PCI interrupt for device 0000:00:1b.0 disabled
PM: Entering standby sleep
 hwsleep-0284 [03] enter_sleep_state     : Entering sleep state [S1]
Back to C!
PM: Finishing wakeup.
BUG: soft lockup detected on CPU#0!
CPU 0:
Modules linked in: af_packet snd_hda_intel snd_hda_codec snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd snd_page_alloc unix
Pid: 3138, comm: bash Not tainted 2.6.15-rc7-debug #2
RIP: 0010:[<ffffffff8012cf97>] <ffffffff8012cf97>{release_console_sem+296}
RSP: 0018:ffff81003df91cd8  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff803a7140 RCX: 0000000000003ff6
RDX: 00000000ffffff01 RSI: 0000000000003ff6 RDI: 0000000000003ff6
RBP: 000000058025720b R08: 0000000000000005 R09: 00000000ffffffff
R10: ffff81003f5ce520 R11: 0000000000000000 R12: ffffffff80257397
R13: ffff81003df91c48 R14: ffff81003df91d08 R15: 0000003000000010
FS:  00002aaaab0576d0(0000) GS:ffffffff80438800(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000060c000 CR3: 000000003f071000 CR4: 00000000000006e0

Call Trace:<ffffffff8012d246>{vprintk+571}
<ffffffff80201f8b>{acpi_hw_low_level_read+299}
       <ffffffff8012d2f4>{printk+141} <ffffffff8012d2f4>{printk+141}
       <ffffffff80118a9a>{setup_apic_nmi_watchdog+446}
<ffffffff80118b4b>{lapic_nmi_resume+16}
       <ffffffff80259265>{__sysdev_resume+22} <ffffffff80259507>{sysdev_resume+62}
       <ffffffff80143f98>{enter_state+442} <ffffffff80218399>{acpi_suspend+70}
       <ffffffff802184a3>{acpi_system_write_sleep+107}
<ffffffff8016519f>{vfs_write+199}
       <ffffffff80165299>{sys_write+69} <ffffffff8010e57e>{system_call+126}
       
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
ACPI: PCI Interrupt 0000:00:1b.0[A] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1b.0 to 64
ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> IRQ 169
PCI: Setting latency timer of device 0000:00:1c.0 to 64
ACPI: PCI Interrupt 0000:00:1c.1[B] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1c.1 to 64
ACPI: PCI Interrupt 0000:00:1c.2[C] -> GSI 18 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1c.2 to 64
ACPI: PCI Interrupt 0000:00:1c.3[D] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1c.3 to 64
ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 20 (level, low) -> IRQ 50
PCI: Setting latency timer of device 0000:00:1d.0 to 64
ACPI: PCI Interrupt 0000:00:1d.1[B] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:00:1d.1 to 64
ACPI: PCI Interrupt 0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 185
PCI: Setting latency timer of device 0000:00:1d.2 to 64
ACPI: PCI Interrupt 0000:00:1d.3[D] -> GSI 19 (level, low) -> IRQ 193
PCI: Setting latency timer of device 0000:00:1d.3 to 64
acpi_bus-0201 [03] bus_set_power         : Device is not power manageable
PCI: Enabling device 0000:00:1d.7 (0000 -> 0002)
ACPI: PCI Interrupt 0000:00:1d.7[A] -> GSI 20 (level, low) -> IRQ 50
PCI: Setting latency timer of device 0000:00:1d.7 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 18 (level, low) -> IRQ 185
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 177
PCI: Setting latency timer of device 0000:04:00.0 to 64
e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
[ACPI Debug]  String: [0x04] "SIOW"
Restarting tasks... done
Comment 1 Oleksij Rempel (fishor) 2006-01-04 09:23:35 UTC
Created attachment 6929 [details]
lspci
Comment 2 Oleksij Rempel (fishor) 2006-01-04 09:29:12 UTC
Created attachment 6930 [details]
cpuinfo
Comment 3 Shaohua 2006-01-04 17:21:37 UTC
Created attachment 6934 [details]
debug patch

Untested, but does this one help a little?
Comment 4 Oleksij Rempel (fishor) 2006-01-05 02:00:39 UTC
Not.



[  195.567546] Back to C!
[  195.567912] PM: Finishing wakeup.
[  195.568787] BUG: soft lockup detected on CPU#0!
[  195.568820] CPU 0:
[  195.568873] Modules linked in: binfmt_misc fan container af_packet
snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd
snd_page_alloc unix
[  195.569270] Pid: 4068, comm: bash Not tainted 2.6.15debug-soft-lock #3
[  195.569307] RIP: 0010:[<ffffffff8024b7be>]
<ffffffff8024b7be>{intel_i915_configure+246}
[  195.569377] RSP: 0018:ffff81002a245de8  EFLAGS: 00000286
[  195.569459] RAX: 000000003f18f001 RBX: ffffffff804239d8 RCX:
000000003f18f001[  195.569506] RDX: ffffc20010104684 RSI: 00000000000011a1 RDI:
0000000000000000[  195.569555] RBP: ffff81003f56b070 R08: 0000000000000000 R09:
0000000000000000[  195.569603] R10: ffff810001ff4800 R11: 0000000000000246 R12:
0000000000000000[  195.569652] R13: ffffffff804285e0 R14: ffff81002a245f50 R15:
0000000000000001[  195.569701] FS:  00002aaaab0576d0(0000)
GS:ffffffff804f4800(0000) knlGS:0000000000000000
[  195.569762] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  195.569808] CR2: 00000000006d12a0 CR3: 000000002b04e000 CR4:
00000000000006e0[  195.569855]
[  195.569856] Call Trace:<ffffffff8024b753>{intel_i915_configure+139}
<ffffffff8024cd8d>{agp_intel_resume+90}
[  195.570025]        <ffffffff801e59f7>{pci_device_resume+28}
<ffffffff80265ec2>{resume_device+146}
[  195.570188]        <ffffffff80265f77>{dpm_resume+151}
<ffffffff80265fd4>{device_resume+33}
[  195.570349]        <ffffffff801443ed>{enter_state+447}
<ffffffff8021ba09>{acpi_suspend+70}
[  195.570511]        <ffffffff8021bb13>{acpi_system_write_sleep+107}
<ffffffff8016568e>{vfs_write+209}
[  195.570674]        <ffffffff801657cc>{sys_write+69}
<ffffffff8010e57e>{system_call+126}
[  195.570836]
[  195.588468] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 169
Comment 5 Oleksij Rempel (fishor) 2006-01-05 10:16:09 UTC
With kernel 2.6.15-mm1 it can't go to sleep with message:
[   70.343628] PM: Preparing system for standby sleep
[   70.346118] Stopping tasks: ==========
[   76.343411]  stopping tasks failed (1 tasks remaining)
[   76.343446] Restarting tasks...<6> Strange, kedac not stopped
[   76.343686]  done
Comment 6 Shaohua 2006-01-05 19:28:00 UTC
Can you give me a full dmesg? I'd like to know which timer (TSC/HPET) are you 
using.
Comment 7 Oleksij Rempel (fishor) 2006-01-06 04:09:31 UTC
Created attachment 6945 [details]
dmesg
Comment 8 Oleksij Rempel (fishor) 2006-01-11 14:01:37 UTC
Created attachment 6989 [details]
dmesg 2.6.15-mm3

Sorry last dmesg wosnt full. 
It use TSC
[   28.408641] time.c: Using PIT/TSC based timekeeping.
Comment 9 Shaohua 2006-01-16 20:39:01 UTC
I tried two systems here, never find such issue. Could you please try latest -
git tree?
Comment 10 Oleksij Rempel (fishor) 2006-01-17 05:37:26 UTC
This i get from latest git, it's looks like random error. Because now RIP is  
{intel_i915_configure+246}, before {release_console_sem+296}.  Standbay in 
linux can't disable video adapter (in windows can), may be it's some thing with 
this issue? 
    
   
   
[  279.403368] Back to C!   
[  279.601272] PM: Finishing wakeup.   
[  279.602122] BUG: soft lockup detected on CPU#0!   
[  279.602154] CPU 0:   
[  279.602207] Modules linked in: fan container af_packet piix snd_hda_intel   
snd   
_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd snd_page_alloc   
nls_cp   
437 vfat fat nls_utf8 ntfs unix   
[  279.602727] Pid: 4459, comm: bash Not tainted 2.6.16-rc1-1 #2   
[  279.602762] RIP: 0010:[<ffffffff8023f7a2>]   
<ffffffff8023f7a2>{intel_i915_conf   
igure+246}   
[  279.602842] RSP: 0018:ffff8100278e5de8  EFLAGS: 00000282   
[  279.602921] RAX: 000000003f12f001 RBX: 0000000000000001 RCX:   
000000003f12f001   
[  279.602969] RDX: ffffc200101046c8 RSI: 00000000000011b2 RDI:   
0000000000000000   
[  279.603016] RBP: 0000000000000000 R08: 0000000000000000 R09:   
0000000000000000   
[  279.603063] R10: ffff810001fed6c0 R11: 0000000000000246 R12:   
0000000000000001   
[  279.603111] R13: 0000000000000001 R14: ffffffff801f95ab R15:   
0000000000000000   
[  279.603159] FS:  00002ac1d03786d0(0000) GS:ffffffff8047b000(0000)   
knlGS:00000   
00000000000   
[  279.603219] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b   
[  279.603264] CR2: 00000000006d12a0 CR3: 0000000027b1c000 CR4:   
00000000000006e0   
[  279.603311]   
[  279.603312] Call Trace: <ffffffff8023f737>{intel_i915_configure+139}   
[  279.603437]        <ffffffff80240d71>{agp_intel_resume+90}   
<ffffffff801d9a43>   
{pci_device_resume+28}   
[  279.603600]        <ffffffff8025a782>{resume_device+146}   
<ffffffff8025a837>{d   
pm_resume+151}   
[  279.603765]        <ffffffff8025a894>{device_resume+33}   
<ffffffff8013dce9>{en   
ter_state+447}   
[  279.603927]        <ffffffff8020f9b9>{acpi_suspend+70}   
<ffffffff8020fac3>{acp   
i_system_write_sleep+107}   
[  279.604094]        <ffffffff8015f860>{vfs_write+212}   
<ffffffff8015f99e>{sys_w   
rite+69}   
[  279.604257]        <ffffffff8010a5ee>{system_call+126}   
[  279.621860] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) ->   
IRQ   
   
   
Comment 11 Oleksij Rempel (fishor) 2006-01-17 05:42:14 UTC
Created attachment 7046 [details]
disassm

If it's help you. Here is disassm from intel_i915_configure
Comment 12 Oleksij Rempel (fishor) 2006-01-17 05:49:44 UTC
I disambled agp, and get RIP {mwait_idle+51} 
 
 
[  176.357951] PCI: Setting latency timer of device 0000:00:1b.0 to 64 
[  176.358749] BUG: soft lockup detected on CPU#0! 
[  176.358781] CPU 0: 
[  176.358845] Modules linked in: fan container af_packet piix snd_hda_intel 
snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd snd_page_alloc 
nls_cp437 vfat fat nls_utf8 ntfs unix 
[  176.359397] Pid: 0, comm: swapper Not tainted 2.6.16-rc1-1 #2 
[  176.359443] RIP: 0010:[<ffffffff80108d0e>] <ffffffff80108d0e>{mwait_idle+51} 
[  176.359510] RSP: 0018:ffffffff80487fb8  EFLAGS: 00000246 
[  176.359590] RAX: 0000000000000000 RBX: ffffffff8030a0b5 RCX: 
0000000000000000 
[  176.359638] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 
ffffffff8043d9d8 
[  176.359686] RBP: 000000290fc15f77 R08: ffffffff80486000 R09: 
0000000000000246 
[  176.359734] R10: ffff810001fed6c0 R11: ffffffff802bc8f5 R12: 
ffffffff803b3a40 
[  176.359783] R13: 00000000fffd6376 R14: ffffffff80387080 R15: 
ffff81003e377c00 
[  176.359831] FS:  0000000000000000(0000) GS:ffffffff8047b000(0000) 
knlGS:0000000000000000 
[  176.359891] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b 
[  176.359937] CR2: 00000000006d12a0 CR3: 000000002cf7a000 CR4: 
00000000000006e0 
[  176.359984] 
[  176.359985] Call Trace: <ffffffff80108cb5>{cpu_idle+68} 
<ffffffff80488756>{start_kernel+388} 
[  176.360176]        <ffffffff80488265>{_sinittext+613} 
[  176.379687] ACPI: PCI Interrupt 0000:00:1c.0[A] -> GSI 16 (level, low) -> 
IRQ 169 
[  176.379783] PCI: Setting latency timer of device 0000:00:1c.0 to 64 
 
Comment 13 Shaohua 2006-01-17 22:43:38 UTC
I'd like to know which device costs too much in resume. Can you add one line 
below in resume_device() (drivers/base/power/resume.c)
--- drivers/base/power/resume.c 2006-01-18 14:47:10.000000000 +0800
+++ resume.c    2006-01-18 14:30:24.000000000 +0800
@@ -25,6 +25,7 @@ int resume_device(struct device * dev)
        int error = 0;

        down(&dev->sem);
+dev_printk(KERN_DEBUG, dev, "start %d\n", jiffies);
        if (dev->power.pm_parent
                        && dev->power.pm_parent->power.power_state.event) {
                dev_err(dev, "PM: resume from %d, parent %s still %d\n",
@@ -36,6 +37,7 @@ int resume_device(struct device * dev)
                dev_dbg(dev,"resuming\n");
                error = dev->bus->resume(dev);
        }
+dev_printk(KERN_DEBUG, dev, "end %d\n", jiffies);
        up(&dev->sem);
        return error;
 better with CONFIG_PRINTK_TIME enabled. Thanks!
Comment 14 Oleksij Rempel (fishor) 2006-01-17 23:26:45 UTC
Created attachment 7057 [details]
dmesg with debug patch
Comment 15 Oleksij Rempel (fishor) 2006-01-17 23:39:25 UTC
Created attachment 7058 [details]
dmesg with agp=off
Comment 16 Oleksij Rempel (fishor) 2006-01-17 23:40:40 UTC
Created attachment 7059 [details]
dmesg without agp and snd
Comment 17 Oleksij Rempel (fishor) 2006-01-17 23:41:22 UTC
I will try memtest. 
Comment 18 Shaohua 2006-01-18 18:29:27 UTC
Created attachment 7068 [details]
patch for time resume

I reproduced this issue with notsc. This might fix your problem.
Comment 19 Oleksij Rempel (fishor) 2006-01-18 23:25:57 UTC
I get build problems. 
  LD      .tmp_vmlinux1 
arch/x86_64/kernel/built-in.o: In function `timer_resume': 
time.c:(.text+0x5c75): undefined reference to `get_cycles_sync' 
 
Comment 20 Shaohua 2006-01-18 23:44:48 UTC
It's against latest kernel (15-git11).
Comment 21 Oleksij Rempel (fishor) 2006-01-19 00:21:57 UTC
Created attachment 7069 [details]
dmesg with fix patch

The patch is working. 
Thank you!
Comment 22 Shaohua 2006-02-07 17:29:39 UTC
Shipped in 2.6.16-rc2.

Note You need to log in before you can comment on or make changes to this bug.