Bug 8391

Summary: Soft lockup on CPU0 when resuming from suspension to ram, related to acpi processor module
Product: ACPI
Reporter: Giorgio Lando (patroclo7)
Component: Power-Processor
Assignee: Thomas Gleixner (tglx)
Status: CLOSED CODE_FIX
Severity: high
CC: patroclo7, protasnb, rui.zhang, tglx
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: 2.6.22
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: My kernel config
bootlog
collection of pending fixups
patch series of the previous

Description Giorgio Lando 2007-04-28 08:46:31 UTC
Most recent kernel where this bug did *NOT* occur: 2.6.20
Distribution: archlinux (but the problem occurs also with a vanilla kernel)
Hardware Environment: Acer 1644WLMi, with an Intel Centrino 2000 MHz

Problem Description: When I resume the laptop from suspend to RAM, I get a
soft lockup. If the module 'processor' is not loaded, the laptop resumes
normally. Compiling the processor ACPI module into the kernel instead of
loading it as a module does not solve the problem.
This happens with a vanilla kernel. 2.6.21.1 does not solve it.
This is what I can see in the system log:
Apr 28 17:18:37 clarabella BUG: soft lockup detected on CPU#0!
Apr 28 17:18:37 clarabella [<c01524f8>] softlockup_tick+0xa8/0x110
Apr 28 17:18:37 clarabella [<c0132853>] update_process_times+0x33/0x80
Apr 28 17:18:37 clarabella [<c0144a1b>] tick_sched_timer+0x5b/0xc0
Apr 28 17:18:37 clarabella [<c0140a3d>] hrtimer_interrupt+0x13d/0x1d0
Apr 28 17:18:37 clarabella [<c01189f2>] smp_apic_timer_interrupt+0x52/0x90
Apr 28 17:18:37 clarabella [<c034387f>] preempt_schedule_irq+0x3f/0x60
Apr 28 17:18:37 clarabella [<c0104d30>] apic_timer_interrupt+0x28/0x30
Apr 28 17:18:37 clarabella [<c024f141>] cfb_imageblit+0x521/0x580
Apr 28 17:18:37 clarabella [<c0141f88>] clocksource_get_next+0x38/0x40
Apr 28 17:18:37 clarabella [<c01400b2>] ktime_get_ts+0x22/0x60
Apr 28 17:18:37 clarabella [<c0143432>] clockevents_program_event+0x92/0x110
Apr 28 17:18:37 clarabella [<c024cc06>] bit_putcs+0x576/0x600
Apr 28 17:18:37 clarabella [<c024dca0>] bitfill_aligned+0x0/0x100
Apr 28 17:18:37 clarabella [<c024b72b>] fbcon_switch+0x41b/0x5f0
Apr 28 17:18:37 clarabella [<c0246efd>] fbcon_putcs+0x19d/0x2e0
Apr 28 17:18:37 clarabella [<c024c690>] bit_putcs+0x0/0x600

Steps to reproduce: suspend to ram with the acpi support for the processor in
the kernel or as a loaded module; try to resume.
Comment 1 Giorgio Lando 2007-04-28 08:51:00 UTC
Created attachment 11311 [details]
My kernel config

I attach my kernel config
Comment 2 Thomas Gleixner 2007-04-28 10:09:55 UTC
Can you please try with the module loaded and the following additions to the
kernel command line:

A) highres=off

B) nohz=off

C) highres=off nohz=off

Thanks,

    tglx
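
When testing the three scenarios above, it helps to confirm after each reboot which options the running kernel actually received. The following is only a sketch: the helper names has_param and scenario are made up for illustration, and it assumes a POSIX shell with the kernel command line readable from /proc/cmdline.

```shell
# Return success if the kernel command line in $1 contains the parameter $2
# as a whole, space-separated word.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print which of the scenarios A), B), C) a command line corresponds to,
# or "default" if neither option is present.
scenario() {
    cmdline=$1
    if has_param "$cmdline" "highres=off" && has_param "$cmdline" "nohz=off"; then
        echo "C"
    elif has_param "$cmdline" "highres=off"; then
        echo "A"
    elif has_param "$cmdline" "nohz=off"; then
        echo "B"
    else
        echo "default"
    fi
}

# On the test machine, after booting:
#   scenario "$(cat /proc/cmdline)"
```

Recording the printed scenario next to each suspend/resume result makes it harder to mix up which boot produced the lockup.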
Comment 3 Thomas Gleixner 2007-04-28 10:12:55 UTC
Can you please add a boot log (with the module loaded and no further commandline
options) ?

Thanks,

    tglx
Comment 4 Giorgio Lando 2007-05-02 14:12:14 UTC
I am able to resume from suspension to ram in scenario C), while I get the
previous soft lockup in scenarios A) and B).
Comment 5 Giorgio Lando 2007-05-02 14:29:48 UTC
Created attachment 11378 [details]
bootlog

This is the bootlog, with the module loaded and no kernel boot options.
Comment 6 Thomas Gleixner 2007-05-02 14:33:00 UTC
Created attachment 11379 [details]
collection of pending fixups

Can you please apply the attached patch and retest ?
Comment 7 Giorgio Lando 2007-05-02 14:53:48 UTC
The patch seems to solve the issue. With it applied, the same config, the
module loaded and no special kernel boot options, I am able to resume from
suspend to RAM. Thanks.
Comment 8 Thomas Gleixner 2007-05-02 15:02:58 UTC
Created attachment 11380 [details]
patch series of the previous

May I ask you a favour?

The attached tarball has the separate parts of the patch I attached before. The
tarball contains a quilt patch series. If you are not familiar with quilt, then
just apply the patches one after another in the order given in the file
"series". Please recompile and boot after each step and report which one
finally fixes the problem.

Thanks

    tglx
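
The step-by-step procedure above could be scripted roughly as follows when quilt is not installed. This is a sketch under assumptions: the function names are hypothetical, the tarball is assumed to unpack into a directory holding the patches next to the "series" file, and the patches are assumed to apply with -p1 from the top of the kernel tree. With quilt installed, "quilt push" applies the next patch of the series instead.

```shell
# Print the patch names from a quilt "series" file in order,
# skipping blank lines and '#' comments.
series_patches() {
    sed -e 's/#.*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Apply the first N patches of the series to the tree in the current
# directory. Usage: apply_first_n <patch-dir> <n>
# (<patch-dir> is wherever the tarball was unpacked; the layout is assumed.)
apply_first_n() {
    series_patches "$1/series" | head -n "$2" | while read -r p _; do
        echo "applying $p"
        # Stop at the first patch that fails to apply.
        patch -p1 < "$1/$p" || exit 1
    done
}

# Example, run from the top of a clean kernel tree:
#   apply_first_n ../patches 1    # then rebuild, boot, test suspend/resume
# For the next step, restore a clean tree and run: apply_first_n ../patches 2
```

Starting each step from a clean tree keeps the result unambiguous: the first N for which resume works identifies the patch that fixes the problem.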
Comment 9 Giorgio Lando 2007-05-02 15:39:38 UTC
The problem is solved when I apply the fourth patch, that is, with the following applied:
highres-dyntick-avoid-xtime-lock-contention.patch +
clocksource-fix-resume-logic.patch +
acpi-keep-tsc-stable-when-lapic-timer-c2-ok-is-set.patch +
clockevents-fix-resume-logic.patch.
On the contrary, clockevents-fix-oneshot-suspend.patch does not seem to be required.
Comment 10 Thomas Gleixner 2007-05-02 15:51:28 UTC
Giorgio,

thanks a lot. The last patch is required for consistency on different hardware.

I put that bug into PATCH_ALREADY_AVAILABLE status for now. I will close it once
the fixes hit mainline and the 2.6.21 stable series.

Thanks,

     tglx
Comment 11 Giorgio Lando 2007-07-09 01:38:08 UTC
I have these problems again in 2.6.22. Always connected with the processor ACPI driver and solvable with 'nohz=off highres=off'. Does this mean that these patches (or their replacement) have not been included in 2.6.22?
Comment 12 Giorgio Lando 2007-07-09 01:43:22 UTC
I have looked and it seems that their replacements have been included. Thus I think that the bug should be reopened.
Comment 13 Zhang Rui 2007-08-05 09:42:11 UTC
Hi, Thomas,
I can reproduce the bug, and booting with "nohz=off" solves the problem.
Could you give a detailed description of this bug please?

Thanks,
Rui
Comment 14 Natalie Protasevich 2007-10-18 21:47:46 UTC
Any update on this bug please? Can anyone confirm that the kernel works now as noted in #10?
Thanks.
Comment 15 Thomas Gleixner 2007-11-13 06:54:29 UTC
Giorgio, Zhang,

is the problem still there with 2.6.23 ?

Thanks,
      tglx
Comment 16 Giorgio Lando 2007-11-13 07:01:21 UTC
No, it is not. 2.6.23 suspends and resumes fine with highres and nohz. I think that the bug can be closed (actually I had forgotten this bug).
Comment 17 Zhang Rui 2007-11-13 17:31:08 UTC
No, everything is working well.
I think it's fixed in 2.6.23-rc3. :)
Comment 18 Thomas Gleixner 2007-11-14 00:01:21 UTC
Giorgio, Zhang,

Thanks for testing!

    tglx