Bug 11347 - last commit to include/generic-asm/rtc.h is making my machine to freeze
Summary: last commit to include/generic-asm/rtc.h is making my machine to freeze
Status: CLOSED PATCH_ALREADY_AVAILABLE
Alias: None
Product: Timers
Classification: Unclassified
Component: Realtime Clock (show other bugs)
Hardware: All Linux
: P1 high
Assignee: Gilles Grandou
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-15 02:24 UTC by Gilles Grandou
Modified: 2012-01-23 21:44 UTC (History)
4 users (show)

See Also:
Kernel Version: 2.6.25, 2.6.26, 2.6.27-rc3
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments

Description Gilles Grandou 2008-08-15 02:24:52 UTC
Latest working kernel version: 2.6.24
Earliest failing kernel version: 2.6.25-rc1
Distribution: Debian/unstable
Hardware Environment: lenovo t60p
Software Environment:
Problem Description:

All Debian kernel starting from version 2.6.25 are making my laptop to freeze a few minutes after the booting.
- the X server is frozen (including mouse)
- keyboard is stuck, capslock does not toggle led
- network is dead

I've tried generic kernel 2.6.25, 2.6.26, up to 2.6.27-rc3 they all suffer from the same issue. it's seems to be even faster to hang on last kernels (>= 2.6.26).

I've bisect'ed the kernel repository up to this commit:

commit 795d45b22c079946332bf3825afefe5a981a97b6
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon Feb 4 16:48:10 2008 +0100

    x86: fix RTC lockdep warning: potential hardirq recursion

    After disabling both CONFIG_DEBUG_LOCKING_API_SELFTESTS and netconsole
    (using current mainline) I get a login prompt, and also...

    [    5.181668] SELinux: policy loaded with handle_unknown=deny
    [    5.183315] type=1403 audit(1202100038.157:3): policy loaded auid=4294967295 ses=4294967295
    [    5.822073] SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
    [    7.819146] ------------[ cut here ]------------
    [    7.819146] WARNING: at kernel/lockdep.c:2033 trace_hardirqs_on+0x9b/0x10d()
    [    7.819146] Modules linked in: generic ext3 jbd ide_disk ide_core
    [    7.819146] Pid: 399, comm: hwclock Not tainted 2.6.24 #4
    [    7.819146]  [<c011d140>] warn_on_slowpath+0x41/0x51
    [    7.819146]  [<c01364a9>] ? lock_release_holdtime+0x50/0x56
    [    7.819146]  [<c013770c>] ? check_usage_forwards+0x19/0x3b
    [    7.819146]  [<c01390c4>] ? __lock_acquire+0xac3/0xb0b
    [    7.819146]  [<c0108c98>] ? native_sched_clock+0x8b/0x9f
    [    7.819146]  [<c01364a9>] ? lock_release_holdtime+0x50/0x56
    [    7.819146]  [<c030ca6c>] ? _spin_unlock_irq+0x22/0x42
    [    7.819146]  [<c013848b>] trace_hardirqs_on+0x9b/0x10d
    [    7.819146]  [<c030ca6c>] _spin_unlock_irq+0x22/0x42
    [    7.819146]  [<c011481e>] hpet_rtc_interrupt+0xdf/0x290
    [    7.819146]  [<c014ea90>] handle_IRQ_event+0x1a/0x46
    [    7.819146]  [<c014f8ea>] handle_edge_irq+0xbe/0xff
    [    7.819146]  [<c0106e08>] do_IRQ+0x6d/0x84
    [    7.819146]  [<c0105596>] common_interrupt+0x2e/0x34
    [    7.819146]  [<c013007b>] ? ktime_get_ts+0x8/0x3f
    [    7.819146]  [<c0139420>] ? lock_release+0x167/0x16f
    [    7.819146]  [<c017974a>] ? core_sys_select+0x2c/0x327
    [    7.819146]  [<c0179792>] core_sys_select+0x74/0x327
    [    7.819146]  [<c0108c98>] ? native_sched_clock+0x8b/0x9f
    [    7.819146]  [<c01364a9>] ? lock_release_holdtime+0x50/0x56
    [    7.819146]  [<c030ca6c>] ? _spin_unlock_irq+0x22/0x42
    [    7.819146]  [<c01384d6>] ? trace_hardirqs_on+0xe6/0x10d
    [    7.819146]  [<c030ca77>] ? _spin_unlock_irq+0x2d/0x42
    [    7.819146]  [<c023b437>] ? rtc_do_ioctl+0x11b/0x677
    [    7.819146]  [<c01c487e>] ? inode_has_perm+0x5e/0x68
    [    7.819146]  [<c01364a9>] ? lock_release_holdtime+0x50/0x56
    [    7.819146]  [<c0108c98>] ? native_sched_clock+0x8b/0x9f
    [    7.819146]  [<c01c490b>] ? file_has_perm+0x83/0x8c
    [    7.819146]  [<c023ba08>] ? rtc_ioctl+0xf/0x11
    [    7.819146]  [<c017898d>] ? do_ioctl+0x55/0x67
    [    7.819146]  [<c0179d15>] sys_select+0x93/0x163
    [    7.819146]  [<c0104b39>] ? sysenter_past_esp+0x9a/0xa5
    [    7.819146]  [<c0104afe>] sysenter_past_esp+0x5f/0xa5
    [    7.819146]  =======================
    [    7.819146] ---[ end trace 96540ca301ffb84c ]---
    [    7.819210] rtc: lost 6 interrupts
    [    7.870668] type=1400 audit(1202128840.794:4): avc:  denied  { audit_write } for  pid=399 comm="hwclock" capability=29 scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:hwclock_t:s0 tclass=capability
    [    9.538866] input: PC Speaker as /class/input/input5

    Because hpet_rtc_interrupt()'s call to get_rtc_time() ends up
    resolving to include/asm-generic/rtc.h's (hilariously inlined)
    get_rtc_time(), which does spin_unlock_irq() from hard IRQ context.

    The obvious patch fixes it.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/asm-generic/rtc.h b/include/asm-generic/rtc.h
index d3238f1..dd1bed8 100644
--- a/include/asm-generic/rtc.h
+++ b/include/asm-generic/rtc.h
@@ -35,10 +35,11 @@
 static inline unsigned char rtc_is_updating(void)
 {
        unsigned char uip;
+       unsigned long flags;

-       spin_lock_irq(&rtc_lock);
+       spin_lock_irqsave(&rtc_lock, flags);
        uip = (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP);
-       spin_unlock_irq(&rtc_lock);
+       spin_unlock_irqrestore(&rtc_lock, flags);
        return uip;
 }

@@ -46,6 +47,8 @@ static inline unsigned int get_rtc_time(struct rtc_time *time)
 {
        unsigned long uip_watchdog = jiffies;
        unsigned char ctrl;
+       unsigned long flags;
+
 #ifdef CONFIG_MACH_DECSTATION
        unsigned int real_year;
 #endif
@@ -72,7 +75,7 @@ static inline unsigned int get_rtc_time(struct rtc_time *time)
         * RTC has RTC_DAY_OF_WEEK, we ignore it, as it is only updated
         * by the RTC when initially set to a non-zero value.
         */
-       spin_lock_irq(&rtc_lock);
+       spin_lock_irqsave(&rtc_lock, flags);
        time->tm_sec = CMOS_READ(RTC_SECONDS);
        time->tm_min = CMOS_READ(RTC_MINUTES);
        time->tm_hour = CMOS_READ(RTC_HOURS);
@@ -83,7 +86,7 @@ static inline unsigned int get_rtc_time(struct rtc_time *time)
        real_year = CMOS_READ(RTC_DEC_YEAR);
 #endif
        ctrl = CMOS_READ(RTC_CONTROL);
-       spin_unlock_irq(&rtc_lock);
+       spin_unlock_irqrestore(&rtc_lock, flags);

        if (!(ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
        {

I've reverted this commit and tried to run the 2.6.27-rc3 kernel. It no longer freezes and I've now more than 30 hours uptime!

If it can helps here is an extract of my .config:
$ egrep -i '(timer|rtc)' .config
CONFIG_TIMERFD=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_X86_PM_TIMER=y
CONFIG_HANGCHECK_TIMER=m
CONFIG_SND_TIMER=m
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m
# RTC interfaces
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set
# I2C RTC drivers
CONFIG_RTC_DRV_DS1307=m
CONFIG_RTC_DRV_DS1374=m
CONFIG_RTC_DRV_DS1672=m
CONFIG_RTC_DRV_MAX6900=m
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
CONFIG_RTC_DRV_M41T80=m
# CONFIG_RTC_DRV_M41T80_WDT is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# SPI RTC drivers
# CONFIG_RTC_DRV_M41T94 is not set
# CONFIG_RTC_DRV_DS1305 is not set
CONFIG_RTC_DRV_MAX6902=m
# CONFIG_RTC_DRV_R9701 is not set
CONFIG_RTC_DRV_RS5C348=m
# Platform RTC drivers
CONFIG_RTC_DRV_CMOS=m
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=m
CONFIG_RTC_DRV_DS1742=m
CONFIG_RTC_DRV_STK17TA8=m
CONFIG_RTC_DRV_M48T86=m
CONFIG_RTC_DRV_M48T59=m
CONFIG_RTC_DRV_V3020=m
# on-CPU RTC drivers
CONFIG_TIMER_STATS=y
Comment 1 Helle 2008-09-05 01:17:56 UTC
There are many unspecific (kernel and debian) bug reports, which I think have the same reason. Something related to BIOS/RTC/HPET ?
Because there is nothing in the logs in time of freeze it is hard to give more information. After I use the Ingo Molnar's netconsole I got on a other computer the log message "rtc: lost 787  interrupts" before freezing.
The workaround "hpet=disable" works mostly.

From the ChangeLog-2.6.27-rc5:
>> commit 38c052f8cff1bd323ccfa968136a9556652ee420
>> Author: Ingo Molnar <mingo@elte.hu>
>> Date:   Sat Aug 23 17:59:07 2008 +0200
>>  rtc: fix deadlock
>>  if get_rtc_time() is _ever_ called with IRQs off, we deadlock badly
>>  in it, waiting for jiffies to increment.
>>
>>  So make the code more robust by doing an explicit mdelay(20).
>>  This solves a very hard to reproduce/debug hard lockup reported
>>  by Mikael Pettersson.

Kernel 2.6.27-rc5  solved the freeze problem on my Thinkpad T60!
If others can confirm this solution, maybe the developers can backport this fix to 2.6.26.

Thanks a lot especially to Ingo Molnar
Helle
Comment 2 David Brownell 2008-09-05 01:32:38 UTC
Note also the patch in bug #11153 comment #18 ... and that whole bug; the whole "HPET emulation" scheme for RTC interrupts is problematic.  Of course, expecting ACPI interrupts to work is turning up problems too ... we might need to just hang the RTC irq handler directly off SCI (which is where the HPET "legacy replacement mode" re-routes it) rather than using ACPI mechanisms. 
Comment 3 Gilles Grandou 2008-09-06 05:40:33 UTC
(In reply to comment #1)

> From the ChangeLog-2.6.27-rc5:
> >> commit 38c052f8cff1bd323ccfa968136a9556652ee420
> >> Author: Ingo Molnar <mingo@elte.hu>
> >> Date:   Sat Aug 23 17:59:07 2008 +0200
> 
> Kernel 2.6.27-rc5  solved the freeze problem on my Thinkpad T60!
> If others can confirm this solution, maybe the developers can backport this
> fix to 2.6.26.

I've applied the above patch over a stock 2.6.26, I have now +15h uptime without encountering any freeze on my T60p.

Thanks for the hint!
Comment 4 john stultz 2012-01-23 19:28:34 UTC
So it looks like this issue was resolved with 2.6.27, and this bug can be closed.

Note You need to log in before you can comment on or make changes to this bug.