Bug 10220
Summary: | Swsusp restores sometimes wrong system time | ||
---|---|---|---|
Product: | Power Management | Reporter: | Martin Koegler (mkoegler) |
Component: | Hibernation/Suspend | Assignee: | Rafael J. Wysocki (rjw) |
Status: | REJECTED INSUFFICIENT_DATA | ||
Severity: | normal | CC: | bunk, john.stultz, tglx |
Priority: | P1 | ||
Hardware: | All | ||
OS: | Linux | ||
Kernel Version: | 2.6.24.2 (and earlier version too) | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | |||
Bug Blocks: | 7216 | ||
Attachments: | kernel config, lspci | ||
Description
Martin Koegler
2008-03-10 23:52:40 UTC
Can you please create an _attachment_ with the .config and _another_ one with the output of lspci?

Created attachment 15216
kernel config

Created attachment 15217
lspci
Thanks. Thomas, can you please point me to the code responsible for the clock restoration during a resume? It looks like we have a problem with the return to zero of jiffies or something similar.

An application that aborts on this error is bind:

    Feb 5 18:19:35 pc named[2629]: timer.c:695: fatal error:
    Feb 5 18:19:35 pc named[2629]: RUNTIME_CHECK(isc_time_now((&now)) == 0) failed
    Feb 5 18:19:35 pc named[2629]: exiting (due to fatal error in library)

The (preprocessed) sources of the failing code are:

    static inline void fix_tv_usec(struct timeval *tv) {
        isc_boolean_t fixed = isc_boolean_false;

        if (tv->tv_usec < 0) {
            fixed = isc_boolean_true;
            do {
                tv->tv_sec -= 1;
                tv->tv_usec += 1000000;
            } while (tv->tv_usec < 0);
        } else if (tv->tv_usec >= 1000000) {
            fixed = isc_boolean_true;
            do {
                tv->tv_sec += 1;
                tv->tv_usec -= 1000000;
            } while (tv->tv_usec >=1000000);
        }
        if (fixed)
            (void)syslog(3, "gettimeofday returned bad tv_usec: corrected");
    }

    isc_result_t isc_time_now(isc_time_t *t) {
        struct timeval tv;
        char strbuf[128];

        ((void) ((t != ((void *)0)) || ((isc_assertion_failed)("time.c", 148, isc_assertiontype_require, "t != ((void *)0)"), 0)));

        if (gettimeofday(&tv, ((void *)0)) == -1) {
            isc__strerror((*__errno_location ()), strbuf, sizeof(strbuf));
            isc_error_unexpected("time.c", 152, "%s", strbuf);
            return (34);
        }
    # 164 "time.c"
        fix_tv_usec(&tv);

        if (tv.tv_sec < 0)
            return (34);
    # 175 "time.c"
        if (sizeof(tv.tv_sec) > sizeof(t->seconds) &&
            ((tv.tv_sec | (unsigned int)-1) ^ (unsigned int)-1) != 0U)
            return (41);

        t->seconds = tv.tv_sec;
        t->nanoseconds = tv.tv_usec * 1000;

        return (0);
    }

I guess that the tv.tv_sec < 0 check fails.
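To illustrate that guess: tv_sec counts seconds since 1970-01-01 UTC, so any wall-clock time before 1970 is negative and would take the `return (34)` branch, which in turn makes the RUNTIME_CHECK in timer.c fail. The following is only a minimal sketch of that single check. `seconds_check` is a hypothetical helper, not part of bind, and it ignores the other branches of isc_time_now().

```c
#include <time.h>

/*
 * Hypothetical helper (not a bind function): reproduces only the sign
 * check that the quoted isc_time_now() applies to tv.tv_sec.
 */
static int seconds_check(time_t tv_sec)
{
    if (tv_sec < 0)
        return 34;  /* same numeric result as the quoted code */
    return 0;       /* value accepted */
}
```

With this sketch, `seconds_check((time_t)-1)` yields 34, while a 2008 timestamp such as 1205479223 yields 0.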
The year is not recorded in the syslog, but it is in the timestamps of the bind coredumps:

    $ ls -ld /var/cache/bind
    drwxrwxr-x 2 root bind 336 1943-02-05 18:19 /var/cache/bind
    $ ls -l /var/cache/bind/core.*
    -rw------- 1 root root 26759168 2007-07-19 09:23 /var/cache/bind/core.12355
    -rw------- 1 root root 26320896 2007-05-07 07:22 /var/cache/bind/core.21582
    -rw------- 1 root root 26476544 2007-05-07 07:22 /var/cache/bind/core.21716
    -rw------- 1 root root 26750976 2007-06-04 16:16 /var/cache/bind/core.22726
    -rw------- 1 root root 26529792 2007-05-07 07:22 /var/cache/bind/core.2518
    -rw------- 1 root root 26877952 1936-08-23 23:20 /var/cache/bind/core.2593
    -rw------- 1 root root 27152384 2008-03-08 07:47 /var/cache/bind/core.2629
    -rw------- 1 root root 26734592 1935-12-03 03:12 /var/cache/bind/core.2677
    -rw------- 1 root root 27013120 2007-12-26 11:48 /var/cache/bind/core.27886

> ------- Comment #4 from rjw@sisk.pl 2008-03-11 14:52 -------
> Thomas, can you please point me to the code responsible for the clock
> restoration during a resume? It looks like we have a problem with the return
> to zero of jiffies or something similar.
kernel/time/timekeeping.c:timekeeping_resume()
Thanks,
tglx
The system is using tsc as time source:

    # cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    tsc acpi_pm pit jiffies
    # cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    tsc

The uptime is very strange (after some additional suspend/resume cycles without any problems):

    $ cat /proc/uptime
    2243526065.43 458541.05
    $ uptime
    22:25:59 up -24855 days, -3:-14, 2 users, load average: 0.62, 0.30, 0.83

I managed to peek into the running kernel's memory using crash. The uptime in /proc is calculated as current time (~xtime) + wall_to_monotonic + total_sleep_time. I get the following values:

    crash> p *(unsigned long *)&total_sleep_time
    $2 = 2242973258
    crash> p *(struct timespec*)&wall_to_monotonic
    $4 = { tv_sec = -1204804304, tv_nsec = 613050791 }
    crash> p *(struct timespec*)&xtime
    $5 = { tv_sec = 1205479223, tv_nsec = 589066861 }

The sum matches /proc/uptime, so crash should have used the correct addresses:

    $ cat /proc/uptime
    2243648179.93 480444.05

total_sleep_time is about 71 years, which is certainly incorrect for a kernel released in February 2008. This narrows the error down to the following code:

In timekeeping_suspend:

    timekeeping_suspend_time = read_persistent_clock();

In timekeeping_resume:

    unsigned long now = read_persistent_clock();

    if (now && (now > timekeeping_suspend_time)) {
        unsigned long sleep_length = now - timekeeping_suspend_time;

        xtime.tv_sec += sleep_length;
        wall_to_monotonic.tv_sec -= sleep_length;
        total_sleep_time += sleep_length;
    }

Can you please add two printk()s printing the values of 'timekeeping_suspend_time' in timekeeping_suspend() and 'now' in timekeeping_resume() and see if the output makes sense?

Martin: Assuming you're still seeing this issue, did trying Rafael's suggestion in comment #9 provide any new info?

No response from the bug reporter for over six months. Reject this bug.
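For reference, the uptime arithmetic from the crash session and the quoted resume logic can be replayed in userspace. The sketch below is a simplified model, not kernel code: `struct tk_model`, `model_uptime_sec` and `model_resume` are hypothetical names, and fixed-width integer types replace the kernel's `unsigned long` and `struct timespec` so the result does not depend on the platform word size.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000LL

/* Simplified userspace model of the three kernel variables in the report. */
struct tk_model {
    int64_t  xtime_sec;         /* xtime.tv_sec */
    int64_t  xtime_nsec;        /* xtime.tv_nsec */
    int64_t  wtm_sec;           /* wall_to_monotonic.tv_sec */
    int64_t  wtm_nsec;          /* wall_to_monotonic.tv_nsec */
    uint64_t total_sleep_time;  /* seconds spent suspended */
};

/* Uptime as described in the report:
 * xtime + wall_to_monotonic + total_sleep_time, in whole seconds. */
static int64_t model_uptime_sec(const struct tk_model *m)
{
    int64_t sec  = m->xtime_sec + m->wtm_sec + (int64_t)m->total_sleep_time;
    int64_t nsec = m->xtime_nsec + m->wtm_nsec;

    /* Carry whole seconds out of the nanosecond sum. */
    return sec + nsec / MODEL_NSEC_PER_SEC;
}

/* Same arithmetic as the quoted timekeeping_resume() fragment. */
static void model_resume(struct tk_model *m, uint64_t suspend_time, uint64_t now)
{
    if (now && now > suspend_time) {
        uint64_t sleep_length = now - suspend_time;

        m->xtime_sec        += (int64_t)sleep_length;
        m->wtm_sec          -= (int64_t)sleep_length;
        m->total_sleep_time += sleep_length;
    }
}
```

Fed with the values printed by crash, `model_uptime_sec` returns 2243648178, which is within two seconds of the 2243648179.93 read from /proc/uptime shortly afterwards. Dividing 2242973258 seconds by 31557600 (one year of 365.25 days) gives roughly 71.1 years. In this model, a single resume in which `now` is far larger than `suspend_time` is enough to produce such a total. Whether that is what happened on the reporter's machine is not established in this report.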