Bug 108991 - Signed 64 bit overflow in timekeeping_get_ns
Summary: Signed 64 bit overflow in timekeeping_get_ns
Status: NEW
Alias: None
Product: Timers
Classification: Unclassified
Component: gettimeofday (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: john stultz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-07 10:20 UTC by Bernd Edlinger
Modified: 2016-02-05 09:20 UTC (History)
0 users

See Also:
Kernel Version: 3.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Bernd Edlinger 2015-12-07 10:20:43 UTC
Hi,

observed with 3.18 kernel, but bug is present in recent kernel too.

We use a 32-bit counter time source which runs at 100 MHz,
so the time wraps at 42.9 sec.

So we have mult=0xA0000000 and shift=28

It happened once, that delta was very large, like 0xF0000000
and the intermediate nsec value thus became negative.

Now the shift also produced a negative value. This value is later
used in unsigned arithmetic.

And the problem is, that timespec64_add_ns took around 40 seconds
to do the division by repeatedly subtracting 1 billion.

Reproducable when stopping the target in the debugger and resuming
at the right time.

This is the suggested patch:


--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
        cycle_t delta;
-       s64 nsec;
+       u64 nsec;
 
        delta = timekeeping_get_delta(tkr);



This is what was in the original call stack:

[70130.983117] INFO: rcu_sched self-detected stall on CPU { 0}  (t=2100 jiffies g=2221030 c=2221029 q=24782)
[70130.991415] Task dump for CPU 0:
[70130.991423] /usr/sbin/apach R running      0   893    774 0x00000002
[70130.991464] [<80017d20>] (unwind_backtrace) from [<80013408>] (show_stack+0x20/0x24)
[70130.997890] [<80013408>] (show_stack) from [<8004b4bc>] (sched_show_task+0xb0/0x104)
[70131.004308] [<8004b4bc>] (sched_show_task) from [<8004e100>] (dump_cpu_task+0x48/0x50)
[70131.010909] [<8004e100>] (dump_cpu_task) from [<80068b88>] (rcu_dump_cpu_stacks+0x9c/0xd8)
[70131.017846] [<80068b88>] (rcu_dump_cpu_stacks) from [<8006c1f8>] (rcu_check_callbacks+0x4a8/0x738)
[70131.025477] [<8006c1f8>] (rcu_check_callbacks) from [<8007077c>] (update_process_times+0x50/0x70)
[70131.033027] [<8007077c>] (update_process_times) from [<80081074>] (tick_sched_timer+0x78/0x264)
[70131.040396] [<80081074>] (tick_sched_timer) from [<80071198>] (__run_hrtimer+0x90/0x1bc)
[70131.047159] [<80071198>] (__run_hrtimer) from [<80071920>] (hrtimer_interrupt+0x134/0x324)
[70131.054093] [<80071920>] (hrtimer_interrupt) from [<800164f0>] (twd_handler+0x40/0x50)
[70131.060684] [<800164f0>] (twd_handler) from [<80064d10>] (handle_percpu_devid_irq+0x90/0x124)
[70131.067877] [<80064d10>] (handle_percpu_devid_irq) from [<80060a4c>] (generic_handle_irq+0x3c/0x4c)
[70131.075589] [<80060a4c>] (generic_handle_irq) from [<80060d10>] (__handle_domain_irq+0x6c/0xb4)
[70131.082953] [<80060d10>] (__handle_domain_irq) from [<800086bc>] (gic_handle_irq+0x34/0x6c)
[70131.089973] [<800086bc>] (gic_handle_irq) from [<80013f80>] (__irq_svc+0x40/0x54)
[70131.096124] Exception stack(0xbe9ffee0 to 0xbe9fff28)
[70131.099859] fee0: c4653600 ffffffff fdfcc63c 93877963 d1e0e8e8 00000000 be9fff60 00000004
[70131.106705] ff00: 3b9ac9ff 00000000 00d5e820 be9fff5c 0000001c be9fff28 fffffffc 80076e90
[70131.113547] ff20: a00f0013 ffffffff
[70131.115729] [<80013f80>] (__irq_svc) from [<80076e90>] (__getnstimeofday64+0xe4/0x14c)
[70131.122316] [<80076e90>] (__getnstimeofday64) from [<80078080>] (do_gettimeofday+0x24/0x6c)
[70131.129337] [<80078080>] (do_gettimeofday) from [<8006dc4c>] (SyS_gettimeofday+0x2c/0xcc)
[70131.136190] [<8006dc4c>] (SyS_gettimeofday) from [<8000ed40>] (ret_fast_syscall+0x0/0x30)
Comment 1 john stultz 2015-12-07 19:42:57 UTC
Thanks for the bug report.

I think an alternative patch for this is under consideration here:
https://lkml.org/lkml/2015/11/29/219

I'll see about tagging it for -stable so it gets backported as well.
Comment 2 Bernd Edlinger 2016-02-05 09:20:05 UTC
Hi,

meanwhile I tracked down why the IRQ was disabled for so long time,
which obviously happened without using the debugger.

You probably won't believe it: See pr111951.

Note You need to log in before you can comment on or make changes to this bug.