Hi, observed with 3.18 kernel, but bug is present in recent kernel too. We use a 32-bit counter time source which runs at 100 MHz, so the time wraps at 42.9 sec. So we have mult=0xA0000000 and shift=28 It happened once, that delta was very large, like 0xF0000000 and the intermediate nsec value thus became negative. Now the shift also produced a negative value. This value is later used in unsigned arithmetic. And the problem is, that timespec64_add_ns took around 40 seconds to do the division by repeatedly subtracting 1 billion. Reproducable when stopping the target in the debugger and resuming at the right time. This is the suggested patch: --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -301,7 +301,7 @@ static inline u32 arch_gettimeoffset(void) { return 0; } static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) { cycle_t delta; - s64 nsec; + u64 nsec; delta = timekeeping_get_delta(tkr); This is what was in the original call stack: [70130.983117] INFO: rcu_sched self-detected stall on CPU { 0} (t=2100 jiffies g=2221030 c=2221029 q=24782) [70130.991415] Task dump for CPU 0: [70130.991423] /usr/sbin/apach R running 0 893 774 0x00000002 [70130.991464] [<80017d20>] (unwind_backtrace) from [<80013408>] (show_stack+0x20/0x24) [70130.997890] [<80013408>] (show_stack) from [<8004b4bc>] (sched_show_task+0xb0/0x104) [70131.004308] [<8004b4bc>] (sched_show_task) from [<8004e100>] (dump_cpu_task+0x48/0x50) [70131.010909] [<8004e100>] (dump_cpu_task) from [<80068b88>] (rcu_dump_cpu_stacks+0x9c/0xd8) [70131.017846] [<80068b88>] (rcu_dump_cpu_stacks) from [<8006c1f8>] (rcu_check_callbacks+0x4a8/0x738) [70131.025477] [<8006c1f8>] (rcu_check_callbacks) from [<8007077c>] (update_process_times+0x50/0x70) [70131.033027] [<8007077c>] (update_process_times) from [<80081074>] (tick_sched_timer+0x78/0x264) [70131.040396] [<80081074>] (tick_sched_timer) from [<80071198>] (__run_hrtimer+0x90/0x1bc) [70131.047159] [<80071198>] (__run_hrtimer) from [<80071920>] (hrtimer_interrupt+0x134/0x324) [70131.054093] [<80071920>] (hrtimer_interrupt) from [<800164f0>] (twd_handler+0x40/0x50) [70131.060684] [<800164f0>] (twd_handler) from [<80064d10>] (handle_percpu_devid_irq+0x90/0x124) [70131.067877] [<80064d10>] (handle_percpu_devid_irq) from [<80060a4c>] (generic_handle_irq+0x3c/0x4c) [70131.075589] [<80060a4c>] (generic_handle_irq) from [<80060d10>] (__handle_domain_irq+0x6c/0xb4) [70131.082953] [<80060d10>] (__handle_domain_irq) from [<800086bc>] (gic_handle_irq+0x34/0x6c) [70131.089973] [<800086bc>] (gic_handle_irq) from [<80013f80>] (__irq_svc+0x40/0x54) [70131.096124] Exception stack(0xbe9ffee0 to 0xbe9fff28) [70131.099859] fee0: c4653600 ffffffff fdfcc63c 93877963 d1e0e8e8 00000000 be9fff60 00000004 [70131.106705] ff00: 3b9ac9ff 00000000 00d5e820 be9fff5c 0000001c be9fff28 fffffffc 80076e90 [70131.113547] ff20: a00f0013 ffffffff [70131.115729] [<80013f80>] (__irq_svc) from [<80076e90>] (__getnstimeofday64+0xe4/0x14c) [70131.122316] [<80076e90>] (__getnstimeofday64) from [<80078080>] (do_gettimeofday+0x24/0x6c) [70131.129337] [<80078080>] (do_gettimeofday) from [<8006dc4c>] (SyS_gettimeofday+0x2c/0xcc) [70131.136190] [<8006dc4c>] (SyS_gettimeofday) from [<8000ed40>] (ret_fast_syscall+0x0/0x30)
Thanks for the bug report. I think an alternative patch for this is under consideration here: https://lkml.org/lkml/2015/11/29/219 I'll see about tagging it for -stable so it gets backported as well.
Hi, meanwhile I tracked down why the IRQ was disabled for so long time, which obviously happened without using the debugger. You probably won't believe it: See pr111951.