Bug 111951 - dead-lock in __schedule with IRQs disabled
Summary: dead-lock in __schedule with IRQs disabled
Status: NEW
Alias: None
Product: Platform Specific/Hardware
Classification: Unclassified
Component: ARM (show other bugs)
Hardware: All Linux
: P1 normal
Assignee: linux-arm-kernel@lists.arm.linux.org.uk
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-05 09:15 UTC by Bernd Edlinger
Modified: 2016-02-05 09:15 UTC (History)
0 users

See Also:
Kernel Version: 3.18
Subsystem:
Regression: No
Bisected commit-id:


Attachments

Description Bernd Edlinger 2016-02-05 09:15:16 UTC
We had an issue with an Altera CycloneV SoC using Cortex A9 r3p0 with ECO Fix.

The symptoms are sporadic watchdog reboots on a linux 3.18 SMP Kernel.

I tracked that down by disabling the watchdog, and instrumenting the kernel
to catch the deadlock where it happens.  It turned out to be not a real
dead-lock but instead the system blocks both cores waiting for the same
interrupt spin-lock for about a minute and then suddenly continues.  The
instrumentaition traps, when the timer IRQ is late for more than 1 second.
It always happend immediately after __schedule re-enables the IRQs, but not
before.  It is however very difficult to reproduce, I have one system on my
desk where it happens about once per day, and another identical system where
it never happened.

The instrumentation was as follows (after several iterations, the bug is no
longer reproducible when I put debug-code inside the code section where
__schedule disables IRQs).

Note: the change in timekeeping_get_ns was suggested in
https://bugzilla.kernel.org/show_bug.cgi?id=108991
because due to the long delays we had signed 64-bit integer overflows here,
which causes additional trouble.

Index: linux-socfpga/kernel/time/timekeeping.c
===================================================================
--- linux-socfpga/kernel/time/timekeeping.c	(revision 5540)
+++ linux-socfpga/kernel/time/timekeeping.c	(working copy)
@@ -202,8 +202,7 @@ static inline s64 timekeeping_get_ns(struct tk_rea
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
-	nsec = delta * tkr->mult + tkr->xtime_nsec;
-	nsec >>= tkr->shift;
+	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
 
 	/* If arch requires, add in get_arch_timeoffset() */
 	return nsec + arch_gettimeoffset();
@@ -503,6 +502,7 @@ int __getnstimeofday64(struct timespec64 *ts)
 		nsecs = timekeeping_get_ns(&tk->tkr);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsecs >= 2 * NSEC_PER_SEC);
 
 	ts->tv_nsec = 0;
 	timespec64_add_ns(ts, nsecs);
@@ -544,6 +544,7 @@ ktime_t ktime_get(void)
 		nsecs = timekeeping_get_ns(&tk->tkr);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsecs >= 2 * NSEC_PER_SEC);
 
 	return ktime_add_ns(base, nsecs);
 }
@@ -570,6 +571,7 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs
 		nsecs = timekeeping_get_ns(&tk->tkr);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsecs >= 2 * NSEC_PER_SEC);
 
 	return ktime_add_ns(base, nsecs);
 
@@ -612,6 +614,7 @@ ktime_t ktime_get_raw(void)
 		nsecs = timekeeping_get_ns_raw(tk);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsecs >= 2 * NSEC_PER_SEC);
 
 	return ktime_add_ns(base, nsecs);
 }
@@ -641,6 +644,7 @@ void ktime_get_ts64(struct timespec64 *ts)
 		tomono = tk->wall_to_monotonic;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsec >= 2 * NSEC_PER_SEC);
 
 	ts->tv_sec += tomono.tv_sec;
 	ts->tv_nsec = 0;
@@ -913,6 +917,7 @@ void getrawmonotonic(struct timespec *ts)
 		ts64 = tk->raw_time;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsecs >= 2 * NSEC_PER_SEC);
 
 	timespec64_add_ns(&ts64, nsecs);
 	*ts = timespec64_to_timespec(ts64);
@@ -1732,6 +1737,7 @@ ktime_t ktime_get_update_offsets_now(ktime_t *offs
 		*offs_boot = tk->offs_boot;
 		*offs_tai = tk->offs_tai;
 	} while (read_seqcount_retry(&tk_core.seq, seq));
+	BUG_ON(nsecs >= 2 * NSEC_PER_SEC);
 
 	return ktime_add_ns(base, nsecs);
 }
Index: linux-socfpga/kernel/time/hrtimer.c
===================================================================
--- linux-socfpga/kernel/time/hrtimer.c	(revision 5540)
+++ linux-socfpga/kernel/time/hrtimer.c	(working copy)
@@ -1200,6 +1200,8 @@ static void __run_hrtimer(struct hrtimer *timer, k
 	struct hrtimer_cpu_base *cpu_base = base->cpu_base;
 	enum hrtimer_restart (*fn)(struct hrtimer *);
 	int restart;
+	ktime_t start_time, current_time;
+	start_time = ktime_get_raw();
 
 	WARN_ON(!irqs_disabled());
 
@@ -1232,6 +1234,8 @@ static void __run_hrtimer(struct hrtimer *timer, k
 	WARN_ON_ONCE(!(timer->state & HRTIMER_STATE_CALLBACK));
 
 	timer->state &= ~HRTIMER_STATE_CALLBACK;
+	current_time = ktime_get_raw();
+	BUG_ON((u64)(current_time.tv64 - start_time.tv64) > 99999999);
 }
 
 #ifdef CONFIG_HIGH_RES_TIMERS
Index: linux-socfpga/kernel/sched/core.c
===================================================================
--- linux-socfpga/kernel/sched/core.c	(revision 5540)
+++ linux-socfpga/kernel/sched/core.c	(working copy)
@@ -2786,6 +2786,7 @@ need_resched:
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
 	 */
+	ktime_get_raw();
 	smp_mb__before_spinlock();
 	raw_spin_lock_irq(&rq->lock);
 


The only thing that can possibly wait in __schedule are raw_spin_locks.
And it turns out, that the wfe instruction might be causing the problems:

This patch seems to fix the deadlock:

Index: linux-socfpga/arch/arm/include/asm/spinlock.h
===================================================================
--- linux-socfpga/arch/arm/include/asm/spinlock.h	(revision 5540)
+++ linux-socfpga/arch/arm/include/asm/spinlock.h	(working copy)
@@ -73,7 +73,7 @@
 	: "cc");
 
 	while (lockval.tickets.next != lockval.tickets.owner) {
-		wfe();
+		cpu_relax();
 		lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
 	}
 
But this does _not_ fix the deadlock:

Index: linux-socfpga/arch/arm/include/asm/spinlock.h
===================================================================
--- linux-socfpga/arch/arm/include/asm/spinlock.h	(revision 5540)
+++ linux-socfpga/arch/arm/include/asm/spinlock.h	(working copy)
@@ -74,6 +74,7 @@
 
 	while (lockval.tickets.next != lockval.tickets.owner) {
 		wfe();
+		isb();
 		lockval.tickets.owner = ACCESS_ONCE(lock->tickets.owner);
 	}
 


Also this does _not_ fix the deadlock:

Index: linux-socfpga/arch/arm/include/asm/spinlock.h
===================================================================
--- linux-socfpga/arch/arm/include/asm/spinlock.h	(revision 5540)
+++ linux-socfpga/arch/arm/include/asm/spinlock.h	(working copy)
@@ -40,6 +40,8 @@
 
 	dsb(ishst);
 	__asm__(SEV);
+	isb(sy);
+	__asm__(SEV);
 }
 
 /*


It may play a role, that the Kernel does _not_ use the ARM_ARCH_TIMER
on the Altera CycloneV SoC and therefore does not enable the SEV Event Stream.

I read the latest Cortex A9 errata document and can not find any
known erratum of that kind.

Currently we are using this patch which seems to work for us:

Index: arch/arm/include/asm/spinlock.h
===================================================================
--- arch/arm/include/asm/spinlock.h    (Revision 5580)
+++ arch/arm/include/asm/spinlock.h    (Arbeitskopie)
@@ -7,6 +7,12 @@
 
 #include
 
+#ifdef CONFIG_ARM_ERRATA_WFE_BROKEN
+#undef wfe
+#define wfe()
+#define WFE(x)
+#define dsb_sev()
+#else
 /*
  * sev and wfe are ARMv6K extensions.  Uniprocessor ARMv6 may not have the K
  * extensions, so when running on UP, we have to patch these instructions away.
@@ -41,6 +47,7 @@
     dsb(ishst);
     __asm__(SEV);
 }
+#endif
 
 /*
  * ARMv6 ticket-based spin-locking.
Index: arch/arm/Kconfig
===================================================================
--- arch/arm/Kconfig    (Revision 5580)
+++ arch/arm/Kconfig    (Arbeitskopie)
@@ -1253,6 +1253,13 @@
       loop buffer may deliver incorrect instructions. This
       workaround disables the loop buffer to avoid the erratum.
 
+config ARM_ERRATA_WFE_BROKEN
+    bool "ARM errata: wfe broken"
+    depends on CPU_V7 && SMP
+    help
+      This option disables the wfe instruction for raw_spin_locks.
+      At least some CycloneV SoC devices wait much too long in wfe.
+
 endmenu
 
 source "arch/arm/common/Kconfig"


Further testing showed that the time window for the dead-lock to appear
must be very small.  It is dependent on the linux and the gcc version
as well:  It is definitely reproducible with gcc-6-20151108 and
gcc-6-20151213 and linux-3.18.

But on the other hand it does not reproduce on every (identical) device.
I have two affected systems on my desk and one which is not affected.

Note You need to log in before you can comment on or make changes to this bug.