Bug 202063
Summary: [Regression] Spinlock not released on kernel 4.9.147 by i915, CPU stuck
Product: Memory Management
Component: Other
Status: RESOLVED CODE_FIX
Severity: high
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 4.9.150
Regression: Yes
Reporter: ValdikSS (iam)
Assignee: Andrew Morton (akpm)
CC: airlied, chris, gordan, justincase, longman, paulmck, peterz, rostedt, will.deacon
Attachments: Proposed fix from David Airlie
Description
ValdikSS
2018-12-25 17:47:55 UTC
Also happens on 4.9.148.

Still happens on 4.9.150.

On Sat, Dec 29, 2018 at 09:46:38PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=202063
>
> --- Comment #1 from ValdikSS (iam@valdikss.org.ru) ---

Could you please try bisecting between 4.9.146 and 4.9.147? That should help pinpoint the offending commit.

Thanx, Paul

I'm pretty sure the problem is in the spinlock patch series. Do you want me to determine the exact patch within the patchset?

On Tue, Jan 15, 2019 at 04:07:46PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #4 from ValdikSS (iam@valdikss.org.ru) ---

Use whatever variant of bisection you like. As long as it finds the offending commit, it is no skin off my teeth. ;-)

Thanx, Paul

    git bisect start
    # bad: [bbfc30f29cb328111fec12975ded8223ecc8e1a5] Linux 4.9.147
    git bisect bad bbfc30f29cb328111fec12975ded8223ecc8e1a5
    # good: [0cff89461d557239296735d18b5a144c8f4b151b] Linux 4.9.146
    git bisect good 0cff89461d557239296735d18b5a144c8f4b151b
    # bad: [3e5d4c14a7427dc2a24737c8dcc61688870d737a] mac80211_hwsim: fix module init error paths for netlink
    git bisect bad 3e5d4c14a7427dc2a24737c8dcc61688870d737a
    # good: [af20483dbd7c2a01f7874191524fc0397b9d3bec] Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec"
    git bisect good af20483dbd7c2a01f7874191524fc0397b9d3bec
    # bad: [60668f3cddf1b25a954b198cade0ce726a6853ab] locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'
    git bisect bad 60668f3cddf1b25a954b198cade0ce726a6853ab
    # good: [d395117fac7943da6966ccbac3b95651f5581f15] IB/hfi1: Remove race conditions in user_sdma send path
    git bisect good d395117fac7943da6966ccbac3b95651f5581f15
    # good: [48c42d4dfec408760d15acc334d91208a6b2262e] locking/qspinlock: Ensure node is initialised before updating prev->next
    git bisect good 48c42d4dfec408760d15acc334d91208a6b2262e
    # good: [8e5b3bcc5291092aaac4cadc0b5fb46182172ed3] locking/qspinlock: Bound spinning on pending->locked transition in slowpath
    git bisect good 8e5b3bcc5291092aaac4cadc0b5fb46182172ed3
    # first bad commit: [60668f3cddf1b25a954b198cade0ce726a6853ab] locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'

    60668f3cddf1b25a954b198cade0ce726a6853ab is the first bad commit
    commit 60668f3cddf1b25a954b198cade0ce726a6853ab
    Author: Will Deacon <will.deacon@arm.com>
    Date:   Tue Dec 18 23:10:43 2018 +0100

        locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'

        commit 625e88be1f41b53cec55827c984e4a89ea8ee9f9 upstream.

        'struct __qspinlock' provides a handy union of fields so that
        subcomponents of the lockword can be accessed by name, without having to
        manage shifts and masks explicitly and take endianness into account.

        This is useful in qspinlock.h and also potentially in arch headers, so
        move the 'struct __qspinlock' into 'struct qspinlock' and kill the extra
        definition.

        Signed-off-by: Will Deacon <will.deacon@arm.com>
        Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
        Acked-by: Waiman Long <longman@redhat.com>
        Acked-by: Boqun Feng <boqun.feng@gmail.com>
        Cc: Linus Torvalds <torvalds@linux-foundation.org>
        Cc: Thomas Gleixner <tglx@linutronix.de>
        Cc: linux-arm-kernel@lists.infradead.org
        Cc: paulmck@linux.vnet.ibm.com
        Link: http://lkml.kernel.org/r/1524738868-31318-3-git-send-email-will.deacon@arm.com
        Signed-off-by: Ingo Molnar <mingo@kernel.org>
        Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
        Signed-off-by: Sasha Levin <sashal@kernel.org>

On Fri, Jan 18, 2019 at 07:27:12AM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #6 from ValdikSS (iam@valdikss.org.ru) ---
> git bisect start
> [...]
> # first bad commit: [60668f3cddf1b25a954b198cade0ce726a6853ab]
> locking/qspinlock: Merge 'struct __qspinlock' into 'struct qspinlock'

Thank you!

Does this happen on mainline? As in, is this a bug in mainline or a bug in backporting a fix? Does reverting this patch in -stable make the problem go away?

Adding Boqun on CC, as the rest are CCed on the bugzilla.

Thanx, Paul
I have been running kernels 4.18 and 4.19 (up to 4.19.15) without any problems, so I think the bug exists only in 4.9.147+. I have not tested the other LTS kernels (4.4, 4.14).

On Fri, Jan 18, 2019 at 06:51:15PM +0000, bugzilla-daemon@bugzilla.kernel.org wrote:
> --- Comment #8 from ValdikSS (iam@valdikss.org.ru) ---

OK, I will bite... Perhaps this commit should not have been backported to 4.9-stable in the first place. So does reverting it in 4.9.147+ help?

Thanx, Paul

Sorry, I have already deleted the kernel sources and can't try right now. It is probably better to wait for a reply from the commit author.

This commit does not revert cleanly. Reverting c6bcf40f769294a80c64213f9175ccd408d64532 through c3b6e79fbf295c9cda4dd1828a8f0593cad53d48 allows both this kernel and 4.9.151 to work here.

Okay, I looked at this with Will yesterday but got distracted by the fact that CONFIG_PARAVIRT_SPINLOCKS must not be set for the problem to occur. It appears that the first chunk of the indicated patch is what causes it; the following change seems to fix it for me:

    diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
    index e07cc206919d..eaba08076030 100644
    --- a/arch/x86/include/asm/qspinlock.h
    +++ b/arch/x86/include/asm/qspinlock.h
    @@ -14,7 +14,7 @@
      */
     static inline void native_queued_spin_unlock(struct qspinlock *lock)
     {
    -	smp_store_release(&lock->locked, 0);
    +	smp_store_release((u8 *)lock, 0);
     }

Comparing the object code of the working and failing builds:

    [airlied@carbonite linux]$ diff ../works-obj ../fails-obj
    317c317
    <  224:	c6 43 10 00          	movb   $0x0,0x10(%rbx)
    ---
    >  224:	c6 43 13 00          	movb   $0x0,0x13(%rbx)
    492c492
    <  3a0:	41 c6 44 24 10 00    	movb   $0x0,0x10(%r12)
    ---
    >  3a0:	41 c6 44 24 13 00    	movb   $0x0,0x13(%r12)
    1558c1558
    <  dde:	c6 80 ac a4 00 00 00 	movb   $0x0,0xa4ac(%rax)
    ---
    >  dde:	c6 80 af a4 00 00 00 	movb   $0x0,0xa4af(%rax)

Which doesn't look good.
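The 3-byte shift in the disassembly (0x10 versus 0x13, 0xa4ac versus 0xa4af) can be illustrated with a small userspace model. This is a sketch under assumptions: it supposes that after the backport, `struct qspinlock` carries an endian-dependent union similar to the upstream one, where `locked` is byte 0 of the 32-bit lock word when `__LITTLE_ENDIAN` is defined and byte 3 in the big-endian branch. If that macro is not visible where the header is compiled (consistent with the missing byteorder.h include mentioned in this thread), an x86 build would pick the big-endian layout, so the unlock store clears the wrong byte and the lock still reads as held. All struct and function names below are hypothetical, not kernel APIs, and the model ignores the release ordering that `smp_store_release()` provides. `unlock_via_first_byte()` mirrors the `(u8 *)lock` change from the diff, which always writes byte 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace model of the two layouts an endian-dependent
 * qspinlock union can take. Not the kernel header; names are illustrative.
 */

/* Layout used when __LITTLE_ENDIAN is defined: 'locked' is byte 0. */
struct qspinlock_le {
	union {
		uint32_t val;
		struct { uint8_t locked; uint8_t pending; };
		struct { uint16_t locked_pending; uint16_t tail; };
	};
};

/* Layout used otherwise (big-endian branch): 'locked' is byte 3. */
struct qspinlock_be {
	union {
		uint32_t val;
		struct { uint16_t tail; uint16_t locked_pending; };
		struct { uint8_t reserved[2]; uint8_t pending; uint8_t locked; };
	};
};

_Static_assert(sizeof(struct qspinlock_le) == 4, "lock word is 32 bits");
_Static_assert(sizeof(struct qspinlock_be) == 4, "lock word is 32 bits");

/* Byte offset of 'locked' in each layout; independent of the host. */
void check_locked_offsets(void)
{
	assert(offsetof(struct qspinlock_le, locked) == 0);
	assert(offsetof(struct qspinlock_be, locked) == 3);
}

/* Returns 1 if the host stores the least significant byte first (x86-64). */
int host_is_little_endian(void)
{
	uint32_t one = 1;
	uint8_t first_byte;
	memcpy(&first_byte, &one, 1);
	return first_byte == 1;
}

/* Lock word = 1 (locked bit set), then "unlock" by storing 0 to 'locked'. */
uint32_t unlock_with_le_layout(void)
{
	struct qspinlock_le lock;
	lock.val = 1;
	lock.locked = 0;        /* clears byte 0 of the lock word */
	return lock.val;
}

uint32_t unlock_with_be_layout(void)
{
	struct qspinlock_be lock;
	lock.val = 1;
	lock.locked = 0;        /* clears byte 3; on x86 the locked bit survives */
	return lock.val;
}

/* Mirrors the (u8 *)lock change: clear byte 0 regardless of the layout. */
uint32_t unlock_via_first_byte(void)
{
	struct qspinlock_be lock;   /* even with the wrong layout selected */
	lock.val = 1;
	*(uint8_t *)&lock = 0;      /* byte access; uint8_t is unsigned char on GCC/Clang */
	return lock.val;
}
```

On a little-endian host such as x86-64, `unlock_with_be_layout()` returns 1: the lock word still looks held after the "unlock", which is what a CPU spinning forever on an unreleased lock would observe.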
There is a missing byteorder.h include somewhere.

*** Bug 202295 has been marked as a duplicate of this bug. ***

Created attachment 280683 [details]
Proposed fix from David Airlie
Please can you try the attached fix from David Airlie?
Fix committed in 4.9.153.

Works now, thanks.