Bug 95431 - regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
Summary: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb...
Status: CLOSED CODE_FIX
Alias: None
Product: Timers
Classification: Unclassified
Component: Other (show other bugs)
Hardware: i386 Linux
: P1 normal
Assignee: john stultz
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-24 19:49 UTC by Trevor Cordes
Modified: 2016-08-17 17:49 UTC (History)
0 users

See Also:
Kernel Version: kernel-PAE-3.19.1-201.fc21.i686
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
(probably) final patch to fix this bug, from John Stultz @ LKML (2.61 KB, patch)
2015-05-11 16:46 UTC, Trevor Cordes
Details | Diff

Description Trevor Cordes 2015-03-24 19:49:51 UTC
Symptom: since upgrading from Fedora 19 (3.16) to Fedora 21 (3.17+) my MythTV infrared blaster (using lirc) stopped working.  It would blast one IR command usually and then stop blasting for a while.  I am using 32-bit PAE on this old MythTV box.

I have successfully bisected and located a commit that is causing my
problem.  Look at commit 166afb64511.  I have confirmed 100% this is causing the problem.

Here's my guess at the bug:
ktime_to_us returns s64, but the commit changes it so ktime_to_us just
returns what ktime_divns returns, and ktime_divns returns a u64!  If the
u64 is big enough, wouldn't it wrap s64 around to a negative number?  Or,
perhaps if some caller is passing in negative ktime_t to begin with it
will trigger without having to hit big numbers.  With my limited knowledge
of C, I am stabbing in the dark here.

This little patch (at bottom of email) that puts the code back in place
and gets rid of the function call fixes the problem for me.  I applied
this patch to the very latest FC21 kernel-PAE-3.19.1-201.fc21.i686 src.rpm
and rpmbuilded and the bug is gone!  I can once again MythTV.  Hooray.

I suspect no one else is seeing this because less people are running
32-bit now, and perhaps in most code paths the value of the u64 never gets
above 2^63.  I suspect something in drivers/media (possibly) is passing
very high or negative values (possibly another bug) to these calls.

Obviously my patch isn't the real solution, the real solution is to make
the new function calls use a consistent 64-bit type, or figure out what in
my code path is calling these functions and check it for value sanity.

I've documented the whole process / details of this bug in RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1200353

diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h
--- a/include/linux/ktime.h     2015-02-08 20:54:22.000000000 -0600
+++ b/include/linux/ktime.h     2015-03-23 01:09:43.000000000 -0500
@@ -173,12 +173,16 @@

 static inline s64 ktime_to_us(const ktime_t kt)
 {
-	return ktime_divns(kt, NSEC_PER_USEC);
+/*     return ktime_divns(kt, NSEC_PER_USEC); */
+	struct timeval tv = ktime_to_timeval(kt);
+	return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
 }

 static inline s64 ktime_to_ms(const ktime_t kt)
 {
-	return ktime_divns(kt, NSEC_PER_MSEC);
+/*     return ktime_divns(kt, NSEC_PER_MSEC); */
+	struct timeval tv = ktime_to_timeval(kt);
+	return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / USEC_PER_MSEC;
 }

 static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier)
Comment 1 Trevor Cordes 2015-05-11 16:46:26 UTC
Created attachment 176431 [details]
(probably) final patch to fix this bug, from John Stultz @ LKML
Comment 2 Trevor Cordes 2015-05-11 16:48:32 UTC
LKML guys helped me out and came up with a "real" patch that solves this bug.  Here is the writeup from John Stultz:

"
It was noted that the 32bit implementation of ktime_divns()
was doing unsigned division and didn't properly handle
negative values.

And when a ktime helper was changed to utilize
ktime_divns, it caused a regression on some IR blasters.
See the following bugzilla for details:
  https://bugzilla.redhat.com/show_bug.cgi?id=1200353

This patch fixes the problem in ktime_divns by checking
and preserving the sign bit, and then reapplying it if
appropriate after the division, it also changes the return
type to a s64 to make it more obvious this is expected.

Nicolas also pointed out that negative dividers would
cause infinite loops on 32bit systems, negative dividers
is unlikely for users of this function, but out of caution
this patch adds checks for negative dividers for both
32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure
no such use cases creep in.
"

This bug is probably fixed now.  Will close this bug when it appears that the patch has been accepted.  Thanks everyone!
Comment 3 Trevor Cordes 2015-06-01 06:49:18 UTC
This bug is now fixed in stock Fedora 21 kernel (4.0.4-202.fc21.i686+PAE).  I assume that means it is also committed to vanilla.  As such I'll mark this ticket as closed.  Please see the RHBZ (link at top) for more details.

Thanks everyone!!

Note You need to log in before you can comment on or make changes to this bug.