Bug 195235

Summary: llist.h llist_for_each_entry/llist_for_each_entry_safe have undefined behavior
Product: Other Reporter: Erich Keane (erich.keane)
Component: OtherAssignee: other_other
Status: NEW ---    
Severity: high CC: erich.keane, sos22-lk, ying.huang
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: 3.13->Trunk Subsystem:
Regression: No Bisected commit-id:

Description Erich Keane 2017-04-03 19:22:17 UTC
The above 2 macros (the first copied below for convenience) have a very nasty bit of UB in their exit condition:

#define llist_for_each_entry(pos, node, member)				\
	for ((pos) = llist_entry((node), typeof(*(pos)), member);	\
	     &(pos)->member != NULL;					\
(pos) = llist_entry((pos)->member.next, typeof(*(pos)), member))

Note the exit condition &(pos)->member != NULL.

For clarification, note the operator precedence makes this the equivalent to:
&((pos)->member) != NULL.

The problem here is that dereferencing a NULL pointer is Undefined behavior, so the Clang compiler (and presumably others) are able to assume that the address of "member" is ALSO not a NULL.  Therefore, emitting an infinite loop is permissible.

Additionally, llist_for_each_entry_safe does the same thing, however it seems to omit the parends around "pos" for some reason.

I believe that the above macro (and the corresponding one) should be the following, though someone with commit rights should likely validate the purpose of this function.
#define llist_for_each_entry(pos, node, member)				\
	for ((pos) = llist_entry((node), typeof(*(pos)), member);	\
	     pos != NULL;					\
(pos) = llist_entry((pos)->member.next, typeof(*(pos)), member))
Comment 1 sos22-lk 2017-04-03 19:52:40 UTC
Is that right? I thought the marker for end of an llist was that member.next is NULL, which isn't quite the same condition as llist_entry(member.next) being null?

Maybe something like this:

#define llist_for_each_entry(pos, node, member)				                \
	for ((pos) = (node) ? llist_entry((node), typeof(*(pos)), member) : NULL;	\
	     (pos) != NULL;					                        \
             (pos) = (pos)->member.next ? llist_entry((pos)->member.next, typeof(*(pos)), member) : NULL)

Although adding a load of extra branches just to avoid unfortunate compiler optimisations is kind of sad. Or maybe:

#define llist_for_each_entry(pos, node, member)				\
	for ((pos) = llist_entry((node), typeof(*(pos)), member);	\
	     (intptr_t)(pos) + offsetof(typeof(*pos), member) != 0;					\
             (pos) = llist_entry((pos)->member.next, typeof(*(pos)), member))
Comment 2 Erich Keane 2017-04-03 19:58:33 UTC
If thats the case, that isn't what the compiler is doing, whether or not there is an optimizer.

Assuming ZERO optimizers, &pos->member != NULL will only be false in the event that "member" is the first member, AND POS is NULL, right?  Otherwise why would "(intptr_t)(pos) + offsetof(typeof(*pos), member)" be zero?
Comment 3 sos22-lk 2017-04-03 20:28:30 UTC
Well, no. In the absence of optimisation, &pos->member is null if member is 10 bytes into the pos type and pos is itself -10. -10 is not null, so &pos->member can be null even when pos isn't. llist_entry(NULL, type, member) returns -offsetof(type, member), so llist_entry(X, type, member) + offsetof(type, member) (with appropriate casts) should be zero precisely when X is null, so my suggestion preserves the existing behavior in the absence of optimization. I would guess (emphasize guess) that it'd be enough to stop the compiler breaking things for you with optimizations on.

A couple of things to note, though:

1) I've not made any attempt to check any of this by actually happens, so it's possible that I've just misunderstood something.

2) The kernel bugzilla doesn't get as much attention as you might expect. If you actually want this fixed, you'd probably be better off posting on the mailing list, or maybe going directly to rev(com, intel., at, huang, ying.), since he wrote that macro.

3) When you make a report, try to include a minimal demonstration of something going wrong. At a minimum, you should probably include an example of the macro being used in a way which you think leads to incorrect behavior, and the compiler you were testing with.
Comment 4 Erich Keane 2017-04-03 20:36:42 UTC
Wow... thats ALSO likely UB (pointer wrapping, since the underlying type of a pointer isn't a signed or unsigned type!).

Fortunately, Ying is apparently from my company as well, so I'll follow up via email. Thanks!
-Erich
Comment 5 Huang Ying 2017-04-05 01:01:13 UTC
There is some discussion on LKML about this, please refer to it for Linux kernel developers' opinions on this issue.

https://lkml.org/lkml/2017/1/15/194