Bug 60774

Summary: Userspace SEGV using latest kernels
Product: Memory Management
Reporter: Jeff Shorey (shoreyjeff)
Component: Other
Assignee: Andrew Morton (akpm)
Status: CLOSED CODE_FIX
Severity: normal
CC: alan, dsmythies, petr, xnox
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: 3.10.8 and 3.11-rc6
Subsystem:
Regression: No
Bisected commit-id:
Attachments: Java post-mortem dump

Description Jeff Shorey 2013-08-20 21:03:18 UTC
Created attachment 107257
Java post-mortem dump

Running MATLAB's startup Java (or the installer) results in a SEGV. Java post-mortem attached. The only difference is the kernel: 3.10.7 is fine, and so is 3.11-rc5. Running on an i7-3720QM.

Could it be the TLB updates that were applied to both kernels? I would have to bisect to find exactly which patch causes the problem.
Comment 1 Andrew Morton 2013-08-20 22:50:03 UTC
Linus fixed a TLB issue in 2b047252d087be7f2ba088b4933cd904f92e6fce, which was released in 3.11-rc6.  But that bug appears to have been there since 3.7.2.

Yes, could you please bisect it? That would be tremendously helpful.
Comment 2 Jeff Shorey 2013-08-21 02:51:08 UTC
OK, I bisected it. It's not the TLB after all ;-). Here is the change that causes my (fortunately very repeatable) SEGV behavior, verified in both 3.10.8 and 3.11-rc6:

> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index dbded5a..48f8375 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -101,7 +101,7 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>                               *begin = new_begin;
>               }
>       } else {
> -             *begin = TASK_UNMAPPED_BASE;
> +             *begin = mmap_legacy_base();
>               *end = TASK_SIZE;
>       }
>  }

Sorry, I don't know which commit this change belongs to, as I'm not adept at navigating the git history.
Comment 3 Jeff Shorey 2013-08-21 02:58:40 UTC
My guess is this commit:

 x86 get_unmapped_area(): use proper mmap base for bottom-up direction
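For readers less familiar with this code path, here is a user-space model of the bottom-up branch that the hunk changes. It is a sketch, not kernel code, and rests on assumptions: x86-64 with 4 KiB pages, TASK_SIZE = 2^47 - PAGE_SIZE, TASK_UNMAPPED_BASE = PAGE_ALIGN(TASK_SIZE / 3), and mmap_legacy_base() modelled as TASK_UNMAPPED_BASE plus a page-granular random offset (my reading of the 3.10-era arch/x86/mm/mmap.c). All `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 constants: 4 KiB pages, 47-bit user address space. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_TASK_SIZE ((1ULL << 47) - MODEL_PAGE_SIZE)

/* PAGE_ALIGN(): round up to the next page boundary. */
static uint64_t model_page_align(uint64_t addr)
{
    return (addr + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* TASK_UNMAPPED_BASE modelled as PAGE_ALIGN(TASK_SIZE / 3). */
static uint64_t model_task_unmapped_base(void)
{
    return model_page_align(MODEL_TASK_SIZE / 3);
}

/* Before the patch: bottom-up searches always start at the fixed base. */
static uint64_t model_begin_before_patch(void)
{
    return model_task_unmapped_base();
}

/*
 * After the patch: start at mmap_legacy_base(), modelled as the fixed
 * base plus random_pages pages. random_pages is a hypothetical stand-in
 * for the kernel's own randomization.
 */
static uint64_t model_begin_after_patch(uint64_t random_pages)
{
    uint64_t begin = model_task_unmapped_base() + random_pages * MODEL_PAGE_SIZE;

    /* Sanity check: the search must still start inside the user range. */
    assert(begin < MODEL_TASK_SIZE);
    return begin;
}
```

Under this model the fixed base works out to 0x2aaaaaaab000, and the two versions agree only when the random offset is zero; otherwise the bottom-up search no longer starts at that fixed address.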
Comment 4 Dmitrijs Ledkovs 2013-08-23 14:00:21 UTC
Well, javac also SEGVs on me when building the Android open source tree:
 * openjdk-6 and oracle-jdk-6 crash
 * openjdk-7 is fine

Rebooting into a 3.10.x kernel resolved the issue. Would a minimal javac invocation reproducer be of interest? This could be a bug in Java rather than in the kernel.
Comment 5 Petr Vandrovec 2013-08-24 22:34:14 UTC
FYI, Java crashes only if you run with 'ulimit -s unlimited'. If you set the stack limit to something reasonable (8 MB), there is no crash. Unfortunately, 'make' seems to set the stack size to unlimited if it can; setting 'ulimit -Hs 8192' before running 'make' fixed the crashes on my boxes. I'll use that until I can check whether 41aacc1eea645c99edbe8fbcf78a97dc9b862adc fixes it.
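The same workaround can be applied from a wrapper program instead of the shell. The sketch below is an assumed equivalent of running 'ulimit -Hs 8192' before 'make': it lowers the hard RLIMIT_STACK (and the soft limit with it) to at most the given size with setrlimit(2), so a child such as make cannot raise it back to unlimited. The function name `cap_stack_limit_kb` is hypothetical. As I understand the 3.10-era x86 code, an unlimited stack limit is what selects the legacy bottom-up mmap layout, which would explain why capping it avoids the crash.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/*
 * Cap RLIMIT_STACK at limit_kb kilobytes: the hard limit is lowered to the
 * cap, and the soft limit is kept within the hard limit. Limits that are
 * already lower are left alone, since an unprivileged process may lower
 * its hard limit but not raise it. Returns 0 on success, -1 on failure
 * (after printing an error message).
 */
static int cap_stack_limit_kb(rlim_t limit_kb)
{
    struct rlimit rl;
    rlim_t cap = limit_kb * 1024;

    assert(limit_kb > 0);
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_max > cap)          /* RLIM_INFINITY is the largest rlim_t */
        rl.rlim_max = cap;
    if (rl.rlim_cur > rl.rlim_max)  /* soft limit must not exceed hard */
        rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

A wrapper would call cap_stack_limit_kb(8192) and then execvp() the build command; resource limits are inherited across fork() and preserved across execve(), so make and the javac processes it spawns all see the capped limit.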

From 'strace', it seems that Java is supposed to receive a SIGSEGV and handle it. But with the old allocation pattern (or with a small stack), the SIGSEGV address is in the 0x7Fxxxxxxxxxx area, while with the new code it is in the 0x2Axxxxxxxxxx area, and apparently Java's SIGSEGV handler treats a fault at 0x2Axxxxx as an unexpected crash.
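To check which area a process's mappings land in under the current stack limit, a small probe like the following may help. It is a hypothetical diagnostic, not something used in this report: it maps one anonymous page with mmap(2), prints the soft RLIMIT_STACK and the page's address, and reports bits 40-47 of that address, which are 0x2a or 0x7f for the two areas described above. The helper names are made up for illustration, and exact addresses depend on the kernel version and on address-space randomization.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict C modes (glibc) */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Bits 40-47 of a user address: 0x2a or 0x7f for the two areas. */
static unsigned region_tag(uintptr_t addr)
{
    return (unsigned)((addr >> 40) & 0xff);
}

/* Map one anonymous page, report where it landed, then unmap it.
 * Returns the mapping address, or 0 if mmap() failed. */
static uintptr_t probe_anon_mapping(void)
{
    long page = sysconf(_SC_PAGESIZE);
    struct rlimit rl;
    void *p;

    assert(page > 0);
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 0;
    }
    if (getrlimit(RLIMIT_STACK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("stack limit: unlimited\n");
        else
            printf("stack limit: %llu KB\n",
                   (unsigned long long)(rl.rlim_cur / 1024));
    }
    printf("anonymous page at %p (region tag 0x%02x)\n", p,
           region_tag((uintptr_t)p));
    munmap(p, (size_t)page);
    return (uintptr_t)p;
}
```

Calling probe_anon_mapping() once from a normal shell and once after 'ulimit -s unlimited' should show whether the bottom-up layout is in effect on a given kernel.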
Comment 6 Doug Smythies 2013-08-26 14:40:44 UTC
*** Bug 60785 has been marked as a duplicate of this bug. ***
Comment 7 Jeff Shorey 2013-08-26 15:43:24 UTC
This is fixed for me in 3.11-rc7.