Bug 60774 - Userspace SEGV using latest kernels
Summary: Userspace SEGV using latest kernels
Status: CLOSED CODE_FIX
Alias: None
Product: Memory Management
Classification: Unclassified
Component: Other
Hardware: x86-64 Linux
Importance: P1 normal
Assignee: Andrew Morton
URL:
Keywords:
Duplicates: 60785
Depends on:
Blocks:
 
Reported: 2013-08-20 21:03 UTC by Jeff Shorey
Modified: 2013-11-13 15:55 UTC (History)
CC List: 4 users

See Also:
Kernel Version: 3.10.8 and 3.11-rc6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Java post-mortem dump (27.90 KB, text/x-log)
2013-08-20 21:03 UTC, Jeff Shorey

Description Jeff Shorey 2013-08-20 21:03:18 UTC
Created attachment 107257
Java post-mortem dump

Running MATLAB's startup Java (or the installer) results in a SEGV.  A Java post-mortem dump is attached.  The only difference is the kernel: 3.10.7 and 3.11-rc5 are fine, while 3.10.8 and 3.11-rc6 crash.  Running on an i7-3720QM.

Maybe the TLB updates that were applied to both kernels are responsible?  I would have to bisect to figure out exactly which patch is causing the problem.
Comment 1 Andrew Morton 2013-08-20 22:50:03 UTC
Linus fixed a TLB issue in 2b047252d087be7f2ba088b4933cd904f92e6fce, which was released in 3.11-rc6.  But that bug appears to have been there since 3.7.2.

Yes, it would be tremendously helpful if you could bisect it, please?
Comment 2 Jeff Shorey 2013-08-21 02:51:08 UTC
Ok - I bisected it.  It's not the TLB after all ;-).  Here is the change that causes my (very repeatable fortunately) SEGV behavior (verified in both 3.10.8 and 3.11-rc6):

> diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
> index dbded5a..48f8375 100644
> --- a/arch/x86/kernel/sys_x86_64.c
> +++ b/arch/x86/kernel/sys_x86_64.c
> @@ -101,7 +101,7 @@ static void find_start_end(unsigned long flags, unsigned long *begin,
>                               *begin = new_begin;
>               }
>       } else {
> -             *begin = TASK_UNMAPPED_BASE;
> +             *begin = mmap_legacy_base();
>               *end = TASK_SIZE;
>       }
>  }

Sorry - I don't know exactly which commit contains this change, as I'm not adept at searching through the git history.
Comment 3 Jeff Shorey 2013-08-21 02:58:40 UTC
My guess is this commit:

 x86 get_unmapped_area(): use proper mmap base for bottom-up direction
Comment 4 Dmitrijs Ledkovs 2013-08-23 14:00:21 UTC
I also get javac SEGVs when building the Android Open Source tree:
 * openjdk-6 and oracle-jdk-6 crash
 * openjdk-7 is fine

Rebooting into a 3.10.x kernel resolved the issue. Would a minimal javac invocation as a reproducer be of interest? This could be a bug in Java rather than a kernel bug.
Comment 5 Petr Vandrovec 2013-08-24 22:34:14 UTC
FYI, java crashes only if you run with 'ulimit -s unlimited'.  If you set the stack limit to something reasonable (8MB), there is no crash.  Unfortunately 'make' seems to set the stack size to unlimited if it can.  Setting 'ulimit -Hs 8192' before running 'make' fixed the crashes on my boxes; I am using that as a workaround until I can check whether 41aacc1eea645c99edbe8fbcf78a97dc9b862adc fixes it.
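
For anyone who wants to apply the same stack-limit workaround from a wrapper program instead of the shell, here is a minimal sketch. It is not taken from the original report: the helper name cap_stack_limit() is hypothetical, and the 8 MB value is only the one mentioned above. It uses the standard getrlimit()/setrlimit() calls on RLIMIT_STACK.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lower the hard and soft RLIMIT_STACK of the
 * calling process to at most max_bytes. Limits that are already
 * lower are left alone. Returns 0 on success, -1 on failure.
 */
static int cap_stack_limit(rlim_t max_bytes)
{
    struct rlimit rl;

    assert(max_bytes > 0);

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    /* An unprivileged process may lower its hard limit, but not raise it again. */
    if (rl.rlim_max == RLIM_INFINITY || rl.rlim_max > max_bytes)
        rl.rlim_max = max_bytes;

    /* The soft limit must never exceed the hard limit. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;

    return setrlimit(RLIMIT_STACK, &rl);
}
```

A wrapper would call cap_stack_limit((rlim_t)8192 * 1024) and then exec the real command (for example with execvp()). Resource limits are inherited across fork and exec, so child processes cannot raise the stack limit back to unlimited once the hard limit has been lowered.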

From 'strace' it seems that java is supposed to get SIGSEGV and handle it.  With the old allocation pattern (or with a small stack) the SIGSEGV is in the 0x7Fxxxxxxxxxx area, while with the new code it is in the 0x2Axxxxxxxxxx area, and the SIGSEGV handler in java apparently treats a fault in the 0x2Axxxxxxxxxx area as an unexpected crash.
Comment 6 Doug Smythies 2013-08-26 14:40:44 UTC
*** Bug 60785 has been marked as a duplicate of this bug. ***
Comment 7 Jeff Shorey 2013-08-26 15:43:24 UTC
This is fixed for me in 3.11-rc7.
