Bug 2839
| Summary: | java segfaults with 2.6.[567] x86_64 but not with 2.4.27-pre5 | | |
|---|---|---|---|
| Product: | Platform Specific/Hardware | Reporter: | Marc Heckmann (mh) |
| Component: | x86-64 | Assignee: | Andi Kleen (andi-bz) |
| Status: | CLOSED PATCH_ALREADY_AVAILABLE | | |
| Severity: | high | CC: | jk, mh |
| Priority: | P2 | | |
| Hardware: | i386 | | |
| OS: | Linux | | |
| Kernel Version: | 2.6.7-rc2-bk6 + x86_64 bugfixes patch from bkbits.net | Subsystem: | |
| Regression: | --- | Bisected commit-id: | |
| Attachments: | Fix for segfault logging in 2.6.6 | | |

Description
Marc Heckmann
2004-06-06 01:21:45 UTC
At least simple Java programs work fine for me on 2.6 with the 32-bit 1.4.2 JDK from Sun. Does a Java hello world also crash for you? I have no time to debug Tomcat, sorry; someone who knows more about it has to do that. Have you contacted Sun or the Tomcat maintainers about it?

This is information that we received from Sun in regards to this problem:

Here is a comment from our HotSpot engineer:

That's a kernel issue. They should not log SIGSEGV if the signal is handled by the user application. The JVM uses SIGSEGV and several other signals (e.g. for implicit null checks, safepoint polling, etc.). What is the OS version? Red Hat used to have this problem in one of their beta releases; we talked to them and it's fixed in FCS. Someone should log a bug with the Linux vendor.

As to performance: yes, excessive logging could have a performance impact, because it is disk I/O. But usually the segfault happens infrequently, so the impact is negligible and it's mainly a cosmetic thing. Depending on the number of threads, if they get hundreds of segfaults in a second, that could imply a problem in the Java app (e.g. dereferencing a null pointer in an inner loop) or in the JVM.

I have passed your OS information to them. I will keep you posted if I get any more information.

Regards,
-- Ingrid Yao, CAP program Developer Support Engineer, Java Web Service, Sun Microsystems

The Sun analysis is outdated. Current kernels log the signal only when the signal is not handled by sigaction and no debugger is running. This basically only happens when the program is really crashing; it is extremely unlikely not to be a crash.

I do believe that the app (the JVM) _is_ really crashing, because my webapps (running in the Tomcat container) do not run correctly at all and I did manage to get 1.5.0-beta1 to dump a core file. However, the point is that the JVMs do not crash under 2.4.27-rc5 x86_64. I am not alleging that the kernel is to blame instead of Java; it may be either one. Perhaps there is a bug in the JVM that it can get away with in 2.4.x. I was just hoping that someone might have some clues. Either way, this is a problem for me and for other users who want to develop Java apps on the x86_64 platform. Once again, the application is dying with SIGABRT; I don't know whether that is significant or not. I am willing to help get to the bottom of this and am just looking for pointers. The OS is Fedora Core 2 x86_64.

Created attachment 3138
Fix for segfault logging in 2.6.6
Andi, the logging in vanilla 2.6.6 seems to be broken: only caught segfaults
get logged. With this patch it works more as intended for me.
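
For readers following the "caught" versus "uncaught" distinction in this thread, here is a minimal user-space sketch of a segfault that is handled through sigaction, roughly the kind of fault a JVM is described above as taking on purpose. It is not code from the JVM, the kernel, or the attached patch. The names `probe_protected_page`, `on_segv`, and `recover_point` are made up for illustration, and the sketch assumes Linux, where reading a `PROT_NONE` page raises SIGSEGV.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names, chosen for this sketch only. */
static sigjmp_buf recover_point;

/* Handler: jump back to the recovery point instead of letting the process die. */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/*
 * Deliberately read a PROT_NONE page while a SIGSEGV handler is installed.
 * Returns 1 if the fault was caught and recovered from, 0 if the read did
 * not fault, and -1 if setup failed.
 */
int probe_protected_page(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    void *mem = mmap(NULL, (size_t)page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        sigaction(SIGSEGV, &old, NULL);
        return -1;
    }

    volatile int caught = 0;
    /* savemask = 1 so the signal mask is restored after leaving the handler. */
    if (sigsetjmp(recover_point, 1) == 0) {
        volatile char *p = mem;
        (void)*p;                  /* faults: the page has no read permission */
    } else {
        caught = 1;                /* reached via siglongjmp from on_segv */
    }

    munmap(mem, (size_t)page);
    sigaction(SIGSEGV, &old, NULL); /* restore the previous disposition */
    return caught;
}
```

A process calling `probe_protected_page()` keeps running after the fault, which is the case the comments above say the kernel should not log. With the handler left at its default, the same read would terminate the process.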
Hi, the ! for SIGSEGV was indeed wrong. Thanks for catching this; I will fix it. But the unhandled_signal() change itself is IMHO not correct. What the check does is match when PT_PTRACED is set, but fail when PT_TRACESYSGOOD (= strace running) is set. As far as I can see, the original code for this is correct. Overall there must still be some other problem, because these printks have no relation to how the program works (except for making it a bit slower).

Ah. Then my strace is probably too old; it doesn't seem to use PTRACE_O_TRACESYSGOOD.

We'll release 1.4.2-fcs in a few days. It has a lot of x86_64-specific fixes (and also works on EM64T machines). If the bug reporter still has problems (except for bogus segfault logs) with that version, he should report them at Blackdown. 32-bit JVMs, on the other hand, _should_ already work fine.

OK, I retested my own Java code and it does indeed work fine despite the kernel messages. I noticed the fix for the false messages also made it into the vanilla kernel, so I'm going to close this one. Sorry for the confusion. -m