Distribution: Fedora Core 2
Hardware Environment: Shuttle SN85G4, NFORCE3, Athlon 64 2800
Software Environment: Java SDK from java.sun.com or blackdown.org, 32-bit or 64-bit versions.

Problem Description:
The Java binaries (SDK) from blackdown.org or java.sun.com, 64-bit or 32-bit, segfault on me under the 2.6.[567] kernels (I haven't tried any older 2.6.x) but not under 2.4.27-pre5. If I understand the code in arch/x86_64/mm/fault.c correctly, 2.4 would produce the same messages if Java were segfaulting there too. As a consequence, my Java webapps do not function under 2.6.x. Below is an excerpt of the messages I get:

java[3028]: segfault at 0000000000000000 rip 0000000057a1d2d5 rsp 00000000ffff9f30 error 4
java[3028]: segfault at 0000000000000000 rip 0000000057a1d2d5 rsp 00000000ffff9f5c error 4

Steps to reproduce:
1. Download the Java SDK (http://java.sun.com) and jakarta-tomcat (http://jakarta.apache.org/tomcat).
2. export JAVA_HOME=/path/to/java/sdk/install
3. cd /path/to/tomcat/jakarta-tomcat-4.1.30
4. Run "./bin/startup.sh" and watch dmesg/syslog for the messages.

This is 100% reproducible for me.

-m
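For readers trying to interpret the log lines quoted in the report, here is a minimal sketch that splits such a line into its fields and decodes the trailing error value. It assumes the value is the x86 page-fault error code printed in hexadecimal (bit 0: protection violation rather than a non-present page, bit 1: write access, bit 2: fault taken in user mode). The function name `decode_segfault` and the dictionary keys are made up for this illustration and are not part of any kernel tool.

```python
import re

# Matches the message format quoted in the report (2.6-era x86_64 kernels).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return the fields of one segfault log line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the error code is printed in hex
    return {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "rip": int(m.group("rip"), 16),
        "rsp": int(m.group("rsp"), 16),
        # Low three bits of the x86 page-fault error code.
        "protection_violation": bool(err & 1),  # False means page not present
        "write": bool(err & 2),                 # False means a read access
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    sample = ("java[3028]: segfault at 0000000000000000 "
              "rip 0000000057a1d2d5 rsp 00000000ffff9f30 error 4")
    print(decode_segfault(sample))
```

Read this way, "error 4" at address 0 is a user-mode read of an unmapped page at the null address, which is what a null-pointer dereference looks like.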
At least simple Java programs work fine for me on 2.6 with the 32-bit 1.4.2 JDK from Sun. Does a Java hello world also crash for you? I have no time to debug Tomcat, sorry; someone who knows more about it has to do that. Have you contacted Sun or the Tomcat maintainers about it?
This is information that we received from Sun regarding this problem:

Here is the comment from our HotSpot engineer:

That's a kernel issue. They should not log SIGSEGV if the signal is handled by the user application. The JVM uses SIGSEGV and several other signals (e.g. for implicit null checks, safepoint polling, etc.). What is the OS version? Red Hat used to have this problem in one of their beta releases; we talked to them and it's fixed in FCS. Someone should log a bug with the Linux vendor.

As to performance, yes, excessive logging could have a performance impact, because it is disk I/O. But usually the segfault happens infrequently, so the impact is negligible and it's mainly a cosmetic thing. Depending on the number of threads, if they get hundreds of segfaults in a second, that could imply a problem in the Java app (e.g. dereferencing a null pointer in an inner loop) or in the JVM.

I have passed your OS information to them. I will keep you posted if I get any more information.

Regards,
--
Ingrid Yao
CAP program Developer Support Engineer
Java Web Service
Sun Microsystems
The Sun analysis is outdated. Current kernels log the signal only when it is not handled via sigaction and no debugger is running. That basically only happens when the program is really crashing; it is extremely unlikely to be anything other than a crash.
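The distinction between a handled and an unhandled SIGSEGV can be seen from user space with a small sketch. It runs two child processes: one leaves SIGSEGV at its default disposition, the other installs a handler first. Note the assumptions: the signal is sent with os.kill rather than produced by a real memory fault, so this only illustrates signal disposition, not the JVM's actual fault-recovery mechanism; the names `UNHANDLED`, `HANDLED` and `run_child` are invented for this example.

```python
import signal
import subprocess
import sys

# Child with the default SIGSEGV disposition: the signal terminates it.
UNHANDLED = """
import os, resource, signal
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # do not leave a core file
os.kill(os.getpid(), signal.SIGSEGV)
"""

# Child that installs a handler first: the signal is caught and the
# process carries on and exits normally.
HANDLED = """
import os, signal, sys, time
caught = []
signal.signal(signal.SIGSEGV, lambda signum, frame: caught.append(signum))
os.kill(os.getpid(), signal.SIGSEGV)
time.sleep(0.1)  # give the interpreter a chance to run the Python-level handler
sys.exit(0 if caught else 1)
"""


def run_child(source):
    """Run the given source in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode


if __name__ == "__main__":
    # A negative return code means the child was killed by that signal number.
    print("unhandled:", run_child(UNHANDLED))
    print("handled:  ", run_child(HANDLED))
```

Only the first child actually dies from the signal; a process in the position of the second one keeps running, which is the case the kernel is meant to leave unlogged.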
I do believe that the app (the Java JVM) _is_ really crashing, because my webapps (running in the Tomcat container) do not run correctly at all and I did manage to get 1.5.0-beta1 to dump a core file. However, the point is that the JVMs do not crash under 2.4.27-rc5 x86_64. I am not alleging that the kernel is to blame instead of Java; it may be either one. Perhaps there is a bug in the JVM that it can get away with in 2.4.x. I was just hoping that someone might have some clues. Either way, this is a problem for me and for other users who want to develop Java apps on the x86_64 platform. Once again, the application is dying with SIGABRT; I don't know if that is significant or not. I am willing to help get to the bottom of this, just looking for pointers. OS is Fedora Core 2 x86_64.
Created attachment 3138
Fix for segfault logging in 2.6.6

Andi, the logging in vanilla 2.6.6 seems to be broken: only caught segfaults get logged. With this patch it works more like intended for me.
Hi,

The ! for SIGSEGV was indeed wrong. Thanks for catching this. I will fix this.

But the unhandled_signal() change itself is imho not correct. What the check does is match when PT_PTRACED is set, but fail when PT_TRACESYSGOOD (= strace running) is set. As far as I can see, the original code for this is correct.

Overall there must still be some other problem, because these printks have no relation to how the program works (except for making it a bit slower).
Ah. Then my strace is probably too old; it doesn't seem to use PTRACE_O_TRACESYSGOOD.

We'll release 1.4.2-fcs in a few days. It has a lot of x86_64-specific fixes (and also works on EM64T machines). If the bug reporter still has problems (other than bogus segfault logs) with that version, he should report them at Blackdown. 32-bit JVMs, on the other hand, already _should_ work fine.
OK, I retested my own Java code and it does indeed work fine despite the kernel messages. I noticed the fix for the false messages also made it into the vanilla kernel, so I'm going to close this one. Sorry for the confusion.

-m