Why does java app crash in gdb but runs normally in real life?

This was too long for a single comment on the accepted answer. It is basically a set of quotes from links, kept for future reference (in case the pages vanish).

Some of you might find part 2 interesting.

Table of contents

  0. Small trick
  1. Reasons / documentation
  2. Signal Chaining between native code and JVM

0. Small trick

A way around the issue could be to force the JVM to invoke a GDB console on error using the following JVM launch directive (see this blog post by Alexey Pirogov; the option is also described in the Oracle Java documentation, along with several usage examples):

-XX:OnError="gdb - %p"

%p will be replaced with the PID.

The example output below is taken from the blog post. From what I read, the JVM is able to tell whether a given SIGSEGV is Java-induced (and handles it silently) or comes from a native (C++) library. As far as I understand, this means the GDB session starts only on a "legit" SIGSEGV, with the correct context.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7348cba806, pid=10055, tid=10057
#
# JRE version: OpenJDK Runtime Environment (10.0.2+13) (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
# Java VM: OpenJDK 64-Bit Server VM (10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libJNIDemo.so+0x806]  Java_jnidemo_JNIDemoJava_nativeCrash+0x1c
#
...
(gdb)

I found the statements in this SO answer inconsistent with the description in the Oracle Java documentation, and I would rather trust the Oracle documentation.

1. Reasons / documentation

I found this link https://www.ateam-oracle.com/why-am-i-seeing-sigsegv-when-i-strace-a-java-application-on-linux

It gives some insight into the JVM's behind-the-scenes implementation.

The JVM is a multi-threaded process and so under the covers it's using signals to do OS level threading.

But the JVM is also doing a metric ton of other really clever stuff; for example in a regular C/C++ program [emphasis mine] hitting a NULL (a Zero) when you're expecting a pointer to some structure would cause your application to crash. That crash is actually, as you can probably guess by now, the OS sending your process a signal - specifically SIGSEGV. If your app didn't register a signal handler for that signal (and 99.5% of c/c++ apps out there don't) then the signal comes back up to the OS which then terminates the app and (usually) saves the memory state into a core file.

The JVM does register a signal handler for SIGSEGV and not just because it doesn't want to crash out when something goes wrong. The JVM registers a signal handler for SIGSEGV because it actually uses SIGSEGV and a bunch of other signals for its own purposes. [emphasis mine]

[...] And that's perfectly normal and completely safe.

The above link also points to this https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/signals.html

Signal

  • SIGSEGV, SIGBUS, SIGFPE, SIGPIPE, SIGILL

    Used in the implementation for implicit null check, and so forth.

  • SIGQUIT

    Thread dump support: To dump Java stack traces at the standard error stream. (Optional.)

  • SIGTERM, SIGINT, SIGHUP

    Used to support the shutdown hook mechanism (java.lang.Runtime.addShutdownHook) when the VM is terminated abnormally. (Optional.)

  • SIGUSR1

    Used in the implementation of the java.lang.Thread.interrupt method. (Configurable.) Not used starting with Solaris 10 OS. Reserved on Linux.

  • SIGUSR2

    Used internally. (Configurable.) Not used starting with Solaris 10 OS.

  • SIGABRT

    The HotSpot VM does not handle this signal. Instead it calls the abort function after fatal error handling. If an application uses this signal then it should terminate the process to preserve the expected semantics.
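To make the SIGTERM, SIGINT, SIGHUP entry above more concrete, here is a minimal sketch of a shutdown hook. Only java.lang.Runtime.addShutdownHook comes from the quoted page; the class name ShutdownHookDemo, the thread name, and the optional wait argument are illustrative assumptions. It assumes Java 9 or later (for ProcessHandle).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class: registers a shutdown hook and optionally waits
// so that a signal can be sent to the JVM from another terminal.
class ShutdownHookDemo {
    // Set by the hook so the effect is observable from code as well.
    static final AtomicBoolean cleaned = new AtomicBoolean(false);

    // Builds the hook thread; it is not started here, the JVM starts it at shutdown.
    static Thread newHook() {
        return new Thread(() -> {
            cleaned.set(true);
            System.err.println("shutdown hook: releasing resources");
        }, "demo-shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        // Optional first argument: how long to wait (in ms) before exiting normally.
        long waitMs = args.length > 0 ? Long.parseLong(args[0]) : 0L;

        Runtime.getRuntime().addShutdownHook(newHook());
        System.out.println("PID " + ProcessHandle.current().pid()
                + ": waiting " + waitMs + " ms");
        Thread.sleep(waitMs);
        // The hook also runs on this normal exit path.
    }
}
```

Started with a wait of, say, 60000 and then sent SIGTERM (kill -TERM with the printed PID) or interrupted with Ctrl-C, the hook message should appear on stderr before the JVM exits. A SIGKILL cannot be caught by any process, so the hook does not run in that case.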

2. Quotes related to Signal Chaining

The Oracle link indicates that some actions can be taken to better handle signals between the JVM and non-Java code. This is referred to as signal chaining.

NOTE: I do not know whether it works, or whether it has any positive effect when debugging a library called by a Java app.

I think it won't help with intercepting the "right" signal during a GDB session. But maybe it could with custom handler code plus a breakpoint?

From my understanding, it seems suited to a native application that embeds a JVM, not a JVM application that embeds a native library. I keep the quotes here for completeness.

Quoting:

If an application with native code requires its own signal handlers, then it might need to be used with the signal chaining facility.

An application can link and load the libjsig.so shared library before libc/libthread/libpthread. This library ensures that calls such as signal(), sigset(), and sigaction() are intercepted so that they do not actually replace the Java HotSpot VM's signal handlers if the handlers conflict with those already installed by the Java HotSpot VM. Instead, these calls save the new signal handlers, or chain them behind the VM-installed handlers. During execution, when any of these signals are raised and found not to be targeted at the Java HotSpot VM, the pre-installed handlers are invoked.

The proposed procedure:

Perform one of these two procedures to use the libjsig.so shared library.

  1. Link it with the application that creates/embeds a HotSpot VM [remark: so this is not relevant for a library loaded from a Java app ...], for example:

    cc -L libjvm.so-directory -ljsig -ljvm java_application.c
    
  2. Use the LD_PRELOAD environment variable, for example [see https://stackoverflow.com/questions/426230/what-is-the-ld-preload-trick]:

    export LD_PRELOAD=libjvm.so-directory/libjsig.so; java_application (ksh)
    
    setenv LD_PRELOAD libjvm.so-directory/libjsig.so; java_application (csh)
    

The interposed signal(), sigset(), and sigaction() return the saved signal handlers, not the signal handlers installed by the Java HotSpot VM and which are seen by the operating system.

Note that SIGUSR1 cannot be chained.


Why does java app crash in gdb but runs normally in real life?

Because it doesn't actually crash.

Java uses speculative loads. If a pointer points to addressable memory, the load succeeds. Rarely, the pointer does not point to addressable memory, and the attempted load generates SIGSEGV ... which the Java runtime intercepts, makes the memory addressable again, and restarts the load instruction.
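A related case that is easy to reproduce is the implicit null check, which the Oracle signal documentation lists as one use of SIGSEGV. The sketch below is only an illustration: the names ImplicitNullCheckDemo, Box, read, and probe and the warm-up count are hypothetical, and whether HotSpot really compiles read and relies on a hardware fault instead of an explicit test depends on the JVM version, flags, and JIT decisions. In this case the runtime turns the fault into a NullPointerException rather than restarting the load.

```java
// Hypothetical demo: a null dereference surfaces as an ordinary exception,
// even though the JVM may have used SIGSEGV internally to detect it.
class ImplicitNullCheckDemo {
    static final class Box {
        int value = 1;
    }

    // No explicit null test in the source; the JVM is responsible for the check.
    static int read(Box b) {
        return b.value;
    }

    static String probe() {
        long sum = 0;
        Box box = new Box();
        // Warm-up loop so that the JIT compiler has a chance to compile read().
        for (int i = 0; i < 200_000; i++) {
            sum += read(box);
        }
        try {
            read(null); // always throws, however the check is implemented
            return "no exception";
        } catch (NullPointerException e) {
            return "NullPointerException after " + sum + " successful reads";
        }
    }

    public static void main(String[] args) {
        // Prints: NullPointerException after 200000 successful reads
        System.out.println(probe());
    }
}
```

Run normally, the program just prints the message and exits. Run under GDB with default signal handling, it may stop on a SIGSEGV even though the program is not actually crashing.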

When debugging Java programs, one generally has to do this:

(gdb) handle SIGSEGV nostop noprint pass

Unfortunately, if there is some JNI code involved, and that code SIGSEGVs, GDB will happily ignore that signal as well, resulting in the death of the inferior (the process being debugged). I have not found an acceptable solution for that latter problem.