Tech

20131217

Java Mission Control (jmc) Crashing: SIGSEGV at C [libsoup-2.4.so.1+0x6dab1] soup_session_feature_detac

I was trying out the new Java Mission Control (JMC), which comes bundled with JDK 1.7.0_45. The tool is something like a mix of JRockit Mission Control and jvisualvm. For now it does not offer as many tools as JRMC, and the connection is made through JMX, just as you would do when connecting with jvisualvm.

But, to spoil my fun, running ./jmc gave me:



# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000003119a6dab1, pid=9245, tid=140539766241024
#
# JRE version: Java(TM) SE Runtime Environment (7.0_45-b18) (build 1.7.0_45-b18)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.45-b08 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libsoup-2.4.so.1+0x6dab1] soup_session_feature_detach+0x11
#
# Core dump written. Default location: /usr/java/jdk1.7.0_45/bin/core or core.9245

Yes, it's a nice core dump, along with a dump file whose header is shown above. Looking through that dump file, I could see that this .so library was being used by org.eclipse.swt.internal.webkit:

Stack: [0x00007fd1f6c7f000,0x00007fd1f6d80000], sp=0x00007fd1f6d7cb20, free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libsoup-2.4.so.1+0x6dab1] soup_session_feature_detach+0x11

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j org.eclipse.swt.internal.webkit.WebKitGTK._soup_session_feature_detach(JJ)V+0
j org.eclipse.swt.internal.webkit.WebKitGTK.soup_session_feature_detach(JJ)V+9
j org.eclipse.swt.browser.WebKit.create(Lorg/eclipse/swt/widgets/Composite;I)V+920
j org.eclipse.swt.browser.Browser.<init>(Lorg/eclipse/swt/widgets/Composite;I)V+81
j org.eclipse.ui.internal.intro.impl.presentations.BrowserIntroPartImplementation.createPartControl(Lorg/eclipse/swt/widgets/Composite;)V+19
j org.eclipse.ui.internal.intro.impl.model.IntroPartPresentation.createPartControl(Lorg/eclipse/swt/widgets/Composite;)V+74
j org.eclipse.ui.intro.config.CustomizableIntroPart.createPartControl(Lorg/eclipse/swt/widgets/Composite;)V+64
j org.eclipse.ui.internal.ViewIntroAdapterPart.createPartControl(Lorg/eclipse/swt/widgets/Composite;)V+9
j org.eclipse.ui.internal.ViewReference.createPartHelper()Lorg/eclipse/ui/IWorkbenchPart;+406
j org.eclipse.ui.internal.ViewReference.createPart()Lorg/eclipse/ui/IWorkbenchPart;+5
j org.eclipse.ui.internal.WorkbenchPartReference.getPart(Z)Lorg/eclipse/ui/IWorkbenchPart;+65
j org.eclipse.ui.internal.Perspective.showView(Ljava/lang/String;Ljava/lang/String;)Lorg/eclipse/ui/IViewPart;+16
j org.eclipse.ui.internal.WorkbenchPage.busyShowView(Ljava/lang/String;Ljava/lang/String;I)Lorg/eclipse/ui/IViewPart;+59
j org.eclipse.ui.internal.WorkbenchPage$20.run()V+21
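
If you have several of these crash logs to go through, the "Problematic frame" entry can be pulled out with a few lines of Python. This is only a minimal sketch: the function name problematic_frame is made up for illustration, and it assumes the header looks like the one above (a "# Problematic frame:" line followed by the frame on the next line).

```python
import re

# Matches a frame line such as:
#   "# C [libsoup-2.4.so.1+0x6dab1] soup_session_feature_detach+0x11"
FRAME_RE = re.compile(r"#\s+(\S+)\s+\[(.+?)\+(0x[0-9a-fA-F]+)\]\s*(.*)")

def problematic_frame(log_text):
    """Return (frame_type, library, offset, symbol) or None if not found.

    Hypothetical helper: assumes the frame sits on the line right after
    the '# Problematic frame:' marker, as in the header shown above.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("# Problematic frame:") and i + 1 < len(lines):
            match = FRAME_RE.match(lines[i + 1])
            if match:
                frame_type, library, offset, symbol = match.groups()
                return frame_type, library, offset, symbol.strip()
    return None
```

Feeding it the text of the error log would return the frame type (C for native code), the library name, the offset and the symbol, which is usually enough to start searching a bug tracker.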

Searching the Eclipse bug tracker for the issue, I came across Bug 404776, which nicely provides a simple workaround: just add two new parameters:

-Dorg.eclipse.swt.browser.DefaultType=mozilla
-Dorg.eclipse.swt.browser.XULRunnerPath=/usr/lib64/xulrunner

The XULRunnerPath should be set to the actual xulrunner location on your OS. In my case, that is a 64-bit Fedora 19...

For Java Mission Control to pick up these parameters, you need to edit the following file:

/usr/java/jdk1.7.0_45/lib/missioncontrol/configuration/config.ini

Just append the two parameters, without the -D prefix, to the end of the config.ini file, and Java Mission Control should start without a problem.

# echo org.eclipse.swt.browser.DefaultType=mozilla >> /usr/java/jdk1.7.0_45/lib/missioncontrol/configuration/config.ini
# echo org.eclipse.swt.browser.XULRunnerPath=/usr/lib64/xulrunner/ >> /usr/java/jdk1.7.0_45/lib/missioncontrol/configuration/config.ini

The leading # is the root user's prompt, so run these commands as root.
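
One drawback of a plain echo with >> is that running it twice leaves duplicate entries in config.ini. The sketch below shows one way to make the append repeatable in Python. The helper name append_properties is hypothetical, and it assumes config.ini is a simple key=value file with # comments.

```python
def append_properties(config_path, properties):
    """Append (key, value) pairs to config_path, skipping keys already set.

    Hypothetical helper: assumes a plain key=value file with '#' comments.
    Returns the list of keys that were actually appended.
    """
    with open(config_path) as fh:
        content = fh.read()

    # Collect the keys that are already defined in the file.
    existing = set()
    for line in content.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            existing.add(line.split("=", 1)[0].strip())

    missing = [(key, value) for key, value in properties if key not in existing]
    if missing:
        with open(config_path, "a") as fh:
            # Avoid gluing the first new entry onto an unterminated last line.
            if content and not content.endswith("\n"):
                fh.write("\n")
            for key, value in missing:
                fh.write("%s=%s\n" % (key, value))
    return [key for key, _ in missing]
```

You would call it, as root, with the path to config.ini and the two pairs ("org.eclipse.swt.browser.DefaultType", "mozilla") and ("org.eclipse.swt.browser.XULRunnerPath", your xulrunner path). A second run then changes nothing.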

20131213

Watching and taking Thread Dumps with WLST

Troubleshooting hanging or looping behaviour can be troublesome if you do not know how to read a thread dump. It gets even harder if you did not take the snapshots at the right time, or at the intervals needed to identify and blame a specific thread and its class stack.

One of the first steps of the investigation is to identify how frequently the hang occurs and how it behaves. Usually we should start with a very broad approach by asking:


  • What do we use this program for?
  • Does it have a processing peak?


Then we should ask ourselves:


  • Does the frequency of the hanging behaviour relate to any of the answers to the previous questions?


Hopefully, answering these questions will help us determine when to collect data for analysis. Some behaviours help us identify what we are actually looking for. If the hang consumes a lot of CPU, we are possibly dealing with a loop: a while(true), a runaway recursive method, or a recursive architecture whose bad exception handling keeps looping. If the application just hangs without consuming much CPU, it is possibly a deadlock or livelock. But sometimes the frequency and behaviour are quite random, thanks to parallelism, which makes it even more difficult to collect useful data.
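
A quick way to get a first feel for which of these cases you are in is to count the thread states in a dump. The sketch below is only an illustration: thread_state_counts is a made-up name, and it assumes HotSpot-style dump text, where each thread has a "java.lang.Thread.State: ..." line. Dumps from other JVMs or tools may be formatted differently.

```python
import re
from collections import Counter

# HotSpot-style dumps print e.g. "java.lang.Thread.State: BLOCKED (on object monitor)"
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")

def thread_state_counts(dump_text):
    """Count how many threads are in each state in one thread dump.

    Hypothetical helper; assumes the HotSpot 'java.lang.Thread.State:' format.
    """
    return Counter(STATE_RE.findall(dump_text))
```

A dump dominated by BLOCKED threads would fit the lock contention or deadlock case, while a thread that stays RUNNABLE in the same method across several dumps would fit the loop case. It is a hint, not a diagnosis.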

Analysing thread dumps is very difficult, and we usually need more than just one. I would say that 10 to 20 thread dumps are a good amount for an investigation, and the interval between thread dumps depends on how fast the contention happens. There are many visual tools, such as Samurai and TDA, that can help in investigating the hang. But going back to my original mental thread (talking about parallelism ;), collecting useful data at the right time is what actually holds the key.

On WLS, I have written a simple WLST script that does the thread dumping for me. That way I can spend my time on really useful things like updating my Facebook or reading Dilbert strips.
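
The "several dumps at a fixed interval" idea can be sketched in plain Python, independent of any particular dump tool. Everything here is an assumption for illustration: collect_dumps is a made-up helper, and take_dump stands for whatever actually produces a dump in your setup (for example a function that runs the JDK's jstack against a PID).

```python
import time

def collect_dumps(take_dump, count=10, interval_s=5.0, sleep=time.sleep):
    """Call take_dump(i) `count` times, pausing `interval_s` between calls.

    Hypothetical helper: take_dump is supplied by the caller and returns
    one dump (text, file name, ...). The results are returned in order.
    The `sleep` parameter only exists so the pause can be replaced in tests.
    """
    results = []
    for i in range(count):
        results.append(take_dump(i))
        # No need to wait after the last dump has been taken.
        if i < count - 1:
            sleep(interval_s)
    return results
```

The defaults (10 dumps, 5 seconds apart) are placeholders; pick the interval according to how fast the contention shows up.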

How does it work:

1. Select one of the Managed Servers and set the following parameters:

Go to Managed Server:Configuration:Tuning and set:


    • Stuck Thread Max Time: 15 sec (needs restart)
    • Stuck Thread Timer Interval: 10 sec (needs restart)


*These parameters have different behaviours and can be set as you like; please see the Oracle WLS documentation for more details.

2. Set your domain Environment by running setDomainEnv.sh from your <DOMAIN>/bin directory:

$ . setDomainEnv.sh

3. Create a Python script that will watch the health of the server:

<code:>

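import java.lang
import os
import string
import time

# Move to the ThreadPoolRuntime MBean of the connected server,
# so that cmo exposes the thread pool health state.
def serverRuntimeNavegate():
    serverRuntime()
    cd("/")
    cd("ThreadPoolRuntime/ThreadPoolRuntime")

# Not used by the watcher below; moves to the JVMRuntime MBean.
def runtimeNavegate(serverName):
    runtime()
    cd("/")
    cd("JVMRuntime/" + serverName)

def checkHealthOfServer(serverName):
    print 'Checking : ' + serverName
    os.system("echo Starting")
    serverRuntimeNavegate()
    x = 0  # counter used to number the dump files
    while True:
        state = str(cmo.getHealthState())
        check = string.find(state, "HEALTH_WARN")
        if check != -1:
            print "Warning State"
            # Write the dump to the file ServerDump<x> in the current directory
            threadDump(writeToFile="true", fileName="ServerDump" + str(x))
            serverRuntimeNavegate()
            java.lang.Thread.sleep(20000)  # wait 20 s before the next dump
            x += 1
        else:
            print 'Its all good...'
            java.lang.Thread.sleep(5000)  # check again in 5 s

# serverName is the WLST variable holding the name of the connected server
connect("weblogic","weblogic1","localhost:7001")
checkHealthOfServer(serverName)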



4. All you need to change in this script is the username, password and server URL:PORT in the connect() command, on the next-to-last line of the script. Then you can call WLST to run the script:

$ java weblogic.WLST <scriptName>.py

Basically, this Jython/WLST script connects to a server and checks its health; if it returns OK, all it does is print a message. As soon as the WLS engine decides that a long-running thread exists, the script starts taking thread dumps and writes the output to files generated in the location where you called the script.