Tuesday, 11 August 2009

How to find Java Memory Leaks - Part 2

In the previous post I looked at how to monitor your application in test or production to observe its heap behaviour and understand whether or not you have a memory leak. In this post I will start to look at the next step - how to gather data to help to trace the root cause of your leak.

In some simple cases it is possible to use inspection of the source code, a debugger or trace statements to figure out what is going wrong by spotting where the developer has made a mistake. If this works for you, then that's great. I want to focus on the more complex situations where you are dealing with lots of code and lots of heap objects that may not be yours and you can't just go straight to a piece of Java and see what the problem is.

To understand a memory leak we usually need to use tools which are very different from those used to diagnose other issues. For most issues we want to understand the dynamic behaviour of our code - the steps that it performs, which paths it takes, how long it takes, where exceptions are thrown and so on. For memory leaks we need to start with something completely different - namely one or more snapshots of the heap at points in time which can give an insight into the way it is being used up. Only once we have this picture can we start to work back to the code to understand why the heap is being used up.

Before going any further, a small warning - most of the techniques from here on are not well suited to production environments. They can hang the JVM for anything between a few seconds and a few minutes and write large files to disk. You really need to reproduce your issue in a test environment and use these tools there. In a crisis, you may be allowed to use them in production if your app is already under severe heap stress and is about to be killed or restarted anyway.

The simplest type of snapshot is a heap 'histogram'. This gives you a quick summary (by class) of which objects are consuming the most space in the heap including how many bytes they are using and how many instances are in the heap.

There are a couple of ways to get a histogram. The simplest is jmap - a tool included in some JDK distributions from Java 1.5 onwards. Alternatively, you can get a histogram using a JVM startup argument -XX:-PrintClassHistogram and sending your app a Ctrl-Break (Windows) or kill -3 (Unix) signal. This JVM option is also settable via JMX if your JVM is recent enough.

Here is an example (from a test app that deliberately leaks memory)...

andy@belstone:~> jmap -histo 2937
Attaching to process ID 2937, please wait...
Debugger attached successfully.
Client compiler detected.
JVM version is 1.5.0_09-b03
Iterating over heap. This may take a while...
Object Histogram:

Size Count Class description
-------------------------------------------------------
65419568 319406 char[]
7472856 311369 java.lang.String
1966080 61440 MemStress
596000 205 java.lang.Object[]
64160 4010 java.lang.Long
34248 13 byte[]
32112 2007 java.lang.StringBuffer
10336 34 * ObjArrayKlassKlass
4512 47 java.lang.Class
4360 29 int[]
(snip)
8 1 java.lang.reflect.ReflectAccess
8 1 java.util.Collections$ReverseComparator
8 1 java.util.jar.JavaUtilJarAccessImpl
8 1 java.lang.System$2
Heap traversal took 42.243 seconds.
andy@belstone:~>

There are a few things to note here:-
  • Using this tool will hang the JVM until it has finished the histogram dump - in this case for 42 seconds. Don't try this if you can't afford to hang your JVM.
  • The objects highest in the list are java Strings and their associated character arrays. This is very common. You may also see other built in types (e.g. collections) near the top of the list.
  • What you should usually look for are the classes which appear highest in the list over which you actually have some direct control. This should give you the best clue about what types of object are involved in your memory leak.
In the (rather contrived) example above, the class 'MemStress' looks like it could be the real consumer of the heap space and indeed it is. Even though we've found it, most of the space it is using is being used indirectly - in this case by the Strings it uses to hold its instance variables. We were fortunate that MemStress objects on their own are big enough to push the class high enough up the list to be noticed. In some cases you may not be so lucky - instances of the main culprit may be quite small and it may therefore be hiding lower down the list. Try importing the list into a spreadsheet and sorting by instance count to see if that offers any additional clues.

It's usually best to have several heap histograms from the same run of your app at different points in time. Comparing these will give you a much better picture of how your memory leak is building up over time. Try to arrange to trigger them so that your app is in a similar state (usually idle) for each dump - then you will be comparing 'like for like'.

A heap histogram may be all you need to figure out what's going wrong, in which case you can don't need to worry about the next step.

What a histogram can't tell you is why each object is staying in the heap. This happens because at least one other object is still holding a reference to it. A full heap dump has the information needed to trace these references and also look in detail at the values of individual attributes in every object on the heap. It's also a big step up in size and complexity from a heap histogram. I'll look at how to take and analyse a full heap dump in the next installment.

No comments: