Friday 18 September 2009

How to find Java Memory Leaks - Part 3

In the previous post I looked at how to get a heap histogram and how to use it to help to understand the cause of a memory leak. While a heap histogram will tell you what is leaking, it doesn't provide many clues about why. For the (probably) final installment, I will look at how to perform a more in-depth analysis using a heap dump. This is most definitely in the 'advanced' category of debugging.

A heap dump is a diagnostic file containing information about the entire contents of the heap. As a result, it can be a pretty big file - very roughly it will be about the same size as the space used on the heap. Before going any further you need to make sure that you have enough disk space to allow the heap dump to be written. If you are going to analyse it somewhere else (which is probably a good idea) then you need to check that you will be able to transfer such a big file.

Next, my usual warning: this level of debugging is very intrusive and will hang the JVM for several seconds, so you should reproduce your issue in a test environment and take heap dumps there.

There are several ways to get a heap dump, and the options depend on which JVM version you have. Because I often need to work with 1.4 and 1.5, I usually use the JVM startup switch -XX:+HeapDumpOnCtrlBreak, which writes a heap dump when the JVM receives a Ctrl-Break (kill -QUIT on Unix). Depending on your JVM version, you can also trigger a heap dump using jmap (e.g. jmap -heap:format=b <pid> on 1.5, or jmap -dump:format=b,file=<file> <pid> on 6), jconsole or via JMX. I've seen some very long pauses (several minutes) when using jmap like this, so be warned. There is another JVM startup switch, -XX:+HeapDumpOnOutOfMemoryError, which should be fairly self-explanatory (note the plus sign: -XX:-HeapDumpOnOutOfMemoryError would switch the feature off). I've never used this myself because I usually want to control when heap dumps are taken, and I am always worried that triggering a heap dump when the JVM is already in trouble might make things even worse.
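As an illustration of the JMX route, here is a minimal sketch that asks the JVM to dump its own heap through the HotSpot diagnostic MXBean. It assumes a HotSpot-based JVM running Java 8 or later, which is much newer than the 1.4 and 1.5 JVMs mentioned above. The class name HeapDumper and the default file name are my own inventions for the example. The target file must not already exist, and recent JDKs insist on a .hprof extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Sketch only: assumes a HotSpot JVM, Java 8+. HeapDumper is a hypothetical name.
class HeapDumper {

    // Writes a binary (HPROF) heap dump to 'path'.
    // liveOnly = true restricts the dump to objects that are still reachable.
    static void dump(String path, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveOnly);
        } catch (IOException e) {
            // Typical causes: the file already exists or the disk is full.
            throw new UncheckedIOException("Heap dump to " + path + " failed", e);
        }
    }

    public static void main(String[] args) {
        // Default file name is an assumption made for this example.
        String path = args.length > 0
                ? args[0]
                : "heap-" + System.currentTimeMillis() + ".hprof";
        dump(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The same warning applies as for the other methods: the JVM pauses while the dump is written, so try this on a test environment first.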

So having got your heap dump, how do you analyse it? The most frequently quoted tool is jhat from Sun. This will read your heap dump file and start a web server. You can then connect a browser to it to analyse your heap dump. I'm not going to say much more about jhat because I have not used it very often.

Naturally there are commercial tools but I won't cover them here. My own (free) tool of choice is called HeapAnalyzer from IBM. This is a Java Swing application which provides most of the capabilities available in jhat (with the notable exception of OQL) and also provides some useful power tools for quickly homing in on a memory leak. Basically this tool has the feel of having been written by people who have actually spent some serious time tracking down memory leaks.

Whichever tool you choose, you will probably need to make sure that the tool itself has enough heap space by using the -Xmx option with a suitably high value when starting the tool.

[Screenshot: HeapAnalyzer just after opening a heap dump.]

If you're lucky then the first view that HeapAnalyzer shows will highlight your leak in blue. Before we look at this in more detail, let's look at some other views that will help us to understand what HeapAnalyzer is telling us.

View/Root List - this shows all of the 'top level' objects that will stay on the heap without being referred to by another object. This view (and the others) is sorted by 'total size' which needs a little explanation. This number is not the same as the total size of the object on the heap. HeapAnalyzer tries to do something more useful than that...

HeapAnalyzer organizes all objects as 'parent' and 'child' based on their references (the target of a reference is called the 'child'). The total size is the sum of the sizes of the object itself and all of the children that it owns. This is useful and often works in a way that seems natural, but since the heap is just a bunch of objects that refer to each other, it doesn't always present things in the way that you might expect, especially in the presence of circular reference chains (which happen quite a lot). It also tracks which objects have already been visited when working this out, so if an object has more than one 'parent', its size will only be counted under one of them, which HeapAnalyzer nominates as the 'owner'. HeapAnalyzer has to make a fairly arbitrary choice of which parent to count as the owner, which may or may not be what you would consider to be 'correct'. Keep this final point in mind when using HeapAnalyzer.
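To make the 'owner' idea concrete, here is a small sketch of how a total size of this kind could be computed. This is my own illustration of the behaviour described above, not HeapAnalyzer's actual algorithm: the class OwnedSizeSketch, the object names and the sizes are all made up. It walks the reference graph depth first, and the first parent to reach an object becomes its owner.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: a hypothetical model of 'total size' with single ownership.
class OwnedSizeSketch {
    final Map<String, Long> sizes = new LinkedHashMap<>();            // object -> own size
    final Map<String, List<String>> children = new LinkedHashMap<>(); // object -> referenced objects
    final Map<String, String> owner = new LinkedHashMap<>();          // object -> owning parent
    final Map<String, Long> totalSize = new LinkedHashMap<>();        // object -> own size + owned children
    private final Set<String> visited = new HashSet<>();

    // Registers an object, its own size, and the objects it refers to.
    void object(String name, long size, String... refs) {
        sizes.put(name, size);
        children.put(name, Arrays.asList(refs));
    }

    // Depth-first walk. An object that was already visited contributes nothing,
    // so its size is counted only under the first parent that reached it.
    long walk(String name, String parent) {
        if (!visited.add(name)) {
            return 0; // owned elsewhere, or part of a circular reference chain
        }
        if (parent != null) {
            owner.put(name, parent);
        }
        long total = sizes.get(name);
        for (String child : children.get(name)) {
            total += walk(child, name);
        }
        totalSize.put(name, total);
        return total;
    }

    public static void main(String[] args) {
        OwnedSizeSketch heap = new OwnedSizeSketch();
        heap.object("cacheA", 100, "shared");
        heap.object("cacheB", 100, "shared");
        heap.object("shared", 1000, "cacheA"); // refers back to cacheA: a cycle
        heap.walk("cacheA", null);
        heap.walk("cacheB", null);
        System.out.println(heap.totalSize); // {shared=1000, cacheA=1100, cacheB=100}
        System.out.println(heap.owner);     // {shared=cacheA}
    }
}
```

Both caches refer to the same 1000 byte object, but only cacheA is charged for it because it happened to be walked first. Walk the roots in the other order and cacheB looks like the big one, which is exactly why the total size needs to be read with some care.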

View/Type List - this is very similar to the info provided by the heap histogram which I described in part 2. HeapAnalyzer also adds its 'total size' column. This is a view that I use a lot - once I find a class that is of interest, I can right click on it and select 'Find Same Type' which takes me to the...

View/Object List - this shows a list of object instances, again ordered by 'total size'. Once you have found an object which is of interest, you can right click and select 'Find Object in Tree View' to jump to the...

View/Tree View - this is probably the most useful view of all. It allows you to see an object in the context of its parents (or to be precise, owner) and children. Children are ordered by total size, so the ones that HeapAnalyzer thinks are using the most space show up at the top of the list of children. The tree view also allows you to view (in the right hand pane) the values of each attribute of the object.

Be careful with the tree view - keep in mind that it is presenting a simple view of something which is actually more complex than it appears. Each object can have multiple parents but the tree view can only show one of them (the one HA picked as the 'owner'). You can see the other parents by right clicking the object and selecting 'List Parents'.

And finally the tools - by right clicking an object in the tree view we have the option to 'go to the largest drop in subtrees' - i.e. find the point in the tree of children that is accounting for the most heap space. At the top of the view we have the 'Subpoena Leak Suspects' tool which will jump to the objects that HA has decided are the most likely candidates as leaking objects. This brings us back to the initial view that I described above because this is where HA will go to when the heap dump is first opened.

HeapAnalyzer also comes with a reasonably comprehensive help page which describes most of the key features.

So what can HA tell us? Unfortunately it still can't tell us why we have a memory leak. What it can do is allow us to home in on the leaking objects and understand which other objects are holding references to them and keeping them in the heap. Working out why we have a leak is something that we have to use our own brains to do, for example by figuring out where the code should be resetting a reference and making the objects eligible for garbage collection. This may be both difficult and time consuming. You will also need to decide which 'leaks' are actually object caches which are working as intended and eliminate these from your list of suspects... maybe. Sometimes object caches may be incorrectly tuned or misbehaving so they may really be the cause of the leak.
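For what it's worth, here is a deliberately tiny sketch of the kind of thing you might end up finding. The class SessionRegistry and its methods are invented for this example. A long-lived map holds a reference to every session that is opened, and the objects only become eligible for garbage collection if something remembers to remove them again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of a reference that has to be reset explicitly.
class SessionRegistry {
    // Lives as long as the registry does, so everything in it stays on the heap.
    private final Map<String, byte[]> sessions = new HashMap<>();

    void open(String id) {
        sessions.put(id, new byte[1024]); // stand-in for per-session state
    }

    // If callers forget this, every opened session stays reachable from the map
    // and a heap dump shows the map 'owning' an ever growing pile of children.
    void close(String id) {
        sessions.remove(id);
    }

    int retained() {
        return sessions.size();
    }

    public static void main(String[] args) {
        SessionRegistry registry = new SessionRegistry();
        registry.open("a");
        registry.open("b");
        registry.open("c");
        registry.close("a");
        registry.close("b");
        // "c" was never closed, so it is still held by the map.
        System.out.println("Sessions still retained: " + registry.retained());
    }
}
```

In a heap dump this would show up as a HashMap with a large total size, and 'List Parents' on the leaked objects would lead you back to the registry.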

Finally I'd like to mention one more (and rather simpler) technique. As I observed earlier, string data is usually the thing which takes up most heap space. Looking at the content of those strings may provide a better clue about your memory leak. So use the Unix 'strings' command on the heap dump, followed by 'sort'. This will give you a big text file which you can analyze to find out which are the most commonly occurring strings. I've used this in the past to track down a JDBC related leak by finding the most common SQL statements on the heap.
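If you would rather not eyeball the sorted file, the counting step can be sketched in a few lines of Java. This is just one way of doing it: the class name StringFrequency is made up, and the sketch assumes that the output of 'strings' is piped to it on standard input, one string per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: find the most common lines in the output of 'strings <heap dump>'.
class StringFrequency {

    // Counts identical lines and returns the 'limit' most frequent ones,
    // most frequent first (ties are broken alphabetically).
    static List<Map.Entry<String, Integer>> topLines(Iterable<String> lines, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            counts.merge(line, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) throws IOException {
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 20;
        // ISO-8859-1 maps every byte to a character, so odd bytes cannot break decoding.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.ISO_8859_1))) {
            Iterable<String> lines = in.lines()::iterator; // stream lines as they arrive
            for (Map.Entry<String, Integer> e : topLines(lines, limit)) {
                System.out.println(e.getValue() + "\t" + e.getKey());
            }
        }
    }
}
```

The counts are held in memory, so for a very big heap dump this tool may itself need a generous -Xmx setting.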

There are a few pitfalls with the 'strings' approach:-
  • Double byte character sets - try using different flavours of the '-e' switch to do another run of 'strings' to pick up double byte strings.
  • Multi-line strings - what may be a single string object in Java will become multiple lines in your text file. Sorting the text file will then redistribute these lines so that parts of the same Java string are widely separated.
  • Relating the content to anything in HeapAnalyzer - I haven't found a good way to do this. It would be nice if HA had a string content search feature.

So that's it - you now have a kit bag of tools and techniques for tracking down a Java memory leak. It probably won't be easy, even with these tools, so I wish you the best of luck.
