Wednesday 28 September 2011

And the Second Prize Goes To...

I've written here before about the virtues of Eclipse MAT for heap analysis, but I recently hit a situation in which I couldn't persuade it to give me the answers I needed.

I was analysing a set of heap dumps from an application, and MAT's leak suspects report showed that around 300 MB of heap space was being retained by a single object. The object in question was a metadata manager, which was already known to retain a big chunk of heap. That overhead was fixed in size, and so not really of interest. The report lumped everything else into a single bucket labelled 'other', which wasn't much help.

I really needed to know about the most significant objects that were retaining space outside of the retained set of the metadata manager. While it's possible to explore around using MAT views like the histogram and come up with some answers, I didn't feel that this was a very methodical approach and it would be quite easy to miss something important.

I realised that the problem could be helped by applying a little simple set theory - what we are interested in is how a histogram or retained set looks after subtracting the retained set of the number one leak suspect. This is very simple to do, but very repetitive - an ideal job for a script, so that's what I wrote.
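To make the idea concrete, here is a minimal sketch of the subtraction on two in-memory tables rather than CSV files. The class names and byte counts are made up for illustration, and subtract_retained is a hypothetical helper name, not part of MAT or of the script further down. Like the script, it only reports classes that appear in the base histogram.

```perl
use strict;
use warnings;

# Made-up retained sizes in bytes, keyed by class name (illustrative only).
my %histogram = (
    'java.lang.String'  => 5000,
    'char[]'            => 9000,
    'com.example.Order' => 1200,
);
my %suspect = (
    'java.lang.String' => 4000,
    'char[]'           => 8500,
);

# Subtract the suspect's retained set from the base histogram, class by class.
# Classes missing from the suspect's set are left unchanged.
sub subtract_retained {
    my ($base, $rset) = @_;
    my %result;
    for my $class (keys %$base) {
        $result{$class} = $base->{$class} - ($rset->{$class} // 0);
    }
    return %result;
}

my %remaining = subtract_retained(\%histogram, \%suspect);
for my $class (sort keys %remaining) {
    print "$class,$remaining{$class}\n";
}
```

With these figures the String and char[] rows shrink to 1000 and 500 bytes, while com.example.Order keeps its full 1200, so it becomes the biggest remaining entry once the suspect is taken out.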

So at the bottom of the post is a Perl script that reads two or more files saved as CSV from either the MAT histogram or retained set views and prints the result of the subtraction in CSV format to standard output. Some views don't automatically calculate the retained size, so you need to select the rows and ask MAT to do at least the minimal retained set calculation before saving the CSV. If you only speak Windows, you can run the script by installing a suitable Perl implementation - I usually use ActiveState's ActivePerl.

I managed to use this quite successfully. I started by subtracting the retained set of the metadata manager from the overall histogram. By continuing to subtract more 'suspects' from the result, I was able to identify the second, third and fourth placed consumers of heap space in my dump files, which I felt was going far enough.
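Picking out the next suspect from the subtracted output is easier if the rows are ranked by retained heap. The following is a rough sketch of one way to do that, assuming rows in the same column layout the script prints (class, objects, shallow heap, retained heap, loaders) and class names that contain no commas. The top_retained helper and the sample rows are hypothetical, not real measurements.

```perl
use strict;
use warnings;

# Return the $n rows with the largest retained heap, largest first.
# Each input line is assumed to be: class,objects,shallow heap,retained heap,loaders
sub top_retained {
    my ($n, @lines) = @_;
    my @rows   = map { [ split /,/ ] } grep { /\S/ } @lines;
    my @sorted = sort { $b->[3] <=> $a->[3] } @rows;
    $n = @sorted if $n > @sorted;    # never ask for more rows than exist
    return @sorted[0 .. $n - 1];
}

# Made-up sample rows for illustration.
my @sample = (
    "com.example.Cache,10,400,52000,1",
    "java.lang.String,900,21600,30000,1",
    "com.example.Session,25,1000,75000,1",
);

for my $row (top_retained(2, @sample)) {
    print "$row->[0],$row->[3]\n";
}
```

Here the two largest consumers come out as com.example.Session followed by com.example.Cache. In practice the sample rows would be replaced by the lines of the script's output, minus the header row.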

Finally, the script... unfortunately Blogger doesn't seem to allow attachments.



# perl script to subtract one Java retained set from another.
# Retained sets or histograms can be saved in CSV format from Eclipse MAT
# The retained size is required, so ask MAT to calculate it if it is missing
#
# usage: perl subtract-retained.pl <histogram>.csv <retained-set>.csv [<retained-set>.csv]...
#
# Andy Carlson
# http://threeturves.blogspot.com/
# September 2011
#
use strict;

my ($basefile) = shift;
my (%count,%shallow,%retained,%loaders);
my ($lineno) = 0;

# read the baseline histogram
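open (HIN,$basefile) || die "cannot open $basefile: $!";
while (<HIN>) {
    $lineno++;
    next if ($lineno == 1);    # skip the CSV header row
    s/\r?\n\z//;               # strip the line ending (LF or CRLF)
    next if (!$_);             # skip blank lines
    my ($class,$count,$shallow,$retained) = split (/,/);

    if (!$retained && ($retained ne '0')) {
        warn "warning: no retained size for $class at $basefile line $lineno - skipping remainder of file";
        last;
    }

    # a class name may appear on more than one row, so accumulate the totals
    $loaders{$class}++;
    $count{$class} += $count;
    $shallow{$class} += $shallow;
    $retained{$class} += $retained;
}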
close HIN;

# now read one or more retained sets and subtract from the base histogram
while (1) {
    my ($rsetfile) = shift;
    last if (!$rsetfile);
    open (HIN,$rsetfile) || die "cannot open $rsetfile: $!";
    $lineno = 0;
    while (<HIN>) {
        $lineno++;
        next if ($lineno == 1);    # skip the CSV header row
        s/\r?\n\z//;               # strip the line ending (LF or CRLF)
        next if (!$_);             # skip blank lines
        my ($class,$count,$shallow,$retained) = split (/,/);
        if (!$retained && ($retained ne '0')) {
            warn "warning: no retained size for $class at $rsetfile line $lineno - skipping remainder of file";
            last;
        }
        $count{$class} -= $count;
        $shallow{$class} -= $shallow;
        $retained{$class} -= $retained;
    }
    close HIN;
}

# produce the output
print "class,objects,shallow heap,retained heap,loaders\n";
# only classes seen in the base histogram are reported
foreach my $class (sort keys (%loaders)) {
    print "$class,$count{$class},$shallow{$class},$retained{$class},$loaders{$class}\n";
}