Let's address this one question straight away - if you really have a memory leak, increasing the heap size will at best only delay the appearance of an OutOfMemoryError. If your issue is just a temporary spell of high demand for heap memory, then a bigger heap may help. If you take it too far, though, you may see much worse performance because of swapping, or you may even crash your JVM.
So what should you do? In essence, I would recommend three steps:-
1. Monitor your application in test and/or production to get an understanding of its heap behaviour.
2. If you really do have a problem, you need to gather data to help track down the cause.
3. Finally, having got the data, you need to figure out what it means - what is causing your problem and how to fix it.
Sadly these steps are not always easy to do - particularly step 3.
I'll be talking about the Sun JVM from here on, although many of the same principles and sometimes tools apply to other JVM implementations.
First let's look at monitoring...
To get a clear picture, you really need to look in detail at the heap stats from your JVM. I quite often find that these are not available when I'm first asked to look at a problem, so what can we tell without them?
Clearly if you see this in your application logs then there is a problem...
java.lang.OutOfMemoryError: Java heap space
The usual response, if you have real users, is to restart the application straight away. You should then continue to monitor, because you're probably going to need to know more before you get to the end of the diagnosis.
I often start by looking at CPU consumption using top, vmstat or whatever tools are available. What I'm looking for is periods where the Java app is using close to 100% of a single CPU. While that's not conclusive proof (being stuck in a loop could cause the same effect), it often means that the app is spending a lot of time doing full garbage collections - the most commonly used garbage collectors are single-threaded and therefore tend to use 100% of one CPU.
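As a rough sketch (the exact commands and output format vary between operating systems), on Linux I might run something like:

top -p <java-pid>
vmstat 5

and watch the Java process's %CPU figure in top. On a four-CPU box, a process pinned at around 100% rather than 400% is a hint that a single thread - quite possibly the collector during a full GC - is doing all the work.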
Monitoring memory usage from the operating system is not very informative - Java will typically grow the heap to the maximum size long before there is a problem. If you have oversized your heap compared to physical memory you will probably see heavy swapping activity, but that's a different issue.
Some applications and app servers will make calls to java.lang.Runtime.freeMemory(). While this may be better than nothing, it's a fairly crude measure: it tells you nothing about which generations have free space, how much time is being spent garbage collecting, and so on.
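To illustrate the point, here is a minimal sketch of the sort of figure those calls give you (the class name is made up for the example). You get one overall number for the heap, with no breakdown by generation and no GC timings:

// A very rough heap snapshot using only the java.lang.Runtime API.
public class HeapSnapshot {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();      // the ceiling the heap may grow to (-Xmx)
        long total = rt.totalMemory();  // heap space currently committed by the JVM
        long free = rt.freeMemory();    // unused space within the committed heap
        long used = total - free;       // a crude 'in use' figure
        System.out.println("used=" + (used >> 20) + "MB"
                + " committed=" + (total >> 20) + "MB"
                + " max=" + (max >> 20) + "MB");
    }
}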
The JVM can provide much more detailed information about what is going on, and there are two main ways to get hold of it:-
1. JVM startup arguments to write information to log files. Here are some suggestions, but check the docs for your specific JVM version because the available options vary:-
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintTenuringDistribution
You may also want to add -Xloggc:myapp-gc.log to send your GC info to a dedicated log file.
You can then analyse the log files to produce a graph using a tool such as gcviewer. This approach gives you detailed information about every GC run, but be careful to check that you have enough disk space - these options can produce a lot of data. Typically you will also need to restart your app to enable these options, although they are also controllable via JMX in more recent JVM versions. A full example command line is sketched after this list.
2. JDK tools - jvmstat in Java 1.4 and jstat in Java 1.5 and higher. These are tools that you run from the command line. They connect to your Java app via one of the JVM debugging APIs and report on heap and GC stats according to the options you specify. I usually use jstat -gcutil -t (again, see the example after this list).
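To make both approaches concrete, here is a sketch of the kind of commands involved. The heap size, application name, log file name and process id are all illustrative, and the exact options accepted vary between JVM versions, so check the docs rather than copying these verbatim:

java -Xmx512m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:myapp-gc.log -jar myapp.jar

jstat -gcutil -t 12345 5s

Here 12345 stands for the process id of your Java app (jps will list the ids of running JVMs) and 5s asks for a new sample every five seconds. Each line of jstat -gcutil output shows the percentage utilisation of the survivor spaces, eden, the old generation and the permanent generation, plus cumulative GC counts and times.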
There are other commercial tools (e.g. Wily Introscope) which can provide the same data as the JDK tools. They may provide better visualisation, historical reporting and perhaps proactive alerting.
So what should you look for to know whether you have a problem?
- OutOfMemoryErrors in your application logs - clearly you have a problem if you are getting these.
- A persistent upward trend in Old Generation usage after each full garbage collection.
- Heavy GC activity. What 'heavy' means is rather dependent on your application. A rule of thumb for a non-interactive app might be 15 seconds of GC during each minute - which means that your app only has 75% of its time left to do useful work. You might choose a lower figure, or look at the duration of individual full GC runs, especially if your app has real users waiting for it. One rough way of measuring this is sketched below.
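To put a number on 'heavy', here is one rough calculation, assuming you are sampling with jstat -gcutil -t as described earlier (the figures are purely illustrative). The Timestamp column is elapsed time in seconds and the GCT column is cumulative GC time in seconds, so between any two samples:

GC overhead = (GCT later - GCT earlier) / (Timestamp later - Timestamp earlier)

For example, if GCT grows by 15 seconds across a 60 second window, the app is spending 25% of its time collecting garbage - the rule-of-thumb level mentioned above.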
Here is another graph from the same app. The original leak has been fixed and there is now no evidence of a cumulative memory leak, but there are some intermittent spikes in heap usage. If a spike is severe enough it can still cause an OutOfMemoryError. Tracking this kind of problem down requires knowing what unusual system activity coincides with the spikes.
Here is a graph drawn by gcviewer after using JVM startup arguments to produce a log file. The graph is quite 'busy' and includes a lot of extra info - for example this app's arguments allow the heap to grow as more is needed. The graph shows the growth in overall heap size as well as the growth in consumption. This info is available using jstat too, but if you really need it all then using gcviewer will probably be more convenient.
In the next post I'll look at how to gather data to help trace the cause of your memory leak.