I also wanted to check on the very limited explanations that I'd found and get a better understanding of what was going on. In particular, I wanted to know how Java (i.e. the Sun JVM) allocates heap memory and whether it adopts any strategies to avoid swap thrashing.
I did some further digging using the various things under the /proc file system and the JDK source code to find out.
The first surprise was in /proc/meminfo - the only counter that was going up significantly during the test was 'Mapped' - i.e. memory mapped files. I was expecting this approach to be used for reading in .jar files and native libraries (and it was), but I wasn't expecting it for the heap. Digging into the source code explains why - the JDK uses the mmap system call to request more heap memory from the O/S.
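If you want to watch this happening on your own system, the 'Mapped' counter is easy to track with a few lines of script. A minimal sketch - the field names are as they appear in /proc/meminfo, but `parse_meminfo` and the sample values are just illustrative:

```python
# Sketch: watch the 'Mapped' counter in /proc/meminfo (the counter that
# grew during the test). Field names are real; the sample text below is
# invented so the snippet is self-contained.

def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Name:  value kB' lines into a dict of kB values."""
    counters = {}
    for line in text.splitlines():
        name, _, rest = line.partition(':')
        fields = rest.split()
        if fields:
            counters[name.strip()] = int(fields[0])  # value in kB
    return counters

sample = """\
MemTotal:        2048000 kB
MemFree:          102400 kB
Mapped:           851968 kB
SwapTotal:       1048576 kB
SwapFree:         524288 kB"""

counters = parse_meminfo(sample)
print(counters['Mapped'])  # kB of memory-mapped pages, including the mmap'd heap
```

On a live system, `parse_meminfo(open('/proc/meminfo').read())` in a loop gives you the same growth curve I saw during the test.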
I also took several snapshots of /proc/PID/smaps to see exactly what memory regions were being used in the process's address space. What this showed was:-
- There was a memory region (in my case starting from 0x51840000) that was clearly growing as the app allocated more and more heap. During the early part of the app's execution this region's resident size was around 7Mb less than its overall size, with all of the resident pages marked as dirty.
- Once memory started to become scarce, many of the other memory regions started to show a reduction in their resident and shared sizes.
- Once swap thrashing was happening, the memory region which had been growing still had a 6-7Mb difference between its resident size and its allocated size. The big difference, however, was that 15Mb of the space was now showing up as 'Private_Clean'.
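If you want to repeat the exercise, pulling these per-region counters out of an smaps snapshot only takes a few lines. A sketch - Size, Rss, Private_Clean and Private_Dirty are real smaps fields, but the region and its values below are invented to resemble the growing heap mapping:

```python
# Sketch: extract the per-region counters from /proc/PID/smaps that the
# observations above rely on. The first line of a region is the mapping
# header ('address-range perms offset dev inode'); the rest are counters.

def parse_smaps_region(text):
    """Parse one smaps region into a {field: kB} dict."""
    fields = {}
    for line in text.splitlines()[1:]:       # skip the mapping header line
        name, _, rest = line.partition(':')
        value = rest.split()
        if value and value[-1] == 'kB':
            fields[name] = int(value[0])
    return fields

region = """\
51840000-55a40000 rw-p 00000000 00:00 0
Size:              67584 kB
Rss:               60416 kB
Private_Clean:     15360 kB
Private_Dirty:     45056 kB"""

r = parse_smaps_region(region)
print(r['Size'] - r['Rss'])      # the ~7Mb resident deficit, in kB
print(r['Private_Clean'])        # heap pages written to swap but still mapped
```

Splitting a real smaps file into regions (one per header line) and diffing two snapshots taken a minute apart shows the growth and the clean/dirty shift quite clearly.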
My reading of these observations is:-
- During the early stages of execution the app gets as much memory as it asks for, but Linux delays backing it with real physical memory until the specific pages are actually accessed. This explains why the resident size is less than the allocated size - Java has probably extended the heap but hasn't yet touched all of the allocated space.
- Memory is getting scarce, so Linux starts to reclaim pages that have the smallest impact. In the first instance it is hunting around for less critical pages (e.g. pages of jar files or libraries that haven't been used recently) that it can reclaim.
- This behaviour surprised me a little - I was expecting the resident size of the heap to have reduced, but that doesn't seem to have happened. What we can see is that part of the heap is now 'clean' - this tells us that Linux has indeed flushed part of the heap out to the swap file. The fact that the resident size has not reduced significantly tells us that we aren't getting much benefit - basically I think that the swapper is swapping pages out but the garbage collector is pulling them all straight back in again.
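The lazy-allocation behaviour in the first point is easy to demonstrate outside the JVM - an anonymous mmap only reserves address space, and the kernel assigns physical pages as they are first touched. A minimal Python sketch (not JVM code, just the same mmap mechanism):

```python
import mmap

# Sketch: the same lazy allocation the JVM gets from mmap. Requesting an
# anonymous mapping reserves address space; physical pages only get
# assigned (and counted in Rss) when they are first touched.

PAGE = mmap.PAGESIZE
heap = mmap.mmap(-1, 64 * 1024 * 1024)   # 64Mb of address space, ~0 resident

# Touch one byte per page in the first 16Mb: only now does the kernel
# assign real (dirty, anonymous) pages - the untouched remainder stays
# non-resident, which is why Rss lags Size in smaps as the heap grows.
for offset in range(0, 16 * 1024 * 1024, PAGE):
    heap[offset] = 0xAB

print(heap[0], len(heap))    # untouched pages still read back as zero
```

Watching /proc/self/smaps from inside such a process (on Linux) shows Rss jumping only as the pages are touched.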
So it would seem that the JVM's design choices - the in-memory layout of objects and the bookkeeping data used by the garbage collector - mean that garbage collection really does conflict with the swapper once memory becomes tight. Based on the simple test that I did earlier, this happens both suddenly and with a severe impact.
In real life situations there may be several other Java and non-Java apps running on the same machine. I think this has a couple of implications:-
- The requirements of other apps may mean that memory becomes scarce much sooner - i.e. well before your Java heap size reaches the amount of physical memory.
- The swapper is not redundant - there may be plenty of 'low risk' pages belonging to other apps (or JAR mappings used only at startup time) that can be swapped out before the system gets to the point of swap thrashing.
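One practical consequence: if you want to spot the moment a box crosses from that benign swapping into thrashing, the pswpin/pswpout counters in /proc/vmstat (pages swapped in and out since boot) are worth watching. A sketch with invented sample values - a sudden jump in *both* counters between snapshots is the swapper and the GC fighting over the heap:

```python
# Sketch: diff two snapshots of /proc/vmstat. pswpin/pswpout are real
# vmstat fields; the snapshot strings below are made-up sample data.

def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    return {name: int(value)
            for name, value in (line.split() for line in text.splitlines())}

before = parse_vmstat("pswpin 1200\npswpout 3400")
after = parse_vmstat("pswpin 98000\npswpout 141000")

swapped_in = after['pswpin'] - before['pswpin']
swapped_out = after['pswpout'] - before['pswpout']
print(swapped_in, swapped_out)   # pages moved in/out between the snapshots
```

On a live system, read /proc/vmstat twice a few seconds apart; pages going out with few coming back in is ordinary reclaim, while large deltas in both directions is the thrashing pattern from the test.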