The theory behind Memory Management - Java
A deep dive into Memory Management and how it is implemented in Java
In the first part of this series, I wrote about some abstract concepts that programming languages use to manage their memory. In the following parts I will discuss how individual languages such as Java, Python, Go and Rust use some of these concepts for their own memory management. This part starts with Java.
Garbage Collection Process
Java's heap memory is divided into four main parts (plus some others that we won't focus on right now): Eden Space, Survivor Space 1, Survivor Space 2 and Tenured Space.
Java uses Generational Garbage Collection, Tracing (Mark and Sweep), a Stop-the-world mechanism and Compaction in its garbage collection process.
Every object in Java has a header with the following values:
- Type: identifies the class or type of the object.
- Lock: used by synchronized statements.
- Mark: used during the mark and sweep phase of the garbage collector.
A collection cycle then works as follows:
- When you create an object, it is created on the heap and referenced from the stack.
- Initially, all objects are created in Eden space.
- When Eden is full, it triggers a Minor GC.
- The Mark step starts from the roots (usually references from the stack) and traverses depth-first through all connected objects, marking each reachable object as alive.
- The Sweep step removes all the unreachable objects from the heap and moves all live objects to one of the two survivor spaces in a contiguous manner.
At this point there is no memory fragmentation: Eden space is empty, Survivor 1 holds all the live objects contiguously, and Survivor 2 is empty.
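The mark and sweep steps above can be sketched on a toy object graph. This is a minimal illustration, not how HotSpot is implemented: the `Node` class, the explicit `marked` flag and the list standing in for Eden are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

class MarkSweepSketch {
    /** A toy heap object: a name, outgoing references, and a mark bit. */
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;

        Node(String name) {
            this.name = name;
        }
    }

    /** Mark step: depth-first traversal from the roots, marking every reachable object. */
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (current.marked) {
                continue; // already visited, which also protects against cycles
            }
            current.marked = true;
            for (Node ref : current.refs) {
                pending.push(ref);
            }
        }
    }

    /** Sweep step: copy marked objects into a fresh survivor list and empty Eden. */
    static List<Node> sweep(List<Node> eden) {
        List<Node> survivor = new ArrayList<>();
        for (Node node : eden) {
            if (node.marked) {
                node.marked = false; // reset the mark for the next cycle
                survivor.add(node);
            }
        }
        eden.clear();
        return survivor;
    }

    /** Builds a small graph, runs one cycle, and returns the names of the survivors. */
    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        Node d = new Node("d"), e = new Node("e");
        a.refs.add(b);
        b.refs.add(c);
        c.refs.add(a); // a cycle among live objects
        e.refs.add(a); // e points at a live object, but nothing points at e
        List<Node> eden = new ArrayList<>(List.of(a, b, c, d, e));

        mark(List.of(a)); // a plays the role of a reference held on the stack
        List<String> names = new ArrayList<>();
        for (Node node : sweep(eden)) {
            names.add(node.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c]
    }
}
```

Note that `e` is collected even though it references a live object: reachability is decided by who points at you from a root, not by what you point at.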
When this cycle repeats and the next Minor GC is triggered, we hit a problem: the GC clears out some space in both Eden and the occupied survivor space, but the freed space is not contiguous.
Java's solution to this problem is to move all the live objects from Eden and the occupied survivor space into the other, empty survivor space, again contiguously.
After an object survives a certain number of GC cycles, it moves to the Tenured Space, where the older generation lives.
When the tenured space is almost full, Java triggers a Major GC, which goes over the whole heap as described above and removes the unreachable objects from all the spaces.
This is a stop-the-world algorithm, as it requires the program to be paused while the collection runs. How long and how often the program is paused depends on which GC you use.
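The hand-off between the two survivor spaces and the promotion to the tenured space can be sketched as a small simulation. Everything here is a simplifying assumption: liveness is a plain flag instead of the result of tracing, the spaces are lists, and the threshold of 3 is made up for the demo (HotSpot exposes the real limit through the `-XX:MaxTenuringThreshold` flag).

```java
import java.util.ArrayList;
import java.util.List;

class GenerationalSketch {
    // Hypothetical value: number of minor GCs an object must survive to be promoted.
    static final int TENURING_THRESHOLD = 3;

    static final class Obj {
        final String name;
        int age;             // minor GCs survived so far
        boolean live = true; // stand-in for "reachable from a root"

        Obj(String name) {
            this.name = name;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> fromSurvivor = new ArrayList<>(); // currently occupied survivor space
    List<Obj> toSurvivor = new ArrayList<>();   // currently empty survivor space
    final List<Obj> tenured = new ArrayList<>();

    /** New objects always start in Eden. */
    Obj allocate(String name) {
        Obj obj = new Obj(name);
        eden.add(obj);
        return obj;
    }

    /** Copies live objects out of Eden and the occupied survivor space, then swaps the survivors. */
    void minorGc() {
        for (Obj obj : eden) {
            evacuate(obj);
        }
        for (Obj obj : fromSurvivor) {
            evacuate(obj);
        }
        eden.clear();
        fromSurvivor.clear();

        // The space that was just emptied becomes the target of the next minor GC.
        List<Obj> emptied = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = emptied;
    }

    private void evacuate(Obj obj) {
        if (!obj.live) {
            return; // unreachable objects are simply not copied
        }
        obj.age++;
        if (obj.age >= TENURING_THRESHOLD) {
            tenured.add(obj);    // old enough: promote
        } else {
            toSurvivor.add(obj); // still young: copy to the empty survivor space
        }
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        heap.allocate("cache");             // stays live across collections
        heap.allocate("temp").live = false; // dies before the first collection
        for (int cycle = 1; cycle <= 3; cycle++) {
            heap.minorGc();
            System.out.println("after GC " + cycle
                    + ": survivor=" + heap.fromSurvivor.size()
                    + ", tenured=" + heap.tenured.size());
        }
    }
}
```

In this run the long-lived object sits in a survivor space after the first two collections and is promoted on the third, while the short-lived one never leaves Eden.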
Garbage Collectors
The JVM comes with several garbage collectors for different use cases.
Serial collector -> Very basic and intended for small heaps. A single thread performs all GC operations while the application is paused, then the application resumes.
Concurrent collector (CMS) -> GC threads do most of their work concurrently with the application threads. It does not wait for the old generation to be full before it starts collecting. CMS was deprecated in JDK 9 and removed in JDK 14.
Parallel collector -> Uses multiple CPU cores to perform GC. Multiple threads perform all GC operations once the old generation is almost full. It does not run concurrently with the application: the application halts when a parallel GC triggers and resumes once it completes.
G1 garbage collector -> Available from Java 7 onward. It combines the strengths of Parallel and CMS to achieve better heap utilization. It divides the heap into small regions and collects the regions that contain the most garbage first. It is designed for multi-processor machines with a large amount of memory and has been the default collector on most machines since JDK 9.
Z Garbage Collector -> Introduced in JDK 11 as an experimental feature. It is a scalable low-latency collector that does almost all of its work concurrently with the application threads, so its stop-the-world pauses are very short rather than absent. It is intended for applications that require low latency and/or use a very large heap (multi-terabyte).
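To check which of these collectors a given JVM is actually running, the standard `java.lang.management` API can be queried. The sketch below only reads the names and collection counts; the class name `ActiveCollectors` is made up for the example, and the names printed depend on the JVM vendor, version and flags.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {
    /** Returns one description per collector bean registered in the running JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        names().forEach(System.out::println);
    }
}
```

On HotSpot, running this with `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, `-XX:+UseG1GC` or `-XX:+UseZGC` should report different collector names, typically one bean for the young generation and one for the old.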
Performance Issues
Memory Leaks
In a healthy application, memory usage increases until a garbage collection is triggered and then drops back down. The live heap usually remains steady, so each drop should return to more or less the same level. With a memory leak, each GC clears a smaller portion of the heap, because many of the objects left behind are no longer in use but are still referenced. Heap utilisation keeps increasing until the heap is full and an OutOfMemoryError is thrown.
Memory leaks also make Stop-the-World events more and more frequent. As less heap space is reclaimed by each GC, the remaining memory fills up faster and Java triggers the next GC sooner. Eventually, the JVM runs Stop-the-World events back to back, causing major performance problems.
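A common way to end up in this situation is a long-lived collection that only ever grows. The sketch below is a hypothetical example (the `LeakSketch` class and its `handleRequest` method are invented for illustration): each call parks a buffer in a static list, so the buffers stay reachable and no collector may reclaim them.

```java
import java.util.ArrayList;
import java.util.List;

class LeakSketch {
    // A static field is a GC root: everything reachable from it survives every collection.
    static final List<byte[]> CACHE = new ArrayList<>();

    /** Simulates handling a request that "caches" its buffer and never evicts it. */
    static void handleRequest(int payloadBytes) {
        byte[] buffer = new byte[payloadBytes];
        CACHE.add(buffer); // the leak: nothing ever removes this entry
    }

    /** Total payload bytes still reachable through the cache. */
    static long retainedBytes() {
        long total = 0;
        for (byte[] buffer : CACHE) {
            total += buffer.length;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest(1024);
        }
        System.out.println("retained bytes: " + retainedBytes());
    }
}
```

Left running in a loop, a pattern like this produces exactly the rising baseline described above. Bounding the cache or removing entries once they are no longer needed makes the buffers unreachable again.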
Additional Memory
As discussed earlier, Java stores a header with three values for each object. If you create an object that holds only a 32-bit integer, the data itself requires just 4 bytes.
class Example {
    int health; // the only field: 4 bytes of actual data
}
On a typical 64-bit HotSpot JVM with compressed class pointers, the header adds another 12 bytes, so the object occupies 16 bytes in total. This is a 4:1 ratio between the overall size and the actual data size.
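The arithmetic behind that ratio can be sketched as follows. The numbers are assumptions about a typical 64-bit HotSpot JVM, not guarantees of the Java specification: a 12-byte header with compressed class pointers (16 bytes without them) and object sizes rounded up to a multiple of 8 bytes.

```java
class ObjectSizeSketch {
    // Assumed default object alignment on 64-bit HotSpot.
    static final int ALIGNMENT = 8;

    /** Header plus fields, rounded up to the next multiple of ALIGNMENT. */
    static long alignedSize(int headerBytes, int fieldBytes) {
        long raw = (long) headerBytes + fieldBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        int intField = Integer.BYTES; // 4 bytes of actual data

        // With compressed class pointers: 12 + 4 = 16 bytes, a 4:1 ratio.
        System.out.println(alignedSize(12, intField));

        // Without compressed class pointers: 16 + 4 = 20, padded to 24 bytes, a 6:1 ratio.
        System.out.println(alignedSize(16, intField));
    }
}
```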
Memory Fragmentation
Java overcomes fragmentation through compaction. Repeatedly moving objects around in memory is expensive. Add the stop-the-world pauses that most collectors need while they compact, and the overall cost of Java GC is high.
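To make the cost of compaction concrete, here is a toy sliding compaction over an array that stands in for a region of the heap. It is a sketch under simplifying assumptions: each slot holds one object, `null` marks a free slot, and a real collector would additionally have to update every reference to each object it moves.

```java
import java.util.Arrays;

class CompactionSketch {
    /**
     * Slides all live objects to the start of the array, preserving their order.
     * Returns the index of the first free slot after compaction.
     */
    static int compact(String[] heap) {
        int nextFree = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                String obj = heap[i];
                heap[i] = null;         // vacate the old slot
                heap[nextFree++] = obj; // place the object in the lowest free slot
            }
        }
        return nextFree;
    }

    public static void main(String[] args) {
        String[] heap = {"a", null, "b", null, null, "c"};
        int firstFree = compact(heap);
        System.out.println(Arrays.toString(heap)); // prints [a, b, c, null, null, null]
        System.out.println("free space starts at slot " + firstFree); // slot 3
    }
}
```

Every live object may have to be touched and moved, which is why the work grows with the amount of surviving data rather than with the amount of garbage.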
Conclusion
Understanding how the Java GC behaves and which collectors are at your disposal will help you make better decisions and write optimised code. In the next part we will explain in detail how CPython manages its memory.