Notes on tuning G1GC for low-latency batch workloads
Batch workloads are a weird case for the JVM garbage collector. Most GC tuning literature is written for either long-running web services (low pause time is the whole game) or for true batch (throughput is everything, pauses are fine). Low-latency batch — "a 15-minute job that absolutely must finish in 15 minutes, 95th-percentile" — sits awkwardly between them, and G1 is often a reasonable default that nonetheless needs a few adjustments.
Here's the set of flags and heuristics that have worked for me on batch-grid workloads over the last few years, and a few that I learned not to use.
The defaults that are usually wrong
G1's default region size and pause target are tuned for a generic server workload, which isn't what we're running. Two flags are worth touching first:
-XX:MaxGCPauseMillis=100
-XX:G1HeapRegionSize=16m
MaxGCPauseMillis is a soft goal, not a hard cap. Setting it too aggressively (10ms, 20ms) triggers more frequent young-gen collections and reduces throughput substantially. For a batch job where the sink is a DB or a distributed store, 100–200ms pauses are usually invisible anyway. I aim for 100ms and let G1 figure out the rest.
G1HeapRegionSize matters more than you'd think. The default (computed from heap size) often gives you small regions, and in G1 anything larger than half a region becomes a humongous allocation that skips the usual young-gen path. If you have lots of mid-size objects (say, 2–8 MB records), bumping region size to 16m or 32m can dramatically reduce humongous allocation churn.
Humongous allocations: the quiet killer
A humongous object in G1 is anything larger than half a region. They're allocated directly in old gen, they can't be moved, and they pin down regions that could otherwise be reclaimed. If your job does lots of humongous allocations, you'll see old gen fill up fast, triggering concurrent marking (and the mixed GCs that follow it) more often than expected.
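The threshold is easy to sanity-check by hand: an allocation is humongous whenever it exceeds half the region size. A minimal sketch (the 8m region size below is just an illustrative default, not what your JVM necessarily picked):

```java
public class HumongousThreshold {
    // In G1, any allocation larger than half a region is humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long MB = 1024 * 1024;
        // A 5 MB record with an 8m region size: humongous (5 MB > 4 MB).
        System.out.println(isHumongous(5 * MB, 8 * MB));   // true
        // Same record after -XX:G1HeapRegionSize=16m: normal young-gen path.
        System.out.println(isHumongous(5 * MB, 16 * MB));  // false
    }
}
```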
Worth logging and watching:
-Xlog:gc+heap=trace
-Xlog:gc+humongous=debug
If you see humongous allocations dominate the profile, your options are: increase region size (if the objects are in the 4–16 MB range, this often fixes it), or restructure the hot path to not allocate these objects in the first place — streaming or chunked processing instead of materializing a giant collection.
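The restructuring option can be as simple as walking the input in bounded chunks rather than materializing one giant collection. A hypothetical sketch (the names and the chunk size are illustrative; the point is that each chunk's backing array stays well under the humongous threshold and dies young):

```java
import java.util.List;
import java.util.stream.IntStream;

public class ChunkedProcessing {
    // Materializing: one giant list whose backing array may go humongous.
    static long sumMaterialized(int n) {
        List<Integer> all = IntStream.range(0, n).boxed().toList();
        return all.stream().mapToLong(Integer::longValue).sum();
    }

    // Chunked: only a bounded window is live at any time, so each
    // allocation is small and collected on the cheap young-gen path.
    static long sumChunked(int n, int chunkSize) {
        long total = 0;
        for (int start = 0; start < n; start += chunkSize) {
            int end = Math.min(start + chunkSize, n);
            List<Integer> chunk = IntStream.range(start, end).boxed().toList();
            total += chunk.stream().mapToLong(Integer::longValue).sum();
        }
        return total;
    }

    public static void main(String[] args) {
        // Same result either way; only the allocation pattern differs.
        System.out.println(sumMaterialized(1_000_000) == sumChunked(1_000_000, 1_000));
    }
}
```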
Young gen sizing
G1 adjusts young gen dynamically, within the bounds of G1NewSizePercent and G1MaxNewSizePercent. The defaults (5% and 60%) are usually fine, but for allocation-heavy batch workloads I've had success bumping the max:
-XX:G1NewSizePercent=20
-XX:G1MaxNewSizePercent=60
Setting the floor higher means more room for short-lived objects to die young without being promoted to old gen, which reduces mixed GC pressure later. On a 32 GB heap, this means young gen starts at ~6.4 GB instead of 1.6 GB, and that 6.4 GB of headroom absorbs a lot of churn.
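The arithmetic behind those numbers, for anyone sizing a different heap (the 32 GB figure is the example from above):

```java
public class YoungGenFloor {
    // G1NewSizePercent is a percentage of the total heap.
    static double youngFloorGiB(double heapGiB, int newSizePercent) {
        return heapGiB * newSizePercent / 100.0;
    }

    public static void main(String[] args) {
        // Default floor: 5% of a 32 GiB heap.
        System.out.println(youngFloorGiB(32, 5));   // 1.6
        // With -XX:G1NewSizePercent=20.
        System.out.println(youngFloorGiB(32, 20));  // 6.4
    }
}
```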
String deduplication
If your batch job processes records with repetitive string fields (ticker symbols, account IDs, enum-like categorical values), turn on string deduplication:
-XX:+UseStringDeduplication
It shipped as a G1-only feature (newer JDKs extend it to other collectors) and it has a small CPU cost, but the memory saving can be substantial for text-heavy payloads. I've seen heap savings of 15–20% on systems that process serialized market data with many repeated fields.
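G1's deduplication runs at GC time on the backing character arrays, so there's nothing to call from application code. A rough user-level analogue of the idea, useful when you want the saving deterministically rather than at the collector's discretion, is canonicalizing repeated values through a map (a sketch of the concept, not what the JVM does internally):

```java
import java.util.HashMap;
import java.util.Map;

public class Canonicalizer {
    private final Map<String, String> pool = new HashMap<>();

    // Return one shared instance per distinct value, so millions of
    // records carrying the same ticker share a single String.
    String canonical(String s) {
        return pool.computeIfAbsent(s, k -> k);
    }

    public static void main(String[] args) {
        Canonicalizer c = new Canonicalizer();
        String a = c.canonical(new String("AAPL"));
        String b = c.canonical(new String("AAPL"));
        System.out.println(a == b);  // true: same instance, one copy on the heap
    }
}
```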
Things I don't touch anymore
InitiatingHeapOccupancyPercent. The old wisdom was to lower this (from 45 to 35 or 30) to trigger concurrent marking earlier. With recent JDKs, G1's adaptive IHOP is genuinely good and I've stopped second-guessing it.
ParallelGCThreads and ConcGCThreads. Defaults are fine for modern hardware. The old advice to set these manually was useful in JDK 8; JDK 17+ picks reasonable values.
Trying ZGC. Tempting. Sometimes it's the right answer — if you have a very large heap (100+ GB) and strict pause requirements, ZGC is fantastic. For the typical 16–32 GB batch workload, G1 is simpler to reason about and the pause reduction isn't worth the change.
Always turn on logging
Whatever flags you set, the one universal win is turning on GC logging:
-Xlog:gc*:file=/var/log/app/gc-%t.log:tags,uptime,time,level
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/app/heap-dumps/
Without good logs, GC tuning is guessing. With them, most of the above becomes an educated read of what the collector is actually doing, rather than a ritual based on what worked five years ago.