Migrating Java Applications to Kubernetes

Your CTO messages you out of the blue one day:

How much effort would it be to run XYZ on Kubernetes?

Shudder.

Although there are some nuances to the process, it is fairly straightforward. This article assumes some knowledge of containers, Kubernetes, and JVM applications. Our goal is to migrate the application with as few changes as possible and have it run as well as, or better than, it does today.

Use a Cgroups-Friendly JDK

The first step in migrating your JVM application to Kubernetes is to make sure that your JDK is a version that is cgroups-friendly (or container-aware). Prior to Java 8u191, the JVM did not behave correctly inside a container because it did not respect the Linux cgroups used to enforce resource limits. If cgroups limited the CPU time available to your container, the JVM would still detect the host’s full complement of CPUs and try to utilize all of them. This matters because the JVM uses the number of available CPUs to configure the size of the common fork-join pool, the number of garbage collector threads, and the choice of garbage collector implementation itself. A number of frameworks and libraries also use the CPU count to size thread pools, event loops, and so on. For instance, Netty defaults to two times the number of processors for its event loop count. Nothing bad might happen, but your application may exhibit strange behavior.

This is also a good opportunity to upgrade to the latest LTS Java release because both the Java language and the JVM have improved a lot in the last few years. If you are stuck on Java 8, it’s recommended to use 8u372 or later because it adds cgroups v2 support. If you are already running a newer JDK, then great! If not, you have some work and regression testing ahead of you to make sure your application works on a newer JDK.

In terms of JDK runtime base images, I recommend the images produced by Eclipse Temurin and Amazon Corretto.

Liveness and Readiness Probe Endpoints

Kubernetes uses liveness and readiness probes to determine the health of running containers: it restarts containers that fail their liveness probe and stops routing traffic to containers that are not ready. The probes can be mapped to any HTTP endpoint that returns a 200 OK status to indicate success. If your application does not have endpoints that can be used for this purpose, you’ll have to add them:

  1. A liveness (health check) endpoint which returns a 200 OK status if the application has started and is alive.
  2. A readiness endpoint which returns a 200 OK status if the application is ready for traffic.

One nuance here is that the two different probes allow the application to express that it is alive but not yet ready. This is a very powerful concept that we can leverage if the application needs to load data or perform initialization before it can serve traffic. In that case, we would configure the liveness probe to begin immediately and the readiness probe to begin after an initial delay, during which the application gets itself ready.
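
As a rough sketch, this is how the probes might be wired up in the Deployment’s pod spec, assuming the application exposes /health and /ready endpoints on port 8080 (the paths, port, and timings are illustrative):

    containers:
      - name: app
        image: registry.example.com/app:1.0.0   # illustrative image reference
        ports:
          - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /health            # liveness (health check) endpoint
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready             # readiness endpoint
            port: 8080
          initialDelaySeconds: 30    # give the application time to initialize
          periodSeconds: 10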

CYA: Containerize Your Application

The next step in this process is to package your JVM application in an image with a Dockerfile. There are two main schools of thought here:

  1. The Dockerfile is a multi-stage build which builds the application with Maven/Gradle in the first stage and then copies the artifacts (JAR files, start script, etc.) onto a base JVM runtime image in the second stage. This approach is completely self-contained because the build tools live in the first stage’s base image.
  2. Maven/Gradle will build the application and then the Dockerfile will copy the artifacts onto the base JVM runtime image.

Either one is fine; pick the one that works best with your CI/CD setup. After building your image, it should contain the application JAR files on a JDK base image with no embedded application configuration or credentials, right?
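
For reference, a minimal multi-stage Dockerfile along the lines of the first approach might look like the sketch below; the image tags, paths, and entry point script name are illustrative assumptions:

    # Stage 1: build the application with Maven
    FROM maven:3.9-eclipse-temurin-21 AS build
    WORKDIR /build
    COPY pom.xml .
    COPY src ./src
    RUN mvn -q package -DskipTests

    # Stage 2: copy the artifacts onto a JVM runtime base image
    FROM eclipse-temurin:21-jre
    WORKDIR /app
    COPY --from=build /build/target/*.jar /app/app.jar
    # entrypoint.sh is assumed to be executable in the repository
    COPY entrypoint.sh /app/entrypoint.sh
    ENTRYPOINT ["/app/entrypoint.sh"]

The second approach looks the same minus the first stage: CI runs Maven/Gradle and the Dockerfile simply copies the already-built artifacts.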

External Application Configuration

Most applications need some sort of configuration to initialize themselves, whether that is a properties file, a logging configuration file, or secrets. It is considered bad practice to bundle configuration within the image, especially if there is no way to override it.

Luckily, Kubernetes has built-in primitives to support external configuration:

  • ConfigMaps for providing external configuration data to the container via volume mount or container environment variables
  • Secrets for providing confidential configuration data to the container via volume mount or container environment variables

For example, the contents of your logback.xml file can be stored in a ConfigMap and then mounted as a file into the container at a specific path where Logback knows to look for it (there’s an example of this later). The same can be done for your application configuration properties file. AWS credentials can be stored as a Secret and then exposed to the container as environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION).
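
A sketch of that wiring in the pod spec, assuming a ConfigMap named logback-config and a Secret named aws-credentials (the names, keys, and mount path are illustrative):

    containers:
      - name: app
        volumeMounts:
          - name: logback-config
            mountPath: /config               # Logback reads /config/logback.xml
        env:
          - name: AWS_ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: aws-credentials
                key: access-key-id
          - name: AWS_SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: aws-credentials
                key: secret-access-key
          - name: AWS_REGION
            valueFrom:
              secretKeyRef:
                name: aws-credentials
                key: region
    volumes:
      - name: logback-config
        configMap:
          name: logback-config               # holds the logback.xml contents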

One thing to know about Kubernetes ConfigMaps and Secrets is that updates to them are propagated differently depending on how they are consumed. Updates to ConfigMaps and Secrets exposed as volume mounts are propagated automatically, although it is up to the running application to react to the updated files. Updates to ConfigMaps and Secrets exposed as container environment variables, however, require a pod restart before the application sees the new values.

Container Entry Point

In my experience, it is much more manageable in the long run to have an entry point script that contains the java -jar command to start your application than to have that command in the Dockerfile itself. My typical entry point script looks like this:

  • VM options for configuring garbage collection: -server, -Xms and -Xmx, -Xlog:gc, etc.
  • VM option for setting the time zone: -Duser.timezone=UTC
  • VM option for configuring SecureRandom: -Djava.security.egd
  • VM option for configuring Logback configuration file path: -Dlogback.configurationFile
  • VM option for attaching and configuring the JDWP agent for remote debugging
  • VM option for configuring JMX
  • JAR file paths
  • Main class name

I also use environment variables to customize or override parts of the entry point script. For instance, my entry point script checks for an environment variable named DEBUG to enable JDWP debugging. Environment variables also let you change a hard-coded configuration value without having to build a new image.
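
A trimmed-down sketch of such a script; the paths, ports, and option values are illustrative, com.example.Main is a stand-in for your main class, and EXTRA_JAVA_OPTS is a hypothetical hook for passing extra flags from the pod spec:

    #!/bin/sh
    # Baseline JVM options; these values are starting points, not recommendations.
    JAVA_OPTS="-Xms512m -Xmx512m \
      -Xlog:gc*:stdout \
      -Duser.timezone=UTC \
      -Djava.security.egd=file:/dev/./urandom \
      -Dlogback.configurationFile=/config/logback.xml"

    # Enable remote debugging via JDWP when the DEBUG environment variable is set.
    if [ -n "$DEBUG" ]; then
      JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
    fi

    # exec so the JVM replaces the shell, becomes PID 1, and receives signals directly.
    exec java $JAVA_OPTS $EXTRA_JAVA_OPTS -cp "/app/app.jar:/app/lib/*" com.example.Main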

Observability

Let’s talk about application logs in a Kubernetes environment. In the pre-container world, we configured the logging library to write logs to disk, where they would get rolled over daily and eventually deleted. This can also work in a container, but the log files will disappear when the pod restarts or is relocated to another node. In Kubernetes, application logs are expected to be emitted to standard output (System.out) so they can be retrieved with kubectl logs. Typically, a tool like Vector or Logstash is used to aggregate container logs on the cluster and “ship” them to a system like ELK for indexing and retrieval.

Application metrics are another aspect of observability. In Kubernetes, application metrics are exposed via an HTTP endpoint that is periodically scraped by a tool like Prometheus and shipped to another system for storage, aggregation, alerting, and retrieval. This is beyond the scope of this article, but Micrometer and the Prometheus Java client are the most common libraries used to do this.

Testing the Image

At this point, we should have a working container image with externalized configuration and secrets. Pat yourself on the back! Testing the image for regressions is an important step because the containerized application should behave identically to the original. Beyond traditional regression testing, there are a couple of techniques that you should consider using:

  • Mirror traffic – Envoy/Istio (and most other proxies) support sending a copy of a portion of traffic to another host. For example, Envoy/Istio can be configured to send 1% of production traffic to a staging environment. With the right observability tooling, this is a great way for new releases to safely receive production traffic and validate that everything is working correctly. However, not all traffic can be mirrored, and there is real cost in getting this set up and working.
  • GoReplay – GoReplay is an HTTP traffic capture and replay tool that can be used if traffic mirroring is not possible. Production traffic can be captured to a file, cleaned up, and then replayed against the staging environment for analysis.
  • Load testing – It’s possible that your application may not behave correctly in a container environment due to an issue or an outdated library somewhere, so it’s important to check that the containerized version of your application has no performance regressions. WireMock and Testcontainers let you stand in for application upstreams so you can load test with fault simulation in a safe way. Grafana k6 is my favorite load testing tool as of late because it supports both HTTP and gRPC load testing.

Kubernetes Resource Requests and Limits

Your container is not the only container running on a Kubernetes node. There could be dozens or hundreds of containers on a node as Kubernetes moves workloads (a group of containers, or a pod) around nodes to balance resource utilization across the cluster. Your container shares CPU, memory, disk, and network resources with the others, so it’s important to play nice by specifying container resource requests and limits. The resource requests and limits also dictate the quality of service class assigned to the workload. Let’s go into this a bit deeper and see how it applies to our JVM application.

Requests and Limits

The Kubernetes scheduler uses the resource requests of a workload to determine which nodes have enough available resources to run it. Once the scheduler finds such a node, the node begins running the workload and the requested resources are counted as reserved for it on that node.

Resource limits are used to prevent the workload from using more of a resource than allowed, but it is important to know that the memory limit is enforced differently from the CPU limit. If a container exceeds the memory limit, the container is killed and restarted. If a container exceeds the CPU limit, the container’s CPU usage is throttled. We will go into this more later.

If a resource limit is not provided, the workload will be allowed to use any available amount of that resource on the node.

Memory Resource Misconceptions

The memory resource unit is expressed in bytes. For example, a memory request of 512M is 512 megabytes. One of the most common sources of problems is a memory limit that is equal to, or close to, the maximum JVM heap size. The heap is not the only memory used in the container! A JVM process also has thread stacks, the JIT code cache, class metadata, and direct memory buffers in addition to the heap. Memory-backed volumes (emptyDir with medium: Memory) also count against the memory limit. When setting a container memory limit, it’s important to account for all of these or your container will be killed quite often because of an undersized limit.

CPU Resource Misconceptions

Although the CPU request and limit are expressed in CPU core units, they are not a number of CPU cores allocated to the container! The CPU limit is actually the amount of CPU time the container is allowed to use (also known as the quota) per scheduling period. For instance, a CPU limit of 0.5 means that the container may use up to 50 milliseconds of CPU time per 100 millisecond period; a limit of 2.0 allows up to 200 milliseconds per 100 milliseconds. The CPU request, meanwhile, is used for scheduling and determines the container’s relative share of CPU when the node is under contention. The JVM doesn’t help clear up this confusion either, because a CPU request of 2 will cause the JVM to see 2 available CPUs! This CPU resource unit nuance is a common source of misconceptions among engineers.
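
To make the units concrete, here is a sketch of a container resources block (the values are illustrative):

    resources:
      requests:
        cpu: "500m"        # scheduling hint and relative CPU share under contention
        memory: "768Mi"
      limits:
        cpu: "2"           # quota: at most 200ms of CPU time per 100ms period
        memory: "768Mi"    # hard cap: exceeding it gets the container killed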

CPU Limit Throttling

If your container uses more than its allowed CPU quota in a period, Kubernetes throttles the container’s CPU usage until the next period. The container is effectively starved of CPU, so everything in it stops. Several things can happen here and all of them are bad:

  • Application threads are paused so performance will heavily degrade. If you have code that performs actions with timeouts, you will see timeouts and issues during the throttling period.
  • Garbage collection threads are paused as well, which increases JVM heap pressure.
  • Until the throttling ends, some threads will get CPU time while others will not. Application performance will be degraded and uneven until threads can “catch up.”
  • An unintended consequence is that Kubernetes liveness and readiness probes may fail if the application threads are paused. Kubernetes will kill the container if the probe failure thresholds are exceeded, which can cause unintended systemic consequences.

It should be obvious that we do not want our application to be CPU throttled! The alternative to setting a CPU limit is to not set a CPU limit at all. When the CPU limit is not set, the container will be allowed to use any available CPU on the node. But there is a tradeoff here because Kubernetes (and your ops team) is happiest when there is a defined CPU limit.

Quality of Service Classes

Kubernetes uses the resource requests and limits to classify every workload with a quality of service (QoS) class. The QoS class is used to determine which workloads can be evicted from a node when the node is under resource pressure. There are three classes (there’s an example after the list):

  • Guaranteed – every container in the workload has resource requests and limits, and each request equals its limit; least likely to be evicted
  • Burstable – at least one container in the workload has a resource request or limit, but the Guaranteed criteria are not met; more likely to be evicted than Guaranteed
  • BestEffort – no container in the workload has any resource requests or limits; most likely to be evicted
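
For example, a container spec along these lines (values illustrative) lands the pod in the Guaranteed class because every request equals its limit:

    resources:
      requests:
        cpu: "2"
        memory: "768Mi"
      limits:
        cpu: "2"           # equal to the CPU request
        memory: "768Mi"    # equal to the memory request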

Configuring the JVM and Resource Requests and Limits

Every workload is different, but this is a general guide to a good starting point configuration for most JVM applications. I highly encourage you to test and tweak to see what works well for your setup. It is important not to rely on the JVM defaults because they are generally unsuitable for production. In fact, I recommend being explicit with your configuration and not relying on JVM defaults at all.

Let’s start with the Java heap size and the memory request and limit. How much heap does your application need to run? Let’s say that it is 512MB. There are two common ways to configure the heap: the classic -Xms and -Xmx method or with -XX:MaxRAMPercentage.

The classic -Xms and -Xmx method sets the initial heap size and a heap limit. For example, -Xms512m and -Xmx512m configure the JVM heap to start at 512MB and never grow beyond 512MB. As we discussed earlier, we need to account for the other parts of the JVM in the memory request and limit. Generally, the heap accounts for about 70% of total JVM memory usage, so with a 512MB heap the total memory usage of the container will be about 768MB. That’s what we will use as a starting point for our container memory request and limit.

The other method, -XX:MaxRAMPercentage, works backwards from the container memory. It configures the JVM to use a percentage of the container memory for the maximum heap size. Applying the rule of thumb, -XX:MaxRAMPercentage=70 with a container memory request and limit of 768M achieves roughly the same JVM settings as the method above. This method is also more convenient than -Xms and -Xmx because we only need to size the container memory.
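
A sketch of the memory side in the container spec, assuming the JVM flag is passed through the hypothetical EXTRA_JAVA_OPTS variable consumed by the entry point script (names and values illustrative):

    containers:
      - name: app
        resources:
          requests:
            memory: "768Mi"
          limits:
            memory: "768Mi"
        env:
          - name: EXTRA_JAVA_OPTS
            # Heap sized to ~70% of container memory, used in place of -Xms/-Xmx.
            value: "-XX:MaxRAMPercentage=70.0"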

Should you set a container memory limit? Yes, and your ops team will love you for being a good citizen on the cluster. If your containers keep getting killed by Kubernetes for exceeding the memory limit, your application probably has a memory leak that needs to be diagnosed. Another good tip is to set -XX:+ExitOnOutOfMemoryError so the JVM exits if it runs out of memory and the container can be restarted. Without the flag, the JVM can get into a zombie state where it can’t free or allocate memory and can’t do much of anything.

Let’s move on to the CPU request and limit. What kind of application is it? How much CPU does your application need to run and be responsive? Generally, JVM applications do better with more than one CPU so that application threads can run alongside garbage collection. A good starting point for most applications is a container CPU request of 2: the JVM will configure itself for 2 CPUs and Kubernetes will reserve 2 CPU core units for the container.

This will work, but it is not an efficient use of CPU resources on the node if your application uses less than 2 CPU core units or is idle most of the time. We can set a lower CPU request of 0.5 or 1 CPU core unit, but then the JVM will only configure itself for 1 CPU. The workaround is the JVM flag -XX:ActiveProcessorCount=2, which tells the JVM to configure itself for 2 CPUs regardless of the request.
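
A sketch of that arrangement, again passing the flags through the hypothetical EXTRA_JAVA_OPTS variable (values illustrative):

    containers:
      - name: app
        resources:
          requests:
            cpu: "500m"        # modest request so node CPU is not reserved idly
            memory: "768Mi"
          limits:
            memory: "768Mi"    # no CPU limit, so no throttling
        env:
          - name: EXTRA_JAVA_OPTS
            # Size thread pools and GC as if 2 CPUs are available.
            value: "-XX:MaxRAMPercentage=70.0 -XX:ActiveProcessorCount=2"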

As for the CPU limit, ask whether being throttled is acceptable for your workload. For a batch application, getting throttled periodically will not be noticeable. Conversely, a web application and its users will definitely suffer if throttled, so it may be best not to set a CPU limit at all. Then again, maybe your cluster admins have mandated that every workload must have a CPU limit so that everyone is a good cluster citizen. Even then, a high limit (2-3x the request) can be a good balance between getting throttled regularly and having no limit at all. My point is that you should choose the best CPU limit for your situation.

It is also worth mentioning that the JVM uses more CPU when the application starts in order to do JIT compilation, code caching, and other HotSpot optimizations. If this is a problem for your workload, Kube Startup CPU Boost is a solution that temporarily boosts the CPU requests and limits of a pod during startup. 

Next, let’s choose the garbage collector, which determines the garbage collection pause characteristics of your application. The “best” garbage collector for your workload is a function of the heap size, the number of CPUs, and the desired application responsiveness. Since we are configuring for more than one CPU, our job becomes a little bit easier. If your heap is small (less than 4GB) or the workload does not need to be responsive, then the serial or parallel collectors will be good enough. But if you have a larger heap (4GB or more), then the G1 collector is a good choice.
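
In the entry point script, that choice might come down to one of these flags (a sketch; pick one based on the tradeoffs above):

    # Small heap (under ~4GB) or batch work where pauses don't matter:
    JAVA_OPTS="$JAVA_OPTS -XX:+UseParallelGC"

    # Larger heap and a workload that needs to stay responsive:
    JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"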

This combination of resource requests, limits, and JVM configuration is a good starting point. I recommend load testing to establish a baseline performance profile. Once you have a baseline, you can make incremental changes to look for further improvements.

Into Production

Now that you have a production-ready application running on Kubernetes, what is the best way to shift traffic to it? Typically, this is done by swapping DNS records or changing a load balancer configuration. While this happens, keep an eye on your observability tooling, such as metrics and logging. Once you’re comfortable with the application on Kubernetes, you can “burn the boats” and decommission the old production environment.

Congratulations! But we’ve only scratched the surface of what’s possible on Kubernetes. There’s a lot more that can be done that is outside the scope of this article, like horizontal pod autoscaling, blue-green deployments, and service meshes. You’ll also know exactly what to do the next time your CTO needs a JVM application migrated to Kubernetes!

Additional References