The Art and Science of Probing a Kubernetes Container

Look ready, and stay alive.

(I originally published this story in my technical blog*)

Keeping containers alive in a Kubernetes cluster can feel more like art than science.

In this article, I dive into the sea of madness awaiting those responsible for authoring container probes, with particular attention to the relatively new addition of startup probes to the mix. Along the way, I leave a breadcrumb trail of curated links you can use to take the next step in implementing the various suggestions in the article.

Starting, nay, requesting the start of a new container in a Kubernetes cluster is relatively simple: provide a pod specification to the cluster, possibly as a pod template wrapped inside one of the various flavors of workload resources, such as a Deployment or a Job. After receiving the pod specification, the kube-scheduler assigns the pod to a node, then the kubelet in that node starts the containers in the pod.

Pods follow a clear lifecycle, and part of that lifecycle allows the kubelet to probe the pod containers to ensure they are responsive throughout their existence. The probes follow a contract where pod containers advertise endpoints from which the kubelet can poll different facets of their internal status.

As a short recap, there are three types of probes that report the internal status of a container: readiness, liveness, and startup probes.

Figure 1 — Relationship between container probes and a pod lifecycle
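As a quick orientation, the sketch below shows where each probe type sits in a container spec. The pod name, image, port, and endpoint paths are placeholder assumptions for illustration, not values taken from this article:

apiVersion: v1
kind: Pod
metadata:
  name: probe-demo               # hypothetical name for illustration
spec:
  containers:
  - name: app
    image: example.com/app:1.0   # placeholder image
    ports:
    - containerPort: 8080
    startupProbe:                # gates the other two probes until the container finishes starting
      httpGet:
        path: /healthz           # placeholder endpoint
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    readinessProbe:              # controls whether the container receives traffic
      httpGet:
        path: /ready             # placeholder endpoint
        port: 8080
      periodSeconds: 10
    livenessProbe:               # controls whether the kubelet terminates the container
      httpGet:
        path: /healthz           # placeholder endpoint
        port: 8080
      periodSeconds: 20
      failureThreshold: 3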

Readiness probes

If a container fails to respond to its readiness probe, the kubelet removes the container from the service load balancer, effectively diverting traffic away from the container. Here, the developer hopes that a replica elsewhere can handle that traffic.

The design of a readiness probe is somewhat straightforward. You want to take into account the state of dependencies and the resource usage in the container:

Do’s:

Don’ts:
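As a minimal sketch of those principles, a readiness probe typically points at an endpoint whose handler reports on the container's dependencies and local resource usage. The /ready path, the port, and the thresholds below are assumptions for illustration:

    readinessProbe:
      httpGet:
        path: /ready            # placeholder endpoint that reports on dependencies and local resources
        port: 8080
      periodSeconds: 10         # check frequently so traffic is diverted quickly
      failureThreshold: 1       # a single failure is enough to stop sending traffic
      successThreshold: 1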

Liveness probes

If a container fails its liveness probe a consecutive number of times, the kubelet terminates the container. Emphasis on "terminate" rather than "restart": whether the container is restarted afterward depends on the restart policy of the pod.

These probes are notoriously difficult to code correctly because the goal of the probe’s developer is to anticipate the unexpected, such as a bug in a process that could put the whole container in an unrecoverable state.

If the probe is too lenient, the container may become unresponsive without being terminated, effectively reducing the number of replicas of that pod capable of serving traffic.

If the probe is too strict, the container may keep getting terminated unnecessarily, a condition that is insidiously hard to detect when it happens intermittently: the pod tends to look healthy right when you are looking for the problem.

Do’s:

Don’ts:

Figure 2 — A liveness probe should allow more time to fail completely than the readiness probe.
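A hedged sketch of the relationship in Figure 2, with values assumed for illustration rather than prescribed: the readiness probe below gives up and diverts traffic after 3 × 10 = 30 seconds of failed checks, while the liveness probe only triggers termination after 6 × 20 = 120 seconds of failed checks:

    readinessProbe:
      httpGet:
        path: /ready            # placeholder endpoint
        port: 8080
      periodSeconds: 10
      failureThreshold: 3       # 30 seconds of failures removes the pod from the load balancer
    livenessProbe:
      httpGet:
        path: /healthz          # placeholder endpoint
        port: 8080
      periodSeconds: 20
      failureThreshold: 6       # 120 seconds of failures before the kubelet terminates the container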

Startup probes

Startup probes are a relatively new addition to the stable of container probes, achieving GA status in late 2020 in Kubernetes 1.20. Note: Credit to my colleague, the ever-knowledgeable Nathan Brophy, for pointing out the feature was already available by default, albeit in beta stage, as early as Kubernetes 1.18.

A startup probe creates a “buffer” in the lifecycle of containers that need an excessive amount of time to become ready.

In the past, in the absence of startup probes, developers resorted to a mix of using initialization containers and setting long initialDelaySeconds values for readiness and liveness probes, each with its own set of compromises.

If your containers take a relatively long time to start, consider replacing high values of initialDelaySeconds in their readiness and liveness probes with an equivalent startup probe.

For instance, consider the container spec below:

spec:
  containers:
  - name: myslowstarter
    image: ...
    ...
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 600
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 600
      periodSeconds: 20

It can be significantly improved by moving that delay into a startup probe, as in the next example:

spec:
  containers:
  - name: myslowstarter
    image: ...
    ...
    readinessProbe:
      tcpSocket:
        port: 8080
      # initialDelaySeconds: 600   (removed; the startup probe covers this delay)
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      # initialDelaySeconds: 600   (removed; the startup probe covers this delay)
      periodSeconds: 20
    startupProbe:
      tcpSocket:
        port: 8080
      failureThreshold: 60
      periodSeconds: 10

In the first example, the kubelet waits 600 seconds before evaluating the readiness and liveness probes. In contrast, in the second example, the kubelet checks the startup probe up to 60 times at 10-second intervals, handing control over to the readiness and liveness probes as soon as the container meets its startup condition.

A hidden benefit of the frequent checks in a startup probe is that a developer can set high values for failureThreshold and periodSeconds without worrying about slowing down the container startup. In contrast, the fixed wait imposed by initialDelaySeconds puts pressure on developers to ignore edge cases and set lower values that allow the entire application to start faster. In my experience, “edge cases” are synonymous with “things we have not seen during development,” which translates to unstable containers in some production environments.

As a rule of thumb, use startup probes if the initialDelaySeconds field in your readiness and liveness probes exceeds the total time specified through failureThreshold * periodSeconds fields. As a companion rule of thumb, anything over 60 seconds for initialDelaySeconds in a readiness or liveness probe is a good sign that your application would benefit from using a startup probe instead.
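As a quick worked example of that rule, with values assumed for illustration: a probe relying on initialDelaySeconds: 120 maps to a startup probe whose failureThreshold * periodSeconds covers the same 120-second budget, without forcing every start to wait the full two minutes:

    startupProbe:
      tcpSocket:
        port: 8080
      failureThreshold: 12    # 12 checks ...
      periodSeconds: 10       # ... every 10 seconds = up to 120 seconds to start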

Timing is everything

After observing the suggestions in this article, you are ready to ask the inevitable question:

“So, what should I use for the probe settings?”

In general, you want the readiness probe to be sensitive and start reporting failures as soon as the container starts struggling to respond to requests. On the other hand, you want the liveness probe to be a little lax and only report failures once the code has lost its grip on what you consider a valid internal state.

For timeoutSeconds, I recommend keeping the value at its default (one second). This suggestion builds on my earlier advice of assessing the probe status outside the thread answering the kubelet request. Using a higher value widens the window where the cluster can route traffic to a container that cannot handle requests.

For the combination of periodSeconds and failureThreshold, more checks in the same interval tend to be more accurate than fewer checks. Assuming you followed the suggestion of assessing the container status separately from the thread responding to the request, more frequent checks will not add significant overhead to the container.
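Putting those timing suggestions together, a sketch could look like the snippet below; the endpoints, port, and exact values are assumptions rather than prescriptions. timeoutSeconds stays at its default of one second, the readiness probe reacts quickly, and the liveness probe tolerates a longer stretch of failures:

    readinessProbe:
      httpGet:
        path: /ready          # placeholder endpoint answering from a pre-computed status
        port: 8080
      timeoutSeconds: 1       # default; the handler should answer from cached state
      periodSeconds: 5        # frequent, cheap checks
      failureThreshold: 2
    livenessProbe:
      httpGet:
        path: /healthz        # placeholder endpoint
        port: 8080
      timeoutSeconds: 1
      periodSeconds: 20       # slower cadence ...
      failureThreshold: 6     # ... and a longer run of failures before termination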

Mind your CPU limits

Different clusters, different speeds.

A common problem with probes, especially liveness probes, is assuming a cluster will always give your container as much CPU as you request. The other common mistake is assuming clusters will always observe fractional requests with surgical precision.

Starting with the hypervisor and the VMs hosting worker nodes, all the way to the CPU limits in a pod specification, there are myriad reasons why a container can run the same stretch of code at different speeds.

Here are the top factors that can blind-side a developer:

- CPU overcommitment, from the hypervisor hosting the worker nodes to the node itself, which means the container may receive less processing capacity than it requested.
- CPU limits in the pod specification, especially fractional limits, which throttle the container even when the node has spare capacity.

Learning a bit about the hardware characteristics and overcommitment settings of your IaaS provider can go a long way in deciding the safety multipliers to add to settings like timeoutSeconds, failureThreshold, and periodSeconds. Keep those two factors in mind when setting the values for probes, especially liveness probes. Depending on what you learn, you may also reconsider the settings for CPU requests and limits so that your probes have enough processing capacity to respond to requests promptly.
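As a sketch of that last point, with CPU and probe values assumed for illustration, the resource settings and probe timings belong together in the container spec, so that a throttled container still has the headroom to answer its probes:

    resources:
      requests:
        cpu: 500m             # what the scheduler guarantees
        memory: 512Mi
      limits:
        cpu: "1"              # throttling kicks in above this, regardless of spare node capacity
        memory: 512Mi
    livenessProbe:
      httpGet:
        path: /healthz        # placeholder endpoint
        port: 8080
      timeoutSeconds: 1
      periodSeconds: 30       # extra headroom in case the container is throttled
      failureThreshold: 10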

Figure 3 — Healthy container getting terminated due to a restrictive CPU limit

Conclusion

This article offers a range of suggestions to improve the precision and performance of container probes, allowing containers to start faster and stay running longer.

The next step comes from careful analysis of what runs inside the containers and studying their behavior in actual runtime across a diverse set of clusters and conditions, going as far as simulating failures of dependencies and reduced availability of system resources. Using the kubectl utility and its ability to format and filter contents is a great way to find containers with a high number of restarts and inadequate probe limits, which is a more technical subject covered here. Using PromQL with Kubernetes metrics can further expand that technique with charts for various time series, a topic covered in this other article.

In summary, keep the goal of the probes in mind when writing them, and ensure they run quickly and assuredly, providing clear information with minimal (if any) false positives in their response to the kubelet. Then trust the cluster to do what it does best with the data, ensuring maximum availability of your containers to their clients.

*This story was originally published in my blog in December 2021. I mentioned it enough in my medium articles that I felt it would be more readable if I made it available in the same general format and layout.
