Deploy Kubernetes gRPC Workloads with Zero Downtime

Aug 12, 2020

So you started down the path of using Kubernetes and everything has been a dream! You were able to deploy workloads with immense speed and efficiency, and you were able to integrate your various services with each other with ease. Then you realized that requests were failing during the deploy window every time you rolled out an update to one of your workloads. The good news is that zero downtime deploys can be achieved in Kubernetes. There is still hope for the stability of your workloads!

We will start off by defining a simple proto file containing the API which will be exposed by our sample application via gRPC:

The API is defined as having one rpc (or method) called Work. This API will be used to send some simulated work to be performed by our server.

Now we will write some Go code to implement the gRPC server interface:

The Work method is implemented to sleep for a period of seconds based upon the incoming request’s size parameter. This will be used later to simulate requests that take a long time to be processed.

Now we will write a func main() to define the startup of the application with a gRPC server:

This code makes use of a run.Group to define multiple long-running components that will be executed in parallel. Each component has a defined startup func and a shutdown func. The run.Group takes care of automatically calling the shutdown func for each component when one of the startup funcs returns. Notice that the first component added to the run.Group with a call to g.Add is the gRPC server. The second component is a listener for the OS signals SIGINT and SIGTERM. When either of these signals occurs, an error is returned, which then causes the shutdown func of the gRPC server to be invoked. This allows the application to gracefully shut down the gRPC server and, ideally, cleanly end any connections that are open.

Now we will define a GRPCServer struct to encapsulate some helpful behavior:

The func NewGRPCServerFromListener serves as a constructor for our newly defined struct. This constructor instantiates a new healthServer, which is a gRPC server that implements the gRPC Health Checking Protocol. It gives us a way to integrate with the automated health checks that Kubernetes will perform later. Both the healthServer and our example gRPC server defined earlier are exposed on the same TCP listener, so the health checks flow through similar networking logic to our real business logic APIs.

Now that we have an application, we need to build it into a Docker image to be deployed to Kubernetes. We need to make sure that our Docker image has an entrypoint that handles or passes on OS signals. Kubernetes sends a SIGTERM to the container when pod termination is initiated (after any preStop hook has finished). If the container is still running once the terminationGracePeriodSeconds has elapsed, a SIGKILL is sent. There are various ways to configure an entrypoint with Docker. The recommended method is to use the exec form of ENTRYPOINT (see the last line in the Dockerfile defined below) because it doesn’t wrap the command in a shell that might accidentally suppress the signals that are received. We will also include the installation of the open source grpc-health-probe application. The grpc-health-probe will be invoked by Kubernetes as defined later in the Kubernetes manifest. For example:

With our Docker image defined we will now move on to configuring Kubernetes manifests to deploy our workload. We will define a Service and a Deployment:

Above we have defined one container to be deployed, named podlifecycle. The podlifecycle container has a livenessProbe and a readinessProbe defined as commands to be executed. The command in both probes executes the /usr/bin/local/grpc-health-probe application, which was installed into the Docker image using the Dockerfile defined above.

Note: in a past version of this article I suggested deploying the grpchealthprobe image as a sidecar container and configuring the livenessProbe and readinessProbe on the sidecar to hit the gRPC endpoint on the podlifecycle container. This did work to detect when problems were being experienced by the podlifecycle container; however, it did not work to resolve the problems. With this setup, when a probe failed, Kubernetes would restart only the grpchealthprobe container, not the podlifecycle container. I had assumed that Kubernetes would restart the entire pod, but in reality only the specific container with the failing probe is restarted.

When configuring liveness and readiness probes, it is important that the thresholds not be set identically for both probes. The readiness probe is intended to notify Kubernetes when a workload is ready to serve traffic. In simple terms, this means that Kubernetes will add the IP address of the pod to the endpoint set for the corresponding Kubernetes Service when the workload is ready. The liveness probe is intended to ensure that your application never hangs. Sometimes applications get into a bad state that can only be recovered from by restarting the application; this is the type of behavior that a liveness probe is built to help resolve. When a liveness probe fails, Kubernetes will restart the container that is failing the probe. This can be dangerous, so take caution when defining liveness probes.

A basic suggestion for configuring these probes is to set the periodSeconds of the liveness probe to 3 times the entire failure window (periodSeconds * failureThreshold) of the readiness probe.
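The arithmetic behind that suggestion can be sketched in a few lines of Go. The probe struct and function names are hypothetical, and the readiness values are example numbers chosen for illustration, not the values from the manifest.

```go
package main

import "fmt"

// probe mirrors the two Kubernetes probe fields used in the suggestion.
type probe struct {
	PeriodSeconds    int
	FailureThreshold int
}

// failureWindowSeconds is how long a probe must keep failing before
// Kubernetes treats it as failed: periodSeconds * failureThreshold.
func (p probe) failureWindowSeconds() int {
	return p.PeriodSeconds * p.FailureThreshold
}

// suggestedLivenessPeriod returns 3 times the readiness failure window.
func suggestedLivenessPeriod(readiness probe) int {
	return 3 * readiness.failureWindowSeconds()
}

func main() {
	// Example values only (assumptions, not taken from the manifest).
	readiness := probe{PeriodSeconds: 5, FailureThreshold: 2}
	fmt.Println("readiness failure window (s):", readiness.failureWindowSeconds()) // 10
	fmt.Println("suggested liveness periodSeconds:", suggestedLivenessPeriod(readiness)) // 30
}
```

With these example numbers the readiness probe removes the pod from routing after about 10 seconds of failures, well before a liveness check spaced 30 seconds apart would trigger a restart.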

When a pod enters a Terminating state, the pod is removed from Kubernetes Services and Ingresses to prevent new traffic from reaching the terminating pod. Unfortunately, this is done using asynchronous API calls, so it is not known exactly when a pod will be removed from routing. For this reason, a preStop hook has been added that sleeps for 5 seconds. In my case, this was sufficient for Kubernetes to execute all updates. You will need to choose this period based upon how quickly your Ingress Controller performs updates.

With all of this in place, we will now deploy it all using Skaffold. We will start by defining a skaffold.yaml file:

Now we run:

skaffold run

You can alternatively execute `skaffold dev` if you want the logs of the workload to be piped to your terminal.

After Skaffold completes the deployment you should see one pod:

➜  ~  kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
podlifecycle-8577f67547-5k8gw   2/2     Running   0          70s

Now we will write a client application to send repeated load to the server. The client app will make use of the proto that we defined earlier to invoke the Work rpc in an infinite loop. If errors occur, they will be logged and not cause the application to shutdown. This is helpful when testing because if a zero downtime deploy does not occur then the client application will experience errors. Here is the code for our client application:

The address const defined at the top of the file is the address where the Kubernetes podlifecycle service can be reached. In my case, I am deploying my workload to a minikube cluster, so I can obtain the address of the podlifecycle service by running the command:

➜  ~  minikube service --url podlifecycle
http://192.168.64.24:32332

You can see the client application runs an infinite loop to send Work requests with 100 milliseconds between requests so we don’t overload anything. Build and start the client application to start sending traffic to the server.

With the client application running and traffic successfully hitting the Kubernetes podlifecycle workload, we can now test zero downtime deploys by making small changes to the Kubernetes manifests defined earlier. We will change the version label value from “1” to “2”:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: podlifecycle
...
spec:
...
  template:
    metadata:
      labels:
        app: podlifecycle
        version: "2"

If you ran skaffold dev earlier then as soon as you save your label change a deploy will be initiated. If you used skaffold run then run the command again to start a new deploy. Watch the logs on the client application as the deploy progresses. You should see endless logs indicating success and no errors like:

2020/08/12 00:38:59 Response: test
2020/08/12 00:38:59 Response: test
2020/08/12 00:38:59 Response: test
2020/08/12 00:38:59 Response: test
2020/08/12 00:38:59 Response: test

Congrats, you just successfully completed a zero downtime deployment of a gRPC service! The source code for all components mentioned above can be found at https://github.com/jwenz723/podlifecycle

To learn more about the methods described above, check out the following resources:

CloudNativeCon 2019 talk on zero downtime deploys by Leigh Capili https://www.youtube.com/watch?v=0o5C12kzEDI&feature=youtu.be
Health checking gRPC servers on Kubernetes: https://kubernetes.io/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/
Official K8s documentation on probes: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
Golang graceful shutdown techniques: http://www.codershaven.com/graceful-shutdown-of-a-go-service/
GRPC Health Checking Protocol: https://github.com/grpc/grpc/blob/master/doc/health-checking.md
grpc-health-probe: https://github.com/grpc-ecosystem/grpc-health-probe
Kubernetes graceful shutdown: https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html

I am still new to these topics, so I would love to hear your opinion as to how my methods can be improved.

Written by Jeff Wenzbauer