Understanding Service Mesh: Operations Guide


The arrival of Kubernetes has made going completely into the cloud a reality. It has given the industry a platform to finally build "cloud aware" applications and be truly "cloud-native".

This new push toward cloud-native naturally came with a set of challenges that needed to be solved. While Kubernetes solved the problem of orchestrating workloads, the challenges of security, policy, and management still exist. The introduction of serverless and Knative brings even more complexity into the mix.

One of the technologies that came to light is the "service mesh". A service mesh sets out to tackle the challenges of security, policy, and traffic management for microservices on a cloud-native platform. In this blog I will explore the idea behind a service mesh, look at Istio, and try to explain how it all fits together from an Operations point of view.

Application communication design

Current Design

Whether you are building monoliths or microservices, the design for how applications communicate with each other is generally the same. When you look at it, it's two or more applications communicating with each other over the network via HTTP/HTTPS.

This should look pretty familiar. The challenge with this design (even setting aside cloud-native and microservices for a moment) is that security, circuit breaking, and Layer 7 and Layer 4 traffic rules have to be built into the application stack itself, and that is true for each application stack across your entire ecosystem. Maintaining rulesets for hundreds of applications can become a maintenance nightmare.

This is where a Service Mesh can be powerful!

Service Mesh Fundamentals

In order to manage the rulesets for an application, the solution needs to be 1) application agnostic and 2) abstracted away from the application. To do this, a proxy (currently the most popular one is Envoy) is put in place that holds these rules/policies independently of the application (and vice versa). The proxy is deployed alongside the application in a "sidecar pattern" design. This way ANY application can be plugged in without needing to know about the Service Mesh.

In this design, Application 1 talks to Application 2 via a proxy. You can dissect it further and say that Application 1's proxy is talking to Application 2's proxy. Some of the advantages of this design are...

  • Rules and policies (like routing and circuit breaking) no longer have to be built into the application.
  • Applications can be "plugged in" without them knowing they are being governed.
  • Having a single source of authority makes management easier.
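
To make the sidecar idea concrete, here is a minimal Python sketch. It is not how Envoy is implemented; the `SidecarProxy` class, the `denied_callers` policy, and the two toy applications are all hypothetical names made up for illustration. The point is only that the applications contain no policy code, and that every call goes from one proxy to the other.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# An "application" here is just a function from a request path to a response.
App = Callable[[str], str]


@dataclass
class SidecarProxy:
    """Hypothetical stand-in for a sidecar proxy such as Envoy."""
    name: str
    app: App
    denied_callers: List[str] = field(default_factory=list)  # illustrative policy

    def handle_inbound(self, caller: str, path: str) -> str:
        # Policy is enforced here, before the application sees the request.
        if caller in self.denied_callers:
            return "403 denied by proxy policy"
        return self.app(path)

    def call(self, peer: "SidecarProxy", path: str) -> str:
        # Outbound traffic leaves through this proxy and enters via the peer's proxy.
        return peer.handle_inbound(self.name, path)


def application_1(path: str) -> str:
    # Plain business logic: no security or routing code.
    return f"app1 served {path}"


def application_2(path: str) -> str:
    return f"app2 served {path}"


proxy_1 = SidecarProxy("app1", application_1)
proxy_2 = SidecarProxy("app2", application_2)

print(proxy_1.call(proxy_2, "/orders"))  # allowed: no policy set yet

# Change the policy on the proxy; application_2 itself is untouched.
proxy_2.denied_callers.append("app1")
print(proxy_1.call(proxy_2, "/orders"))  # now rejected by the proxy
```

Notice that blocking Application 1 required no change to either application, which is the "plugged in without knowing" property from the list above.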

One thing to keep in mind is that once you have hundreds of applications with hundreds of instances, management can get out of hand. This is why the concept of a control plane (an idea likely borrowed from Kubernetes, and yes, now I'm teasing Istio) fits nicely here.

With this design you can control/manage a fleet of proxies (along with their rulesets/policies) from a central location. But this is only part of the solution.
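
As a rough illustration of "central location", the sketch below models a control plane that pushes one ruleset to every registered proxy. The `ControlPlane` and `Proxy` classes, the version counter, and the rule names are assumptions invented for this example; real control planes use far richer APIs to distribute configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Proxy:
    """Hypothetical data-plane proxy that only holds the rules it is given."""
    name: str
    rules: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def apply(self, rules: Dict[str, str], version: int) -> None:
        # Replace the local copy so every proxy converges on the same ruleset.
        self.rules = dict(rules)
        self.version = version


@dataclass
class ControlPlane:
    """Hypothetical central authority for the fleet's rulesets."""
    proxies: List[Proxy] = field(default_factory=list)
    rules: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def register(self, proxy: Proxy) -> None:
        # A new proxy receives the current ruleset as soon as it joins.
        self.proxies.append(proxy)
        proxy.apply(self.rules, self.version)

    def set_rule(self, key: str, value: str) -> None:
        # One change in one place, pushed out to every registered proxy.
        self.rules[key] = value
        self.version += 1
        for proxy in self.proxies:
            proxy.apply(self.rules, self.version)


control_plane = ControlPlane()
fleet = [Proxy(f"proxy-{i}") for i in range(3)]
for p in fleet:
    control_plane.register(p)

control_plane.set_rule("retries", "3")
control_plane.set_rule("timeout", "2s")

for p in fleet:
    print(p.name, p.version, p.rules)
```

The operator touches only the control plane; the proxies never need to be edited one by one.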

Istio Service Mesh

There are a few technologies that take this design pattern and create a solution based on it. Some include Linkerd, Gloo, and Conduit. The one solution that has gained a lot of favor in the community is Istio.

Istio can get complex, and other blogs have gone in depth on how Istio works under the hood. I also suggest you give Christian Posta a follow. I will try to keep this at a high level, focusing on an Operations understanding of Istio's implementation of a Service Mesh.

Istio is made up of two parts: a Control Plane and a Data Plane.

The Control Plane is made up of the following components:

  • Pilot - This is where traffic management is configured and where the config data for the proxies is stored and pushed out from.
  • Mixer - This is the policy engine. It enforces access control and usage policies. It also collects telemetry from the mesh.
  • Citadel - This provides mTLS by way of handing out certificates to the proxies and managing them.

The Data Plane is simply the Envoy Proxies themselves that enforce the rulesets stored on the control plane.

With Istio you get:

  • Circuit Breaking - This lets you limit the number of concurrent requests sent to an instance, so that a slow instance is not overloaded with more traffic.
  • Pool Ejection - This removes failing instances from the pool.
  • Retries - This forwards the request to another instance in case we get a failure (open circuit breaker and/or pool ejection).
  • Mutual TLS - This allows you to encrypt all traffic automatically (often discussed as part of a "zero trust" architecture).
  • Telemetry/Tracing - This gives you observability into your microservices and lets you trace failures and calls into (and out of) your services.
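
The first three items work together, which is easier to see in code. The following Python sketch is a simplified model under stated assumptions, not Istio's or Envoy's actual behavior: the `Pool` and `Instance` classes and the thresholds (`max_in_flight`, `max_failures`, `max_attempts`) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    name: str
    handler: Callable[[], str]
    in_flight: int = 0             # requests currently being served
    consecutive_failures: int = 0
    ejected: bool = False


class Pool:
    """Hypothetical load-balancing pool, as a proxy might keep per service."""

    def __init__(self, instances: List[Instance], max_in_flight: int = 1,
                 max_failures: int = 2, max_attempts: int = 3) -> None:
        self.instances = instances
        self.max_in_flight = max_in_flight  # circuit-breaking threshold
        self.max_failures = max_failures    # pool-ejection threshold
        self.max_attempts = max_attempts    # retry budget per request

    def _candidates(self) -> List[Instance]:
        # Skip ejected instances and instances that are already too busy.
        return [i for i in self.instances
                if not i.ejected and i.in_flight < self.max_in_flight]

    def request(self) -> str:
        tried = set()
        for _ in range(self.max_attempts):
            candidates = [i for i in self._candidates() if i.name not in tried]
            if not candidates:
                break
            instance = candidates[0]
            tried.add(instance.name)
            instance.in_flight += 1
            try:
                result = instance.handler()
                instance.consecutive_failures = 0
                return result
            except Exception:
                # Count the failure; the loop then retries on another instance.
                instance.consecutive_failures += 1
                if instance.consecutive_failures >= self.max_failures:
                    instance.ejected = True  # pool ejection
            finally:
                instance.in_flight -= 1
        raise RuntimeError("no healthy instance could serve the request")


def broken() -> str:
    raise ConnectionError("instance is down")


def healthy() -> str:
    return "200 OK"


pool = Pool([Instance("a", broken), Instance("b", healthy)])
print(pool.request())  # "a" fails once, the request is retried on "b"
print(pool.request())  # "a" fails a second time and is ejected; "b" answers
print([i.name for i in pool.instances if i.ejected])
```

The caller never sees the failures of instance "a": the retry hides the individual error, and the ejection stops the pool from sending it further traffic. None of this logic lives in the applications themselves.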


Service mesh is still a new technology and is ever evolving. Some of these technologies (like Istio and Linkerd) overlap in functionality, and others (like Istio and Gloo) can complement each other. The important thing is to get familiar with these technologies before they start running in your environment. Just as importantly, doing so lets you make an informed decision about which one to use!

I encourage anyone not yet familiar with Istio to go and check out the Katacoda Istio track to get hands-on!