Exploring Kubernetes Storage

Introduction

When you think about Kubernetes, the first thing that usually comes to mind is running stateless applications. The very nature of Kubernetes' design lends itself to stateless workloads. However, since Kubernetes runs on Linux, you are able to attach storage systems to support stateful applications. But how do you face the challenge of adding support for new volume plugins?

This is where CSI comes in. CSI (the Container Storage Interface) provides a standard way to expose block and file storage to containers. This allows storage vendors to write storage plugins for Kubernetes without having to modify the Kubernetes core code. CSI was introduced in v1.9 and went GA in v1.13.

This has enabled various storage vendors to integrate their storage systems into Kubernetes, and even cloud providers offer integrated solutions (like EBS on Amazon, for example).
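
For illustration, consuming a CSI driver from a workload's point of view mostly comes down to a storageClass whose provisioner is the name registered by that driver. A rough sketch (the ebs.csi.aws.com provisioner and gp2 type here are just examples from the AWS EBS CSI driver; substitute whatever driver you actually install):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-example
provisioner: ebs.csi.aws.com
parameters:
  type: gp2

Any PVC that references this storageClass gets its volume provisioned by the CSI driver rather than by in-tree code.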

In this blog I will be taking a look at GlusterFS and Rook and exploring some of their advantages and pitfalls.

Setup

For this blog I installed and set up my own environment (although using minikube should work as well).

Rook installation

To install rook, I took a look at their latest documentation page. There I found a helpful quickstart page that provides an easy way to deploy rook with ceph using helm. Getting a rook system up and running consists of three parts: the operator, the rook ceph cluster, and the storageClass.

To install the rook operator I used helm. This is pretty straightforward, and I was able to install it by following the documentation.

$ helm repo add rook-stable https://charts.rook.io/stable
$ helm install --namespace rook-ceph-system rook-stable/rook-ceph
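
Before moving on, it's worth making sure the operator pod comes up cleanly (a quick sanity check; pod names will differ on your cluster):

$ kubectl get pods -n rook-ceph-system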

Now that the operator is up and running, we need to deploy a rook cluster. More specifically, we want to deploy a rook ceph cluster. I will be deploying a copy of the yaml from my GitHub page, but you should look at the quickstart page for an up-to-date version.

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-cluster.yaml

This creates (among other things) the rook CRDs for ceph. Please see the documentation if you need to customize any values.
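
For context, the heart of that yaml is a CephCluster resource roughly along these lines (an abridged sketch based on the quickstart of the time; check the rook docs for the current schema before copying it):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    useAllNodes: true
    useAllDevices: false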

Next is the storageClass. However, before we can create the storageClass, we have to create a CephBlockPool custom resource. This CR tells the operator to create a three-way replicated pool to serve block storage. I included it in my yaml along with my storageClass, but more information can be found in the docs.

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-storageclass.yaml
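
Roughly, that yaml boils down to a CephBlockPool plus a storageClass that points at it (a sketch modeled on the rook examples of the time; the pool name and fstype are whatever you choose):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs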

After a bit, you should see rook-ceph-block as an available storageClass.

$ kubectl get sc
NAME                 PROVISIONER            AGE
rook-ceph-block      ceph.rook.io/block     3m

Testing Rook/Ceph

In order to test this, I created a namespace and then deployed a sample application (one that accepts file uploads) into that namespace.

$ kubectl create ns test
namespace/test created
$ kubectl create deployment upload --image=quay.io/redhatworkshops/upload:latest -n test
deployment.apps/upload created

I also exposed this deployment and created an ingress so I could test the upload.
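
That amounted to something like the following (a sketch; the port and hostname are assumptions for this example app, so adjust them for your own setup):

$ kubectl expose deployment upload --port=8080 -n test
$ cat <<EOF | kubectl create -n test -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: upload
spec:
  rules:
  - host: upload.example.com
    http:
      paths:
      - backend:
          serviceName: upload
          servicePort: 8080
EOF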

Now, when I created the PVC, I specified that I wanted to use the block storage provided by rook/ceph by adding the volume.beta.kubernetes.io/storage-class: rook-ceph-block annotation to my yaml file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc0001
  annotations:
    volume.beta.kubernetes.io/storage-class: rook-ceph-block
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
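
As an aside, the annotation is the older form; the same claim can be written with spec.storageClassName, the field that eventually replaced it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc0001
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

I kept the annotation form shown above.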

Now I just loaded this yaml to create my PVC:

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-sample-pvc.yaml -n test
persistentvolumeclaim/ceph-block-pvc0001 created

Here, rook creates the block volume on the fly, provisioning the PV that satisfies my claim. Checking the PVC status shows that it is bound to a PV.

$ kubectl get pvc -n test
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
ceph-block-pvc0001   Bound    pvc-0419077f-4510-11e9-bd1f-42010a8e0033   1Gi        RWO            rook-ceph-block   5m

I edited my deployment using kubectl edit deployment upload -n test and added the volumeMounts and volumes sections shown below.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: null
  generation: 1
  labels:
    app: upload
  name: upload
  selfLink: /apis/extensions/v1beta1/namespaces/test/deployments/upload
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: upload
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: upload
    spec:
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        imagePullPolicy: Always
        name: upload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/uploaded
          name: upload-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: upload-storage
        persistentVolumeClaim:
          claimName: ceph-block-pvc0001

Taking a look inside the container, you can see the volume appears as /dev/rbd0 and it's already formatted and mounted.

$ kubectl exec -it upload-7d9d6b987-fhq69 -n test -- df -h /opt/app-root/src/uploaded
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0      1014M   33M  982M   4% /opt/app-root/src/uploaded

Issues and resolutions

When I went and tested the application, I got the following error: Permission denied in /opt/app-root/src/upload.php

Doing some digging around, I found that the permissions were wrong on the mounted directory.

bash-4.2$ ls -ld /opt/app-root/src/uploaded/
drwxr-xr-x 2 root root 6 Mar 12 22:04 /opt/app-root/src/uploaded/

This is an issue since the container runs as a non-root user, so I can't just chmod the directory. A little "hacking" was in order. First, I figured out which node the pod was running on.

$ kubectl get pod upload-7d9d6b987-fhq69 -n test -o jsonpath='{.spec.nodeName}{"\n"}'
nodes-8z9

That tells me which node the pod is running on, so I logged into it:

$ gcloud compute ssh nodes-8z9n --zone us-east1-d

I used docker commands to find out the container's ID and PID, then used nsenter to get into its namespaces.
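
That looked roughly like this (a sketch; it assumes the node runs docker, and you'll want to pick the application container, not the pod's pause container, from the ps output):

$ docker ps | grep upload
$ PID=$(docker inspect --format '{{.State.Pid}}' <container-id-from-above>)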

$ nsenter --target $PID --mount --uts --ipc --net --pid

Once inside, I was able to chmod the directory:

# chmod 777 /opt/app-root/src/uploaded/

After I did that, I was able to use my app to upload files to the ceph block storage.
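
In hindsight, a cleaner fix would probably have been to set an fsGroup in the pod's securityContext so the kubelet makes the volume writable by the container's group instead of me chmod-ing it by hand. A sketch of the relevant part of the deployment (the GID is arbitrary; the kubelet adds it to the container's supplemental groups and chowns the volume to it):

    spec:
      securityContext:
        fsGroup: 1000   # arbitrary GID applied to the mounted volume
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        name: upload
        # ...rest of the container spec unchanged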

GlusterFS installation

In order to test gluster, I first needed to add some raw storage devices to the nodes. Gluster (specifically gluster-kubernetes) likes to work with raw devices, so I added 100GB volumes to each of my three nodes.

Note that ceph can also work with raw disks and not just use directories to store data.

I mainly followed the GitHub page for installation. I also went through and made sure all the prereqs were done on all servers. In short, I did the following (I also added iptables rules, which I cover later):

# for i in dm_snapshot dm_mirror dm_thin_pool; do modprobe $i; done
# apt -y install glusterfs-client glusterfs-common

After the prereqs were done, I cloned the git repo in order to use the installation script it provides:

$ git clone https://github.com/gluster/gluster-kubernetes

After you have that, take the sample topology file provided in the repo and create your own, taking care to make sure your settings are right. Mine looked like this:

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "ip-172-20-107-182.us-west-2.compute.internal"
              ],
              "storage": [
                "172.20.107.182"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/xvdz"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "ip-172-20-47-197.us-west-2.compute.internal"
              ],
              "storage": [
                "172.20.47.197"
              ]
            },
            "zone": 2
          },
          "devices": [
            "/dev/xvdz"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "ip-172-20-93-253.us-west-2.compute.internal"
              ],
              "storage": [
                "172.20.93.253"
              ]
            },
            "zone": 3
          },
          "devices": [
            "/dev/xvdz"
          ]
        }
      ]
    }
  ]
}

I'll try and break this down a bit.

  • manage - This is the actual node name, as reported by kubectl get nodes
  • storage - This one is a bit misnamed; it's the IP address of the node itself (the actual node IP, not an SDN IP). Both values can be pulled with the command shown after this list.
  • zone - glusterfs picks a node from each zone when it creates a three-way replicated volume. Zones are essentially failure domains, and you need at least three nodes (if you're running one because of minikube, that's okay)
  • devices - This is an array of raw devices; the minimum is one
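
As mentioned above, both values come straight from the cluster; kubectl get nodes -o wide prints the node names (for manage) and the INTERNAL-IP column (for storage):

$ kubectl get nodes -o wide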

NOTE: There is a bug related to gluster-blockd that you need to work around. I had to edit the file deploy/kube-templates/glusterfs-daemonset.yaml and change GLUSTER_BLOCKD_STATUS_PROBE_ENABLE to 0.
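
Inside that daemonset, the change boils down to an env entry like this (paraphrased; double-check the surrounding context in the repo before editing):

        - name: GLUSTER_BLOCKD_STATUS_PROBE_ENABLE
          value: "0"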

Now, using the gk-deploy command, I ran the following (NOTE: you may need to run it with --single-node if you're using minikube):

$ ./gluster-kubernetes/deploy/gk-deploy gfs.json -g \
-c kubectl  -n glusterfs -w 1200 --no-object -y

You will get a message when it's complete, and you can verify that all the pods are running:

$ kubectl get pods -n glusterfs
NAME                      READY   STATUS    RESTARTS   AGE
glusterfs-bfhqx           1/1     Running   0          12m
glusterfs-hwb98           1/1     Running   0          12m
glusterfs-xpc2r           1/1     Running   0          12m
heketi-7495cdc5fd-b6s82   1/1     Running   0          4m11s

Now you need to create the storageClass, pointing it at the heketi service address. Using my example yaml as a template, I created the following spec. (Note that I got the resturl by running the kubectl get svc -n glusterfs command and looking at the heketi service address.)

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: gluster-container
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://172.30.59.174:8080"
  restuser: "admin"
  volumetype: "replicate:3"
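
For reference, one way to grab that cluster IP without eyeballing the svc output (assuming the heketi service keeps the default name that gk-deploy gives it):

$ kubectl get svc heketi -n glusterfs -o jsonpath='{.spec.clusterIP}{"\n"}'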

Now you should be able to see the storageClass:

$ kubectl get sc  gluster-container
NAME                PROVISIONER               AGE
gluster-container   kubernetes.io/glusterfs   21s

Testing gluster

I will be using the same deployment as before, modifying it to reference the new storage. First, I verified that it's running:

$ kubectl get pods -n test
NAME                           READY   STATUS    RESTARTS   AGE
upload-bb9df669f-twmq6         1/1     Running   0          65s

Now, using my PVC template for gluster, I created a PVC. Just like rook, gluster creates the PV on the fly to satisfy the claim.

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/glusterfs-sample-pvc.yaml -n test
persistentvolumeclaim/gluster-pvc0001 created
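
For reference, the claim in that yaml is roughly equivalent to this (a sketch inferred from the bound output below; check the repo for the actual file):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-pvc0001
  annotations:
    volume.beta.kubernetes.io/storage-class: gluster-container
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi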

Checking the PVC status, I see that I have storage bound:

$ kubectl get pvc -n test
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
gluster-pvc0001   Bound    pvc-8ed0bbfc-4538-11e9-8da3-001a4a16011b   1Gi        RWX            gluster-container   7m14s

Next, I used kubectl edit deploy/upload -n test to point my deployment at the new gluster volume. In the end, my deployment looked like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: null
  generation: 1
  labels:
    app: upload
  name: upload
  selfLink: /apis/extensions/v1beta1/namespaces/test/deployments/upload
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: upload
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: upload
    spec:
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        imagePullPolicy: Always
        name: upload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/uploaded
          name: upload-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: upload-storage
        persistentVolumeClaim:
          claimName: gluster-pvc0001

If you look inside the pod, you will see the network mount (since glusterfs is file-based storage, you won't see it as a block device):

$ kubectl exec -it upload-7cb79f89cb-pjhls -n test -- df -h uploaded
Filesystem                                         Size  Used Avail Use% Mounted on
192.168.1.19:vol_7eb133c254df1695d670b6c8dc437fdd 1014M   43M  972M   5% /opt/app-root/src/uploaded

Issues/Resolutions for Gluster

As noted above, I ran into the gluster-blockd bug and had to disable block. Since I wasn't using block storage it wasn't a big deal, but you do need to watch out for it, since the install won't work without disabling it.

I also spent quite a bit of time getting the firewall rules right; this took some trial and error on my part. In the end I ran the following on ALL servers in my Kubernetes cluster:

iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24010 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 3260 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT

Also, since gluster requires raw devices, you need to check with your cloud provider on how to attach them. You may run into challenges if you're using instance groups and the like.
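
On AWS, where my gluster nodes lived, that looked roughly like the following (a sketch; the IDs are placeholders, and /dev/sdz typically shows up inside the instance as /dev/xvdz on Xen-based instance types):

$ aws ec2 create-volume --size 100 --availability-zone us-west-2a --volume-type gp2
$ aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdz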

I was happy to see that I was able to use the volume without needing to change any permissions.

Conclusion

In this blog we took a brief look at CSI and how it helps storage vendors write storage plugins for Kubernetes. We also explored glusterfs and rook to test how block and file storage work in Kubernetes.

There are a plethora of other storage providers for k8s, including OpenEBS, Trident, and many more.

As Kubernetes becomes more and more of a standard, I expect to see many more storage vendors and projects providing solutions. This will give a wide range of choices for many workloads.