Introduction
When you think about Kubernetes, the first thing that usually comes to mind is running stateless applications. The very nature of the design of Kubernetes lends itself to running stateless applications. However, since Kubernetes runs on Linux, you are able to attach storage systems to support stateful applications. But how do you face the challenge of adding support for new volume plugins?
This is where CSI comes in. CSI (the Container Storage Interface) provides a standard for exposing block and file storage to containers. This allows storage vendors to write storage plugins for Kubernetes without having to modify the Kubernetes core code. CSI was introduced in v1.9 and is now GA in v1.13.
This has enabled various storage vendors to integrate their storage systems into Kubernetes, and even cloud providers have provided integrated solutions (like EBS on AWS, for example).
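To make that concrete, dynamic provisioning with a CSI driver is wired up through a StorageClass whose provisioner is the driver's registered name. Below is a minimal sketch; the provisioner name csi.example.com and the type parameter are purely hypothetical placeholders that a real driver's documentation would replace.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-csi-sc
# The provisioner matches the name the CSI driver registers itself with (hypothetical here)
provisioner: csi.example.com
parameters:
  type: ssd          # driver-specific parameter, purely illustrative
reclaimPolicy: Delete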
In this blog I will be taking a look at GlusterFS and Rook and exploring some of their advantages and pitfalls.
Setup
For this blog I installed and set up the following for my environment (although using minikube should work as well):
- Kubernetes v1.11 on GCE
- Used NGINX as an ingress controller (for testing)
- Installed and set up Helm
- Set up gcloud for ease of use
Rook installation
To install rook, I took a look at their latest documentation page. There I found this helpful quickstart page that provided an easy way to deploy rook with ceph using helm. Getting a rook system up and running consists of three parts: the operator, the rook ceph cluster, and the storageClass.
To install the rook operator I used helm. This is pretty straightforward, and I was able to install it by following the documentation.
$ helm repo add rook-stable https://charts.rook.io/stable
$ helm install --namespace rook-ceph-system rook-stable/rook-ceph
Now that the operator is up and running, we need to deploy a rook cluster. More specifically, we want to deploy a rook ceph cluster. I will be deploying a copy of the yaml from my github page, but you should look at the quickstart page for an up to date yaml.
$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-cluster.yaml
This creates (among other things) the rook CRDs for ceph. Please see the documentation if you need to customize any values.
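For context, the cluster yaml boils down to a CephCluster custom resource along the lines of the sketch below. The exact fields (ceph image tag, mon count, storage selection) come from the Rook docs of that release and will vary by version, so treat this as illustrative rather than a drop-in manifest.

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13     # ceph container image; tag is illustrative
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    useAllNodes: true
    useAllDevices: false     # set to true (or list devices) to consume raw disks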
Next is the storageClass. However, before we can create the storageClass, we have to create the CephBlockPool CR. This custom resource tells the operator to create a three-way replicated pool to serve block storage. I included it in my yaml, along with my storageclass, but more information can be found in the docs.
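Roughly, the two resources in that yaml look like the sketch below. The pool name, replica count, and fstype are taken from the upstream example and may differ in your setup.

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host    # spread the three replicas across hosts
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs            # filesystem created on top of the rbd device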
kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-storageclass.yaml
After a bit, you should see rook-ceph-block as an available storageClass.
$ kubectl get sc
NAME              PROVISIONER          AGE
rook-ceph-block   ceph.rook.io/block   3m
Testing Rook/Ceph
In order to test this, I created a namespace and then deployed a sample application (one that accepts file uploads) to that namespace.
$ kubectl create ns test
namespace/test created
$ kubectl create deployment upload --image=quay.io/redhatworkshops/upload:latest -n test
deployment.apps/upload created
I also exposed this deployment and created an ingress so that I could test the upload.
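I'm not showing my exact service and ingress here, but it amounted to something like the following. The container port and hostname are assumptions for illustration; adjust them for your image and DNS.

$ kubectl expose deployment upload --port=8080 -n test

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: upload
  namespace: test
spec:
  rules:
  - host: upload.example.com        # hypothetical hostname
    http:
      paths:
      - backend:
          serviceName: upload
          servicePort: 8080         # assumes the container listens on 8080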
Now, when I created the pvc I specified that I wanted to use the block storage provided by rook/ceph by using the volume.beta.kubernetes.io/storage-class: rook-ceph-block annotation in my yaml file.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc0001
  annotations:
    volume.beta.kubernetes.io/storage-class: rook-ceph-block
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Now I just loaded this yaml to create my pvc.
$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-sample-pvc.yaml -n test
persistentvolumeclaim/ceph-block-pvc0001 created
Here, rook creates my block volume on the fly, creating the pv that satisfies my claim. Checking the pvc status shows that it is bound to a pv.
$ kubectl get pvc -n test
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
ceph-block-pvc0001   Bound    pvc-0419077f-4510-11e9-bd1f-42010a8e0033   1Gi        RWO            rook-ceph-block   5m
I edited my deployment using kubectl edit deployment upload -n test and added the volumeMounts and volumes sections shown below.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: null
  generation: 1
  labels:
    app: upload
  name: upload
  selfLink: /apis/extensions/v1beta1/namespaces/test/deployments/upload
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: upload
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: upload
    spec:
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        imagePullPolicy: Always
        name: upload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/uploaded
          name: upload-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: upload-storage
        persistentVolumeClaim:
          claimName: ceph-block-pvc0001
Taking a look inside the container, you can see this appears as /dev/rbd0 and it's already formatted and mounted.
$ kubectl exec -it upload-7d9d6b987-fhq69 -n test -- df -h /opt/app-root/src/uploaded
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0      1014M   33M  982M   4% /opt/app-root/src/uploaded
Issues and resolutions
When I went and tested the application, I got the following error: Permission denied in /opt/app-root/src/upload.php
Doing some digging around, I found that the permissions were wrong on my directory.
bash-4.2$ ls -ld /opt/app-root/src/uploaded/
drwxr-xr-x 2 root root 6 Mar 12 22:04 /opt/app-root/src/uploaded/
This is an issue since I am running this container as a non-root user, so I can't just chmod the directory. A little "hacking" was in order. First I figured out where the pod was running.
$ kubectl get pod upload-7d9d6b987-fhq69 -n test -o jsonpath='{.spec.nodeName}{"\n"}'
nodes-8z9
Looks like this pod is running on node nodes-8z9nn. So I logged into this node
$ gcloud compute ssh nodes-8z9n --zone us-east1-d
I used docker commands to find out what ID the container had and what its PID was, then used nsenter to get into its namespaces.
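Roughly, that looked something like this (the container ID below is a placeholder; the point is to resolve the container's host PID so nsenter has a target):

$ docker ps | grep upload
$ PID=$(docker inspect --format '{{.State.Pid}}' <container-id>)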
$ nsenter --target $PID --mount --uts --ipc --net --pid
Once inside I was able to chmod the directory
# chmod 777 /opt/app-root/src/uploaded/
After I did that, I was able to use my app to upload files in the ceph block storage system.
GlusterFS installation
In order to test gluster, I first needed to add some raw storage devices to the nodes. Gluster (specifically gluster-kubernetes) likes to work with raw devices, so I added 100GB volumes to each of my 3 nodes.
Note that ceph can also work with raw disks, not just directories, to store data.
I mainly used the github page for installation. I also went through and made sure all the prereqs were done on all servers. In short, I did the following (I also added iptables rules):
# for i in dm_snapshot dm_mirror dm_thin_pool; do modprobe $i; done
# apt -y install glusterfs-client glusterfs-common
After the prereqs were done, I cloned the git repo to use the installation script provided.
$ git clone https://github.com/gluster/gluster-kubernetes
After you have that, take the sample topology file (provided in the repo) and create your own, being careful to make sure your settings are right. Mine looked like this.
{ "clusters": [ { "nodes": [ { "node": { "hostnames": { "manage": [ "ip-172-20-107-182.us-west-2.compute.internal" ], "storage": [ "172.20.107.182" ] }, "zone": 1 }, "devices": [ "/dev/xvdz" ] }, { "node": { "hostnames": { "manage": [ "ip-172-20-47-197.us-west-2.compute.internal" ], "storage": [ "172.20.47.197" ] }, "zone": 2 }, "devices": [ "/dev/xvdz" ] }, { "node": { "hostnames": { "manage": [ "ip-172-20-93-253.us-west-2.compute.internal" ], "storage": [ "172.20.93.253" ] }, "zone": 3 }, "devices": [ "/dev/xvdz" ] } ] } ] }
I'll try and break this down a bit.
- manage - This is the actual node name that you get from the kubectl get nodes command.
- storage - This is kind of misnamed; it's the IP address of the node itself (the actual IP, not an SDN IP).
- zone - The way glusterfs works, it'll pick a node from each zone to create a three-way replicated volume. Zones are basically failure domains, and you need at least 3 (if you're running 1 because of minikube, that's okay).
- devices - This is an array of raw devices; the minimum is 1.
NOTE: Please see the following bug about gluster-blockd. I had to edit the file deploy/kube-templates/glusterfs-daemonset.yaml and change GLUSTER_BLOCKD_STATUS_PROBE_ENABLE to 0
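After that edit, the relevant environment variable in the daemonset looks roughly like this:

        env:
        - name: GLUSTER_BLOCKD_STATUS_PROBE_ENABLE
          value: "0"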
Now, using the gk-deploy command, I ran the following (NOTE: you may need to run it with --single-node if you're using minikube):
$ ./gluster-kubernetes/deploy/gk-deploy gfs.json -g \
    -c kubectl -n glusterfs -w 1200 --no-object -y
You will get a message that it's complete, and you can verify that all the pods are running:
$ kubectl get pods -n glusterfs
NAME                      READY   STATUS    RESTARTS   AGE
glusterfs-bfhqx           1/1     Running   0          12m
glusterfs-hwb98           1/1     Running   0          12m
glusterfs-xpc2r           1/1     Running   0          12m
heketi-7495cdc5fd-b6s82   1/1     Running   0          4m11s
Now you need to create the storageClass based on the service address. Using my example yaml as a template, I created the following spec. (Note that I got the resturl by running kubectl get svc -n glusterfs and looking at the heketi service address.)
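If you want to grab just the ClusterIP, something like this should work (assuming the service created by gk-deploy is named heketi):

$ kubectl get svc heketi -n glusterfs -o jsonpath='{.spec.clusterIP}{"\n"}'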
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: gluster-container
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://172.30.59.174:8080"
  restuser: "admin"
  volumetype: "replicate:3"
Now you should be able to see the storageclass
$ kubectl get sc gluster-container
NAME                PROVISIONER               AGE
gluster-container   kubernetes.io/glusterfs   21s
Testing gluster
I will be using the same deployment as before; I will modify it to reference the new storage. First, I verify it's running:
$ kubectl get pods -n test
NAME                     READY   STATUS    RESTARTS   AGE
upload-bb9df669f-twmq6   1/1     Running   0          65s
Now, using my pvc template for gluster, I created a pvc. And just like rook, gluster creates the pv on the fly to satisfy my pvc request.
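For reference, that pvc template is essentially the same shape as the earlier rook one, just pointing at the gluster-container storage class and asking for ReadWriteMany. This is a sketch; check my repo for the actual file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-pvc0001
  annotations:
    volume.beta.kubernetes.io/storage-class: gluster-container
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi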
$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/glusterfs-sample-pvc.yaml -n test
persistentvolumeclaim/gluster-pvc0001 created
Checking my pvc status, I see that I have storage bound:
$ kubectl get pvc -n test
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
gluster-pvc0001   Bound    pvc-8ed0bbfc-4538-11e9-8da3-001a4a16011b   1Gi        RWX            gluster-container   7m14s
Next, I used kubectl edit deploy/upload -n test to edit my deployment to specify the new gluster volume. In the end my deployment looked like this.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: null
  generation: 1
  labels:
    app: upload
  name: upload
  selfLink: /apis/extensions/v1beta1/namespaces/test/deployments/upload
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: upload
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: upload
    spec:
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        imagePullPolicy: Always
        name: upload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/uploaded
          name: upload-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: upload-storage
        persistentVolumeClaim:
          claimName: gluster-pvc0001
If you look inside the pod, you will see the network mount (since glusterfs is file-based storage, you won't see it as a block device).
$ kubectl exec -it upload-7cb79f89cb-pjhls -n test -- df -h uploaded
Filesystem                                          Size  Used Avail Use% Mounted on
192.168.1.19:vol_7eb133c254df1695d670b6c8dc437fdd  1014M   43M  972M   5% /opt/app-root/src/uploaded
Issues/Resolutions for Gluster
As noted above, I ran into this bug and had to disable gluster-blockd. Since I wasn't using block storage, it wasn't such a big deal. However, you do need to watch out for it, since your install won't work without disabling it.
I also spent quite a bit of time getting the firewall rules right. This took some trial and error on my part. In the end, I ran this on ALL servers in my kubernetes cluster:
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24010 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 3260 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
Also, since gluster requires raw devices, you need to check with your provider on how to add them. You may run into challenges if you have instance groups and the like.
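On GCE, for example, adding a raw disk to a node might look something like the commands below. The disk name, zone, and instance name are placeholders; on other providers the mechanics will differ.

$ gcloud compute disks create gluster-disk-1 --size=100GB --zone=us-east1-d
$ gcloud compute instances attach-disk <node-instance-name> --disk=gluster-disk-1 --zone=us-east1-d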
I am happy to see that I was able to use the volume without the need to change permissions.
Conclusion
In this blog we took a brief look at CSI and how it helps storage vendors write storage plugins for Kubernetes. We also explored glusterfs and rook to test how block and file storage work in Kubernetes.
There is a plethora of other storage providers for Kubernetes, including OpenEBS, Trident, LogDNA, and many more.
As Kubernetes becomes more and more of a standard, I expect to see a lot more storage vendors and storage projects providing solutions. This will provide a wide range of choices for many workloads.