OpenShift 4.1 Bare Metal Install Quickstart

NOTE: This post originally appeared on the OpenShift Blog site.

In this post we will go over how to get you up and running quickly with an OpenShift 4.1 bare metal install on pre-existing infrastructure. Although this quickstart focuses on the bare metal installer, it can also be seen as a "manual" way to install OpenShift 4.1. Moreover, it also applies to installing on any platform that doesn't have the ability to provide ignition pre-boot. For more information about using this generic approach to install on untested platforms, please see this knowledge base article.

Introduction

OpenShift 4 introduces a new way of installing the platform that is automated, reliable, and repeatable. Based on the Kubernetes Cluster-API SIG, Red Hat has developed an OpenShift installer for full stack automated deployments. This means that the installer not only installs OpenShift, but it installs (and manages) the entire infrastructure as well, from DNS all the way down the stack to the VM. This provides a fully integrated system that can resize automatically with the needs of your workload. Currently, full stack automated deployment is supported on AWS.

Pre-existing infrastructure deployments are for when you have existing infrastructure that you would like to use to run OpenShift 4. Most people are familiar with this method, as it was the default (and only) way to install OpenShift 3. Currently, guides for pre-existing infrastructure installs are available for AWS, VMware vSphere, and bare metal, the latter being the "catch all", since you can use the bare metal method for non-tested platforms.

I will be going over installing OpenShift 4 on bare metal using pre-existing infrastructure, along with the prerequisites. However, as already stated, you can use this method for other infrastructure, for example VMs running on Red Hat Virtualization.

Prerequisites

It's important that you familiarize yourself with the prerequisites by reading the official OpenShift documentation, where you can find more details about what they entail. I have broken the prerequisites up into sections and marked those that are optional.

DNS

Proper DNS setup is imperative for a functioning OpenShift cluster. DNS is used for name resolution (A records), certificate generation (PTR records), and service discovery (SRV records). Keep in mind that OpenShift 4 has a concept of a "clusterid" that will be incorporated into your cluster's DNS records. Your DNS records will all have <clusterid>.<basedomain> in them. In other words, your "clusterid" will end up being part of your FQDN. Read the official documentation for more information.

Forward DNS Records

Create forward DNS records for your bootstrap, master, and worker nodes. Also, you'll need to create entries for both api and api-int and point them to their respective load balancers (NOTE both of those entries can point to the same load balancer). You will also need to create a wildcard DNS entry pointing to the load balancer. This entry is used by the OpenShift router. Here is a sample using bind with ocp4 as the <clusterid>.

; The api and api-int can point to the IP of the same load balancer
api.ocp4            IN      A       192.168.1.5
api-int.ocp4        IN      A       192.168.1.5
;
; The wildcard points to the load balancer
*.apps.ocp4        IN      A       192.168.1.5
;
; Create entry for the bootstrap host
bootstrap.ocp4        IN      A       192.168.1.96
;
; Create entries for the master hosts
master0.ocp4        IN      A       192.168.1.97
master1.ocp4        IN      A       192.168.1.98
master2.ocp4        IN      A       192.168.1.99
;
; Create entries for the worker hosts
worker0.ocp4        IN      A       192.168.1.11
worker1.ocp4        IN      A       192.168.1.7
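With more than a handful of nodes, typing these A records by hand gets error-prone. The following is a minimal sketch, assuming the same ocp4 clusterid and the relative-name layout of the BIND example above; the function name gen_a_records and the "name ip" input format are hypothetical, and it only covers the node records (add the api, api-int, and wildcard lines yourself).

```shell
#!/usr/bin/env bash
# Sketch: print BIND A records for the cluster nodes.
# CLUSTER_ID and the host list below are illustrative assumptions;
# substitute your own clusterid, names, and IPs.
CLUSTER_ID="${CLUSTER_ID:-ocp4}"

gen_a_records() {
  # Reads "name ip" pairs on stdin and writes "<name>.<clusterid> IN A <ip>".
  local name ip
  while read -r name ip; do
    [ -z "$name" ] && continue   # skip blank lines
    printf '%-20s IN      A       %s\n' "${name}.${CLUSTER_ID}" "$ip"
  done
}

gen_a_records <<'EOF'
bootstrap 192.168.1.96
master0   192.168.1.97
master1   192.168.1.98
master2   192.168.1.99
worker0   192.168.1.11
worker1   192.168.1.7
EOF
```

The output can be pasted into the forward zone; remember to bump the zone's serial before reloading BIND.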

An example of a DNS zonefile with forward records can be found here.

Reverse DNS Records

Create reverse DNS records for your bootstrap, master, and worker nodes, api, and api-int. The reverse records are important because that is how RHEL CoreOS sets the hostname for all the nodes. Furthermore, these PTR records are used to generate the various certificates OpenShift needs to operate. The following is an example using example.com as the <basedomain> and ocp4 as the <clusterid>. Again, this was done using bind.

; syntax is "last octet" and the host must have fqdn with trailing dot
;
97          IN      PTR     master0.ocp4.example.com.
98          IN      PTR     master1.ocp4.example.com.
99          IN      PTR     master2.ocp4.example.com.
;
96          IN      PTR     bootstrap.ocp4.example.com.
;
5           IN      PTR     api.ocp4.example.com.
5           IN      PTR     api-int.ocp4.example.com.
;
11          IN      PTR     worker0.ocp4.example.com.
7           IN      PTR     worker1.ocp4.example.com.
;
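Building the PTR targets from variables is one way to avoid slips such as a doubled clusterid in the FQDN. Below is a minimal sketch, assuming every node sits in the same /24 reverse zone (so the record name is just the last octet, as in the example above); gen_ptr_records and its input format are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print reverse (PTR) records for a single /24 reverse zone.
# Assumes all IPs share the same /24, so only the last octet is the record
# name. BASE_DOMAIN and CLUSTER_ID are illustrative defaults.
BASE_DOMAIN="${BASE_DOMAIN:-example.com}"
CLUSTER_ID="${CLUSTER_ID:-ocp4}"

gen_ptr_records() {
  # Reads "name ip" pairs on stdin; emits "<last-octet> IN PTR <fqdn>."
  local name ip
  while read -r name ip; do
    [ -z "$name" ] && continue
    # ${ip##*.} strips everything up to the last dot, leaving the last octet.
    printf '%-11s IN      PTR     %s.%s.%s.\n' "${ip##*.}" "$name" "$CLUSTER_ID" "$BASE_DOMAIN"
  done
}

gen_ptr_records <<'EOF'
master0   192.168.1.97
bootstrap 192.168.1.96
api       192.168.1.5
api-int   192.168.1.5
worker0   192.168.1.11
EOF
```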

An example of a DNS zonefile with reverse records can be found here.

DNS Records for ETCD

Two record types need to be created for etcd. The forward records need to point to the IPs of the masters (CNAMEs are fine as well), and the names need to be etcd-<index>, where <index> is a number starting at 0; for example, etcd-0, etcd-1, and etcd-2. You will also need to create SRV records pointing to the various etcd-<index> entries, with priority 0, weight 10, and port 2380. Below is an example using example.com as the <basedomain> and ocp4 as the <clusterid>.

; The etcd cluster lives on the masters...so point these to the IP of the masters
etcd-0.ocp4             IN      A       192.168.1.97
etcd-1.ocp4             IN      A       192.168.1.98
etcd-2.ocp4             IN      A       192.168.1.99
;
; The SRV records point to FQDN of etcd...note the trailing dot at the end...
_etcd-server-ssl._tcp.ocp4      IN      SRV     0 10 2380 etcd-0.ocp4.example.com.
_etcd-server-ssl._tcp.ocp4      IN      SRV     0 10 2380 etcd-1.ocp4.example.com.
_etcd-server-ssl._tcp.ocp4      IN      SRV     0 10 2380 etcd-2.ocp4.example.com.
;
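Because the etcd A records and SRV records must stay in lockstep (same indexes, same FQDNs), it can help to generate both from one list of master IPs. This is a minimal sketch, assuming the masters are passed in index order and using the priority 0 / weight 10 / port 2380 values above; gen_etcd_records is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: generate etcd-<index> A records plus the matching SRV records.
# Assumes master IPs are passed in order (index 0 first).
BASE_DOMAIN="${BASE_DOMAIN:-example.com}"
CLUSTER_ID="${CLUSTER_ID:-ocp4}"

gen_etcd_records() {
  local ips=("$@") i
  for ((i = 0; i < ${#ips[@]}; i++)); do
    printf 'etcd-%d.%s             IN      A       %s\n' "$i" "$CLUSTER_ID" "${ips[i]}"
  done
  for ((i = 0; i < ${#ips[@]}; i++)); do
    # SRV targets must be FQDNs with a trailing dot.
    printf '_etcd-server-ssl._tcp.%s      IN      SRV     0 10 2380 etcd-%d.%s.%s.\n' \
      "$CLUSTER_ID" "$i" "$CLUSTER_ID" "$BASE_DOMAIN"
  done
}

gen_etcd_records 192.168.1.97 192.168.1.98 192.168.1.99
```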

An example of these entries can be found in the example zonefile.

Load Balancer

You will need a load balancer to front the APIs (both internal and external) and the OpenShift router. Although Red Hat has no official recommendation as to which load balancer to use, one that supports SNI is necessary (most load balancers do this today).

You will need to configure ports 6443 and 22623 to point to the bootstrap and master nodes. The example below uses HAProxy (NOTE: it must use TCP mode to allow SSL passthrough).

frontend openshift-api-server
    bind *:6443
    default_backend openshift-api-server
    mode tcp
    option tcplog

backend openshift-api-server
    balance source
    mode tcp
    server bootstrap 192.168.1.96:6443 check
    server master0   192.168.1.97:6443 check
    server master1   192.168.1.98:6443 check
    server master2   192.168.1.99:6443 check

frontend machine-config-server
    bind *:22623
    default_backend machine-config-server
    mode tcp
    option tcplog

backend machine-config-server
    balance source
    mode tcp
    server bootstrap 192.168.1.96:22623 check
    server master0   192.168.1.97:22623 check
    server master1   192.168.1.98:22623 check
    server master2   192.168.1.99:22623 check

You will also need to configure ports 80 and 443 to point to the worker nodes. The HAProxy configuration is below (keeping in mind that we're using TCP mode).

frontend ingress-http
    bind *:80
    default_backend ingress-http
    mode tcp
    option tcplog

backend ingress-http
    balance source
    mode tcp
    server worker0 192.168.1.11:80 check
    server worker1 192.168.1.7:80 check

frontend ingress-https
    bind *:443
    default_backend ingress-https
    mode tcp
    option tcplog

backend ingress-https
    balance source
    mode tcp
    server worker0 192.168.1.11:443 check
    server worker1 192.168.1.7:443 check
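All four frontend/backend pairs above share the same shape, so they can be rendered from a name, a port, and a server list. A minimal sketch, assuming TCP mode and source balancing as above; gen_haproxy_tcp and its "name=ip" argument format are hypothetical, not an HAProxy feature.

```shell
#!/usr/bin/env bash
# Sketch: render a TCP-mode HAProxy frontend/backend pair.
# Usage: gen_haproxy_tcp <name> <port> <server-name>=<ip> ...
gen_haproxy_tcp() {
  local name=$1 port=$2 entry
  shift 2
  printf 'frontend %s\n    bind *:%s\n    default_backend %s\n    mode tcp\n    option tcplog\n\n' \
    "$name" "$port" "$name"
  printf 'backend %s\n    balance source\n    mode tcp\n' "$name"
  for entry in "$@"; do
    # "worker0=192.168.1.11" becomes "server worker0 192.168.1.11:<port> check"
    printf '    server %s %s:%s check\n' "${entry%%=*}" "${entry#*=}" "$port"
  done
}

gen_haproxy_tcp ingress-https 443 worker0=192.168.1.11 worker1=192.168.1.7
```

After appending the output to your haproxy.cfg, `haproxy -c -f /etc/haproxy/haproxy.cfg` validates the file before you reload the service.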

A full example of an haproxy.cfg file can be found here.

Webserver

A webserver is needed to hold the ignition configurations and installation images used when you install RHEL CoreOS. Any webserver will work as long as it can be reached by the bootstrap, master, and worker nodes during installation. I will be using Apache. Download either the metal-bios or the metal-uefi file, depending on what your servers need, from here. For example, this is how I downloaded the metal-bios file to my webserver.

mkdir -p /var/www/html/{ignition,install}
cd /var/www/html/install
curl -J -L -O https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.1/latest/rhcos-4.1.0-x86_64-metal-bios.raw.gz

Set Up DHCP (Optional if Using Static IPs)

It is recommended to use a DHCP server to manage the nodes' IP addresses for the cluster long-term. Ensure that the DHCP server is configured to provide persistent IP addresses and host names to the cluster machines. Using DHCP with IP reservations ensures the IPs won't change on reboots. For a sample configuration, please see this dhcpd.conf file.
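With ISC dhcpd, an IP reservation is a `host` block keyed on the MAC address. The sketch below prints one block per node; it is not the linked sample file, the MAC addresses are placeholders you must replace with your servers' real ones, and gen_dhcp_hosts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: emit ISC dhcpd "host" reservations so each node always gets the
# same IP and hostname. MACs below are placeholders.
DOMAIN="${DOMAIN:-ocp4.example.com}"

gen_dhcp_hosts() {
  # Reads "name mac ip" triples on stdin.
  local name mac ip
  while read -r name mac ip; do
    [ -z "$name" ] && continue
    cat <<EOF
host ${name} {
  hardware ethernet ${mac};
  fixed-address ${ip};
  option host-name "${name}.${DOMAIN}";
}
EOF
  done
}

gen_dhcp_hosts <<'EOF'
master0 52:54:00:00:00:01 192.168.1.97
worker0 52:54:00:00:00:02 192.168.1.11
EOF
```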

Reconciling Prerequisites

If you plan on installing OpenShift 4 in a "lab" environment (either on bare metal or using VMs), you might want to take a look at the "Helper Node" GitHub page. The "Helper Node" Ansible playbook sets up an "all-in-one" node with all the aforementioned prerequisites. This playbook has two modes: "standard" and "static ips".

Take a look at the quickstart to see if it might be of use. The steps are written for libvirt, but the playbook is agnostic, so you can run it in your bare metal environment as well.

Installation

Unlike the full stack automated install method, the pre-existing infrastructure install is done in phases. The three main phases are: ignition config creation, bootstrap, and install complete. In this section I will be going over how to install OpenShift 4 on Bare Metal with the assumption that you have all the prerequisites in place. I will be installing the following:

  • 3 Master nodes, 2 Worker nodes, and 1 bootstrap node.
  • I will be using my internal example.com domain.
  • I will be using ocp4 as my clusterid.
  • I will be using static IPs (but will go over DHCP as well)
  • I am doing the install from a "bastion" Linux host
  • Make sure you download the client and installer

Creating The Install Configuration

First (after all the prereqs are done), we need to create an install-config.yaml file. This is the file where we set parameters for our installation. Create a working directory to store all the files.

mkdir ~/ocp4
cd ~/ocp4

Once in this directory, create the install-config.yaml file based on the following template. Substitute your entries where applicable. I will go over the relevant configurations from a high level.

apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ocp4
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '{"auths": ...}'
sshKey: 'ssh-ed25519 AAAA...'

Please note/change the following:

  • baseDomain - This is the domain of your environment
  • metadata.name - This is your clusterid
    • Note: This will effectively make all FQDNs end in ocp4.example.com
  • pullSecret - This pull secret can be obtained by going to cloud.redhat.com
    • Login with your Red Hat account
    • Click on "Bare Metal"
    • Either "Download Pull Secret" or "Copy Pull Secret"
  • sshKey - This is your public SSH key (e.g. id_rsa.pub)

Note: Setting the worker replicas to 0 doesn't mean you're going to install 0 workers. It means the installer will not create and manage worker machines for you; on pre-existing infrastructure you provision the workers yourself.
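If you reinstall often, you can render install-config.yaml from a few variables and keep a backup at the same time, since the installer consumes the file when it generates the ignition configs. This is a minimal sketch: write_install_config, BASE_DOMAIN, CLUSTER_ID, PULL_SECRET, and SSH_KEY are hypothetical names, and the networking values are copied from the template above.

```shell
#!/usr/bin/env bash
# Sketch: render install-config.yaml into a working directory and keep a
# backup one level up, since openshift-install removes the original.
# PULL_SECRET and SSH_KEY must be set (the :? expansion aborts otherwise).
write_install_config() {
  local dir=$1
  mkdir -p "$dir"
  cat > "$dir/install-config.yaml" <<EOF
apiVersion: v1
baseDomain: ${BASE_DOMAIN:-example.com}
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: ${CLUSTER_ID:-ocp4}
networking:
  clusterNetworks:
  - cidr: 10.254.0.0/16
    hostPrefix: 24
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: '${PULL_SECRET:?set PULL_SECRET}'
sshKey: '${SSH_KEY:?set SSH_KEY}'
EOF
  # Keep a copy outside the working directory.
  cp "$dir/install-config.yaml" "$dir/../install-config.yaml.bak"
}

# Example: PULL_SECRET="$(cat pull-secret.txt)" SSH_KEY="$(cat ~/.ssh/id_rsa.pub)" write_install_config ~/ocp4
```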

Generate Ignition Configurations

Ignition is a tool for manipulating configuration during early boot, before the operating system starts. This includes things like writing files (regular files, systemd units, networkd units, etc.) and configuring users. Think of it as a cloud-init that runs once (during first boot).

The OpenShift 4 installer generates these ignition configs to prepare each node as an OpenShift bootstrap, master, or worker node. From within your working directory (in this example it's ~/ocp4), generate the ignition configs.

cd ~/ocp4
openshift-install create ignition-configs

REMINDER: Your install-config.yaml must be in your working directory (~/ocp4 in this example). Creating the ignition configs will result in the installer removing install-config.yaml, so you may want to make a copy and store it outside of this directory.

This will leave the following files in your ~/ocp4 working directory.

tree .
.
├── auth
│   ├── kubeadmin-password
│   └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── worker.ign

You will need to do one of the following, depending on what kind of installation you're doing.

DHCP

If you're using DHCP, simply copy over the ignition files to your webserver. For example, this is what I did for my installation.

scp ~/ocp4/*.ign webserver.example.com:/var/www/html/ignition/

Static IPs

For static IPs, you need to generate new ignition files based on the ones that the OpenShift installer generated. You can use the filetranspiler tool to make this process a little easier. When using filetranspiler, you first need to create a "fakeroot" filesystem. This is an example from the bootstrap node.

mkdir -p bootstrap/etc/sysconfig/network-scripts
cat <<EOF > bootstrap/etc/sysconfig/network-scripts/ifcfg-enp1s0
DEVICE=enp1s0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.7.20
NETMASK=255.255.255.0
GATEWAY=192.168.7.1
DNS1=192.168.7.77
DNS2=8.8.8.8
DOMAIN=ocp4.example.com
PREFIX=24
DEFROUTE=yes
IPV6INIT=no
EOF

NOTE: Your interface WILL probably differ; be sure to determine the persistent name of the device(s) before creating the network configuration files.
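Since every node needs its own fakeroot, the bootstrap example can be wrapped in a function and called once per node. A minimal sketch, assuming all nodes share the interface name, /24 netmask, gateway, and DNS server from the example (DNS2 is omitted for brevity); make_fakeroot and the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build one filetranspiler "fakeroot" per node with a static ifcfg.
# Assumes identical interface name, netmask, gateway, and DNS on every node.
IFACE="${IFACE:-enp1s0}"
GATEWAY="${GATEWAY:-192.168.7.1}"
DNS1="${DNS1:-192.168.7.77}"
DOMAIN="${DOMAIN:-ocp4.example.com}"

make_fakeroot() {
  local node=$1 ip=$2
  local dir="$node/etc/sysconfig/network-scripts"
  mkdir -p "$dir"   # filetranspiler adds every file under <node>/ to the ignition config
  cat > "$dir/ifcfg-$IFACE" <<EOF
DEVICE=$IFACE
BOOTPROTO=none
ONBOOT=yes
IPADDR=$ip
NETMASK=255.255.255.0
GATEWAY=$GATEWAY
DNS1=$DNS1
DOMAIN=$DOMAIN
PREFIX=24
DEFROUTE=yes
IPV6INIT=no
EOF
}

# Example: make_fakeroot bootstrap 192.168.7.20
```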

Using filetranspiler, create a new ignition file based on the one created by openshift-install. Continuing with the example of my bootstrap server, it looks like this.

filetranspiler -i bootstrap.ign -f bootstrap -o bootstrap-static.ign

The syntax is: filetranspiler -i $ORIGINALIGN -f $FAKEROOT -o $OUTPUTIGN

NOTE: If you're using the container version of filetranspiler, you need to be in the directory where these files/dirs are. In other words, absolute paths won't work.

Once you've created the new file, copy it over to your webserver:

scp ~/ocp4/bootstrap-static.ign webserver.example.com:/var/www/html/ignition/

IMPORTANT: When using static IP addresses, you will need to do this for ALL nodes in your cluster. In my environment I ended up with six ignition files.

tree /var/www/html/ignition/
├── bootstrap-static.ign
├── master0.ign
├── master1.ign
├── master2.ign
├── worker0.ign
└── worker1.ign

0 directories, 6 files
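The six files follow one pattern: the bootstrap fakeroot is combined with bootstrap.ign, masters with master.ign, and workers with worker.ign. The sketch below prints (dry-run) the matching filetranspiler command per node, using the output names from the listing above; transpile_cmds is a hypothetical helper, and it assumes node names start with their role.

```shell
#!/usr/bin/env bash
# Sketch: print the filetranspiler command for each node, choosing the base
# ignition file from the node's role. Pipe the output to "sh" once it looks right.
transpile_cmds() {
  local node base out
  for node in "$@"; do
    case "$node" in
      bootstrap*) base=bootstrap.ign; out=bootstrap-static.ign ;;
      master*)    base=master.ign;    out="$node.ign" ;;
      worker*)    base=worker.ign;    out="$node.ign" ;;
      *) echo "unknown role for $node" >&2; return 1 ;;
    esac
    echo "filetranspiler -i $base -f $node -o $out"
  done
}

transpile_cmds bootstrap master0 master1 master2 worker0 worker1
```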

Install Red Hat Enterprise Linux CoreOS

Installing RHEL CoreOS (RHCOS) is a straightforward process. Depending on which method you are using (DHCP or static IPs), choose one of the following.

DHCP

Boot from the ISO, and you'll be greeted with the following screen.

[Image: RHCOS ISO installer boot menu]

Once you see this menu, press Tab and append the needed options to the boot line. These include the URL of the BIOS or UEFI image the node needs and the URL of the ignition file created by openshift-install (NOTE: the entries need to be all on one line). Here is an example.

coreos.inst.install_dev=vda
coreos.inst.image_url=http://192.168.7.77:8080/install/rhcos-4.1.0-x86_64-metal-bios.raw.gz
coreos.inst.ignition_url=http://192.168.7.77:8080/ignition/bootstrap.ign

Here is an explanation of the CoreOS options:

  • coreos.inst.install_dev - The block device which RHCOS will install to.
  • coreos.inst.image_url - The URL of the UEFI or BIOS image that you uploaded to the web server.
  • coreos.inst.ignition_url - The URL of the Ignition config file for this machine type.

Static IPs

Just like the DHCP method, boot from the ISO, and you'll be greeted with the following screen.

[Image: RHCOS ISO installer boot menu]

Once you see this menu, press Tab and enter the options that will image the node using the BIOS file you downloaded and prepare the node using the ignition file you'll provide. Here is the example I used for my bootstrap server.

ip=192.168.7.20::192.168.7.1:255.255.255.0:bootstrap:enp1s0:none:192.168.7.77
coreos.inst.install_dev=vda
coreos.inst.image_url=http://192.168.7.77:8080/install/rhcos-4.1.0-x86_64-metal-bios.raw.gz
coreos.inst.ignition_url=http://192.168.7.77:8080/ignition/bootstrap-static.ign

AGAIN: This needs to be all on one line. I only used line breaks for readability. You will need to put it all on one line, like the example below.

[Image: the full set of boot options entered on a single line]

Syntax for the ip= portion is: ip=$IP::$DEFAULTGW:$NETMASK:$HOSTNAME:$IFACE:none:$DNSSERVER
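Because the options must be typed as one line, it can help to assemble them ahead of time. This minimal sketch builds the DHCP variant when given three arguments and prepends the ip= option in the $IP::$DEFAULTGW:$NETMASK:$HOSTNAME:$IFACE:none:$DNSSERVER form when the static settings are also supplied. The helper name rhcos_kargs and the WEB default are assumptions taken from the examples above.

```shell
#!/usr/bin/env bash
# Sketch: assemble the RHCOS installer boot options on a single line.
# Args: <ignition file> <image file> <disk> [ip gw netmask hostname iface dns]
WEB="${WEB:-http://192.168.7.77:8080}"

rhcos_kargs() {
  local ign=$1 image=$2 dev=$3
  local ip=${4:-} gw=${5:-} mask=${6:-} host=${7:-} iface=${8:-} dns=${9:-}
  local args=""
  if [ -n "$ip" ]; then
    # Static IP form; the empty field between $ip and $gw is the unused peer.
    args="ip=${ip}::${gw}:${mask}:${host}:${iface}:none:${dns} "
  fi
  args+="coreos.inst.install_dev=${dev}"
  args+=" coreos.inst.image_url=${WEB}/install/${image}"
  args+=" coreos.inst.ignition_url=${WEB}/ignition/${ign}"
  echo "$args"
}

rhcos_kargs bootstrap-static.ign rhcos-4.1.0-x86_64-metal-bios.raw.gz vda \
  192.168.7.20 192.168.7.1 255.255.255.0 bootstrap enp1s0 192.168.7.77
```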

Finishing Up The Install

Once the bootstrap server is up and running, the install is actually already in progress. First, the masters "check in" to the bootstrap server for their configuration. After the masters are done being configured, the bootstrap server "hands off" responsibility to the masters. You can track the bootstrap process with the following command.

openshift-install wait-for bootstrap-complete --log-level debug

Once the bootstrap process is finished, you'll see the following message.

DEBUG OpenShift Installer v4.1.0-201905212232-dirty
DEBUG Built from commit 71d8978039726046929729ad15302973e3da18ce
INFO Waiting up to 30m0s for the Kubernetes API at https://api.ocp4.example.com:6443...
INFO API v1.13.4+838b4fa up
INFO Waiting up to 30m0s for bootstrapping to complete...
DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources

At this point you can remove the bootstrap server from the load balancer. If you're using VMs, you can safely delete the bootstrap node. If you're using bare metal, you can safely repurpose this machine.

Basic functionality of the cluster is now available; however, the cluster is not ready for applications yet. You can now log in and take a look at what's finishing up.

cd ~/ocp4
export KUBECONFIG=auth/kubeconfig
oc get nodes

You can take a look to see if any node CSRs are pending.

oc get csr

You can approve the CSRs by running oc adm certificate approve <csr_name>. Alternatively, you can run the following to approve them all (requires the jq command).

oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
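If jq isn't available, an oc go-template can select the pending CSRs (those whose status is still empty) instead. This is a minimal sketch; approve_pending_csrs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: approve all pending CSRs without jq. The go-template prints the
# names of CSRs whose .status is empty, i.e. not yet approved or denied.
approve_pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
    while read -r csr; do
      [ -n "$csr" ] && oc adm certificate approve "$csr"
    done
}

# Each node first requests a client certificate and later a serving
# certificate, so new CSRs can appear after the first batch is approved;
# run this again until "oc get csr" shows nothing pending.
```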

The install won't complete until you set up some storage for the image registry. The command below sets up an "emptyDir" (temporary storage). If you'd like to use a more permanent solution, please see this.

oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'

Once that's set, finish up the installation by running the following command

openshift-install wait-for install-complete

You'll see the following information about your cluster, including information about the kubeadmin account. This is meant to be a temporary administrative account. Please see this doc to configure identity providers.

INFO Waiting up to 30m0s for the cluster at https://api.ocp4.example.com:6443 to initialize...
INFO Waiting up to 10m0s for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/ocp4/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.example.com
INFO Login to the console with user: kubeadmin, password: PftLM-P6i6B-SEZ2R-QLICJ

Upgrade Cluster

If you've installed an earlier z-stream release, you can upgrade it to the latest release from the command line. First, check which version you have.

# oc get clusterversion
NAME      VERSION        AVAILABLE           PROGRESSING             SINCE         STATUS
version   4.1.6          True                False                   21m           Cluster version is 4.1.6

Initiate an upgrade with the following command

oc adm upgrade --to-latest=true

Check the status with the following

# oc get clusterversion
NAME          VERSION             AVAILABLE       PROGRESSING                  SINCE     STATUS
version       4.1.6               True            True                         45s       Working towards 4.1.7: 13% complete
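Rather than re-running the command by hand, you can poll the ClusterVersion "Progressing" condition until it returns to False. This is a minimal sketch, assuming you start it after the status already shows "Working towards..." (otherwise it may exit before the upgrade begins); wait_for_upgrade and the INTERVAL variable are hypothetical, and the 30-second default is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll the ClusterVersion "Progressing" condition until the upgrade
# finishes, printing the status line on each pass.
wait_for_upgrade() {
  local progressing
  while :; do
    progressing=$(oc get clusterversion version \
      -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')
    [ "$progressing" = "False" ] && break
    oc get clusterversion version --no-headers
    sleep "${INTERVAL:-30}"
  done
  oc get clusterversion version
}
```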

Conclusion

In this post, I went through how to install an OpenShift 4 cluster on pre-existing bare metal infrastructure. This method can also be used in other environments that don't yet have the ability to do an ignition pre-boot.

The new install and deploy process used by OpenShift 4 for bare metal can be a bit confusing and intimidating at first; however, this guide and the documentation aim to explain the requirements, and our goal is to help you be successful. The prerequisites, especially DNS and the load balancers, are critical to success and often the most complex part, so it's important to read ahead of time to avoid deployment issues.

If you encounter issues, you can connect to the nodes using the SSH key you provided in the install-config.yaml to check the status and look for errors. Once the cluster has been instantiated, you can pull logs and diagnostic information from the nodes using standard oc CLI commands or the administrator GUI. And, you can always open a support case for help with any aspect of your OpenShift cluster.

After your cluster is deployed, you may want to do some additional configuration tasks such as:

  • Configuring authentication and additional users
  • Adding additional routes and/or sharding network traffic
  • Migrating OpenShift services to specific nodes
  • Configuring persistent storage or adding a dynamic storage provisioner
  • Adding more nodes to the cluster

If you have any questions, please leave a comment below or reach out via the Customer Portal Discussions page.

Understanding Service Mesh: Operations Guide

Introduction

The birth of Kubernetes has made the ability to go completely into the cloud a reality. It has provided the industry with a platform to finally build "cloud aware" applications and truly be "cloud-native".

This push toward cloud-native naturally came with a set of challenges that needed to be solved. While Kubernetes solved the problem of orchestrating workloads, the challenges of security, policy, and management still exist. The introduction of serverless and Knative brings even more complexity into the mix.

One of the technologies that came to light is the idea of a "service mesh". A service mesh sets out to tackle the challenges of security, policy, and traffic management of microservices on a cloud-native platform. In this post I will explore the idea behind service mesh, explore Istio, and try to explain how it all fits together from an Operations point of view.

Application communication design

Current Design

Whether you are running monoliths or microservices, the way different applications communicate with each other is generally the same: two or more applications talking to each other over the network via HTTP/HTTPS.

This should look pretty familiar. The challenge with this design (even setting aside cloud-native and microservices for a moment) is that security, circuit breaking, and Layer 7 and Layer 4 traffic rules have to be built into the application stack itself, for every application stack across your entire ecosystem. Maintaining rulesets for hundreds of applications can become a maintenance nightmare.

This is where a Service Mesh can be powerful!

Service Mesh Fundamentals

To manage the rulesets for an application, they need to be 1) application agnostic and 2) abstracted away from the application. To do this, a proxy (currently the most popular one is Envoy) holds these rules/policies independently of the application (and vice versa). The proxy is deployed using the "sidecar pattern". This way ANY application can be plugged in without needing to know about the service mesh.

In this design, Application 1 is talking to Application 2 via a proxy. You can dissect it further and say that Application 1's proxy is talking to Application 2's proxy. Some of the advantages of this design are...

  • Rules and policies (like routing and circuit breaking) no longer have to be built into the application.
  • Applications can be "plugged in" without them knowing they are being governed.
  • Having a single source of authority makes management easier.

One thing to keep in mind is that once you have hundreds of applications with hundreds of instances, management can get pretty out of hand. This is why the concept (now I'm teasing Istio) of a control plane (an idea likely borrowed from Kubernetes) fits nicely here.

With this design you can control/manage a fleet of proxies (along with their rulesets/policies) from a central location. But this is only part of the solution.

ISTIO Service Mesh

There are a few technologies that take this design pattern and create a solution based on it. Some include Linkerd, Gloo, and Conduit. The one solution that has gained a lot of favor in the community is Istio.

Istio can get complex, and other blogs have gone in depth on how Istio works under the hood. I also suggest you give Christian Posta a follow. I will try to keep this at a high level, from an Operations understanding of Istio's implementation of a service mesh.

Istio is made up of two parts. A Control Plane and a Data Plane.

The Control Plane is made of the following components

  • Pilot - This is where traffic management is set, and where the config data for the proxies is stored and pushed out from.
  • Mixer - This is the policy engine. It enforces access control and usage policies. It also collects telemetry from the mesh.
  • Citadel - This provides mTLS by way of handing out certificates to the proxies and managing them.

The Data Plane is simply the Envoy Proxies themselves that enforce the rulesets stored on the control plane.

With Istio you can do

  • Circuit Breaking - This lets you limit concurrent requests to an instance, so that a slow instance isn't flooded with more requests than it can handle.
  • Pool Ejection - This removes failing instances from the pool.
  • Retries - This forwards the request to another instance in case of a failure (open circuit breaker and/or pool ejection).
  • Mutual TLS - This allows you to encrypt all traffic automatically (sometimes called "zero trust" architecture)
  • Telemetry/Tracing - This gives you observability into your microservices and lets you trace failures and calls into (and out of) your services.
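To make circuit breaking and pool ejection more concrete, this sketch writes an Istio DestinationRule that combines connection-pool limits with outlier detection. "app2" is a hypothetical service name, the thresholds are arbitrary examples, and the fields follow the networking.istio.io/v1alpha3 API; check the DestinationRule reference for your Istio version, since some outlierDetection fields have been renamed across releases.

```shell
#!/usr/bin/env bash
# Sketch: write a DestinationRule combining circuit breaking (connectionPool)
# with pool ejection (outlierDetection) for a hypothetical "app2" service.
write_destination_rule() {
  cat > "${1:-app2-destinationrule.yaml}" <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: app2
spec:
  host: app2
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
EOF
}

# Then apply it with: kubectl apply -f app2-destinationrule.yaml
```

The control plane pushes this rule to the Envoy sidecars; the applications themselves are unchanged.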

Conclusion

Service meshes are still a new technology and are ever evolving. Some of these technologies (like Istio and Linkerd) overlap in functionality, and others (like Istio and Gloo) can complement each other. The important thing is to get familiar with these technologies before they start running in your environment and, more importantly, so that you can make an intelligent decision on which one to use!

I encourage anyone unfamiliar with Istio to check out the Katacoda Istio track to get hands-on!

Getting Familiar With ClusterAPI

Introduction

There are many tools for getting a Kubernetes cluster up and running. Some of these include kops, kubeadm, openshift-ansible, and kubicorn (just to name a few). There is even a way dubbed "The Hard Way", made famous by Kelsey Hightower.

Some of these tools (like kops and kubicorn) aim to manage your entire stack, from the infrastructure layer all the way up to the Kubernetes layer. This is what I like to think of as a fully managed system/install. Other tools take the UPI (User Provided Infrastructure) approach. Tools like openshift-ansible and kubeadm let you bring already existing infrastructure and just layer Kubernetes on top of it.

ClusterAPI is a SIG project that is trying to bring a declarative approach to setting up Kubernetes clusters. The idea is that you have a "wanted state" (your described cluster) and ClusterAPI reconciles it for you. The SIG's goals are for ClusterAPI to 1. use declarative, Kubernetes-style APIs and 2. be environment agnostic (while still being flexible).

This diagram, taken from their GitHub page, shows the architecture.

In this post I'm going to go through an example of installing Kubernetes on AWS using the ClusterAPI AWS provider.

Prerequisites

I mostly followed the quickstart on the GitHub page. It lists some good tools to have (some as must-haves and others as nice-to-haves). To summarize, here are the MUST haves:

  • Linux or Mac (no Windows support at this time)
  • AWS Credentials
  • An IAM role to give to the k8s control-plane
  • KIND
    • KIND has its own dependencies, including docker
  • The gettext package installed

Some of the optional nice-to-haves are:

Once you have those, you'll need to install the CLI tools. Below is what I installed as of 19-MAR-2019... please see here for the latest binaries

# wget https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.1.1/clusterawsadm-linux-amd64
# wget https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.1.1/clusterctl-linux-amd64
# chmod +x clusterctl-linux-amd64
# chmod +x clusterawsadm-linux-amd64
# mv clusterctl-linux-amd64 /usr/local/bin/clusterctl
# mv clusterawsadm-linux-amd64 /usr/local/bin/clusterawsadm

I also downloaded the examples tarball to help generate some files I'll need later

# wget https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases/download/v0.1.1/cluster-api-provider-aws-examples.tar
# tar -xf cluster-api-provider-aws-examples.tar

Setting up environment variables

There is a helper script in the cluster-api-provider-aws-examples.tar tarball that generates a lot of the manifests for you. The doc explains some, but not all, of the environment variables that you need to export. I dug around the script and found that these are helpful to set.

export AWS_REGION="us-west-1"
export AWS_ACCESS_KEY_ID="XXXXXXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/ZZZZZZZZZZ"
export SSH_KEY_NAME="chernand-ec2"
export CLUSTER_NAME="pony-unicorns"
export CONTROL_PLANE_MACHINE_TYPE="m4.xlarge"
export NODE_MACHINE_TYPE="m4.xlarge"

When exporting SSH_KEY_NAME, you need to make sure this key exists in AWS already.

I verified my exports with the AWS CLI.

# aws sts get-caller-identity
{
    "Account": "123123123123",
    "UserId": "TH75ISMYR3F4RCHUS3R1D",
    "Arn": "arn:aws:iam::123123123123:user/clusterapiuser"
}
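The AWS CLI check confirms the credentials, but not that every other variable is set before you run the generator. Here is a minimal sketch of a pre-flight check; check_capa_env is a hypothetical helper, and its variable list mirrors the exports above rather than being read from generate-yaml.sh, so adjust it if your version of the script expects others.

```shell
#!/usr/bin/env bash
# Sketch: fail early if any of the variables exported above is unset or empty.
check_capa_env() {
  local var missing=0
  for var in AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY SSH_KEY_NAME \
             CLUSTER_NAME CONTROL_PLANE_MACHINE_TYPE NODE_MACHINE_TYPE; do
    # ${!var:-} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_capa_env && ./generate-yaml.sh
```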

Generating manifests

When I extracted the cluster-api-provider-aws-examples.tar file, it created an aws dir in my current working directory.

# tree ./aws
./aws
├── addons.yaml
├── cluster-network-spec.yaml.template
├── cluster.yaml.template
├── generate-yaml.sh
├── getting-started.md
├── machines.yaml.template
└── provider-components-base.yaml

Running the generate-yaml.sh script in this directory will generate the manifest files the installer needs.

# cd ./aws
# ./generate-yaml.sh 
Done generating /root/aws/out/cluster.yaml
Done generating /root/aws/out/machines.yaml
Done copying /root/aws/out/addons.yaml
Generated credentials
Done writing /root/aws/out/provider-components.yaml
WARNING: /root/aws/out/provider-components.yaml includes credentials

Go into the out directory and examine these files, making sure they match what you set in your environment variables.

# cd out
# cat *
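As a quicker complement to reading every file, you can grep the generated manifests for the values you exported. A minimal sketch, assuming each value appears verbatim somewhere in out/*.yaml (which file it lands in may vary by provider version, so the search covers all of them); check_manifests is a hypothetical helper, and grep -q keeps the credentials in provider-components.yaml off your screen.

```shell
#!/usr/bin/env bash
# Sketch: confirm each exported value appears in at least one generated manifest.
check_manifests() {
  local dir=${1:-.} var rc=0
  for var in CLUSTER_NAME SSH_KEY_NAME CONTROL_PLANE_MACHINE_TYPE NODE_MACHINE_TYPE AWS_REGION; do
    if ! grep -qF -- "${!var}" "$dir"/*.yaml; then
      echo "$var (${!var}) not found in $dir/*.yaml" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_manifests /root/aws/out
```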

Once you're okay with these manifests...you can move along to the installer!

Installing Kubernetes on AWS

I created the cluster with the following clusterctl command.

# cd /root/aws/out
# clusterctl create cluster -v 3 \
--bootstrap-type kind \
--provider aws \
-m machines.yaml \
-c cluster.yaml \
-p provider-components.yaml \
-a addons.yaml

You should see output similar to the following:

I0319 19:11:27.808556   25430 createbootstrapcluster.go:27] Creating bootstrap cluster
I0319 19:11:27.808667   25430 kind.go:57] Running: kind [create cluster --name=clusterapi]
I0319 19:12:10.001664   25430 kind.go:60] Ran: kind [create cluster --name=clusterapi] Output: Creating cluster "clusterapi" ...
 • Ensuring node image (kindest/node:v1.13.3) 🖼  ...
 ✓ Ensuring node image (kindest/node:v1.13.3) 🖼
 • Preparing nodes 📦  ...
 ✓ Preparing nodes 📦
 • Creating kubeadm config 📜  ...
 ✓ Creating kubeadm config 📜
 • Starting control-plane 🕹️  ...
 ✓ Starting control-plane 🕹️
Cluster creation complete. You can now use the cluster with:

export KUBECONFIG="$(kind get kubeconfig-path --name="clusterapi")"
kubectl cluster-info
I0319 19:12:10.001735   25430 kind.go:57] Running: kind [get kubeconfig-path --name=clusterapi]
I0319 19:12:10.043264   25430 kind.go:60] Ran: kind [get kubeconfig-path --name=clusterapi] Output: /root/.kube/kind-config-clusterapi
I0319 19:12:10.046231   25430 clusterdeployer.go:78] Applying Cluster API stack to bootstrap cluster
I0319 19:12:10.046258   25430 applyclusterapicomponents.go:26] Applying Cluster API Provider Components
I0319 19:12:10.046273   25430 clusterclient.go:919] Waiting for kubectl apply...
I0319 19:12:11.757657   25430 clusterclient.go:948] Waiting for Cluster v1alpha resources to become available...
I0319 19:12:11.765143   25430 clusterclient.go:961] Waiting for Cluster v1alpha resources to be listable...
I0319 19:12:11.792776   25430 clusterdeployer.go:83] Provisioning target cluster via bootstrap cluster
I0319 19:12:11.852236   25430 applycluster.go:36] Creating cluster object pony-unicorns in namespace "default"
I0319 19:12:11.877091   25430 clusterdeployer.go:92] Creating control plane controlplane-0 in namespace "default"
I0319 19:12:11.897136   25430 applymachines.go:36] Creating machines in namespace "default"
I0319 19:12:11.915500   25430 clusterclient.go:972] Waiting for Machine controlplane-0 to become ready...

What's happening here is that clusterctl creates a temporary local Kubernetes cluster using kind. That local bootstrap cluster then uses your credentials to install a Kubernetes cluster on AWS. Open another terminal window, point KUBECONFIG at the kind cluster as shown in the output above, and watch the following pods come up:

# kubectl  get pods  --all-namespaces 
NAMESPACE             NAME                                               READY   STATUS    RESTARTS   AGE
aws-provider-system   aws-provider-controller-manager-0                  1/1     Running   0          72s
cluster-api-system    cluster-api-controller-manager-0                   1/1     Running   0          72s
kube-system           coredns-86c58d9df4-4r2jx                           1/1     Running   0          73s
kube-system           coredns-86c58d9df4-lg2zd                           1/1     Running   0          73s
kube-system           etcd-clusterapi-control-plane                      1/1     Running   0          24s
kube-system           kube-apiserver-clusterapi-control-plane            1/1     Running   0          5s
kube-system           kube-controller-manager-clusterapi-control-plane   1/1     Running   0          16s
kube-system           kube-proxy-qj7qp                                   1/1     Running   0          73s
kube-system           kube-scheduler-clusterapi-control-plane            1/1     Running   0          17s
kube-system           weave-net-qpcq2                                    2/2     Running   0          73s

Once they are all running, tail the log of the aws-provider-controller-manager-0 pod to see what's happening (useful for debugging).

# kubectl logs -f -n aws-provider-system aws-provider-controller-manager-0

Once it's done, you'll see output that looks something like this (note that the kind cluster is only temporary and gets deleted at the end):

I0319 19:22:58.000769   25430 clusterdeployer.go:143] Done provisioning cluster. You can now access your cluster with kubectl --kubeconfig kubeconfig
I0319 19:22:58.000823   25430 createbootstrapcluster.go:36] Cleaning up bootstrap cluster.
I0319 19:22:58.000832   25430 kind.go:57] Running: kind [delete cluster --name=clusterapi]
I0319 19:22:58.882121   25430 kind.go:60] Ran: kind [delete cluster --name=clusterapi] Output: Deleting cluster "clusterapi" ...
$KUBECONFIG is still set to use /root/.kube/kind-config-clusterapi even though that file has been deleted, remember to unset it

Now for the moment of truth: can I see my cluster?

# kubectl get nodes --kubeconfig=kubeconfig 
NAME                                      STATUS   ROLES    AGE   VERSION
ip-10-0-0-11.us-west-1.compute.internal   Ready    <none>   70m   v1.13.3
ip-10-0-0-88.us-west-1.compute.internal   Ready    master   72m   v1.13.3

It works! I have one controller and one worker node. It also looks like it's set up for multi-master, since an ELB was created for me (the API server endpoint in the kubeconfig points at it):

# kubectl config view --kubeconfig=kubeconfig 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://pony-unicorns-apiserver-1159964288.us-west-1.elb.amazonaws.com:6443
  name: pony-unicorns
contexts:
- context:
    cluster: pony-unicorns
    user: kubernetes-admin
  name: kubernetes-admin@pony-unicorns
current-context: kubernetes-admin@pony-unicorns
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Uninstalling and Cleanup

To delete everything I created, I used clusterctl:

# clusterctl delete cluster \
--bootstrap-type kind \
--kubeconfig kubeconfig -p provider-components.yaml 
I0319 20:37:14.156586    2852 clusterdeployer.go:149] Creating bootstrap cluster
I0319 20:37:14.156630    2852 createbootstrapcluster.go:27] Creating bootstrap cluster
I0319 20:37:56.466031    2852 clusterdeployer.go:157] Pivoting Cluster API stack to bootstrap cluster
I0319 20:37:56.466130    2852 pivot.go:67] Applying Cluster API Provider Components to Target Cluster
I0319 20:37:57.876975    2852 pivot.go:72] Pivoting Cluster API objects from bootstrap to target cluster.
I0319 20:38:33.196752    2852 clusterdeployer.go:167] Deleting objects from bootstrap cluster
I0319 20:38:33.196782    2852 clusterdeployer.go:214] Deleting MachineDeployments in all namespaces
I0319 20:38:33.198438    2852 clusterdeployer.go:219] Deleting MachineSets in all namespaces
I0319 20:38:33.200085    2852 clusterdeployer.go:224] Deleting Machines in all namespaces
I0319 20:38:43.227284    2852 clusterdeployer.go:229] Deleting MachineClasses in all namespaces
I0319 20:38:43.229738    2852 clusterdeployer.go:234] Deleting Clusters in all namespaces
I0319 20:41:13.253792    2852 clusterdeployer.go:172] Deletion of cluster complete
I0319 20:41:13.254168    2852 createbootstrapcluster.go:36] Cleaning up bootstrap cluster.

This was the easiest and most straightforward part of the whole process.

Conclusion

In this blog I took a look at ClusterAPI and tested the ClusterAPI AWS provider. The ClusterAPI SIG aims to unify how we provide infrastructure for Kubernetes clusters, rebuilding what's out there today by learning from tools like kops, kubicorn, and Ansible.

The project is still in its infancy and is bound to change. I encourage you to try it out and provide feedback. There is also a channel on the Kubernetes Slack that you can join.

Exploring Kubernetes Storage

Introduction

When you think about Kubernetes, the first thing that usually comes to mind is running stateless applications. The very design of Kubernetes lends itself to stateless workloads. However, since Kubernetes runs on Linux, you have always been able to attach storage systems to support stateful applications. But how do you add support for new volume plugins without changing Kubernetes itself?

This is where CSI comes in. CSI (Container Storage Interface) is a standard for exposing block and file storage to containers. It lets storage vendors write storage plugins for Kubernetes without modifying Kubernetes core code. CSI was introduced (as alpha) in v1.9 and is GA as of v1.13.

This has enabled various storage vendors to integrate their storage systems into Kubernetes, and cloud providers offer integrated solutions as well (for example, EBS on Amazon).

In this blog I will take a look at GlusterFS and Rook and explore some of their advantages and pitfalls.

Setup

For this blog I have installed/setup the following for my environment (although using minikube should work as well)

Rook installation

To install Rook, I took a look at their latest documentation, where I found a helpful quickstart page that provides an easy way to deploy Rook with Ceph using Helm. Getting a Rook system up and running consists of three parts: the operator, the Rook Ceph cluster, and the StorageClass.

To install the Rook operator I used Helm. This is pretty straightforward, and I was able to install it by following the documentation.

$ helm repo add rook-stable https://charts.rook.io/stable
$ helm install --namespace rook-ceph-system rook-stable/rook-ceph

Now that the operator is up and running, we need to deploy a Rook cluster; more specifically, a Rook Ceph cluster. I will be deploying a copy of the YAML from my GitHub page, but you should check the quickstart page for up-to-date YAML.

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-cluster.yaml

This creates (among other things) the rook CRDs for ceph. Please see the documentation if you need to customize any values.

Next is the StorageClass. However, before we can create the StorageClass, we have to create a CephBlockPool custom resource. This custom resource tells the operator to create a pool with three-way replication to serve block storage. I included it in my YAML along with my StorageClass, but more information can be found in the docs.

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-storageclass.yaml

After a bit, you should see rook-ceph-block as an available storageClass.

$ kubectl get sc
NAME                 PROVISIONER            AGE
rook-ceph-block      ceph.rook.io/block     3m

Testing Rook/Ceph

To test this, I created a namespace and then deployed a sample application (one that accepts file uploads) to that namespace:

$ kubectl create ns test
namespace/test created
$ kubectl create deployment upload --image=quay.io/redhatworkshops/upload:latest -n test
deployment.apps/upload created

I also exposed this deployment and created an ingress so I could test uploads.

When I created the PVC, I specified that I wanted the block storage provided by Rook/Ceph by adding the volume.beta.kubernetes.io/storage-class: rook-ceph-block annotation to my YAML file.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc0001
  annotations:
    volume.beta.kubernetes.io/storage-class: rook-ceph-block
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Then I loaded this YAML to create my PVC:

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/rook-sample-pvc.yaml -n test
persistentvolumeclaim/ceph-block-pvc0001 created

Rook creates the block volume on the fly, along with the PV that satisfies my claim. Checking the PVC status shows that it's bound to a PV:

$ kubectl get pvc -n test
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
ceph-block-pvc0001   Bound    pvc-0419077f-4510-11e9-bd1f-42010a8e0033   1Gi        RWO            rook-ceph-block   5m
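The volume.beta.kubernetes.io/storage-class annotation still works here, but since Kubernetes 1.6 the spec.storageClassName field has been the non-beta way to select a StorageClass. As a minimal sketch (my own illustration, not the YAML from my repo), this Python builds an equivalent claim using that field. Because kubectl accepts JSON as well as YAML, you can pipe the output to kubectl create -n test -f -.

```python
import json


def pvc_manifest(name, storage_class, size="1Gi", access_mode="ReadWriteOnce"):
    """Build a PersistentVolumeClaim that selects its StorageClass through
    spec.storageClassName rather than the beta annotation."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


print(json.dumps(pvc_manifest("ceph-block-pvc0001", "rook-ceph-block"), indent=2))
```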

I edited my deployment using kubectl edit deployment upload -n test and added the volumeMounts and volumes sections shown below.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: null
  generation: 1
  labels:
    app: upload
  name: upload
  selfLink: /apis/extensions/v1beta1/namespaces/test/deployments/upload
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: upload
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: upload
    spec:
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        imagePullPolicy: Always
        name: upload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/uploaded
          name: upload-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: upload-storage
        persistentVolumeClaim:
          claimName: ceph-block-pvc0001
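If you'd rather not use an interactive editor, you can apply the same change with kubectl patch, which uses a strategic merge patch by default. Containers and volumes are merged by their name field, so the patch only needs the new fields. Here is a minimal Python sketch that builds that patch. The helper name volume_patch is hypothetical, and the container, claim and mount path are the ones used above.

```python
import json


def volume_patch(container, claim, mount_path, volume_name="upload-storage"):
    """Build a strategic-merge patch that mounts a PVC into one container.

    Containers and volumes are matched by "name", so only these fields are
    added or changed; the rest of the Deployment is left alone.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "volumeMounts": [
                                {"name": volume_name, "mountPath": mount_path}
                            ],
                        }
                    ],
                    "volumes": [
                        {
                            "name": volume_name,
                            "persistentVolumeClaim": {"claimName": claim},
                        }
                    ],
                }
            }
        }
    }


patch = volume_patch("upload", "ceph-block-pvc0001", "/opt/app-root/src/uploaded")
# Pass this JSON to: kubectl patch deployment upload -n test -p '<json>'
print(json.dumps(patch))
```

The same helper also works for the Gluster claim later on if you pass gluster-pvc0001 instead.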

Looking inside the container, you can see the volume appears as /dev/rbd0 and is already formatted and mounted:

$ kubectl exec -it upload-7d9d6b987-fhq69 -n test -- df -h /opt/app-root/src/uploaded
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd0      1014M   33M  982M   4% /opt/app-root/src/uploaded

Issues and resolutions

When I tested the application, I got the following error: Permission denied in /opt/app-root/src/upload.php

After some digging, I found that the permissions on the mount directory were wrong:

bash-4.2$ ls -ld /opt/app-root/src/uploaded/
drwxr-xr-x 2 root root 6 Mar 12 22:04 /opt/app-root/src/uploaded/

This is a problem because the container runs as a non-root user, so I can't just chmod the directory from inside it. A little "hacking" was in order. First, I figured out which node the pod was running on:

$ kubectl get pod upload-7d9d6b987-fhq69 -n test -o jsonpath='{.spec.nodeName}{"\n"}'
nodes-8z9

It looks like this pod is running on node nodes-8z9nn, so I logged into that node:

$ gcloud compute ssh nodes-8z9n --zone us-east1-d

I used docker commands to find the container's ID and its process ID on the host ($PID below), then used nsenter to enter the container's namespaces:

$ nsenter --target $PID --mount --uts --ipc --net --pid

Once inside I was able to chmod the directory

# chmod 777 /opt/app-root/src/uploaded/

After that, I was able to use my app to upload files to the Ceph block storage.
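The nsenter approach works, but you have to repeat it every time the volume is re-created. An alternative that I didn't test here is to set fsGroup in the pod's securityContext. For volume plugins that support ownership management, the kubelet makes the volume group-owned by that GID when it mounts it, and it adds the GID to the container's supplemental groups. I haven't verified that Rook's driver supports this. Here is a minimal sketch that builds the patch. The GID 2000 is an arbitrary, hypothetical value.

```python
import json

# Hypothetical GID. Any value should do, since Kubernetes adds fsGroup to
# the container process's supplemental groups.
FS_GROUP = 2000


def fsgroup_patch(fs_group=FS_GROUP):
    """Build a strategic-merge patch that sets the pod-level fsGroup, so
    volumes whose plugin supports it are group-owned by fs_group on mount."""
    return {"spec": {"template": {"spec": {"securityContext": {"fsGroup": fs_group}}}}}


# Pass this JSON to: kubectl patch deployment upload -n test -p '<json>'
print(json.dumps(fsgroup_patch()))
```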

GlusterFS Installation

To test Gluster, I first needed to add some raw storage devices to the nodes. Gluster (specifically gluster-kubernetes) likes to work with raw devices. I added 100GB volumes to each of my 3 nodes.

Note that Ceph can also work with raw disks; it isn't limited to storing data in directories.

I mainly followed the GitHub page for the installation and made sure all the prerequisites were in place on every server. In short, I did the following (I also added iptables rules):

# for i in dm_snapshot dm_mirror dm_thin_pool; do modprobe $i; done
# apt -y install glusterfs-client glusterfs-common
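modprobe only loads the modules for the current boot. On systemd-based hosts, listing them in a file under /etc/modules-load.d/ makes them load at startup. To confirm that the modules are loaded on a node, a minimal Python sketch can read /proc/modules. A module compiled directly into the kernel won't appear there, so treat a miss as a prompt to check rather than proof that it's missing.

```python
# Kernel modules loaded with modprobe above.
REQUIRED_MODULES = ("dm_snapshot", "dm_mirror", "dm_thin_pool")


def missing_modules(proc_modules_text):
    """Return required modules not listed in /proc/modules-style text.

    Each line of /proc/modules starts with the module name.
    """
    loaded = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return [m for m in REQUIRED_MODULES if m not in loaded]


def check_this_host(path="/proc/modules"):
    """Read /proc/modules on the current (Linux) host and check it."""
    with open(path) as f:
        return missing_modules(f.read())
```

Run `check_this_host()` on each node. An empty list means all three modules were found.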

Once the prerequisites were done, I cloned the git repo to use the provided installation script:

$ git clone https://github.com/gluster/gluster-kubernetes

Once you have that, take the sample topology file (provided in the repo) and create your own, taking care that your settings are right. Mine looked like this:

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": [
                "ip-172-20-107-182.us-west-2.compute.internal"
              ],
              "storage": [
                "172.20.107.182"
              ]
            },
            "zone": 1
          },
          "devices": [
            "/dev/xvdz"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "ip-172-20-47-197.us-west-2.compute.internal"
              ],
              "storage": [
                "172.20.47.197"
              ]
            },
            "zone": 2
          },
          "devices": [
            "/dev/xvdz"
          ]
        },
        {
          "node": {
            "hostnames": {
              "manage": [
                "ip-172-20-93-253.us-west-2.compute.internal"
              ],
              "storage": [
                "172.20.93.253"
              ]
            },
            "zone": 3
          },
          "devices": [
            "/dev/xvdz"
          ]
        }
      ]
    }
  ]
}

Here's a quick breakdown of the fields:

  • manage - The node name, as shown by the kubectl get nodes command
  • storage - This is kind of misnamed. It's the IP address of the node itself (the actual node IP, not an SDN IP)
  • zone - The failure domain. Gluster picks a node from each zone to create a 3-way replicated volume, so you need at least 3 zones (if you're running a single node because of minikube, 1 is okay)
  • devices - An array of raw devices; at least 1 is required
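A mistake in this file only shows up once gk-deploy runs, so it can help to check the fields described above first. Below is a minimal Python sketch (my own helper, not part of gluster-kubernetes) that applies the rules of thumb from the list, including the three-zone minimum.

```python
import json
from collections import Counter


def validate_topology(topology, min_zones=3):
    """Return a list of problems found in a gk-deploy topology dict.

    Each node needs manage and storage hostnames, a zone, and at least one
    raw device. Use min_zones=1 for a single-node (minikube) setup.
    """
    problems = []
    zones = Counter()
    for c_idx, cluster in enumerate(topology.get("clusters", [])):
        for n_idx, entry in enumerate(cluster.get("nodes", [])):
            where = f"cluster {c_idx} node {n_idx}"
            node = entry.get("node", {})
            hostnames = node.get("hostnames", {})
            for key in ("manage", "storage"):
                if not hostnames.get(key):
                    problems.append(f"{where}: missing hostnames.{key}")
            if "zone" in node:
                zones[node["zone"]] += 1
            else:
                problems.append(f"{where}: missing zone")
            if not entry.get("devices"):
                problems.append(f"{where}: no devices listed")
    if len(zones) < min_zones:
        problems.append(f"only {len(zones)} zone(s); need at least {min_zones}")
    return problems


def validate_file(path, min_zones=3):
    """Load a topology JSON file (e.g. gfs.json) and validate it."""
    with open(path) as f:
        return validate_topology(json.load(f), min_zones)
```

For a topology like mine, `validate_file("gfs.json")` should return an empty list.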

NOTE: Please see the following bug about gluster-blockd. I had to edit the file deploy/kube-templates/glusterfs-daemonset.yaml and change GLUSTER_BLOCKD_STATUS_PROBE_ENABLE to 0

Now, using the gk-deploy script, I ran the following (NOTE: you may need to run it with --single-node if you're using minikube):

$ ./gluster-kubernetes/deploy/gk-deploy gfs.json -g \
-c kubectl  -n glusterfs -w 1200 --no-object -y

You will get a message that it's complete, and you can verify that all the pods are running:

$ kubectl get pods -n glusterfs
NAME                      READY   STATUS    RESTARTS   AGE
glusterfs-bfhqx           1/1     Running   0          12m
glusterfs-hwb98           1/1     Running   0          12m
glusterfs-xpc2r           1/1     Running   0          12m
heketi-7495cdc5fd-b6s82   1/1     Running   0          4m11s

Next, create the StorageClass using the heketi service address. Using my example YAML as a template, I created the following spec (I got the resturl by running kubectl get svc -n glusterfs and looking at the heketi service address):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster-container
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://172.30.59.174:8080"
  restuser: "admin"
  volumetype: "replicate:3"

Now you should be able to see the StorageClass:

$ kubectl get sc  gluster-container
NAME                PROVISIONER               AGE
gluster-container   kubernetes.io/glusterfs   21s
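Instead of copying the heketi address by hand, you can derive the resturl from the service JSON. This is a minimal Python sketch. It assumes the service is named heketi and that its first port is the REST port, as in my install. Feed it the output of kubectl get svc -n glusterfs -o json.

```python
import json


def heketi_resturl(svc_list_json, service_name="heketi"):
    """Build a glusterfs StorageClass resturl from `kubectl get svc -o json`.

    Assumes the heketi REST API is the service's first port.
    """
    for svc in json.loads(svc_list_json).get("items", []):
        if svc["metadata"]["name"] == service_name:
            spec = svc["spec"]
            return f"http://{spec['clusterIP']}:{spec['ports'][0]['port']}"
    raise LookupError(f"service {service_name!r} not found")
```

With my heketi service at 172.30.59.174:8080, this returns the resturl used in the StorageClass above.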

Testing Gluster

I'll be using the same deployment as before, modified to reference the new storage. First, I verified that it's running:

$ kubectl get pods -n test
NAME                           READY   STATUS    RESTARTS   AGE
upload-bb9df669f-twmq6         1/1     Running   0          65s

Next, using my PVC template for Gluster, I created a PVC. Just like Rook, Gluster creates the PV on the fly to satisfy my PVC request.

$ kubectl create -f https://raw.githubusercontent.com/christianh814/kubernetes-toolbox/master/resources/examples/glusterfs-sample-pvc.yaml -n test
persistentvolumeclaim/gluster-pvc0001 created

Checking my PVC status, I see that I have storage bound:

$ kubectl get pvc -n test
NAME              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
gluster-pvc0001   Bound    pvc-8ed0bbfc-4538-11e9-8da3-001a4a16011b   1Gi        RWX            gluster-container   7m14s

Next, I used kubectl edit deploy/upload -n test to point my deployment at the new Gluster volume. In the end, my deployment looked like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: null
  generation: 1
  labels:
    app: upload
  name: upload
  selfLink: /apis/extensions/v1beta1/namespaces/test/deployments/upload
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: upload
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: upload
    spec:
      containers:
      - image: quay.io/redhatworkshops/upload:latest
        imagePullPolicy: Always
        name: upload
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/uploaded
          name: upload-storage
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: upload-storage
        persistentVolumeClaim:
          claimName: gluster-pvc0001

If you look inside the pod, you will see a network mount (since GlusterFS is file-based storage, it doesn't show up as a block device):

$ kubectl exec -it upload-7cb79f89cb-pjhls -n test -- df -h uploaded
Filesystem                                         Size  Used Avail Use% Mounted on
192.168.1.19:vol_7eb133c254df1695d670b6c8dc437fdd 1014M   43M  972M   5% /opt/app-root/src/uploaded

Issues/Resolutions for Gluster

As noted above, I ran into the gluster-blockd bug and had to disable its status probe. Since I wasn't using block storage, it wasn't a big deal. However, do watch out for it, since your install won't work unless you disable it.

I also spent quite a bit of time getting the firewall rules right, which took some trial and error on my part. In the end, I ran these on ALL servers in my Kubernetes cluster:

iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 24010 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 3260 -j ACCEPT
iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 111 -j ACCEPT
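To keep track of what each rule is for, you can generate the same rules from a port table. This is a minimal Python sketch. The purpose labels are my reading of the port list in the gluster-kubernetes README, so double-check them there. Ranges use the multiport match, exactly as in the rules above.

```python
# Ports opened above. Purpose labels are my reading of the
# gluster-kubernetes README; verify them against the README.
GLUSTER_PORTS = [
    ("24007", "GlusterFS daemon"),
    ("24008", "GlusterFS management"),
    ("2222", "sshd in the GlusterFS pod"),
    ("49152:49664", "GlusterFS bricks (one port per brick)"),
    ("24010", "gluster-blockd"),
    ("3260", "iSCSI targets (gluster-block)"),
    ("111", "rpcbind"),
]


def iptables_rule(port):
    """Build an INPUT rule accepting new TCP connections to a port or range."""
    if ":" in port:
        match = f"-m multiport --dports {port}"
    else:
        match = f"-m tcp --dport {port}"
    return f"iptables -A INPUT -p tcp -m state --state NEW {match} -j ACCEPT"


for port, purpose in GLUSTER_PORTS:
    print(f"# {purpose}")
    print(iptables_rule(port))
```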

Also, since Gluster requires raw devices, check with your provider on how to attach them. You may run into challenges if you use instance groups and the like.

On the plus side, I was able to use the Gluster volume without needing to change permissions.

Conclusion

In this blog we took a brief look at CSI and how it helps storage vendors write storage plugins for Kubernetes. We also explored GlusterFS and Rook to test how block and file storage work in Kubernetes.

There is a plethora of other storage providers for Kubernetes, including OpenEBS, Trident, and many more.

As Kubernetes becomes more and more of a standard, I expect to see many more storage vendors and storage projects providing solutions, which will offer a wide range of choices for many workloads.

Automating OpenShift Installs

Introduction

I've been involved with OpenShift since its pre-Kubernetes days. I was also there for its rewrite when Kubernetes came on the scene about four years ago, and through the evolution of DevOps around Kubernetes and the birth of "Cloud Native" and the CNCF.

When I started working on automating installs on my GitHub page, the first thing that came to mind was people asking "Why?". I've done a lot of engagements with many customers, and every one of them started with the "One Cluster to rule them all" frame of mind, only to end up with multiple clusters across multiple data centers.

If you're running Kubernetes/OpenShift in production, you will quickly learn that you can't keep doing what you've always done and simply add Kubernetes. (Kelsey Hightower has a great talk where he says "You can't rub Kubernetes over your situation to make it better.")

In the end you are going to be running many clusters and automating that is going to save you lots of time.

Let's get started!

Technologies Used

I used the following technologies in my tests:

  • OpenShift Container Platform v3.11
  • Red Hat Enterprise Virtualization 4.2
  • Red Hat Identity Manager 4.6
  • Ansible 2.7

Although I haven't tested it, this same setup should work with OKD, oVirt, and FreeIPA as well.

For those not familiar with RHEV/oVirt or RHIDM/FreeIPA; I'll give a short explanation on what these provide.

RHEV/oVirt provides a virtualization platform comparable to VMware ESXi with vCenter. RHEV has a rich API and templating system that I will be leveraging to create the VMs where I'll be installing OpenShift.

Red Hat IdM/FreeIPA provides a centrally managed Identity, Policy, and Audit server. It combines LDAP, MIT Kerberos, NTP, DNS, and a certificate authority (CA). For my testing I am mainly using its DNS system to create my DNS entries dynamically through the API.

Prework and Assumptions

I'm going to make a few assumptions, mostly having to do with infrastructure. These are things that are either out of the scope of this post or things that I'm assuming you already have in place in your environment.

  • I have created a VM Template based on the section in the OpenShift doc that describes how to prepare your hosts for OpenShift.
  • All these hosts had the correct SSH keys installed
  • I have an IPA domain/realm of cloud.chx that I also use for DNS.
  • I have DHCP set up, with all my IPs in DNS (both forward AND reverse records are set up)

Automated Installs

I'm going to go through, at a high level, some of the sections of my playbook from my github repo. For a more detailed overview please see that repo itself. This is just to give you an idea of the thought process in automating installs.

RHEV/oVirt Auth

First and foremost, you'll need to set up how Ansible authenticates to RHEVM/oVirt. This task sets up the credentials used by subsequent ovirt_vm module calls. The config looks like this:

  - name: Get RHEVM Auth
    ovirt_auth:
      url: https://rhevm.cloud.chx/ovirt-engine/api
      username: admin@internal
      password: "{{ lookup('env','OVIRT_PASSWORD') }}"
      insecure: true
      state: present

Note the {{ lookup('env','OVIRT_PASSWORD') }} value. This says that ansible will be looking up the password in an environment variable (I'll be doing this a lot in my playbook).

VM Creation

To create the VMs from my template, I call the ovirt_vm module. This is where you specify the size and specs of the servers, as well as the template (the one I prepared using the host preparation guide).

  - name: Create VMs for OpenShift
    ovirt_vm:
      auth: "{{ ovirt_auth }}"
      name: "{{ item }}"
      comment: This is {{ item }} for the OCP cluster
      state: running
      cluster: Default
      template: rhel-7.6-template
      memory: 16GiB
      memory_max: 24GiB
      memory_guaranteed: 12GiB
      cpu_threads: 2
      cpu_cores: 2
      cpu_sockets: 1
      type: server
      operating_system: rhel_7x64
      nics:
      - name: nic1
        profile_name: ovirtmgmt
      wait: true
    with_items:
      - master1
      - app1
      - app2
      - app3

Adding OCS Disk

Since I am using OpenShift Container Storage (OCS), I used ovirt_disk to attach an extra disk to each of my application servers:

  - name:  Attach OCS disk to VM 
    ovirt_disk:
      auth: "{{ ovirt_auth }}"
      name: "{{ item }}_disk2"
      vm_name: "{{ item }}"
      state: attached
      size: 250GiB
      storage_domain: vmdata
      format: cow
      interface: virtio_scsi
      wait: true
    with_items:
      - app1
      - app2
      - app3

Creating an install hostfile

To install OpenShift v3.11 you need an Ansible hosts file (since OpenShift v3.x uses Ansible to install). I use Ansible templating to create this file dynamically. Here I gather information about the servers I created and use it to build the Ansible hosts file for OpenShift:

  - name: Obtain VM information
    ovirt_vm_facts:
      auth: "{{ ovirt_auth }}"
      pattern: name=master* or name=app* and cluster=Default
      fetch_nested: true
      nested_attributes: ips

  - name: Write out a viable hosts file for OCP installation
    template:
      src: ../templates/poc-generated_inventory.j2
      dest: ../output_files/poc-generated_inventory.ini

Now that I have that file, I add (what will be) the master to the in-memory inventory so that I can copy the inventory file to it.

  - name: Obtain Master1 VM information
    ovirt_vm_facts:
      auth: "{{ ovirt_auth }}"
      pattern: name=master1 and cluster=Default
      fetch_nested: true
      nested_attributes: ips

  - name: Set Master1 VM Fact
    set_fact:
      ocp_master: "{{ ovirt_vms.0.fqdn }}"

  - name: Add "{{ ocp_master }}" to in memory inventory
    add_host:
      name: "{{ ocp_master }}"

  - name: Copy inventory to "{{ ocp_master }}"
    copy:
      src: ../output_files/poc-generated_inventory.ini
      dest: /etc/ansible/hosts
      owner: root
      group: root
      mode: 0644
    delegate_to: "{{ ocp_master }}"

Note that I'm using delegate_to in order to run the task against the master that was just provisioned.

Creating DNS entries

Since I'm using IPA for DNS, I tap into its API to create entries. I set up some defaults (like domains) as variables:

- hosts: all
  vars:
    wildcard_domain: "osa.cloud.chx"
    console_fqdn: "openshift.cloud.chx"
    zone_fwd: "cloud.chx"

Using these variables, I created DNS entries pointing the wildcard DNS and console DNS to the master (where these components will be running):

  - name: Create DNS CNAME record
    ipa_dnsrecord:
      ipa_host: ipa1.cloud.chx
      ipa_pass: "{{ lookup('env','IPA_PASSWORD') }}"
      ipa_user: admin
      ipa_timeout: 30
      validate_certs: false
      zone_name: "{{ zone_fwd }}"
      record_name: openshift
      record_type: 'CNAME'
      record_value: "{{ ocp_master }}."
      record_ttl: 3600
      state: present

  - name: IPA create Wildcard DNS record
    ipa_dnsrecord:
      ipa_host: ipa1.cloud.chx
      ipa_pass: "{{ lookup('env','IPA_PASSWORD') }}"
      ipa_user: admin
      ipa_timeout: 30
      validate_certs: false
      zone_name: "{{ zone_fwd }}"
      record_name: '*.osa'
      record_type: 'CNAME'
      record_value: "{{ ocp_master }}."
      record_ttl: 3600
      state: present

Note: If these entries already exist, this overwrites them; if they don't, it creates them.
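The record_name values above are hardcoded, even though the play defines wildcard_domain and console_fqdn. The relationship is simply each FQDN with the zone suffix stripped. Below is a minimal Python sketch using a hypothetical helper to derive them. In the playbook itself, Ansible's regex_replace filter could do the same.

```python
def relative_record_name(fqdn, zone):
    """Strip the zone suffix from an FQDN to get an IPA record_name."""
    fqdn, zone = fqdn.rstrip("."), zone.rstrip(".")
    suffix = "." + zone
    if not fqdn.endswith(suffix):
        raise ValueError(f"{fqdn} is not inside zone {zone}")
    return fqdn[: -len(suffix)]


# Values from the play's vars section above.
wildcard_domain = "osa.cloud.chx"
console_fqdn = "openshift.cloud.chx"
zone_fwd = "cloud.chx"

print(relative_record_name(console_fqdn, zone_fwd))            # openshift
print(relative_record_name("*." + wildcard_domain, zone_fwd))  # *.osa
```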

Preparing the hosts

When I created the VM template, I made it as generic as possible. Still, it's good to make sure the servers are registered and updated with the proper packages. This part is kind of ugly and a work in progress.

  - name: Update packages via ansible from "{{ ocp_master }}"
    shell: |
      ansible all -m shell -a "subscription-manager register --username {{ lookup('env','OREG_AUTH_USER') }} --password {{ lookup('env','OREG_AUTH_PASSWORD') }}"
      ansible all -m shell -a "subscription-manager attach --pool {{ lookup('env','POOL_ID') }}"
      ansible all -m shell -a "subscription-manager repos --disable=*"
      ansible all -m shell -a "subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-extras-rpms --enable=rhel-7-server-ose-3.11-rpms --enable=rhel-7-server-ansible-2.6-rpms --enable=rh-gluster-3-client-for-rhel-7-server-rpms"
      #ansible all -m shell -a "yum -y update"
      # Temp fix because of https://access.redhat.com/solutions/3949501
      ansible all -m shell -a "yum -y update --exclude java-1.8.0-openjdk*"
      ansible all -m shell -a "systemctl reboot"
    delegate_to: "{{ ocp_master }}"

  - name: Wait for servers to restart
    wait_for:
      host: "{{ ocp_master }}"
      port: 22
      delay: 30
      timeout: 300

In the above, I'm using the shell module. Ideally you'd use the package and redhat_subscription modules instead, but for now this works fine.

Note: The comment in the task references a known issue that requires adding an exclude to the yum update command.
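The wait_for task polls until SSH answers again after the reboot. A rough Python equivalent follows, assuming a plain TCP connect is a good enough signal that the host is back. wait_for_port is a hypothetical name, and the way it counts the timeout after the initial delay is this sketch's own choice, not a description of the Ansible module's internals.

```python
import socket
import time


def wait_for_port(host: str, port: int, delay: float = 30, timeout: float = 300,
                  interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Sleeps `delay` seconds first, then polls every `interval` seconds.
    In this sketch the `timeout` clock starts after the initial delay.
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A completed TCP handshake means something (sshd) is listening again.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the host is probably still rebooting.
            time.sleep(interval)
    return False
```

For example, wait_for_port("master1.cloud.chx", 22) would mirror the task for a hypothetical master hostname. The task only checks the master, even though every host was rebooted, so running the same check against each host would be more thorough.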

Running the installer

Now that I have my hosts file for the installer, my DNS in place, and my VMs ready to go, I can go ahead with the install. I kick off the playbook from my laptop, and it runs the installer playbooks directly on the master. Again, I'm using delegate_to to do this.

  - name: Running OCP prerequisites from "{{ ocp_master }}"
    shell: |
      ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml
    delegate_to: "{{ ocp_master }}"

  - name: Running OCP installer from "{{ ocp_master }}"
    shell: |
      ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
    delegate_to: "{{ ocp_master }}"

  - name: Display OCP artifacts from "{{ ocp_master }}"
    shell: oc get pods --all-namespaces
    register: ocpoutput
    delegate_to: "{{ ocp_master }}"

  - debug: var=ocpoutput.stdout_lines

Once the install is done, I run oc get pods --all-namespaces to check the status of everything, and I print the output.
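Eyeballing that list works, but it's easy to miss one pod stuck in CrashLoopBackOff among dozens of Running ones. Below is a minimal sketch, assuming the default column layout of oc get pods --all-namespaces, that picks out pods whose STATUS is neither Running nor Completed. unhealthy_pods is a hypothetical helper, and it ignores the READY column, so a Running pod that isn't ready will not be flagged.

```python
from typing import List, Tuple

# STATUS column values this sketch treats as healthy.
HEALTHY = {"Running", "Completed"}


def unhealthy_pods(output: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, status) for pods not Running or Completed.

    Expects the default table from `oc get pods --all-namespaces`, whose
    first four columns are NAMESPACE, NAME, READY, STATUS.
    """
    problems = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        namespace, name, _ready, status = fields[:4]
        if status not in HEALTHY:
            problems.append((namespace, name, status))
    return problems
```

You could feed it the text captured in ocpoutput.stdout and treat a non-empty result as a failed install.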

Troubleshooting

I took a "cattle" approach in building this. Since I'm creating everything dynamically, I basically destroyed everything, fixed what was wrong, and then re-ran the playbook. This is the beauty of automating installs. Here is an example of my destroy/uninstall playbook:

  - name: Delete VM
    ovirt_vm:
      auth: "{{ ovirt_auth }}"
      name: "{{ item }}"
      state: absent
      cluster: Default
      wait: true
    with_items:
      - master1
      - master2
      - master3
      - infra1
      - infra2
      - infra3
      - app1
      - app2
      - app3
      - lb

As you can see, I just delete everything. Since my DNS gets updated on creation, there's no real need to remove those entries either (although it's probably good practice to do so).
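The with_items list has to stay in sync with whatever the create playbook built. Since the names follow a role-plus-index pattern, a small helper can generate them from role counts. This is a hypothetical sketch (vm_names is not part of Ansible or oVirt), assuming the naming scheme shown in the list above.

```python
from typing import Dict, Iterable, List


def vm_names(counted_roles: Dict[str, int], singletons: Iterable[str] = ()) -> List[str]:
    """Expand role counts into VM names like master1, master2, ...

    Roles with several members get a 1-based index appended; singleton
    hosts (such as the load balancer) keep their bare name.
    """
    names = [f"{role}{i}" for role, count in counted_roles.items()
             for i in range(1, count + 1)]
    names.extend(singletons)
    return names
```

Generating the list once and feeding it to both the create and destroy plays keeps the two from drifting apart.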

Summary

In this blog, I gave you a high-level overview of how you can automate Kubernetes/OpenShift installations using open source tools. Specifically, I used OpenShift, RHEV, Red Hat IdM, and Ansible in my example.

You can also apply this to other tools like Dynamic DNS, Kubernetes, Puppet, Chef, VMware ESXi/vCenter, etc. The specific tools aren't what matters; getting to the point where you are automating installs is!

If you plan on running Kubernetes/OpenShift in production, you will be running many clusters, and having a way to stamp them out will be paramount for your cloud native environment.

Use less YAML

I have been thinking about what my first blog post should be about. Since I just took the CKA (by the way, I passed!), I have Kubernetes shorthand commands on the brain, so I'll write about using less YAML when working with k8s.

When studying for the CKA, I came across a lot of blogs/how-tos that show things like creating pods and deployments by writing a YAML file and running kubectl create -f ... or (what's worse) using cat <<EOF | kubectl create -f - to create a resource.

Now, don't get me wrong; I'm not bashing YAML. When working with Kubernetes, you'll inevitably have to use YAML at some point, and it's a completely valid way to do things. But when you're doing something like an exam, where time is precious, these shortcuts can come in handy!

During the CKA exam, you have (if you average it out) about 7 minutes per question. Time is precious, so I learned how to generate resources with the kubectl command itself to save time.

So, to create a deployment, you can do the following:

$ kubectl create deployment welcome-php --image=quay.io/redhatworkshops/welcome-php:latest
deployment.apps/welcome-php created

This creates the deployment along with its replica set and pod:
$ kubectl get deploy,rs,pod
NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/welcome-php   1         1         1            1           7m

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.extensions/welcome-php-57db6cbb6   1         1         1       7m

NAME                              READY   STATUS    RESTARTS   AGE
pod/welcome-php-57db6cbb6-kjtcz   1/1     Running   0          7m
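That single command stands in for a manifest you'd otherwise write by hand. The sketch below approximates the object kubectl builds, assuming the apps/v1 API and kubectl's app=NAME labeling. deployment_manifest is a hypothetical helper, and real kubectl output may include additional fields and differ between versions. To see exactly what your kubectl would send, append --dry-run -o yaml to the create command (--dry-run=client -o yaml on newer releases); nothing is created on the cluster.

```python
def deployment_manifest(name: str, image: str, replicas: int = 1) -> dict:
    """Approximate the Deployment built by `kubectl create deployment NAME --image=IMAGE`.

    Pods are labeled app=NAME, and the same label is used as the selector.
    """
    # Container name: last path segment of the image, minus any tag or digest.
    container_name = image.split("/")[-1].split(":")[0].split("@")[0]
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": container_name, "image": image}]},
            },
        },
    }
```

The selector and the pod template labels must agree; that matching label is what lets the Deployment (through its ReplicaSet) find and manage the pod shown above.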


Next, I can create a service by exposing the deployment:
$ kubectl expose deploy welcome-php --port=8080 --target-port=8080
service/welcome-php exposed
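kubectl expose reads the deployment's pod selector and builds a Service around it. Here is a rough sketch of the resulting object, assuming the app=welcome-php selector that kubectl create deployment sets; service_manifest is a hypothetical helper, not a kubectl API.

```python
from typing import Dict


def service_manifest(name: str, selector: Dict[str, str], port: int, target_port: int,
                     service_type: str = "ClusterIP") -> dict:
    """Build a Service roughly like `kubectl expose deploy NAME --port=P --target-port=T`.

    `port` is what the Service listens on inside the cluster (the 8080/TCP in
    the PORT(S) column); `target_port` is the container port traffic is sent to.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            # Reusing the Deployment's selector routes traffic to its pods.
            "selector": dict(selector),
            "ports": [{"port": port, "targetPort": target_port, "protocol": "TCP"}],
        },
    }
```

Because both values are 8080 here, --target-port is optional: a Service's targetPort defaults to its port when omitted.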


Now I have all the resources I need for my application.
$ kubectl get deploy,svc,rs,pod
NAME                                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/welcome-php   1         1         1            1           15m

NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/welcome-php   ClusterIP   100.71.31.164   <none>        8080/TCP   1m

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.extensions/welcome-php-57db6cbb6   1         1         1       15m

NAME                              READY   STATUS    RESTARTS   AGE
pod/welcome-php-57db6cbb6-kjtcz   1/1     Running   0          15m


Running kubectl create -h shows you what you can create via the CLI. Here is a snippet:
Available Commands:
  clusterrole         Create a ClusterRole.
  clusterrolebinding  Create a ClusterRoleBinding for a particular ClusterRole
  configmap           Create a configmap from a local file, directory or literal value
  deployment          Create a deployment with the specified name.
  job                 Create a job with the specified name.
  namespace           Create a namespace with the specified name
  poddisruptionbudget Create a pod disruption budget with the specified name.
  priorityclass       Create a priorityclass with the specified name.
  quota               Create a quota with the specified name.
  role                Create a role with single rule.
  rolebinding         Create a RoleBinding for a particular Role or ClusterRole
  secret              Create a secret using specified subcommand
  service             Create a service using specified subcommand.
  serviceaccount      Create a service account with the specified name


So, to recap, you can run the following to create and expose your application without writing any YAML:
$ kubectl create deployment welcome-php --image=quay.io/redhatworkshops/welcome-php:latest
$ kubectl expose deploy welcome-php --port=8080 --target-port=8080


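You can even do the same with a pod and a NodePort service. Note that I gave the service the same name as the pod and labeled the pod app=nginx: kubectl create service sets the service's selector to app=<name>, so the two have to match.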
$ kubectl run nginx --image=nginx --generator=run-pod/v1 -l app=nginx
pod/nginx created

$ kubectl create service nodeport nginx --node-port=32000 --tcp=80:80 
service/nginx created
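The name-and-label matching is what ties these two commands together. By default kubectl run labels the pod run=nginx, which would not match the app=nginx selector that kubectl create service nodeport sets; hence the -l app=nginx. The sketch below models that, assuming simple equality-based selectors and the default NodePort range of 30000-32767 (a cluster can be configured with a different range). nodeport_service, selector_matches, and is_valid_node_port are hypothetical helpers.

```python
from typing import Dict, Tuple

# Default --service-node-port-range of kube-apiserver; a cluster may configure another.
DEFAULT_NODE_PORT_RANGE: Tuple[int, int] = (30000, 32767)


def is_valid_node_port(port: int, port_range: Tuple[int, int] = DEFAULT_NODE_PORT_RANGE) -> bool:
    """True if `port` falls inside the (inclusive) NodePort range."""
    low, high = port_range
    return low <= port <= high


def nodeport_service(name: str, port: int, target_port: int, node_port: int) -> dict:
    """Build a NodePort Service roughly like
    `kubectl create service nodeport NAME --tcp=PORT:TARGET --node-port=N`.

    The selector is app=NAME, so only pods carrying that label receive traffic.
    """
    if not is_valid_node_port(node_port):
        raise ValueError(f"nodePort {node_port} is outside {DEFAULT_NODE_PORT_RANGE}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port,
                       "nodePort": node_port, "protocol": "TCP"}],
        },
    }


def selector_matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Equality-based matching: every selector key/value must appear in the labels.

    An empty selector is treated as matching nothing, like a Service without one.
    """
    return bool(selector) and all(labels.get(k) == v for k, v in selector.items())
```

With node_port=32000 the service is reachable on port 32000 of any node, and it only forwards to pods whose labels satisfy selector_matches.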


When working with Kubernetes, you will run into lots of YAML that you end up copying and pasting. You can save yourself some typing if you use kubectl to create these resources for you!