May 19, 2016 (updated Dec 30, 2021)
As we’ve covered before, shared file systems are a tricky problem in the cloud. One solution to that problem is a distributed file system: something each of your app nodes can read from and write to. When it comes to distributed file systems, GlusterFS is one of the leading products.
With a few simple scripts on your Mac OS X or Linux machine, you can deploy a multi-zone High Availability (HA) GlusterFS cluster to Google Compute Engine (GCE) that provides scalable, persistent shared storage for your GCE or Google Container Engine (GKE) Kubernetes clusters.
In this post, I will demo these scripts and show you how to do this. By default, our GlusterFS cluster will use three GlusterFS servers, one server per Google Cloud zone in the same chosen region.
Before continuing, please make sure you have a Google Cloud account and the Google Cloud SDK installed and authenticated.
The first thing you need to do is clone my GitHub repo:
$ git clone https://github.com/rimusz/glusterfs-gce
Then, open the
cluster/settings file in your text editor. Find the section marked
your cluster region and zones and set
ZONES to whatever matches your preferred setup. The rest of the settings in this file are probably fine, but can be adjusted if need be.
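For example, that section of the settings file might look something like this. ZONES is the variable named above; the REGION variable and the specific values here are illustrative assumptions, so use whatever matches your own setup:

```shell
# Illustrative values only — pick the region and zones you actually want.
# REGION is assumed to exist alongside ZONES; check the file for the exact keys.
REGION="europe-west1"
ZONES="europe-west1-b europe-west1-c europe-west1-d"  # one zone per GlusterFS server
```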
There’s nothing more you need to do. You should already be authenticated with Google from when you installed the Google Cloud SDK.
You can go right ahead and create the cluster by running:
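The cluster is created by a script in the repo's cluster folder. The script name below is an assumption, so check the repo's README if it differs:

```shell
# Assumed script name — look in the cluster folder for the actual
# cluster-creation script if this doesn't match.
$ ./cluster/create_cluster.sh
```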
This command will create three GlusterFS servers, one per zone. Each server gets its own brick storage (mounted at /data/brick1, as we'll see later).
A GlusterFS volume is a collection of bricks. A volume can store data across the bricks in three basic ways: distributed, striped, or replicated.
My script configures a replicated volume. This provides you with fault tolerance, should one of your GlusterFS nodes go down.
Here’s how to create a replicated volume on all three servers:
$ cd ..
$ ./cluster/create_volume.sh VOLUME_NAME
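Under the hood, creating a replica-3 volume boils down to the standard GlusterFS CLI, roughly like the sketch below. The hostnames follow the gfs-cluster1-server-N naming used later in this post and the brick path matches the /data/brick1 layout shown in the test section, but the script's exact invocation may differ:

```shell
# Run on one of the servers: create a 3-way replicated volume, then start it
$ gluster volume create VOLUME_NAME replica 3 \
    gfs-cluster1-server-1:/data/brick1/VOLUME_NAME \
    gfs-cluster1-server-2:/data/brick1/VOLUME_NAME \
    gfs-cluster1-server-3:/data/brick1/VOLUME_NAME
$ gluster volume start VOLUME_NAME
```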
At this point, your GlusterFS cluster should be fully set up and operational.
Let’s test it.
Spin up a new GCE virtual machine, or use one of your existing non-GlusterFS virtual machines, and grab a shell on that machine.
Then mount your GlusterFS volume (replacing
VOLUME_NAME with the actual volume name you chose) to
/mnt/gfs by running this command:
mount -t glusterfs gfs-cluster1-server-1:/VOLUME_NAME /mnt/gfs
gfs-cluster1-server-1 is the name my script gives the first server in your cluster.
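Note that this assumes the client VM already has the GlusterFS client installed and the mount point created. On a Debian/Ubuntu image you would typically prepare it first like this (the package name assumes an APT-based distro):

```shell
# Install the GlusterFS FUSE client and create the mount point
$ sudo apt-get update && sudo apt-get install -y glusterfs-client
$ sudo mkdir -p /mnt/gfs
```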
Then copy a bunch of files into the mount point:
$ for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/gfs/copy-test-$i; done
You don’t have to run this exact command. Any files will do.
Now, grab a shell on one of your GlusterFS virtual machines, and take a look inside the brick volume to see the files you just created.
Do that like so:
$ ls -lA /data/brick1/VOLUME_NAME
Again, replace VOLUME_NAME with the actual volume name you chose.
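Because the volume is replicated across all three servers, the same files should show up in the brick directory on every server, not just the one you mounted from. A quick sanity check to run on each server (again substituting your volume name):

```shell
# On each of the three GlusterFS servers; expect the same count everywhere
$ ls /data/brick1/VOLUME_NAME | wc -l
```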
In this post, we saw how to use a few scripts to quickly launch a GlusterFS cluster on GCE for scalable, persistent storage for your apps.
In the cluster folder of this repo, there are two more scripts:
upgrade_glusterfs.sh upgrades GlusterFS on all servers
upgrade_servers.sh upgrades your distro (via APT) on all servers
To learn more about using GlusterFS with Kubernetes, check out the GlusterFS examples in the official Kubernetes repository.
We plan to cover more on this topic in future posts!