Proof of Concept for Setting up GlusterFS with Fedora and VirtualBox
Published by nick
A client of mine has a problem: too much data; not enough storage. Because of historically bad practices, they are stuck with a large file server, holding around 15 TB of data, and backups that are also full, and not even stored off site. Instead of replacing disks in the file server RAID (currently holds 7 x 4 TB WD Reds), another approach is needed. Something that scales and that can handle the growth of their projected data storage needs.
In comes GlusterFS, "[...] a scalable network file system. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks. GlusterFS is free and open source software" (Gluster web site).
I chose to go with Fedora (version 25 as of this writing), since this is the OS mentioned in their documentation. I'm more of an Ubuntu person, but Fedora is cool. Plus, any chance to become more familiar with another Linux distro is OK by me. I also went with the VM model, using VirtualBox on Windows 7. This was OK, because after creating the VMs, I did all the rest of the work in the terminal on the Fedora boxes and my Ubuntu laptop.
I created two VMs, essentially clones of one another, on my subnet...
192.168.x.101. I configured these in bridged networking mode in VirtualBox, because I wanted them reachable on my network for easy testing. When I was done, I had two servers:
server2. Yeah, I know, very original names! That's OK, because these VMs probably will not stick around all that long...
On each VM, aside from the root directory and swap, I created an XFS file system that was mounted at something like
Next I had to install and run the GlusterFS server (remember, I'm on Fedora 25 -- yum yum):
yum install glusterfs-server
service glusterd start
service glusterd status
This should spit out some status output and show that the daemon is running... OK, great... moving on.
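On Fedora 25, yum and service are compatibility wrappers around dnf and systemctl, so the same steps can also be written natively. Here is a minimal sketch, assuming the package and unit names used above (glusterfs-server, glusterd). The function name setup_glusterd is my own, and enable --now also makes the daemon start on every boot, which the commands above do not do:

```shell
# Install the GlusterFS server and start its daemon (run as root on each VM).
# setup_glusterd is a hypothetical helper name, not part of GlusterFS.
setup_glusterd() {
    # Install the server package without prompting.
    dnf install -y glusterfs-server || return 1
    # Start the daemon now and enable it for every boot.
    systemctl enable --now glusterd || return 1
    # Prints "active" when the daemon is running.
    systemctl is-active glusterd
}
```

Calling setup_glusterd as root on server1 and then on server2 should leave both daemons running.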
The next part involves getting these servers to talk to one another. My network is equipped with a pfSense firewall, so I have a DNS forwarder and DHCP server helping me resolve host names. For example, I can
ping server1.mydomain.com from anywhere on my network. This is good, because Gluster relies on server names for much of its communication.
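Since so much depends on name resolution, it is worth checking it on every machine before going further. This is a small sketch, assuming a glibc-based system where getent is available (true for Fedora and Ubuntu). The function name check_names is my own:

```shell
# Report whether each given host name resolves on this machine.
# Returns non-zero if any name fails to resolve.
check_names() {
    failed=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, check_names server1 server2 run on each server and on the client should print an ok line for both names.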
So to get the "peers" talking, run this command (from each server):
gluster peer probe server2 # from server 1
gluster peer probe server1 # from server 2
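To confirm that the probe worked, gluster peer status lists the other members of the pool. The sketch below wraps that in a check. It assumes the usual output layout (a "Hostname:" line followed by a "State:" line for each peer), and peer_connected is a name I made up:

```shell
# Succeed if the named peer is listed as connected in `gluster peer status`.
peer_connected() {
    gluster peer status | awk -v want="$1" '
        $1 == "Hostname:" { current = $2 }
        $1 == "State:" && current == want && /Peer in Cluster \(Connected\)/ { found = 1 }
        END { exit !found }'
}
```

On server1, something like peer_connected server2 && echo "server2 is in the pool" gives a quick yes or no.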
You can probably use the FQDN, but either should work if your DNS is working. I think you can also use IPs here, if you want to give that a try. The next thing we need to do is set up the Gluster volume. This is pretty straightforward. On both servers, create a brick directory:
mkdir -p /data/brick1/gv0 # on server 1
mkdir -p /data/brick2/gv0 # on server 2
In retrospect, it's probably better to use the same brick path on every server, to avoid confusion. So the command would just be
mkdir -p /data/brick1/gv0. Then, you must create and start the volume (can be done on either server):
gluster volume create gv0 replica 2 server1:/data/brick1/gv0 server2:/data/brick2/gv0
gluster volume start gv0
gluster volume info
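If the bricks share one path on every server, the create command becomes easy to generate. This is only a sketch of that idea, with a hypothetical helper name (volume_create_cmd). It prints the command instead of running it, and it sets the replica count to the number of servers, which matches the two-server setup here:

```shell
# Print a `gluster volume create` command that uses the same brick path
# on every server, with one replica per server.
# Arguments: volume name, brick path, then one or more server names.
volume_create_cmd() {
    volume=$1
    brick=$2
    shift 2
    cmd="gluster volume create $volume replica $#"
    for host in "$@"; do
        cmd="$cmd $host:$brick"
    done
    echo "$cmd"
}
```

For instance, volume_create_cmd gv0 /data/brick1/gv0 server1 server2 prints a command you can review and then paste into a root shell on either server.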
At this point, you should have a working Gluster volume that can be mounted somewhere. On Ubuntu 14.04, using NFS, this can be done like so:
mount -t nfs server1:/gv0 /mnt/gv0
GlusterFS has a client as well, which can be used in a similar fashion:
mount -t glusterfs server1:/gv0 /mnt/gv0
Now that these bricks are configured into a volume, GlusterFS keeps track of them and can do handy things. For example, you might be wondering how the whole volume can be mounted when the mount command names only one server.
From the GlusterFS Administrator's Guide:
The server specified in the mount command is only used to fetch the gluster configuration volfile describing the volume name. Subsequently, the client will communicate directly with the servers mentioned in the volfile (which might not even include the one used for mount).
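One consequence is that the named server only has to be reachable at mount time, so it makes sense to give the client a fallback for fetching the volfile. The sketch below prints an /etc/fstab line for the native client. The option name backup-volfile-servers is what recent versions of the GlusterFS mount helper accept; older clients may spell it differently, so check the mount.glusterfs documentation for your client version. The helper name fstab_line is my own:

```shell
# Print an /etc/fstab line for a GlusterFS volume.
# Arguments: volume, mount point, primary server, optional backup servers.
fstab_line() {
    volume=$1
    mountpoint=$2
    primary=$3
    shift 3
    # _netdev delays the mount until the network is up.
    opts="defaults,_netdev"
    if [ "$#" -gt 0 ]; then
        # Join the backup servers with ":" and drop the trailing colon.
        backups=$(printf '%s:' "$@")
        opts="$opts,backup-volfile-servers=${backups%:}"
    fi
    echo "$primary:/$volume $mountpoint glusterfs $opts 0 0"
}
```

Running fstab_line gv0 /mnt/gv0 server1 server2 prints a line that mounts gv0 from server1 and falls back to server2 for the volfile if server1 is down at mount time.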
Windows and Mac OS clients can also mount volumes. Samba is supported, but you have to install it on each node (server) and configure shares in
/etc/samba/smb.conf. Plus, you need to add users with
smbpasswd -a nick and enter a password.
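I have not tested this part yet, so treat the following as a rough sketch only. It assumes the volume is already mounted on the node with the GlusterFS client (at /mnt/gv0 in the example) and simply shares that directory. The function name samba_share is hypothetical; path, read only, and valid users are standard smb.conf parameters:

```shell
# Print a minimal smb.conf share section for a locally mounted volume.
# Arguments: share name, local path, user allowed to connect.
samba_share() {
    name=$1
    path=$2
    user=$3
    cat <<EOF
[$name]
    path = $path
    read only = no
    valid users = $user
EOF
}
```

Something like samba_share gv0 /mnt/gv0 nick >> /etc/samba/smb.conf would append the section, after which testparm -s can be used to check the file for syntax errors.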
I didn't do any Windows client configuration, because I ran out of time. But I plan to make this a fully accessible system from all the major OSes. The whole point of this endeavor is to make a flexible, distributed and redundant file system for Windows, Mac OS, and Linux clients.
Speaking of redundancy, GlusterFS has many cool features that I recommend you check out in the architecture section of its documentation. For example, in this article I used a replicated volume (that is the replica 2 in the create command; without it, GlusterFS creates a distributed volume by default), which treats servers (or nodes) much like mirrored disks in an array. It keeps a copy of every file created on the volume on each brick, thereby adding a layer of redundancy. This can be looked at as a kind of RAID system for servers:
Redundant Array of Inexpensive [Independent] Servers (RAIS) is the use of multiple servers to provide the same service in such a way that service will still be available if the servers fails. The term may imply some kind of load balancing between the servers.
I'd like to claim I coined this term, but it's already on Wikipedia.
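The price of that redundancy is capacity: every file is stored once per replica. As a back-of-the-envelope sketch, assuming equal-sized bricks and a brick count that is a multiple of the replica count (the helper name usable_tb and the sizes in the example are made up for illustration):

```shell
# Usable capacity of a replicated volume built from equal-sized bricks.
# Arguments: number of bricks, size of one brick in TB, replica count.
usable_tb() {
    bricks=$1
    size=$2
    replica=$3
    if [ $((bricks % replica)) -ne 0 ]; then
        echo "brick count must be a multiple of the replica count" >&2
        return 1
    fi
    # Each group of `replica` bricks holds one copy set of `size` TB.
    echo $((bricks / replica * size))
}
```

So two 4 TB bricks with replica 2 (usable_tb 2 4 2) give 4 TB of usable space, and four such bricks give 8 TB.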
I plan on updating this article as I make more progress with this project. Next, I need to puppetize the Samba configuration and get this setup working with Windows clients. Until then, ciao!