Overview

This project contains a top-level control script and a set of Ansible playbooks that provision and configure a cluster of GlusterFS storage nodes. Configuration is provided for deploying to either AWS or Google Compute Engine.

Environment Setup

Each cloud environment requires some account creation and project configuration before it will work with your credentials.

Ansible

Being a bit of a Python novice, I ran into a number of PATH-related issues with Ansible installed from either RPM on my Linux machines or Homebrew on my Mac. I got the best results by installing pip with the appropriate method for each system and then running pip install --upgrade ansible to get Ansible and keep it up to date.

Ansible version 2.1 required

Ansible 2.1, which is currently in development, has an EC2 feature that sets the delete_on_termination flag when creating storage disks. This prevents unattached volumes from being left behind, which you would otherwise have to delete in a separate step after terminating cluster instances. For the moment this requires running Ansible from source, but a release of v2.1 or later should be generally available in the near future.
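Because the playbooks depend on a 2.1 feature, it can help to check the installed version before running anything. The following is a minimal sketch and not part of this project: version_ge is a hypothetical helper name, it assumes a sort that supports -V (version sort), and the parsing assumes the version number appears on the first line of ansible --version output.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster.sh): succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.1"

if command -v ansible >/dev/null 2>&1; then
    # Assumption: the first line of `ansible --version` contains the version number.
    installed="$(ansible --version | head -n 1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "$installed" "$required"; then
        echo "Ansible $installed is new enough"
    else
        echo "Ansible $installed is older than $required"
    fi
else
    echo "ansible not found on PATH"
fi
```

If the check reports an older version, the delete_on_termination behavior described above will not be available.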

Amazon Web Services

  • Create an API key or run from a server with the appropriate instance role

# Configure key or run from instance with role
export AWS_ACCESS_KEY_ID=FOO
export AWS_SECRET_ACCESS_KEY=BAR
export AWS_REGION=us-east-1

  • Download a copy of the ec2.ini configuration file and save it to hosts/aws/ec2.ini

  • Create a new configuration file in vars/ which you’ll reference when you run the gluster.sh script

    • You can support multiple environments by making more than one config file

  • Most EC2 instance and EBS disk settings have defaults that can be overridden in this file. Several settings are environment-specific, and you’ll have to set them for your account

# Example contents of AWS vars/mydomain.yml file values
ec2_subnet_ids: [ 'subnet-c70dd123' ]
ec2_security_groups: [ 'ssh-whitelist', 'all-internal' ]
domain: "mydomain.com"
ec2_keypair: "mydomain.com"
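When using an API key rather than an instance role, a quick pre-flight check can catch a missing variable before any playbook runs. This is a sketch under that assumption; missing_env is a hypothetical helper and not part of gluster.sh.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of gluster.sh): prints the name of
# each unset or empty variable and returns non-zero if any are missing.
# Skip this when relying on an instance role instead of an API key.
missing_env() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}

missing_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION \
    || echo "export the variables listed above before running gluster.sh"
```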

Google Compute Engine

  • Create a project in the Google Developers Console

  • Create a service account and key through the Google Developers Console API section

  • Add an ssh key to the project

  • Rename hosts/gce/gce.ini.example to hosts/gce/gce.ini and fill in the values with your account information

  • Create an environment-specific config file like the one described above for AWS and populate it with your account information

# Example contents of GCE vars/mydomain.yml file values
gce_service_account_email: 13_digit_acct_number-compute@developer.gserviceaccount.com
gce_pem_file: ~/.ssh/my-account-key.pem
gce_project_id: my-project-id
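The gce.ini step can be scripted so that an existing file is never overwritten. A minimal sketch: init_gce_ini is a hypothetical helper, and it copies the example file instead of renaming it so the template stays in the checkout.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster.sh): create hosts/gce/gce.ini from
# the example file without overwriting an existing copy.
# $1 is the repository root (defaults to the current directory).
init_gce_ini() {
    root="${1:-.}"
    example="$root/hosts/gce/gce.ini.example"
    target="$root/hosts/gce/gce.ini"
    if [ -e "$target" ]; then
        echo "$target already exists; leaving it untouched"
    elif [ -f "$example" ]; then
        cp "$example" "$target" && echo "created $target; fill in your account values"
    else
        echo "$example not found" >&2
        return 1
    fi
}

# Usage, from the root of the checkout:
#   init_gce_ini .
```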

gluster.sh script

This script collects the arguments needed to perform any action against a provider and coordinates the execution of the Ansible playbooks. It supports both multi-step actions that configure everything in a single command and running or re-running individual playbooks.

The --vars argument is used to specify a local configuration override file you create in the vars/ directory with the settings you wish to override for your cluster configuration.

Build and Configuration Examples

# Run the inventory listing for the specified provider
./gluster.sh --provider gce --action list-inventory

# Run a multi-step action
./gluster.sh --prefix dev --provider gce --vars mydomain.yml --action build-all

# Run a specific playbook (omit the provider prefix in the name)
./gluster.sh --prefix dev --provider gce --vars mydomain.yml --action provision

# You can add --verbose to get extra output from Ansible to help debug issues
./gluster.sh --prefix dev --provider gce --vars mydomain.yml --action configure --verbose

# Ping all the nodes to check whether they are reachable
./gluster.sh --prefix dev --provider gce --vars mydomain.yml --action ping

# Use the default (aws) provider and list the storage nodes and their public IPs
./gluster.sh --prefix dev --vars mydomain.yml --action info
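With more than one config file in vars/, the same action can be repeated per environment. The sketch below uses only the flags from the examples above; run_for_all_vars and DRY_RUN are hypothetical names, and it assumes every file in vars/ targets the same provider and that --vars takes the file name, as in the examples. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of gluster.sh): run one action per vars file.
# Prints each command by default; set DRY_RUN=0 to execute them.
# Usage: run_for_all_vars PREFIX PROVIDER ACTION [VARS_DIR]
run_for_all_vars() {
    prefix="$1"; provider="$2"; action="$3"; vars_dir="${4:-vars}"
    for path in "$vars_dir"/*.yml; do
        [ -e "$path" ] || continue      # no vars files: nothing to do
        file="$(basename "$path")"      # assumption: --vars takes the file name
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo ./gluster.sh --prefix "$prefix" --provider "$provider" --vars "$file" --action "$action"
        else
            ./gluster.sh --prefix "$prefix" --provider "$provider" --vars "$file" --action "$action"
        fi
    done
}

run_for_all_vars dev gce ping
```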

Gluster Volume Configuration

The GlusterFS documentation on the various volume configuration options is very detailed; some common options are shown below. Once you’ve connected to a storage node, the inventory file at /root/bricks.txt provides the machine and volume information needed to run these commands.

# Add peers
gluster peer probe <host name>

# Peer status
gluster peer status

# A three brick dispersed volume
gluster volume create gv0 \
    disperse 3 redundancy 1 \
    t7-storage-95f4c9:/bricks/xvdf/gv0 \
    t7-storage-a89e9f:/bricks/xvdf/gv0 \
    t7-storage-2e99cb:/bricks/xvdf/gv0

# Start the volume and show its info
gluster volume start gv0
gluster volume info gv0

Volume Name: gv0
Type: Disperse
Volume ID: 71970221-fc75-48d9-8205-bb3f389500d2
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: t7-storage-95f4c9:/bricks/xvdf/gv0
Brick2: t7-storage-a89e9f:/bricks/xvdf/gv0
Brick3: t7-storage-2e99cb:/bricks/xvdf/gv0
Options Reconfigured:
performance.readdir-ahead: on
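The "1 x (2 + 1) = 3" line reflects how a dispersed volume splits its bricks: with 3 bricks and redundancy 1, two bricks' worth of space holds data and one holds redundancy. The sketch below assembles the same create command from a host list and prints that split. disperse_create_cmd is a hypothetical helper; it only prints the command and never runs gluster. The brick-count check follows the GlusterFS rule that the number of bricks must exceed twice the redundancy.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a dispersed-volume create command.
# Usage: disperse_create_cmd VOLUME REDUNDANCY BRICK_DIR HOST...
disperse_create_cmd() {
    volume="$1"; redundancy="$2"; brick_dir="$3"
    shift 3
    count="$#"
    # GlusterFS requires the brick count to be greater than 2 * redundancy.
    if [ "$count" -le $((2 * redundancy)) ]; then
        echo "need more than $((2 * redundancy)) bricks for redundancy $redundancy" >&2
        return 1
    fi
    printf 'gluster volume create %s disperse %s redundancy %s' "$volume" "$count" "$redundancy"
    for host in "$@"; do
        printf ' %s:%s/%s' "$host" "$brick_dir" "$volume"
    done
    printf '\n'
    # Usable space is (count - redundancy) bricks' worth of capacity.
    echo "# data bricks: $((count - redundancy)), redundancy bricks: $redundancy"
}

disperse_create_cmd gv0 1 /bricks/xvdf \
    t7-storage-95f4c9 t7-storage-a89e9f t7-storage-2e99cb
```

Review the printed command before pasting it into a shell on a storage node.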

Version

This documentation was generated for gluster-ansible version 0.0.1-SNAPSHOT from commit b7a6008370ba12ee907302b6271847b3091b8559.