Create a cluster in the cloud¶
Set up the cloud infrastructure¶
Getting ready¶
The first step is to get a bunch of servers powered on in your cloud. We do this using a tool called Terraform.
Make sure that Terraform is installed by running the following at the command line:
$ terraform version
You should get output like:
Terraform v0.11.8
We’re now ready to start configuring our infrastructure.
Setting the config¶
Start by making a new directory which will hold all our configuration. We will refer to this directory as the base config directory. Change to that directory in your terminal.
Grab the Terraform config from Git using:
$ git clone https://github.com/ACRC/oci-cluster-terraform.git
Now move into that directory and initialise the Terraform repo:
$ terraform init
Now, when you check the Terraform version, you should see the OCI provider showing up:
$ terraform version
Terraform v0.11.8
+ provider.null v1.0.0
+ provider.oci v3.2.0
+ provider.tls v1.2.0
Rename the example config file terraform.tfvars.example to terraform.tfvars and open it in a text editor:
$ mv terraform.tfvars.example terraform.tfvars
$ vim terraform.tfvars
Following the instructions at the Oracle Terraform plugin docs, set the values of tenancy_ocid, user_ocid, private_key_path, fingerprint and region.
Make sure that the user account you use for user_ocid has admin access in your tenancy to create infrastructure.
You will also need to set the compartment OCID of the compartment that you are using. If you are using the default root compartment, this will be the same as your tenancy OCID.
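Altogether, that part of the file might end up looking something like the following sketch. Every value below is a placeholder, and the exact name of the compartment variable is an assumption, so check the comments in the example file:
tenancy_ocid     = "ocid1.tenancy.oc1..aaaaexampleonly"
user_ocid        = "ocid1.user.oc1..aaaaexampleonly"
private_key_path = "/home/you/.oci/oci_api_key.pem"
fingerprint      = "12:34:56:78:9a:bc:de:f0:12:34:56:78:9a:bc:de:f0"
region           = "eu-frankfurt-1"
compartment_ocid = "ocid1.compartment.oc1..aaaaexampleonly"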
The next thing to set is an SSH key that you will use to connect to the server once it is built.
See GitHub’s documentation for information on how to do this and then paste the contents of the public key into the ssh_public_key config variable between the two EOF markers.
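For example, with a placeholder key (paste your own public key instead):
ssh_public_key = <<EOF
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABexamplekeyonly you@workstation
EOF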
Finally, you need to decide what type of machines will make up your cluster.
This depends on what shapes you have access to, so check your service limits in the OCI web console.
You will want a simple, lightweight VM for the management node and a set of more powerful VMs (or, better, bare-metal machines) for the compute nodes.
For this tutorial, we will use VM.Standard2.16 for the management node and four VM.Standard2.24 machines for the compute nodes, but it will depend on what you have access to.
Set the ManagementShape config variable to the shape you want for the management node:
ManagementShape = "VM.Standard2.16"
To configure the compute nodes, there are two config variables we need to set.
The variable ComputeShapes contains a list of the shape for each node, and InstanceADIndex contains a list of numbers referring to the availability domain each node should be in:
InstanceADIndex = ["1", "1", "1", "1"]
ComputeShapes = ["VM.Standard2.24", "VM.Standard2.24", "VM.Standard2.24", "VM.Standard2.24"]
You can see that there are two lists, each with four elements. The nth elements of the two lists refer to the same node.
Once the nodes are created, they will be named compute001, compute002 etc. in the order they are listed here.
If we instead wanted one BM.GPU2.2 in AD 1, three BM.Standard1.36 in AD 2 and one BM.DenseIO1.36 in AD 3, we would instead write:
InstanceADIndex = ["1", "2", "2", "2", "3"]
ComputeShapes = ["BM.GPU2.2", "BM.Standard1.36", "BM.Standard1.36", "BM.Standard1.36", "BM.DenseIO1.36"]
Finally, we need to tell Terraform about all of the availability domains (ADs) that we are using so that the networking is configured correctly.
Set ADS to a list of all the availability domains that we have put infrastructure in:
ADS = ["1"]
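For the multi-AD example above, we would list every AD used:
ADS = ["1", "2", "3"]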
That has defined the types and locations of all the nodes we are installing.
We now need to tell OCI what OS to install onto each machine, which we do by setting ComputeImageOCID and ManagementImageOCID.
To decide what values to put in these, look at OCI’s list of images.
We will install the latest version of Oracle Linux onto each:
ComputeImageOCID = {
  "VM.Standard2.24" = {
    "eu-frankfurt-1" = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7qdjjqlvryzxx4i2zs5si53edgmwr2ldn22whv5wv34fc3sdsova"
  }
}
ManagementImageOCID = {
  "eu-frankfurt-1" = "ocid1.image.oc1.eu-frankfurt-1.aaaaaaaa7qdjjqlvryzxx4i2zs5si53edgmwr2ldn22whv5wv34fc3sdsova"
}
At this point, we are ready to provision our infrastructure. Check that there are no immediate errors with:
$ terraform validate
It should return with no errors. If there are any problems, fix them before continuing.
Next, check that Terraform is ready to run with:
$ terraform plan
which should have, near the end, something like Plan: 13 to add, 0 to change, 0 to destroy.
We’re now ready to go. Run
$ terraform apply
and, when prompted, tell it that “yes”, you do want to apply.
It will take some time but should complete without any errors, finishing with something green that looks like:
Apply complete! Resources: 13 added, 0 changed, 0 destroyed.
Outputs:
ComputeHostnames = [
compute001,
compute002,
compute003,
compute004
]
ManagementPublicIPs = [
130.61.43.69
]
You are now ready to move on to installing the software on the cluster.
Finalising the setup¶
Terraform will have automatically started the cluster software configuration step. It will run in the background and will take some time to complete. In the meantime, you can connect to the cluster and follow its progress.
Final configuration step¶
You can log into the management node at yourusername@mgmtipaddress, using the IP address that Terraform printed at the end of its run.
You can forward your SSH key from your workstation to avoid copying the private key into the cloud.
First test that you can talk to the SSH agent by asking it to list the keys:
$ ssh-add -L
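If the agent reports no identities, add your key first (assuming it lives at the default location):
$ ssh-add ~/.ssh/id_rsa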
Then connect to the management node. For example:
$ ssh -A opc@130.61.43.69
Once logged in, you can run the finish script:
[opc@mgmt ~]$ ./finish
It will most likely tell you that the nodes have not finished configuring.
If the finish script is not there, wait a minute or two and it should appear.
To follow the progress, you can look at the file ansible-pull.log in opc’s home directory.
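For example, to follow the log as it is written:
[opc@mgmt ~]$ tail -f ansible-pull.log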
You can keep running finish until all nodes have finished configuring.
Once they have, you need to tell the system what user accounts you want to create.
Copy the users.yml.example file to users.yml:
[opc@mgmt ~]$ cp users.yml.example users.yml
and edit it to contain the users you want.
For the key attribute you can specify either a URL of a file which contains a list of public keys (such as provided by GitHub) or an explicit public key inline.
For example, it might look like:
---
users:
  - name: matt
    key: https://github.com/milliams.keys
  - name: anotheruser
    key: ssh-rsa UmFuZG9tIGtleSBjb250ZW50cy4gUHV0IHlvdXIgb3duIGtleSBpbiBoZXJlIG9idmlvdXNseS4= user@computer
Run finish again and it should create those users across the system:
[opc@mgmt ~]$ ./finish
Once it has succeeded, log out and try logging in as one of those users.
Check Slurm is running¶
$ ssh -A matt@130.61.43.69
Once logged in, try running the sinfo command to check that Slurm is running:
[matt@mgmt ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 4 idle compute[001-004]
Brilliant! Start submitting jobs.
Check out the information on running the cluster.
Running the cluster¶
Now that you have a cluster which is up and running, it’s worth knowing what you can do with it.
Slurm jobs¶
A full Slurm tutorial is outside of the scope of this document but it’s configured in a fairly standard way.
By default there’s a single partition called compute which contains all the compute nodes.
A simple first Slurm script, test.slm, could look like:
#! /bin/bash
#SBATCH --job-name=test
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
#SBATCH --time=10:00
#SBATCH --exclusive
echo start
srun -l hostname
echo end
which you could run with:
[matt@mgmt ~]$ sbatch test.slm
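You can then check on the job with squeue and, once it finishes, read its output, which sbatch writes to a slurm-<jobid>.out file in the directory you submitted from (the job ID below is illustrative):
[matt@mgmt ~]$ squeue
[matt@mgmt ~]$ cat slurm-1.out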
Slurm elastic scaling¶
Slurm is configured to use its elastic computing mode. This allows Slurm to automatically turn off any nodes which are not currently being used for running jobs and turn on any nodes which are needed for running jobs. This is particularly useful in the cloud as a node which has been shut down will not be charged for.
Slurm does this by calling a script /usr/local/bin/startnode as the slurm user.
If necessary, you can call this yourself from the opc user like:
[opc@mgmt ~]$ sudo -u slurm /usr/local/bin/startnode compute001
to turn on the node compute001.
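Since powered-off nodes still appear in Slurm, you can check their power state with sinfo: a ~ suffix on the state means the node is powered down and a # suffix means it is powering up. The output will look something like:
[opc@mgmt ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute*     up   infinite      4  idle~ compute[001-004]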
You should never have to do anything to explicitly shut down the cluster; it will automatically turn off all nodes which are not in use after a timeout. The management node will always stay running, which is why it’s worth only using a relatively cheap VM for it.
Warning
Currently, due to a quirk in OCI, it seems that while all VMs and most bare-metal nodes are not charged for while stopped, the DenseIO nodes are. This means that the auto-shutdown will not work as well for those shapes and you will be charged. Development is ongoing to avoid this.
The rate at which Slurm shuts down idle nodes is managed in /mnt/shared/apps/slurm/slurm.conf by the SuspendTime parameter.
See the slurm.conf documentation for more details.
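For example, to shut idle nodes down after five minutes, the relevant line would look something like this (the value in your file may well differ):
SuspendTime=300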
Cluster shell¶
A common task is running the same command across all the nodes in a cluster. By default you have access to clustershell; read its documentation to get details of how to use the tool.
The gist is that you give it a hostname or a group and a command to run.
You can see a list of the available groups with cluset:
[opc@mgmt ~]$ cluset --list-all
@compute
@state:idle
@role:mgmt
You can then run a command with clush:
[opc@mgmt ~]$ clush -w @compute uname -r
compute001: 3.10.0-862.2.3.el7.x86_64
compute002: 3.10.0-862.2.3.el7.x86_64
compute003: 3.10.0-862.2.3.el7.x86_64
compute004: 3.10.0-862.2.3.el7.x86_64
You can combine the output from different nodes using the -b flag:
[opc@mgmt ~]$ clush -w @compute -b uname -r
---------------
compute[001-004] (4)
---------------
3.10.0-862.2.3.el7.x86_64
Installing software on your cluster¶
In order to do any actual work you will likely need to install some software.
There are many ways to get this to work but I would recommend either using clush to install the software or, preferably, creating a local Ansible playbook which installs it for you across the cluster.
In the latter case, you can use /home/opc/hosts as an inventory file and point your playbook at it.
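As a sketch, either approach might look like the following; the package and playbook names are illustrative, and the clush route assumes passwordless sudo on the compute nodes:
[opc@mgmt ~]$ clush -w @compute sudo yum install -y htop
[opc@mgmt ~]$ ansible-playbook -i /home/opc/hosts my-software.yml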
Performance metrics¶
The cluster automatically collects data from all the nodes and makes it available in a web dashboard.
It is available at the IP address of your management node on port 3000. Point your browser at http://your.mgmt.ip.address:3000 and log in with the username admin and the password admin. You will be prompted to create a new password before you continue.
Destroying the whole cluster¶
Warning
Please bear in mind that this will also destroy your file system, which contains your users’ home areas and any data stored on the cluster.
When you’ve completely finished with the cluster, you can destroy it using Terraform.
$ terraform destroy
Welcome to the documentation for cluster in the cloud. By the end of this you will have a fully-operational, elastically-scaling Slurm cluster running on cloud resources.
In the future, the intention is that this tutorial will cover installing on all major cloud providers, but for now only Oracle Public Cloud is covered.
This tutorial was created by Matt Williams at the ACRC in Bristol. Contributions to this document are welcome at GitHub.
Prerequisites¶
To complete this tutorial you will need:
- access to a command-line (e.g. Linux, macOS Terminal or WSL)
- an SSH key pair
- an account with credit on Oracle Cloud
  - the account must have admin permissions to create infrastructure
- local software installed:
  - Terraform 0.11
  - SSH
Start by creating the infrastructure.