Pegasus in a nutshell (includes what is HPC)
Pegasus and Cerberus are supercomputers, commonly referred to as clusters. A compute cluster is a set of connected computers, or nodes, that work together and can, in many respects, be viewed as a single system. Pegasus can solve in days computationally intensive problems that would take a standard home computer weeks or even months, by pooling resources and leveraging that concept of a single system (a little extra specialized hardware doesn't hurt either).
These HPC systems are only accessible remotely, meaning you cannot come and sit in front of them to log in or do your work. From your laptop or another remote system, you will connect to a computer that we call a login node, which is meant exclusively for maintaining your connection to the cluster; computational jobs are not meant to be run on login nodes. If jobs are run on login nodes and you are caught, it may result in suspension of your account. From the login node you will be able to submit jobs, transfer files, compile software, and prepare simulations. All compute nodes on a cluster mount a shared filesystem: a file server, or set of servers, that keeps track of all files on a large array of disks, so that you can access and edit your data from any compute node.
Request an account
Please make sure that you have an account on Pegasus first. If you do not have an account, please fill out the form on the Getting Access page. If you are not yet part of a research group, or would like a new group added, please email us at hpchelp@gwu.edu or specify this in the additional information box on the form.
Rules & Things to Keep in Mind
- Login nodes are not to be used for jobs. Run all compute jobs on the compute/gpu nodes. Failure to adhere to the Pegasus Support terms of service will result in a suspension of your account.
- Backups are your responsibility. We cannot recover accidentally deleted files from any file system in Pegasus. We recommend storing your source codes on Github.
- Space on the Lustre filesystem is subject to removal under the conditions set in the Lustre Purge Policy. Storage on Pegasus is not meant for long-term use.
Logging in
Note: Pegasus is accessible via a secure shell (ssh) client at a command-line prompt.
After you are approved, log in on the command line by typing:
ssh <username>@pegasus.arc.gwu.edu
where <username> is your GW UserID. Please visit https://it.gwu.edu/one-identity-manager (section 4) to find your GW UserID.
GW UserID passwords are no longer used to authenticate to the cluster. You must instead use SSH keys and public-key authentication.
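Key setup is done once on your local machine. A minimal sketch follows; the ed25519 key type is a common default, and the exact step for registering your public key with Pegasus may differ, so follow the instructions you receive when your account is approved:

```shell
# Generate an ed25519 key pair on your local machine; accept the default
# location and choose a passphrase when prompted.
ssh-keygen -t ed25519

# Print the public key so it can be registered with Pegasus.
cat ~/.ssh/id_ed25519.pub
```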
On your first connection to the Pegasus HPC cluster, you’ll be prompted to accept the ssh keys and verify the fingerprint of Pegasus.
If you are on a Mac or Linux machine, you can use your terminal; if you are on a PC, you can use PuTTY or Xming as shown here: Connecting to Pegasus
In order to submit jobs to the cluster, you must also set up two-factor authentication (2FA).
How to use installed software (Modules)
Modules are a tool to help users manage their Unix or Linux shell environment, by allowing installed applications to be added or removed dynamically. A module file contains all the information needed to configure the shell for an application. Once a module package is initialized, the environment can be modified using module-specific commands. You can use modules to gain access to software or to use different versions of packages.
module avail - shows what modules are available to use on Pegasus
module list - shows what modules and versions are currently loaded.
module load module_name/version - loads a module. If you do not list a version number, the most recent version will load by default.
module unload module_name - unloads the module, and reverts settings back to OS defaults.
source activate env_name - activates an environment, or a space within a project with specific dependencies and settings that allow you to use a module.
source deactivate - to leave the environment.
module spider module_name - search for a module/version to find if it is available.
module show module_name - shows information about a module and what is included with it.
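Putting these commands together, a typical module session might look like the following (the module name and version are illustrative; run module avail to see what is actually installed on Pegasus):

```shell
module avail                 # see what is installed on Pegasus
module load python/3.9.4     # load a specific version (name/version illustrative)
module list                  # confirm what is currently loaded
module unload python/3.9.4   # revert to the OS defaults when done
```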
Scheduling a job with SLURM
SLURM (formerly known as Simple Linux Utility for Resource Management) is the scheduler on Pegasus and all jobs must be submitted through the SLURM scheduler to allocate access to compute resources on the system. SLURM provides three key functions: allocation, regulation and arbitration of resources.
Note: Make sure that your jobs, whether batch or interactive, are reading and writing to the Lustre filesystem!
sbatch shell_file.sh
When executed after navigating to lustre, sbatch submits a job to the cluster (see example below). The script will typically contain one or more srun commands to launch parallel tasks.
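A minimal batch script, as a sketch (the job name, partition, module, and script names are all placeholders; adjust them to your own work):

```shell
#!/bin/bash
#SBATCH -J myjob           # job name (placeholder)
#SBATCH -p short           # partition/queue to submit to
#SBATCH -N 1               # number of nodes
#SBATCH -t 01:00:00        # wall-time limit (HH:MM:SS)
#SBATCH -o myjob.%j.out    # stdout file; %j expands to the job ID

# Load whatever software the job needs (module name is illustrative)
module load python

# Launch the task; srun runs it within the allocated resources
srun python my_script.py
```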
squeue
Shows your jobs that are either running or in the queue. It returns the following information: Job ID, Partition, Name, User, Time, and Nodes.
sinfo
Shows available and unavailable nodes on the cluster according to partition (e.g., 64gb, 128gb, etc.). It has a wide variety of filtering, sorting, and formatting options.
The partitions (queues) that you can use are:
- defq: This is the default queue. It has 192GB nodes and a 14 day time limit
- short: 192GB nodes, 1 day time limit
- tiny: 192GB nodes, 8 hour time limit
- highMem: 3TB nodes and a 14 day time limit
- highThru: 384GB nodes and a 7 day time limit
- debug: 192GB, GPU and CPU nodes each with a 4 hour time limit, for interactive jobs or quick tests
- debug-cpu: 192GB CPU nodes with a 4 hour time limit, for interactive jobs or quick tests
- debug-gpu: 192GB GPU nodes with a 4 hour time limit, for interactive jobs or quick tests
- large-gpu: 384GB GPU nodes with a 7 day time limit
- small-gpu: 192GB GPU nodes with a 7 day time limit
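You select a partition with the -p flag when submitting. For example, a GPU request might look like the sketch below (the --gres syntax assumes Pegasus uses SLURM's generic-resource plugin for GPUs, and gpu_job.sh is a placeholder script name):

```shell
# Submit to the small-gpu partition, asking for one GPU on one node for 12 hours
sbatch -p small-gpu --gres=gpu:1 -N 1 -t 12:00:00 gpu_job.sh
```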
salloc -N 1 -p short -t 300
Typically this is used to allocate resources for a job and spawn a shell. The shell is then used to execute srun commands to launch parallel tasks. Use this when you are running interactive jobs on Pegasus.
Before you can use the node that has been allocated to you, you must first ssh into it. First use squeue to find out which node has been allocated to you. Then, if node121 has been allocated, run the command ssh node121 to get into the node. When you are finished running things on a node, type the command exit to return to the login node.
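The interactive workflow above, as a sketch (node121 is just an example of an allocated node; use whatever node squeue reports for your job):

```shell
salloc -N 1 -p short -t 300   # request 1 node from "short" for 300 minutes
squeue -u $USER               # look up which node was allocated, e.g. node121
ssh node121                   # connect to the allocated node
# ... run your interactive work here ...
exit                          # return to the login node when finished
```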
srun
This is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared resources within the job’s node allocation.
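As a minimal sketch of a parallel job step, the line below runs hostname as 4 tasks spread over 2 nodes inside an existing allocation:

```shell
# Run hostname as 4 tasks across 2 nodes (inside a job allocation)
srun -N 2 -n 4 hostname
```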
scancel job_id
This cancels a job that is in the queue or running on the cluster. You can get the job id by executing squeue when logged in on the cluster.
Please see this page for more information and examples.
New Linux Users
A basic familiarity with Linux commands is required for interacting with the clusters. We periodically run an Intro to Linux workshop, hosted by both the SEAS Computing Facility (for SEAS students only) and GW Libraries (university-wide), to get you started. There are also many excellent free beginner tutorials available online.
Command Line Quick Guide
First navigate to the command line. If you are using a Mac, open Terminal (in Applications in the Utilities folder).
If you are using a Windows machine, use PuTTY, MobaXterm, or another terminal emulator (a video tutorial is available on our website).
You should see a black window that has a command prompt that looks like this:
Usernames-Macbook-Pro:~ Username$
From there you can connect to the host via secure shell, aka ssh:
Usernames-Macbook-Pro:~ Username$ ssh username@hostname
man pages
If you ever get stuck using a command or want to know more information about said command you can pull up the manual page by typing 'man' and then the command that you are attempting to use. The manual pages are a set of pages that explain every command available on your system including what they do, the specifics of how you run them and what command line arguments they accept. To search for something in the man pages, use '/' and then type the search term. To proceed to the next instance of the term, use 'n'. To return to the command line, type 'q'.
BASH Shortcuts
CTRL+c - Stop current command
CTRL+a - Go to start of line
'!!' - Repeat last command
'!abc' - Run last command starting with abc
BASH Variables
'env' - Show environment variables
$PATH - Executable search path. View it with: echo $PATH
$HOME - Home directory. View it with: echo $HOME
$SHELL - Current shell. View it with: echo $SHELL
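For example, to inspect these variables from the prompt:

```shell
env | head -5                        # first few environment variables
echo $HOME                           # your home directory, e.g. /home/username
echo $PATH | tr ':' '\n' | head -3   # first few PATH entries, one per line
```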
BASH Commands
Below are some common and useful commands you can use to perform a variety of different functions.
'pwd' - To know which directory you are in, you can use the "pwd" command. It gives the absolute path, which means the path that starts from the root. The root is the base of the Linux file system and is denoted by a forward slash ( / ). The user directory is usually something like "/home/username".
'ls' - Lists the files and directories in the current directory.
'cd' - Executed by itself will bring you to your home directory, while adding a path after cd will bring you to that directory. A directory is essentially a folder that holds your files or other directories.
cp file_name path/to_new/directory
Copies a file from your current directory and places it in a different directory.
mv file_name path/to_new/directory
Moves the file to a new directory.
mv file_name new_file_name
Renames the file.
'rm' - This will remove or delete a file (be careful: this cannot be undone).
'mkdir' - Makes a directory.
'chmod' - Change permissions
sed -i -e 's/something/else/g' myfile
This will replace every instance of “something” with “else” in myfile.
'exit' - Disconnect from the host that you're ssh'd into.
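Putting several of these commands together, a short practice session might look like this (run it in a scratch directory; the file and directory names are just examples):

```shell
mkdir demo                              # make a directory
cd demo                                 # enter it
echo "something here" > myfile          # create a small file
sed -i -e 's/something/else/g' myfile   # replace "something" with "else"
cat myfile                              # prints: else here
cp myfile myfile.bak                    # copy the file
mv myfile.bak backup.txt                # rename the copy
ls                                      # lists: backup.txt  myfile
cd ..                                   # go back up
rm -r demo                              # remove the directory and its contents
```

Note that rm needs the -r flag to remove a directory along with everything inside it.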
Additional commands can be found below. Material was also used from these sources:
- https://gwcbi.github.io/HPC/commandline.html
- https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners
Additional Help
For additional questions or help please email hpchelp@gwu.edu