(Previously known as "Odyssey Related")

Real-time cluster status:

iFrame
src=https://fasrc.instatus.com/embed-status/light-sm
width=230
style=border: none;
height=45


Table of Contents

HTML
<script src="https://gist-it.appspot.com/github/PackardChan/harvard-cluster-monitor/blob/master/output/motd?slice=18:-1"></script>

Introduction

This page describes the Harvard FAS Research Computing cluster.

Quick start guide: https://docs.rc.fas.harvard.edu/kb/quickstart-guide/

New users are required to complete the "Introduction to the Cluster" course within 45 days of account issuance. The online training and quiz can be accessed here:
https://docs.rc.fas.harvard.edu/kb/introduction-to-cluster-online/


Please check out the companion GitHub page on customizing your cluster account; it hosts scripts and is intended to apply to any computer cluster, not just this one.

The following sections are notes that supplement the links above.

Storage

https://docs.rc.fas.harvard.edu/kb/cluster-storage/

Cost: https://www.rc.fas.harvard.edu/services/data-storage/#Offerings_Tiers_of_Service

As an example, below are the disks available to Kuang's group (updated every Monday). Kuang's group members can run ~pchan/git/harvard-cluster-monitor/script/df.sh to see the latest usage (it should take only a few seconds).

HTML
<script src="https://gist-it.appspot.com/github/PackardChan/harvard-cluster-monitor/blob/master/output/df-txt"></script>
  • The last column is manually and infrequently updated to show some drawbacks and limitations.
  • Disk usage is limited by hardware capacity and (for Lustre disks) software quotas, whichever is more restrictive.
  • In the aforementioned df.sh, df shows the hardware limits and lfs quota shows the software quotas.
  • On Lustre disks, lfs quota -hu $USER some_disk is a quick way to see your usage.
  • On non-Lustre disks, run du -h some_folder to see the usage of that folder (this can take hours for a large number of files).
  • NFS disks are not appropriate for I/O-intensive workloads or large numbers of jobs.
  • Some NFS disks can be mounted on desktops and laptops. Documentation
  • Only backups of the home directory are available to users for individual file or folder recovery.
  • Permission: "group readable" above means files within those disks are at most group readable. Some directories default to more restrictive settings. Learn more about Unix permissions.
  • kuangdss01: Zhiming is talking to RC about keeping kuangdss01 around for a while longer; RC wants to decommission it. It will eventually be replaced by tape storage, which will be charged at a lower rate. (6/2/2021)

  home directory: 100G per person
  /n/home??/.snapshot/rc_homes_*/${USER}/ stores regular snapshots of your home directory.
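
  A minimal sketch of restoring a file from a snapshot (home05, the snapshot directory name and lost_file are placeholders; list the .snapshot directory to see what actually exists):
  ls /n/home05/.snapshot/                                        # list available snapshots on your home volume
  cp /n/home05/.snapshot/rc_homes_SOME_DATE/${USER}/lost_file ~/ # copy the file back into your home directory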

  /scratch: local to each compute node; not shared with other nodes

Auto health check

The following nodes might have difficulties accessing the following disks (updated daily around noon).

HTML
<script src="https://gist-it.appspot.com/github/PackardChan/harvard-cluster-monitor/blob/master/output/disk-bad-node"></script>

Unix permissions

https://docs.rc.fas.harvard.edu/kb/unix-permissions/
https://www2.cisl.ucar.edu/user-support/setting-file-and-directory-permissions

  • Examining / understanding permission

[pchan@boslogin04 ~]$ ls -ld /n/home05/pchan
drwxr-xr-x 78 pchan kuang_lab 3860 Nov  1 22:42 /n/home05/pchan

In the string "drwxr-xr-x", the 1st character (d) says it is a directory.
The 2nd-4th characters (rwx) describe the permissions for the user (pchan): read (r), write (w) and execute/search (x) are all granted here.
The 5th-7th characters (r-x) describe the permissions for the group (kuang_lab): read (r) and execute/search (x) are granted, but not write (w).
The 8th-10th characters (r-x) describe the permissions for others.

For directories, you usually need both r and x permissions to read their contents.


id pchan  # list the groups that pchan belongs to

getent group kuang_lab  # list the members of kuang_lab

Access is also limited by parent directories: you need search (x) permission on every directory along the path.


  • Changing permission

Be careful with group ownership, e.g. in the transfer space; a hedged sketch follows.
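
A minimal sketch, assuming a directory my_dir and the group kuang_lab (both placeholders):

chmod -R g+rX my_dir        # give the group read access, plus search (x) on directories only
chgrp -R kuang_lab my_dir   # set the group, e.g. before sharing via the transfer space
chmod -R o-rwx my_dir       # remove all access for others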

  • Advanced topics

[pchan@datamover01 ~]$ ls -ld /n/holylfs/INTERNAL_REPOS/CLIMATE_MODELS/
drwxrws--- 9 root huce 4096 Oct  9 23:45 /n/holylfs/INTERNAL_REPOS/CLIMATE_MODELS/

The s in the group execute position is the setgid bit: files and directories created here inherit the group huce.

sticky bit

ACL (access control list):
https://www2.cisl.ucar.edu/resources/storage-and-file-systems/glade/using-access-control-lists
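
A minimal ACL sketch (some_dir and the user name are placeholders; ACL support depends on the filesystem):

getfacl some_dir                  # show the current ACL
setfacl -m u:pchan:rx some_dir    # additionally grant user pchan read/search access
setfacl -x u:pchan some_dir       # remove that entry again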


Tips

  List files in scratchlfs that have not been modified for more than 88 days and could be cleaned:
  lfs find /n/scratchlfs/`id -gn`/${USER}/ -mtime +88 -type f -print

Transfer

RC is moving to Globus for large-scale transfers. datamover01 will be less supported. (9/29/2020)

Globus transfer (https://docs.rc.fas.harvard.edu/kb/globus-file-transfer/)

  1. Create your Globus ID at https://www.globusid.org/create
  2. Use your Globus ID to log in to Globus File Manager Page
  3. In the Collection field, type Harvard FAS RC Holyoke (or Harvard FAS RC Boston for Boston disks), click the Continue button, and log in with your RC username and 6-digit code.
  • Once you are connected, three folders /n/holylfs/TRANSFER/$USER, /n/holylfs02/TRANSFER/$USER and /n/boslfs/TRANSFER/$USER are created. Files in these transfer spaces do count towards the group quota of the same disk.

Using a compute node (some info outdated):

Transfer data within the cluster (https://docs.rc.fas.harvard.edu/kb/transferring-data-on-the-cluster/)

(rsync can be parallelized, ask RC how. (?https://github.com/fasrc/slurm_migration_scripts))

sbatch -p huce_intel -c 8 -t 1440 --mem-per-cpu=1000 --open-mode=append --wrap='fpsync -n $SLURM_CPUS_PER_TASK -o "-ax" -O "-b" "/n/kuanglfs/pchan/jetshift/" "/n/holylfs/LABS/kuang_lab/pchan/jetshift/"'  # 13T, 4h54m

Using datamover01 (some info outdated):

  To copy a lot of files: (https://docs.rc.fas.harvard.edu/kb/globus-file-transfer/)
  ssh datamover01
  rsync -avu source_dir dest_dir  # DO NOT add -z !!

Update: holylfs, kuanglfs, kuangfs1 & kuang100 were moved to holylfs04 (10/14/2020)

They are now in:
/n/holylfs04/LABS/kuang_lab/Lab/$USER
/n/holylfs04/LABS/kuang_lab/Lab/kuanglfs/$USER
/n/holylfs04/LABS/kuang_lab/Lab/kuangfs1/$USER
/n/holylfs04/LABS/kuang_lab/Lab/kuang100/01/$USER, etc.

Job queues

https://docs.rc.fas.harvard.edu/kb/running-jobs/#Slurm_partitions
https://docs.rc.fas.harvard.edu/kb/huce-partitions/

Below are the job queues available to Packard (excluding *_requeue, updated every Monday). Run ~pchan/git/harvard-cluster-monitor/script/sinfo.sh to see the job queues available to you (it should take only a few seconds).

HTML
<script src="https://gist-it.appspot.com/github/PackardChan/harvard-cluster-monitor/blob/master/output/sinfo-exclude-requeue"></script>


Melissa Sulprizio created a Google group for HUCE partition users. (4/28/2020)

Introduction to Slurm

Convenient Slurm commands: https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/

If you are familiar with PBS or other job schedulers, here is a good comparison: https://slurm.schedmd.com/rosetta.pdf

Batch job

https://docs.rc.fas.harvard.edu/kb/quickstart-guide/#Run_a_batch_job

Interactive job

RC recommends srun -p test --pty --x11=first --mem 500 -t 0-08:00 /bin/bash to start an interactive session. However, this interactive session becomes unresponsive after one hour of inactivity. Workaround: ssh to the allocated node (instead of staying logged in through srun); the ssh session has no timeout. Unlike srun, most environment variables are not carried over by ssh, so you will need to reload your modules, cd to your working directory, and set any other environment variables you need.

Personally, I use salloc -p huce_intel -n 1 -t 0-12:00 --mem=30000 to allocate resources. salloc opens a new shell on the local machine, with SLURM_* variables set up. Then I run ssh -Y $SLURM_JOB_NODELIST to ssh to the allocated node. Once you finish, exit twice: once to log out of the compute node and once to leave the salloc shell.
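
The full workflow, as a sketch (partition, time and memory are the same examples as above):

salloc -p huce_intel -n 1 -t 0-12:00 --mem=30000   # allocate resources; opens a new local shell
ssh -Y $SLURM_JOB_NODELIST                         # ssh to the allocated node with X11 forwarding
# ... reload modules, cd to your working directory, do your work ...
exit                                               # log out of the compute node
exit                                               # leave the salloc shell and release the allocation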

  xbash: a function defined in my bashrc that submits an interactive job. You can modify it to submit to a different partition.

Checking job status

sacct

squeue

  You can also use sacct, squeue, scontrol show, scontrol update, sinfo and other Slurm commands. Do note that squeue has been modified by RC to reduce load on the scheduler.
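
For example (the job ID below is a placeholder):

squeue -u $USER                                                 # your pending and running jobs
sacct -j 12345678 --format=JobID,JobName,State,Elapsed,MaxRSS   # accounting record of a finished job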

Planning resources

  nodeinfo: function defined in ~pchan/.bashrc that gives current 'partition-level' status of the huce partitions.
  ~pchan/bin/lsload.pl gives current 'node-level' status of non-full nodes in huce partitions.
  Usage: lsload.pl;  OR    lsload.pl huce_intel
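
If you prefer plain Slurm commands, a rough equivalent of the partition-level view is (huce_intel is just an example partition):

sinfo -p huce_intel -o "%P %a %D %C"   # %C prints CPUs as allocated/idle/other/total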


RC's advice on submitting large numbers of jobs

If you are submitting a lot of small jobs, they can take up the whole huce_intel partition overnight. I propose excluding several nodes from such submissions to serve as a fast lane. This may add a few minutes on top of your overnight run, but it will mean one less night of waiting for others who only need to run a few small jobs.

Opening up a fast lane:

# Exclude the fast-lane nodes from all of your pending jobs:
squeue -u $USER -t PD --noheader -o "%13i %.9P %.24j %.8u %.2t %.10M %.4C %.3D %R" |awk '{print $1}' |xargs -I {} scontrol update jobid={} ExcNodeList=`cat /n/home05/pchan/sw/crontab/node-fastlane`

Cost: https://www.rc.fas.harvard.edu/services/cluster-computing/#Offerings_Tiers_of_Service

Troubleshooting job

https://rc.fas.harvard.edu/wp-content/uploads/2016/03/Troubleshooting-Jobs-on-Odyssey.pdf

Within a Slurm environment (including the remote desktop), where SLURM* environment variables are set, srun is affected by those variables and will try to run inside the existing allocation. If you don't mean to create a job step, please at least unset SLURM_JOBID SLURM_JOB_ID.
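
For example:

env | grep '^SLURM'              # check which Slurm variables are set in your session
unset SLURM_JOBID SLURM_JOB_ID   # per the note above, so srun does not attach to the existing job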


Software/ Modules

Favorite modules in Kuang's group:

module load matlab  #/R2017a-fasrc02
 # load matlab first: avoid "undefined reference to `ncdimdef'", only conflict with netcdf/4.1.3
module load intel/17.0.4-fasrc01 impi/2017.2.174-fasrc01 netcdf/4.1.3-fasrc02
module load libpng/1.6.25-fasrc01  # for WRF grib2
module load jasper/1.900.1-fasrc02  # for WRF grib2
module load perl-modules/5.10.1-fasrc13  # for CESM
module load nco/4.7.4-fasrc01
module load ncview/2.1.7-fasrc01
module load ncl_ncarg/6.4.0-fasrc01
module load grads/2.0.a5-fasrc01

Best place to search for modules: https://portal.rc.fas.harvard.edu/apps/modules

  module show ncview/2.1.2-fasrc01    # must load prerequisite before running this line

  You can look at ~pchan/.bashrc:
  NCL: see the 6 lines of "export" in my bashrc.
  mod18: the modules I currently load; called in bashrc during login.
  matlab: I use -nodesktop by default. You can start MATLAB with the desktop via \matlab -nosplash -singleCompThread (the backslash bypasses the alias).

Python: https://docs.rc.fas.harvard.edu/kb/python/

spyder is only available in Anaconda3/5.0.1-fasrc02 & Anaconda/5.0.1-fasrc02.

ipython & jupyter are available in all 5.0.1-fasrc01 & 5.0.1-fasrc02.
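
For example, using one of the builds listed above:

module load Anaconda3/5.0.1-fasrc02
ipython        # or: jupyter notebook, spyder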


Update: impi is preferred (Feb 2017)

impi (Intel mpi) is preferred, because it is faster than openmpi and mvapich2, etc., by some 50-100%.

You have to use mpiifort, mpiicc & mpiicpc to replace mpif90, mpicc & mpicxx, in Makefile, configure.wrf, etc.

srun flag --mpi=pmi2 is recommended. (?https://slurm.schedmd.com/mpi_guide.html#intel_mpi)
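
A minimal launch sketch inside a batch job where SLURM_NTASKS is set (./my_mpi_program is a placeholder for your executable):

srun -n $SLURM_NTASKS --mpi=pmi2 ./my_mpi_program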


Misc

The login node with the smallest 15-minute load average is shown in the last row. This is updated hourly by ~pchan/git/harvard-cluster-monitor/script/loginnode-loadavg.sh

HTML
<script src="https://gist-it.appspot.com/github/PackardChan/harvard-cluster-monitor/blob/master/output/loginnode-loadavg-sorted"></script>

You can specify a login node by using boslogin01.rc.fas.harvard.edu or boslogin.rc.fas.harvard.edu instead of login.rc.fas.harvard.edu.
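
For example (replace your_rc_username with your RC account name):

ssh your_rc_username@boslogin01.rc.fas.harvard.edu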

Leaving Harvard: Your account will at some point be disabled. https://docs.rc.fas.harvard.edu/kb/leaving-external/

Update: CentOS 7 (May 2018)

FAQ:

  • SSH key or 'DNS spoofing' errors: https://docs.rc.fas.harvard.edu/kb/ssh-key-error/
  • Modules in bashrc no longer work or give errors on login: see the favorite modules in Kuang's group above.
  • source new-modules.sh can be removed from bashrc. (recommended, though not necessary)
  • Don't cross submit jobs to CentOS 7 partitions (basically all partitions) from CentOS 6 nodes (rcnx01, holynx01).
  • If you did module load centos6/0.0.1-fasrc01, module purge might not clean up everything. Logging in again is the best way to clean up everything.
  • Bash tab completion has changed: ls ~pchan/*txt<tab> will not respond in CentOS 7. Use complete to remove or change bash completions; also see ~pchan/.bashrc and the sketch after this list.
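
One possible adjustment for the last item, as a sketch (this removes the custom completion for ls so that bash's default filename completion applies again):

complete -r ls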

Read more in https://docs.rc.fas.harvard.edu/kb/centos-7-transition-faq/, and Plamen's slides kuang_lab-7-8-2018.pdf.

Update: Cannon is live (9/24/2019)

Known issues

  • Interactive srun sessions become unresponsive after one hour of inactivity; see the workaround above.
  • Bash is the standard default shell on the cluster. Limited support is given to alternate shells (e.g. csh, tcsh).
  • sbatch's "--wrap" option can cause issues. (2/26/2020)
  • Disks: see Auto health check
  • holylogin* and compute nodes prefer Internet2 routes. If you see any networking issue, try to use boslogin* nodes. (2/26/2020)