
Odyssey and GPU Computing on Odyssey

Before proceeding, familiarize yourself with the basics of Odyssey and of GPU computing on Odyssey:

https://rc.fas.harvard.edu/resources/odyssey-quickstart-guide/

https://rc.fas.harvard.edu/resources/documentation/gpgpu-computing-on-odyssey/

CUDA

Compiling and running CUDA code

1) Log in to a node with a GPU. Use the holyseasgpu partition (for AP 278).

srun --pty --x11=first -p holyseasgpu --mem 4000 -t 0-10:00 --gres=gpu:1 bash

2) To list the available CUDA versions, type in a terminal:

module-query cuda

3) Load one of the available modules (try cuda/9.2.88 or cuda/8.0):

module load cuda/9.2.88

4) Write or obtain CUDA code. The three example programs below come from the following reference; read it to understand how the three versions differ:

https://devblogs.nvidia.com/even-easier-introduction-cuda/

add.cu

add_block.cu

add_grid.cu

5) Compile the code:

For example, to compile add.cu, type in a terminal:

nvcc add.cu -o add_cuda

6) Run the code in interactive mode (test runs only):

./add_cuda

7) Running batch jobs

Create a script (say runscript.sh) to run the executable by copying and pasting the following lines:

#!/bin/bash
#SBATCH -p holyseasgpu #Partition to submit to 
#SBATCH -n 1 #Number of cores 
#SBATCH --gres=gpu:1 #Number of GPUs
#SBATCH -t 5 #Runtime in minutes 
#SBATCH --mem-per-cpu=100 #Memory per cpu in MB (see also --mem)
module load cuda/9.2.88-fasrc01
time ./add_cuda

8) Submit the script as a batch job with:

sbatch runscript.sh

Links on CUDA (with tutorials and sample CUDA programs)

1) On Odyssey, after you load a CUDA module, you can find sample programs in:

      $CUDA_HOME/samples

2) https://devblogs.nvidia.com/even-easier-introduction-cuda/

3) https://devblogs.nvidia.com/easy-introduction-cuda-fortran/

4) https://www.pgroup.com/resources/cudafortran.htm

OpenACC

On Odyssey, the PGI OpenACC compiler suite is installed in /n/seasfs03/IACS/ap278/pgi/. To make the compilers (pgcc, pgc++, pgf90, etc.) available in your path, add the following lines to your .bashrc file (this assumes you are using bash, which is the default shell):

export PGI=/n/seasfs03/IACS/ap278/pgi/
export PATH=/n/seasfs03/IACS/ap278/pgi/linux86-64/18.4/bin:$PATH
export MANPATH=$MANPATH:/n/seasfs03/IACS/ap278/pgi/linux86-64/18.4/man
export LM_LICENSE_FILE=/n/seasfs03/IACS/ap278/pgi/license.dat

After adding these lines to your ~/.bashrc, apply them in the current terminal with:

source ~/.bashrc

or

. ~/.bashrc

Alternatively, log out and log back in.

Some useful commands

GPU (accelerator) information: nvidia-smi (short), pgaccelinfo (long)

Performance profiler: pgprof (for more information, see https://www.pgroup.com/resources/docs/18.5/pdf/pgi18profug.pdf)

Compiling and running code with OpenACC directives

First compile your source file containing OpenACC directives (say code_acc.c or code_acc.f90; see below for example programs).

Note that pgcc and pgf90 must be available in your path for this to succeed (see above for instructions).

For a C program:

pgcc -acc code_acc.c -Minfo=accel

For a Fortran program:

pgf90 -acc code_acc.f90 -Minfo=accel

Either command creates an executable named a.out. The option -Minfo=accel makes the compiler report how it parallelized the code for the accelerator.
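To illustrate what a file such as code_acc.c might contain, here is a minimal sketch of a SAXPY loop (y = a*x + y) offloaded with a single OpenACC directive. It is not taken from the course material. The function names saxpy and saxpy_max_error are hypothetical, chosen only for this example, and the data clauses are one reasonable choice rather than the only one.

```c
/* Hypothetical contents for code_acc.c: SAXPY with one OpenACC directive. */
#include <assert.h>
#include <stdlib.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
   With -acc the loop is offloaded; without it the pragma is ignored
   and the loop runs serially on the CPU. */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;

    /* x is only read (copyin); y is read and written (copy). */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Illustrative self-check: fills x with 1 and y with 2, runs saxpy with a = 2,
   and returns the largest deviation from the expected value 4.
   Returns -1 if memory allocation fails. */
float saxpy_max_error(int n)
{
    float *x, *y, max_err = 0.0f;
    int i;

    assert(n > 0);
    x = malloc(n * sizeof *x);
    y = malloc(n * sizeof *y);
    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0f;
    }

    for (i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    saxpy(n, 2.0f, x, y);

    for (i = 0; i < n; ++i) {
        float err = y[i] - 4.0f;
        if (err < 0.0f) err = -err;
        if (err > max_err) max_err = err;
    }

    free(x);
    free(y);
    return max_err;
}
```

To build this with the pgcc command above, add a main function that calls saxpy_max_error (for example with n = 1000000) and prints the result. The -Minfo=accel output should then report how the loop in saxpy was handled.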

Slurm script for running the job on Odyssey:

#!/bin/bash 
#SBATCH -N 1  #Number of nodes 
#SBATCH -p holyseasgpu  #Partition to submit to 
#SBATCH --ntasks-per-node 2
#SBATCH --gres=gpu:1
#SBATCH -t 5  #Runtime in minutes 
./a.out

An OpenACC example

1) Get the sample code (see Ref. 3 in the OpenACC links below, and watch its short video tutorial before working through this example):

git clone https://github.com/parallel-forall/cudacasts
cd cudacasts/ep3-first-openacc-program

or

cp -r /n/seasfs03/IACS/ap278/cudacasts/ep3-first-openacc-program/ .
cd ep3-first-openacc-program

2) Compile the serial version (no OpenACC directives):

pgcc laplace2d.c -o a.out_serial

3) Run the "serial" version and time it:

time ./a.out_serial

4) Compile the code with OpenACC directives:

pgcc -acc laplace_acc.c -o a.out_acc -Minfo=accel

5) Run the OpenACC executable and time it:

time ./a.out_acc
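For orientation, the sketch below shows the kind of loop nest a Laplace solver based on Jacobi iteration typically contains, and where an OpenACC directive could go. It is not the code from the cudacasts repository. The function names jacobi_sweep and jacobi_solve, the flat row-major array layout, and the boundary condition are all assumptions made for this illustration.

```c
/* Illustrative Jacobi iteration for a Laplace problem (not the cudacasts code). */
#include <assert.h>
#include <stdlib.h>

/* One Jacobi sweep on an n-by-m grid stored row-major in flat arrays.
   Each interior point of Anew becomes the average of its four neighbours
   in A. Returns the largest absolute change over the interior points. */
double jacobi_sweep(int n, int m, const double *A, double *Anew)
{
    double error = 0.0;
    int i, j;

    /* Both loops are collapsed into one parallel loop; error is a max reduction. */
    #pragma acc parallel loop collapse(2) reduction(max:error) \
        copyin(A[0:n*m]) copy(Anew[0:n*m])
    for (j = 1; j < n - 1; ++j) {
        for (i = 1; i < m - 1; ++i) {
            double v = 0.25 * (A[j*m + i + 1] + A[j*m + i - 1]
                             + A[(j-1)*m + i] + A[(j+1)*m + i]);
            double d = v - A[j*m + i];
            if (d < 0.0) d = -d;
            if (d > error) error = d;
            Anew[j*m + i] = v;
        }
    }
    return error;
}

/* Repeats sweeps until the change drops to tol or max_iter is reached.
   Assumed setup: left boundary column held at 1.0, everything else 0.0.
   Returns the number of sweeps performed (-1 on allocation failure) and
   stores the last change in *final_error. */
int jacobi_solve(int n, int m, double tol, int max_iter, double *final_error)
{
    double *A = calloc((size_t)n * m, sizeof *A);
    double *Anew = calloc((size_t)n * m, sizeof *Anew);
    double *tmp;
    double error = tol + 1.0;
    int iter = 0, j;

    assert(n >= 3 && m >= 3);
    if (A == NULL || Anew == NULL) {
        free(A);
        free(Anew);
        return -1;
    }

    for (j = 0; j < n; ++j) {
        A[j*m] = 1.0;
        Anew[j*m] = 1.0;
    }

    while (error > tol && iter < max_iter) {
        error = jacobi_sweep(n, m, A, Anew);
        tmp = A; A = Anew; Anew = tmp;   /* newest values are now in A */
        ++iter;
    }

    *final_error = error;
    free(A);
    free(Anew);
    return iter;
}
```

As written, every sweep copies both arrays between host and GPU, which can easily cost more than the computation itself. Wrapping the while loop in an OpenACC data region (#pragma acc data) keeps the arrays on the GPU across sweeps, and comparing the timings with and without it is a useful exercise.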

Links on OpenACC (with tutorials and sample OpenACC programs)

1) OpenACC example programs

    On Odyssey, you can find the OpenACC example programs in:

       /n/seasfs03/IACS/ap278/pgi/linux86-64/2018/examples/OpenACC/

2) The following links are very good general references:

    https://devblogs.nvidia.com/parallelforall/openacc-example-part-1/

    https://devblogs.nvidia.com/openacc-example-part-2/

    https://www.pgroup.com/resources/accel.htm?utm_source=nvidia_otk&utm_medium=web_link&utm_term=download

3) Excellent reference:

    https://devblogs.nvidia.com/cudacasts-episode-3-your-first-openacc-program/

   (Contains excellent video tutorials. Recommended: The video "Your First OpenACC Program" (7.5 minutes).)

   For sample (laplace) code:

   https://github.com/parallel-forall/cudacasts

4) Introductory OpenACC tutorial (free, but requires an account):

   https://nvidia.qwiklab.com/quests/3?locale=en

5) https://www.openacc.org/get-started

6) https://www.pgroup.com/resources/docs/18.4/x86/openacc-gs/index.htm

7) https://docs.computecanada.ca/wiki/OpenACC_Tutorial

8) http://web.stanford.edu/class/cme213/files/lectures/Lecture_14_openacc2017.pdf
