sc :: elementary

apr'15


OpenACC support in GCC

What is OpenACC?

OpenACC is a parallel programming model for GPUs based on compiler directives, very similar in structure and philosophy to OpenMP. The programmer augments serial code with compiler directives around identified parallel regions. The standard supports the C, C++ and Fortran languages, and OpenACC has been demonstrated to be vendor independent, working on NVIDIA, AMD and Intel accelerators.

There are a handful of compilers supporting OpenACC (PGI, Cray, CAPS, accUll, OpenUH, OpenARC), the last three of which are open source. So far only the commercial compilers have been production ready, while accUll, OpenARC and OpenUH have remained research compilers. The newest release of GCC, starting with the 5.x.x series, supports OpenACC via its GOMP library: it offers support for NVIDIA accelerators using OpenACC and for the Intel Xeon Phi using OpenMP 4.

To install GCC 5.0.1 with OpenACC support you may use the procedure described below. The steps have been taken from https://gcc.gnu.org/wiki/Offloading and can be put together into one script for complete installation.

Installation of GCC 5.0.1 with OpenACC support

Keep in mind that GCC 5.0.1 is the first release of GCC that supports OpenACC. According to man gcc:

Note that this is an experimental feature, incomplete, and subject to change in future versions of GCC. See https://gcc.gnu.org/wiki/OpenACC for more information.

The installation is more complicated than a standard CPU-only installation. To use OpenACC on a GPU, the compiler needs to generate a binary that can execute on both the CPU and the GPU, which are two very different architectures. Two compilers therefore need to be installed, together with their dependencies: the first is the offload compiler targeting the NVIDIA accelerator, and the second is the host compiler for the CPU, built to be aware of, and linked with, the accelerator compiler.
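Concretely, the directory layout assumed in the steps below can be sketched as follows (GCC5ROOT here is hypothetical and should point at a root directory of your choosing):

```shell
# Sketch of the layout used throughout the article; the $PWD default
# is for illustration only -- set GCC5ROOT to your installation root.
GCC5ROOT=${GCC5ROOT:-$PWD/gcc5}
mkdir -p $GCC5ROOT/source    # gcc, nvptx-newlib and nvptx-tools checkouts
mkdir -p $GCC5ROOT/build     # separate build trees for the three builds
mkdir -p $GCC5ROOT/install   # common install prefix for both compilers
echo "layout ready under $GCC5ROOT"
```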

Prepare for the installation

Prerequisites

  • working GCC installation (we have used 4.8.1)

  • GCC dependencies GMP 4.3.2, MPFR 2.4.2, MPC 0.8.1

  • CUDA toolkit (we have used 6.0.37)

We will use the following variables throughout the compilation of GCC with OpenACC support (set GCC5ROOT to your chosen installation root).

GCC5ROOT=/
CUDA=/install/software/cuda-toolkit/6.0.37

Get source code of all components

Make a separate directory for source codes

mkdir $GCC5ROOT/source
cd $GCC5ROOT/source

nvptx-newlib and nvptx-tools from Github

git clone https://github.com/MentorEmbedded/nvptx-newlib.git
git clone https://github.com/MentorEmbedded/nvptx-tools.git

We will download a GCC weekly snapshot of trunk from one of the mirror repositories. The snapshot name depends on the date, so check LATEST-5-SNAPSHOT for the latest available snapshot before downloading.

date=20150421
wget http://mirrors.concertpass.com/gcc/snapshots/LATEST-5/gcc-5-$date.tar.bz2
tar jxvf gcc-5-$date.tar.bz2
mv gcc-5-$date gcc

Build nvptx-tools

nvptx-tools is a collection of tools for PTX code handling. It provides an "assembler" and a "linker" for the GPU, as well as tools to run the binary files. nvptx-tools is needed by the offload compiler.

You can get more information on the project's GitHub repository page, nvptx-tools. Follow the instructions below to install it.

mkdir -p $GCC5ROOT/build/nvptx-build
cd $GCC5ROOT/build/nvptx-build

$GCC5ROOT/source/nvptx-tools/configure  \
  --prefix=$GCC5ROOT/install \
  --target=nvptx-none \
  --with-cuda-driver-include=$CUDA/include \
  --with-cuda-driver-lib=/usr/lib64/nvidia \
  --with-cuda-runtime-include=$CUDA/include \
  --with-cuda-runtime-lib=$CUDA/lib64 \
  CC='gcc -m64' \
  CXX='g++ -m64'

make
make install

Build GCC with nvptx-none target (offload compiler)

With nvptx-tools compiled and installed, we are ready to compile and install the offload compiler for the nvptx-none target. We also link the nvptx-newlib/newlib directory into the GCC source directory so that newlib is built together with GCC.

mkdir -p $GCC5ROOT/build/gcc5-accel
cd $GCC5ROOT/build/gcc5-accel

ln -vs $GCC5ROOT/source/nvptx-newlib/newlib $GCC5ROOT/source/gcc/newlib
ln -vs . $GCC5ROOT/install/nvptx-none/usr
target=$($GCC5ROOT/source/gcc/config.guess)

$GCC5ROOT/source/gcc/configure \
  --prefix= \
  --target=nvptx-none \
  --enable-as-accelerator-for="$target" \
  --enable-languages=c,c++,fortran,lto \
  --enable-checking=yes,df,fold,rtl \
  --disable-multilib \
  --with-sysroot=/nvptx-none \
  --with-build-sysroot=$GCC5ROOT/install/nvptx-none \
  --with-build-time-tools=$GCC5ROOT/install/nvptx-none/bin \
  --disable-sjlj-exceptions \
  --enable-newlib-io-long-long \
  CC='gcc -m64' \
  CXX='g++ -m64'

make
make DESTDIR=$GCC5ROOT/install install

Build host compiler

Finally, we can build the host compiler and point it to the offload compiler we have just installed: the switch --enable-offload-targets=nvptx-none=$GCC5ROOT/install tells the host GCC where to find the offload GCC.

mkdir -p $GCC5ROOT/build/gcc5
cd $GCC5ROOT/build/gcc5

$GCC5ROOT/source/gcc/configure \
  --prefix= \
  --disable-bootstrap \
  --enable-languages=c,c++,fortran,lto \
  --disable-multilib \
  --enable-offload-targets=nvptx-none=$GCC5ROOT/install \
  --with-cuda-driver-include=$CUDA/include \
  CC='gcc -m64' \
  CXX='g++ -m64' \
  --with-sysroot=

make
make DESTDIR=$GCC5ROOT/install install

Post installation

To fully use the newly installed compiler you need to adjust the environment variables responsible for binary and library paths. We also extend the include path for C++ and the manual pages path. All the necessary variables may be updated as follows:

export PATH=/install/gcc/5.0.1/bin:$PATH
export LIBRARY_PATH=/install/gcc/5.0.1/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/install/gcc/5.0.1/lib64:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=/install/gcc/5.0.1/include/c++/5.0.1:$CPLUS_INCLUDE_PATH
export MANPATH=/software/gcc/5.0.1/share/man:$MANPATH

At this point you can check that the new version of GCC is picked up:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/install/bin/../libexec/gcc/x86_64-unknown-linux-gnu/5.0.1/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-unknown-linux-gnu
Configured with: /source/gcc/configure --prefix= --disable-bootstrap --enable-languages=c,c++,fortran,lto --disable-multilib --enable-offload-targets=nvptx-none=/install --with-cuda-driver-include=/install/software/cuda-toolkit/6.0.37/include CC='gcc -m64' CXX='g++ -m64' --with-sysroot=
Thread model: posix
gcc version 5.0.1 20150421 (prerelease) (GCC)

You can now compile and run OpenACC-enabled code by giving the -fopenacc switch to the compiler. C, C++ and Fortran code can include OpenACC directives and will work with GCC. Remember that the code needs the CUDA libraries to be available in LD_LIBRARY_PATH at run time, even though the binary is not explicitly linked against them.
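For example, assuming the driver library libcuda.so lives under /usr/lib64/nvidia (the path used in the nvptx-tools configure step above) and $CUDA points at the CUDA toolkit, the path might be extended like this (adjust both locations for your system):

```shell
# Hypothetical library locations; adjust to where libcuda.so and the
# CUDA toolkit runtime libraries live on your machine.
export LD_LIBRARY_PATH=/usr/lib64/nvidia:$CUDA/lib64:$LD_LIBRARY_PATH
```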

OpenACC test program

After a successful installation of GCC with OpenACC support we may finally test it and try to achieve some speedup of the GPU code over the CPU code. We will use a simple C program which numerically calculates the value of Pi.

#include <stdio.h>

#define N 2000000000
#define vl 1024

int main(void) {
  double pi = 0.0;
  long long i;

  #pragma acc parallel vector_length(vl)
  #pragma acc loop reduction(+:pi)
  for (i=0; i<N; i++) {
    double t = (double)((i+0.5)/N);
    pi += 4.0/(1.0+t*t);
  }

  printf("pi=%11.10f\n", pi/N);

  return 0;
}

There is one OpenACC parallel region, marked with the OpenACC pragmas:

#pragma acc parallel vector_length(vl) 
  #pragma acc loop reduction(+:pi)
  for (i=0; i<N; i++) {
    double t= (double)((i+0.5)/N);
    pi +=4.0/(1.0+t*t);
  }

Compile the GPU version using the following command

gcc pi.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o gpu.x 

and the CPU version using

gcc pi.c -O3 -o cpu.x

Test the program and time the execution

time ./gpu.x
time ./cpu.x

For example

$ time ./gpu.x 
pi=3.1415926536

real    0m3.420s
user    0m2.363s
sys 0m1.053s

$ time ./cpu.x 
pi=3.1415926536

real    0m17.208s
user    0m17.219s
sys 0m0.000s

This quick test was done using an NVIDIA K20m GPU and an Intel Xeon E5-2670v2 CPU; the GPU version runs roughly 5x faster.