sc :: elementary

may'15

19

OpenACC on Jetson TK1 - [2] ipmacc compiler

OpenACC on ARM systems

We will use the NVIDIA Jetson TK1 development platform as an example of an accelerator-enabled ARM-based system. The CUDA framework is available on the Jetson; however, the choice of OpenACC compilers is still very limited for ARM, and even for x86 systems. We have previously described the initial implementation of OpenACC in GCC, which so far brings that functionality to x86 systems. We have also shown that OpenACC can be used on ARM systems (including the Jetson TK1) with the free accULL compiler.

In this post we focus on ipmacc, a project similar in philosophy to the accULL compiler.

ipmacc installation

ipmacc is a compiler/translator that turns OpenACC-enabled C programs into CUDA- or OpenCL-enabled executables. We will use the CUDA back-end to get a free OpenACC-capable compiler on the NVIDIA Jetson TK1 development board.

To start the installation, clone the project repository from GitHub

$ mkdir ~/src
$ cd ~/src
$ git clone https://github.com/lashgar/ipmacc.git
$ cd ipmacc

Read the INSTALL file. We will need to install several dependencies

$ sudo apt-get install bison libxml2 libxml2-dev antlr libantlr-dev libarchive-dev libxslt-dev libboost-all-dev 

Edit the file setup_enviroment and change the following variables

export IPMACCROOT=/home/ubuntu/src/ipmacc
export CUDASUPPORT=1
export CUDAHOME=/usr/local/cuda/
export OPENCLSUPPORT=0
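Before building, it may help to confirm where the CUDA runtime library actually lives under CUDAHOME, since this matters for the linker flags later on. The snippet below is a minimal sketch, not part of ipmacc: find_cudart_dir is a hypothetical helper, and it assumes the library is named libcudart.so and sits in either lib or lib64 under the CUDA prefix.

```shell
#!/bin/sh
# Sketch: report which directory under a CUDA prefix holds libcudart.so.
# find_cudart_dir is a hypothetical helper, not shipped with ipmacc.
find_cudart_dir() {
    # $1 = CUDA install prefix, e.g. the value of CUDAHOME
    for d in lib lib64; do
        if [ -e "$1/$d/libcudart.so" ]; then
            printf '%s\n' "$1/$d"
            return 0
        fi
    done
    return 1
}

# Fall back to the path used in this post if CUDAHOME is not set.
CUDAHOME="${CUDAHOME:-/usr/local/cuda}"
if cudart_dir=$(find_cudart_dir "$CUDAHOME"); then
    echo "libcudart.so found in $cudart_dir"
else
    echo "libcudart.so not found under $CUDAHOME" >&2
fi
```

If the reported directory ends in lib rather than lib64, the link path used by ipmacc has to be adjusted accordingly.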

The variable IPMACCROOT points to the root directory of the ipmacc source code; change it to match your setup. We need to make a few modifications for the code to compile and work properly on the Jetson TK1 platform.

  1. Open the file ~/src/ipmacc/include/openacc.h and comment out the line

     extern void acc_update_device( void*hptr, size_t );
    

    since it will cause the compilation to fail.

  2. Edit the file ~/src/ipmacc/compile-all and comment out the section responsible for compiling srcML. We will compile it separately afterwards and copy the binaries into the srcML/bin folder under the ipmacc root directory. The following section

    # srcML
    echo -en '~ compiling srcML parser .'
    cd $ROOTDIR/srcML/
    tar xvzf srcml.tar.gz > /dev/null
    cd src/
    make > /dev/null
    #ln -s $ROOTDIR/srcML/wrapper/wrapper.py $ROOTDIR/wrapper.py
    echo '. done'  
    

    should be commented out.

  3. Edit the file ipmacc and modify line 52, which defines the compile/link flags for the CUDA target. The original file searches the directory $CUDAHOME/lib64 for libcudart.so. We need it to search $CUDAHOME/lib instead, i.e.

    export LDFLAG="$LDFLAG -L$CUDAHOME/lib/ -lcudart"
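
Edits 1 and 3 above can also be applied with sed instead of by hand. The following is a rough sketch under stated assumptions: IPMACC_DIR and both function names are hypothetical, the prototype is assumed to start its line with "extern void acc_update_device(", and the library path is assumed to be spelled literally as $CUDAHOME/lib64 in the ipmacc script. A .bak copy of each file is kept. Edit 2 is left as a manual step.

```shell
#!/bin/sh
# Sketch only: automates edits 1 and 3 from the list above.
# IPMACC_DIR is a hypothetical variable; point it at your checkout.
IPMACC_DIR="${IPMACC_DIR:-$HOME/src/ipmacc}"

# Edit 1: comment out the acc_update_device prototype.
comment_out_update_device() {
    # $1 = path to openacc.h; the original is saved as openacc.h.bak
    sed -i.bak 's|^\([[:space:]]*\)\(extern void acc_update_device(.*\)$|\1// \2|' "$1"
}

# Edit 3: make the link flags point at lib instead of lib64.
use_lib_instead_of_lib64() {
    # $1 = path to the ipmacc driver script; a .bak copy is kept
    sed -i.bak 's|\$CUDAHOME/lib64|$CUDAHOME/lib|g' "$1"
}

# Apply the edits only if the files are where this post expects them.
if [ -f "$IPMACC_DIR/include/openacc.h" ]; then
    comment_out_update_device "$IPMACC_DIR/include/openacc.h"
fi
if [ -f "$IPMACC_DIR/ipmacc" ]; then
    use_lib_instead_of_lib64 "$IPMACC_DIR/ipmacc"
fi
```

Both substitutions leave an already-edited file unchanged, so running the script twice should be harmless. Check the result with diff against the .bak files before compiling.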
    

We are ready to compile ipmacc

$ ./compile-all

You should see the message

~ compiling OpenACC API .. done

The installation is, however, not complete yet. Since we commented out the section devoted to srcML, we need to provide that part ourselves.

srcML installation

ipmacc relies on the srcML program to translate the source code. We need to download the code from http://www.srcml.org. Go to http://www.srcml.org/downloads.html and click "Download" at the bottom of the page. You will be directed to a short form; after filling it in, you will be able to download the source code.

Once the code is downloaded, unpack the archive and proceed with the installation as follows

$ tar zxvf srcML-src.tar.gz
$ cd srcML-src
$ mkdir build
$ cd build
$ cmake ..
$ make
$ ls bin
libsrcml.a  libsrcml.so  src2srcml  srcml2src

In the ipmacc root directory, create the folder srcML/bin and copy into it all files from srcML's bin directory (just compiled in the procedure above).

$ mkdir ~/src/ipmacc/srcML/bin
$ cp ~/src/srcML-src/build/bin/* ~/src/ipmacc/srcML/bin
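
A quick check that the two srcML tools listed above ended up in place can save a confusing failure later. This is a small sketch, with check_srcml_bins being a hypothetical helper name; it only tests that src2srcml and srcml2src exist and are executable in the given directory.

```shell
#!/bin/sh
# Sketch: verify that the srcML tools were copied and are executable.
# check_srcml_bins is a hypothetical helper, not part of ipmacc or srcML.
check_srcml_bins() {
    # $1 = directory expected to hold src2srcml and srcml2src
    missing=0
    for tool in src2srcml srcml2src; do
        if [ ! -x "$1/$tool" ]; then
            echo "missing or not executable: $1/$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Default location follows the layout used in this post.
SRCML_BIN="${IPMACCROOT:-$HOME/src/ipmacc}/srcML/bin"
if check_srcml_bins "$SRCML_BIN"; then
    echo "srcML tools are in place"
fi
```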

Now our installation of the ipmacc OpenACC-to-CUDA translator/compiler is complete, and we can try to compile some example codes.

Test ipmacc

Type ipmacc --help to see the available compiler options. By typing ipmacc --list-devices CUDA you can verify whether the installation detects the CUDA capability of the Jetson TK1, i.e.

$ ipmacc --list-devices CUDA
spec of  CUDA-capable devices:
CUDA Device Query...
There are 1 CUDA devices.

CUDA Device #0
    Major revision number:         3
    Minor revision number:         2
    Name:                          GK20A
    Total global memory:           1827323904
    Total shared memory per block: 49152
    Total registers per block:     32768
    Warp size:                     32
    Maximum memory pitch:          2147483647
    Maximum threads per block:     1024
    Maximum dimension 0 of block:  1024
    Maximum dimension 1 of block:  1024
    Maximum dimension 2 of block:  64
    Maximum dimension 0 of grid:   2147483647
    Maximum dimension 1 of grid:   65535
    Maximum dimension 2 of grid:   65535
    Clock rate:                    852000
    Total constant memory:         65536
    Texture alignment:             512
    Concurrent copy and execution: Yes
    Number of multiprocessors:     1
    Kernel execution timeout:      No

ipmacc comes with a test-case directory. To verify the installation, try compiling a few example codes from that directory.

$ cd ~/src/ipmacc/test-case
$ ipmacc vectorAdd.c -o vectorAdd.x
    warning: Storing the translated code in <vectorAdd_ipmacc.cu> (target: <nvcuda>)
$ ./vectorAdd.x
Calculation on GPU ...  27.3380 ms
Calculation on GPU ...  0.2380 ms
Calculation on GPU ...  0.2010 ms
Calculation on CPU ...  0.0260 ms
OpenACC vectoradd test was successful!
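
To try more than one example at a time, the same invocation can be wrapped in a loop. The sketch below assumes every .c file in the directory builds with the plain "ipmacc file.c -o file.x" form shown above, which may not hold for all examples; build_tests is a hypothetical helper, and its optional second argument lets you substitute another compiler command.

```shell
#!/bin/sh
# Sketch: compile every .c file in a directory and report the outcome.
# build_tests is a hypothetical helper; the body runs in a subshell so
# the caller's working directory is left untouched.
build_tests() (
    # $1 = directory with .c files, $2 = compiler command (default: ipmacc)
    cd "$1" || exit 1
    compiler=${2:-ipmacc}
    failed=0
    for src in *.c; do
        [ -e "$src" ] || continue
        if "$compiler" "$src" -o "${src%.c}.x" >/dev/null 2>&1; then
            echo "ok     $src"
        else
            echo "FAILED $src"
            failed=$((failed + 1))
        fi
    done
    # Exit status is 0 only if every file compiled.
    [ "$failed" -eq 0 ]
)

# Run against the bundled examples only when ipmacc is on the PATH.
if command -v ipmacc >/dev/null 2>&1; then
    build_tests "${IPMACCROOT:-$HOME/src/ipmacc}/test-case" \
        || echo "some examples did not compile"
fi
```

Compiler output is discarded here to keep the summary short; rerun a failing file by hand to see the actual messages.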

The test-case directory contains many more examples of OpenACC codes. Enjoy your free OpenACC compiler on your ARM system!