OpenACC on ARM systems
We will use the NVIDIA Jetson TK1 development platform as an example of an accelerator-enabled ARM-based system. The CUDA framework is available on Jetson; however, the choice of OpenACC compilers is still very limited for ARM and even for x86 systems. We have previously described the initial implementation of OpenACC in GCC, which so far brings that functionality to x86 systems. We have also shown that OpenACC may be used on ARM systems (including Jetson TK1) with the free accULL compiler.
In this post we will focus on ipmacc, a project similar in its philosophy to the accULL compiler. ipmacc is a compiler/translator of OpenACC-enabled C programs to CUDA/OpenCL-enabled executables. We will use the CUDA back-end to get a free OpenACC-capable compiler on the NVIDIA Jetson TK1 development platform.
To start the installation, clone the project repository from GitHub:
$ mkdir ~/src
$ cd ~/src
$ git clone https://github.com/lashgar/ipmacc.git
$ cd ipmacc
Consult the INSTALL file. We will need to install several dependencies:
$ sudo apt-get install bison libxml2 libxml2-dev antlr libantlr-dev libarchive-dev libxslt-dev libboost-all-dev
Edit the file setup_enviroment and change the following variables:
export IPMACCROOT=/home/ubuntu/src/ipmacc
export CUDASUPPORT=1
export CUDAHOME=/usr/local/cuda/
export OPENCLSUPPORT=0
IPMACCROOT points to the root directory with the ipmacc code. Change this variable to the value appropriate for your system. We need to make a few modifications to get the code to compile and work properly on the Jetson TK1 platform.
Open ~/src/ipmacc/include/openacc.h and comment out the line
extern void acc_update_device( void*hptr, size_t );
since it will cause the compilation to fail.
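If you prefer not to edit the header by hand, the same change can be scripted. The following is a minimal sketch, not part of ipmacc: the helper name comment_out_update_device is hypothetical, and it assumes the declaration sits on a single line exactly as shown above and that a // comment is acceptable in this header.

```shell
# Sketch: comment out the acc_update_device prototype in a header file.
# The function name is hypothetical; it is not part of ipmacc.
comment_out_update_device() {
    header="$1"
    # Keep a backup next to the original before rewriting it.
    cp "$header" "$header.orig" || return 1
    # Prefix the declaration with "// ". Lines that are already commented
    # out do not match, because the pattern is anchored at the line start.
    sed 's|^[[:space:]]*extern void acc_update_device(|// &|' \
        "$header.orig" > "$header"
}
```

With the paths used in this post it would be called as comment_out_update_device ~/src/ipmacc/include/openacc.h; the untouched original stays next to it as openacc.h.orig.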
Open ~/src/ipmacc/compile-all and comment out the section responsible for compiling srcML. We will compile it separately afterwards and copy the binaries into the ipmacc root directory. The following section
# srcML
echo -en '~ compiling srcML parser .'
cd $ROOTDIR/srcML/
tar xvzf srcml.tar.gz > /dev/null
cd src/
make > /dev/null
#ln -s $ROOTDIR/srcML/wrapper/wrapper.py $ROOTDIR/wrapper.py
echo '. done'
should be commented out.
Open the ipmacc file and modify line 52, which defines the compile/link flags for the CUDA target. The original line searches a different directory for libcudart.so. We need to change it to search in $CUDAHOME/lib/:
export LDFLAG="$LDFLAG -L$CUDAHOME/lib/ -lcudart"
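Before compiling, it may be worth confirming that the CUDA runtime library really is in the directory the new flag points to. This is a small sketch under assumptions: the helper name has_cudart is hypothetical, and the fallback /usr/local/cuda is taken from the CUDAHOME value used earlier in this post.

```shell
# Sketch: report whether a directory contains the CUDA runtime library.
# The function name is hypothetical; it is not part of ipmacc or CUDA.
has_cudart() {
    libdir="$1"
    # Matches libcudart.so as well as versioned file names.
    ls "$libdir"/libcudart.so* >/dev/null 2>&1
}

# CUDAHOME comes from the environment file edited earlier; fall back to
# the value used in this post if it is not set in the current shell.
cuda_lib="${CUDAHOME:-/usr/local/cuda}/lib"
if has_cudart "$cuda_lib"; then
    echo "libcudart found in $cuda_lib"
else
    echo "libcudart NOT found in $cuda_lib"
fi
```

If the library is reported as missing, the -L path in LDFLAG needs to point at whichever directory of your CUDA installation actually holds libcudart.so.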
We are now ready to compile ipmacc.
You should see the message
~ compiling OpenACC API .. done
The installation is, however, not complete yet. Since we have commented out the section devoted to srcML, we need to provide that part ourselves. ipmacc relies on the srcML program for translation of the source code.
We need to download the code from http://www.srcml.org. Proceed to http://www.srcml.org/downloads.html and click "Download" at the bottom of the page. You will be directed to a short form. After filling in the form you will be able to download the source code.
After downloading the code, unpack the archive and proceed with the installation using the following steps:
$ tar zxvf srcML-src.tar.gz
$ cd srcML-src
$ mkdir build
$ cd build
$ cmake ..
$ make
$ ls bin
libsrcml.a libsrcml.so src2srcml srcml2src
In the ipmacc root directory, create the folder srcML/bin and copy into it all files from the bin directory (just compiled in the above procedure).
$ mkdir ~/src/ipmacc/srcML/bin
$ cp ~/src/srcML-src/build/bin/* ~/src/ipmacc/srcML/bin
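A quick way to confirm that the copy worked is to check that the two srcML tools listed above are present and executable in the new folder. The sketch below is only a convenience check; the helper name check_srcml_bin is hypothetical, and the tool names are the ones shown in the ls output above.

```shell
# Sketch: verify that the srcML tools are present and executable.
# The function name is hypothetical; it is not part of ipmacc or srcML.
check_srcml_bin() {
    bindir="$1"
    missing=0
    for tool in src2srcml srcml2src; do
        if [ -x "$bindir/$tool" ]; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found.
    return "$missing"
}
```

With the layout used in this post it would be called as check_srcml_bin ~/src/ipmacc/srcML/bin.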
The installation of the ipmacc OpenACC-to-CUDA translator/compiler is now complete, and we can try to compile some example codes.
Type ipmacc --help to see the available options for the compiler. By typing
ipmacc --list-devices CUDA
you may verify whether the installation is able to detect the CUDA capability of the Jetson TK1, i.e.
$ ipmacc --list-devices CUDA
spec of CUDA-capable devices:
CUDA Device Query...
There are 1 CUDA devices.
CUDA Device #0
Major revision number: 3
Minor revision number: 2
Name: GK20A
Total global memory: 1827323904
Total shared memory per block: 49152
Total registers per block: 32768
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 1024
Maximum dimension 0 of block: 1024
Maximum dimension 1 of block: 1024
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 2147483647
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 65535
Clock rate: 852000
Total constant memory: 65536
Texture alignment: 512
Concurrent copy and execution: Yes
Number of multiprocessors: 1
Kernel execution timeout: No
ipmacc comes with a set of example codes in the test-case directory. To verify the installation, try to compile a few example codes from that directory.
$ cd ~/src/ipmacc/test-case
$ ipmacc vectorAdd.c -o vectorAdd.x
warning: Storing the translated code in <vectorAdd_ipmacc.cu> (target: <nvcuda>)
$ ./vectorAdd.x
Calculation on GPU ... 27.3380 ms
Calculation on GPU ... 0.2380 ms
Calculation on GPU ... 0.2010 ms
Calculation on CPU ... 0.0260 ms
OpenACC vectoradd test was successful!
The test-case directory contains many more examples of OpenACC codes.
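To try several of them in one go, a small loop can call the compiler on every .c file and report which ones build. This is only a sketch: the helper name compile_examples is hypothetical, it assumes each .c file in the directory is a standalone program, and it invokes ipmacc in the same way as the vectorAdd example above (source file, then -o and the output name).

```shell
# Sketch: try to compile every .c file in a directory without stopping at
# the first failure. The function name is hypothetical. The body runs in
# a subshell, so the caller's working directory is left unchanged.
compile_examples() (
    cd "$1" || exit 2
    # Second argument: compiler command, defaulting to ipmacc.
    compiler="${2:-ipmacc}"
    failed=0
    for src in *.c; do
        [ -e "$src" ] || continue     # the directory holds no .c files
        out="${src%.c}.x"
        if "$compiler" "$src" -o "$out" >/dev/null 2>&1; then
            echo "built: $out"
        else
            echo "FAILED: $src"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file compiled.
    [ "$failed" -eq 0 ]
)
```

With the layout used in this post it would be called as compile_examples ~/src/ipmacc/test-case. Any intermediate files the compiler writes, such as the translated .cu sources, end up in that directory as well.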
Enjoy your free OpenACC compiler on your ARM system!