sc :: elementary

09 apr'15

NVIDIA drop-in BLAS in NumPy

Optimized general purpose BLAS from NVIDIA

As a part of CUDA 6.0, NVIDIA has prepared a drop-in version of the BLAS library called nvblas. It uses cuBLAS as its back-end library, but the memory management is fully hidden inside the "standard" BLAS routines. This makes it possible to effectively use cuBLAS in any program that calls BLAS routines through the standard interface. Moreover, there is no need to recompile the program, since the libnvblas.so library can be preloaded using the LD_PRELOAD variable.
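For example, any dynamically linked binary that calls the standard BLAS interface can be redirected without rebuilding (the program name here is hypothetical; the library path matches the CUDA 6.0 installation used later in this post):

$ LD_PRELOAD=/opt/cuda-toolkit/6.0/lib64/libnvblas.so ./my_blas_program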

[image: NVIDIA Tesla K40 card]

nvblas supports only BLAS level 3 operations, and the free license (the same as for CUDA 6.0) allows you to use one NVIDIA board per node. nvblas is able to use more than one board per node, but that requires a premier license for the cuBLAS library. An evaluation license is, however, available in the CUDA Developer Zone.

One configuration file, nvblas.conf, is needed; it must be placed in the working directory, or the path to it has to be given in the NVBLAS_CONFIG_FILE environment variable. An example file looks like this:

NVBLAS_LOGFILE  nvblas.log
NVBLAS_CPU_BLAS_LIB  libblas.so
NVBLAS_GPU_LIST ALL
NVBLAS_TILE_DIM 2048
NVBLAS_AUTOPIN_MEM_ENABLED
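If you keep the file outside the working directory, point nvblas at it explicitly (the path below is a placeholder):

$ export NVBLAS_CONFIG_FILE=/path/to/nvblas.conf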

The NVBLAS_CPU_BLAS_LIB variable defines the CPU BLAS library that will be used as a fall-back. It has to be a standard, fully functional BLAS library. It is best to put here the BLAS that was used when linking your program, e.g. NETLIB's libblas.so, Intel MKL's libmkl_rt.so, etc.
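If MKL is your fall-back, you can also give the full path to the library in nvblas.conf; a minimal sketch, assuming the MKL location used later in this post:

NVBLAS_CPU_BLAS_LIB  /opt/intel/composer_xe_2013/mkl/lib/intel64/libmkl_rt.so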

NVBLAS in NumPy

To make sure that NumPy is linked against libmkl_rt.so, you have to modify the site.cfg file before compilation. Copy site.cfg.example to site.cfg (see the command after the listing), search for the section marked [mkl], uncomment it, and make the following changes. Of course, you have to replace the paths with the ones for your installation of MKL:

[mkl]
library_dirs = /opt/intel/composer_xe_2013/mkl/lib/intel64
include_dirs = /opt/intel/composer_xe_2013/mkl/include
mkl_libs = mkl_rt
lapack_libs =
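The copy itself, run in the NumPy source tree, is simply:

$ cp site.cfg.example site.cfg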

After that you are ready to build and install:

$ python setup.py build
$ python setup.py install

You do not need to use the Intel compiler to link against the MKL library; GCC works perfectly. The last command may have to be run as root if you are installing into a standard directory, e.g. /usr. To be able to run your NumPy code, you need to add the MKL libraries to the LD_LIBRARY_PATH variable, i.e.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2013/mkl/lib/intel64
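One quick way to verify that the freshly built NumPy really picked up MKL:

import numpy as np
np.__config__.show()   # the BLAS/LAPACK sections should mention mkl_rt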

After this installation your NumPy will be using MKL for algebraic operations, e.g. matrix-matrix multiplications. You are now also able to use NVIDIA's nvblas with your NumPy code. Consider the simple Python file test.py:

import numpy as np

n = 10**4                      # matrix dimension
A = np.random.random((n, n))   # two dense n x n matrices
B = np.random.random((n, n))
C = A.dot(B)                   # matrix-matrix product, dispatched to dgemm

You would run this file with the following command:

$ python ./test.py

This will use MKL's libmkl_rt.so. Now you may offload BLAS level 3 operations, like dgemm, to the NVIDIA accelerator using the command:

$ LD_PRELOAD="/opt/cuda-toolkit/6.0/lib64/libnvblas.so" python ./test.py

This will offload the calculations to the GPU. You may check the board's status, memory usage and core utilization using the nvidia-smi command. libnvblas.so will try to load other libraries, so your LD_LIBRARY_PATH variable should contain the path to your CUDA libraries. In the case above:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda-toolkit/6.0/lib64
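To see what the offload buys you, it is handy to time the multiplication itself; a minimal sketch of test.py extended with a timer:

import time
import numpy as np

n = 10**4
A = np.random.random((n, n))
B = np.random.random((n, n))

t0 = time.time()
C = A.dot(B)   # dgemm: runs on the GPU when libnvblas.so is preloaded
print("dgemm took %.2f s" % (time.time() - t0))

Running it once plainly and once with the LD_PRELOAD line above gives a direct CPU-vs-GPU comparison.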

Keep in mind that your NumPy is still natively linked against libmkl_rt.so, and for any BLAS routine that is not present in libnvblas.so it will fall back to MKL. Therefore, MKL should still be present in your LD_LIBRARY_PATH variable.
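For example, a plain vector dot product is a BLAS level 1 operation (ddot), which nvblas does not intercept, so even under LD_PRELOAD it is executed by MKL on the CPU:

import numpy as np

x = np.random.random(10**6)
y = np.random.random(10**6)
s = x.dot(y)   # ddot, BLAS level 1: stays on the CPU BLAS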