Nvidia drop-in BLAS in NumPy

Optimized general purpose BLAS from NVIDIA

As a part of CUDA 6.0, NVIDIA has prepared a drop-in version of the BLAS library, i.e. nvblas. It uses cuBLAS as its back-end, but the memory management is fully hidden inside the "standard" BLAS routines. This makes it possible to use cuBLAS effectively in any program that calls BLAS routines via the standard interface. What is more, there is no need to recompile the program, since the library can be preloaded using the LD_PRELOAD variable.
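For example, an already compiled program that is dynamically linked against BLAS can be switched to the GPU without touching the binary; the path and program name below are only illustrative:

$ LD_PRELOAD=/opt/cuda-toolkit/6.0/lib64/libnvblas.so ./my_blas_program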

[Image: NVIDIA Tesla K40 card]

nvblas supports only BLAS level 3 operations, and the free license (the same as for CUDA 6.0) allows you to use one NVIDIA board per node. nvblas is able to use more than one board per node, but that requires a premier license for the cuBLAS library. An evaluation license is, however, available in the CUDA Developer Zone.

A single configuration file, nvblas.conf, is needed; it must be placed in the working directory, or the path to it has to be defined with the NVBLAS_CONFIG_FILE environment variable. An example file looks like this:

NVBLAS_CPU_BLAS_LIB  /opt/intel/composer_xe_2013/mkl/lib/intel64/libmkl_rt.so
NVBLAS_LOGFILE  nvblas.log

The NVBLAS_CPU_BLAS_LIB variable defines the CPU BLAS library that will be used as a fall-back. It has to be a standard, fully functional BLAS library; it is best to put here the BLAS that was used when linking your program, e.g. NETLIB's reference BLAS, Intel's MKL, etc.
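Besides these two entries, nvblas.conf accepts a few more keywords; the sketch below is based on the options described in NVIDIA's nvblas documentation, and the values are only illustrative:

# GPUs used for the computations: ALL or a list of device ids, e.g. 0 1
NVBLAS_GPU_LIST  ALL
# tile dimension used when splitting the matrices between devices
NVBLAS_TILE_DIM  2048
# when present, every intercepted BLAS call is written to the log file
NVBLAS_TRACE_LOG_ENABLED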


To make sure that NumPy is linked against MKL, you have to modify the site.cfg file before compilation. Copy site.cfg.example to site.cfg, search for the section marked [mkl], uncomment it and make the following changes. Of course, change the paths to the ones of your MKL installation:

library_dirs = /opt/intel/composer_xe_2013/mkl/lib/intel64
include_dirs = /opt/intel/composer_xe_2013/mkl/include
mkl_libs = mkl_rt
lapack_libs =

After that you are ready to build and install:

$ python setup.py build
$ python setup.py install

You do not need to use the Intel compiler to link against the MKL library; GCC works perfectly well. The last command may have to be run as root if you are installing into a standard directory, e.g. /usr. To be able to run your NumPy code, you need to add the MKL libraries to the LD_LIBRARY_PATH variable, i.e.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/composer_xe_2013/mkl/lib/intel64
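Before moving on, it is worth checking that the build has really picked up MKL; a quick check from the Python prompt (the exact section names vary between NumPy versions):

import numpy as np
np.__config__.show()  # should mention mkl_rt in the blas_opt_info section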

After this installation your NumPy will be using MKL for algebraic operations, e.g. matrix-matrix multiplications. You are now also able to use NVIDIA's nvblas with your NumPy code. Consider a simple Python file, call it test.py:

import numpy as np

n = 4096  # matrix size; any reasonably large value will do
A = np.random.random((n, n))
B = np.random.random((n, n))
C = np.dot(A, B)  # matrix-matrix multiplication, i.e. dgemm

You would run this file with the following command:

$ python ./test.py

This will run using MKL's library. Now you may offload your BLAS level 3 operations, like dgemm, to the NVIDIA accelerator using the command:

$ LD_PRELOAD="/opt/cuda-toolkit/6.0/lib64/libnvblas.so" python ./test.py

This will offload the calculations to the GPU. You may check the card's status, memory usage and core utilization using the nvidia-smi command. libnvblas.so will try to load other CUDA libraries, so your LD_LIBRARY_PATH variable should contain the path to your CUDA libraries. In the case above:

$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cuda-toolkit/6.0/lib64
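While the multiplication is running you can watch the board from a second terminal; nvidia-smi can refresh its output periodically, e.g. every second:

$ nvidia-smi -l 1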

Keep in mind that your NumPy is still natively linked against MKL, and for any BLAS routine that is not present in nvblas it will rely on MKL. Therefore, MKL should still be present in your LD_LIBRARY_PATH variable.
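To convince yourself that the offload actually pays off, time the multiplication and compare a plain run against a preloaded one; a minimal, self-contained sketch (the size n = 4096 is an arbitrary choice):

import numpy as np
import time

n = 4096  # arbitrary; larger matrices favour the GPU
A = np.random.random((n, n))
B = np.random.random((n, n))

t0 = time.time()
C = np.dot(A, B)  # dgemm: MKL by default, the GPU when nvblas is preloaded
print("dgemm took %.2f s" % (time.time() - t0))

Run it once with plain python and once with the LD_PRELOAD command shown above.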