Advanced installation

In this section we discuss some important details regarding code performance when using PyLops.

To get the most out of PyLops operators in terms of speed, follow these guidelines as closely as possible and ensure that the Python libraries used by PyLops are efficiently installed (e.g., with multithreading enabled) on your system.

Dependencies

PyLops relies on the numpy and scipy libraries. Linking these to the most performant BLAS library available ensures optimal performance of PyLops when using only the required dependencies.

As already mentioned in the Installation page, we strongly encourage using the Anaconda Python distribution, as numpy and scipy will be automatically linked to the Intel MKL library, which is as of today the most performant library for basic linear algebra operations (if you don’t believe it, take a look at this blog post).

The best way to understand which BLAS library is currently linked to your numpy and scipy libraries is to run the following commands in ipython:

import numpy as np
import scipy as sp
np.__config__.show()
sp.__config__.show()

From the output you should be able to tell whether your numpy and scipy are linked to Intel MKL or to another library.

Note

Unfortunately, PyLops is so far only distributed via PyPI, meaning that if you have not already installed numpy and scipy in your environment, pip will install them as part of the installation of the pylops library. This comes with the disadvantage that numpy and scipy are linked to OpenBLAS instead of Intel MKL, leading to a loss of performance. To prevent this, we suggest the following strategy:

  • create a conda environment, e.g. conda create -n envname python=3.6.4 numpy scipy
  • install pylops using pip install pylops

Finally, it is always important to make sure that your environment variable OMP_NUM_THREADS is correctly set to the maximum number of threads you would like to use in your code. If that is not the case numpy and scipy will underutilize your hardware even if linked to a performant BLAS library.

For example, first set OMP_NUM_THREADS=1 (single-threaded) in your terminal:

>> export OMP_NUM_THREADS=1

and run the following code in python:

import os
import numpy as np
from timeit import timeit

size = 4096
A = np.random.random((size, size))
B = np.random.random((size, size))
print('Time with %s threads: %f s' \
      %(os.environ.get('OMP_NUM_THREADS'),
        timeit(lambda: np.dot(A, B), number=4)))

Subsequently set OMP_NUM_THREADS=2, or any higher number of threads available in your hardware (multi-threaded):

>> export OMP_NUM_THREADS=2

and run the same python code. By both looking at your processes (e.g. using top) and at the python print statement you should see a speed-up in the second case.
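The comparison can also be automated by launching each run in a fresh interpreter with its own value of OMP_NUM_THREADS. The sketch below uses only the standard library; run_with_threads is an illustrative helper name (not part of PyLops), and the snippet it runs simply echoes the variable so that the mechanism can be checked without numpy:

```python
import os
import subprocess
import sys


def run_with_threads(n_threads, code):
    """Run a Python snippet in a new interpreter with OMP_NUM_THREADS set.

    Hypothetical helper for illustration; returns the snippet's stdout.
    """
    # Copy the current environment and override only the thread count
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder snippet: echoes the variable seen by the child process
snippet = "import os; print(os.environ['OMP_NUM_THREADS'])"

for n in (1, 2):
    print("child process sees OMP_NUM_THREADS =", run_with_threads(n, snippet))
```

Replacing snippet with the matrix-product timing code shown above (passed as a string) should give one timing per thread count.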

Alternatively, you could set the OMP_NUM_THREADS variable directly inside your script using os.environ['OMP_NUM_THREADS']=str(2). Moreover, note that when using Intel MKL you can set MKL_NUM_THREADS instead of OMP_NUM_THREADS: this could be useful if your code runs other parallel processes which you can control independently from the Intel MKL ones using OMP_NUM_THREADS.
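When the variable is set from inside a script, ordering matters: BLAS and OpenMP runtimes typically read these variables when they are first loaded, so the assignment should come before numpy or scipy is imported. A minimal sketch, where the helper name set_blas_threads is purely illustrative and not part of PyLops:

```python
import os


def set_blas_threads(n_threads, use_mkl=False):
    """Set the thread-count variable and return its name.

    Hypothetical helper for illustration; not a PyLops function.
    """
    name = "MKL_NUM_THREADS" if use_mkl else "OMP_NUM_THREADS"
    os.environ[name] = str(n_threads)
    return name


# Call this at the very top of the script, before numpy/scipy are imported;
# setting the variable after the BLAS library is loaded usually has no effect.
variable = set_blas_threads(2)
print(variable, "=", os.environ[variable])
```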

Note

Always remember to set OMP_NUM_THREADS (or MKL_NUM_THREADS) in your environment when using PyLops.

Optional dependencies

To avoid increasing the number of required dependencies, which may lead to conflicts with other libraries that you have in your system, we have decided to build some of the additional features of PyLops in such a way that if an optional dependency is not present in your python environment, a safe fallback to one of the required dependencies will be enforced.
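The fallback logic can be pictured with the following sketch. It is not the actual PyLops implementation: resolve_engine is a hypothetical helper, and it only assumes that each optional engine corresponds to an importable module (e.g. pyfftw for engine='fftw').

```python
import importlib
import warnings


def resolve_engine(engine, module_name, fallback="numpy"):
    """Return `engine` if `module_name` can be imported, else `fallback`.

    Hypothetical helper illustrating a safe fallback; not PyLops code.
    """
    if engine == fallback:
        # The required dependency is always available: nothing to check
        return fallback
    try:
        importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"{module_name} is not installed, "
            f"falling back to engine='{fallback}'"
        )
        return fallback
    return engine


# 'json' stands in for an installed optional dependency,
# 'module_that_is_not_installed' for a missing one.
print(resolve_engine("fftw", "json"))
print(resolve_engine("fftw", "module_that_is_not_installed"))
```

With this pattern the user's code keeps running when the optional library is missing, at the cost of using the slower default implementation.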

When available in your system, we recommend using the Conda package manager and installing all the mandatory and optional dependencies of PyLops at once using the command:

>> conda install -c conda-forge pylops

In this case all dependencies will be installed from their conda distributions.

Alternatively, from version 1.4.0 optional dependencies can also be installed as part of the pip installation via:

>> pip install pylops[advanced]

In this case, however, dependencies are installed from their PyPI wheels.

The one exception is cupy. This library is NOT installed automatically. Users interested in accelerating their computations with the aid of GPUs should install it prior to installing pylops (see below for more details).

numba

Although we always strive to write code for forward and adjoint operators that takes advantage of the perks of numpy and scipy (e.g., broadcasting, ufunc), in some cases we may end up using for loops that may lead to poor performance. In those cases we may decide to implement alternative (optional) back-ends in numba.

In this case a user can switch from the native, always available implementation to the numba implementation by providing the additional input parameter engine='numba' to the operator. This is for example the case in the pylops.signalprocessing.Radon2D operator.

If you are interested in using the numba backend from conda, you will need to manually install it:

>> conda install numba

Finally, it is also advised to install the additional package icc_rt:

>> conda install -c numba icc_rt

or pip equivalent. Similarly to Intel MKL, you need to set the environment variable NUMBA_NUM_THREADS to tell numba how many threads to use. If this variable is not present in your environment, numba code will be compiled with parallel=False.

fft routines

Two different engines are provided by the pylops.signalprocessing.FFT operator for fft and ifft routines in the forward and adjoint modes: engine='numpy' (default) and engine='fftw'.

The first engine is the default, as numpy is part of the dependencies of PyLops and is automatically installed when PyLops is installed if not already available in your Python distribution.

The second engine implements the well-known FFTW via the python wrapper pyfftw.FFTW. This optimized fft tends to outperform the one from numpy in many cases. However, it has not been included in the mandatory requirements of PyLops, meaning that when installing PyLops with pip, pyfftw will not be installed automatically.

Again, if you are interested in using the FFTW backend from conda, you will need to manually install it:

>> conda install -c conda-forge pyfftw

or pip equivalent.

skfmm

This library is used to compute traveltime tables with the fast-marching method in the initialization of the pylops.waveeqprocessing.Demigration operator when choosing mode == 'eikonal'.

As this may not be of interest to many users, this library has not been included in the mandatory requirements of PyLops. If you are interested in using skfmm, you will need to manually install it:

>> conda install -c conda-forge scikit-fmm

or pip equivalent.

spgl1

This library is used to solve sparsity-promoting BP, BPDN, and LASSO problems in the pylops.optimization.sparsity.SPGL1 solver.

If you are interested in using spgl1, you can manually install it:

>> pip install spgl1

pywt

This library is used to implement the Wavelet operators.

If you are interested in using pywt, you can manually install it:

>> conda install pywavelets

or pip equivalent.

Note

If you are a developer, all the above optional dependencies can also be installed automatically by cloning the repository and installing pylops via make dev-install or make dev-install_conda.

cupy

This library is used as a drop-in replacement for numpy for GPU-accelerated computations. Since many different versions of cupy exist (based on the CUDA drivers of the GPU), users must install cupy prior to installing pylops. PyLops will automatically check if cupy is installed and in that case use it any time the input vector passed to an operator is of cupy type. For more details on GPU-accelerated PyLops, read the GPU support section.

cusignal

This library is used as a drop-in replacement for scipy.signal for GPU-accelerated computations. Similar to cupy, users must install cusignal prior to installing pylops. PyLops will automatically check if cusignal is installed and in that case use it any time the input vector passed to an operator is of cupy type. For more details on GPU-accelerated PyLops, read the GPU support section.
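To see at a glance which of the optional dependencies discussed in this section are present in the current environment, something along the following lines could be used. This is a sketch that relies only on the standard library; the function name is illustrative, and the module names are the import names of the libraries listed above.

```python
from importlib.util import find_spec

# Import names of the optional dependencies described in this section
OPTIONAL_MODULES = [
    "numba", "pyfftw", "skfmm", "spgl1", "pywt", "cupy", "cusignal",
]


def optional_dependency_report(module_names=OPTIONAL_MODULES):
    """Map each top-level module name to True if it can be found.

    Illustrative helper, not part of PyLops. find_spec only locates
    the module and does not import it, so heavy libraries such as
    cupy are not loaded by this check.
    """
    return {name: find_spec(name) is not None for name in module_names}


for name, found in optional_dependency_report().items():
    print(f"{name:10s} {'installed' if found else 'missing'}")
```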