In this section we discuss some important details regarding code performance when using PyLops.
To get the most out of PyLops operators in terms of speed, follow these guidelines as much as possible and ensure that the Python libraries used by PyLops are efficiently installed (e.g., with multithreading enabled) on your system.
As already mentioned in the Installation page, we strongly encourage using
the Anaconda Python distribution, as
numpy and scipy will be automatically linked to the
Intel MKL library, which is as of today the most performant library for basic linear algebra
operations (if you don’t believe it, take a read at this
The best way to understand which
BLAS library is currently linked to your
numpy and scipy libraries is to run the following commands in ipython:
import numpy as np
import scipy as sp
# show() prints the build configuration itself, so no print() is needed
np.__config__.show()
sp.__config__.show()
From the output you should be able to tell whether your numpy and scipy are linked to
Intel MKL or to something else.
Unfortunately, PyLops is so far only shipped with PyPI, meaning that if you
have not already installed numpy and scipy in your environment, they will be installed with
pip as part of the installation of the pylops library. This comes with
the disadvantage that numpy and scipy are linked to
OpenBLAS instead of Intel MKL,
leading to a loss of performance. To prevent this, we suggest the following strategy:
- create a conda environment, e.g.
conda create -n envname python=3.6.4 numpy scipy
- install pylops using
pip install pylops
Finally, it is always important to make sure that the environment variable
OMP_NUM_THREADS is correctly set to the maximum number of threads you would like to use in your code. If that is not the
case, numpy and scipy will underutilize your hardware even if linked to a performant BLAS library.
For example, first set
OMP_NUM_THREADS=1 (single-threaded) in your terminal:
>> export OMP_NUM_THREADS=1
and run the following code in python:
import os
import numpy as np
from timeit import timeit

size = 4096
A = np.random.random((size, size))
B = np.random.random((size, size))
# time 4 dense matrix products with the current thread setting
print('Time with %s threads: %f s'
      % (os.environ.get('OMP_NUM_THREADS'),
         timeit(lambda: np.dot(A, B), number=4)))
Then set OMP_NUM_THREADS=2, or any higher number of threads available
on your hardware (multi-threaded):
>> export OMP_NUM_THREADS=2
and run the same python code. By looking both at your processes (e.g. using
top) and at the output of the
python print statement, you should see a speed-up in the second case.
Alternatively, you could set the
OMP_NUM_THREADS variable directly
inside your script using os.environ, before numpy is imported.
Moreover, note that when using
Intel MKL you can alternatively set
MKL_NUM_THREADS instead of
OMP_NUM_THREADS: this could
be useful if your code runs other parallel processes which you can
control independently from the
Intel MKL ones using OMP_NUM_THREADS.
Always remember to set OMP_NUM_THREADS and MKL_NUM_THREADS
in your environment when using PyLops.
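As a minimal sketch of setting the thread count from inside a script, the snippet below writes both variables through os.environ. The helper name set_blas_threads is hypothetical and not part of PyLops, and the sketch assumes it runs before numpy or scipy are first imported, since the threading settings are typically read when the BLAS library is loaded.

```python
import os


def set_blas_threads(n_threads):
    """Set OMP_NUM_THREADS and MKL_NUM_THREADS for the current process.

    Hypothetical helper (not part of PyLops). Call it before numpy or
    scipy are imported so that the linked BLAS library sees the values.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer")
    names = ("OMP_NUM_THREADS", "MKL_NUM_THREADS")
    for name in names:
        os.environ[name] = str(n_threads)  # environment values must be strings
    return {name: os.environ[name] for name in names}


settings = set_blas_threads(2)
print(settings)
# Import numpy only after this point so the new values are picked up.
```

Setting the variables in the shell with export remains the simpler option when the same value should apply to every script in a session.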
To avoid increasing the number of required dependencies, which may lead to conflicts with other libraries that you have in your system, we have decided to build some of the additional features of PyLops in such a way that if an optional dependency is not present in your python environment, a safe fallback to one of the required dependencies will be enforced.
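The fallback idea can be illustrated with a short sketch. The function resolve_engine and its engine-to-package mapping are assumptions made for this example only; the actual logic inside PyLops may differ.

```python
import warnings
from importlib.util import find_spec


def resolve_engine(engine, fallback="numpy"):
    """Return `engine` if its backing package is importable, else `fallback`.

    Hypothetical helper that illustrates the safe-fallback idea; it is not
    the PyLops implementation.
    """
    # Illustrative mapping from engine name to the package it relies on.
    required = {"numba": "numba", "fftw": "pyfftw"}
    if engine == fallback:
        return engine
    if engine not in required:
        raise ValueError("unknown engine %r" % engine)
    if find_spec(required[engine]) is None:
        # Optional package missing: warn and use the required dependency.
        warnings.warn("%s not installed, falling back to %s"
                      % (required[engine], fallback))
        return fallback
    return engine


print(resolve_engine("numpy"))
print(resolve_engine("numba"))  # 'numba' if installed, otherwise 'numpy'
```

Checking availability with find_spec avoids importing the optional package just to test for its presence.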
When available on your system, we recommend using the Conda package manager and installing all the mandatory and optional dependencies of PyLops at once using the command:
>> conda install -c conda-forge pylops
In this case, all dependencies will be installed from their conda distributions.
Alternatively, from version
1.4.0 optional dependencies can also be installed as
part of the pip installation via:
>> pip install pylops[advanced]
Dependencies are however installed from their PyPI wheels.
Although we always strive to write code for forward and adjoint operators that takes advantage of the perks of numpy and scipy (e.g., broadcasting, ufunc), in some cases we may end up using for loops that lead to poor performance. In those cases we may decide to implement alternative (optional) back-ends in numba.
In this case a user can switch from the native,
always available implementation to the numba implementation by providing the following
additional input parameter to the operator:
engine='numba'. This is for example the case in the
If you are interested in using the
numba backend from conda, you will need to manually install it:
>> conda install numba
Finally, it is also advised to install the additional package icc_rt:
>> conda install -c numba icc_rt
or pip equivalent. Similarly to
Intel MKL, you need to set the environment variable
NUMBA_NUM_THREADS to tell numba how many threads to use.
Two different engines are provided by the
pylops.signalprocessing.FFT operator for the
fft and ifft routines in the forward and adjoint modes: engine='numpy' and engine='fftw'.
The first engine comes as default as numpy is part of the dependencies of PyLops and automatically installed when PyLops is installed if not already available in your Python distribution.
The second engine implements the well-known FFTW
via the python wrapper
pyfftw.FFTW. This optimized fft tends to
outperform the one from numpy in many cases. However, it has not been included
in the mandatory requirements of PyLops, meaning that when installing PyLops with pip,
pyfftw.FFTW will not be installed automatically.
Again, if you are interested in using the
FFTW backend from conda, you will need to manually install it:
>> conda install -c conda-forge pyfftw
or pip equivalent.
The scikit-fmm library is used to compute traveltime tables with the fast-marching method in the
initialization of the
mode == 'eikonal'.
As this may not be of interest to many users, this library has not been included
in the mandatory requirements of PyLops. If you are interested in using scikit-fmm,
you will need to manually install it:
>> conda install -c conda-forge scikit-fmm
or pip equivalent.
The spgl1 library is used to solve sparsity-promoting BP, BPDN, and LASSO problems.
If you are interested in using
spgl1, you can manually install it:
>> pip install spgl1
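To see at a glance which of the optional packages discussed above are present in the current environment, a small check along the following lines can help. The function optional_status is a hypothetical name for this sketch; the module names are the import names of the packages (scikit-fmm is imported as skfmm).

```python
from importlib.util import find_spec

# Package name -> import name for the optional dependencies discussed above.
OPTIONAL = {
    "numba": "numba",
    "pyfftw": "pyfftw",
    "scikit-fmm": "skfmm",
    "spgl1": "spgl1",
}


def optional_status():
    """Return a dict mapping each optional package to True if importable."""
    return {pkg: find_spec(mod) is not None for pkg, mod in OPTIONAL.items()}


for pkg, available in sorted(optional_status().items()):
    print("%-12s %s" % (pkg, "installed" if available else "missing"))
```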
If you are a developer, all the optional dependencies can also be
installed automatically by cloning the repository and installing
make dev-install or