GPU / TPU Support#
Overview#
From v1.12.0, PyLops supports computations on GPUs powered by CuPy (cupy-cudaXX>=13.0.0). This library must be installed before PyLops is installed.
From v2.3.0, PyLops also supports computations on GPUs/TPUs powered by JAX. This library must be installed before PyLops is installed.
Note
Set the environment variables CUPY_PYLOPS=0 and/or JAX_PYLOPS=0 to force PyLops to ignore the cupy and jax backends. This can also be used if a previous version of cupy or jax is installed in your system; otherwise, you will get an error when importing PyLops.
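For instance, the backends can be disabled from within a Python script (a minimal sketch; the variables must be set before pylops is first imported):
import os

# force PyLops to ignore the cupy and jax backends
os.environ["CUPY_PYLOPS"] = "0"
os.environ["JAX_PYLOPS"] = "0"

import pylops  # imported only after the environment variables are set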
Apart from a few exceptions, all operators and solvers in PyLops can seamlessly work with numpy arrays on CPU as well as with cupy/jax arrays on GPU. For CuPy, users simply need to consistently create operators and provide data vectors to the solvers, e.g., when using pylops.MatrixMult the input matrix must be a cupy array if the data provided to a solver is also a cupy array. For JAX, apart from following the same procedure described for CuPy, the PyLops operator must also be wrapped into a pylops.JaxOperator.
See below for a comprehensive list of supported operators and additional functionalities for both the cupy and jax backends.
Examples#
Let's now briefly look at some use cases.
End-to-end GPU powered inverse problems#
First we consider the most common scenario, when both the model and data vectors fit in the GPU memory. We can therefore simply replace all our numpy arrays with cupy arrays and solve the inverse problem of interest end-to-end on the GPU.

Let's first write a code snippet using numpy arrays, which PyLops will run on your CPU:
import numpy as np
from pylops import MatrixMult

ny, nx = 400, 400
G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = np.ones(nx, dtype=np.float32)
# Create operator
Gop = MatrixMult(G, dtype='float32')
# Create data and invert
y = Gop @ x
xest = Gop / y
Now we write a code snippet using cupy arrays, which PyLops will run on your GPU:
import cupy as cp
import numpy as np
from pylops import MatrixMult

ny, nx = 400, 400
G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = cp.ones(nx, dtype=np.float32)
# Create operator
Gop = MatrixMult(G, dtype='float32')
# Create data and invert
y = Gop @ x
xest = Gop / y
The code is almost unchanged apart from the fact that we now use cupy arrays; PyLops will figure this out.
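If the observed data starts out on the CPU as a numpy array, it can be moved to the GPU with cp.asarray before inversion and the estimate brought back with cp.asnumpy. A minimal sketch, reusing the Gop operator created above and assuming a hypothetical numpy data vector y_cpu:
y_gpu = cp.asarray(y_cpu)        # host-to-device transfer of the data
xest_gpu = Gop / y_gpu           # inversion runs entirely on the GPU
xest_cpu = cp.asnumpy(xest_gpu)  # device-to-host transfer of the estimate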
Similarly, we write a code snippet using jax arrays, which PyLops will run on your GPU/TPU:
import jax.numpy as jnp
import numpy as np
from pylops import JaxOperator, MatrixMult

ny, nx = 400, 400
G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
x = jnp.ones(nx, dtype=np.float32)
# Create operator
Gop = JaxOperator(MatrixMult(G, dtype='float32'))
# Create data and invert
y = Gop @ x
xest = Gop / y
# Adjoint via AD
xadj = Gop.rmatvecad(x, y)
Again, the code is almost unchanged apart from the fact that we now use jax arrays.
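The AD-based adjoint returned by rmatvecad can be cross-checked against the standard adjoint of the operator, and jax arrays can be converted back to numpy with np.asarray when needed. A minimal sketch, reusing Gop, y, and xadj from the snippet above (the tolerance is an arbitrary choice):
xadj_ref = Gop.rmatvec(y)    # operator-based adjoint
print(np.allclose(np.asarray(xadj), np.asarray(xadj_ref), atol=1e-4))
xadj_np = np.asarray(xadj)   # back to a numpy array if needed on the CPU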
Mixed CPU-GPU powered inverse problems#
Let us now consider a more intricate scenario where we have access to a GPU-powered operator, but the model and/or data vectors are too large to fit in the GPU memory (or VRAM).
For the sake of clarity, we consider a problem where the operator can be written as a pylops.BlockDiag of PyLops operators. Note how, by simply sandwiching any of the GPU-powered operators between two pylops.ToCupy operators, we are able to tell PyLops to transfer to the GPU only the part of the model vector required by a given operator and to transfer the output back to the CPU before forming the combined output vector (i.e., the output vector of the pylops.BlockDiag).

import numpy as np
import cupy as cp
from pylops import BlockDiag, MatrixMult, ToCupy

dtype = np.float32  # working precision (assumed here)
nops, n = 5, 4
Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) for i in range(nops)]
Ms = [M.T @ M for M in Ms]
# Create operator
Mops = []
for iop in range(nops):
    Mop = MatrixMult(cp.asarray(Ms[iop], dtype=dtype))
    Top = ToCupy(Mop.dims, dtype=dtype)
    Top1 = ToCupy(Mop.dimsd, dtype=dtype)
    Mop = Top1.H @ Mop @ Top
    Mops.append(Mop)
Mops = BlockDiag(Mops, forceflat=True)
# Create data and invert
x = np.ones(n * nops, dtype=dtype)
y = Mops @ x.ravel()
xest = Mops / y
Finally, let us consider a problem where the operator can be written as a pylops.VStack of PyLops operators and the model vector can be fully transferred to the GPU. We can again use the pylops.ToCupy operator, however this time we will only use it to move the output of each operator to the CPU. Since we are now in a special scenario, where the input of the overall operator sits on the GPU and the output on the CPU, we need to inform the pylops.VStack operator about this. This can be easily done using the additional inoutengine parameter. Let's see this with an example.

import numpy as np
import cupy as cp
from pylops import MatrixMult, ToCupy, VStack
from pylops.optimization.basic import cgls as pylops_cgls

dtype = np.float32  # working precision (assumed here)
nops, n, m = 3, 4, 5
Ms = [np.random.normal(0, 1, (n, m)) for _ in range(nops)]
# Create operator
Mops = []
for iop in range(nops):
    Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
    Top1 = ToCupy(Mop.dimsd, dtype=dtype)
    Mop = Top1.H @ Mop
    Mops.append(Mop)
Mops = VStack(Mops, inoutengine=("numpy", "cupy"))
# Create data and invert
x = cp.ones(m, dtype=dtype)
y = Mops @ x.ravel()
xest = pylops_cgls(Mops, y, x0=cp.zeros_like(x))[0]
These features are currently not available for jax arrays.
Supported Operators#
In the following, we provide a list of methods in pylops.LinearOperator with their current status (available on CPU, GPU with CuPy, and GPU/TPU with JAX):
| Operator/method | CPU | GPU with CuPy | GPU/TPU with JAX |
|---|---|---|---|
| | ✅ | 🔴 | 🔴 |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | 🔴 | 🔴 |
| | ✅ | ✅ | ✅ |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
Similarly, we provide a list of operators with their current status.
Basic operators:
| Operator/method | CPU | GPU with CuPy | GPU/TPU with JAX |
|---|---|---|---|
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | 🔴 | 🔴 |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
Smoothing and derivatives:
| Operator/method | CPU | GPU with CuPy | GPU/TPU with JAX |
|---|---|---|---|
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
Signal processing:
| Operator/method | CPU | GPU with CuPy | GPU/TPU with JAX |
|---|---|---|---|
| | ✅ | ✅ | ⚠️ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | 🔴 |
| | ✅ | ✅ | ✅ |
Wave-Equation processing:
| Operator/method | CPU | GPU with CuPy | GPU/TPU with JAX |
|---|---|---|---|
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | 🔴 | 🔴 |
| | ✅ | 🔴 | 🔴 |
Geophysical subsurface characterization:
| Operator/method | CPU | GPU with CuPy | GPU/TPU with JAX |
|---|---|---|---|
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ✅ |
| | ✅ | ✅ | ⚠️ |
| | ✅ | ✅ | ⚠️ |
Warning
1. The JAX backend of the pylops.signalprocessing.Convolve1D operator currently works only with 1d-arrays due to a different behaviour of scipy.signal.convolve and jax.scipy.signal.convolve with nd-arrays.
2. The JAX backend of the pylops.avo.prestack.PrestackLinearModelling operator currently works only with explicit=True due to the same issue as in point 1 for the pylops.signalprocessing.Convolve1D operator employed when explicit=False.
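To illustrate point 1, here is a minimal sketch of a JAX-backed pylops.signalprocessing.Convolve1D applied to a 1d-array (the signal length, filter, and offset are arbitrary choices for illustration):
import numpy as np
import jax.numpy as jnp
from pylops import JaxOperator
from pylops.signalprocessing import Convolve1D

nt = 101
h = np.ones(3, dtype=np.float32) / 3  # simple averaging filter
Cop = JaxOperator(Convolve1D(nt, h=h, offset=1, dtype="float32"))

x = jnp.zeros(nt, dtype="float32").at[nt // 2].set(1.0)  # spike in the middle
y = Cop @ x            # forward convolution on a 1d-array (supported)
xadj = Cop.rmatvec(y)  # adjoint, also restricted to 1d-arrays with the JAX backend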