🎮 GPU / TPU Support

Overview

From v1.12.0, PyLops supports computations on GPUs powered by CuPy (cupy-cudaXX>=13.0.0). This library must be installed before PyLops is installed.

From v2.3.0, PyLops supports also computations on GPUs/TPUs powered by JAX. This library must be installed before PyLops is installed.

Note

Set environment variables CUPY_PYLOPS=0 and/or JAX_PYLOPS=0 to force PyLops to ignore cupy and jax backends. This can be also used if a previous version of cupy or jax is installed in your system, otherwise you will get an error when importing PyLops.

Apart from a few exceptions, all operators and solvers in PyLops can seamlessly work with numpy arrays on CPU as well as with cupy/jax arrays on GPU. For CuPy, users simply need to consistently create operators and provide data vectors to the solvers, e.g., when using pylops.MatrixMult the input matrix must be a cupy array if the data provided to a solver is also cupy array. For JAX, apart from following the same procedure described for CuPy, the PyLops operator must be also wrapped into a pylops.JaxOperator.

See below for a comphrensive list of supported operators and additional functionalities for both the cupy and jax backends.

Install dependencies

GPU-enabled development environments can created using conda

>> make dev-install_conda_gpu

or pip

>> make dev-install_gpu

Examples

Let’s now briefly look at some use cases.

End-to-end GPU powered inverse problems

First we consider the most common scenario when both the model and data vectors fit onto the GPU memory. We can therefore simply replace all our numpy arrays with cupy arrays and solve the inverse problem of interest end-to-end on the GPU.

_images/cupy_diagram.png

Let’s first write a code snippet using numpy arrays, which PyLops will run on your CPU:

ny, nx = 400, 400
G = np.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = np.ones(nx, dtype=np.float32)

# Create operator
Gop = MatrixMult(G, dtype='float32')

# Create data and invert
y = Gop @ x
xest = Gop / y

Now we write a code snippet using cupy arrays, which PyLops will run on your GPU:

ny, nx = 400, 400
G = cp.random.normal(0, 1, (ny, nx)).astype(np.float32)
x = cp.ones(nx, dtype=np.float32)

# Create operator
Gop = MatrixMult(G, dtype='float32')

# Create data and invert
y = Gop @ x
xest = Gop / y

The code is almost unchanged apart from the fact that we now use cupy arrays, PyLops will figure this out.

Similarly, we write a code snippet using jax arrays which PyLops will run on your GPU/TPU:

ny, nx = 400, 400
G = jnp.array(np.random.normal(0, 1, (ny, nx)).astype(np.float32))
x = jnp.ones(nx, dtype=np.float32)

# Create operator
Gop = JaxOperator(MatrixMult(G, dtype='float32'))

# Create data and invert
y = Gop @ x
xest = Gop / y

# Adjoint via AD
xadj = Gop.rmatvecad(x, y)

Again, the code is almost unchanged apart from the fact that we now use jax arrays.

Mixed CPU-GPU powered inverse problems

Let us now consider a more intricate scenario where we have access to a GPU-powered operator, however the model and/or data vectors are too large to fit onto the GPU memory (or VRAM).

For the sake of clarity, we consider a problem where the operator can be written as a pylops.BlockDiag of PyLops operators. Note how, by simply sandwiching any of the GPU-powered operator within two pylops.ToCupy operators, we are able to tell PyLops to transfer to the GPU only the part of the model vector required by a given operator and transfer back the output to the CPU before forming the combine output vector (i.e., the output vector of the pylops.BlockDiag).

_images/numpy_cupy_bd_diagram.png
nops, n = 5, 4
Ms = [np.diag((i + 1) * np.ones(n, dtype=dtype)) \
         for i in range(nops)]
Ms = [M.T @ M for M in Ms]

# Create operator
Mops = []
for iop in range(nops):
   Mop = MatrixMult(cp.asarray(Ms[iop], dtype=dtype))
   Top = ToCupy(Mop.dims, dtype=dtype)
   Top1 = ToCupy(Mop.dimsd, dtype=dtype)
   Mop = Top1.H @ Mop @ Top
   Mops.append(Mop)
Mops = BlockDiag(Mops, forceflat=True)

# Create data and invert
x = np.ones(n * nops, dtype=dtype)
y = Mops @ x.ravel()
xest = Mops / y

Finally, let us consider a problem where the operator can be written as a pylops.VStack of PyLops operators and the model vector can be fully transferred to the GPU. We can use again the pylops.ToCupy operator, however this time we will only use it to move the output of each operator to the CPU. Since we are now in a special scenario, where the input of the overall operator sits on the GPU and the output on the CPU, we need to inform the pylops.VStack operator about this. This can be easily done using the additional inoutengine parameter. Let’s see this with an example.

_images/numpy_cupy_vs_diagram.png
nops, n, m = 3, 4, 5
Ms = [np.random.normal(0, 1, (n, m)) for _ in range(nops)]

# Create operator
Mops = []
for iop in range(nops):
   Mop = MatrixMult(cp.asarray(Ms[iop]), dtype=dtype)
   Top1 = ToCupy(Mop.dimsd, dtype=dtype)
   Mop = Top1.H @ Mop
   Mops.append(Mop)
Mops = VStack(Mops, inoutengine=("numpy", "cupy"))

# Create data and invert
x = cp.ones(m, dtype=dtype)
y = Mops @ x.ravel()
xest = pylops_cgls(Mops, y, x0=cp.zeros_like(x))[0]

These features are currently not available for jax arrays.

Note

More examples for the CuPy and JAX backends be found at link1 and link2.

Supported Operators

In the following, we provide a list of methods in pylops.LinearOperator with their current status (available on CPU, GPU with CuPy, and GPU with JAX):

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.LinearOperator.cond

🔴

🔴

pylops.LinearOperator.conj

pylops.LinearOperator.div

pylops.LinearOperator.eigs

🔴

🔴

pylops.LinearOperator.todense

pylops.LinearOperator.tosparse

🔴

🔴

pylops.LinearOperator.trace

🔴

🔴

Similarly, we provide a list of operators with their current status.

Basic operators:

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.basicoperators.MatrixMult

pylops.basicoperators.Identity

pylops.basicoperators.Zero

pylops.basicoperators.Diagonal

pylops.basicoperators.Transpose

pylops.basicoperators.Flip

pylops.basicoperators.Roll

pylops.basicoperators.Pad

pylops.basicoperators.Sum

pylops.basicoperators.Symmetrize

pylops.basicoperators.Restriction

pylops.basicoperators.Regression

pylops.basicoperators.LinearRegression

pylops.basicoperators.CausalIntegration

pylops.basicoperators.Spread

🔴

🔴

pylops.basicoperators.VStack

pylops.basicoperators.HStack

pylops.basicoperators.Block

pylops.basicoperators.BlockDiag

Smoothing and derivatives:

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.basicoperators.FirstDerivative

pylops.basicoperators.SecondDerivative

pylops.basicoperators.Laplacian

pylops.basicoperators.Gradient

pylops.basicoperators.FirstDirectionalDerivative

pylops.basicoperators.SecondDirectionalDerivative

Signal processing:

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.signalprocessing.Convolve1D

⚠️

pylops.signalprocessing.Convolve2D

pylops.signalprocessing.ConvolveND

pylops.signalprocessing.NonStationaryConvolve1D

pylops.signalprocessing.NonStationaryFilters1D

pylops.signalprocessing.NonStationaryConvolve2D

🔴

pylops.signalprocessing.NonStationaryFilters2D

🔴

pylops.signalprocessing.Interp

pylops.signalprocessing.Bilinear

🔴

pylops.signalprocessing.FFT

pylops.signalprocessing.FFT2D

pylops.signalprocessing.FFTND

pylops.signalprocessing.Shift

pylops.signalprocessing.DWT

🔴

🔴

pylops.signalprocessing.DWT2D

🔴

🔴

pylops.signalprocessing.DCT

🔴

🔴

pylops.signalprocessing.Seislet

🔴

🔴

pylops.signalprocessing.Radon2D

🔴

🔴

pylops.signalprocessing.Radon3D

🔴

🔴

pylops.signalprocessing.FourierRadon2D

🔴

pylops.signalprocessing.FourierRadon3D

🔴

pylops.signalprocessing.ChirpRadon2D

pylops.signalprocessing.ChirpRadon3D

pylops.signalprocessing.PWSprayer2D

🔴

🔴

pylops.signalprocessing.PWSmoother2D

🔴

🔴

pylops.signalprocessing.Sliding1D

🔴

pylops.signalprocessing.Sliding2D

🔴

pylops.signalprocessing.Sliding3D

🔴

pylops.signalprocessing.Patch2D

🔴

pylops.signalprocessing.Patch3D

🔴

pylops.signalprocessing.Fredholm1

Wave-Equation processing

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.avo.avo.PressureToVelocity

pylops.avo.avo.UpDownComposition2D

pylops.avo.avo.UpDownComposition3D

pylops.avo.avo.BlendingContinuous

pylops.avo.avo.BlendingGroup

pylops.avo.avo.BlendingHalf

pylops.avo.avo.MDC

pylops.avo.avo.Kirchhoff

🔴

🔴

pylops.avo.avo.AcousticWave2D

🔴

🔴

Geophysical subsurface characterization:

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.avo.avo.AVOLinearModelling

pylops.avo.poststack.PoststackLinearModelling

pylops.avo.prestack.PrestackLinearModelling

⚠️

pylops.avo.prestack.PrestackWaveletModelling

⚠️

Medical:

Operator/method

CPU

GPU with CuPy

GPU/TPU with JAX

pylops.medical.ct.CT2D

pylops.medical.mri.MRI2D

🔴

Warning

1. The JAX backend of the pylops.signalprocessing.Convolve1D operator currently works only with 1d-arrays due to a different behaviour of scipy.signal.convolve and jax.scipy.signal.convolve with nd-arrays.

2. The JAX backend of the pylops.avo.prestack.PrestackLinearModelling operator currently works only with explicit=True due to the same issue as in point 1 for the pylops.signalprocessing.Convolve1D operator employed when explicit=False.