{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nLinear Regression\n=================\n\nThis example shows how to use the :py:class:`pylops.LinearRegression` operator\nto perform *Linear regression analysis*.\n\nIn short, linear regression is the problem of finding the best fitting\ncoefficients, namely intercept $\\mathbf{x_0}$ and gradient\n$\\mathbf{x_1}$, for this equation:\n\n    .. math::\n        y_i = x_0 + x_1 t_i   \\qquad \\forall i=1,2,...,N\n\nAs we can express this problem in a matrix form:\n\n    .. math::\n        \\mathbf{y}=  \\mathbf{A} \\mathbf{x}\n\nour solution can be obtained by solving the following optimization problem:\n\n    .. math::\n        J= ||\\mathbf{y} - \\mathbf{A} \\mathbf{x}||_2\n\nSee documentation of :py:class:`pylops.LinearRegression` for more detailed\ndefinition of the forward problem.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport matplotlib.pyplot as plt\n\nimport pylops\n\nplt.close('all')\nnp.random.seed(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define the input parameters: number of samples along the t-axis (``N``),\nlinear regression coefficients (``x``), and standard deviation of noise\nto be added to data (``sigma``).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "N = 30\nx = np.array([1., 2.])\nsigma = 1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's create the time axis and initialize the\n:py:class:`pylops.LinearRegression` operator\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "t = np.arange(N, dtype='float64')\nLRop = pylops.LinearRegression(t, dtype='float64')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can then apply the operator in forward mode to compute our data points\nalong the x-axis (``y``). We will also generate some random gaussian noise\nand create a noisy version of the data (``yn``).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "y = LRop*x\nyn = y + np.random.normal(0, sigma, N)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We are now ready to solve our problem. As we are using an operator from the\n:py:class:`pylops.LinearOperator` family, we can simply use ``/``,\nwhich in this case will solve the system by means of an iterative solver\n(i.e., :py:func:`scipy.sparse.linalg.lsqr`).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "xest = LRop / y\nxnest = LRop / yn"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's plot the best fitting line for the case of noise free and noisy data\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(5, 7))\nplt.plot(np.array([t.min(), t.max()]),\n         np.array([t.min(), t.max()]) * x[1] + x[0], 'k', lw=4,\n         label=r'true: $x_0$ = %.2f, $x_1$ = %.2f' % (x[0], x[1]))\nplt.plot(np.array([t.min(), t.max()]),\n         np.array([t.min(), t.max()]) * xest[1] + xest[0], '--r', lw=4,\n         label=r'est noise-free: $x_0$ = %.2f, $x_1$ = %.2f'\n         %(xest[0], xest[1]))\nplt.plot(np.array([t.min(), t.max()]),\n         np.array([t.min(), t.max()]) * xnest[1] + xnest[0], '--g', lw=4,\n         label=r'est noisy: $x_0$ = %.2f, $x_1$ = %.2f'\n         %(xnest[0], xnest[1]))\nplt.scatter(t, y, c='r', s=70)\nplt.scatter(t, yn, c='g', s=70)\nplt.legend()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once that we have estimated the best fitting coefficients $\\mathbf{x}$\nwe can now use them to compute the *y values* for a different set of values\nalong the *t-axis*.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "t1 = np.linspace(-N, N, 2*N, dtype='float64')\ny1 = LRop.apply(t1, xest)\n\nplt.figure(figsize=(5, 7))\nplt.plot(t, y, 'k', label='Original axis')\nplt.plot(t1, y1, 'r', label='New axis')\nplt.scatter(t, y, c='k', s=70)\nplt.scatter(t1, y1, c='r', s=40)\nplt.legend()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We consider now the case where some of the observations have large errors.\nSuch elements are generally referred to as *outliers* and can affect the\nquality of the least-squares solution if not treated with care. In this\nexample we will see how using a L1 solver such as\n:py:func:`pylops.optimization.sparsity.IRLS` can drammatically improve the\nquality of the estimation of intercept and gradient.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Add outliers\nyn[1] += 40\nyn[N-2] -= 20\n\n# IRLS\nnouter = 20\nepsR = 1e-2\nepsI = 0\ntolIRLS = 1e-2\n\nxnest = LRop / yn\nxirls, nouter, xirls_hist, rw_hist = \\\n    pylops.optimization.sparsity.IRLS(LRop, yn, nouter, threshR=False,\n                                      epsR=epsR, epsI=epsI,\n                                      tolIRLS=tolIRLS, returnhistory=True)\nprint('IRLS converged at %d iterations...' % nouter)\n\nplt.figure(figsize=(5, 7))\nplt.plot(np.array([t.min(), t.max()]),\n         np.array([t.min(), t.max()]) * x[1] + x[0], 'k', lw=4,\n         label=r'true: $x_0$ = %.2f, $x_1$ = %.2f' % (x[0], x[1]))\nplt.plot(np.array([t.min(), t.max()]),\n         np.array([t.min(), t.max()]) * xnest[1] + xnest[0], '--r', lw=4,\n         label=r'L2: $x_0$ = %.2f, $x_1$ = %.2f' %\n         (xnest[0], xnest[1]))\nplt.plot(np.array([t.min(), t.max()]),\n         np.array([t.min(), t.max()]) * xirls[1] + xirls[0], '--g', lw=4,\n         label=r'L1 - IRSL: $x_0$ = %.2f, $x_1$ = %.2f' %\n         (xirls[0], xirls[1]))\nplt.scatter(t, y, c='r', s=70)\nplt.scatter(t, yn, c='g', s=70)\nplt.legend()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's finally take a look at the convergence of IRLS. First we visualize\nthe evolution of intercept and gradient\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "fig, axs = plt.subplots(2, 1, figsize=(8, 10))\nfig.suptitle('IRLS evolution', fontsize=14,\n             fontweight='bold', y=0.95)\naxs[0].plot(xirls_hist[:, 0], xirls_hist[:, 1], '.-k', lw=2, ms=20)\naxs[0].scatter(x[0], x[1], c='r', s=70)\naxs[0].set_title('Intercept and gradient')\naxs[0].grid()\nfor iiter in range(nouter):\n    axs[1].semilogy(rw_hist[iiter],\n                    color=(iiter/nouter, iiter/nouter, iiter/nouter),\n                    label='iter%d' %iiter)\naxs[1].set_title('Weights')\naxs[1].legend(loc=5, fontsize='small')\nplt.tight_layout()\nplt.subplots_adjust(top=0.8)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}