{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nPolynomial Regression\n=====================\n\nThis example shows how to use the :py:class:`pylops.Regression` operator\nto perform *Polynomial regression analysis*.\n\nIn short, polynomial regression is the problem of finding the best fitting\ncoefficients for the following equation:\n\n    .. math::\n        y_i = \\sum_{n=0}^{order} x_n t_i^n  \\qquad \\forall i=1,2,...,N\n\nAs we can express this problem in a matrix form:\n\n    .. math::\n        \\mathbf{y}=  \\mathbf{A} \\mathbf{x}\n\nour solution can be obtained by solving the following optimization problem:\n\n    .. math::\n        J= ||\\mathbf{y} - \\mathbf{A} \\mathbf{x}||_2\n\nSee documentation of :py:class:`pylops.Regression` for more detailed\ndefinition of the forward problem.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nimport matplotlib.pyplot as plt\n\nimport pylops\n\nplt.close('all')\nnp.random.seed(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define the input parameters: number of samples along the t-axis (``N``),\norder (``order``), regression coefficients (``x``), and standard deviation\nof noise to be added to data (``sigma``).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "N = 30\norder = 3\nx = np.array([1., .05, 0., -.01])\nsigma = 1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's create the time axis and initialize the\n:py:class:`pylops.Regression` operator\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "t = np.arange(N, dtype='float64') - N//2\nPRop = pylops.Regression(t, order=order, dtype='float64')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can then apply the operator in forward mode to compute our data points\nalong the x-axis (``y``). We will also generate some random gaussian noise\nand create a noisy version of the data (``yn``).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "y = PRop*x\nyn = y + np.random.normal(0, sigma, N)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We are now ready to solve our problem. As we are using an operator from the\n:py:class:`pylops.LinearOperator` family, we can simply use ``/``,\nwhich in this case will solve the system by means of an iterative solver\n(i.e., :py:func:`scipy.sparse.linalg.lsqr`).\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "xest = PRop / y\nxnest = PRop / yn"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's plot the best fitting curve for the case of noise free and noisy data\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "plt.figure(figsize=(5, 7))\nplt.plot(t, PRop*x, 'k', lw=4,\n         label=r'true: $x_0$ = %.2f, $x_1$ = %.2f, '\n               r'$x_2$ = %.2f, $x_3$ = %.2f' % (x[0], x[1], x[2], x[3]))\nplt.plot(t, PRop*xest, '--r', lw=4,\n         label='est noise-free: $x_0$ = %.2f, $x_1$ = %.2f, '\n               r'$x_2$ = %.2f, $x_3$ = %.2f' %\n               (xest[0], xest[1], xest[2], xest[3]))\nplt.plot(t, PRop*xnest, '--g', lw=4,\n         label='est noisy: $x_0$ = %.2f, $x_1$ = %.2f, '\n               r'$x_2$ = %.2f, $x_3$ = %.2f' %\n               (xnest[0], xnest[1], xnest[2], xnest[3]))\nplt.scatter(t, y, c='r', s=70)\nplt.scatter(t, yn, c='g', s=70)\nplt.legend(fontsize='x-small')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We consider now the case where some of the observations have large errors.\nSuch elements are generally referred to as *outliers* and can affect the\nquality of the least-squares solution if not treated with care. In this\nexample we will see how using a L1 solver such as\n:py:func:`pylops.optimization.sparsity.IRLS` can drammatically improve the\nquality of the estimation of intercept and gradient.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Add outliers\nyn[1] += 40\nyn[N-2] -= 20\n\n# IRLS\nnouter = 20\nepsR = 1e-2\nepsI = 0\ntolIRLS = 1e-2\n\nxnest = PRop / yn\nxirls, nouter, xirls_hist, rw_hist = \\\n    pylops.optimization.sparsity.IRLS(PRop, yn, nouter, threshR=False,\n                                      epsR=epsR, epsI=epsI,\n                                      tolIRLS=tolIRLS, returnhistory=True)\nprint('IRLS converged at %d iterations...' % nouter)\n\nplt.figure(figsize=(5, 7))\nplt.plot(t, PRop*x, 'k', lw=4,\n         label=r'true: $x_0$ = %.2f, $x_1$ = %.2f, '\n               r'$x_2$ = %.2f, $x_3$ = %.2f' % (x[0], x[1], x[2], x[3]))\nplt.plot(t, PRop*xnest, '--r', lw=4,\n         label=r'L2: $x_0$ = %.2f, $x_1$ = %.2f, '\n               r'$x_2$ = %.2f, $x_3$ = %.2f' % (xnest[0], xnest[1], xnest[2], xnest[3]))\nplt.plot(t, PRop*xirls, '--g', lw=4,\n         label=r'IRLS: $x_0$ = %.2f, $x_1$ = %.2f, '\n               r'$x_2$ = %.2f, $x_3$ = %.2f' % (xirls[0], xirls[1], xirls[2], xirls[3]))\nplt.scatter(t, y, c='r', s=70)\nplt.scatter(t, yn, c='g', s=70)\nplt.legend(fontsize='x-small')"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}