Calculus and Derivatives

import numpy as np
import matplotlib.pyplot as pl
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('svg')

1 Derivatives

1.1 Functions

  • A function can take zero or more inputs

  • A function can return a value of any dimension (a scalar, a vector, a matrix, and so on)

Some Definitions

  • Arity of a function

    The number of arguments a function expects is called the arity of the function. A function \(f(x, y)\) has an arity of 2. Note that each argument can itself be a vector over multiple dimensions. A function can have zero arity, in which case it returns a constant value (see f1 below).

  • Scalar functions

    A scalar function returns a single real number as its value.

  • Potential field

    A potential field is a scalar function defined at every point of a vector space, \(f:\mathbb{R}^n\to\mathbb{R}\).

  • Vector field

    A vector field assigns a vector to every point of the space, \(f:\mathbb{R}^n\to\mathbb{R}^n\).

def f1():
    # Arity 0: always returns a constant.
    return 3.1415

f1()
3.1415
def f2(x):
    # Arity 1: a scalar function of one variable.
    return x**2 + 3*x + 15

f2(0.5)
16.75
def f3(x, y):
    # Arity 2: a scalar function of two variables (a potential field).
    return (x-1)**2 + (y-2)**2

f3(0.5, -1.5)
12.5
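
f3 above is a scalar function, in fact a potential field over \(\mathbb{R}^2\). For contrast, here is a minimal sketch of a vector field, which returns a vector at every point (the name rotation_field is ours, not part of the notes):

def rotation_field(x, y):
    # A vector field R^2 -> R^2: attaches the vector (-y, x) to the point (x, y).
    return np.array([-y, x])

rotation_field(1.0, 2.0)   # array([-2.,  1.])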

1.2 Derivatives

Given a function \(f:\mathbb{R}\to\mathbb{R}\), the derivative is given by:

\[ f'(x) = \lim_{h \rightarrow 0} \frac{f(x+h) - f(x)}{h} \]

There are several notations for derivatives:

\[ f'(x) = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = Df(x) = D_x f(x) \]
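
As a quick sanity check, the limit can be approximated numerically with a small step \(h\) (a minimal sketch; the helper name numeric_derivative and the choice of \(h\) are ours):

def numeric_derivative(f, x, h=1e-6):
    # Forward difference, mirroring the limit definition above.
    return (f(x + h) - f(x)) / h

numeric_derivative(f2, 0.5)   # close to the exact value 2*0.5 + 3 = 4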

1.3 Derivatives of elementary functions

  • \(DC = 0\), where \(C\) is a constant
  • \(Dx^n = nx^{n-1}\); note that \(n\) need not be an integer
  • \(De^x = e^x\)
  • \(D\ln(x) = 1/x\)
  • \(D\sin(x) = \cos(x)\)
  • \(D\cos(x) = -\sin(x)\)
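
Each of these rules can be spot-checked with the numeric_derivative helper from above (a sketch; the evaluation point is arbitrary):

x0 = 0.7
print(numeric_derivative(np.sin, x0), np.cos(x0))   # D sin(x) = cos(x)
print(numeric_derivative(np.exp, x0), np.exp(x0))   # D e^x = e^x
print(numeric_derivative(np.log, x0), 1 / x0)       # D ln(x) = 1/x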

1.4 Derivatives of arithmetic combinations of functions

Scaling by constant: \(D(c\cdot f) = c\cdot(Df)\)

Addition: \(D(f + g) = Df + Dg\)

Multiplication (product rule): \(D(f\cdot g) = (Df)\cdot g + f\cdot(Dg)\)

Quotients (quotient rule):

\[ D\left[\frac{f}{g}\right] = \frac{g(x)\cdot Df(x) - f(x)\cdot Dg(x)}{g(x)^2} \]
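
For example, the product rule can be verified at a point (a sketch, again using numeric_derivative; the functions are chosen arbitrarily):

p = lambda x: x**2
q = np.sin
x0 = 1.3

lhs = numeric_derivative(lambda x: p(x) * q(x), x0)
rhs = numeric_derivative(p, x0) * q(x0) + p(x0) * numeric_derivative(q, x0)
lhs, rhs   # the two values agree up to numerical error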

1.5 Derivatives of composition of functions

Composition: consider \(f(g(x))\). Let’s define:

  • \(u = g(x)\)
  • \(y = f(u)\)

Then, we have:

\[ \frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} \]

So, we have:

\[ D(f\circ g)(x) = (Df)(g(x))\cdot (Dg)(x) \]

This is known as the chain rule, and it is sometimes written as:

\[(f\circ g)'(x) = f'(g(x))\cdot g'(x)\]
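
A quick numerical check of the chain rule with \(f(u) = \sin u\) and \(g(x) = x^2\) (a sketch; the names outer and inner are ours):

outer = np.sin                  # f(u)
inner = lambda x: x**2          # g(x)
x0 = 0.8

lhs = numeric_derivative(lambda x: outer(inner(x)), x0)
rhs = np.cos(inner(x0)) * 2*x0  # f'(g(x)) * g'(x)
lhs, rhs                        # both close to cos(0.64) * 1.6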

2 Understanding derivatives

2.1 What’s a derivative?

Think of the slope as a direction to move along the \(x\)-axis in order to increase the value of \(f(x)\): where \(f'(x) > 0\), increasing \(x\) increases \(f\); where \(f'(x) < 0\), decreasing \(x\) increases \(f\).

x = np.linspace(-10, 10)
y = f2(x)
pl.figure(figsize=(4,4))
pl.plot(x, y);

def f2_der(x):
    # The exact derivative of f2: D(x^2 + 3x + 15) = 2x + 3.
    return 2*x + 3

y_der = f2_der(x)
# Dashed: f2 itself; solid: its derivative.
pl.plot(x, y, '--', x, y_der);

#
# This is a line going through (x0, y0) with slope m
#
def line(x, x0, y0, m):
    b = y0 - m * x0
    return m * x + b
x = np.linspace(-10, 10)
y = f2(x)
# Tangent lines to f2 at x = -5 and x = 2.5.
x0, y0, m0 = -5, f2(-5), f2_der(-5)
x1, y1, m1 = 2.5, f2(2.5), f2_der(2.5)

pl.plot(x, y, '--', 
        x, line(x, x0, y0, m0), '-',
        x, line(x, x1, y1, m1));

2.2 Partial Derivative

The definition:

\[\frac{\partial y}{\partial x_i} = \lim_{h \rightarrow 0} \frac{f(x_1, \ldots, x_{i-1}, x_i+h, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}\]

It is expressed by several equivalent notations:

\[\frac{\partial y}{\partial x_i} = \frac{\partial f}{\partial x_i} = f'_{x_i} = f'_i = D_i f = D_{x_i} f\]
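
Numerically, a partial derivative perturbs one coordinate while holding the rest fixed. A minimal sketch for f3 from above (the helper name partial_x is ours):

def partial_x(f, x, y, h=1e-6):
    # Perturb only the first argument; the second stays fixed.
    return (f(x + h, y) - f(x, y)) / h

partial_x(f3, 0.5, -1.5)   # exact value: 2*(0.5 - 1) = -1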

2.3 Gradients

Suppose \(f:\mathbb{R}^n\to\mathbb{R}\) is a potential function over a vector space \(\mathbb{R}^n\).

We write the function as \(f(\mathbf{x})\), with input vector \(\mathbf{x} = (x_1, x_2, \dots, x_n)\). The gradient of \(f\) collects all of its partial derivatives into a vector:

\[ \nabla f(\mathbf{x}) = \left[ \begin{array}{c} \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \frac{\partial f(\mathbf{x})}{\partial x_2} \\ \vdots \\ \frac{\partial f(\mathbf{x})}{\partial x_n} \end{array} \right] \]
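
The gradient simply stacks the partial derivatives into a vector. A sketch for \(n = 2\) (numeric_gradient is our name):

def numeric_gradient(f, x, y, h=1e-6):
    # One finite difference per coordinate.
    dfdx = (f(x + h, y) - f(x, y)) / h
    dfdy = (f(x, y + h) - f(x, y)) / h
    return np.array([dfdx, dfdy])

numeric_gradient(f3, 0.5, -1.5)   # close to [2*(0.5-1), 2*(-1.5-2)] = [-1, -7]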

2.4 Chain Rules

Recall the basic form of the chain rule for a function of a single variable:

  • \(y = f(u)\) where \(f:\mathbb{R}\to\mathbb{R}\)
  • \(u = g(x)\) where \(g:\mathbb{R}\to\mathbb{R}\)

Then,

\[\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} \]

Now consider the general form for multivariable functions:

  • \(y = f(u_1, u_2, u_3, \dots, u_m)\)
  • \(u_i = g_i(x_1, x_2, \dots, x_n)\)

Then,

\[\frac{\partial y}{\partial x_i} = \sum_{j=1}^m \frac{\partial y}{\partial u_j}\frac{\partial u_j}{\partial x_i} \]
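
A numerical check for \(m = 2\), \(n = 1\): take \(y = u_1 u_2\) with \(u_1 = x^2\) and \(u_2 = \sin x\) (a sketch; the functions are chosen arbitrarily):

def y_of_x(x):
    u1, u2 = x**2, np.sin(x)
    return u1 * u2

x0, h = 0.9, 1e-6
lhs = (y_of_x(x0 + h) - y_of_x(x0)) / h

# Sum over the intermediates: dy/du1 * du1/dx + dy/du2 * du2/dx
rhs = np.sin(x0) * 2*x0 + x0**2 * np.cos(x0)
lhs, rhs   # agree up to numerical error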

2.5 Gradient of potential field

def g(x, y):
    # The same potential field as f3 above.
    return (x-1)**2 + (y-2)**2
# Plot the field in 2D as a contour plot.
def contour():
    xs = np.linspace(-4, 4, 100)
    ys = np.linspace(-4, 4, 100)
    xx, yy = np.meshgrid(xs, ys)
    z = g(xx, yy)
    pl.contour(xx, yy, z, levels=20);

pl.figure(figsize=(4,4))
contour()

from matplotlib import cm

def bowl(ax):
    xs = np.linspace(-4, 4, 100)
    ys = np.linspace(-4, 4, 100)
    xx, yy = np.meshgrid(xs, ys)
    z = g(xx, yy)
    ax.plot_surface(xx, yy, z, cmap=cm.coolwarm);
    
fig = pl.figure(figsize=(7,7))
ax = fig.add_subplot(projection='3d')

bowl(ax)

def g_der_x(x, y):
    # Partial derivative of g with respect to x.
    return 2*(x-1)
def g_der_y(x, y):
    # Partial derivative of g with respect to y.
    return 2*(y-2)

# Evaluate the gradient on a coarse grid for the quiver plot.
xs = np.linspace(-4, 4, 10)
ys = np.linspace(-4, 4, 10)
xx, yy = np.meshgrid(xs, ys)

u = g_der_x(xx, yy)
v = g_der_y(xx, yy)

fig = pl.figure(figsize=(12,6))
fig.add_subplot(1, 2, 1)
contour()
pl.quiver(xx, yy, u, v);

ax = fig.add_subplot(1, 2, 2, projection='3d')
bowl(ax);

2.6 Interpretation of gradient

Let \(\mathbf{v} = \nabla f(\mathbf{x})\). Then:

  1. The direction of \(\mathbf{v}\) is the direction of steepest ascent: a step \(\Delta\mathbf{x}\) taken along \(\mathbf{v}\) maximizes the increase in \(f(\mathbf{x}+\Delta\mathbf{x})\).

  2. The magnitude \(\|\mathbf{v}\|\) tells us how much change to expect from a step of length 1, i.e. \(\|\Delta\mathbf{x}\| = 1\), assuming the linear approximation of \(f\) at \(\mathbf{x}\) holds over the entire step.
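
We can verify the first claim empirically: among many unit-length steps away from a point, the step along the gradient increases g the most (a minimal sketch using g from above):

x0, y0 = 0.5, -1.5
grad = np.array([2*(x0 - 1), 2*(y0 - 2)])   # analytic gradient of g

# Try unit steps in 360 directions; the gradient direction should win.
thetas = np.linspace(0, 2*np.pi, 360, endpoint=False)
changes = [g(x0 + np.cos(t), y0 + np.sin(t)) - g(x0, y0) for t in thetas]
best = thetas[np.argmax(changes)]

np.array([np.cos(best), np.sin(best)]), grad / np.linalg.norm(grad)
# The best unit step points (approximately) along the normalized gradient.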