6  General functions

6.1 Derivative matrix

A function \(\fb:\Real^n\to\Real^m\) is differentiable at \(\xb\in\Real^n\) if there exists an \(m\times n\) matrix \(M\) and a map \(\Rb:\Real^n\to\Real^m\) such that \[ \fb(\xb + \hb) = \fb(\xb) + M\hb + \Rb(\hb) \quad \textrm{for all $\hb\in\Real^n$} \] and \[ \lim_{\hb\to\bfzero}\frac{\Rb(\hb)}{|\hb|} = \bfzero. \]

This definition says that \(\fb\) looks locally like a linear map. The second condition ensures that the nonlinear remainder \(\Rb(\hb)\) goes to zero faster than linear, as you look closer and closer to \(\xb\).

Meaning of differentiability.

The definition should be familiar from Calculus I, where you saw the case \(m=1\). A function \(\fb:\Real^n\to\Real^m\) is effectively just \(m\) functions from \(\Real^n\) to \(\Real\), given by the components \(f_1, f_2, \ldots, f_m\) in the standard basis.

If \(M\) exists then it is called the (total) derivative of \(\fb\) at \(\xb\).

Example. Consider the linear function \(\fb(x,y) = \begin{pmatrix}x-y\\ x+2y\end{pmatrix}\) at the point \(\xb=(2,1)\). For any \(\hb\), we have \[\begin{align*} \fb(\xb+\hb) &= \fb(2+h_1, 1+h_2)\\ &= \begin{pmatrix}h_1-h_2+1\\h_1 + 2h_2 + 4\end{pmatrix}\\ &= \begin{pmatrix}1\\4\end{pmatrix} + \begin{pmatrix}1 & -1\\1 & 2\end{pmatrix}\begin{pmatrix}h_1\\h_2\end{pmatrix}\\ &= \fb(2,1) + \begin{pmatrix}1 & -1\\1 & 2\end{pmatrix}\begin{pmatrix}h_1\\h_2\end{pmatrix}\\ &= \fb(\xb) + M\hb \quad \textrm{with}\quad M = \begin{pmatrix}1 & -1\\1 & 2\end{pmatrix}. \end{align*}\] Since \(\fb\) is linear, the remainder term is \(\Rb(\hb) = \bfzero\) (identically), so \(\fb\) is differentiable at this point.

Proposition 6.1 If \(\fb:\Real^n\to\Real^m\) is differentiable, then the derivative \(M\) is given by the Jacobian matrix \[ J_{\fb} = \begin{pmatrix} \Big| & \Big| & & \Big|\\ \displaystyle\ddy{\fb}{x_1} & \displaystyle\ddy{\fb}{x_2} & \cdots & \displaystyle\ddy{\fb}{x_n}\\ \Big| & \Big| & & \Big| \end{pmatrix} = \begin{pmatrix} \displaystyle\ddy{f_1}{x_1} & \displaystyle\ddy{f_1}{x_2} & \cdots & \displaystyle\ddy{f_1}{x_n}\\ \displaystyle\ddy{f_2}{x_1} & \displaystyle\ddy{f_2}{x_2} & \cdots & \displaystyle\ddy{f_2}{x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \displaystyle\ddy{f_m}{x_1} & \displaystyle\ddy{f_m}{x_2} & \cdots & \displaystyle\ddy{f_m}{x_n}\\ \end{pmatrix}. \]

Proof. Suppose we take \(\hb = h\eb_j\), so that \(|\hb|=|h|\). Then since \(\fb\) is differentiable at \(\xb\), we must have \[\begin{align*} \lim_{h\to 0}\frac{\Rb(\hb)}{h} = \bfzero &\iff \lim_{h\to 0}\frac{1}{h}\Big(\fb(\xb + h\eb_j) - \fb(\xb) - hM\eb_j\Big) = \bfzero\\ &\iff \lim_{h\to 0}\frac{\fb(\xb + h\eb_j) - \fb(\xb)}{h} - M\eb_j = \bfzero\\ &\iff \ddy{\fb}{x_j}(\xb) - M\eb_j = \bfzero\\ &\iff \Big[\textrm{column $j$ of $M$}\Big] = \ddy{\fb}{x_j}(\xb). \end{align*}\]

Example. For the same linear function \(\fb\) as before, we have \(n=m=2\), so \(J_{\fb}\) will be a \(2\times 2\) matrix. We have \[ J_{\fb} = \begin{pmatrix} \displaystyle\ddy{f_1}{x} & \displaystyle\ddy{f_1}{y}\\ \displaystyle\ddy{f_2}{x} & \displaystyle\ddy{f_2}{y} \end{pmatrix} = \begin{pmatrix} 1 & -1\\ 1 & 2 \end{pmatrix}. \] Indeed this is the same matrix we found before.
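As a quick numerical sanity check (a sketch in Python with NumPy; `numerical_jacobian` is our own helper, not a library routine), we can approximate the Jacobian column by column with central differences and compare with the matrix above:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate J_f at x, one column per input variable, by central differences."""
    x = np.asarray(x, dtype=float)
    m, n = len(f(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

f = lambda x: np.array([x[0] - x[1], x[0] + 2 * x[1]])
print(numerical_jacobian(f, [2.0, 1.0]))   # [[ 1. -1.], [ 1.  2.]]
```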

Example. Let \(f(x,y) = x^2y\) and \(\xb=(0,0)\). Assuming \(f\) is differentiable, Proposition 6.1 gives \[ J_f = \begin{pmatrix}\displaystyle\ddy{f}{x} & \displaystyle\ddy{f}{y}\end{pmatrix} = \begin{pmatrix}2xy & x^2\end{pmatrix} \quad \implies J_f(0,0) = \begin{pmatrix}0 & 0\end{pmatrix}. \] So for small \(\hb\), we expect \[ f(\xb + \hb) \approx f(\xb) + J_f(0,0)\hb = 0 + \begin{pmatrix}0 & 0\end{pmatrix}\begin{pmatrix}h_1\\h_2\end{pmatrix} = 0. \] In other words, if we plot the scalar field as a surface with height \(z=f(x,y)\) and zoom in around \(\xb=(0,0)\), it should look more and more like the flat plane \(z=0\). We can check this with http://www.desmos.com (press + repeatedly to zoom in!).
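In the same spirit as zooming in on Desmos, a short numerical sketch (assuming NumPy) shows the remainder ratio \(f(\hb)/|\hb|\) shrinking as \(\hb\to\bfzero\):

```python
import numpy as np

f = lambda x, y: x**2 * y               # J_f(0,0) = (0 0), so R(h) = f(h)
for t in [1e-1, 1e-2, 1e-3]:
    h = np.array([t, t])
    print(t, f(*h) / np.linalg.norm(h))  # ratio -> 0: the surface flattens
```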

Note that \(J_f\) for a scalar field \(f:\Real^n\to\Real\) is just the transpose of \(\grad f\). For a vector field \(\fb:\Real^n\to\Real^n\), we have that \(\grad\cdot\fb = \mathrm{Tr}(J_\fb)\), where trace means the sum of diagonal entries in the matrix. For \(n=3\), the components of \(\grad\times\fb\) are combinations of the off-diagonal entries.

Example. Now let \(\fb(x,y,z) = \begin{pmatrix}xy^2 - x^2z\\ 3xy + z^3\end{pmatrix}\) and \(\xb = (2,-1,3)\). Again, assuming \(\fb\) is differentiable, Proposition 6.1 gives \[ J_{\fb} = \begin{pmatrix}\displaystyle\ddy{f_1}{x} & \displaystyle\ddy{f_1}{y} & \displaystyle\ddy{f_1}{z}\\ \displaystyle\ddy{f_2}{x} & \displaystyle\ddy{f_2}{y} & \displaystyle\ddy{f_2}{z}\end{pmatrix} = \begin{pmatrix}y^2-2xz & 2xy & -x^2\\ 3y & 3x & 3z^2\end{pmatrix}. \] So we expect \[\begin{align*} \fb(\xb+\hb) &\approx \fb(\xb) + J_{\fb}(\xb)\hb\\ &= \fb(2,-1,3) + \begin{pmatrix}-11 & -4 & -4\\ -3 & 6 & 27 \end{pmatrix} \begin{pmatrix} h_1\\h_2\\h_3 \end{pmatrix}\\ &= \begin{pmatrix}-10\\21\end{pmatrix} + \begin{pmatrix}-11h_1 - 4h_2 - 4h_3\\-3h_1 + 6h_2 + 27h_3\end{pmatrix} = \begin{pmatrix}-10 -11h_1 - 4h_2 - 4h_3\\21 -3h_1 + 6h_2 + 27h_3\end{pmatrix}. \end{align*}\] For example, with \(\hb = (0.1, 0.1, 0.1)\), this predicts \[ \fb(2.1, -0.9, 3.1) \approx \begin{pmatrix}-11.9\\24\end{pmatrix}, \] which compares well with the true value of the nonlinear function: \(\fb(2.1,-0.9,3.1) = \begin{pmatrix}-11.97\\24.12\end{pmatrix}\).

[Notice that the sizes of the entries in \(J_{\fb}\) also tell you that \(f_1\) is most sensitive to changes in \(x\), while \(f_2\) is most sensitive to changes in \(z\).]
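The comparison above is easy to reproduce numerically (a sketch assuming NumPy):

```python
import numpy as np

f = lambda x, y, z: np.array([x * y**2 - x**2 * z, 3 * x * y + z**3])
x0 = np.array([2.0, -1.0, 3.0])
J = np.array([[-11.0, -4.0, -4.0],    # Jacobian evaluated at (2, -1, 3)
              [-3.0,   6.0, 27.0]])
h = np.array([0.1, 0.1, 0.1])

print(f(*x0) + J @ h)   # linear prediction: [-11.9   24.  ]
print(f(*(x0 + h)))     # true value:        [-11.97  24.121]
```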

This local linear approximation for a differentiable function \(\fb\) near a point \(\xb\in\Real^n\) is called its linearisation.

Linearisation is an incredibly useful technique in applied (and pure) mathematics, because linear functions are much easier to deal with than nonlinear functions (as we shall see ourselves).

6.2 Continuous differentiability

You might think that the existence of \(J_\fb\), in other words the existence of all the partial derivatives, would guarantee that \(\fb\) is differentiable. But unfortunately, the converse of Proposition 6.1 fails!

Here is a counter-example: \(f(x,y) = x^{1/3}y^{1/3}\), considered at \(\xb=(0,0)\).

This time if we zoom in, we don’t see a flatter and flatter surface:

To prove this, consider first whether the partial derivatives exist at \((0,0)\). (If not, then \(f\) would definitely not be differentiable by Proposition 6.1.)

We need to use the limit definition, which gives \[ \ddy{f}{x}(0,0) = \lim_{h\to 0}\frac{f(h,0)-f(0,0)}{h} = 0, \qquad \ddy{f}{y}(0,0) = \lim_{h\to 0}\frac{f(0,h)-f(0,0)}{h} = 0. \] So \(J_f(0,0) = \begin{pmatrix}0 & 0\end{pmatrix}\), in particular it exists. [Graphically, because \(f=0\) along each of the coordinate axes.]

To show that \(f\) is not differentiable, we just need to find one path \(\hb\to\bfzero\) along which the remainder doesn't vanish faster than \(|\hb|\). Try the diagonal path \(\hb=h(\eb_1+\eb_2)\), so that \(h_1=h_2=h\) and \(|\hb|=\sqrt{2}\,|h|\). Then \[\begin{align*} \lim_{\hb\to \bfzero}\frac{R(\hb)}{|\hb|} &= \lim_{\hb\to\bfzero}\frac{1}{\sqrt{2}|h|}\Big(f(h_1,h_2) - f(0,0) - J_f(0,0)\hb\Big)\\ &= \lim_{h\to 0}\frac{f(h,h)}{\sqrt{2}|h|} = \lim_{h\to 0}\frac{h^{2/3}}{\sqrt{2}|h|} = \infty. \end{align*}\] So \(f\) is not differentiable at \((0,0)\). [In particular, we cannot locally approximate \(f\) by a linear function.]
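Numerically (the same style of sketch as before, assuming NumPy), the remainder ratio along the diagonal now grows instead of shrinking:

```python
import numpy as np

f = lambda x, y: np.cbrt(x) * np.cbrt(y)   # f(x,y) = x^(1/3) y^(1/3)
for t in [1e-1, 1e-2, 1e-3]:
    h = np.array([t, t])
    print(t, f(*h) / np.linalg.norm(h))    # ratio grows like |h|^(-1/3)
```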

A function \(\fb:\Real^n\to\Real^m\) is continuously differentiable at \(\xb\in\Real^n\) if all of its partial derivatives \(\displaystyle\ddy{f_i}{x_j}\) exist and are continuous at \(\xb\). Alternatively, we say that \(\fb\) is (of differentiability class) \(C^1\).

We have already seen this restriction in many of the results for scalar and vector fields.

More generally, \(f\) is of class \(C^k\) if all of its partial derivatives up to and including order \(k\) exist and are continuous functions. A \(C^\infty\) function is also known as smooth.

For example, swapping the order of partial derivatives \(\displaystyle\ddy{^2f}{x\partial y} = \ddy{^2f}{y\partial x}\) requires \(f\) to be (locally) \(C^2\).

Theorem 6.1 If \(\fb:\Real^n\to\Real^m\) is continuously differentiable near \(\xb\in\Real^n\), then \(\fb\) is differentiable at \(\xb\).

Proof (sketch). For clarity, consider \(f:\Real^2\to\Real\). Differentiability at \(\xb\) requires that \[\begin{align*} f(\xb + \hb) &\approx f(\xb) + \begin{pmatrix}\displaystyle\ddy{f}{x}(\xb) & \displaystyle\ddy{f}{y}(\xb)\end{pmatrix}\begin{pmatrix}h_1\\h_2\end{pmatrix}\\ &= f(\xb) + h_1\ddy{f}{x}(\xb) + h_2\ddy{f}{y}(\xb). \quad (\dagger) \end{align*}\]

The partial derivatives tell us what happens if we change one coordinate at a time.

Sketch for partial derivatives.

In particular, existence of \(\displaystyle\ddy{f}{x}\) at \(\xb\) gives us that \[ f(\xb + h_1\eb_1) \approx f(\xb) + h_1\ddy{f}{x}(\xb). \] Then, assuming \(\displaystyle\ddy{f}{y}\) exists at this point \(\xb+h_1\eb_1\), we also have \[\begin{align*} f(\xb + \hb) &\approx f(\xb + h_1\eb_1) + h_2\ddy{f}{y}(\xb + h_1\eb_1)\\ &\approx f(\xb) + h_1\ddy{f}{x}(\xb) + h_2\ddy{f}{y}(\xb + h_1\eb_1). \end{align*}\] For this to equal \((\dagger)\), we need that \(\displaystyle\ddy{f}{y}(\xb+h_1\eb_1)\to\ddy{f}{y}(\xb)\) as \(h_1\to 0\).

In other words, we need \(\displaystyle\ddy{f}{y}\) to be continuous, not merely to exist.

[A fully rigorous proof needs to deal explicitly with the remainder terms, but this is the intuition behind it.]


We have \[ \textrm{continuously differentiable} \implies \textrm{differentiable} \implies \textrm{partials exist}, \] but neither of the converse statements is true.

Example. Return to the counter-example \(f(x,y)=x^{1/3}y^{1/3}\). We saw that \(J_f(0,0) = \begin{pmatrix}0 & 0\end{pmatrix}\). Moreover, for \(x\neq 0\) and \(y\neq 0\), we have \[ J_f(x,y) = \begin{pmatrix}\displaystyle\ddy{f}{x} & \displaystyle\ddy{f}{y}\end{pmatrix} = \begin{pmatrix}\displaystyle\frac13 x^{-2/3}y^{1/3} & \displaystyle\frac13 x^{1/3}y^{-2/3}\end{pmatrix}, \] and these partial derivatives are continuous wherever \(x\neq 0\) and \(y\neq 0\).

The problem comes from the behaviour along the coordinate axes for \(\xb\neq\bfzero\). For example, at a point \((0,y)\) with \(y>0\) on the \(y\)-axis, we have \[ \lim_{h\to 0}\frac{f(h,y) - f(0,y)}{h} = \lim_{h\to 0}\frac{h^{1/3}y^{1/3} - 0}{h} = y^{1/3}\lim_{h\to 0}\frac{1}{h^{2/3}} = \infty, \] so \(\displaystyle\ddy{f}{x}\) doesn't exist there.

We cannot find an (open) neighbourhood of \(\xb=\bfzero\) where the partial derivatives exist, so \(f\) is not \(C^1\) at \(\xb=\bfzero\).

Example. Let \(f(x,y) = y\,|x-2|\). Where is \(f\) (i) continuously differentiable, and (ii) differentiable?

(i) It is continuously differentiable where \(\displaystyle\ddy{f}{x}, \ddy{f}{y}\) exist and are continuous.

For \(x>2\), we have \(f(x,y)=y(x-2)\) so \(\displaystyle\ddy{f}{x}=y, \ddy{f}{y}=x-2.\)

For \(x<2\), we have \(f(x,y)=-y(x-2)\) so \(\displaystyle\ddy{f}{x}=-y, \ddy{f}{y}=-x+2.\)

For \(x=2\), we need to be more careful. We have \[ \left.\ddy{f}{x}\right|_{x=2} = \lim_{h\to 0}\frac{f(2+h,y) - f(2,y)}{h} = \lim_{h\to 0}\frac{y|h|}{h}. \] For \(y\neq 0\), the limits for \(h<0\) and \(h>0\) are different, so the limit does not exist and this partial derivative does not exist for \(x=2\), \(y\neq 0\). If \(y=0\) then the limit exists and is equal to \(0\). However, \(\displaystyle\ddy{f}{x}\) is not continuous at \((2,0)\), because it does not exist at any other point of the line \(x=2\).

Therefore, \(f\) is continuously differentiable everywhere except the line \(x=2\).

(ii) By Theorem 6.1, we immediately know that \(f\) is differentiable everywhere except on the line \(x=2\). For \(x=2\) and \(y\neq 0\), it cannot be differentiable because \(J_f\) does not exist. But it might be differentiable at \((2,0)\). Note that \[ \left.\ddy{f}{y}\right|_{(2,0)} = \lim_{h\to 0}\frac{f(2,h)-f(2,0)}{h} = 0, \quad \textrm{so} \quad J_f(2,0) = \begin{pmatrix}0 & 0\end{pmatrix}. \] Writing \(\hb=h_1\eb_1 + h_2\eb_2\), we have \[ \lim_{\hb\to\bfzero}\frac{R(\hb)}{|\hb|} = \lim_{|\hb|\to 0}\frac{1}{|\hb|}\Big(f(2+h_1,h_2) - f(2,0) - 0\Big)= \lim_{|\hb|\to 0}\frac{h_2|h_1|}{|\hb|}. \] But \[ \left|\frac{h_2|h_1|}{|\hb|}\right| \leq \frac{|\hb|^2}{|\hb|} = |\hb| \quad \implies \lim_{|\hb|\to 0}\frac{h_2|h_1|}{|\hb|} = 0. \] Therefore \(f\) is differentiable for \(x\neq 2\) and also at the point \((2,0)\).
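A numerical sketch (assuming NumPy) of the remainder ratio at \((2,0)\), now using random directions \(\hb\) rather than a fixed path, is consistent with this:

```python
import numpy as np

f = lambda x, y: y * abs(x - 2)
rng = np.random.default_rng(0)
for t in [1e-1, 1e-2, 1e-3]:
    h = t * rng.standard_normal(2)    # random direction, size of order t
    print(np.linalg.norm(h), f(2 + h[0], h[1]) / np.linalg.norm(h))
    # ratio is bounded by |h|, so it -> 0
```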

6.3 Chain rule

Now that we know the derivative of a function is a matrix, the chain rule for any number of variables and components becomes easy to write down and remember.

Schematic of the Chain Rule.

Theorem 6.2 (Chain Rule) If \(\gb:\Real^k\to\Real^n\) is differentiable at \(\xb\in\Real^k\) and \(\fb:\Real^n\to\Real^m\) is differentiable at \(\gb(\xb)\in\Real^n\), then \[ J_{\fb\circ\gb}(\xb) = J_{\fb}(\gb(\xb))\,J_\gb(\xb). \]

The right-hand side here is a matrix product!

If \(n=m=1\), this reduces to \[ (f\circ g)'(x) = f'\big(g(x)\big)\,g'(x), \] which is just the familiar chain rule from single-variable calculus. For example, if \(f(x)=\sin{x}\) and \(g(x)=x^2\), then \((f\circ g)(x) = \sin{x^2}\) and \[ (f\circ g)'(x) = f'\big(g(x)\big)\,g'(x) = \cos{(x^2)}\cdot 2x = 2x\cos{x^2}. \]

Proof. From the assumptions of differentiability, it follows for any \(\hb\in\Real^k\) that \[\begin{align*} (\fb\circ\gb)(\xb+\hb) &= \fb\big(\gb(\xb + \hb)\big)\\ &= \fb\big(\gb(\xb) + J_\gb(\xb)\hb + \Rb_\gb(\hb)\big)\\ &= \fb\big(\gb(\xb)\big) + J_\fb\big(\gb(\xb)\big)\Big(J_\gb(\xb)\hb + \Rb_\gb(\hb)\Big) + \Rb_\fb\big(J_\gb(\xb)\hb + \Rb_\gb(\hb)\big)\\ &= \fb\big(\gb(\xb)\big) + J_\fb\big(\gb(\xb)\big)\,J_\gb(\xb)\hb + \textrm{remainder terms}. \end{align*}\] It follows from the assumed differentiability of \(\fb\) and \(\gb\) that the combined remainder terms vanish faster than \(|\hb|\) as \(\hb\to\bfzero\).


When applying the chain rule, be very careful with notation, and in particular keep track of where the matrices are evaluated.

Example. Let \(\gb(x,y) = \begin{pmatrix}2x-y\\ y-x\end{pmatrix}\) and \(\fb(u,v) = \begin{pmatrix}u^2+v^2\\ uv\end{pmatrix}\). First, compute the individual derivative matrices with Proposition 6.1: \[ J_\gb(x,y) = \begin{pmatrix}\displaystyle\ddy{g_1}{x} & \displaystyle\ddy{g_1}{y}\\ \displaystyle\ddy{g_2}{x} & \displaystyle\ddy{g_2}{y}\end{pmatrix} = \begin{pmatrix}2 & -1\\-1 & 1\end{pmatrix} \] and \[ J_\fb(u,v) = \begin{pmatrix}\displaystyle\ddy{f_1}{u} & \displaystyle\ddy{f_1}{v}\\ \displaystyle\ddy{f_2}{u} & \displaystyle\ddy{f_2}{v}\end{pmatrix} = \begin{pmatrix}2u & 2v\\v & u\end{pmatrix}. \] So the chain rule gives \[\begin{align*} J_{\fb\circ\gb}(x,y) = J_\fb\big(\gb(x,y)\big)\,J_\gb(x,y) &= \begin{pmatrix}2g_1 & 2g_2\\ g_2 & g_1\end{pmatrix}\begin{pmatrix}2 & -1\\-1 & 1\end{pmatrix}\\ &= \begin{pmatrix}4g_1 - 2g_2 & -2g_1 + 2g_2\\ 2g_2-g_1 & -g_2+g_1 \end{pmatrix} = \begin{pmatrix}10x -6y& -6x+4y\\-4x+3y&3x-2y\end{pmatrix}. \end{align*}\]

We should get the same answer if we first work out \(\fb\circ\gb = \displaystyle\begin{pmatrix}(2x-y)^2 + (y-x)^2\\(2x-y)(y-x)\end{pmatrix},\) then compute the partial derivatives of each component directly.
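Here is a numerical version of that check (a sketch assuming NumPy), comparing the chain-rule formula against a finite-difference Jacobian of the composite:

```python
import numpy as np

g = lambda v: np.array([2 * v[0] - v[1], v[1] - v[0]])
f = lambda u: np.array([u[0]**2 + u[1]**2, u[0] * u[1]])

def J_chain(x, y):   # the chain-rule result computed above
    return np.array([[10 * x - 6 * y, -6 * x + 4 * y],
                     [-4 * x + 3 * y,  3 * x - 2 * y]])

x0, eps = np.array([1.0, 2.0]), 1e-6
J_num = np.column_stack([(f(g(x0 + eps * e)) - f(g(x0 - eps * e))) / (2 * eps)
                         for e in np.eye(2)])
print(J_num)           # both give [[-2.  2.], [ 2. -1.]]
print(J_chain(*x0))
```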

We can use Theorem 6.2 to derive the appropriate chain rule formula for any \(n\) and \(m\).

Example. Let us recover the chain rule formula for a scalar field along a curve. The formula was \[ \frac{\mathrm{d}f}{\mathrm{d}t} = \ddy{f}{x}\frac{\mathrm{d}x}{\mathrm{d}t} + \ddy{f}{y}\frac{\mathrm{d}y}{\mathrm{d}t} + \ddy{f}{z}\frac{\mathrm{d}z}{\mathrm{d}t}, \] where \(\xb(t) : \Real\to\Real^3\) was a parametrised curve and \(f(\xb):\Real^3\to\Real\) was a scalar field.

The left-hand side is the \(1\times 1\) matrix \(J_{f\circ\xb}(t) = \begin{pmatrix}\displaystyle\frac{\mathrm{d}(f\circ\xb)}{\mathrm{d}t}\end{pmatrix}\).

The right-hand side is a product of a \(1\times 3\) matrix and a \(3\times 1\) matrix: \[ J_f\big(\xb(t)\big)\,J_{\xb}(t) = \left.\begin{pmatrix}\displaystyle\ddy{f}{x} & \displaystyle\ddy{f}{y} & \displaystyle\ddy{f}{z}\end{pmatrix}\right|_{\xb(t)}\begin{pmatrix}\displaystyle\frac{\mathrm{d}x}{\mathrm{d}t}\\ \displaystyle\frac{\mathrm{d}y}{\mathrm{d}t}\\ \displaystyle\frac{\mathrm{d}z}{\mathrm{d}t}\end{pmatrix}, \] which multiplies out to the given formula.

Example. Similarly, let us recover the formulae for changing the parametrisation of a surface. The formulae were \[ \ddy{\xb}{u} = \ddy{\xb}{\mu}\ddy{\mu}{u} + \ddy{\xb}{\nu}\ddy{\nu}{u}, \qquad \ddy{\xb}{v} = \ddy{\xb}{\mu}\ddy{\mu}{v} + \ddy{\xb}{\nu}\ddy{\nu}{v}, \] where \(\xb(\mu, \nu) : \Real^2\to\Real^3\) was a parametrisation of a surface, related to an alternative parametrisation \(\xb(u, v)\) by some “change of coordinates” mapping \((\mu, \nu) = \gb(u,v) : \Real^2\to\Real^2\).

In this case, the composite map \(\xb\circ\gb\) goes from \(\Real^2\to\Real^3\) so the left-hand side of Theorem 6.2 is a \(3\times 2\) matrix \[ J_{\xb\circ\gb}(u,v) = \begin{pmatrix} \displaystyle\ddy{x}{u} & \displaystyle\ddy{x}{v}\\ \displaystyle\ddy{y}{u} & \displaystyle\ddy{y}{v}\\ \displaystyle\ddy{z}{u} & \displaystyle\ddy{z}{v} \end{pmatrix} = \begin{pmatrix} \Big| & \Big|\\ \displaystyle\ddy{\xb}{u} & \displaystyle\ddy{\xb}{v}\\ \Big| & \Big| \end{pmatrix}. \] The right-hand side has the form \[ J_{\xb}\big(\gb(u,v)\big)\,J_{\gb}(u,v) = \begin{pmatrix} \displaystyle\ddy{x}{\mu} & \displaystyle\ddy{x}{\nu}\\ \displaystyle\ddy{y}{\mu} & \displaystyle\ddy{y}{\nu}\\ \displaystyle\ddy{z}{\mu} & \displaystyle\ddy{z}{\nu} \end{pmatrix}\begin{pmatrix} \displaystyle\ddy{\mu}{u} & \displaystyle\ddy{\mu}{v}\\ \displaystyle\ddy{\nu}{u} & \displaystyle\ddy{\nu}{v} \end{pmatrix}. \] Multiplying out these matrices gives our original formulae.

6.4 Inverse functions

A function \(\fb:\Real^n\to\Real^n\) has an inverse \(\fb^{-1}\) if and only if, for all \(\xb\in\Real^n\), \[ (\fb^{-1}\circ\fb)(\xb) = \xb \quad \textrm{and} \quad (\fb\circ\fb^{-1})(\xb) = \xb. \]

Inverse of a function.

It only makes sense to discuss an inverse when the source and target dimension are the same (\(m=n\)).

Example. For our earlier linear function, we can write \(\fb\) as \[ \fb(x,y) = \begin{pmatrix}1 & -1\\1 & 2\end{pmatrix}\begin{pmatrix}x\\y \end{pmatrix}. \] It follows that the inverse is \[ \fb^{-1}(x,y) = \begin{pmatrix}1 & -1\\1 & 2\end{pmatrix}^{-1}\begin{pmatrix}x\\ y \end{pmatrix} = \frac{1}{3}\begin{pmatrix}2 & 1\\ -1 & 1\end{pmatrix}\begin{pmatrix}x\\ y \end{pmatrix}. \]

Notice in Tip 6.11 that \(J_{\fb^{-1}}\) was the matrix inverse of \(J_{\fb}\). This remains true for nonlinear functions!

Proposition 6.2 If \(\fb:\Real^n\to\Real^n\) has an inverse and both \(\fb,\fb^{-1}\) are differentiable, then \[ J_{\fb^{-1}}\big(\yb\big) = (J_\fb)^{-1}(\xb), \quad \textrm{where} \quad \yb=\fb(\xb). \]

Proof. Firstly, taking the derivative of both sides of \((\fb^{-1}\circ\fb)(\xb) = \xb\), we get \[ J_{\fb^{-1}\circ\fb}(\xb) = I \quad \implies J_{\fb^{-1}}\big(\fb(\xb)\big)\,J_\fb(\xb) = I \quad \implies J_{\fb^{-1}}(\yb)\,J_\fb(\xb) = I, \] by the chain rule (Theorem 6.2).

On the other hand, taking the derivative of both sides of \((\fb\circ\fb^{-1})(\yb)=\yb\), with respect to \(\yb\), leads to \[ J_{\fb\circ\fb^{-1}}(\yb) = I \quad \implies J_{\fb}\big(\fb^{-1}(\yb)\big)\,J_{\fb^{-1}}(\yb) = I \quad \implies J_{\fb}(\xb)\,J_{\fb^{-1}}(\yb) = I. \]

It follows that \(J_{\fb^{-1}}(\yb)\) is the inverse of \(J_\fb(\xb)\).


When \(n=1\), Proposition 6.2 reduces to the inverse function rule from Calculus I, namely \[ (f^{-1})'(y) = \frac{1}{f'(x)}, \quad \textrm{where} \quad y=f(x). \] The reciprocal derivative arises because \(f^{-1}\) is the reflection of \(f\) in the line \(y=x\):

Inverse function as a reflection.

Notice that \(f^{-1}(x)\) is not differentiable (infinite slope) at a point \(x=f(a)\) whenever \(a\) is a turning point of \(f\), i.e. \(f'(a)=0\).

Example. Take \(\fb(u,v) = \begin{pmatrix}u^2+v^2\\ uv\end{pmatrix}\) again, this time near \(\xb=(2,1)\), so that \(\yb = \fb(2,1) = (5,2)\). We have \(J_\fb(u,v) = \displaystyle\begin{pmatrix}2u & 2v\\v & u\end{pmatrix}\), so by Proposition 6.2, \[ J_{\fb^{-1}}(\yb) = (J_{\fb})^{-1}(2,1) = \left.\frac{1}{2(u^2-v^2)}\begin{pmatrix}u & -2v\\-v & 2u\end{pmatrix}\right|_{(2,1)} = \frac{1}{6}\begin{pmatrix}2 & -2\\-1 & 4\end{pmatrix}. \]

Since we know \(J_{\fb^{-1}}\), we can reconstruct the linearisation of \(\fb^{-1}\), even though we don’t know the actual function(!) In particular, assuming differentiability, we know that \[\begin{align*} \fb^{-1}(\yb + \hb) &\approx \fb^{-1}(\yb) + J_{\fb^{-1}}(\yb)\hb\\ &= \xb + (J_\fb)^{-1}(\xb)\hb \quad \textrm{where $\xb=\fb^{-1}(\yb)$}\\ &= \begin{pmatrix}2\\1\end{pmatrix} + \frac{1}{6}\begin{pmatrix}2 & -2\\-1 & 4\end{pmatrix}\begin{pmatrix}h_1\\h_2\end{pmatrix}. \end{align*}\] In other words, we’ve found an approximation for the (nonlinear) inverse function \(\fb^{-1}(\yb)\) near the point \(\yb=(5,2)\).
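One way to test this without knowing \(\fb^{-1}\) explicitly (a sketch assuming NumPy; the displacement \(\hb\) is our own choice) is to push the approximation back through \(\fb\) and check that we land near \(\yb+\hb\):

```python
import numpy as np

f = lambda u, v: np.array([u**2 + v**2, u * v])
x = np.array([2.0, 1.0])                            # with y = f(x) = (5, 2)
J_inv = np.array([[2.0, -2.0], [-1.0, 4.0]]) / 6.0  # (J_f)^(-1) at (2, 1)

h = np.array([0.05, -0.02])
x_approx = x + J_inv @ h     # linearised value of f^(-1)(y + h)
print(f(*x_approx))          # close to y + h = [5.05, 1.98]
```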

More sophisticated analysis yields the following important theorem.

Theorem 6.3 (Inverse Function Theorem) Let \(\fb:\Real^n\to\Real^n\) be a \(C^1\) function. Then \(\fb\) has a (local) differentiable inverse near \(\yb=\fb(\xb)\) if and only if it has an invertible derivative matrix \(J_\fb(\xb)\) at the point \(\xb\).

Proof. Omitted. (The main content of the theorem is to show that the remainder terms behave.)


In fact, invertibility of \(J_\fb(\xb)\) implies that \(\fb^{-1}\) is not only differentiable but continuously differentiable (\(C^1\)).

Some terminology: a function \(\fb:\Real^n\to\Real^n\) satisfying Theorem 6.3 is called a (local) diffeomorphism. It is called orientation-preserving if \(\det\,J_\fb(\xb) > 0\) or orientation-reversing if \(\det\,J_\fb(\xb) < 0\).

Example. Let \(\fb(r,\theta) = \begin{pmatrix}r\cos\theta\\ r\sin\theta\end{pmatrix}\). Observe that this is just the mapping from polar coordinates to Cartesian coordinates. We have \[ J_\fb(r,\theta) = \begin{pmatrix} \displaystyle\ddy{f_1}{r} & \displaystyle\ddy{f_1}{\theta}\\ \displaystyle\ddy{f_2}{r} & \displaystyle\ddy{f_2}{\theta} \end{pmatrix}= \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix} \quad \implies \det\,J_\fb(r,\theta) = r. \] So \(\fb\) is a diffeomorphism for \(r>0\) but not at the point \(r=0\). Moreover, this diffeomorphism is orientation-preserving.

Graphically, \(r=0\) is a singularity where a whole line maps to a single point, so the function fails to be bijective: the image point \(\bfzero\) has no unique preimage. Elsewhere, the fact that the map is orientation-preserving means that it preserves the order of the vertices \(ABCD\) of a rectangle.

The polar coordinate mapping.

Remark. From Proposition 6.2 we know the derivative of \(\fb^{-1}\), which is \[ J_{\fb^{-1}}(f_1,f_2) = J_{\fb}^{-1}(r,\theta) = \begin{pmatrix} \cos\theta & \sin\theta\\ -\displaystyle\frac{\sin\theta}{r} & \displaystyle\frac{\cos\theta}{r} \end{pmatrix}. \] Therefore, \[ \ddy{r}{x} = \cos\theta, \quad \ddy{r}{y}=\sin\theta, \quad \ddy{\theta}{x}=-\frac{\sin\theta}{r}, \quad \ddy{\theta}{y} = \frac{\cos\theta}{r}. \] In fact, the inverse function is \(\fb^{-1}(f_1,f_2) = \displaystyle\begin{pmatrix}\sqrt{f_1^2 + f_2^2}\\ \arctan(f_2/f_1)\end{pmatrix}.\)

[You can check that the derivatives of this function agree with those we just calculated.]
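For instance, here is a numerical sketch (assuming NumPy) of that check at \((r,\theta) = (2,\pi/6)\):

```python
import numpy as np

r, th = 2.0, np.pi / 6
J = np.array([[np.cos(th), -r * np.sin(th)],
              [np.sin(th),  r * np.cos(th)]])
print(np.linalg.inv(J))   # [[cos(th), sin(th)], [-sin(th)/r, cos(th)/r]]

# Finite-difference partials of f^(-1) = (sqrt(x^2 + y^2), arctan(y/x)):
finv = lambda x, y: np.array([np.hypot(x, y), np.arctan2(y, x)])
x, y, eps = r * np.cos(th), r * np.sin(th), 1e-6
print(np.column_stack([(finv(x + eps, y) - finv(x - eps, y)) / (2 * eps),
                       (finv(x, y + eps) - finv(x, y - eps)) / (2 * eps)]))
```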

6.5 Implicit functions

Suppose a curve is given implicitly by \(F(x,y)=0\). When can we solve for \(y=y(x)\)?

Suppose this is true. Then applying the chain rule to \(F\big(x, y(x)\big)=0\) gives \[ \ddy{F}{x} + \ddy{F}{y} y'(x) = 0 \quad \iff\quad y'(x) = -\displaystyle\ddy{F}{x}\bigg/\displaystyle\ddy{F}{y}. \] So there is a problem when \(\displaystyle\ddy{F}{y}=0\).

Let \(F(x,y)=x^2+y^2-1\). Then \(\displaystyle\ddy{F}{y} = 2y\). So we can describe the circle by a single-valued, differentiable function \(y(x)\) only for \(y\neq 0\).

Circle.

The derivative of the implicit function is \[ y'(x) = -\ddy{F}{x}\bigg/\ddy{F}{y} = -\frac{2x}{2y} = -\frac{x}{y}. \]

[For \(y>0\) we have \(y(x)=\sqrt{1-x^2}\), whereas in \(y<0\) we have \(y(x)=-\sqrt{1-x^2}\) (and you can check that \(y'(x)=-x/y\) in either case). But there is no differentiable function \(y(x)\) that works around both \((-1,0)\) and \((1,0)\).]
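A one-line numerical check on the upper branch (a sketch assuming NumPy; the point \(x=0.6\) is our own choice):

```python
import numpy as np

y = lambda x: np.sqrt(1 - x**2)                   # upper branch, y > 0
x0, eps = 0.6, 1e-6
print((y(x0 + eps) - y(x0 - eps)) / (2 * eps))    # finite-difference y'(x0)
print(-x0 / y(x0))                                # -x/y = -0.75, as predicted
```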

This idea generalises as follows.

Theorem 6.4 (Implicit Function Theorem) Given a \(C^1\) function \(\Fb:\Real^{n+m}\to\Real^m\), let \(\xb\in\Real^n\) and \(\yb\in\Real^m\). Solutions to the system of \(m\) equations \(\Fb(\xb,\yb)=\bfzero\) near a solution point \((\xb,\yb)=(\xb_a,\yb_a)=\ab\in\Real^{n+m}\) can be written as an implicit function \[ \yb = \yb(\xb) \quad \textrm{if} \quad \begin{vmatrix}\Big| & \Big| & & \Big|\\ \displaystyle\ddy{\Fb}{y_1}(\ab) & \displaystyle\ddy{\Fb}{y_2}(\ab) & \cdots & \displaystyle\ddy{\Fb}{y_m}(\ab)\\ \Big| & \Big| & & \Big| \end{vmatrix} \neq 0. \]

A “solution point” means some particular point \(\ab\) where \(\Fb(\ab)=\bfzero\). The important thing is to have the same number of equations as \(\yb\) variables, in this case \(m\) of them.

Proof (not rigorous). Since \(\Fb\) is differentiable, we have near to \(\ab\) that \[ \Fb(\xb,\yb) \approx \Fb(\xb_a,\yb_a) + J_\Fb(\ab)\begin{pmatrix}\xb-\xb_a\\ \yb-\yb_a\end{pmatrix}. \] But we want \(\Fb(\xb,\yb)=\bfzero\), and also by definition \(\Fb(\xb_a,\yb_a)=\bfzero\). So \[\begin{align*} &J_\Fb(\ab)\begin{pmatrix}\xb-\xb_a\\ \yb-\yb_a\end{pmatrix} \approx \bfzero\\ &\iff \begin{pmatrix} \Big| & & \Big| & \Big| & & \Big|\\ \displaystyle\ddy{\Fb}{x_1} & \cdots & \displaystyle\ddy{\Fb}{x_n} & \displaystyle\ddy{\Fb}{y_1} & \cdots &\displaystyle\ddy{\Fb}{y_m}\\ \Big| & & \Big| & \Big| & & \Big| \end{pmatrix} \begin{pmatrix} |\\ \xb-\xb_a\\ |\\ |\\ \yb-\yb_a\\ | \end{pmatrix} \approx \bfzero\\ &\iff\begin{pmatrix} \Big| & & \Big| \\ \displaystyle\ddy{\Fb}{x_1} & \cdots & \displaystyle\ddy{\Fb}{x_n}\\ \Big| & & \Big| \end{pmatrix}\begin{pmatrix} |\\ \xb-\xb_a\\ |\\ \end{pmatrix} + \begin{pmatrix} \Big| & & \Big| \\ \displaystyle\ddy{\Fb}{y_1} & \cdots & \displaystyle\ddy{\Fb}{y_m}\\ \Big| & & \Big| \end{pmatrix}\begin{pmatrix} |\\ \yb-\yb_a\\ |\\ \end{pmatrix} \approx \bfzero\\ &\iff \yb \approx \yb_a - \begin{pmatrix} \Big| & & \Big| \\ \displaystyle\ddy{\Fb}{y_1} & \cdots & \displaystyle\ddy{\Fb}{y_m}\\ \Big| & & \Big| \end{pmatrix}^{-1} \begin{pmatrix} \Big| & & \Big| \\ \displaystyle\ddy{\Fb}{x_1} & \cdots & \displaystyle\ddy{\Fb}{x_n}\\ \Big| & & \Big| \end{pmatrix}\begin{pmatrix} |\\ \xb-\xb_a\\ |\\ \end{pmatrix}. \end{align*}\] The right-hand side is a function only of \(\xb\), not \(\yb\) (remember that the matrices are just full of numbers because they are evaluated at \(\ab\)). We see that this works provided the \(\yb\) part of \(J_\Fb(\ab)\) is invertible.

[A fully rigorous proof would need to take proper care of the remainder terms.]


Notice how this reduces to the single-variable formula when \(m=n=1\).

Example. Consider the system of two equations \[ uv^2 + v^2w^3 + u^5w^4 = 1, \qquad u^2w + u^2v^3 + v^4w^5 = -1 \] near the solution point \(\ab = (u,v,w) = (1,1,-1)\). Can we solve for \(u(w)\) and \(v(w)\)? There are two equations, so \(m=2\), and one remaining independent variable \(w\), so \(n=1\) and \[ \xb = (w), \quad \yb=\begin{pmatrix}u\\ v\end{pmatrix}, \quad \Fb(\xb,\yb) = \begin{pmatrix}uv^2 + v^2w^3 + u^5w^4-1\\ u^2w+u^2v^3+v^4w^5+1\end{pmatrix}. \]

First double-check that \(\ab\) is a solution point, otherwise Theorem 6.4 won't apply. Indeed we have \[ \Fb(1,1,-1) = \begin{pmatrix} 1(1)^2 + 1^2(-1)^3 + 1^5(-1)^4 - 1\\ 1^2(-1) + 1^2(1)^3 + 1^4(-1)^5 + 1 \end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}. \] Now calculate the determinant: \[ \begin{vmatrix} \displaystyle\ddy{F_1}{u}(\ab) & \displaystyle\ddy{F_1}{v}(\ab)\\ \displaystyle\ddy{F_2}{u}(\ab) & \displaystyle\ddy{F_2}{v}(\ab) \end{vmatrix} = \left.\begin{vmatrix} v^2 + 5u^4w^4 & 2uv + 2vw^3\\ 2uw + 2uv^3 & 3u^2v^2 + 4v^3w^5 \end{vmatrix}\right|_{(1,1,-1)} = \begin{vmatrix} 6 & 0\\ 0 & -1 \end{vmatrix} = -6. \] This is non-zero, so by Theorem 6.4 we can write \(\yb=\yb(\xb)\), meaning \(u(w)\) and \(v(w)\).

Remark. As with inverse functions, we can reconstruct the derivative \(J_\yb(\ab)\) and hence the linearisation of \(\yb(\xb)\), even though we can’t solve for the full nonlinear function. From the proof of Theorem 6.4, we have that \[\begin{align*} J_\yb(\ab) &= - \begin{pmatrix} \displaystyle\ddy{F_1}{u}(\ab) & \displaystyle\ddy{F_1}{v}(\ab) \\ \displaystyle\ddy{F_2}{u}(\ab) & \displaystyle\ddy{F_2}{v}(\ab) \end{pmatrix}^{-1}\begin{pmatrix} \displaystyle\ddy{F_1}{w}(\ab) \\ \displaystyle\ddy{F_2}{w}(\ab) \end{pmatrix}\\ &= -\begin{pmatrix} \displaystyle\frac16 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} -1\\ 6 \end{pmatrix} = \begin{pmatrix} \displaystyle \frac16\\ \displaystyle 6 \end{pmatrix}. \end{align*}\] Therefore, for \(w\) near \(-1\), we have the linear approximations \[\begin{align*} u &\approx 1 + \displaystyle\frac16(w+1),\\ v &\approx 1 + 6(w+1). \end{align*}\]
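We can test these approximations by actually solving the nonlinear system at a nearby \(w\) (a sketch assuming NumPy; the choice \(w=-1.01\) and the use of a few Newton iterations in \((u,v)\) are our own):

```python
import numpy as np

def F(u, v, w):
    return np.array([u * v**2 + v**2 * w**3 + u**5 * w**4 - 1,
                     u**2 * w + u**2 * v**3 + v**4 * w**5 + 1])

def J_uv(u, v, w):   # partial derivatives of F with respect to (u, v)
    return np.array([[v**2 + 5 * u**4 * w**4,   2 * u * v + 2 * v * w**3],
                     [2 * u * w + 2 * u * v**3, 3 * u**2 * v**2 + 4 * v**3 * w**5]])

w = -1.01
uv = np.array([1.0, 1.0])
for _ in range(10):                       # Newton's method in (u, v)
    uv = uv - np.linalg.solve(J_uv(*uv, w), F(*uv, w))

print(uv)                                 # solution of the nonlinear system
print(1 + (w + 1) / 6, 1 + 6 * (w + 1))   # linear predictions for u, v
```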

The Inverse Function Theorem (Theorem 6.3) is really a special case of Theorem 6.4. To see this, suppose we want to find the inverse to \(\fb:\Real^n\to\Real^n\) near \(\ab\in\Real^n\). In other words, we want to find \(\yb(\xb)\) such that \(\fb(\yb)=\xb\). So set \(\Fb(\xb,\yb)=\fb(\yb)-\xb\) (here \(m=n\)). We then have \[ \begin{vmatrix} \Big| & & \Big|\\ \displaystyle\ddy{\Fb}{y_1}(\ab) & \cdots & \displaystyle\ddy{\Fb}{y_n}(\ab)\\ \Big| & & \Big| \end{vmatrix} = \begin{vmatrix} \Big| & & \Big|\\ \displaystyle\ddy{\fb}{y_1}(\ab) & \cdots & \displaystyle\ddy{\fb}{y_n}(\ab)\\ \Big| & & \Big| \end{vmatrix} = \det\,J_\fb(\ab). \] Theorem 6.4 then says that we can find such \(\yb(\xb)\) provided \(J_\fb(\ab)\) is invertible, which is Theorem 6.3.