Section 14.5 The Gradient and Directional Derivatives
In the previous section, we learned how to calculate partial derivatives, which represent the rate of change of a function as we move strictly parallel to the \(x\)-axis or the \(y\)-axis. However, there are infinitely many other directions we can move on the surface. In this section, we will discuss how to take the derivative of a function in an arbitrary direction.
So how do we measure the rate of change of a function at a point in an arbitrary direction? The directional derivative will tell us!
Directional derivatives, as the name suggests, measure the rate of change of a function at a point in a specified direction. Let's come up with the definition of directional derivatives!
Let \(z = f(x,y)\) and let \(P = (a,b,f(a,b))\) be a point on the surface. Let's say we want to find the directional derivative of \(f\) at \(P\) in the direction of the unit vector \(\v{u} = \la h,k \ra\text{.}\) The diagram below shows this situation.
The vector \(\v{u}\) and the point \(P\) determine a vertical plane, shown in the figure. This plane intersects the surface in a curve \(\c{C}\text{.}\) We can then construct the tangent line \(T\) to the curve \(\c{C}\) at the point \(P\text{.}\) Our job here is to find the slope of this tangent line.
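As usual, we start with a secant line through \(P\) and a second point \(Q = (x,y,z)\) on the curve \(\c{C}\text{.}\) Can we describe \(Q\) using a single parameter? Yes we can! The easiest way to do so is to project \(P\) and \(Q\) onto the \(xy\)-plane, resulting in \(P' = (a,b,0)\) and \(Q' = (x,y,0)\text{,}\) respectively. Since \(\overrightarrow{P'Q'} = \la x - a, y - b \ra\) is parallel to \(\v{u}\text{,}\) there is a scalar \(t\) such that
\begin{equation*}
\la x - a, y - b \ra = t \v{u} = \la th, tk \ra
\end{equation*}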
which implies that \(x = a + th\) and \(y = b + tk\text{.}\) Hence, we can express \(Q\) as \(Q(t) = (a + th, b + tk, f(a + th, b + tk))\) (as opposed to keeping track of two new independent variables).
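Now we are ready to find the slope of the line \(T\text{!}\) Recall that the slope of a line is the rise over the run. Here the rise is the change in the \(z\)-coordinate, and the run is the signed distance from \(P'\) to \(Q'\text{,}\) which is \(t\) because \(\v{u}\) is a unit vector. Then the slope of the secant line (the line passing through \(P\) and \(Q\)) is
\begin{equation*}
\frac{f(a + th, b + tk) - f(a,b)}{t}
\end{equation*}
Letting \(t \to 0\) slides \(Q\) toward \(P\) along \(\c{C}\text{,}\) and the secant slope approaches the slope of \(T\text{.}\) This limit, when it exists, is the directional derivative:
\begin{equation*}
D_\v{u} f(a,b) = \lim_{t \to 0} \frac{f(a + th, b + tk) - f(a,b)}{t}
\end{equation*}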
Note 14.5.3. But Richard... How is this related to the partial derivatives?
It turns out that the directional derivative is the generalization of the partial derivatives. That is, we can recover the partial derivatives from the directional derivative by choosing the appropriate direction.
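Let's say we pick the unit vector \(\v{i} = \la 1,0 \ra\text{.}\) Then the point \(P\) and the direction \(\v{i}\) determine a vertical plane parallel to the \(xz\)-plane. On this plane, the \(y\)-coordinate is fixed. So we expect the directional derivative to be \(f_x\text{.}\) Indeed, with \(h = 1\) and \(k = 0\text{,}\) the difference quotient in the limit definition becomes \(\frac{f(a + t, b) - f(a,b)}{t}\text{,}\) whose limit as \(t \to 0\) is exactly \(f_x(a,b)\text{.}\)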
Similarly, if we pick the unit vector \(\v{j} = \la 0,1 \ra\text{,}\) then the vertical plane determined by \(P\) and \(\v{j}\) is parallel to the \(yz\)-plane. On this plane, the \(x\)-coordinate is fixed. So we expect the directional derivative to be \(f_y\text{.}\)
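If you want to see this with actual numbers, here is a minimal sketch that evaluates the difference quotient from the limit definition with a small fixed \(t\). The sample function, the point, and the helper name `directional_quotient` are assumptions made up for this illustration; a small \(t\) only approximates the limit.

```python
def directional_quotient(f, a, b, h, k, t=1e-6):
    """Difference quotient (f(a + th, b + tk) - f(a, b)) / t for a small t."""
    return (f(a + t * h, b + t * k) - f(a, b)) / t

# Hypothetical sample function; by hand, f_x = 2xy and f_y = x^2 + 3.
def f(x, y):
    return x**2 * y + 3 * y

a, b = 1.0, 3.0
along_i = directional_quotient(f, a, b, 1.0, 0.0)  # direction i = <1, 0>
along_j = directional_quotient(f, a, b, 0.0, 1.0)  # direction j = <0, 1>

print(along_i, 2 * a * b)   # quotient along i next to f_x(1, 3)
print(along_j, a**2 + 3)    # quotient along j next to f_y(1, 3)
```

The two numbers on each printed line should agree to several decimal places, which is what we expect if the directions \(\v{i}\) and \(\v{j}\) recover \(f_x\) and \(f_y\text{.}\)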
But no one likes to compute the directional derivative using the limit definition every time. So we want to find a more efficient formula for the directional derivative. The following theorem gives us a nice formula for the directional derivative in terms of the partial derivatives.
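If \(f\) is differentiable at \(P = (a,b)\) and \(\v{u} = \la h,k \ra\) is a unit vector, then the directional derivative of \(f\) at \(P\) in the direction of \(\v{u}\) is given by
\begin{equation*}
D_\v{u} f(a,b) = f_x(a,b) h + f_y(a,b) k = \la f_x(a,b), f_y(a,b) \ra \cdot \la h,k \ra
\end{equation*}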
The proof of this theorem isn't too bad, but it needs a clever setup. Richard will set up the proof for you, and feel free to give it a try on your own!
Let's define a new function \(g(t) = f(a + th, b + tk)\text{.}\) Then what is \(g'(0)\) using the definition of the derivative (the one with the limit)?
Find the directional derivative \(D_\v{u} f(x,y)\) if \(f(x,y) = x^3 - 3xy + 4y^2\) and \(\v{u}\) is the unit vector given by the angle \(\theta = \frac{\pi}{6}\text{.}\)
Graphically, this value represents the slope of the tangent line of the curve obtained by intersecting the surface with the vertical plane determined by the point \(P(1,2)\) and the direction of \(\v{u}\text{.}\)
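As a sanity check on this example, the sketch below evaluates the directional derivative at \(P(1,2)\) in two ways: with the formula \(f_x h + f_y k\) (the partial derivatives \(f_x = 3x^2 - 3y\) and \(f_y = -3x + 8y\) are computed by hand) and with the difference quotient from the limit definition. The step size `t` is an arbitrary small number picked for illustration.

```python
import math

def f(x, y):
    return x**3 - 3 * x * y + 4 * y**2

def grad_f(x, y):
    # Partial derivatives computed by hand: f_x = 3x^2 - 3y, f_y = -3x + 8y.
    return (3 * x**2 - 3 * y, -3 * x + 8 * y)

theta = math.pi / 6
u = (math.cos(theta), math.sin(theta))  # unit vector at angle pi/6

fx, fy = grad_f(1.0, 2.0)
by_formula = fx * u[0] + fy * u[1]      # f_x h + f_y k

t = 1e-6                                # small step, only approximates the limit
by_quotient = (f(1.0 + t * u[0], 2.0 + t * u[1]) - f(1.0, 2.0)) / t

exact = (13 - 3 * math.sqrt(3)) / 2     # -3 cos(pi/6) + 13 sin(pi/6), simplified
print(by_formula, by_quotient, exact)
```

All three printed values should be close to \(\frac{13 - 3\sqrt{3}}{2}\text{,}\) which is a little less than \(4\text{.}\)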
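Look closely at the formula for computing the directional derivative: it is a dot product of two vectors, and one of them consists of the first-order partial derivatives of the function. This vector is called the gradient of the function. For a function \(f(x,y)\) and a point \(P\text{,}\) we write \(\nabla f(P) = \la f_x(P), f_y(P) \ra\text{.}\)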
Sometimes the notation can be a bit sloppy and we might drop the reference to the point \(P\) and just write \(\nabla f = \la \dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y} \ra\text{.}\) Or maybe the reference to the point becomes a subscript and we write \(\nabla f_P\text{.}\) They all mean the same thing.
The symbol used to denote the gradient, \(\nabla\text{,}\) is called "del". It is an upside-down Delta. The concept of the gradient can be extended to functions of \(n\) variables. We just need to take all the first-order partial derivatives and organize them into a vector. That is, the gradient of \(f(x_1, x_2, \dots, x_n)\) is
\begin{equation*}
\nabla f = \la \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \dots, \dfrac{\partial f}{\partial x_n} \ra
\end{equation*}
Likewise, if we are dealing with functions of a single variable, then the gradient is just the derivative of the function. You can think of the gradient of functions of a single variable, aka the derivative, as a vector if you want but there are only two directions to move on the curve: forward (positive) and backward (negative).
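To make the "collect all the first-order partial derivatives into a vector" idea concrete, here is a rough numerical sketch. It approximates each partial derivative with a central difference, which is an approximation technique chosen for this illustration and not a formula from this section. The helper name `numerical_gradient` and the sample function are assumptions.

```python
def numerical_gradient(f, point, step=1e-6):
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step    # nudge only the i-th variable up
        backward[i] -= step   # and down, keeping the others fixed
        grad.append((f(forward) - f(backward)) / (2 * step))
    return grad

# Hypothetical example: f(x1, x2, x3) = x1^2 + x2 * x3,
# whose gradient by hand is <2 * x1, x3, x2>.
def f(x):
    return x[0]**2 + x[1] * x[2]

print(numerical_gradient(f, [1.0, 2.0, 3.0]))  # approximately [2, 3, 2]
```

The same function works for any number of variables because it loops over the coordinates one at a time, just like taking one partial derivative at a time.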
Now we evaluate these partial derivatives at the point \(\lp \pi, \dfrac{3\pi}{2} \rp\text{.}\) First, we determine the value inside the cosine function:
There are some nice properties of the gradient. The proofs should be super straightforward (just use the properties of derivative plus vectors to push the symbols around). So Richard will just state the properties without proof.
Chain Rule for Gradients: If \(F(t)\) is a differentiable function of one variable, then \(\nabla \lp F \lp f(x,y,z) \rp \rp = F'\lp f(x,y,z) \rp \nabla f\)
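If you would like some numerical evidence for the chain rule for gradients before pushing the symbols around, here is a small sketch. The choices \(F(t) = \sin(t)\) and \(f(x,y,z) = xy + z^2\text{,}\) the test point, and the helper `numerical_gradient` (a central-difference approximation) are all assumptions picked for illustration.

```python
import math

def numerical_gradient(g, point, step=1e-6):
    """Approximate the gradient of g at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        grad.append((g(forward) - g(backward)) / (2 * step))
    return grad

def f(p):
    x, y, z = p
    return x * y + z**2

def grad_f(p):
    # Computed by hand: <f_x, f_y, f_z> = <y, x, 2z>.
    x, y, z = p
    return [y, x, 2 * z]

def composite(p):
    return math.sin(f(p))  # F(f(x, y, z)) with F(t) = sin(t)

point = [0.5, -1.0, 0.2]
left = numerical_gradient(composite, point)             # gradient of F(f), numerically
right = [math.cos(f(point)) * d for d in grad_f(point)]  # F'(f) times gradient of f

print(left)
print(right)
```

The two printed vectors should match to several decimal places. This is evidence, not a proof, so the symbol pushing is still worth doing.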
Remember that we picked out the gradient from the formula for computing the directional derivative, so we can rewrite that formula as
\begin{equation*}
D_\v{u} f(P) = \nabla f_P \cdot \v{u}
\end{equation*}
But what is the difference between the gradient and the directional derivative, as they both seem to be related to the rate of change of the function at a point?
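The short answer is that the gradient is a vector while the directional derivative is a scalar. We can actually learn more about the relationship between them from the dot product formula \(D_\v{u} f(P) = \nabla f_P \cdot \v{u}\text{.}\) Since \(\v{u}\) is a unit vector,
\begin{equation*}
D_\v{u} f(P) = \nabla f_P \cdot \v{u} = \|\nabla f_P\| \|\v{u}\| \cos(\theta) = \|\nabla f_P\| \cos(\theta)
\end{equation*}
where \(\theta\) is the angle between the gradient and the unit vector. That is, the directional derivative (aka the rate of change of the function at a point in a specific direction) varies with the cosine of the angle \(\theta\) between the gradient and the direction. Since \(\cos(\theta)\) is bounded by \(-1\) and \(1\text{,}\) we have
\begin{equation*}
-\|\nabla f_P\| \leq D_\v{u} f(P) \leq \|\nabla f_P\|
\end{equation*}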
Observe that \(D_\v{u} f(P) = \|\nabla f_P\|\) when \(\theta = 0\) (i.e., when \(\v{u}\) points in the same direction as \(\nabla f_P\)). That is, the gradient points in the direction of the maximum rate of increase, and this maximum rate is \(\|\nabla f_P\|\text{.}\) Let's make it into a cool theorem!
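Here is a brute-force sketch of that observation. It takes \(f(x,y) = x^3 - 3xy + 4y^2\) at the point \((1,2)\text{,}\) scans unit vectors on a grid of angles, and records the directional derivative for each one. The grid size of 3600 angles is an arbitrary choice for illustration, so the best direction found is only accurate up to the grid spacing.

```python
import math

# Gradient of f(x, y) = x^3 - 3xy + 4y^2, computed by hand.
def grad_f(x, y):
    return (3 * x**2 - 3 * y, -3 * x + 8 * y)

fx, fy = grad_f(1.0, 2.0)
grad_norm = math.hypot(fx, fy)

# Scan unit vectors u = <cos(angle), sin(angle)> on a fine grid of angles.
steps = 3600
rates = []
for i in range(steps):
    angle = 2 * math.pi * i / steps
    rate = fx * math.cos(angle) + fy * math.sin(angle)  # gradient dotted with u
    rates.append((rate, angle))

best_rate, best_angle = max(rates)
worst_rate, _ = min(rates)
gradient_angle = math.atan2(fy, fx) % (2 * math.pi)     # direction of the gradient

print(best_rate, grad_norm)         # largest rate next to the gradient's length
print(best_angle, gradient_angle)   # best direction next to the gradient's direction
print(worst_rate, -grad_norm)       # smallest rate next to minus the length
```

The largest rate found by the scan should sit right next to \(\|\nabla f_P\|\text{,}\) the smallest right next to \(-\|\nabla f_P\|\text{,}\) and the best angle should agree with the direction of the gradient up to the grid spacing.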
Now that we know the difference between the gradient and the directional derivative, let's unpack the last part of the theorem, which says the gradient is always normal to the level curve (or surface) at the point.
Let's say we have a surface \(S\) in \(\R^3\text{.}\) Then we will need an equation in three variables to describe the surface. That is, the surface will have an equation \(F(x,y,z) = k\) (this also means that the surface is a level surface of the function \(F\)).
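Now imagine there is a point \(P = (a,b,c)\) on the surface and a curve \(\c{C}\) on the surface passing through \(P\text{.}\) If we call the curve \(\v{r}(t) = \la x(t), y(t), z(t) \ra\text{,}\) then there is a parameter \(t_0\) such that \(\v{r}(t_0) = \la a,b,c \ra\text{.}\) Since \(\c{C}\) lies on the surface, we obtain
\begin{equation*}
F(x(t), y(t), z(t)) = k
\end{equation*}
Differentiating both sides with respect to \(t\) using the chain rule (assuming \(F\) and \(\v{r}\) are differentiable) gives
\begin{equation*}
F_x \frac{dx}{dt} + F_y \frac{dy}{dt} + F_z \frac{dz}{dt} = 0, \quad \text{that is,} \quad \nabla F \cdot \v{r}'(t) = 0
\end{equation*}
In particular, at \(t = t_0\) we get \(\nabla F_P \cdot \v{r}'(t_0) = 0\text{,}\) so the gradient at \(P\) is perpendicular to the tangent vector of every such curve through \(P\text{.}\)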
Hmm, but why do we need a normal vector to the surface at a point? Because with a normal vector and a point, we can find an equation of the tangent plane! If you compare with the formula for the tangent plane in Section 14.4, the normal vector is exactly the gradient vector!
Let \(P = (a,b,c)\) be a point on the surface given by \(F(x,y,z) = k\) and assume that \(\nabla F_P \neq \v{0}\text{.}\) Then \(\nabla F_P\) is a vector normal to the tangent plane to the surface at \(P\text{.}\) Moreover, the tangent plane to the surface at \(P\) has equation
\begin{equation*}
F_x(a,b,c) (x - a) + F_y(a,b,c) (y - b) + F_z(a,b,c)(z - c) = 0
\end{equation*}
Using this theorem, we can find an equation of the tangent plane to a surface without needing an explicit formula for our function. That is, \(z\) doesn't need to be isolated in the equation of the surface.
Let \(F(x,y,z) = \frac{x^2}{4} + y^2 + \frac{z^2}{9}\text{.}\) Then the ellipsoid is the level surface \(F(x,y,z) = 3\text{.}\) We find the gradient vector to determine the normal vector to the surface.
\begin{equation*}
\nabla F = \la \frac{x}{2}, 2y, \frac{2z}{9} \ra
\end{equation*}
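The sketch below plays with the tangent plane formula for this ellipsoid. The point \((-2, 1, -3)\) is an assumption chosen for illustration because it lies on the level surface \(F = 3\text{;}\) it is not necessarily the point used in the example. The helper `tangent_plane` returns the left-hand side of the tangent plane equation, so the plane consists of the points where it equals zero.

```python
import math

def F(x, y, z):
    return x**2 / 4 + y**2 + z**2 / 9

def grad_F(x, y, z):
    # Computed by hand: <F_x, F_y, F_z> = <x/2, 2y, 2z/9>.
    return (x / 2, 2 * y, 2 * z / 9)

P = (-2.0, 1.0, -3.0)    # assumed point; F(P) = 1 + 1 + 1 = 3
normal = grad_F(*P)      # normal vector to the tangent plane at P

def tangent_plane(x, y, z):
    """Left-hand side of F_x (x - a) + F_y (y - b) + F_z (z - c) = 0."""
    return sum(n * (q - p) for n, q, p in zip(normal, (x, y, z), P))

# A nearby point on the ellipsoid: shift x a little, keep y, and solve for z.
x_near, y_near = -1.999, 1.0
z_near = -3 * math.sqrt(3 - x_near**2 / 4 - y_near**2)

print(F(*P), normal)
print(tangent_plane(*P))                       # P itself lies on the plane
print(tangent_plane(x_near, y_near, z_near))   # a nearby surface point almost does
```

The last value is not exactly zero because the surface curves away from its tangent plane, but it should be much smaller than the shift of \(0.001\) in \(x\text{.}\)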
Now that we know what the gradient is (pointing in the direction of the maximum rate of increase) and how it behaves (normal to the level curve/surface), we can create something called the gradient vector field (which is super similar to the vector field you learned back in MTH 253). For example, the gradient vector field of \(f(x,y) = x^2 - y^2\text{,}\) superimposed on a contour map, is shown below.
As expected, the gradient vectors are normal to the level curves. They also point in the direction of the maximum rate of increase. That is, each gradient vector is pointing "uphill".
Below is the graph of the function \(f(x,y) = x^2 - y^2\text{.}\) Try to convince yourself that the gradient vectors are indeed pointing in the direction of the maximum rate of increase and that they are normal to the level curves.
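If the picture alone does not convince you, here is a small numerical sketch for \(f(x,y) = x^2 - y^2\text{.}\) It uses the parametrization \(\la \cosh(t), \sinh(t) \ra\) of one branch of the level curve \(f = 1\text{,}\) which is an assumption chosen for illustration since \(\cosh^2(t) - \sinh^2(t) = 1\text{.}\) The helper names and the step size `eps` are also made up for this sketch.

```python
import math

def f(x, y):
    return x**2 - y**2

def grad_f(x, y):
    return (2 * x, -2 * y)

def dot_with_tangent(t):
    """Gradient dotted with the tangent vector of the level curve f = 1."""
    x, y = math.cosh(t), math.sinh(t)        # point on the level curve
    tangent = (math.sinh(t), math.cosh(t))   # derivative of <cosh t, sinh t>
    g = grad_f(x, y)
    return g[0] * tangent[0] + g[1] * tangent[1]

def steps_uphill(t, eps=1e-3):
    """Check that a small step along the gradient increases f."""
    x, y = math.cosh(t), math.sinh(t)
    g = grad_f(x, y)
    return f(x + eps * g[0], y + eps * g[1]) > f(x, y)

for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, dot_with_tangent(t), steps_uphill(t))
```

A dot product of zero means the gradient is perpendicular to the level curve at that point, and `True` means a small step along the gradient goes uphill.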
The gradient is an important concept that helps us understand the behavior of the surface. We will see more applications of the gradient in the following sections when we talk about optimization. Just a quick preview: if all the gradient vectors near a point are pointing toward that point in the gradient vector field, then that point is a local maximum since the surface is going uphill toward it.
The problems listed below are assigned to be included in your problem set portfolio. Note that a specific selection of these problems will also form the written homework assignments. I recommend working through all of them to ensure a solid grasp of the material. Reach out to Richard for help if you get stuck or have any questions.
The solutions will be posted after the written homework due dates. If you have any questions about your work, talk to Richard and he is happy to discuss the process with you.
Calculate the directional derivative of \(f(x,y) = \tan^{-1}(xy)\) in the direction of \(\v{v} = \la 1,1 \ra\) at the point \(P = (3,4)\text{.}\) Remember to use a unit vector in your directional derivative computation.
Determine the direction in which \(f(x,y,z) = \dfrac{xy}{z}\) has maximum rate of increase from \(P = (1,-1,3)\text{,}\) and give the rate of change in that direction.
To determine if \(f\) is increasing or decreasing, we check the sign of the directional derivative \(D_\v{u}f(P)\text{.}\) The sign of \(D_\v{u}f\) is the same as the sign of the dot product \(\nabla f_P \cdot \v{v}\) (since \(\|\v{v}\|\) is always positive, dividing \(\v{v}\) by it to get \(\v{u}\) does not change the sign).
Since the dot product is positive (\(12 > 0\)), the directional derivative is positive. Therefore, \(f\) is increasing in the direction of \(\v{v}\text{.}\)
Let \(f(x,y,z) = \sin(xy + z)\) and \(P = (0,-1, \pi)\text{.}\) Calculate \(D_\v{u} f(P)\text{,}\) where \(\v{u}\) is a unit vector making an angle \(\theta = 30^\circ\) with \(\nabla f_P\text{.}\)
Let \(F(x,y,z) = \frac{x^2}{4} + \frac{y^2}{9} + z^2\text{.}\) The normal vector to the tangent plane is given by the gradient \(\nabla F\text{.}\)
\begin{equation*}
\nabla F = \la \frac{x}{2}, \frac{2y}{9}, 2z \ra
\end{equation*}
We want the tangent plane to be normal to \(\v{v} = \la 1,1,-2 \ra\text{,}\) which means the gradient must be parallel to \(\v{v}\text{.}\) So, \(\nabla F = k \v{v}\) for some scalar constant \(k\text{.}\)
The curve \(\c{C}\) is the intersection of two surfaces. The tangent line to \(\c{C}\) must be tangent to both surfaces. This means the direction vector of the tangent line, \(\v{v}\text{,}\) must be perpendicular to the normal vectors of both surfaces.
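A vector perpendicular to both normal vectors can be found with a cross product. The sketch below illustrates this with two hypothetical surfaces, the sphere \(x^2 + y^2 + z^2 = 14\) and the plane \(x + y + z = 6\text{,}\) at the point \((1,2,3)\) where they meet. These surfaces and the point are assumptions chosen only for illustration and are not the ones from the problem.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Hypothetical surfaces: F = x^2 + y^2 + z^2 = 14 and G = x + y + z = 6.
def grad_F(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def grad_G(x, y, z):
    return (1, 1, 1)

P = (1, 2, 3)          # lies on both surfaces: 1 + 4 + 9 = 14 and 1 + 2 + 3 = 6
n1 = grad_F(*P)        # normal vector to the sphere at P
n2 = grad_G(*P)        # normal vector to the plane at P
v = cross(n1, n2)      # direction vector of the tangent line to the curve at P

print(v)
print(dot(v, n1), dot(v, n2))   # both are 0, so v is perpendicular to both normals
```

Any nonzero scalar multiple of `v` works just as well as a direction vector for the tangent line.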