Paradox
- The classic definition of a derivative is "instantaneous rate of change", which is not exactly true.
- Imagine a car going from \(A\) to \(B\) over the course of 10 seconds, covering a distance of 100 meters.
- Plotting a graph of distance vs. time:

NOTE: It’s a common convention to name a function like this \( s(t) \), where \(s\) is the distance and \(t\) is time.
- You can see the slope change over time. The car is slow to start, so the distance travelled initially doesn’t change much. Then, as the car speeds up, the distance changes more rapidly, which corresponds to a steeper slope. Towards the end, the car slows down again, so the slope becomes less steep:

- Also, consider the velocity curve \( v(t) \):
If the \( s(t) \) graph changes, the \( v(t) \) graph changes too. Velocity is distance per unit time, so as the \( s(t) \) graph gets steeper, the velocity increases (i.e., the car traverses more distance in less time).
- With all this, we can see that what we want is not an instantaneous rate of change, but rather the rate of change between two points that are very close to each other. In fact, this is what a car’s speedometer does: it measures speed over a very small interval of time and shows that as the current speed.
- So, if you are given a single photo of a car and want to know how fast it is going, you cannot tell: at any single instant the car is not moving; it only moves over an interval of time.
- Let’s call the difference in time between the two points \(dt\), and the difference in distance \(ds\). Then the velocity, as a function of time, can be defined as: $$ v(t) = \frac{ds}{dt}(t) = \frac{s(t + dt) - s(t)}{dt} $$
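To make the speedometer idea concrete, here is a small sketch. It assumes a hypothetical distance function \( s(t) = 100\,(3u^2 - 2u^3) \) with \( u = t/10 \). This function is not from the original notes; it was chosen only because it goes from 0 m to 100 m in 10 s and starts and ends slowly, like the car described above. The sketch estimates \( v(t) \) as \( \frac{s(t+dt)-s(t)}{dt} \) over a short, fixed interval.

```python
def s(t: float) -> float:
    """Hypothetical distance (meters) at time t (seconds): 0 m at t=0, 100 m at t=10."""
    u = t / 10.0
    return 100.0 * (3 * u**2 - 2 * u**3)


def average_velocity(t: float, dt: float) -> float:
    """Distance covered between t and t + dt, divided by dt (like a speedometer reading)."""
    return (s(t + dt) - s(t)) / dt


if __name__ == "__main__":
    dt = 0.01  # a small but non-zero interval, in seconds
    for t in range(0, 10, 2):
        print(f"t = {t:2d} s  ->  v approx {average_velocity(t, dt):6.2f} m/s")
```

For this assumed curve, the readings are close to 0 at the start, are largest in the middle of the trip, and drop again towards the end. This matches the changing slope of the \( s(t) \) graph.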
Definition
- Now, as we make \(dt\) smaller and smaller, the value of \(v(t)\) becomes more accurate. In the limit, as \(dt\) approaches 0, we get the exact value of the velocity at time \(t\).
- So, the formula for the derivative is: $$ \frac{ds}{dt}(t) = \lim_{dt \to 0} \frac{s(t+dt)-s(t)}{dt} $$
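Here is a minimal numerical sketch of this limit. It reuses the same hypothetical \( s(t) = 100\,(3u^2 - 2u^3) \), \( u = t/10 \), which is an illustrative assumption and not part of the original notes. The ratio is computed at \( t = 3 \) s for shrinking \( dt \). Differentiating this particular \( s(t) \) by hand gives \( s'(3) = 12.6 \) m/s, so the ratios should approach that value.

```python
def s(t: float) -> float:
    """Hypothetical distance function (meters), used only for illustration."""
    u = t / 10.0
    return 100.0 * (3 * u**2 - 2 * u**3)


def difference_quotient(f, t: float, dt: float) -> float:
    """The ratio (f(t + dt) - f(t)) / dt for a finite, non-zero dt."""
    return (f(t + dt) - f(t)) / dt


if __name__ == "__main__":
    t = 3.0
    exact = 12.6  # s'(3), worked out by hand for this particular s(t)
    for dt in [1.0, 0.1, 0.01, 0.001]:
        ratio = difference_quotient(s, t, dt)
        print(f"dt = {dt:<6} ratio = {ratio:.6f}  error = {ratio - exact:+.6f}")
```

For this curve, the ratios come out to about 13.6, 12.718, 12.612 and 12.601. Each time \(dt\) shrinks by a factor of 10, the error shrinks by roughly the same factor.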
Visualizing Derivative
- We can see what this ratio approaches as \(dt\) approaches 0:

- As \(dt\) approaches 0, the line through the two points becomes almost indistinguishable from the tangent line to the curve at \(t\).
- Notice: this \(dt\) is a finite, small, non-zero value.
- This is how we resolve the paradox of instantaneous rate of change. It’s not instantaneous, but rather measured over a very small interval; still, in the limit it behaves like the slope of the tangent at a single point.
- We can say that the derivative is the best constant approximation of the rate of change around a point (equivalently, the tangent line is the best linear approximation of the function there).
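One way to check the "best approximation" claim numerically is sketched below. The choice of \( f(x) = x^3 \) at \( x = 2 \) is an arbitrary example, and there \( f'(2) = 3 \cdot 2^2 = 12 \). If we treat the rate of change as a constant near \( x = 2 \) and predict \( f(2 + dx) \approx f(2) + \text{rate} \cdot dx \), the prediction error shrinks faster than \( dx \) itself only when the rate is 12. The helper names are hypothetical.

```python
def f(x: float) -> float:
    return x**3


def approximation_error(rate: float, x: float, dx: float) -> float:
    """How far f(x + dx) is from the straight-line guess f(x) + rate * dx."""
    return f(x + dx) - (f(x) + rate * dx)


if __name__ == "__main__":
    x = 2.0
    for rate in (11.0, 12.0, 13.0):  # 12 is the derivative of x**3 at x = 2
        ratios = [approximation_error(rate, x, dx) / dx for dx in (0.1, 0.01, 0.001)]
        print(f"rate = {rate}: error/dx = " + ", ".join(f"{r:.4f}" for r in ratios))
```

With rate 12, error/dx heads towards 0 (about 0.61, 0.0601, 0.0060). With 11 or 13, it settles near +1 or -1 instead.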
How is this useful?
You may think that letting \(dt\) approach 0 will make things more complicated, but in fact, it makes things easier. I will show you how by deriving the power rule.
Power Rule
- The power rule states that: $$ \frac{d}{dt} t^n = n t^{n-1} $$
- Let’s find out why this is true, and how letting \(dt\) approach 0 helps us.
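Before the proof, here is a quick numerical sanity check, written as a sketch. The test point \( t = 1.5 \), the step \( dt = 10^{-6} \) and the helper name `power_rule_check` are arbitrary choices made for illustration. For each \( n \), the difference quotient of \( t^n \) should land close to \( n t^{n-1} \).

```python
def power_rule_check(n: int, t: float = 1.5, dt: float = 1e-6) -> tuple:
    """Return (difference quotient of t**n, n * t**(n - 1)) at the point t."""
    quotient = ((t + dt) ** n - t**n) / dt
    predicted = n * t ** (n - 1)
    return quotient, predicted


if __name__ == "__main__":
    for n in range(1, 6):
        quotient, predicted = power_rule_check(n)
        print(f"n = {n}: ratio = {quotient:.5f}, n*t^(n-1) = {predicted:.5f}")
```

A check like this only samples a few values; it supports the rule but does not prove it. The proof below does that.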
Proof
To avoid getting into the binomial theorem, let’s just use the simple example \(n=3\). So, we need to prove that:
$$ \frac{d}{dt} t^3 = 3 t^2 $$
Using the definition of the derivative, first work out the ratio for a small, non-zero \(dt\):
$$ \frac{(t + dt)^3 - t^3}{dt} = \frac{\left(t^3 + dt^3 + 3t\,dt\,(t + dt)\right) - t^3}{dt} $$
$$ = \frac{dt^3 + 3t\,dt\,(t + dt)}{dt} = \frac{dt^3}{dt} + \frac{3t\,dt\,(t + dt)}{dt} = dt^2 + 3t\,(t + dt) $$
$$ = 3t^2 + 3t\,dt + dt^2 $$
Now, as \(dt\) approaches 0, the terms containing \(dt\) vanish, and we are left with:
$$ \frac{d}{dt} t^3 = 3t^2 $$
As you can see, letting \(dt\) approach 0 made our life easier, because we could ignore the extra terms containing \(dt\).
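To see this algebra in numbers, here is a sketch where \( t = 2 \) is an arbitrary choice and the function names are hypothetical. For every finite \( dt \), the ratio equals \( 3t^2 + 3t\,dt + dt^2 \). As \( dt \) shrinks, only the \( 3t^2 = 12 \) part survives.

```python
def ratio_t_cubed(t: float, dt: float) -> float:
    """((t + dt)**3 - t**3) / dt, the ratio before taking any limit."""
    return ((t + dt) ** 3 - t**3) / dt


def expanded_form(t: float, dt: float) -> float:
    """The simplified expression 3t^2 + 3t*dt + dt^2 from the derivation."""
    return 3 * t**2 + 3 * t * dt + dt**2


if __name__ == "__main__":
    t = 2.0
    for dt in (1.0, 0.1, 0.01, 0.001):
        leftover = 3 * t * dt + dt**2  # the terms that vanish as dt -> 0
        print(f"dt = {dt:<6} ratio = {ratio_t_cubed(t, dt):.6f}  "
              f"expanded = {expanded_form(t, dt):.6f}  leftover = {leftover:.6f}")
```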

This is at the heart of why calculus becomes useful.
Instantaneous rate of change is just a conceptual shortcut for the best constant approximation of the rate of change.
Derivatives using geometry
For example, take \( f(x) = x^2 \). Looking at its graph, \(df/dx\) at any point is the slope of the tangent line at that point.
But this alone won’t help us compute the derivative. So, let’s think of it geometrically.
Picture a square with side \(x\) and area \( f = x^2 \):
We want to find the change in area when we increase the side length by a small amount \(dx\).
The added area consists of two thin \(x \times dx\) rectangles plus a tiny \(dx \times dx\) square in the corner. So, $$ df = x\,dx + x\,dx + dx^2 = 2x\,dx + dx^2 $$
Now, always remember to ignore the higher-order small terms: as \(dx\) approaches 0, \(dx^2\) becomes negligible compared to \(2x\,dx\). Ignoring it, we get:
$$ df = 2x\,dx \\ \frac{df}{dx} = 2x $$
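Here is the same square bookkeeping in numbers, as a sketch. The side \( x = 3 \), the nudge \( dx = 0.01 \) and the function name are arbitrary illustrative choices. The exact change in area splits into the two strips and the corner square, and the corner is tiny next to the strips.

```python
def square_area_change(x: float, dx: float) -> dict:
    """Split the growth of an x-by-x square into its geometric pieces."""
    return {
        "exact": (x + dx) ** 2 - x**2,
        "two_strips": 2 * x * dx,  # the two x-by-dx rectangles (kept)
        "corner": dx**2,  # the small dx-by-dx square (ignored)
    }


if __name__ == "__main__":
    dx = 0.01
    pieces = square_area_change(3.0, dx)
    print(pieces)
    print("df/dx estimate from the strips alone:", pieces["two_strips"] / dx)
```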
Another example, for \(f(x) = x^3\), picture a cube with volume \(f = x^3\):
We want to find the change in volume when we increase the side length by a small
amount \(dx\).
To make it easier, we can ignore the very small cube formed at the corner, and
just consider the three thin slabs formed on three of the cube’s faces:
Each of these prisms has volume \(x^2 dx\), so the total change in volume is:
$$
df = 3 x^2 dx
\\
\frac{df}{dx} = 3 x^2
$$
Note that there will also be three thin prisms along the edges (each of volume \(x\,dx^2\)) and a tiny cube
at the corner (volume \(dx^3\)); these are multiples of \(dx^2\) and \(dx^3\), so we ignore them.
Graphically, this means that the slope of the tangent line at every point on
the graph of \(f(x) = x^3\) is \(3x^2\).
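The cube version can be checked the same way. This is a sketch where \( x = 2 \), \( dx = 0.01 \) and the function name are arbitrary choices. The three face slabs, the three edge prisms and the corner cube add up to the exact change in volume, and the slabs dominate.

```python
def cube_volume_change(x: float, dx: float) -> dict:
    """Split the growth of an x-by-x-by-x cube into its geometric pieces."""
    return {
        "exact": (x + dx) ** 3 - x**3,
        "face_slabs": 3 * x**2 * dx,  # three x-by-x-by-dx slabs (kept)
        "edge_prisms": 3 * x * dx**2,  # three x-by-dx-by-dx prisms (ignored)
        "corner_cube": dx**3,  # one dx-by-dx-by-dx cube (ignored)
    }


if __name__ == "__main__":
    x, dx = 2.0, 0.01
    pieces = cube_volume_change(x, dx)
    for name, value in pieces.items():
        print(f"{name:>12}: {value:.8f}")
    print("df/dx from the slabs alone:", pieces["face_slabs"] / dx)
```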

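NOTE: Just thinking about graphs wouldn’t have helped us derive the derivative.
Why does our power rule proof work for all n?
For \(f(x) = x^n\) with any positive integer \(n\), nudging \(x\) to \(x + dx\) gives:
$$ f(x + dx) = (x + dx)^n = \underbrace{ (x + dx)(x + dx)(x + dx)\cdots(x + dx) }_{n\ \text{times}} = x^n + n x^{n-1}\,dx + (\text{multiples of } dx^2) $$
When expanding the product, picking \(x\) from every factor gives \(x^n\), and picking \(dx\) from exactly one of the \(n\) factors (and \(x\) from the rest) gives \(n\) copies of \(x^{n-1}\,dx\). Every other term has at least two factors of \(dx\), so these multiples of \(dx^2\) can safely be ignored. This means all but a negligible part of the change in \(f(x)\) is due to the \(n x^{n-1}\,dx\) term, so \(\frac{df}{dx} = n x^{n-1}\).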
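Here is a small sketch of that expansion. It assumes an integer \( x \) so that the arithmetic is exact. The code multiplies \( n \) copies of the factor \( (x + dx) \) and tracks the coefficient of each power of \( dx \). The coefficient of \( dx^1 \) comes out as \( n x^{n-1} \). The helper name `expand_nudged_power` is hypothetical.

```python
def expand_nudged_power(x: int, n: int) -> list:
    """Coefficients c[k] such that (x + dx)**n == sum(c[k] * dx**k for k in range(n + 1))."""
    coeffs = [1]  # the polynomial "1", before multiplying in any factor
    for _ in range(n):
        # Multiply the current polynomial by (x + dx):
        # x times each term keeps its dx power; dx times each term raises it by one.
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * x
            new[k + 1] += c
        coeffs = new
    return coeffs


if __name__ == "__main__":
    x = 3
    for n in range(1, 6):
        coeffs = expand_nudged_power(x, n)
        print(f"n = {n}: dx^0 -> {coeffs[0]}, dx^1 -> {coeffs[1]}, n*x^(n-1) = {n * x ** (n - 1)}")
```

For example, `expand_nudged_power(3, 2)` gives `[9, 6, 1]`, which corresponds to \( 9 + 6\,dx + dx^2 \).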
- Another example, from the 3Blue1Brown lesson: \(f(x) = \frac{1}{x}\).
Here, we are picturing a rectangle with area \(A = 1\) and sides \(x\) and \(1/x\).
This is how the length and breadth change:
Now, we can make a graph like this:
Now, let’s nudge \(x\) by a tiny amount \(dx\):
So, here you can see that nudging added some new area on the right side, but also
removed some area from the top. Since the total area is still 1, the area lost from the top must
cancel out the area gained on the right side. This is why the area lost from the top should be thought of as negative.
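Here is a numerical sketch of this balance, assuming \( x = 2 \) and \( dx = 0.001 \) (both arbitrary) and using a hypothetical helper name. The strip gained on the right has area \( dx \cdot \frac{1}{x+dx} \), and the strip lost from the top has area \( x\left(\frac{1}{x} - \frac{1}{x+dx}\right) \); the two match. Dividing the change in height by \( dx \) then lands close to \( -\frac{1}{x^2} \), the standard derivative of \( \frac{1}{x} \).

```python
def reciprocal_rectangle(x: float, dx: float) -> dict:
    """Track an x-by-(1/x) rectangle (area 1) as its width grows from x to x + dx."""
    old_height = 1 / x
    new_height = 1 / (x + dx)
    return {
        "gained_right": dx * new_height,  # new strip on the right side
        "lost_top": x * (old_height - new_height),  # strip removed from the top
        "d_height_over_dx": (new_height - old_height) / dx,  # estimate of d(1/x)/dx
    }


if __name__ == "__main__":
    x, dx = 2.0, 0.001
    result = reciprocal_rectangle(x, dx)
    print(result)
    print("-1/x^2 =", -1 / x**2)
```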