Gradient Descent Method#
A more efficient approach is to use information about the slope of the potential energy surface to guide our search.
The gradient descent method (also called the steepest descent method) works by analogy to releasing a ball on a hill and letting it roll to the bottom. At any point on our potential energy surface, the gradient tells us which direction is “uphill” and how steep the surface is at this point. With this information, we can step in the opposite direction (i.e., downhill), then recalculate the gradient at our new position, and repeat until we reach a point where the gradient is \(\sim0\).
The simplest implementation of this method is to move a fixed distance every step.
Algorithm (Fixed Step Size)#
Start at an initial guess position \(r_0\)
Calculate the gradient \(U^\prime(r) = \frac{\mathrm{d}U}{\mathrm{d}r}\) at this point
Move in the direction opposite to the gradient (i.e., downhill):
\[r_{i+1} = r_i - \Delta r \times \mathrm{sign}(U^\prime(r_i))\]where \(\Delta r\) is a fixed step size and \(\mathrm{sign}(U^\prime)\) gives us the direction (+1 or -1)
Repeat until the gradient is close to zero
Note
In molecular simulations, we often express the gradient of the potential energy in terms of the force. The force is defined as the negative gradient of the potential energy:
This relationship means that forces naturally point “downhill” towards lower energy, whilst the gradient points “uphill” towards higher energy. When particles move in the direction of the force, they move towards lower potential energy.
Exercise: Fixed Step Size Gradient Descent#
Write a function to calculate the first derivative of a harmonic potential:
Using this function, write code to perform a gradient descent search to find the minimum of your harmonic potential energy surface. Use \(r=1.0\) Å as your starting position.
Your code should:
Use a
forloop with a maximum of 50 iterations to prevent infinite loopsStore the position and gradient at each iteration in lists (for plotting later)
Update the position at each step using: \(r_{i+1} = r_i - \Delta r \times \mathrm{sign}(U^\prime(r_i))\)
Print the iteration number, position, and gradient at each step
Stop early (using
break) when \(|U^\prime(r)| < 0.001\)Report whether the algorithm converged or hit the maximum iteration limit
Part 1: Start with \(\Delta r = 0.01\) Å. Does the algorithm converge? How many iterations does it require?
Part 2: Now try \(\Delta r = 0.1\) Å to see if a larger step size converges faster. What behaviour do you observe? Does it still converge?
Questions to consider:
Why does the larger step size cause oscillation?
The minimum is at r = 0.74 Å and we start at r = 1.0 Å. Can you explain why Δr = 0.01 Å converges but Δr = 0.1 Å does not, based on this distance?
What would you predict for Δr = 0.05 Å?
Rescaling the Step Size#
Using a fixed step size makes our method very sensitive to the choice of step size. As you have seen, large step sizes overshoot the minimum and oscillate back and forth, whilst small step sizes converge more reliably but require many more iterations.
A common approach designed to address this problem is to rescale the step size for each iteration based on how far we think we are from the minimum. A simple model assumes that the gradient will be steep if we are far from the minimum, but shallow if we are already close. Therefore, we make our step size proportional to the local gradient magnitude.
Instead of moving a fixed distance \(\Delta r\), we take a step proportional to the gradient:
or equivalently, proportional to the force:
where:
\(\alpha\) is a constant called the learning rate or step size parameter
\(F(r_i) = -U^\prime(r_i)\) is the force at position \(r_i\)
The update equation then becomes:
or equivalently:
With this adaptive approach:
When far from the minimum, \(|U^\prime|\) is large → we take large steps
When close to the minimum, \(|U^\prime|\) is small → we take small steps
The parameter \(\alpha\) controls the overall “aggressiveness” of the optimisation. Too large and we overshoot; too small and we converge slowly.
Exercise: Adaptive Step Size Gradient Descent#
Write a new version of your gradient descent code that rescales the step to be proportional to the local force. You should write this as a loop that iteratively updates the current position:
where \(F(r_i) = -U^\prime(r_i) = -k(r_i - r_0)\).
Your code should:
Start from \(r = 1.0\) Å
Use a
forloop with a maximum of 50 iterationsUpdate the position at each iteration
Print the iteration number, position, gradient, and step size taken at each iteration
Stop early (using
break) when \(|U^\prime(r)| < 0.001\)Report the final predicted equilibrium bond length and number of iterations required
Part 1: Start with \(\alpha = 0.01\). Does it converge? How many iterations does it take? Compare this to the fixed step size results.
Part 2: Try \(\alpha = 0.001\). What happens to the convergence rate?
Part 3: Try \(\alpha = 0.1\). Does the algorithm remain stable?
Questions to consider:
How does the step size change as you approach the minimum? Watch the “Step (Å)” column in your output.
Why doesn’t the adaptive method oscillate like the fixed step size method with \(\Delta r = 0.1\) Å did?
Compare \(\alpha = 0.01\) (adaptive) with \(\Delta r = 0.01\) Å (fixed): which converges faster and why?
What happens when \(\alpha\) is too large? Can you explain this behaviour?
Based on your experiments, what seems to be a good choice of \(\alpha\) for this problem?