Damped Newton's Method: Local Minimum Search

Newton’s method in multidimensional optimization is one of the important second-order methods. However, in practical computations, it is not always convenient to use the full Newton step. That is why a version of the method is considered in which a special parameter is added to the scheme. This approach makes the search process more flexible and allows better control over the move to a new approximation. In the literature, this approach is often called damped Newton’s method, that is, a method in which the full Newton step is reduced when necessary. In what follows, we will look step by step at how this method is constructed, what its iterative formula looks like, and under what condition the computations are stopped.

Damped Newton’s Method: How the Search Direction Is Built

Suppose we are given a function of several variables

\[
f(\bar{x})=f(x_1,x_2,\dots,x_n),
\]

where $ \bar{x} $ is the vector of variables. We need to find a point at which the function attains a local minimum value. But how do we determine in which direction to move from the current approximation? This is exactly the question answered by the first step of the method.

First, near the point $ \bar{x}_k $, the function is replaced by its quadratic approximation. In other words, instead of analyzing the whole function, we use a local model that describes its behavior more accurately near the current point. This is important because in minimization problems, we need to take into account not only the direction in which the function changes, but also the shape of its surface.

This approximation is written as

\[
f(\bar{x}) \approx f(\bar{x}_k)
+ \nabla f(\bar{x}_k)^T \cdot (\bar{x}-\bar{x}_k)
+ \frac{1}{2} \cdot (\bar{x}-\bar{x}_k)^T \cdot H(\bar{x}_k) \cdot (\bar{x}-\bar{x}_k).
\]

Two main quantities are used here. The first is the gradient $ \nabla f(\bar{x}_k) $, which shows how the function changes near the point $ \bar{x}_k $. The second is the Hessian matrix $ H(\bar{x}_k) $, which describes the local curvature of the surface. It is precisely because of the second derivatives that Newton’s method belongs to second-order methods.

Next, from this quadratic model, we determine the vector $ \Delta \bar{x}_k $, which defines the Newton increment, that is, the direction of the move from the current approximation. To do this, we solve the system

\[
H(\bar{x}_k)\cdot \Delta \bar{x}_k=-\nabla f(\bar{x}_k).
\]

If the matrix $ H(\bar{x}_k) $ is invertible, then

\[
\Delta \bar{x}_k=-H(\bar{x}_k)^{-1}\cdot \nabla f(\bar{x}_k).
\]

So, at this stage, the method determines not the new point itself, but rather the direction and magnitude of the Newton increment, which are then used in the update formula.

Damped Newton’s Method: How the New Approximation Is Obtained

After the Newton increment $ \Delta \bar{x}_k $ has been found, the next question arises: is it always worth moving to the new point by taking the entire step? This is exactly where the parameter appears.

In the classical Newton method, the move to a new approximation corresponds to a full step. This means that the entire computed Newton increment is taken. However, in Newton’s method with a parameter, a number $ \alpha_k $ is introduced, which makes it possible to change the length of the move.

Then the new approximation is determined by the formula

\[
\bar{x}_{k+1}=\bar{x}_k+\alpha_k\cdot \Delta \bar{x}_k.
\]

Substituting the expression for $ \Delta \bar{x}_k $ into this formula, we obtain

\[
\bar{x}_{k+1}=\bar{x}_k-\alpha_k\cdot H(\bar{x}_k)^{-1}\cdot \nabla f(\bar{x}_k).
\]

This is the main formula for Newton’s method with a parameter in problems of minimizing a function of several variables.

Here it is important to clearly distinguish between two roles. The vector $ \Delta \bar{x}_k $ determines the direction of the move, while the parameter $ \alpha_k $ determines what part of this step is actually taken.

If $ \alpha_k=1 $, we obtain the classical Newton method. If $ 0<\alpha_k<1 $, the step is reduced, and the method takes on a damped character.

This approach is needed to make the computational process more controllable. The full Newton step is often very efficient, but in some problems it may turn out to be too large. That is why the parameter makes it possible to move to the new approximation more carefully and to better control the course of the iterative process.

Newton’s Method: When the Process Stops and What Is Taken as the Result

After building the iterative formula, one more important thing needs to be understood: when exactly should the computations be stopped? After all, Newton’s method with a parameter is an iterative method, and this means that the sequence of approximations

\[
\bar{x}_0,\ \bar{x}_1,\ \bar{x}_2,\ \dots,\ \bar{x}_k,\ \dots
\]

is computed step by step.

Similarly to the classical Newton method, in the method with a parameter one of the most commonly used stopping conditions is the requirement that the gradient norm be small:

\[
\|\nabla f(\bar{x}_k)\| \le \varepsilon,
\]

where $ \varepsilon>0 $ is a prescribed accuracy.

What does this condition mean? It shows that at the point $ \bar{x}_k $, the function is already changing only very slightly. Therefore, the current approximation can be considered sufficiently close to a point of local minimum.

In many practical schemes, the parameter $ \alpha_k $ is chosen in such a way that the new approximation ensures a decrease in the function value. In other words, the goal is that during the move, the condition

\[
f(\bar{x}_{k+1})<f(\bar{x}_k).
\]
is satisfied.

This is not the only possible idea for choosing the parameter, but it clearly shows its purpose: the parameter helps make the move one that really improves the result.

After the stopping condition is satisfied, the last point found, $ \bar{x}_k $, is taken as the approximate value of the local minimum point, and the value $ f(\bar{x}_k) $ is taken as the approximate minimum value of the function.

So, Newton’s method with a parameter, or damped Newton’s method, combines three consecutive ideas. First, it builds the Newton increment using the gradient and the Hessian matrix. Then it forms a new approximation while taking the step parameter into account. After that, it checks the stopping condition and determines the approximate result. It is exactly this structure that makes the method an important tool in numerical methods for minimizing a function of several variables.

Damped Newton’s Method: A Practical Example for a Function of Two Variables

Now let us move on to the practical part. After the theoretical discussion, it is natural to see how Newton’s method with a parameter works in a specific problem. An example is the best way to see how new approximations are computed and why, in some cases, the scheme with a parameter is more convenient than the classical version.

Example 1. Find the minimum value of the function $ f(x_1,x_2)=x_1^4-3\cdot x_1^2+x_2^2 $ with accuracy $ \varepsilon=0.1 $, using damped Newton’s method

$Graph of the function $ f(x_1,x_2)=x_1^4-3\cdot x_1^2+x_2^2 $$

Let us consider the function

\[
f(x_1,x_2)=x_1^4-3\cdot x_1^2+x_2^2.
\]

For this function, we will apply Newton’s method with a parameter. Let us take the initial approximation

\[
\bar{x}_0=
\begin{pmatrix}
0.8\\
0
\end{pmatrix},
\]

since such a point is convenient for computations and at the same time clearly shows why, in this problem, it is reasonable to use the scheme with a parameter.

First, let us find the gradient of the function:

\[
\nabla f(x_1,x_2)=
\begin{pmatrix}
\frac{\partial f}{\partial x_1}\\
\frac{\partial f}{\partial x_2}
\end{pmatrix}
=
\begin{pmatrix}
4\cdot x_1^3-6\cdot x_1\\
2\cdot x_2
\end{pmatrix}.
\]

Next, let us find the Hessian matrix:

\[
H(x_1,x_2)=
\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\partial x_2}
\\[6pt]
\frac{\partial^2 f}{\partial x_2\partial x_1} & \frac{\partial^2 f}{\partial x_2^2}
\end{pmatrix}
=
\begin{pmatrix}
12\cdot x_1^2-6 & 0\\
0 & 2
\end{pmatrix}.
\]

Now let us compute the gradient and the Hessian matrix at the initial point $ \bar{x}_0=(0.8,0)^T $:

\[
\nabla f(0.8,0)=
\begin{pmatrix}
4\cdot 0.8^3-6\cdot 0.8\\
2\cdot 0
\end{pmatrix}
=
\begin{pmatrix}
-2.752\\
0
\end{pmatrix},
\\[6pt]
H(0.8,0)=
\begin{pmatrix}
12\cdot 0.8^2-6 & 0\\
0 & 2
\end{pmatrix}
=
\begin{pmatrix}
1.68 & 0\\
0 & 2
\end{pmatrix}.
\]

To find the Newton increment, we use the system

\[
H(\bar{x}_k)\cdot \Delta \bar{x}_k=-\nabla f(\bar{x}_k).
\]

So, for the initial point we have

\[
\Delta \bar{x}_0=-H(0.8,0)^{-1}\cdot \nabla f(0.8,0).
\]

Since the matrix ( H(0.8,0) ) is diagonal, its inverse has the form

\[
H(0.8,0)^{-1}=
\begin{pmatrix}
\frac{1}{1.68} & 0\\
0 & \frac{1}{2}
\end{pmatrix}.
\]

Then

\[
\Delta \bar{x}_0=
–
\begin{pmatrix}
\frac{1}{1.68} & 0\\
0 & \frac{1}{2}
\end{pmatrix}
\cdot
\begin{pmatrix}
-2.752\\
0
\end{pmatrix}
=
\begin{pmatrix}
1.638095\\
0
\end{pmatrix}.
\]

Now it is important to pay attention to one point. If we used the classical Newton method here, that is, took the full step $ \alpha_0=1 $, we would obtain the point

\[
\tilde{\bar{x}}_1=
\bar{x}_0+\Delta \bar{x}_0=
\begin{pmatrix}
0.8\\
0
\end{pmatrix}
+
\begin{pmatrix}
1.638095\\
0
\end{pmatrix}
=
\begin{pmatrix}
2.438095\\
0
\end{pmatrix}.
\]

Let us compute the value of the function at the initial point and at this new point:

\[
f(0.8,0)=0.8^4-3\cdot 0.8^2=-1.5104,
\\[6pt]
f(2.438095,0)\approx 2.438095^4-3\cdot 2.438095^2\approx 17.4889.
\]

So, the full Newton step sharply increases the value of the function here. That is exactly why, in this case, using Newton’s method with a parameter is more appropriate. It allows us not to make an excessively large move, but to approach the minimum more carefully. Let us take the parameter

\[
\alpha_k=\frac{1}{4}.
\]

Then the iterative formula takes the form

\[
\bar{x}_{k+1}=\bar{x}_k+\frac{1}{4}\cdot \Delta \bar{x}_k.
\]

For the first iteration, we obtain

\[
\bar{x}_1=
\bar{x}_0+\frac{1}{4}\cdot \Delta \bar{x}_0
=
\begin{pmatrix}
0.8\\
0
\end{pmatrix}
+
\frac{1}{4}
\cdot
\begin{pmatrix}
1.638095\\
0
\end{pmatrix}
=
\begin{pmatrix}
1.209524\\
0
\end{pmatrix}.
\]

Let us check the value of the function:

\[
f(1.209524,0)\approx 1.209524^4-3\cdot 1.209524^2\approx -2.248627.
\]

We can see that the function value has now decreased. So, in this problem, it is precisely the scheme with a parameter that gives a much more reliable first step.

Continuation of the Iterative Process

Let us move on to the second iteration. We compute the gradient at the point $ \bar{x}_1 $:

\[
\nabla f(1.209524,0)=
\begin{pmatrix}
4\cdot 1.209524^3-6\cdot 1.209524\\
0
\end{pmatrix}
\approx
\begin{pmatrix}
-0.179262\\
0
\end{pmatrix}.
\]

The Hessian matrix at this point is

\[
H(1.209524,0)=
\begin{pmatrix}
12\cdot 1.209524^2-6 & 0\\
0 & 2
\end{pmatrix}
\approx
\begin{pmatrix}
11.555374 & 0\\
0 & 2
\end{pmatrix}.
\]

Then the Newton increment is

\[
\Delta \bar{x}_1=
–
H(1.209524,0)^{-1}\cdot \nabla f(1.209524,0)
\approx
\begin{pmatrix}
0.015513\\
0
\end{pmatrix}.
\]

Therefore,

\[
\bar{x}_2=
\bar{x}_1+\frac{1}{4}\cdot \Delta \bar{x}_1
=
\begin{pmatrix}
1.209524\\
0
\end{pmatrix}
+
\frac{1}{4}
\cdot
\begin{pmatrix}
0.015513\\
0
\end{pmatrix}
=
\begin{pmatrix}
1.213402\\
0
\end{pmatrix}.
\]

Let us move on to the third iteration. For the point $ \bar{x}_2 $, we have

\[
\nabla f(1.213402,0)\approx
\begin{pmatrix}
-0.134228\\
0
\end{pmatrix},
\\[6pt]
H(1.213402,0)\approx
\begin{pmatrix}
11.668137 & 0\\
0 & 2
\end{pmatrix}.
\]
From this,
\[
\Delta \bar{x}_2=
–
H(1.213402,0)^{-1}\cdot \nabla f(1.213402,0)
\approx
\begin{pmatrix}
0.011504\\
0
\end{pmatrix}.
\]

Then

\[
\bar{x}_3=
\bar{x}_2+\frac{1}{4}\cdot \Delta \bar{x}_2
=
\begin{pmatrix}
1.213402\\
0
\end{pmatrix}
+
\frac{1}{4}
\cdot
\begin{pmatrix}
0.011504\\
0
\end{pmatrix}
=
\begin{pmatrix}
1.216278\\
0
\end{pmatrix}.
\]

Now let us find the gradient at the point $ \bar{x}_3 $:

\[
\nabla f(1.216278,0)\approx
\begin{pmatrix}
-0.100550\\
0
\end{pmatrix}.
\]

The Hessian matrix at this point is

\[
H(1.216278,0)\approx
\begin{pmatrix}
11.752030 & 0\\
0 & 2
\end{pmatrix}.
\]

Therefore,

\[
\Delta \bar{x}_3=
–
H(1.216278,0)^{-1}\cdot \nabla f(1.216278,0)
\approx
\begin{pmatrix}
0.008556\\
0
\end{pmatrix}.
\]

Then we obtain the fourth approximation:

\[
\bar{x}_4=
\bar{x}_3+\frac{1}{4}\cdot \Delta \bar{x}_3
=
\begin{pmatrix}
1.216278\\
0
\end{pmatrix}
+
\frac{1}{4}
\cdot
\begin{pmatrix}
0.008556\\
0
\end{pmatrix}
=
\begin{pmatrix}
1.218417\\
0
\end{pmatrix}.
\]

Now let us check the stopping condition. For the point $ \bar{x}_4 $, we have

\[
\nabla f(1.218417,0)\approx
\begin{pmatrix}
-0.075189\\
0
\end{pmatrix}.
\]

Therefore,

\[
\|\nabla f(\bar{x}_4)\|\approx 0.075189.
\]

Since

\[
0.075189<0.1,
\]

the iterative process can be stopped.

Thus, we take the last point found as the approximate value of the minimum point:

\[
\bar{x}^*=
\begin{pmatrix}
1.218417\\
0
\end{pmatrix}.
\]

Let us compute the minimum value of the function at this point:

\[
f_{\min}\approx f(1.218417,0).
\]

Substituting the values found, we get

\[
f(1.218417,0)=1.218417^4-3\cdot 1.218417^2+0^2\approx -2.249761.
\]

So, by Newton’s method with a parameter, we obtain

\[
\bar{x}^*=
\begin{pmatrix}
1.218417\\
0
\end{pmatrix},
\qquad
f_{\min}\approx -2.249761.
\]

Since at the point found, both diagonal elements of the Hessian matrix are positive, the Hessian matrix here is positive definite. Therefore, the stationary point found corresponds to a local minimum. For this function, it is one of two symmetric minima.

Thus, in this example, the main idea of Newton’s method with a parameter can be seen step by step. First, the Newton increment is found. Then the step length is adjusted using the parameter. After that, the stopping condition is checked. This example clearly shows that the scheme with a parameter can be more appropriate than the classical Newton method when the full step turns out to be too large and does not provide the desired decrease in the function value.

What to Explore Next: Useful Directions for Further Study

We have already gone through the topic of Newton’s method with a parameter step by step, from the main idea to a practical example. However, learning about minimization methods does not end here. What else is worth reading next in order to better understand the differences between approaches and their capabilities?

Coordinate Descent: How Step-by-Step Minimization Works — This article will discuss the minimization of a function of many variables by changing one coordinate at a time in sequence.
Gradient Descent: How the Minimum Is Found Along the Direction of Decrease — Here it will be shown how, for a function of several variables, an iterative process is built on the basis of the gradient and the choice of a direction of decrease.
Newton’s Method for One Variable: How the Classical Scheme Works — This material will examine the search for the minimum of a function of one variable by Newton’s method and the main steps of this process.

Damped Newton’s Method in Code: An Idea for Your Own Implementation

A ready-made flowchart gives an excellent opportunity to move from theory to programming. On its basis, you can write a small program in any convenient language that will compute the minimum value of a function of several variables using damped Newton’s method. This kind of work helps you understand the approach better, because mathematical reasoning gradually turns into a clear computational process. In addition, it is a good way to test yourself in practice, take a closer look at the role of the parameter, and see how the method behaves in different examples.

Damped Newton’s Method: What Is Worth Knowing About This Approach

Damped Newton’s Method: How the Search Direction Is Built

Damped Newton’s Method: How the New Approximation Is Obtained

Newton’s Method: When the Process Stops and What Is Taken as the Result

Damped Newton’s Method: A Practical Example for a Function of Two Variables

Example 1. Find the minimum value of the function \( f(x_1,x_2)=x_1^4-3\cdot x_1^2+x_2^2 \) with accuracy \( \varepsilon=0.1 \), using damped Newton’s method

Continuation of the Iterative Process

What to Explore Next: Useful Directions for Further Study

Damped Newton’s Method in Code: An Idea for Your Own Implementation

Leave a Reply Cancel reply