Minimize a convex function using Newton's method¶
This code demonstrates the application of Newton's method to find the minimum of a convex function. Newton's method is an iterative optimization algorithm that utilizes the function's gradient and Hessian matrix to update its estimate of the minimum.
Mathematical Representation
Let's represent the function, its derivatives, and Newton's method formula using ω (omega) as the variable:

Loss Function (f(ω))
$$f(\omega) = \omega^2 + 2\omega + 1$$

First Derivative (df(ω))
$$df(\omega) = \frac{d}{d\omega} f(\omega) = 2\omega + 2$$

Second Derivative (d2f(ω))
$$d^2f(\omega) = \frac{d^2}{d\omega^2} f(\omega) = 2$$

Newton's Method Formula
$$\omega_{t+1} = \omega_t - \frac{df(\omega_t)}{d^2f(\omega_t)}$$

Convex Optimization with Newton's Method
In convex optimization, Newton's method excels due to its quadratic convergence rate, meaning it can rapidly approach the minimum when starting from a reasonable initial guess. It leverages the curvature information provided by the Hessian matrix to take more informed steps towards the minimum.
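To see this convergence rate in action, here is a minimal sketch (the function f(ω) = e^ω − 2ω is a made-up convex example, not the quadratic analysed below): the distance to the minimizer ω* = ln 2 roughly squares from one iteration to the next.

import numpy as np

# Made-up convex example: f(w) = exp(w) - 2*w, minimized at w* = ln(2)
f = lambda w: np.exp(w) - 2 * w
df = lambda w: np.exp(w) - 2     # first derivative
d2f = lambda w: np.exp(w)        # second derivative (always positive, so f is convex)

w, w_star = 2.0, np.log(2)
for t in range(6):
    print(f"iteration {t}: |error| = {abs(w - w_star):.2e}")
    w = w - df(w) / d2f(w)       # Newton update: the error roughly squares each step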
Limitations
Despite its strengths, Newton's method has some limitations:
- Hessian Calculation: Computing the Hessian matrix can be computationally expensive, especially for high-dimensional problems.
- Non-Convex Functions: For non-convex functions it may fail to converge, or it may converge to a stationary point that is not a minimum (see the short sketch after this list).
- Saddle Points: It can get stuck at saddle points where the gradient is zero but the Hessian is not positive definite.
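To make the non-convex caveat concrete, here is a minimal sketch (the cubic f(ω) = ω³ − 3ω is a made-up example, not part of the problem below): the plain Newton update converges to ω = −1, where the gradient is zero but the point is a local maximum.

# Made-up non-convex example: f(w) = w**3 - 3*w
# Stationary points: w = +1 (local minimum) and w = -1 (local maximum)
df = lambda w: 3 * w**2 - 3   # first derivative (gradient)
d2f = lambda w: 6 * w         # second derivative (negative for w < 0)

w = -2.0                      # start to the left of the local maximum
for _ in range(20):
    w = w - df(w) / d2f(w)    # plain Newton update, no safeguards
print(w)                      # ~ -1.0: a stationary point, but a maximum, not a minimum

Because the update only searches for a zero of the gradient, it happily settles on a maximum here; in the non-convex case safeguards such as checking the sign (or positive definiteness) of the second derivative are needed. For the convex quadratic in the main example below, none of these issues arise.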
# Minimize a convex function using Newton's method
import numpy as np
def f(x):
    """
    Objective function to minimize.
    """
    return x**2 + 2*x + 1

def df(x):
    """
    First derivative of the objective function.
    """
    return 2*x + 2

def d2f(x):
    """
    Second derivative of the objective function.
    """
    return 2

def newtons_method(x0, tolerance=1e-6, max_iterations=100):
    """
    Minimizes a convex function using Newton's method.

    Args:
        x0: Initial guess for the minimum.
        tolerance: Convergence tolerance.
        max_iterations: Maximum number of iterations.

    Returns:
        A tuple containing:
        - The approximate minimum.
        - The number of iterations performed.
    """
    x = x0
    for i in range(max_iterations):
        delta_x = -df(x) / d2f(x)  # Newton's update rule
        x = x + delta_x
        if abs(delta_x) < tolerance:
            return x, i + 1
    return x, max_iterations  # Return after max_iterations if it doesn't converge
# Example usage:
initial_guess = 10 # An example starting point
minimum, iterations = newtons_method(initial_guess)
print(f"Approximate minimum: {minimum}")
print(f"Number of iterations: {iterations}")
print(f"Function value at minimum: {f(minimum)}")
Approximate minimum: -1.0
Number of iterations: 2
Function value at minimum: 0.0

Because f(ω) = ω² + 2ω + 1 is exactly quadratic, the first Newton step from ω₀ = 10 jumps straight to the minimum at ω = −1; the second iteration merely produces a zero update, which satisfies the convergence check.
In multiple dimensions¶
Loss Function (f(ω))
$$f(\omega) = 100\,(\omega_2 - \omega_1^2)^2 + (1 - \omega_1)^2$$

Gradient (∇f(ω))
$$\nabla f(\omega) = \begin{bmatrix} -400\,\omega_1(\omega_2 - \omega_1^2) - 2(1 - \omega_1) \\ 200\,(\omega_2 - \omega_1^2) \end{bmatrix}$$

Hessian (H(ω))
$$H(\omega) = \begin{bmatrix} -400\,(\omega_2 - 3\omega_1^2) + 2 & -400\,\omega_1 \\ -400\,\omega_1 & 200 \end{bmatrix}$$

Newton's Method Formula (Matrix Form)
$$\omega_{t+1} = \omega_t - \left[ H(\omega_t) \right]^{-1} \nabla f(\omega_t)$$

Explanation (Gradient):
- ∇f(x): Represents the gradient of the function 'f' at a point 'x'.
- x₁, x₂, ..., xₙ: Are the input variables of the function 'f'.
- ∂f/∂xᵢ: Represents the first-order partial derivative of the function 'f' with respect to the variable 'xᵢ'.
The gradient is a vector that points in the direction of the greatest rate of increase of the function. Its components represent the slopes of the function in each direction of the input variables.
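As a quick sanity check on such gradient formulas, here is a minimal sketch (using the Rosenbrock function and gradient given above; the fd_gradient helper is just an ad-hoc check, not part of the optimizer) comparing the analytic gradient against a central finite-difference approximation:

import numpy as np

# Rosenbrock function and its analytic gradient (as written above)
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
df = lambda x: np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                         200 * (x[1] - x[0]**2)])

def fd_gradient(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([-1.2, 1.0])
print(df(x))              # analytic gradient
print(fd_gradient(f, x))  # should agree to several decimal places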
Explanation (Hessian):
- H(f): Represents the Hessian matrix of a function 'f'.
- x₁, x₂, ..., xₙ: Are the input variables of the function 'f'.
- ∂²f/∂xᵢ∂xⱼ: Represents the second-order partial derivative of the function 'f' with respect to the variables 'xᵢ' and 'xⱼ'.
The Hessian matrix is a square matrix that contains all the second-order partial derivatives of a multivariable function. Each element in the matrix represents how the rate of change of one partial derivative (slope in one direction) is affected by a change in another variable. It essentially captures the curvature of the function's surface.
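As a small illustration of that curvature information, here is a sketch (using the Rosenbrock Hessian given above) showing that the Hessian at the minimizer (1, 1) is symmetric with strictly positive eigenvalues, i.e. positive definite, which is exactly the property Newton's method exploits:

import numpy as np

def hessian(x):
    """Analytic Hessian of the Rosenbrock function (as written above)."""
    return np.array([[-400 * (x[1] - 3 * x[0]**2) + 2, -400 * x[0]],
                     [-400 * x[0], 200.0]])

H = hessian(np.array([1.0, 1.0]))  # curvature at the minimizer (1, 1)
print(H)                           # symmetric 2x2 matrix
print(np.linalg.eigvalsh(H))       # both eigenvalues > 0  =>  positive definite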
import numpy as np
# Minimize a multi-dimensional convex optimization problem using Newton's method
def f(x):
    """
    Objective function to minimize (example: Rosenbrock function).
    """
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

def df(x):
    """
    Gradient of the objective function.
    """
    grad = np.zeros(2)
    grad[0] = -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0])
    grad[1] = 200 * (x[1] - x[0]**2)
    return grad

def d2f(x):
    """
    Hessian matrix of the objective function.
    """
    hessian = np.zeros((2, 2))
    hessian[0, 0] = -400 * (x[1] - 3 * x[0]**2) + 2
    hessian[0, 1] = -400 * x[0]
    hessian[1, 0] = -400 * x[0]
    hessian[1, 1] = 200
    return hessian
def newtons_method(x0, tolerance=1e-6, max_iterations=100):
    """
    Minimizes a multi-dimensional convex function using Newton's method.

    Intuition: imagine standing on a hillside, looking for the lowest point
    (the minimum). The gradient tells you the direction of the steepest
    uphill climb, and the Hessian describes the shape (curvature) of the
    hill. The Newton update below combines both to pick the direction and
    distance to move downhill most efficiently; this step is stored in
    delta_x, and repeating the process leads to the valley, i.e. the minimum.

    Args:
        x0: Initial guess for the minimum (NumPy array).
        tolerance: Convergence tolerance on the step size.
        max_iterations: Maximum number of iterations.

    Returns:
        A tuple containing:
        - The approximate minimum.
        - The number of iterations performed.
    """
    x = x0
    for i in range(max_iterations):
        # Solve the linear system: Hessian(x) * delta_x = -gradient(x)
        delta_x = -np.linalg.solve(d2f(x), df(x))
        x = x + delta_x
        # Stop when the magnitude (norm) of the update delta_x is small enough
        if np.linalg.norm(delta_x) < tolerance:
            return x, i + 1
    return x, max_iterations
# Example usage:
initial_guess = np.array([-1.2, 1])
minimum, iterations = newtons_method(initial_guess)
print(f"Approximate minimum: {minimum}")
print(f"Number of iterations: {iterations}")
print(f"Function value at minimum: {f(minimum)}")
Approximate minimum: [1. 1.]
Number of iterations: 7
Function value at minimum: 0.0
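As an optional cross-check (this sketch assumes SciPy is installed; it is not part of the code above), SciPy's Newton-CG solver can be given the same gradient and Hessian and should land on essentially the same point:

from scipy.optimize import minimize  # assumes SciPy is available

# Reuses f, df, d2f and the starting point from the code above
result = minimize(f, np.array([-1.2, 1.0]), jac=df, hess=d2f, method='Newton-CG')
print(result.x)  # expected to be close to [1. 1.]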