Name: Student ID (NIM): Group:

Gradient Descent Method

Preparations

# Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

Example 1

Given the following function (let us call it the loss function \(L(w)\)):

\[ L(w) = \frac{1}{2} (w-4)^2 \]

Find \(w\) where \(L\) is minimized!

Plot the loss function

w = np.arange(-2, 10, 0.1)
L = 0.5 * (w - 4)**2

plt.figure(figsize = (4,3))
plt.plot(w, L)
plt.xlabel("w")
plt.ylabel("L(w)")
plt.show()
(Figure: plot of the loss function \(L(w)\) against \(w\).)

Loss and loss gradient

Loss function:

\[ L (w ) = \frac{1}{2} (w-4)^2 \]

Derivative of the loss function:

\[ \frac{dL(w)}{dw} = (w-4) \]
def loss_fn(w):
    # Calculating the loss
    # Returns the loss and the loss gradient
    loss = 0.5 * (w - 4)**2
    loss_grad = (w-4)

    return loss, loss_grad
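
As a quick sanity check that is not part of the original exercise, the analytic gradient returned by loss_fn can be compared with a central finite difference at an arbitrary test point (w = 1.5 here is an assumed value):

# Optional check: analytic gradient vs. central finite difference
w_test, h = 1.5, 1e-6
_, grad_analytic = loss_fn(w_test)
grad_numeric = (loss_fn(w_test + h)[0] - loss_fn(w_test - h)[0]) / (2 * h)
print(f"analytic: {grad_analytic:.6f}, numeric: {grad_numeric:.6f}")  # both approximately -2.5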

The gradient descent method
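
Starting from an initial guess, each iteration moves \(w\) a small step against the gradient. With learning rate \(\eta\) (learning_rate in the code below), the update is

\[ w_{k+1} = w_k - \eta \, \frac{dL(w_k)}{dw} \]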

max_iterations = 10
learning_rate = 0.9
stopping_threshold = 1e-6

# Initializing w
w = 0.

# Data container
losses = [] 
dlosses = []
ws = []
prev_loss = None
  
# Gradient descent iterations 
for i in range(max_iterations):
    # Calculating the current loss
    loss, dloss = loss_fn(w)

    # If the change in loss is less than or equal to 
    # stopping_threshold we stop the gradient descent
    if prev_loss is not None and abs(prev_loss - loss) <= stopping_threshold:
        break
      
    prev_loss = loss
    losses.append(loss)
    dlosses.append(dloss)
    ws.append(w)
      
    # Updating the weight
    w = w - (learning_rate * dloss)
  
    print(f"Iteration {i+1}: L(w)={loss:.5f}, dL(w)/dw={dloss:.5f}, w={w:.5f}")
Iteration 1: L(w)=8.00000, dL(w)/dw=-4.00000, w=3.60000
Iteration 2: L(w)=0.08000, dL(w)/dw=-0.40000, w=3.96000
Iteration 3: L(w)=0.00080, dL(w)/dw=-0.04000, w=3.99600
Iteration 4: L(w)=0.00001, dL(w)/dw=-0.00400, w=3.99960
Iteration 5: L(w)=0.00000, dL(w)/dw=-0.00040, w=3.99996
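
For this quadratic loss the iteration can be analyzed in closed form: since \(dL/dw = w - 4\), the update gives

\[ w_{k+1} - 4 = (1 - \eta)(w_k - 4), \]

so with \(\eta = 0.9\) the distance to the minimum shrinks by a factor of \(0.1\) per iteration, which is exactly the pattern seen in the printed values of \(w\).
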
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(9, 3))
fig.tight_layout() # Or equivalently,  "plt.tight_layout()"

axes[0].plot(losses, '.-')
axes[0].xaxis.set_major_locator(MaxNLocator(integer=True))
axes[0].set_xlabel('#iteration')
axes[0].set_ylabel('L(w)')

w = np.arange(-2, 10, 0.1)
L = 0.5 * (w - 4)**2
axes[1].plot(w, L)
axes[1].plot(ws, losses, '.r')
axes[1].set_xlabel('w')

# Draw the tangent line at each iterate to visualize the gradient there
for k in range(len(dlosses)):
    axes[1].axline((ws[k], losses[k]), slope=dlosses[k], color='gray')
    
plt.show()
(Figure: left, \(L(w)\) per iteration; right, the loss curve with the gradient-descent iterates and their tangent lines.)

Example 2

We are given a set of points in 2 dimensions.

X = np.array([
        32.50234527, 53.42680403, 61.53035803, 47.47563963, 59.81320787,
        55.14218841, 52.21179669, 39.29956669, 48.10504169, 52.55001444,
        45.41973014, 54.35163488, 44.1640495 , 58.16847072, 56.72720806,
        48.95588857, 44.68719623, 60.29732685, 45.61864377, 38.81681754])
Y = np.array([
        31.70700585, 68.77759598, 62.5623823 , 71.54663223, 87.23092513,
        78.21151827, 79.64197305, 59.17148932, 75.3312423 , 71.30087989,
        55.16567715, 82.47884676, 62.00892325, 75.39287043, 81.43619216,
        60.72360244, 82.89250373, 97.37989686, 48.84715332, 56.87721319])

We are going to model the relationship between X and Y as Y_pred = w * X + b. Find the best values of w and b such that the error between Y_pred and Y is minimized.

Loss and loss gradient

Let \(X=(x_i)\), \(Y = (y_i)\), and \(Y_{\text{pred}} = (w x_i + b)\). Next, let us define a loss (cost) function as the mean squared errors between \(Y\) and \(Y_{\text{pred}}\).

The loss function:

\[ J = \frac{1}{n} \sum_{i=1}^n \left( y_i - \underbrace{\left(w x_i + b\right)}_{Y_{\text{pred}}} \right)^2 = \frac{1}{n} \sum_{i=1}^n \left( y_i - w x_i - b \right)^2 \]

where:

  • \(J\) is the total loss

  • \(y_i\) is the measurement at \(x_i\)

  • \(w\) and \(b\) are the weight and the bias associated with \(x_i\) and used to estimate \(y_i\)

The gradient is computed as follows:

\[ \left[\frac{\partial J}{\partial w}, \frac{\partial J}{\partial b}\right] = \left[ \underbrace{\frac{-2}{n} \sum_{i=1}^n x_i \left(y_i-w x_i-b \right)}_{\partial J / \partial w}, \underbrace{\frac{-2}{n} \sum_{i=1}^n \left(y_i-w x_i-b\right)}_{\partial J / \partial b}\right] \]

The gradient is a vector of two elements.
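
Both components follow from applying the chain rule term by term; for the weight, for example,

\[ \frac{\partial J}{\partial w} = \frac{1}{n} \sum_{i=1}^n 2\left(y_i - w x_i - b\right)\left(-x_i\right) = \frac{-2}{n} \sum_{i=1}^n x_i \left(y_i - w x_i - b\right), \]

and analogously for \(b\), where the inner derivative is \(-1\).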

def loss_fn(x, y, w, b):
    # Calculating the loss or cost
    # Returns the loss and the loss gradient

    n = float(len(x))
    
    loss = np.sum((y - w * x - b)**2) / n
    loss_grad = np.array([
        -2 / n * np.sum(x * (y - w * x - b)),
        -2 / n * np.sum(y - w * x - b)
    ])

    return loss, loss_grad

The gradient descent method
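
As in Example 1, each iteration steps against the gradient, now updating both parameters at once:

\[ \begin{bmatrix} w_{k+1} \\ b_{k+1} \end{bmatrix} = \begin{bmatrix} w_k \\ b_k \end{bmatrix} - \eta \begin{bmatrix} \partial J / \partial w \\ \partial J / \partial b \end{bmatrix} \]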

# Gradient Descent Function
# Here iterations, learning_rate, stopping_threshold can be tuned
def gradient_descent(x, y, iterations = 1000, learning_rate = 0.0001, stopping_threshold = 1e-6):   
    # Initializing the weight and bias
    w = 0.1
    b = 0.01
      
    losses = []
    prev_loss = None
      
    # Estimation of optimal parameters 
    for i in range(iterations):
        # Calculating the current loss
        loss, dloss = loss_fn(x, y, w, b)
  
        # If the change in loss is less than or equal to 
        # stopping_threshold we stop the gradient descent
        if prev_loss is not None and abs(prev_loss - loss) <= stopping_threshold:
            break
          
        prev_loss = loss
  
        losses.append(loss)
          
        # Updating weights and bias
        w = w - (learning_rate * dloss[0])
        b = b - (learning_rate * dloss[1])
      
        print(f"Iteration {i+1}: Loss {loss:.5f}, Weight {w:.5f}, Bias {b:.5f}")
      
      
    # Visualizing the loss for all iterations
    plt.figure(figsize = (8,6))
    plt.plot(losses, '.-')
    plt.show()
      
    return w, b

Main program

# Data
X = np.array([
        32.50234527, 53.42680403, 61.53035803, 47.47563963, 59.81320787,
        55.14218841, 52.21179669, 39.29956669, 48.10504169, 52.55001444,
        45.41973014, 54.35163488, 44.1640495 , 58.16847072, 56.72720806,
        48.95588857, 44.68719623, 60.29732685, 45.61864377, 38.81681754])
Y = np.array([
        31.70700585, 68.77759598, 62.5623823 , 71.54663223, 87.23092513,
        78.21151827, 79.64197305, 59.17148932, 75.3312423 , 71.30087989,
        55.16567715, 82.47884676, 62.00892325, 75.39287043, 81.43619216,
        60.72360244, 82.89250373, 97.37989686, 48.84715332, 56.87721319])

sorted_idx = np.argsort(X)
X = X[sorted_idx]
Y = Y[sorted_idx]

# Estimating weight and bias using gradient descent
w, b = gradient_descent(X, Y, iterations=2000)
print(f"Estimated W: {w:.5f}\nEstimated b: {b:.5f}")

# Making predictions using estimated parameters
Y_pred = w * X + b

# Plotting the regression line
plt.figure(figsize = (8,6))
plt.scatter(X, Y, marker='o', color='red')
plt.plot(X, Y_pred, '--b')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
Iteration 1: Loss 4352.08893, Weight 0.75933, Bias 0.02289
Iteration 2: Loss 1114.85615, Weight 1.08160, Bias 0.02918
Iteration 3: Loss 341.42912, Weight 1.23913, Bias 0.03225
Iteration 4: Loss 156.64495, Weight 1.31612, Bias 0.03375
Iteration 5: Loss 112.49704, Weight 1.35376, Bias 0.03448
Iteration 6: Loss 101.94939, Weight 1.37215, Bias 0.03483
Iteration 7: Loss 99.42939, Weight 1.38115, Bias 0.03500
Iteration 8: Loss 98.82732, Weight 1.38554, Bias 0.03508
Iteration 9: Loss 98.68348, Weight 1.38769, Bias 0.03511
Iteration 10: Loss 98.64911, Weight 1.38874, Bias 0.03513
Iteration 11: Loss 98.64090, Weight 1.38925, Bias 0.03513
Iteration 12: Loss 98.63893, Weight 1.38950, Bias 0.03513
Iteration 13: Loss 98.63847, Weight 1.38963, Bias 0.03512
Iteration 14: Loss 98.63835, Weight 1.38969, Bias 0.03512
Iteration 15: Loss 98.63833, Weight 1.38972, Bias 0.03511
Iteration 16: Loss 98.63832, Weight 1.38973, Bias 0.03510
Iteration 17: Loss 98.63832, Weight 1.38974, Bias 0.03509
(Figure: loss per iteration of gradient descent.)
Estimated W: 1.38974
Estimated b: 0.03509
(Figure: scatter of the data with the fitted regression line Y_pred = w * X + b.)
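
As an optional check that is not part of the original lab, the gradient-descent estimates can be compared against NumPy's closed-form least-squares fit. Because the bias gradient is much smaller in magnitude than the weight gradient for this data, the bias converges far more slowly at this learning rate, so some difference in b is expected.

# Optional: closed-form least-squares line for comparison (degree-1 polynomial fit)
w_ls, b_ls = np.polyfit(X, Y, 1)
print(f"Least-squares W: {w_ls:.5f}\nLeast-squares b: {b_ls:.5f}")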

Programming Homework

Given the following function:

\[ f(x,y) = x \cos\left(\frac{x}{2}\right) \cos\left(\frac{y}{2}\right) \]

Using the gradient descent method, find its minimum point (\(x\), \(y\), and \(z\)).

Plot the loss function

To get a better understanding of the problem, let us plot \(f(x,y)\) for \(-4 \leq x \leq 4\) and \(-4 \leq y \leq 4\).

# Put your code here

Loss and loss gradient

def loss_fn(x, y):
    # Put your code here
    z = np.cos(x/2) * x * np.cos(y/2)
    
    loss = 0
    loss_grad = np.array([0., 0.])
    
    return loss, loss_grad

The gradient descent method

def gradient_descent(x, y, iterations = 1000, learning_rate = 0.0001, stopping_threshold = 1e-6):
    # Initializing xmin and ymin
    xmin = 0.
    ymin = 0.
      
    losses = []
    weights = []
    prev_loss = None

    # Put your code here
    
    return xmin, ymin

Main program

# Put your main program here