Introduction to Deep Learning: Creating an Artificial Neural Network in Golang

2023-04-13

Introduction

A few months ago, I made the decision to dedicate myself to understanding Deep Learning, aware of its significant value in the contemporary world. I started my journey using high-level libraries like TensorFlow. This experience provided me with a solid understanding of the fundamental concepts of Deep Learning and taught me how to train different neural network architectures. However, I felt a significant discomfort in not fully comprehending what was happening “behind the scenes.”

That’s when I set out to study the underlying mathematics of Deep Learning. It was a reality check to realize that my mathematical knowledge was limited, and I felt a wave of apprehension. At this point, I decided to go back to basics and delved into mathematical concepts from elementary school level to university-level subjects like differential calculus, integral calculus, and linear algebra.

After six months of intense and consistent study, the mathematical concepts underlying artificial neural networks started to become clear to me. This was the turning point that allowed me to implement, entirely from scratch, an artificial neural network in Go.

This article aims to share my journey and demonstrate how to create an artificial neural network from scratch in Golang, in the hope that it can serve as a guide for others who are on a similar path of discovery and learning in the field of Deep Learning.

The complete source code is available on GitHub: https://github.com/bobboyms/deepgo.

You can download it to use in your own projects or explore it to gain a better understanding of how the neural network implementation works in Go.

Training Dataset

In this example, I used the Iris dataset, widely known in the field of machine learning. The code below is organized in a way that the data is initially separated into two distinct sets: one for training and another for testing.

ds := datasets.NewIrisDataSet()
X, Y := preprocessing.SeparateXY(linalg.NewMatrixFrom2D(ds.GetData(), len(ds.GetData()), len(ds.GetData()[0])))

XN := preprocessing.NormalizeData(X)
YN := preprocessing.OneHotEncoder(Y.LocalData())

xTrain, xTest, yTrain, yTest := datasets.TrainTestSplit(XN, YN, 0.2, 42)

The code starts by creating the Iris dataset with ds := datasets.NewIrisDataSet(). The inputs (flower features) are then separated from the outputs (flower species) using the preprocessing.SeparateXY function. Next, the data is normalized and preprocessed for use in the neural network. For the inputs, preprocessing.NormalizeData(X) scales the data to a standard range. For the outputs, preprocessing.OneHotEncoder(Y.LocalData()) transforms the class labels into one-hot vectors, a format suitable for training neural networks.

Finally, the datasets.TrainTestSplit(XN, YN, 0.2, 42) function splits the dataset into a training set and a test set, reserving 20% of the data for the test set.
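To make the two preprocessing steps more concrete, here is a minimal sketch on plain Go slices. The helpers minMaxNormalize and oneHot are hypothetical names of my own and are not the deepgo implementations. In particular, I am assuming min-max scaling to the range [0, 1], which may differ from what preprocessing.NormalizeData actually does.

```go
package main

import "fmt"

// minMaxNormalize rescales every column of x to the range [0, 1].
// Hypothetical helper: deepgo's NormalizeData may use a different scheme.
func minMaxNormalize(x [][]float64) [][]float64 {
	if len(x) == 0 {
		return nil
	}
	cols := len(x[0])
	mins := make([]float64, cols)
	maxs := make([]float64, cols)
	copy(mins, x[0])
	copy(maxs, x[0])
	for _, row := range x {
		for j, v := range row {
			if v < mins[j] {
				mins[j] = v
			}
			if v > maxs[j] {
				maxs[j] = v
			}
		}
	}
	out := make([][]float64, len(x))
	for i, row := range x {
		out[i] = make([]float64, cols)
		for j, v := range row {
			// A constant column has no spread; it is left at 0.
			if span := maxs[j] - mins[j]; span > 0 {
				out[i][j] = (v - mins[j]) / span
			}
		}
	}
	return out
}

// oneHot turns integer class labels (0..numClasses-1) into one-hot rows.
func oneHot(labels []int, numClasses int) [][]float64 {
	out := make([][]float64, len(labels))
	for i, c := range labels {
		out[i] = make([]float64, numClasses)
		out[i][c] = 1
	}
	return out
}

func main() {
	x := [][]float64{{5.1, 3.5}, {6.2, 2.9}, {7.3, 3.2}}
	fmt.Println(minMaxNormalize(x))
	fmt.Println(oneHot([]int{0, 2, 1}, 3))
}
```

Scaling each feature to a common range keeps features with large numeric values from dominating the weighted sums in the first layer.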

Creating the Neural Network Layers

Now, let’s discuss the code that implements the dense (fully connected) layer of our neural network. The same type is used for both the hidden layers and the output layer. This code segment is crucial as it defines how the neurons in the layer process the inputs and produce the outputs.

package nn

import (
  linalg "deepgo/functions/linalg"
)

type Dense struct {
  Weights    linalg.Matrix[float64]
  Biases     linalg.Matrix[float64]
  Activation func(matrix linalg.Matrix[float64]) linalg.Matrix[float64]
}

func NewDense(numInputs, numNeurons int, activation func(matrix linalg.Matrix[float64]) linalg.Matrix[float64]) Layer {

  return &Dense{
    Activation: activation,
    Weights:    linalg.NormalDistribution(numInputs, numNeurons),
    Biases:     linalg.NormalDistribution(1, numNeurons),
  }
}

func (d *Dense) B() linalg.Matrix[float64] {
  return d.Biases
}

func (d *Dense) ChangeB(b linalg.Matrix[float64]) {
  d.Biases = b
}

func (d *Dense) ChangeW(w linalg.Matrix[float64]) {
  d.Weights = w
}

func (d *Dense) W() linalg.Matrix[float64] {
  return d.Weights
}

func (d *Dense) Forward(inputs linalg.Matrix[float64]) linalg.Matrix[float64] {
  dotResult := linalg.Dot(inputs, d.Weights)
  row, _ := dotResult.LocalShape()
  return d.Activation(linalg.Sum(dotResult, linalg.NewBroadcasting(d.Biases, row)))
}

In this code, we have the definition of the Dense structure, which represents a dense layer of a neural network. This layer has weights and biases, and also includes an activation function that determines how the neurons process the inputs.

The NewDense function is responsible for creating a new dense layer, taking the number of inputs (numInputs), the number of neurons (numNeurons), and the activation function as parameters. It initializes the weights and biases using a normal distribution.

The B() and W() functions return the values of the biases and weights, respectively. The ChangeB() and ChangeW() functions are used to modify the values of the biases and weights, respectively.

The Forward function is responsible for performing the forward propagation step in the dense layer of the neural network. It takes the inputs in the form of a matrix and returns the corresponding outputs after processing by the dense layer.

First, the code computes the matrix product of the inputs and the weights of the dense layer using the Dot function from the linalg package. The result is a matrix called dotResult, with one row per input sample and one column per neuron.

Next, the number of rows of dotResult is obtained from the LocalShape() function, which returns the row and column counts of the matrix.

The next step is to add the bias to each row of the dotResult matrix. This is done using the NewBroadcasting function from the linalg package, which creates a broadcasting matrix from the bias so that it can be added to each row of the dotResult matrix using the Sum function from the linalg package.

Finally, the activation function is applied to the resulting sum matrix, returning the final outputs of the dense layer.
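The steps above can be sketched without the linalg package. The version below is my own simplified illustration on plain slices, not the deepgo code: the function name forward and the slice-based layout are assumptions, and the sigmoid is hard-coded instead of being passed in as a parameter.

```go
package main

import (
	"fmt"
	"math"
)

// forward computes sigmoid(inputs x weights + biases) for a batch of inputs.
// inputs is (batch x in), weights is (in x out), biases has length out.
func forward(inputs, weights [][]float64, biases []float64) [][]float64 {
	out := make([][]float64, len(inputs))
	for i, row := range inputs {
		out[i] = make([]float64, len(biases))
		for j := range biases {
			sum := biases[j] // the same bias is added to every row (broadcasting)
			for k, v := range row {
				sum += v * weights[k][j]
			}
			out[i][j] = 1 / (1 + math.Exp(-sum))
		}
	}
	return out
}

func main() {
	inputs := [][]float64{{1, 2}, {0, 0}}       // 2 samples, 2 features
	weights := [][]float64{{0.5, -1}, {0.25, 1}} // 2 inputs, 2 neurons
	biases := []float64{-1, 0}
	fmt.Println(forward(inputs, weights, biases))
}
```

Each output row corresponds to one input sample, and each column to one neuron of the layer.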

With the Dense struct created, we can build a neural network with multiple layers. Take a look at the code example below:

layer1 := nn.NewDense(4, 7, activation.Sigmoid)
layer2 := nn.NewDense(7, 8, activation.Sigmoid)
layer3 := nn.NewDense(8, 4, activation.Sigmoid)

In this code, we are creating a neural network with three layers. The first layer (layer1) has 4 inputs and 7 neurons, using the sigmoid activation function. The second layer (layer2) takes the 7 outputs from the previous layer and has 8 neurons. Finally, the third layer (layer3) takes the 8 outputs from the previous layer and has 4 neurons.

This approach of creating dense layers allows for the construction of deep and complex neural networks capable of learning hierarchical representations of the input data.

The activation function

Now let’s take a look at our activation function. The activation function plays a crucial role in processing information in a neural network. It introduces non-linearity and allows the network to learn and model complex relationships in the data.

Nonlinearity refers to the property of a function or system that does not follow a direct linear relationship between input and output.

In a neural network, the nonlinearity comes from the activation functions: a dense layer on its own only computes a linear (affine) transformation of its inputs. Without activation functions, the network would be equivalent to a sequence of linear operations, which collapses into a single linear transformation and limits its ability to model nonlinear relationships in the data.
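A short derivation shows why stacking layers without an activation function adds nothing. The symbols are my own notation, not names from the code: x is the input, W_1 and W_2 are the weight matrices, and b_1 and b_2 are the biases of two consecutive dense layers.

```latex
\begin{aligned}
h &= x W_1 + b_1 \\
y &= h W_2 + b_2 = x\,(W_1 W_2) + (b_1 W_2 + b_2)
\end{aligned}
```

The result has the same form as a single dense layer with weights W_1 W_2 and bias b_1 W_2 + b_2, so the second layer only pays off once a nonlinear activation sits between the two.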

Nonlinearities in activation functions, such as the sigmoid function, Rectified Linear Unit (ReLU), or hyperbolic tangent function, enable the network to learn to map complex inputs to desired outputs. They can shape the data in a nonlinear way, expanding the network’s capacity to capture patterns and complexities in the input data.

The introduction of nonlinearity is crucial for neural networks to handle more challenging problems, such as image classification, natural language processing, and time series forecasting. Nonlinearity allows the neural network to adapt and learn richer and more complex representations, enhancing its generalization capacity and performance across a wide range of tasks.

Code:

func SigmoidDerivative(matrix linalg.Matrix[float64]) linalg.Matrix[float64] {
    result := Sigmoid(matrix)
    return linalg.Mul(result, linalg.SubScalar(1, result))
}

func Sigmoid(matrix linalg.Matrix[float64]) linalg.Matrix[float64] {
    row, col := matrix.LocalShape()
    data := make([]float64, row*col)
    for i, x := range matrix.LocalData() {
        data[i] = 1 / (1 + math.Exp(-x))
    }
    return linalg.NewMatrix(data, row, col)
}

In this code, we have the implementation of the sigmoid activation function. The Sigmoid function takes a matrix as input and applies the sigmoid function to each element of the matrix. It uses the formula 1 / (1 + math.Exp(-x)) to calculate the sigmoid value for each element.

Additionally, we have the SigmoidDerivative function, which calculates the derivative of the sigmoid function. It uses the Sigmoid function to obtain the sigmoid values of the input matrix and then calculates the derivative using the formula sigmoid(x) * (1 - sigmoid(x)).
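One way to convince yourself that the formula sigmoid(x) * (1 - sigmoid(x)) is correct is to compare it with a numerical approximation of the derivative. The sketch below uses scalar versions of the functions and a hypothetical helper, numericalDerivative, based on a central difference. None of these scalar helpers are part of deepgo.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the scalar version of the activation function.
func sigmoid(x float64) float64 { return 1 / (1 + math.Exp(-x)) }

// sigmoidDerivative uses the closed form sigmoid(x) * (1 - sigmoid(x)).
func sigmoidDerivative(x float64) float64 {
	s := sigmoid(x)
	return s * (1 - s)
}

// numericalDerivative approximates f'(x) with a central difference.
func numericalDerivative(f func(float64) float64, x float64) float64 {
	const h = 1e-5
	return (f(x+h) - f(x-h)) / (2 * h)
}

func main() {
	for _, x := range []float64{-2, 0, 0.5, 3} {
		fmt.Printf("x=%5.2f analytic=%.6f numeric=%.6f\n",
			x, sigmoidDerivative(x), numericalDerivative(sigmoid, x))
	}
}
```

The two columns should agree to several decimal places. The derivative peaks at 0.25 for x = 0 and shrinks toward zero for large positive or negative inputs.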

Training the Neural Network with SGD

Now it’s time to start training our neural network using the Stochastic Gradient Descent (SGD) algorithm. SGD is a popular optimization method used in neural network training. Let’s take a look at the code that implements this process:

learningRate := 0.01
xRow, xCol := xTrain.LocalShape()
yRow, yCol := yTrain.LocalShape()

xRows := preprocessing.CreateBatches(linalg.GetRow(xTrain.LocalData(), xRow, xCol), 50)
yRows := preprocessing.CreateBatches(linalg.GetRow(yTrain.LocalData(), yRow, yCol), 50)

for epoch := 0; epoch < 110; epoch++ {
    totalLoss := 0.0
    for i := range xRows {
        xi := linalg.NewMatrixFrom2D(xRows[i], len(xRows[i]), xCol)
        yi := linalg.NewMatrixFrom2D(yRows[i], len(yRows[i]), yCol)

This snippet sets up the training loop for Stochastic Gradient Descent (SGD). The two loops are left open here; the following sections fill in their bodies step by step.

In the presented code, we start by defining a learning rate of 0.01. Then, we obtain the dimensions of the training matrices (xTrain and yTrain) for further processing.

Next, we divide the data into batches using the CreateBatches function from the preprocessing package. This is useful for mini-batch training, where data is processed in smaller subsets to iteratively update the network’s parameters.

In the main loop, we iterate over training epochs. For each epoch, we iterate over the data batches (xRows and yRows) using a for range loop. Within this loop, we create xi and yi matrices from the corresponding data batches to be used in the process of updating the weights and biases of the neural network.

This code snippet demonstrates the initial steps of training a neural network using SGD. During training, the total loss is updated to monitor the model’s performance. The training process can be executed for multiple epochs to gradually enhance the neural network’s ability to make accurate predictions based on the training data.
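The batching step can be pictured with a small stand-alone sketch. The function createBatches below is a hypothetical stand-in for preprocessing.CreateBatches, written under the assumption that the rows are split into consecutive chunks and that the last batch may be smaller than the others. The real function may behave differently.

```go
package main

import "fmt"

// createBatches splits rows into consecutive chunks of at most batchSize rows.
// Hypothetical stand-in for preprocessing.CreateBatches.
func createBatches(rows [][]float64, batchSize int) [][][]float64 {
	if batchSize <= 0 {
		return nil
	}
	var batches [][][]float64
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows) // the last batch may be smaller
		}
		batches = append(batches, rows[start:end])
	}
	return batches
}

func main() {
	// 120 dummy rows with a single feature each.
	rows := make([][]float64, 120)
	for i := range rows {
		rows[i] = []float64{float64(i)}
	}
	for i, b := range createBatches(rows, 50) {
		fmt.Printf("batch %d: %d rows\n", i, len(b))
	}
}
```

With 120 rows and a batch size of 50, the inner training loop would run three times per epoch, once per batch.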

Performing Feedforward and Calculating the Error

Now we have reached a crucial part of training our neural network: performing the forward pass and calculating the error. These steps are essential for evaluating the network’s performance and making necessary weight updates. Let’s take a look at the corresponding code:

r1 := layer1.Forward(xi)
r2 := layer2.Forward(regularization.Dropout(r1, 0.15))
output := layer3.Forward(r2)

totalLoss += loss.Mse(yi, output)

In this code snippet, we perform the forward pass and the error calculation, using the xi and yi matrices created from the current batch.

We propagate the inputs through the layers of the neural network using the Forward method of each layer: layer1, layer2, and layer3. The result of each layer is stored in an intermediate variable (r1, r2, and output). Between layer1 and layer2, the activations pass through regularization.Dropout(r1, 0.15), a regularization technique that randomly deactivates some activations during training to reduce overfitting.

After the forward pass, we calculate the error using the Mean Squared Error (MSE) loss function between the obtained outputs (output) and the expected outputs (yi). The error is added to totalLoss, which is used to evaluate the overall performance of the network during training.

These forward pass and error calculation steps are crucial for the learning of the neural network, allowing us to assess how close the predicted outputs are to the expected values. Based on the error, necessary updates are made to the network’s weights to improve performance throughout the training process.
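For reference, the Mean Squared Error can be sketched in a few lines. The function mse below is a hypothetical stand-in for loss.Mse operating on plain slices. I am assuming that the squared differences are averaged over all elements, which may not match how the library averages.

```go
package main

import "fmt"

// mse returns the mean of the squared differences over all elements.
// Hypothetical stand-in for loss.Mse; the library may average differently.
func mse(expected, predicted [][]float64) float64 {
	sum, count := 0.0, 0
	for i := range expected {
		for j := range expected[i] {
			d := expected[i][j] - predicted[i][j]
			sum += d * d
			count++
		}
	}
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	expected := [][]float64{{1, 0, 0}, {0, 1, 0}}
	predicted := [][]float64{{0.5, 0.5, 0}, {0, 1, 0}}
	fmt.Println(mse(expected, predicted))
}
```

A perfect prediction gives a loss of 0, and squaring the differences makes large mistakes count much more than small ones.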

Calculation of Delta

Now let’s proceed with the backpropagation step using the SGD algorithm. In this step, we calculate the delta, which is a measure of the error in each layer of the neural network. Let’s look at the corresponding code:

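// Each layer's output is already a sigmoid activation a, so the derivative is a * (1 - a).
// SigmoidDerivative would apply Sigmoid a second time, so the term is computed directly.
w3Deltas := linalg.Mul(linalg.Sub(output, yi), linalg.Mul(output, linalg.SubScalar(1, output)))
w2Deltas := linalg.Mul(linalg.Dot(w3Deltas, layer3.W().Transpose()), linalg.Mul(r2, linalg.SubScalar(1, r2)))
w1Deltas := linalg.Mul(linalg.Dot(w2Deltas, layer2.W().Transpose()), linalg.Mul(r1, linalg.SubScalar(1, r1)))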

In this code snippet, we are calculating the delta for each layer of the neural network. Delta represents the error signal of each neuron in a layer, which is used to adjust the weights during training.

We start by calculating the delta in the last layer, also known as the output layer. We subtract the desired output yi from the current output and multiply the result element-wise by the derivative of the sigmoid activation at the output layer. The result is stored in w3Deltas.

Next, we propagate the error back to the second hidden layer. We multiply the delta of the output layer (w3Deltas) by the transposed weight matrix of the output layer, layer3.W().Transpose(). We then multiply the result element-wise by the sigmoid derivative for the second hidden layer, whose output is r2, and store it in w2Deltas.

Finally, we propagate the error to the first hidden layer. We multiply the delta of the second hidden layer (w2Deltas) by the transposed weight matrix of the second hidden layer, layer2.W().Transpose(). We then multiply the result element-wise by the sigmoid derivative for the first hidden layer, whose output is r1, and store it in w1Deltas.

The calculation of delta is crucial to determine the contribution of each neuron to the network’s error. This allows us to adjust the weights appropriately during training and improve the performance and accuracy of the model.
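Written as equations, the three deltas follow the standard backpropagation pattern for sigmoid layers. The notation is my own: a_1, a_2, and a_3 stand for the layer outputs (r1, r2, and output in the code), y for the expected output yi, W_2 and W_3 for the weights of the second and third layers, and \odot for element-wise multiplication. The constant factor from differentiating the MSE is left out, since it can be absorbed into the learning rate.

```latex
\begin{aligned}
\delta_3 &= (a_3 - y) \odot a_3 \odot (1 - a_3) \\
\delta_2 &= (\delta_3 W_3^{\top}) \odot a_2 \odot (1 - a_2) \\
\delta_1 &= (\delta_2 W_2^{\top}) \odot a_1 \odot (1 - a_1)
\end{aligned}
```

The term a \odot (1 - a) is the sigmoid derivative expressed through the activation itself, which is why it can be computed from values already available after the forward pass.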

Weight Adjustment

Now let’s calculate the new values for the weights and perform the corresponding updates. Take a look at the relevant code:

newWeights3 := linalg.Sub(layer3.W(), linalg.MulScalar(learningRate, linalg.Dot(r2.Transpose(), w3Deltas)))
layer3.ChangeW(newWeights3)

newWeights2 := linalg.Sub(layer2.W(), linalg.MulScalar(learningRate, linalg.Dot(r1.Transpose(), w2Deltas)))
layer2.ChangeW(newWeights2)

newWeights1 := linalg.Sub(layer1.W(), linalg.MulScalar(learningRate, linalg.Dot(xi.Transpose(), w1Deltas)))
layer1.ChangeW(newWeights1)

In these lines of code, we update the weights of each layer of the neural network. The gradient for a layer is the product of the transposed input of that layer and the layer’s delta. We scale it by the learning rate (learningRate) and subtract it from the current weights.

For the output layer (newWeights3), the input is r2, the output of the second hidden layer, so the gradient is the product of the transpose of r2 and the output layer delta (w3Deltas). Then, we update the weights of the output layer with ChangeW.

Similarly, for the second hidden layer (newWeights2), the gradient is the product of the transpose of r1, the output of the first hidden layer, and the second hidden layer delta (w2Deltas). Then, we update the weights of the second hidden layer.

Finally, for the first hidden layer (newWeights1), the gradient is the product of the transpose of the input batch (xi) and the first hidden layer delta (w1Deltas). We then update the weights of the first hidden layer.

These weight updates are crucial to adjust the parameters of the neural network and improve the performance of the model during training.
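The update rule itself is compact enough to show on plain slices. The function updateWeights below is a hypothetical illustration of the same computation, weights minus learning rate times the product of the transposed layer input and the delta. It is not the linalg-based code from the repository.

```go
package main

import "fmt"

// updateWeights applies w <- w - lr * (transpose(inputs) x deltas) in place.
// inputs is (batch x in), deltas is (batch x out), w is (in x out).
func updateWeights(w, inputs, deltas [][]float64, lr float64) {
	for k := range w {
		for j := range w[k] {
			grad := 0.0
			// Accumulate the contribution of every sample in the batch.
			for i := range inputs {
				grad += inputs[i][k] * deltas[i][j]
			}
			w[k][j] -= lr * grad
		}
	}
}

func main() {
	w := [][]float64{{0.5, -0.5}, {1, 0}}
	inputs := [][]float64{{1, 2}, {3, 4}}
	deltas := [][]float64{{0.1, 0}, {0, 0.2}}
	updateWeights(w, inputs, deltas, 0.5)
	fmt.Println(w)
}
```

Each weight moves in proportion to how active its input was and how large the error signal of its neuron was, summed over the batch.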

Biases Adjustment

newBiases3 := linalg.Sub(layer3.B(), linalg.MulScalar(learningRate, linalg.SumAxis(w3Deltas, 0)))
layer3.ChangeB(newBiases3)

newBiases2 := linalg.Sub(layer2.B(), linalg.MulScalar(learningRate, linalg.SumAxis(w2Deltas, 0)))
layer2.ChangeB(newBiases2)

newBiases1 := linalg.Sub(layer1.B(), linalg.MulScalar(learningRate, linalg.SumAxis(w1Deltas, 0)))
layer1.ChangeB(newBiases1)

In the code snippet above, we adjust the biases of the neural network after adjusting the weights. The bias adjustment is based on the deltas calculated during backpropagation.

For each layer, we compute new biases (newBiases1, newBiases2, and newBiases3) by summing the deltas along axis 0, that is, over all samples in the batch, using the SumAxis function. We then multiply that sum by the learning rate and subtract the result from the current biases of the layer.

Next, we update the biases of each layer using the ChangeB method, replacing the old values with the newly calculated biases.

Adjusting biases is an important step in the neural network training process, allowing the network to adapt to the data and improve its performance. Biases help control the activation level of neurons in each layer, enabling the network to learn more complex and accurate representations of the input data.
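The bias update can be sketched in the same style. The helpers sumAxis0 and updateBiases are hypothetical names of my own, assuming that summing along axis 0 means adding up the rows of the delta matrix so that one total per neuron remains.

```go
package main

import "fmt"

// sumAxis0 adds up the rows of m, giving one total per column.
func sumAxis0(m [][]float64) []float64 {
	if len(m) == 0 {
		return nil
	}
	sums := make([]float64, len(m[0]))
	for _, row := range m {
		for j, v := range row {
			sums[j] += v
		}
	}
	return sums
}

// updateBiases applies b <- b - lr * (sum of deltas over the batch), in place.
func updateBiases(b []float64, deltas [][]float64, lr float64) {
	for j, s := range sumAxis0(deltas) {
		b[j] -= lr * s
	}
}

func main() {
	b := []float64{1, 1}
	deltas := [][]float64{{1, 2}, {3, 4}} // 2 samples, 2 neurons
	updateBiases(b, deltas, 0.5)
	fmt.Println(b)
}
```

Because a bias is added to every sample in the batch, its gradient is the sum of that neuron’s deltas over all samples.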

Evaluating the progress of training

In the following code snippet, we calculate the average loss (meanLoss) by dividing totalLoss by the number of batches (len(xRows)). Then, we check if the epoch number (epoch) is divisible by 10 using the modulo operator (%). If it is, we print the epoch and the value of the mean loss.

meanLoss := totalLoss / float64(len(xRows))
if epoch%10 == 0 {
    fmt.Printf("Epoch: %d, Loss: %f\n", epoch, meanLoss)
}

This allows us to track the progress of training the neural network over the epochs. The periodic display of the epoch and the mean loss helps us understand how the model is converging and if we are seeing improvements in performance.

Monitoring the loss during training is important because we expect it to decrease as the neural network learns from the data. This display provides an overview of the performance and helps us make necessary adjustments to the model if needed.

Training Conclusion

After completing the training, we will evaluate the accuracy of our neural network:

r1 := layer1.Forward(xTest)
r2 := layer2.Forward(r1)
output := layer3.Forward(r2)

fmt.Println("-------------------")
fmt.Printf("Accuracy: %f", metrics.Accuracy(yTest, output))

In this code snippet, we perform the forward pass process with the test data. We pass the inputs xTest through the neural network, through the layers, until we obtain the final output.

Next, we print the model’s accuracy using the metrics.Accuracy() function. This function compares the predicted outputs (output) with the actual values (yTest) and calculates the model’s accuracy.

Accuracy is an important metric for evaluating the performance of the neural network. It provides us with a measure of the proportion of correct predictions out of the total test samples. The higher the accuracy, the better the model’s performance in correctly classifying the test data.
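For one-hot encoded labels, accuracy is usually computed by comparing the position of the largest value in each predicted row with the position of the 1 in the corresponding label row. The sketch below follows that idea. The functions argmax and accuracy are hypothetical helpers on plain slices, and metrics.Accuracy may be implemented differently.

```go
package main

import "fmt"

// argmax returns the index of the largest value in row.
func argmax(row []float64) int {
	best := 0
	for i, v := range row {
		if v > row[best] {
			best = i
		}
	}
	return best
}

// accuracy compares the predicted class (argmax of each output row)
// with the true class (argmax of each one-hot row).
func accuracy(expected, predicted [][]float64) float64 {
	if len(expected) == 0 {
		return 0
	}
	correct := 0
	for i := range expected {
		if argmax(expected[i]) == argmax(predicted[i]) {
			correct++
		}
	}
	return float64(correct) / float64(len(expected))
}

func main() {
	expected := [][]float64{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}, {0, 1, 0}}
	predicted := [][]float64{{0.8, 0.1, 0.1}, {0.3, 0.6, 0.1}, {0.5, 0.2, 0.3}, {0.1, 0.7, 0.2}}
	fmt.Printf("Accuracy: %.2f\n", accuracy(expected, predicted))
}
```

In this example, three of the four rows are classified correctly, so the reported accuracy is 0.75.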

Checking the accuracy on the test set after training allows us to verify whether the model generalizes well to previously unseen data. It gives us an objective measure of the quality of our neural network, so that we can make adjustments or improvements if necessary.