## Inspiration for artificial neural networks

Artificial neural networks are biologically inspired algorithms that try to roughly imitate how neurons in the human brain work.

A synapse, which is the point of communication between two biological neurons, can either enhance or inhibit the signal that passes through it. A biological neuron fires, i.e., transmits a signal to other neurons, if its incoming signal exceeds a certain threshold. Artificial neural networks adopt this mechanism in a simplified form.

## Feedforward neural network

The simplest artificial neural networks are feedforward neural networks (FNNs), in which information flows only forward, from the input toward the output, and no connections loop back to earlier layers. Nodes (neurons) are organized into layers, as can be seen in the following figure.

The dimension of the input data determines the number of neurons in the input layer. The output layer has as many nodes as the problem to be solved requires. As a simple example, suppose we want to train a neural network to detect whether an image contains a cat: the input is the pixels of the image, and the output is a single value (1 = yes, 0 = no).

Two examples of feedforward networks are single-layer and multi-layer perceptrons.

### Single-layer perceptron

The first and most basic neural network model is the single-layer perceptron, which has no hidden layers: only input and output nodes exist. The connections between them have different weights (from w₁ to wₙ), which play the role of synapses, exciting or inhibiting the incoming signal. A bias value (b₁) is added to the weighted sum of the inputs, and this result (∑) is passed through an activation function (f).

y₁ = f(x₁⋅w₁ + x₂⋅w₂ + ... + xₙ⋅wₙ + b₁)

With a binary step activation function f, the final output is 0 or 1:

f(x) = 1 if x ≥ 0, and f(x) = 0 if x < 0

This results in a linear decision boundary: the perceptron makes its classification decision based on the value of a linear combination of the features, so it can only separate classes that are linearly separable. For example, a perceptron can learn to separate two groups (Class 1 and Class 2) by the red line, considering only two features, x₁ and x₂. These classes could be, for example, different animals, with features such as height, width, or mass.
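To make the formula concrete, here is a minimal sketch of the perceptron's forward computation in Python with NumPy. The function names and the weights are illustrative assumptions, not part of the original text: the weights are picked by hand (not learned) so that the perceptron behaves like a logical AND, which is a linearly separable problem.

```python
import numpy as np

def step(z):
    """Binary step activation: 1 where z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)

def perceptron(x, w, b):
    """Single-layer perceptron: weighted sum of the inputs plus bias, then step."""
    return step(np.dot(x, w) + b)

# Hand-picked (not learned) parameters, assumed for illustration only.
w = np.array([1.0, 1.0])
b = -1.5

# All four combinations of two binary features x1 and x2.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = perceptron(inputs, w, b)
print(outputs)  # only the input (1, 1) lands on the positive side of the line
```

The decision boundary here is the straight line x₁ + x₂ = 1.5; every point on or above it is classified as 1, every point below it as 0.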

However, this model is severely limited on more complex problems. For instance, it cannot learn to separate the classes below with a single straight line. This kind of problem can be solved by multi-layer perceptrons, described next.

### Multi-layer perceptron (MLP)

A multi-layer perceptron (MLP) contains one or more hidden layers and uses non-linear activation functions, which allows it to solve problems where the data is not linearly separable.

The input layer receives the input and passes it on to the first hidden layer. Each hidden neuron computes a weighted sum of its inputs plus a bias and applies its activation function (f), which determines how strongly the neuron is activated. The activations of the last hidden layer then serve as the input to the output layer.
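As a rough sketch of this layer-by-layer computation, the example below runs a forward pass through an MLP with one hidden layer of two neurons. It uses XOR, a classic example of a problem that is not linearly separable. All names and weights are assumptions chosen by hand for illustration; nothing here is learned, and the step activation is used only to keep the arithmetic easy to follow (networks trained with gradient descent typically use differentiable activations instead).

```python
import numpy as np

def step(z):
    """Binary step activation: 1 where z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> hidden layer -> output layer."""
    hidden = step(x @ W_hidden + b_hidden)  # activations of the hidden layer
    return step(hidden @ w_out + b_out)     # hidden activations feed the output

# Hand-picked (not learned) parameters, assumed for illustration only.
# Each column of W_hidden holds the weights of one hidden neuron.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])  # hidden neuron 1 acts as OR, neuron 2 as AND
w_out = np.array([1.0, -1.0])      # output fires for "OR but not AND"
b_out = -0.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_outputs = mlp_forward(inputs, W_hidden, b_hidden, w_out, b_out)
print(xor_outputs)
```

No single straight line separates the XOR classes, but the two hidden neurons each draw one line, and the output neuron combines them.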

## Training a neural network

### Training set

In supervised learning, a feedforward neural network estimates the mapping from inputs to outputs. The network improves this estimate by iteratively adjusting its weights and biases as training examples are fed in. A training set contains input-output pairs (x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), ..., (x⁽ⁱ⁾, y⁽ⁱ⁾), ..., (x⁽ᵐ⁾, y⁽ᵐ⁾), where (x⁽ⁱ⁾, y⁽ⁱ⁾) is the i-th sample: x⁽ⁱ⁾ is the input or feature vector, and y⁽ⁱ⁾ is the desired output or label. The mapping learned by the network is only an estimate. Its output ŷ⁽ⁱ⁾ is called the prediction for the i-th input sample and depends on the current weights and biases of the network.

### Cost function

During training, only the input vectors of the training set are presented to the neural network, which predicts an output value for each input based on its current weights and biases. To measure the network's performance, we compare the expected outputs y⁽ⁱ⁾ from the training set with the predicted values ŷ⁽ⁱ⁾. A cost function serves this purpose, and different cost functions are designed for different kinds of problems. The two main groups are cost functions for regression tasks and for classification tasks.

Building a model to predict house prices based on various factors (e.g. location, number of bedrooms, etc.) is a regression problem. The Mean Squared Error (MSE) is a commonly used cost function for regression: it takes the error between the true output (known price) and the predicted output (predicted price) for each input-output pair, squares it, and averages these squared errors over all samples:

J(w,b) = (1/m) ∑ (y⁽ⁱ⁾ - ŷ⁽ⁱ⁾)²,

where y⁽ⁱ⁾ is the known expected output value for input x⁽ⁱ⁾,

ŷ⁽ⁱ⁾ is the predicted output for input x⁽ⁱ⁾ from the neural network,

m is the number of samples,

and the sum goes from i=1 to m.
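The MSE formula translates almost directly into code. The sketch below is one possible way to write it with NumPy; the function name and the house prices (in thousands) are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Made-up example values: known and predicted prices for m = 3 houses.
y_true = [200.0, 350.0, 500.0]
y_pred = [210.0, 340.0, 530.0]

cost = mse(y_true, y_pred)
print(cost)  # (10² + 10² + 30²) / 3
```

Because the errors are squared, the single prediction that is off by 30 contributes far more to the cost than the two that are off by 10.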

### Gradient descent

The aim of training is to minimize this cost, i.e., to produce predictions that are close to the true outputs. The gradient descent optimization algorithm searches for a local minimum of the cost function by iteratively taking small steps in the direction of steepest descent (the negative gradient) at the current point. An illustration of this mechanism is shown below.

The value of the cost function J(w,b) depends on the weights and biases of the neural network. We can calculate the gradient of the cost function with respect to the weights and biases, and then update these parameters by gradient descent to get better predictions.
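To illustrate the update step, here is a minimal sketch of gradient descent for the simplest possible case: a single neuron with one input, one weight, one bias, and no activation function (ŷ = w⋅x + b), trained with the MSE cost. This is a simplifying assumption, not a full multi-layer network. The learning rate, the number of steps, and the toy data are also assumed values. Each step moves w and b a small distance against the gradient of J.

```python
import numpy as np

def gradient_descent(x, y, learning_rate=0.05, steps=2000):
    """Fit y_pred = w * x + b by minimizing the MSE with gradient descent."""
    w, b = 0.0, 0.0  # initial parameters
    m = len(x)
    for _ in range(steps):
        y_pred = w * x + b
        error = y - y_pred
        # Partial derivatives of J(w, b) = (1/m) * sum((y - y_pred)^2)
        grad_w = -2.0 / m * np.sum(error * x)
        grad_b = -2.0 / m * np.sum(error)
        # Step in the direction of steepest descent (against the gradient).
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1, so the best parameters are known.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = gradient_descent(x, y)
print(w, b)  # approaches w = 2 and b = 1
```

If the learning rate is too large, the updates overshoot and the cost can grow instead of shrink; if it is too small, training takes many more steps.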

At the end of training, when the value of the cost function is sufficiently small, we have a trained feedforward neural network. Ideally, it makes good predictions not only for the training examples but also for new input data that it has not seen before.

## Conclusion

Now you know the basics of feedforward neural networks and how they are trained. More details on activation functions, cost functions, and optimization algorithms will come in the next article. Stay tuned and join WAI!

In the meantime, what other predictions could neural networks be used for besides the task of predicting house prices? Please leave your comment below!