Why do we talk about neural networks in computer science?
Machine learning is a growing and important field of computer science. Initially it was used mainly for classification and regression, but today many also use it as a tool for creativity, applied goals, and more. For this reason, it is vital to understand the algorithms that are actually used to train networks to optimize a specific objective.
Many artificial intelligence algorithms are modeled after biological neurons and ideas from computational neuroscience. In the biological model, a neuron is a cell that receives signals from many other neurons through branching fibers called dendrites, and it passes its own signal on to other neurons through a single output fiber called an axon.
The junctions where these signals are passed are known as synapses. Through synapses, neurons form a vast network in the brain, including recurrent loops: a neuron's output can travel through a chain of other neurons and eventually feed back into its own inputs.
The role of an artificial neuron, just as in the biological model, is to react to its inputs. In the AI model, the computation is carried out by a machine learning network, which in the past took quite a while to train. These networks are typically trained using gradient descent. The power of such a network is often described by the number of parameters it uses: a modern deep neural network can have millions or even billions of parameters, whereas a simple toy network may use only a dozen or so. Learning is then driven by an objective, usually a loss function or, in reinforcement learning, a reward function. This function is plugged into the training process so the network can “see” whether something good or something bad happened in reaction to an input (a minimal gradient descent sketch follows the list below). With large deep neural networks, the reward signal can do the following:
If something good happens, the updates push the network toward activation vectors that are more correlated, meaning they tend to be weighted towards the same values.
If something bad happens, the updates push the network toward activation vectors that are more random, meaning they tend to be uncorrelated.
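To make the training idea concrete, here is a minimal sketch of gradient descent on a single linear neuron. The synthetic data, the mean squared error loss, and the learning rate are illustrative assumptions, not details from this article.

```python
import numpy as np

# A minimal sketch of gradient descent on a single linear neuron.
# The data, loss, and learning rate below are illustrative choices.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 examples, 3 input features
true_w = np.array([2.0, -1.0, 0.5])                # the "hidden" weights we hope to recover
y = X @ true_w + rng.normal(scale=0.1, size=100)   # noisy targets

w = np.zeros(3)   # the model's parameters, initialized at zero
lr = 0.1          # learning rate

for step in range(200):
    pred = X @ w                            # the neuron's reaction to its inputs
    grad = 2 * X.T @ (pred - y) / len(y)    # gradient of the mean squared error loss
    w -= lr * grad                          # gradient descent update

print(w)  # after training, w should be close to true_w
```

The loss here plays the role the article assigns to the reward function: it tells the update rule whether the network's reaction to an input was good or bad.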
A model includes a lot of randomness, in both its inputs and its outputs, because the system cannot make reliable predictions until it has gone through the inputs hundreds of thousands of times. During learning, the reward function will favor an uncorrelated network over a more correlated one, even if they have the same number of parameters. This acts as a form of regularization that guards against overfitting, a topic covered well in the Wikipedia article on overfitting.
Normalizing the network's outputs is also important, because you want a single, well-scaled value to represent each prediction, for example raw class scores normalized into probabilities.
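As a concrete illustration, here is a minimal sketch of one common kind of output normalization: squashing a network's raw scores into probabilities with a softmax. The example scores are made up.

```python
import numpy as np

# A minimal sketch of output normalization via softmax, one common way
# to turn raw scores into a single probability per prediction.

def softmax(scores: np.ndarray) -> np.ndarray:
    shifted = scores - scores.max()   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()          # the probabilities sum to 1

raw_scores = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
probs = softmax(raw_scores)
print(probs, probs.sum())                # e.g. [0.659 0.242 0.099] 1.0
```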
Neural networks allow us to build “smart” machines that can learn from experience and make predictions based on data. However, sometimes it is more useful to express the model in a functional, mathematical way.
What does that mean? Instead of having hard-coded values, we can describe every operation inside our model with a single function.
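Here is a minimal sketch of that idea, assuming a small two-layer network with a tanh activation (the shapes and activation are illustrative choices, not specified in the article): the entire model is a single function of the input and a bag of parameters.

```python
import numpy as np

# A minimal sketch of the "model as a single function" idea: the whole
# forward pass is one mathematical function of the input and the
# parameters, with no hard-coded values inside.

def model(x: np.ndarray, params: dict) -> np.ndarray:
    """One function for the whole model: prediction = f(x; params)."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(size=(3, 4)),
    "b1": np.zeros(4),
    "W2": rng.normal(size=(4, 1)),
    "b2": np.zeros(1),
}
x = rng.normal(size=(1, 3))   # a single example with 3 features
print(model(x, params))       # the model's prediction for x
```

Because everything the model does is captured in one function, swapping architectures means swapping functions, not rewriting hard-coded logic.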
This simple question is really a “meta” question: do we even need multiple functions to tackle the problem? Is learning from the data something we express with one function, or does there also need to be a function for caching or for returning predictions? And how do we express all of those things in a mathematical way?
Those questions are a good entry point for another article: a Machine Learning Crash Course.