Machine Learning is usually divided into two main areas, supervised learning and unsupervised learning, with reinforcement learning often treated as a third. Although it may seem that the first refers to prediction with human intervention and the second does not, the distinction has more to do with whether the data is labeled and what we want to do with it.
In supervised learning, the machine is provided with detailed, labeled data. These labeled examples are the knowledge base for its analyses, and it uses them to generalize to new cases.
Unsupervised learning is closer to the way our brain works. The computer receives no prior information about the data, so it has to explore the data set and establish patterns on its own through abstraction.
Reinforcement learning
The computer learns from experience through trial and error: it acts, observes how the world around it responds, and uses that feedback to improve.
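To make the trial-and-error idea concrete, here is a minimal sketch of an epsilon-greedy learner. Everything in it is an assumption made for illustration: the three actions, their fixed payoffs, and the names `run_bandit`, `epsilon`, and `payoffs` do not come from any particular library. It mostly repeats the action that has worked best so far and occasionally tries a random one.

```python
import random

def run_bandit(payoffs, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy problem with fixed payoffs."""
    rng = random.Random(seed)
    n_actions = len(payoffs)
    estimates = [0.0] * n_actions   # current guess of each action's reward
    counts = [0] * n_actions        # how often each action has been tried

    for step in range(steps):
        if step < n_actions:
            action = step                      # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick a random action
        else:
            # exploit: pick the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: estimates[a])

        reward = payoffs[action]               # feedback from the environment
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Hypothetical payoffs for three possible actions.
estimates, counts = run_bandit([1.0, 2.0, 5.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

Real environments are noisy, so the estimates would only approach the true values over many trials. The payoffs here are noise-free to keep the sketch easy to follow.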
Linear Regression and Logistic Regression (Supervised Learning)
Linear Regression is a Machine Learning algorithm used to predict a numerical result. It tries to establish a linear relationship between the independent variables and the output, or dependent, variable. An example of the application of Linear Regression, shown in Figure, would be predicting the demand for a product at a given time from a set of previously recorded demand data. Logistic Regression, on the other hand, predicts the outcome of a categorical variable, rather than a numerical one, as a function of the independent variables.
An example of the application of Logistic Regression is predicting whether a customer will cancel a service from a telephone company.
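The two ideas can be sketched in a few lines of plain Python. The demand figures, the churn weights, and the function names `fit_line` and `logistic` are all made up for illustration. The first part fits a straight line by ordinary least squares, and the second shows how a logistic function turns a linear score into a probability for a yes/no outcome.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def logistic(score):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical demand recorded over five periods.
periods = [1, 2, 3, 4, 5]
demand = [12, 14, 16, 18, 20]
slope, intercept = fit_line(periods, demand)
forecast = slope * 6 + intercept   # numerical prediction for period 6

# Hypothetical churn score: these weights are invented, not fitted to data.
complaints = 3
churn_probability = logistic(0.8 * complaints - 2.0)
will_churn = churn_probability >= 0.5   # categorical prediction
```

In practice the logistic weights would be learned from labeled customer records instead of being chosen by hand.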
Decision Tree (Supervised Learning)
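A Decision Tree predicts a target variable by applying a sequence of conditions to the input attributes. Decision tree models where the target variable can take a finite set of values are called classification trees. A Decision Tree can be used, for example, to produce a model for medical diagnostics.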
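As a rough illustration, the smallest possible classification tree has a single condition (sometimes called a decision stump). The sketch below, with invented temperature readings and the hypothetical helpers `fit_stump` and `predict`, searches for the threshold that misclassifies the fewest examples. A full Decision Tree repeats this kind of search on each resulting branch.

```python
def fit_stump(values, labels):
    """Find the single threshold that misclassifies the fewest examples."""
    classes = sorted(set(labels))
    ordered = sorted(set(values))
    best = None
    # Candidate thresholds: midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        for below in classes:
            for above in classes:
                errors = sum(
                    (below if v <= threshold else above) != label
                    for v, label in zip(values, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    return best[1:]   # (threshold, label at or below it, label above it)

def predict(stump, value):
    threshold, below, above = stump
    return below if value <= threshold else above

# Hypothetical body temperatures with a diagnosis label.
temperatures = [36.5, 36.8, 37.0, 38.2, 38.9, 39.4]
diagnoses = ["healthy", "healthy", "healthy", "fever", "fever", "fever"]
stump = fit_stump(temperatures, diagnoses)
```

Calling `predict(stump, 38.0)` then returns the label learned for readings above the threshold.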
Boosted Decision Tree and Decision Forest (Supervised Learning)
The Boosted Decision Tree algorithm, whose operation is schematically illustrated in Figure, builds a sequence of Decision Trees in which the second tree corrects the errors of the first, the third corrects the errors of the first and second, and so on. A Decision Forest, on the other hand, builds multiple independent Decision Trees and lets them “vote” for the most popular output. Therefore, unlike the Boosted Decision Tree, where the results are additive, in a Decision Forest the results are combined by voting or averaging. Both algorithms are used for the same kinds of problems as single Decision Trees.
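The difference between adding and voting can be sketched without building real trees. In the toy code below, each "tree" is replaced by a stand-in: for the forest, a list of class votes, and for boosting, a stage that corrects a fixed fraction of the remaining error. The function names, the `learning_rate` value, and the numbers are assumptions for illustration only.

```python
from collections import Counter

def forest_predict(tree_votes):
    """Decision Forest style: every tree votes and the most common class wins."""
    return Counter(tree_votes).most_common(1)[0][0]

def boosted_corrections(target, base_value, n_stages=3, learning_rate=0.5):
    """Boosting style: each stage corrects part of the error left by earlier stages."""
    corrections = []
    prediction = base_value
    for _ in range(n_stages):
        residual = target - prediction      # error of everything built so far
        step = learning_rate * residual     # stand-in for a tree fitted to the residual
        corrections.append(step)
        prediction += step
    return corrections

def boosted_predict(base_value, corrections):
    """The final boosted prediction is the base value plus all corrections."""
    return base_value + sum(corrections)

# Three hypothetical trees vote on a diagnosis.
forest_result = forest_predict(["benign", "malignant", "benign"])

# Three boosting stages move a starting guess of 2.0 toward a target of 10.0.
corrections = boosted_corrections(target=10.0, base_value=2.0)
boosted_result = boosted_predict(2.0, corrections)
```

Each boosting stage depends on the ones before it, while the forest's votes could be computed in any order.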
Neural Network and Averaged Perceptron (Supervised Learning)
A Neural Network is composed of a set of interconnected layers in which the inputs are transformed into outputs through a series of nodes, each with its corresponding weights. Between the input and output layers, there may be one or more hidden layers, as shown in Figure 5. The Averaged Perceptron is a simplified version of the Neural Network that classifies the inputs into the possible outputs based on a linear function.
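Here is a small sketch of how values flow through such a network. The weights are hand-picked (not trained) so that one hidden layer computes XOR, and the names `dense`, `forward`, and `perceptron` are my own. The perceptron at the end shows the simpler case of a single linear function with a threshold. The weight averaging that gives the Averaged Perceptron its name happens during training and is not shown.

```python
def relu(x):
    """Common activation function: negative values become zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One layer: each node computes a weighted sum of the inputs plus a bias."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

# Hand-picked weights (an assumption, not learned) that make the network compute XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def forward(inputs):
    """Input layer -> one hidden layer -> output layer."""
    hidden = dense(inputs, HIDDEN_W, HIDDEN_B, activation=relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]

def perceptron(inputs, weights, bias):
    """No hidden layer: threshold a single linear function of the inputs."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if score > 0 else 0

xor_outputs = [forward([a, b]) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
or_outputs = [perceptron([a, b], [1.0, 1.0], -0.5) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

XOR is a classic case that a single linear function cannot separate, which is why the hidden layer is needed for it while OR works with the perceptron alone.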
Clustering Algorithms (Unsupervised Learning)
Clustering algorithms group the data into clusters according to the similarity of their attributes, while keeping instances in different clusters as different as possible. To assess the differences between instances, clustering algorithms typically calculate the Euclidean distance between numerical attributes: the lower this value is, the more similar the instances are and the more likely they are to be grouped in the same cluster.
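A short sketch of this idea is k-means, one common clustering algorithm, chosen here only as an example. The points and starting centroids are invented, and `assign`, `mean_point`, and `k_means` are illustrative names. Each point joins the cluster whose center is nearest in Euclidean distance, then the centers are recomputed.

```python
import math

def assign(points, centroids):
    """Put each point in the cluster whose centroid is nearest (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def mean_point(cluster):
    """Coordinate-wise average of the points in a cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else old for c, old in zip(clusters, centroids)]
    return centroids, assign(points, centroids)

# Two visibly separate groups of hypothetical points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Note that `math.dist` requires Python 3.8 or later.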
Anomaly Detection Algorithms (Unsupervised Learning)
One of the most widely used Anomaly Detection algorithms is Isolation Forest. It relies on the fact that very anomalous instances have attributes very different from the usual ones, which makes them easy to separate from the rest of the data set. By applying successive conditions on the attributes, the instances are split into nodes, and anomalies end up isolated after only a few splits. Anomaly Detection algorithms are often applied to detect fraud, for example in bank loans.
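The isolation idea can be sketched on a single attribute. This is a simplified illustration, not the full Isolation Forest algorithm (which also samples the data and picks attributes at random), and the loan amounts and function names are made up. It counts how many random splits are needed before a value is left alone in its partition.

```python
import random

def isolation_depth(values, target, rng, max_depth=50):
    """Count the random splits needed before `target` is alone in its partition."""
    depth = 0
    while len(values) > 1 and depth < max_depth:
        lo, hi = min(values), max(values)
        if lo == hi:
            break                            # only duplicates left, cannot split
        split = rng.uniform(lo, hi)          # random condition on the attribute
        # Keep only the instances that fall on the same side as the target.
        if target < split:
            values = [v for v in values if v < split]
        else:
            values = [v for v in values if v >= split]
        depth += 1
    return depth

def average_depth(values, target, trials=200, seed=0):
    """Average the isolation depth over many random runs."""
    rng = random.Random(seed)
    return sum(isolation_depth(values, target, rng) for _ in range(trials)) / trials

# Hypothetical loan amounts: five similar values and one far from the rest.
amounts = [10.0, 11.0, 12.0, 13.0, 14.0, 100.0]
typical_depth = average_depth(amounts, 12.0)
outlier_depth = average_depth(amounts, 100.0)
```

A lower average depth means the value is easier to isolate, which is the signal used to flag it as an anomaly.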
A comparison of various classifiers in scikit-learn on synthetic data sets. The purpose of this example is to illustrate the nature of the decision boundaries of the different classifiers. Particularly in high-dimensional spaces, data can be more easily separated linearly, and the simplicity of classifiers such as naive Bayes and linear SVM may lead to better generalization than more complex classifiers achieve.
Let’s play Machine Learning
The price of a house:
The price could be:
- COP 80.000
- COP 120.000
- COP 190.000
The price is 💁 $120,000
Is machine learning magic?
Once you realize how easy it is to apply machine learning techniques to seemingly difficult problems (such as handwriting recognition), you begin to feel that you can use machine learning to solve any problem and get a satisfactory answer as long as you have enough data. Just feed in the data and watch the computer magically find the answer! But it is important to remember that machine learning only works if the problem can actually be solved with the data you have.
How to learn more about machine learning?
If you want to learn more about this wonderful world, I leave you this repository, where you will find a list of courses and materials about ML and other topics.