Want to find out a little more about what machine learning is and which machine learning models are most commonly used? It is worth highlighting that there is a vast range of models and sub-categories of machine learning, involving layers of complexity that can be difficult to understand as a beginner. As a result, this guide discusses the models at an overview level, covering the most frequently used ones and how they function.
Two main categories of machine learning models
First of all, it is important to establish that machine learning models fall into one of two categories: supervised learning and unsupervised learning.
Supervised learning is a type of machine learning in which the system learns a function that maps an input to an output. It does so from the example input-output pairs it has been given.
Within supervised learning, there are two further types of model: regression and classification.
Regression models are used for predicting a continuous value, such as a price or a temperature. A regression model in practice, for example, could be used to predict house values.
Machine learning is a bit like a Russian doll, as there are many different models and sub-categories. For example, regression models also have a number of types:
- Simple linear regression: assumes a linear relationship between the target variable and a single predictor
- Polynomial regression: fits a linear regression on polynomial features (powers of the predictors) up to a given degree
- Decision tree regression: decision trees can be used for both regression and classification purposes (which we will get to in the next section); each level of the tree requires the identification of a splitting attribute
- Neural network regression: a model that is multi-layered by nature and takes its name from the networks of neurons in the human brain. Each node in the hidden layers applies a function to the inputs it receives, and the results are passed forward to produce an output
- Random forest regression: combines the predictions of a number of decision tree regressors, typically by averaging them
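To make the simplest of these concrete, here is a minimal sketch of simple linear regression using the standard least-squares formulas. The function names and the toy house data (floor area against price) are invented for illustration and are not taken from any real dataset or library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a continuous output for a new input."""
    return slope * x + intercept


# Hypothetical toy data: floor area (square metres) and price (thousands)
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_simple_linear_regression(areas, prices)
estimate = predict(slope, intercept, 100)  # price estimate for a 100 square metre house
```

Because the toy prices lie exactly on a straight line, the fitted line passes through every point. Real data would be noisier, and the line would only approximate it.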
Classification models have an output that is discrete and therefore predict categorical responses, such as whether a tumour is benign or cancerous, or whether an email is spam or not.
Classification also has a number of sub-category models. Some of these are:
- Logistic regression: works in a similar way to linear regression, but is used primarily to calculate the probability of each outcome (typically just two)
- Support vector machine: used for both classification and regression. The goal of this algorithm is to identify a hyperplane in an N-dimensional space (where N is the number of features) that separates the data points into classes
- Decision tree, neural network and random forest: all used here as well, as in the previously mentioned regression models. The main difference is that the output is not continuous but discrete
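As an illustration of how a classifier turns inputs into a discrete answer, the following is a rough sketch of one-feature logistic regression trained by plain gradient descent. The "spam score" feature, the toy labels, the learning rate and the number of epochs are all assumptions chosen for the example, not recommended settings.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, labels, learning_rate=0.1, epochs=2000):
    """Fit a weight and bias for a one-feature logistic model by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            error = sigmoid(weight * x + bias) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def predict_probability(weight, bias, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(weight * x + bias)


# Hypothetical toy data: a single "spam score" per email, label 1 means spam
scores = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 1, 1]

weight, bias = fit_logistic_regression(scores, labels)
is_spam = predict_probability(weight, bias, 4.0) > 0.5  # threshold gives a discrete class
```

The model itself outputs a probability. The discrete prediction comes from comparing that probability against a threshold, here 0.5.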
Unsupervised learning is primarily used to identify patterns in, and draw inferences from, the input data. This is all done without any reference to labelled outcomes. There are two main types of unsupervised learning:
Clustering models
Clustering is a machine learning technique (including k-means clustering and hierarchical clustering) that involves grouping similar data points together, and it is used in a variety of circumstances. For example, it is commonly used for fraud detection, document classification and customer segmentation.
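To show what k-means clustering does in practice, here is a small sketch that alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. The customer data and the choice of starting centroids are hypothetical. Real implementations usually pick starting centroids more carefully and run several times.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, initial_centroids, max_iterations=100):
    """Return (centroids, assignments) after running k-means from the given start."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point
        new_assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical customer data: (monthly spend, visits per month)
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Assumption for the sketch: start from one customer in each apparent group
centroids, assignments = k_means(customers, [customers[0], customers[3]])
```

Note that no labels are supplied anywhere. The two customer segments emerge purely from how close the points are to one another.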
Dimensionality reduction models
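This unsupervised learning technique involves reducing the number of features in a dataset. There are a variety of techniques that can be used, but most fall into one of two categories: feature extraction, which derives a smaller set of new features from the original ones, or feature elimination, which drops some of the original features.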
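As one simple illustration of feature elimination, the sketch below drops any feature whose values barely change across the dataset, on the assumption that a near-constant column carries little information. Low variance is only one possible elimination criterion. The threshold and the toy rows are invented for the example.

```python
def variance(values):
    """Population variance of a sequence of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def drop_low_variance_features(rows, threshold):
    """Return the kept column indices and the rows restricted to those columns."""
    columns = list(zip(*rows))  # transpose rows into columns
    kept = [i for i, col in enumerate(columns) if variance(col) > threshold]
    reduced = [[row[i] for i in kept] for row in rows]
    return kept, reduced


# Hypothetical dataset: the first feature is constant and so carries no information
rows = [
    [1.0, 5.0, 0.2],
    [1.0, 7.0, 0.9],
    [1.0, 6.0, 0.4],
    [1.0, 9.0, 0.7],
]

kept, reduced = drop_low_variance_features(rows, threshold=0.01)
```

Here the constant first column is removed and the other two are kept, reducing the dataset from three features to two.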