Machine Learning

We use model selection to determine the most suitable model for a given problem. Machine learning offers many candidate models.

We can train several of these models on the same problem, but they will generally produce different results.

We evaluate the performance of each model based on these results. The criterion can be accuracy, precision, recall, or another derived score such as the F1 score.

In certain cases, precision is more important. For example, when false positives directly hurt the user experience (such as a spam filter hiding legitimate mail), we emphasize precision.

In other cases, recall is more important. For example, in disease detection it is better to pick a model with lower precision but a lower false-negative rate, because a missed case costs more than a false alarm.
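
The precision/recall trade-off described above can be illustrated with a minimal sketch. The labels and the two sets of predictions (`model_a`, `model_b`) are made-up values for illustration only, and `precision_recall` is a hypothetical helper, not part of any library.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predictions from two candidate models on the same 8 labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0]  # cautious: predicts few positives
model_b = [1, 1, 1, 1, 1, 1, 0, 0]  # eager: predicts many positives

scores_a = precision_recall(y_true, model_a)  # high precision, lower recall
scores_b = precision_recall(y_true, model_b)  # lower precision, high recall
print("model_a:", scores_a)
print("model_b:", scores_b)
```

In this toy setup, a disease-detection use case would favor `model_b` because it misses no positive cases, while a use case that is sensitive to false alarms would favor `model_a`.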

We can use the following methods to prevent overfitting in machine learning:

Cross-validation: We can divide the initial data into multiple smaller train/test splits and use them to tune and evaluate the model. k-fold cross-validation is a common choice.

Gather More Training Data: Training on more data helps the model generalize. If more data is not readily available, we can look for ways to collect it.

Data Augmentation and Noise: If it is not possible to gather more training data, we can augment the existing data, or add noise to it, to make it more diverse. This makes the model less prone to overfitting.

Model Simplification: We can simplify the model, for example by using fewer features. The goal is a model simple enough not to overfit, yet still flexible enough to learn from the data.

Regularization: We can use a regularization technique to prevent overfitting. We can tune the strengths of the L1 and L2 penalties to make the model generalize better.
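
As an illustration of the k-fold cross-validation mentioned in the list above, here is a minimal sketch of how the index splits can be generated. The function name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled; real projects would normally rely on a library routine instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Example: 10 samples, 3 folds. Each sample is used for testing exactly once.
splits = list(k_fold_indices(10, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each fold serves once as the held-out test set while the model is trained on the remaining folds; averaging the scores over the folds gives a more stable estimate than a single train/test split.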

Regularization is a technique in machine learning for preventing overfitting in statistical models.

This technique discourages the learning algorithm from forming an overly complex model, typically by adding a penalty on large coefficients.

We can use Ridge regression (L2 penalty) or Lasso regression (L1 penalty) as regularized techniques. In both, a regularization parameter controls the strength of the penalty and can be tuned.

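With regularization, the amount of variance in the model is reduced, usually at the cost of a small increase in bias (in the bias-variance sense).

In a simple linear regression equation of y = aX + b, b is the intercept, also called the bias term. The penalty is normally applied only to the coefficient a, so the intercept b is not penalized by regularization.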
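
To make the effect of the penalty concrete, here is a minimal sketch of Ridge regression for a single feature, using the y = aX + b notation from above. It assumes the objective sum((y - aX - b)^2) + lam * a^2, with the intercept b left unpenalized; `fit_ridge_1d` and the data are hypothetical and for illustration only.

```python
def fit_ridge_1d(xs, ys, lam):
    """Fit y = a*x + b, penalizing a**2 with strength lam (b is not penalized)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / (s_xx + lam)   # the penalty shrinks the slope toward 0
    b = mean_y - a * mean_x   # the intercept follows from the data means
    return a, b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

plain = fit_ridge_1d(xs, ys, lam=0.0)   # ordinary least squares
shrunk = fit_ridge_1d(xs, ys, lam=5.0)  # slope is pulled toward zero
print("lam=0:", plain)
print("lam=5:", shrunk)
```

A larger `lam` gives a flatter, less flexible fit, which is how the penalty trades a little bias for lower variance.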

The perceptron is a supervised learning algorithm. We use it for binary classification problems.

It is a simple linear classifier. It uses a linear function to predict the class of a data point.

In its simplest form it follows this formula:

y = aX + b

If aX + b > 0 then the class is 1, else the class is 0.

Since the perceptron is a linear classifier, it works well when the classes can be separated by a straight line (or, with more than two features, a hyperplane).
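
The decision rule above, together with the classic perceptron update, can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names, the learning rate, the epoch limit, and the AND-gate data are all assumptions chosen for the example.

```python
def predict(a, b, x):
    """Return 1 if a.x + b > 0, else 0 (the rule stated above)."""
    score = sum(w * xi for w, xi in zip(a, x)) + b
    return 1 if score > 0 else 0


def train_perceptron(samples, labels, epochs=50, lr=1):
    """Learn weights a and bias b with the classic perceptron update."""
    a = [0] * len(samples[0])
    b = 0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(a, b, x)  # -1, 0, or +1
            if error != 0:
                # Move the decision boundary toward the misclassified point.
                a = [w + lr * error * xi for w, xi in zip(a, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # every point classified correctly: stop early
            break
    return a, b


# Linearly separable toy data: the logical AND of two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

a, b = train_perceptron(samples, labels)
predictions = [predict(a, b, x) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```

Because the AND data is linearly separable, the updates stop after a finite number of mistakes. On data that is not linearly separable (for example XOR), the same loop would simply run until the epoch limit without finding a perfect boundary.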

We can use the following methods for calibration in supervised learning:

Platt Calibration: In Platt calibration (also called Platt scaling), we transform the raw outputs of a classification model into a probability distribution over classes by fitting a logistic (sigmoid) function to the model's scores. In binary classification, the model assigns each data point to one of two classes. But sometimes we need the predicted class as well as a probability expressing how certain the prediction is. Platt calibration gives us this probability estimate, that is, how sure we are that the classification is correct.

Isotonic Regression: It is also known as monotonic regression. For calibration, we fit a non-decreasing (isotonic) step function that maps the model's scores to probabilities. Unlike Platt calibration, isotonic regression is not constrained to a fixed parametric form such as the linear function in linear regression; the only requirement is that the fitted curve is monotonic.
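
As a rough illustration of the isotonic approach, the sketch below uses the pool-adjacent-violators idea: labels are ordered by the model's score, and neighboring groups are averaged whenever they would break the non-decreasing order. The function name `isotonic_fit` and the example labels are hypothetical; a production system would normally use a library implementation.

```python
def isotonic_fit(values):
    """Return the non-decreasing sequence closest (in squared error) to values."""
    blocks = []  # each block is [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# True labels of 5 samples, already sorted by increasing classifier score.
labels_by_score = [0, 1, 0, 1, 1]
calibrated = isotonic_fit(labels_by_score)
print(calibrated)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

Each output value can be read as the calibrated probability of the positive class for the sample at that score rank. The second and third samples are pooled to 0.5 because their labels appear in the wrong order.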