Machine Learning


Feature projection (FP) is an approach to dimension reduction in machine learning.

We use FP to transform data from a high-dimensional space to a space with fewer dimensions. Some FP techniques are as follows:

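Principal component analysis (PCA): PCA performs a linear mapping of the data from a higher-dimensional space to a lower-dimensional one, chosen so that the variance of the data in the lower-dimensional representation is maximized.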

Non-negative matrix factorization (NMF): NMF decomposes a non-negative matrix into the product of two non-negative matrices.

Kernel PCA: This technique performs PCA using the kernel trick, which allows it to construct non-linear mappings.

Autoencoder: An autoencoder is a neural network that learns a non-linear dimension reduction function (the encoder) together with an inverse function (the decoder) that reconstructs the original representation from the coding.
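
To make the PCA idea concrete, here is a minimal sketch in plain Python that projects 2-D points onto their first principal component. The function name `first_principal_component`, the sample points, and the use of power iteration to find the leading eigenvector are all assumptions made for illustration; in practice a library routine would normally be used.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (direction, projections) for the leading principal component."""
    n, dim = len(points), len(points[0])
    # Center the data by subtracting the mean of each coordinate.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # normalize. This assumes the data is not constant and that the start
    # vector is not orthogonal to the leading eigenvector.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Linear mapping from `dim` dimensions down to one dimension.
    projections = [sum(r[j] * v[j] for j in range(dim)) for r in centered]
    return v, projections

# Hypothetical sample: points lying on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, projections = first_principal_component(points)
print(direction)
print(projections)
```

Because the sample points lie on a single line, one coordinate per point is enough to describe them, which is the kind of reduction PCA looks for.
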
A support vector machine (SVM) is a supervised machine learning model.

SVM is used for classification and regression problems. It is mostly used in classification analysis to separate the data points into classes.

In SVM we try to identify a hyperplane that divides the data into classes. Among all such hyperplanes, we choose the one that maximizes the distance between the hyperplane and the nearest data points of each class, so that the data can be labeled distinctly. This distance is called the margin.
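
As a rough illustration of the margin, the sketch below measures how far the closest point lies from a given hyperplane w·x + b = 0. It does not train an SVM; the function name `geometric_margin`, the two candidate hyperplanes, and the sample points are hypothetical and chosen only to show why one separating hyperplane can be preferred over another.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. A positive result means every point is on the
    correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    distances = []
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        distances.append(y * score / norm)
    return min(distances)

# Hypothetical, linearly separable sample.
points = [(1.0, 1.0), (0.0, 0.0), (2.0, 2.0), (3.0, 3.0)]
labels = [-1, -1, 1, 1]

margin_a = geometric_margin((1.0, 1.0), -3.0, points, labels)  # x1 + x2 = 3
margin_b = geometric_margin((1.0, 0.0), -1.5, points, labels)  # x1 = 1.5
print(margin_a, margin_b)
```

Both hyperplanes separate the two classes, but the first one leaves more room between itself and the nearest points, so it is the kind of hyperplane an SVM would favor.
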

When the classes are linearly separable, a linear SVM is enough. But sometimes we have to deal with non-linear classification. In such a scenario we can use the kernel trick. The kernel trick implicitly maps the low-dimensional input space to a higher-dimensional space, where the classes may become linearly separable, without computing the mapping explicitly.
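
The kernel trick can be checked numerically on a small case. The sketch below assumes a degree-2 polynomial kernel k(x, z) = (x·z)^2 on 2-D inputs and compares it with an ordinary dot product in the corresponding 3-D feature space. The names `poly_kernel` and `feature_map` and the sample vectors are illustrative assumptions.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit mapping from 2-D input space to 3-D feature space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, z = (1.0, 2.0), (3.0, 4.0)
kernel_value = poly_kernel(x, z)
explicit_value = dot(feature_map(x), feature_map(z))
print(kernel_value, explicit_value)
```

The two values agree (up to floating-point rounding), so the kernel gives the higher-dimensional dot product without ever building the higher-dimensional vectors.
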
Bias in a machine learning model comes from overly simple assumptions made by the model. This can lead to underfitting of the data, which reduces the accuracy of the model.

Variance in machine learning comes from high complexity in the model. Due to variance, the model becomes sensitive to small variations in the training data and learns the noise in it. This leads to overfitting of the data.

Therefore, in machine learning, we have to balance bias and variance so that the model provides predictions with optimum accuracy. We do not want high bias or high variance in our model.

It is an art to maintain a balance between bias and variance while creating a machine-learning model.
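
The trade-off can be seen on a toy example. The sketch below is an assumption-laden illustration, not a general procedure: the data follows y = 2x with a fixed, hand-picked noise pattern, and three hypothetical models are compared: a constant predictor (high bias), a memorizer that returns the label of the nearest training point (high variance), and a least-squares line (a balance between the two).

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Training data: y = 2x plus a fixed noise pattern (kept deterministic).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
train_y = [2 * x + e for x, e in zip(train_x, noise)]

# Test data: noise-free values of the same underlying function.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)

def constant_model(x):
    # High bias: ignores x and always predicts the training mean.
    return mean_y

def memorizer(x):
    # High variance: copies the (noisy) label of the nearest training point.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

# Least-squares line fitted to the training data.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x

def line_model(x):
    return intercept + slope * x

for name, model in [("constant", constant_model),
                    ("memorizer", memorizer),
                    ("line", line_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

On this sample the memorizer has zero training error but a worse test error than the line, because it has learned the noise, while the constant model is poor on both sets because it is too simple.
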
KNN stands for the k-nearest neighbors algorithm. KNN is a classification algorithm based on a supervised learning approach. K-means clustering is a clustering algorithm based on an unsupervised learning approach.
KNN needs labeled data: it classifies a new point by the majority label among its k nearest labeled neighbors. K-means clustering needs only unlabeled points and the number of clusters k (and, in practice, a threshold for deciding when to stop). The algorithm assigns each point to the nearest cluster center and recomputes each center as the mean of the points assigned to it, repeating until the clusters stop changing.
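
The difference between the two algorithms shows up in their inputs. The sketch below is a minimal plain-Python version of each, with hypothetical sample points; `knn_predict` takes labels, while `kmeans` takes only points and starting centers. Passing the initial centers explicitly is a simplification made here so the result is deterministic; real implementations usually pick them randomly.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority label among the k nearest labeled points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def kmeans(points, initial_centroids, max_iterations=100):
    """Unsupervised: group unlabeled points around len(initial_centroids) means."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # stop when nothing changes
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]

# KNN uses the labels.
labels = ["a", "a", "a", "b", "b", "b"]
label = knn_predict(points, labels, query=(2, 2), k=3)

# K-means sees only the points and discovers the groups itself.
centroids, clusters = kmeans(points, initial_centroids=[(1, 1), (9, 8)])
print(label)
print(centroids)
```
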
Precision is also known as the positive predictive value in machine learning.

Let's say we have a dataset of 12 fruits, a mix of apples and oranges. Our model labels 8 of them as apples. Of these 8, only 5 are actually apples; these are the true positives. The remaining 3 are oranges that the model labeled as apples; these are the false positives. In this case, the precision of the model is 5/8.
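
The same arithmetic can be written as a small function. This is a minimal sketch; the function name `precision` is chosen here for illustration, and the counts are the ones from the fruit example.

```python
def precision(true_positives, false_positives):
    """Fraction of positive predictions that are actually positive."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined with no positive predictions")
    return true_positives / predicted_positives

# 8 fruits labeled as apples: 5 really are apples, 3 are not.
apple_precision = precision(true_positives=5, false_positives=3)
print(apple_precision)  # 0.625
```
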

Precision tells us what fraction of the model's positive predictions are actually correct.

The other way of looking at precision is how useful the prediction is.

Precision is a measure of quality or exactness of results.