With machine learning, to perform any task, we need to design the right set of features and feed those features to the machine learning model. Feature engineering is vital to the success of any machine learning model, but it is hard to engineer the right set of features when dealing with unstructured data such as text and images. In those cases, we can use deep learning.
With deep learning, we are not required to engineer the features, since the deep neural network consists of several hidden layers that implicitly learn and extract the right set of features by themselves. So, we don't have to perform feature engineering ourselves. Thus, deep learning is widely used in tasks where it is hard to perform feature engineering, such as image recognition, text classification, and so on. This is how deep learning differs from machine learning.
Dropout refers to dropping some of the neurons in the neural network. That is, while training the network, we ignore certain neurons at random, and this helps prevent the network from overfitting the training data.
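The idea above can be sketched as a small NumPy function. This is a minimal illustration of inverted dropout, not a library's implementation; the function name, the `drop_prob` parameter, and the fixed random seed are assumptions made for the example:

```python
import numpy as np

def dropout(activations, drop_prob, training=True, seed=0):
    """Inverted dropout: randomly zero out units during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training or drop_prob == 0.0:
        return activations  # at test time, use all neurons unchanged
    rng = np.random.default_rng(seed)
    keep_prob = 1.0 - drop_prob
    # Each unit is kept independently with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.ones((4, 5))          # a toy layer of activations
out = dropout(h, drop_prob=0.5)
```

With `drop_prob=0.5`, roughly half the activations become 0 and the kept ones are scaled to 2.0, so the expected value of each unit stays 1.0.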
Early stopping is often used to control overfitting. With early stopping, we stop training the neural network before the weights have fully converged. That is, we check the performance of the network on a validation set that is not used for training. When the network's performance on the validation set stops improving, we stop training.
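The procedure can be sketched as a simple training loop with a patience counter. The `train_step` and `validate` callables, the `patience` parameter, and the simulated validation losses are all assumptions made for illustration:

```python
def train_with_early_stopping(train_step, validate, max_epochs=100, patience=3):
    """Stop training once the validation loss has not improved
    for `patience` consecutive epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_step()                      # one pass over the training data
        val_loss = validate()             # evaluate on held-out validation data
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                     # early stop
    return epoch + 1, best_loss

# Hypothetical usage with simulated validation losses:
val_losses = iter([1.0, 0.8, 0.9, 0.95, 0.9])
stopped_epoch, best = train_with_early_stopping(
    train_step=lambda: None,
    validate=lambda: next(val_losses),
    patience=3,
)
```

Here the loss improves until epoch 2 and then fails to improve for three epochs in a row, so training stops after epoch 5 with the best validation loss of 0.8.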
While training a deep network, the distribution of the hidden units' activation values changes as the weights and biases change. This leads to a problem called internal covariate shift, which slows down training. We can avoid this problem by applying batch normalization. Batch normalization, as the name suggests, normalizes the hidden units' activation values over each mini-batch, and it also helps reduce the network's training time.
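The normalization step can be sketched in NumPy. This is a minimal forward-pass illustration; the learnable scale `gamma` and shift `beta`, and the `eps` constant for numerical stability, follow the standard batch normalization formulation, but the function itself is an assumption for the example:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch (rows of x),
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                 # per-feature mean over the batch
    var = x.var(axis=0)                   # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))  # a toy mini-batch
out = batch_norm(x, gamma=2.0, beta=1.0)
```

Whatever the distribution of the incoming activations, each feature of the output has mean `beta` and standard deviation approximately `gamma`, which keeps the activation distribution stable across training steps.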
There is no standard or optimal way to decide the number of hidden layers. We can choose the number of hidden layers based on our intuition about the problem we are dealing with.
For a simple problem, we can build a network with one or two hidden layers, and for a complex problem, we can build a deep network with many hidden layers. As specified earlier, there is no rule of thumb for deciding the number of hidden layers.