Some of the most common reasons for the loss not decreasing while training the network include getting stuck in a local minimum, setting the learning rate too low, and setting the regularization parameter too high.
Some of the most common reasons for the loss becoming NaN during training include setting the learning rate too high, the gradient blowing up (exploding gradients), and an improper or poorly chosen loss function.
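One common remedy for exploding gradients is gradient clipping. The sketch below (a minimal illustration using NumPy; the function name and threshold are assumptions, not from the source) rescales a gradient whose L2 norm exceeds a maximum value:

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Rescale grad so its L2 norm does not exceed max_norm (gradient clipping)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# A gradient that has blown up is scaled back down to norm max_norm,
# while its direction is preserved.
g = np.array([300.0, 400.0])
clipped = clip_gradient(g, max_norm=1.0)
print(np.linalg.norm(clipped))  # ≈ 1.0
```

Clipping keeps a single oversized update from pushing the weights into a region where the loss overflows to NaN.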
We train the network by performing backpropagation. During backpropagation, we compute the gradient of the loss with respect to the weights and apply an optimization method to find the optimal weights. Gradient descent is the most commonly used optimization method for updating the weights during training.
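The gradient descent update rule can be sketched on a toy one-dimensional loss. This is a minimal illustration (the loss function, starting weight, and hyperparameter values are assumptions chosen for clarity, not from the source):

```python
# Minimize the toy loss f(w) = (w - 3)^2 with plain gradient descent.
# The gradient of this loss is f'(w) = 2 * (w - 3).
def gradient_descent(lr=0.1, steps=100):
    w = 0.0                  # assumed initial weight
    for _ in range(steps):
        grad = 2 * (w - 3)   # gradient of the loss at the current weight
        w = w - lr * grad    # update rule: w <- w - lr * gradient
    return w

print(gradient_descent())  # converges close to the optimum w = 3
```

In a real network the same update is applied to every weight, with the per-weight gradients supplied by backpropagation.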