Common training methods include gradient descent (GD), stochastic gradient descent (SGD), and mini-batch gradient descent (mini-batch GD). However, these methods alone do not guarantee successful training. In this part, we therefore introduce several optimizers that apply different strategies to update the model's weights, so that the model converges more stably and efficiently.
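To make the relationship between the three training methods concrete, the following is a minimal sketch, not a reference implementation. It assumes a one-dimensional linear model y = w * x + b trained with mean squared error; the function name `minibatch_gd`, the toy data, and the learning rate are illustrative choices. Setting the batch size to the dataset size gives GD, and setting it to 1 gives SGD.

```python
import random

def minibatch_gd(xs, ys, batch_size, lr=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by minimizing mean squared error with mini-batch updates."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch only
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy, noise-free data generated from y = 2x + 1 (assumed for illustration)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_gd, b_gd = minibatch_gd(xs, ys, batch_size=len(xs))  # GD: one update per epoch
w_mb, b_mb = minibatch_gd(xs, ys, batch_size=2)        # mini-batch GD
w_sgd, b_sgd = minibatch_gd(xs, ys, batch_size=1)      # SGD: one update per sample
```

On this easy problem all three variants recover roughly the same weights. The difference lies in how many samples each update uses, which trades the cost of an update against the noise in its gradient estimate.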
With these tools in place, we start with the most common learning scenario, supervised learning, and use cross-validation to evaluate the model's performance. Two problems can arise during training: overfitting and underfitting. We will define both and provide some possible solutions.
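As a rough illustration of how cross-validation partitions the data, the sketch below splits sample indices into k folds, with each fold serving once as the validation set. The helper name `k_fold_indices` is hypothetical, and the folds are contiguous for simplicity; in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so that sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, val
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train, val in folds:
    print("train:", train, "validation:", val)
```

The model would be trained k times, once per fold, and the k validation scores averaged to estimate its performance.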
For overfitting, we will cover methods including regularization, normalization, dropout, and initialization. Both the optimizers and these overfitting remedies introduce new hyperparameters, so it is necessary to find the hyperparameter combination that maximizes the model's performance. We will cover two hyperparameter search algorithms, grid search and random search, and compare them in terms of performance and effectiveness.
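The two search strategies can be sketched side by side as follows. This is a minimal sketch under stated assumptions: `validation_score` is a hypothetical stand-in for training a model and measuring its validation performance, and the two hyperparameters (a learning rate and a dropout rate) and their ranges are chosen only for illustration.

```python
import itertools
import random

def validation_score(lr, dropout):
    """Hypothetical stand-in for 'train the model and return a validation score'."""
    return -((lr - 0.03) ** 2) * 1000 - (dropout - 0.2) ** 2

def grid_search(lr_values, dropout_values):
    """Evaluate every combination on a fixed grid and return the best one."""
    candidates = itertools.product(lr_values, dropout_values)
    return max(candidates, key=lambda c: validation_score(*c))

def random_search(lr_range, dropout_range, n_trials, seed=0):
    """Evaluate n_trials combinations sampled uniformly from the given ranges."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(*lr_range), rng.uniform(*dropout_range))
        for _ in range(n_trials)
    ]
    return max(candidates, key=lambda c: validation_score(*c))

# Both searches spend the same budget of 9 evaluations
best_grid = grid_search([0.001, 0.01, 0.1], [0.0, 0.25, 0.5])
best_random = random_search((0.001, 0.1), (0.0, 0.5), n_trials=9)
print("grid search:", best_grid)
print("random search:", best_random)
```

With the same budget, the grid tries only three distinct values per hyperparameter, whereas random search tries nine, which can matter when one hyperparameter influences the score much more than the other.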