To further demonstrate the effectiveness of different optimizers, we apply them to a dog \& cat classification problem~\cite{elson2007asirra}. We use AlexNet~\cite{krizhevsky2012imagenet} as the training model and compare three optimizers: vanilla Stochastic Gradient Descent (SGD), RMSprop, and Adam. All three optimizers are trained under the same configuration, e.g., the same learning rate and the same network architecture. As the results in Fig.~1 show, Adam achieves higher accuracy and lower loss than the other two methods, which is one reason Adam is so widely used in practice. All three optimizers can be initialized easily in PyTorch, as shown in the code below:
import torch
from torchvision.models import AlexNet

lr = 1e-3  # example learning rate
model = AlexNet(num_classes=2)  # two classes: dog and cat

# Adam
adam = torch.optim.Adam(model.parameters(),
                        lr=lr, weight_decay=0)
# RMSprop
rmsprop = torch.optim.RMSprop(model.parameters(),
                              lr=lr, alpha=0.99)
# SGD
sgd = torch.optim.SGD(model.parameters(), lr=lr)
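Regardless of which optimizer is chosen, it is used in the same way inside the training loop. Below is a minimal sketch of one training epoch, assuming a cross-entropy loss and a data loader named train_loader that yields (images, labels) batches; these names are illustrative and not part of the experiment above:

import torch.nn.functional as F

optimizer = adam  # or rmsprop / sgd; the loop is identical
for images, labels in train_loader:  # train_loader is assumed to exist
    optimizer.zero_grad()                    # clear gradients from the previous step
    outputs = model(images)                  # forward pass through AlexNet
    loss = F.cross_entropy(outputs, labels)  # classification loss
    loss.backward()                          # backpropagate to compute gradients
    optimizer.step()                         # update model parameters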

For a full list of available optimizers, see \url{https://pytorch.org/docs/stable/optim.html}.