TY - GEN
T1 - A Comparative Analysis of Gradient Descent-Based Optimization Algorithms on Convolutional Neural Networks
AU - Dogo, E. M.
AU - Afolabi, O. J.
AU - Nwulu, N. I.
AU - Twala, B.
AU - Aigbavboa, C. O.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12
Y1 - 2018/12
N2 - In this paper, we perform a comparative evaluation of the seven most commonly used first-order stochastic gradient-based optimization techniques in a simple Convolutional Neural Network (ConvNet) architectural setup. The investigated techniques are Stochastic Gradient Descent (SGD) in its vanilla form (vSGD), with momentum (SGDm), and with momentum and Nesterov acceleration (SGDm+n); Root Mean Square Propagation (RMSProp); Adaptive Moment Estimation (Adam); Adaptive Gradient (AdaGrad); Adaptive Delta (AdaDelta); the Adaptive Moment Estimation extension based on the infinity norm (Adamax); and Nesterov-accelerated Adaptive Moment Estimation (Nadam). We trained the model and evaluated the optimization techniques in terms of convergence speed, accuracy, and loss function using three randomly selected, publicly available image classification datasets. The overall experimental results show that Nadam achieved better performance across the three datasets than the other optimization techniques, while AdaDelta performed the worst.
AB - In this paper, we perform a comparative evaluation of the seven most commonly used first-order stochastic gradient-based optimization techniques in a simple Convolutional Neural Network (ConvNet) architectural setup. The investigated techniques are Stochastic Gradient Descent (SGD) in its vanilla form (vSGD), with momentum (SGDm), and with momentum and Nesterov acceleration (SGDm+n); Root Mean Square Propagation (RMSProp); Adaptive Moment Estimation (Adam); Adaptive Gradient (AdaGrad); Adaptive Delta (AdaDelta); the Adaptive Moment Estimation extension based on the infinity norm (Adamax); and Nesterov-accelerated Adaptive Moment Estimation (Nadam). We trained the model and evaluated the optimization techniques in terms of convergence speed, accuracy, and loss function using three randomly selected, publicly available image classification datasets. The overall experimental results show that Nadam achieved better performance across the three datasets than the other optimization techniques, while AdaDelta performed the worst.
KW - Artificial Intelligence
KW - deep learning
KW - optimizers
KW - performance measures
KW - stochastic gradient descent
UR - http://www.scopus.com/inward/record.url?scp=85070414431&partnerID=8YFLogxK
U2 - 10.1109/CTEMS.2018.8769211
DO - 10.1109/CTEMS.2018.8769211
M3 - Conference contribution
AN - SCOPUS:85070414431
T3 - Proceedings of the International Conference on Computational Techniques, Electronics and Mechanical Systems, CTEMS 2018
SP - 92
EP - 99
BT - Proceedings of the International Conference on Computational Techniques, Electronics and Mechanical Systems, CTEMS 2018
A2 - Niranjan, S. K.
A2 - Desai, Veena
A2 - Rajpurohit, Vijay S.
A2 - Nadkatti, M. N.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st International Conference on Computational Techniques, Electronics and Mechanical Systems, CTEMS 2018
Y2 - 21 December 2018 through 23 December 2018
ER -