Complex Systems

Information-Theoretic Based Error-Metrics for Gradient Descent Learning in Neural Networks

Joseph C. Park
Atlantic Undersea Test and Evaluation Center (AUTEC),
West Palm Beach, FL 33401, USA

Perambur S. Neelakanta
Salahalddin Abusalah
Dolores F. De Groff
Raghavan Sudhakar
Department of Electrical Engineering,
Florida Atlantic University,
Boca Raton, FL 33431, USA

Abstract

Conventionally, square error (SE) and/or relative entropy (RE) error functions defined over a training set are adopted for the optimization of gradient descent learning in neural networks. As an alternative, a set of divergence (or distance) measures can be specified in the information-theoretic plane that offer pragmatic value functionally similar to (or improved over) the SE or RE metrics. The Kullback-Leibler (KL), Jensen (J), and Jensen-Shannon (JS) measures are suggested as candidate information-theoretic error-metrics, and they are defined and derived explicitly. Both the conventional SE/RE measures and the proposed information-theoretic error-metrics are applied to train a multilayer perceptron topology, in order to elucidate their relative efficacy in determining network performance, as evidenced by the convergence rates and training times involved. Pertinent simulation results are presented and discussed.
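
For orientation, the divergence measures named above are commonly written in the following standard forms for two discrete distributions P = {p_i} and Q = {q_i}; the explicit forms and derivations used as error-metrics in the training context are developed in the body of the paper and may differ in detail:

D_{KL}(P \| Q) = \sum_i p_i \log \frac{p_i}{q_i}

J(P, Q) = D_{KL}(P \| Q) + D_{KL}(Q \| P)

JS(P, Q) = \tfrac{1}{2} D_{KL}(P \| M) + \tfrac{1}{2} D_{KL}(Q \| M), \qquad M = \tfrac{1}{2}(P + Q)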