Parallel Networks that Learn to Pronounce English Text

Terrence J. Sejnowski; Charles R. Rosenberg

Terrence J. Sejnowski
Department of Biophysics, The Johns Hopkins University,
Baltimore, MD 21218, USA

Charles R. Rosenberg
Cognitive Science Laboratory, Princeton University,
Princeton, NJ 08542, USA

Abstract

This paper describes NETtalk, a class of massively parallel network systems that learn to convert English text to speech. The memory representations for pronunciations are learned by practice and are shared among many processing units. The performance of NETtalk shows some similarities to observed human performance. (i) The learning follows a power law. (ii) The more words the network learns, the better it is at generalizing and correctly pronouncing new words. (iii) The performance of the network degrades very slowly as connections in the network are damaged: no single link or processing unit is essential. (iv) Relearning after damage is much faster than learning during the original training. (v) Distributed or spaced practice is more effective for long-term retention than massed practice.
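To make the idea of learned, shared representations and point (iii) concrete, here is a minimal sketch in plain Python. It assumes a network with one hidden layer of sigmoid units trained by back-propagation of errors; the abstract itself does not state the learning rule. It also assumes a hypothetical toy task that is not NETtalk's data: a 3-letter window over the invented alphabet `_abc` is mapped to one of three invented "sound classes" by a made-up, partly context-dependent rule. The window size, the 8 hidden units, the learning rate, and the damage model (zeroing a random fraction of weights) are all illustrative assumptions.

```python
import math
import random

# Hypothetical toy task, invented for illustration (NOT NETtalk's corpus):
# predict a "sound class" for the centre letter of a 3-letter window.
ALPHABET = "_abc"   # "_" pads word boundaries
N_CLASSES = 3

def sound_class(window):
    """Invented rule: the class of the centre letter can depend on context."""
    _, centre, right = window
    if centre == "a":
        return 0
    if centre == "b" and right == "a":
        return 1    # context-dependent case
    return 2

def all_windows():
    """Every 3-letter window whose centre is a real letter."""
    return [l + c + r for l in ALPHABET for c in ALPHABET[1:] for r in ALPHABET]

def encode(window):
    """Concatenate one one-hot group per window position."""
    return [1.0 if ch == a else 0.0 for ch in window for a in ALPHABET]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """One hidden layer of sigmoid units, softmax output, trained by backprop."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The extra column in each weight row is a bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        z = [sum(w * v for w, v in zip(row, hb)) for row in self.w2]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def train_step(self, x, target, lr):
        h, p = self.forward(x)
        xb, hb = x + [1.0], h + [1.0]
        # Softmax + cross-entropy: output error is (p - one_hot(target)).
        d_out = [p[k] - (1.0 if k == target else 0.0) for k in range(len(p))]
        # Propagate error to hidden units using the weights before updating them.
        d_hid = [h[j] * (1.0 - h[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(p)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

def train(net, data, epochs, lr, seed=0):
    """Stochastic gradient descent over shuffled examples."""
    rng = random.Random(seed)
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for x, t in order:
            net.train_step(x, t, lr)

def mean_loss(net, data):
    return sum(-math.log(net.forward(x)[1][t]) for x, t in data) / len(data)

def accuracy(net, data):
    hits = 0
    for x, t in data:
        p = net.forward(x)[1]
        hits += max(range(len(p)), key=p.__getitem__) == t
    return hits / len(data)

def damage(net, fraction, seed=1):
    """Copy of net with a random fraction of its weights set to zero."""
    rng = random.Random(seed)
    out = TinyNet.__new__(TinyNet)
    out.w1 = [[0.0 if rng.random() < fraction else w for w in row] for row in net.w1]
    out.w2 = [[0.0 if rng.random() < fraction else w for w in row] for row in net.w2]
    return out

if __name__ == "__main__":
    data = [(encode(w), sound_class(w)) for w in all_windows()]
    net = TinyNet(len(data[0][0]), 8, N_CLASSES)
    train(net, data, epochs=200, lr=0.3)
    for frac in (0.0, 0.1, 0.2, 0.4):
        print(f"{frac:.0%} of weights zeroed: accuracy {accuracy(damage(net, frac), data):.2f}")
```

Because every output depends on many weights, one would expect accuracy in such a toy network to fall off gradually as the damaged fraction grows rather than failing at a single critical connection; the exact numbers depend on the seeds and are not a reproduction of NETtalk's results.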

Network models can be constructed that have the same performance and learning characteristics on a particular task, but differ completely at the levels of synaptic strengths and single-unit responses. However, hierarchical clustering techniques applied to NETtalk reveal that these different networks have similar internal representations of letter-to-sound correspondences within groups of processing units. This suggests that invariant internal representations may be found in assemblies of neurons intermediate in size between highly localized and completely distributed representations.
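Hierarchical clustering of this kind can be sketched as agglomerative clustering over hidden-unit activation vectors, one vector per letter-to-sound correspondence. The abstract does not say which distance measure or linkage rule was used, so the Euclidean distance and average linkage below are assumptions, and the activation values in the usage example are invented placeholders rather than measurements from NETtalk.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(vectors):
    """Agglomerative clustering with average linkage (an assumed choice).

    vectors: dict mapping a label (e.g. a letter-to-sound pair) to an
    activation vector. Returns the merges in order as
    (cluster_a, cluster_b, distance), where each cluster is a tuple of labels.
    """
    clusters = [(label,) for label in vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Average linkage: mean pairwise distance between members.
                d = sum(euclidean(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((a, b, d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a + b]
    return merges

if __name__ == "__main__":
    # Hypothetical averaged hidden-unit activations (invented numbers).
    acts = {
        "a->/a/": [0.9, 0.8, 0.1],
        "e->/E/": [0.8, 0.9, 0.3],
        "p->/p/": [0.1, 0.2, 0.9],
        "b->/b/": [0.2, 0.1, 0.8],
    }
    for a, b, d in average_linkage(acts):
        print(a, "+", b, round(d, 3))
```

Comparing the merge trees obtained from two networks trained from different random starting weights is one way to check whether their groupings of correspondences agree even when their individual weights do not.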