Complex Systems

Repeated Sequences in Linear Genetic Programming Genomes

William B. Langdon
Computer Science,
University College, London,
Gower Street, London, UK

Wolfgang Banzhaf
Computer Science,
Memorial University of Newfoundland,
St. John's, A1B 3X5, Canada

Abstract

Biological chromosomes are replete with repetitive sequences in their DNA base sequences, such as microsatellites, SSR tracts, and ALU. We started looking for similar phenomena in evolutionary computation. Initial studies find copious repeated sequences, which can be hierarchically decomposed into shorter sequences, in programs evolved using both homologous and two-point crossover, but not with headless chicken crossover or other mutations. In bloated programs the small number of effective (expressed) instructions appear in both repeated and nonrepeated code, hinting that building blocks or code reuse may evolve in unplanned ways.
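As an illustration of what a repeated sequence in a linear genome means, the following is a minimal sketch and not the authors' analysis tooling. It assumes a genome is simply a list of instruction tokens; the function name `repeated_subsequences` and the toy genome are hypothetical. It reports every fixed-length window that occurs more than once, and running it at two lengths shows how a longer repeat can contain shorter ones.

```python
from collections import defaultdict


def repeated_subsequences(genome, length):
    """Return {window: [start positions]} for windows seen more than once.

    `genome` is any sequence of hashable instruction tokens (an assumption
    made for this sketch). Overlapping occurrences are counted.
    """
    positions = defaultdict(list)
    for start in range(len(genome) - length + 1):
        window = tuple(genome[start:start + length])
        positions[window].append(start)
    return {w: p for w, p in positions.items() if len(p) > 1}


# Hypothetical toy genome: each string stands in for one linear GP instruction.
genome = ["add", "mul", "sub", "add", "mul", "sub", "div", "add", "mul"]

long_repeats = repeated_subsequences(genome, 3)
short_repeats = repeated_subsequences(genome, 2)

print(long_repeats)   # the one length-3 repeat and where it starts
print(short_repeats)  # shorter repeats, including pieces of the longer one
```

In this toy genome the length-3 repeat ("add", "mul", "sub") is built from the shorter repeats ("add", "mul") and ("mul", "sub"), which is the kind of hierarchical decomposition the abstract describes. A suffix-array or suffix-tree method would scale better to real genomes; the sliding window is used here only for clarity.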

Mackey-Glass chaotic time series prediction and eukaryotic protein localization (both previously used as artificial intelligence and machine learning benchmarks) demonstrate the evolution of Shannon information (entropy) and lead to models capable of lossy Kolmogorov compression. Our findings with diverse benchmarks and genetic programming (GP) systems suggest that this emergent phenomenon may be widespread in genetic systems.
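The two information measures named above can be made concrete with a small sketch. This is not the measurement procedure used in the paper: it assumes the data is a plain sequence of symbols, and it uses the zlib compressed size as a crude, computable stand-in for Kolmogorov complexity, which is itself uncomputable. The function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lossless zlib, level 9).

    A low ratio means the data is highly regular. This is only an upper-bound
    style proxy for Kolmogorov complexity, assumed here for illustration.
    """
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical example data: a highly repetitive byte string.
repetitive = b"abcd" * 1000

print(shannon_entropy(repetitive))    # entropy of the symbol frequencies
print(compression_ratio(repetitive))  # far below 1 for repetitive input
```

The example also shows why the two measures differ: the repetitive string uses four symbols equally often, so its per-symbol entropy is high, yet it compresses very well because the order of the symbols is entirely predictable.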