Untitled Document

PROCEEDINGS OF IEEE 5TH INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION, IEEE PRESS, NJ, 1998.

Evolutionary Robotics --- A Children's Game.

Henrik Hautop Lund, Orazio Miglino, Luigi Pagliarini, Aude Billard, Auke Ijspeert

Abstract--- We explore the concept of development without programming by children. Especially, we look at the case of developing robot control systems. The evolutionary robotics approach has shown that in some cases, given a mathematically described fitness function, it is possible to achieve an automatic development of robot controllers. However, it is questionable how one is to construct the mathematical fitness function. So we applied an interactive genetic algorithm to the problem of developing robot controllers and achieved an evolutionary robotics approach that allows children without any programming knowledge to develop controllers for LEGO robots. We used neural networks as robot controllers, and found that combining the interactive genetic algorithm with a kind of reinforcement learning -- development at the evolutionary time scale combined with lifetime development -- reduces the development time dras tically. Hence, we overcome one of the major drawbacks of the interactive genetic algorithm, namely the development time.

I. Introduction.

In recent years, we have shown different properties of the evolutionary robotics approach at the IEEE ICEC conferences. Initially, we [5] showed how to evolve robot controllers in simulation before transferring the evolved robot controllers to the real robot in the real environment and achieve very similar performance. In this work, we argued that for the simulationreality transfer it is advantageous to use control systems that allow generalisation over noise and other discrepancies between the simulator and the real world. Therefore, we used neural networks as control systems for the robot in that study. Secondly, we [1], [4] showed how it is possible to coevolve robot controllers and robot body plans. Some of this work was done in simulation only, because of the difficulties in designing selfmodifying hardware. However, we showed in [4] how to modify preamplifiers, synthesizers, and mixers in an auditory system for a mobile robot. Our argument was that one can coevolve robot controllers and robot body plans in simulation, build the physical robot according to the evolved robot body plans, download the evolved control system, and then observe a real robot behaviour similar to the one in simulation. At least, the simulator should be sufficient to describe the nearoptimal robot body plan for a specific task. The LEGO robot simulator that we describe in this paper will act as a test for this hypothesis. Finally, we [3] showed how to evolve sufficient neural network controllers for specific tasks, and we [2] designed an approach to evolve complex robot behaviours without a monolithic fitness function. This was achieved by designing an evolvable behaviorbased system, in which primitives for simple behaviours were evolved first, and then arbitrators were evolved to decide which primitives to perform at each given moment. In all these evolutionary robotics studies presented at the IEEE ICEC conferences and in our other evolutionary robotics studies, there is the underlying assumption that it is in some sense trivial to design mathematical fitness functions that can be used by the evolutionary algorithm to guide the development of robot controllers to task achieving behaviours. Yet, in some cases it might be difficult to think of the right fitness function before doing empirical testing, and some researchers [6], [8] do indeed suggest that some kind of shaping is necessary in the process of evolving robot behaviours to achieve some task. These difficulties lead us to propose the evolvable behaviorbased system, that combines the advances of evolutionary robotics and behaviorbased robotics in an attempt to avoid the necessity of describing complex fitness function. However, we still need to design mathematical fitness functions at some level. Further, the difficulties in describing adequate fitness functions (based on extensive studies of the problem domain), are seen by us as one of the main reasons why practically no researchers in the evolutionary robotics field have achieved truly complex robot behaviours, yet. Indeed, we will show here that most of the simple robot behaviours that have been developed by researchers in the evolutionary robotics field can be developed by children! Of course, given the right developmental tool.

II. Interactive Evolutionary Robotics

Since our goal was to allow children to develop robot controllers for their individual ``needs'', we could not use the traditional evolutionary robotics approach, since it would require the children to know how to design mathematical fitness functions. Another possibility would be to use a system such as LEGO Dacta that has been developed to ease children's access to development and play with motorised constructions. In the case of LEGO Dacta, the children have to program the control in *LOGO. As with any traditional programming language, this is difficult for children. However, it might be justified in the case of LEGO Dacta, since LEGO Dacta is an educational tool to be used in schools, and to give children both programming skills and knowledge about control. As a playtool concept, the approach of using traditional programming for controlling motorised LEGO constructions might fail. The reason for this is that children will have to go through a very long process in order to learn the programming language before being able to make just the most simple controllers. Even when the programming language is learned, there is the problem of typing and other errors, and debugging will have to take place. So it will take a lot of efforts before children can actually get down to playing with the motorised LEGO construction, if ever. The fact is that it is too difficult for children to control motorised LEGO constructions with traditional programming! This is even further worsened when children are to construct LEGO robots, since these have a more complicated environmentsensormotorcontrol system interaction. We will here describe an approach to the development of LEGO robot controllers that overcomes the problems with traditional programming. In fact, we have constructed a simulator so that there is no programming involved for the enduser, and therefore it is much easier for children to use. The general idea is an Interactive Evolutionary Robotics approach by which children can develop robot controllers in the simulator by choosing among different robot behaviours that are shown on the screen, and then, when they are satisfied with the simulated robot's behaviour, download the developed control system to the real LEGO robot and further play with it in the real environment. The interactive evolutionary robotics approach is inspired by our previous work using interactive genetic algorithms to evolve simulated robot controllers, facial expressions and artistic images (see e.g. [7], [9]). In this approach, there is no need of programming knowledge, since all the enduser has to provide is a specification of preference of the solu tions suggested graphically on the screen. Hence, there is no description of a fitness function, but the selection in the genetic algorithm is performed by the user.

III. Materials

Electronics

The electronic hardware was developed at the Department of Artificial Intelligence, University of Edinburgh. It is composed of the following items:

Control board. The control board is controlled by a Motorola 68K microprocessor. It has a serial input for the downloading of programs from a PC, a I 2 C entry for interacting with external sensor boards (light detection and infrared detection boards) and a predefined set of connections for mechanical switches and for the motors. The maximum output voltage of the board is 6 Volts.
Light detection board. This transforms the voltage of a light sensor (between 0 and 5 Volt) into a digital value between 0 and 255. A board has input connections for 4 light sensors.
InfraRed (IR) detection board. This board creates impulses of IR light and measures the intensity of the reflection of the environment. The intensity of the reflection is encoded into a value between 0 and 255. A board has input connections for 2 IR detectors.
Batteries. The control board is powered by a 6V rechargeable battery. One of the robots has also a standard 9V battery for powering light bulbs.

Sensors and actuators

Mechanical switches. LEGO mechanical switches are used to create touch sensitive bumpers.
Light sensors. These sensors are Light Dependent Resistors which react to the whole range of visible light (no specific wavelength). They have a large opening angle (approximately 130 degrees). These are not LEGO sensors.
IR detectors. These sensors are used with the corresponding board. They are composed of an emitter and a receiver of IR. As mentioned, they send impulses of infrared light and measure within an angle of approximately 70 degrees the reflection of the environment. This measure gives a combination of the distance and the reflectivity of an obstacle within its opening angle. These are not LEGO sensors.
Motors. 6V LEGO motors are used, as their voltage corresponds to that of the control board. These are old LEGO motors.
Light bulbs. Two LEGO 9V light bulbs.

Construction elements

The robots are built out of stanard LEGO Technics pieces. This means that except for the electronic boards and the light and IR sensors, the whole construction, including the chassis, the gears, the wheels, the bumpers, is made of LEGO.

A. Robot designs

We built a couple of LEGO robots for testing purposes, but because of lack of space, we will only report on one design here. The robot was a stable jeeplike vehicle with three wheels (Figure 1). It had two powered wheels for locomotion and steering; the third wheel was a passive trolley wheel. The gear ratio between the motor and the powered wheels (8 cm in diameter) was 1/75. The robot had a good acceleration and reached a high top speed.

Fig. 1. Schematic (upper) view of the LEGO robot. The robot had a jeeplike shape and received sensory information from 4 mechanical switches mounted within two bumpers, 2 ambient light sensors at its front, and 2 reflection detectors under the chassis. © Lund et al. 1997.

The sensory input came from four mechanical switches Mechanical switches Reflection sensors and four light sensors. The mechanical switches were mounted in two touch sensitive bumpers at the front and the rear of the robot. The bumpers were made of a lattice structure and continuously pressed the switches as long as there was no shock (Figure 1). This type of construction has the advantages to be light and to protect the switches from direct shocks. The sensitivity of the bumper was determined by the elasticity of the rubber band keeping the bumper in contact with the switches. Three directions of collision could be detected by each bumper, namely left, right and frontal collisions.
The robot was able to measure the ground reflection with two light sensors mounted under the chassis of the robot. Two reflection sensors were constructed by making a device out of black cardboard in which a light sensor and a light bulb were placed side by side and directed to the ground. The black cardboard was used to prevent direct lighting of the sensor by the bulb and from external light sources. These devices enabled the robot to distinguish beween black and white grounds in a large variety of external lighting conditions.
The robot had also two light sensors mounted horizontally at its front. The light sensors had an angle of approximately 80 degrees between them and enabled the robot to locate light sources in its environment.

Fig. 2. Photo of the LEGO jeeprobot. It has bumpers on the front and the back, and light sensors underneath and on the front. © Lund et al. 1997.

IV. LEGO Robot Simulator

As mentioned in [3], [5], it is possible to build an accurate simulator that allows very good transference from simulation to reality by basing the simulator on the robot's own samplings of sensor and motor responses. The disadvantage is that data has to be collected. In the construction of the simulator, this data had to be collected for the different sensors and different motor configurations. For instance, we had to measure the motor response for each individual LEGO robot design that we wanted to use in the simulator. This is the disadvantage of the approach.
The sensor and motor data was collected in a similar way to that described in [3], [5], and the collected data was put into lookup tables that is used by the simulator to look up specific sensory readings and displacements of the simulated LEGO robot.
The architecture of the control system that calculates the motor output given a sensory input in the simulated and real LEGO robots was a feedforward neural network that connected sensory input to motor output. The jeeplike LEGO robot had 8 input neurons (4 bumpers and 4 light sensors), and the output was coded in 6 output neurons. However, in some experiments, we used only 2 of the light sensors (those on the bottom of the LEGO robot), and then we have only 6 input neurons, 6 output neurons, and 6 bias units. In a total, this gave 42 connection weights.

V. Interactive Genetic Algorithm

In order to obtain a controller that guides the LEGO robot to perform a desired task, one will have to choose the right values for the 42 connection weights in the feedforward neural network control system. In a sense, this corresponds to finding a specific point (or region) in a 42 dimensional space. This is, of course, impossible for children (and most adults). So therefore, we used an interactive genetic algorithm to develop these connection weights. This algorithm allowed children to develop the connection weights (i.e. the control system) by choosing among robot behaviours shown on the computer screen. The interactive genetic algorithm in the simulator works as follows.
The simulator allows different population sizes, but let us here look at a population of 9 robots. These 9 robots are placed in their copies of the environment. The 9 similar environments are shown on the screen at the same time (see Figure 3). Initially, the 9 robots in the population (generation 1), may have connections weights generated at random or loaded from a set of previously saved connection weights. The first of the 9 robots is put into the first environment. The robot will produce reactions based on the connection weights of the neural network control system and the sensory inputs. These reactions (movements) of the robot are shown in the first environment on the screen. After this robot has run around for a given time, the second robot is put into the (similar) second environment, and the produced movements are shown on the screen. This is done for all 9 individual robots. Hence, at this moment, the child has seen the 9 robots moving around in each their environment one after another. Let us look at the example where the robots' neural network control systems are generated at random. Some of the robots will differ in their behaviour, since the connection weights in the neural networks differ. The child might like some of these behaviours and dislike others. Based on such an evaluation where the child shows preferences, the preferred robots in the population can be chosen to reproduce. The child can, for instance, choose 3 different ones, or choose the same robot 3 times.
In the reproduction phase, the 3 selected robots will be copied 3 times each in order to produce the next generation (generation 2) of 9 robots. However, as the neural network control systems are copied, mutation will be applied to 10% of the connection weights chosen at random. If a weight is selected, it will have added a value chosen at random in the interval [10,10]. Hence, the new generation of robots will be similar to their parents because they are copies, but they will also differ to some degree because of the mutation.
The new generation of 9 robots is shown on the screen, one robot after another, as was the case with the previous generation. The child can then again select the 3 preferred robots and they will reproduce through copy and mutation to make the next generation (generation 3). This loop can continue until the child is satisfied with the obtained behaviour of the LEGO robot. It is important to note, that the way the LEGO robots change is based on the child selecting which ones to reproduce according to the child's individual preferences, and based on the mutations.

Fig. 3. The population of 9 robots placed in each their (similar) environment. Here, the environment contains a number of round obstacles. The trajectory of each robot is shown in each of the 9 environments. The figures represent generation 1, 3, 5, and 7 of an evolutionary process. Here, the user is able to evolve successful obstacle avoidance for the LEGO robot within 7 generations. © Lund et al. 1997.

Another way to change a LEGO robot's behaviour is to develop a single LEGO robot's behaviour rather than develop on a whole population. If this option is chosen, an enlarged image with the environment and robot will appear on the screen (see Figure 4). The robot will start moving around in this environment, just as it did when the whole population was presented. Now, if the child dislikes a movement of the robot (e.g., the robot gets stuck when hitting into a wall), the child can press a "Bad" button. This will immediately change the underlying neural network control system of that robot. But, in this case, the change does not happen at random. The system keeps a record of the last input to the robot, and it is the connections weights going out from the input neurons that were activated that are changed (with random portions). In the case where the robot hits into the wall with the front left bumper, it will be the connection weights going out from the input unit for front left bumper that are changed. Hence, the reaction upon a bump with the front left bumper is changed. Each time the "Bad" button is pressed, there is also a slight chance of the bias connections being changed, resulting in a change in the default behaviour.

Fig. 4. The enlarged environment in which the child can train a single LEGO robot through reinforcement learning. The control system of the LEGO robot is changed every time the "Bad" button is pressed. In this environment there are two lines on the floor and 3 round obstacles. © Lund et al. 1997.

When the child is satisfied with the behaviour obtained in the teaching process, the child can return to the whole poplation. The trained robot keeps the trained network, and can then, eventually, be selected to reproduce. In this way, we allow a Lamarckian evolution that consists of characeristics being learned during lifetime being inherited over evolution. We definitely do not support such a theory for natural evolution, but it has proved to be a very powerful tool as a means for fast development of LEGO robots.
In a sense, what this type of reinforcement learning does is to make a guided search in specific regions. In the case with the bump with the front left bumpers, the algorithm performs a search on the weights connecting that input unit to the output units, and the search is made only on these weights. It seems fast to find a solution, because the dimension of the search space has been limited. This is not a traditional reinforcement learning algorithm, but we found that it works very well and quickly, at least for the simple tasks of the LEGO robots. In order to speed up development time, we have also made some preevolved populations of robot controllers that the children can choose from as the starting population. After having chosen the specific environment to play with (Figure 5), the children can then choose 1 out of 5 different populations of robots as the starting population. These populations of neural network controllers have been evolved using the same simulator, but with a fixed fitness formula for each behaviour, and hence have (previously) run automatically for 100300 generations. With these preevolved populations, children can start the development from a certain starting point that includes some of the characteristics of the robot that the child might like to evolve. This is instead of starting from a randomly chosen starting point, from where it might take long time to develop something sensible and interesting for the individual child.

Fig. 5. The initial screen in the LEGO robot simulator. Here, children can choose between 5 different environments. © Lund et al. 1997.

The selection of individuals from the preevolved population of 100 neural network controllers works as follows. When creating the initial population, there is a 1% probability of taking a preevolved neural network (from the 100 preevolved in that population) against 99% probability of creating a neural network controller with random weights. The same thing happens when doing user guided reproduction. The user chooses the, for example, 3 LEGO robots (neural networks) to reproduce. When making reproduction (+ mutation), in most cases (99%) the algorithm copy the neural network chosen by the user and mutate it, while in 1% of the cases, instead of making mutation, the algorithm picks up a preevolved neural network from the above mentioned preevolved population.

VI. Experiments and Discussion

We performed a series of experiments with the interactive evolutionary robotics approach described above, and had many children play with the LEGO robots. As expected, it turned out to be very simple to evolve exploratory behaviours, different kinds of obstacle avoidance, line & wall following, etc. and combinations between such simple behaviours. Figure 3 in the previous section showed the development of obstacle avoidance in a cluttered environment within only 7 generations. Hence, within minutes, children can develop a wide range of the robot behaviours that has traditionally been evolved in the field of evolutionary robotics. This is due to the use of the interactive evolutionary robotics approach combined with reinforcement learning. Indeed, with this approach children do not need to have any programming skills whatsoever! All that is demanded is that the child specify individual preferences in order to develop robots according to their individual taste. As the simulator is built now, even though children can develop a wide range of their own robot behaviours, there is a limitation, since we as designers have provided 5 specific environments and a specific neural network topology. Future extension might involve development of neural network topology and allowing children to design their own environments in the simulator, as they do in the real world. From an engineering point of view, observing unskilled children developing robot behaviours to the same level of complexity of what researchers have achieved so far in the evolutionary robotics community must in some way be frustrating for these researchers (including ourselves).

Acknowledgements.

Materials and facilities were provided by the LEGO Group, the Danish National Centre for IT Research -- CIT, the National Research Council, Rome, Italy, and the University of Edinburgh.

References

[1] W.P. Lee, J. Hallam, and H. H. Lund. A Hybrid GP/GA Approach for Coevolving Controllers and Robot Bodies to Achieve FitnessSpecified Tasks. In Proceedings of IEEE Third International Conference on Evolutionary Computation, NJ, 1996. IEEE Press.

[2] W.P. Lee, J. Hallam, and H. H. Lund. Applying Genetic Pro gramming to Evolve Behavior Primitives and Arbitrators for Mobile Robots. In Proceedings of IEEE Fourth International Conference on Evolutionary Computation, pages 501--506, NJ, 1997. IEEE Press.

[3] H. H. Lund and J. Hallam. Evolving Sufficient Robot Controllers. In Proceedings of IEEE Fourth International Conference on Evolutionary Computation, NJ, 1997. IEEE Press.

[4] H. H. Lund, J. Hallam, and W.P. Lee. Evolving Robot Morphology. In Proceedings of IEEE Fourth International Conference on Evolutionary Computation, NJ, 1997. IEEE Press. Invited paper.

[5] H. H. Lund and O. Miglino. From Simulated to Real Robots. In Proceedings of IEEE Third International Conference on Evolutionary Computation, NJ, 1996. IEEE Press.

[6] S. Nolfi and D. Parisi. Evolving nontrivial behaviors on real robots: A garbage collecting robot. Robotics and Autonomous Systems, 1997. To appear.

[7] L. Pagliarini, H. H. Lund, O. Miglino, and D. Parisi. Artificial Life: A New Way to Build Educational and Therapeutic Games. In C. Langton and K. Shimohara, editors, Proceedings of Artificial Life V, pages 152--156, Cambridge, MA, 1997. MIT Press.

[8] S. Perkins and G. Hayes. Incremental acquisition of complex behaviour using structured evolution. In Proceedings of the Inter national Conference on Artificial Neural Networks and Genetic Algorithms '97. SpringerVerlag, 1997. To appear.

[9] V. Vucic and H. H. Lund. SelfEvolving Arts --- Organisms versus Fetishes. Muhely Hungarian Journal of Modern Art, 104:69--79, 1997.