Research Report AI-1991-01 Artificial Intelligence Programs The University of Georgia Athens, Georgia 30602 Available by ftp from aisun1.ai.uga.edu (128.192.12.9) Series editor: Michael Covington mcovingt@aisun1.ai.uga.edu Research Report AI-1991-01 SALMON: a TEMPERAMENTAL Program that Learns Gregg H. Rosenberg Artificial Intelligence Programs The University of Georgia Athens, Georgia 30602 U.S.A. 1 GREGG H. ROSENBERG SALMON, a TEMPERAMENTAL Program That Learns A new approach to machine learning is introduced that utilizes semantic information in a connectionist network. The approach is implemented in a program that learns to act appropriately in the dynamic environment of a children's game of tag. The model is interesting in several respects including the ability to begin with no connections and then make and break them according to its experience, the ability to adjust the weights on its connections and the ability to interact with its environment. 2 1.0 Introduction This thesis introduces a new approach to machine learning that uses aspects of semantic and connectionist systems. The name of the programming technique is TEMPERAMENTAL programming. TEMPERAMENTAL stands for The Effective Management of Process Evolution and Response in an Associative Memory Emulating a Neural Type Action Language. The algorithm has been implemented successfully as a program named SALMON (Semantic Action Learner Modeled On Neurons) that learned an appropriate strategy for a simulated game of tag from watching the other participants. The main features of interest in a TEMPERAMENTAL program are: 1) The system is a self-organizing learner which uses experience in a dynamic environment to learn how to react appropriately in it. 2) The learner begins with only unconnected semantic nodes then makes and breaks connections between them based on what occurs in the environment. 3) The learner adjusts the weights of connections to conform to its experiences in the environment. 4) Both the nodes and connections have semantic meaning, but the information passed through the connections is numerical la the connectionist paradigm. 5) Despite its semantics, the algorithm is potentially as massively parallel as other connectionist schemes. Sections two and three describe TEMPERAMENTAL programming. Section two contains information concerning the components of a TEMPERAMENTAL system, and how TEMPERAMENTAL programs learn. Section three details how activation is spread in a TEMPERAMENTAL network and how the results are interpreted. Sections four through six deal with different aspects of the implementation, section seven analyzes the results obtained from the implementation, and section eight is a general discussion of issues raised by TEMPERAMENTAL programming and possible directions for future work. 2.0 TEMPERAMENTAL Programming TEMPERAMENTAL programming is a hybrid scheme which uses the interpretive power of semantics and the pattern recognition and parallelism of connectionist strategies to learn about a dynamic environment. It connects nodes with semantic meaning using links that also have semantic meaning, but weights the links numerically and has them activate the nodes in parallel. Once trained, the activation and the semantics cooperate by combining their knowledge to decide upon an action given the current circumstances. The key ideas taken from 1 2 connectionism are parallelism and communication through numeric activation rather than semantic message passing. The key ideas taken from semantics are consistent interpretation of symbols and procedural interaction with an environment. 2.1 Node Types There are four types of nodes in a TEMPERAMENTAL system: objects, dynamic attributes, actions and action attributes. 1) Objects each object in the environment is assigned a node with a unique label. Objects have a full repertoire of connectionist behaviors, these being the reception, accumulation, and passing along of activation energy. 2) Dynamic Attributes Each environment has a set of attributes, like a particular location, that an object may or may not possess. These are called dynamic attributes and a node with a unique label is assigned to each attribute. Dynamic attributes act as conduits for activation from objects to other nodes but do not accumulate activation themselves. 3) Actions An action is a change to an attribute or the value of an attribute of one or more objects. Each action in the environment is given a node with a unique label. Like objects, action nodes have the full repertoire of connectionist behaviors including reception, accumulation and transference of activation. In addition, action nodes have a procedure and a function associated with them. The function accepts an object and determines if the action is enabled with respect to that object. The procedure contains the semantic instructions for performing the action when the node is activated. 4) Action Attributes Many actions can be performed in more than one way. For instance, a player in a game of tag must decide not only to flee, but also in which direction to flee. The direction can be thought of as an attribute of the action. One node with a unique label is assigned to each possible attribute of each action defined. The attribute is explicitly bound to the action it is responsible for helping to perform. One attribute may belong to many actions. The action attributes accumulate activation but do not transfer it. 2.2 Connection Types Connectionist networks normally define their connections by the type of activation that they convey. Thus the connections are either inhibitory or excitory. Semantic systems usually define their connections by the type of relationship that they represent. Thus, we have IS-A links, HAVE links, MEMBER links and so forth. TEMPERAMENTAL programs do both. Each connection is assigned a semantic meaning used in making the connection and in adjusting its weights. It is also assigned a role in spreading activation that defines it as either a companion, excitory, or enemy connection. Finally, each connection is 3 assigned a directionality. Being a companion or an enemy connection in a TEMPERAMENTAL system is not the same thing as being inhibitory or excitory in a normal connectionist network. Companion connections between two nodes represent, with a certain strength, the possibility that the two nodes will co-activate. Enemy connections represent the possibility, with a certain strength, that two nodes will not co- activate. Therefore, whether a connection passes along an inhibitory or an excitory signal depends not only on what kind of connection it is, but also on what kind of activation the node received. If a node receives an excitory signal then it would send an excitory signal along its companion connections. But if the signal received by the node was inhibitory, then it would send inhibitory signals along its companion connections. In general we can think of the activation received by a node as containing a message. Excitory activations contain the message that, "You are my friend." Inhibitory activations contain the message that, "You are my enemy." The goal is not to pass the form of the message along intact, but rather the content. Therefore, we can follow these old proverbs: "The friend of my friend is my friend." "The enemy of my friend is my enemy." "The friend of my enemy is my enemy." "The enemy of my enemy is my friend." After running experiments with the system I found that it was absolutely necessary for there to be strictly excitory connections as well as companion connections. The justification can be understood by imagining this situation: Let's say that a friend of yours has the peculiar habit of walking backwards every once in a while. After he walks backwards he always walks forwards again. Now, of course you will get an expectation that he will walk backwards from seeing him walk backwards, but from seeing him walk forwards again you also will get a negative expectation (called an exclusive connection in TEMPERAMENTAL programming) that he will walk backwards (these connections are talked about soon). Further, he always walks forwards after he walks backwards so you get a very strong positive expectation that this will be the sequence of events. Now, since he does not walk backwards very often, your negative expectation for his walking backwards grows very strong, stronger than your positive expectation that he will walk backwards. When wondering whether your friend will walk forwards or backwards, which will you say? Intuitively, you will say that he will walk forwards. But imagine that the expectation for the sequencing is a companion connection as outlined above. There is a very strong negative expectation that gets fed to walking backwards, and walking backwards has a very strong companion connection to walking forwards through sequencing. This companion connection is a path that will convey this strong negative expectation about walking backwards to walking forward. Because the weight of the sequence connection is so strong the negative energy is not diminished very much as it moves along the path, and it may even 4 be enough to cause you to have more belief that he will walk backwards than you do that he will walk forward. That is all wrong! Just because you don't believe something will happen doesn't mean that you also believe what usually follows it won't happen. You remain neutral regarding what will follow it. If the preceding action A occurs, then it raises your expectation that the following action B will occur, but the unlikelihood of A merely means that B will have to get activation from some other source if it is to compete with your other expectations. Therefore, your sequencing expectations are not companion but strictly excitory. Another way of putting it is to remember that companion connections represent co-activation, but sequences are serial activations, not co-activations. Therefore, sequence expectations are not companions. In the following discussion the terms "giver node" and "giving node" refer to the node which passes activation along in the connection. The terms "receiver node" and "receiving node" refer to the node which acquires activation from the connection. In two-way connections both nodes are givers and also receivers. 1) Expectation Connections Expectation connections are formed between objects and the actions that they perform. The heuristic is, if an object is performing an action then establish an expectation connection between the object and the action. Expectation connections are two-way and companion. 2) Exclusive Connections Of course, sometimes attributes an object possesses make it less likely to perform an action. Exclusive connections try to capture these kinds of relationships. Formation of exclusive connections is motivated by an object changing the action it is performing. When an object stops performing an action, make an exclusive connection between each attribute the object had at the moment it stopped and the action no longer being performed. Also make a connection between the object and the action. The idea is that the attribute responsible will eventually emerge as the connection is re- enforced through the attribute's continual presence when the action is not being performed and absence when it is. 3) Focus Connections The idea of a focus is currently ill defined. Many actions seem to have them, though. If one object is pursuing another, the focus of the action is the object being pursued. However, a focus relation is not symmetric. A zebra may flee a lion even though the lion is not concerned with the zebra at all; thus the lion is the focus of the zebra's action but the zebra is not the focus of the lion's action. Vaguely, we can think of the focus of an action as being the object which motivates it. The focus object of giving flowers is a loved one, the focus object of driving to work is the place of employment, etc. In the performance of any action with a focus there are at least three nodes involved, the performer, the focus object and the action. The TEMPERAMENTAL system makes a focus connection between the performer and the focus object. The connection is two-way companion. 5 The TEMPERAMENTAL system also makes a connection from the action to the focus object. The connection is two-way companion. One reason the system makes separate connections for the action and the performer is because it is naive. It does not know if this relationship holds because of the focus object's relationship to the performer or because of some special relationship to the action, or both. For instance, the net must be able to capture fixations one object might have for another without contaminating the connection for the action in general. By separating the focus connections we are able to maintain a strong relationship between the performer and the focus object without the focus object becoming a fixation of the action also. Another reason for making the connections separately is to reduce the number of connections the system has to make. The number of connections that need to be made are roughly exponential to the number of nodes involved in the connection. Full connectivity between four objects and five actions in triplet form would require 80 (4 5 4) connections; however, in pairs of doubles the same set of associations is represented with only 36 connections (4 4) + (5 4). Finally, representation as pairs of doublets can be defended on the grounds of ineffectiveness. More complicated connections would work too well. A learner that remembered in triplets, or quadruplets, or whatever is necessary would always know exactly what to expect. Why even have a scene represented by sets of connections? Why not have the entire scene captured in one big beautiful connection? The reason is that no learner ever actually does learn that well. The doublets crystallize the correct expectation or response much of the time, but we cannot be sure they will do so. 4) Using Connections Many actions require tools or use objects in some other way. The TEMPERAMENTAL system should make connections to these tools. Therefore, a two-way companion connection is made between the object performing the action and the tool being used. Also, the system makes a using connection from the action being performed to the object being used. The connection is two-way. The connection from the user object to the used object, and the connection from the action to the used object are separated for the same reasons given for separating the focus connections. 5) Sequence Connections If an object is performing an action A one moment and an action B the next, then form a sequence connection from action A to action B. The sequence connection is one- way and strictly excitory from A to B. 6) Dependency Connections Action attributes depend on the attributes of the objects involved. Therefore, in a situation where object O1 is performing action A1, using O2 with focus O3, make dependency connections between A1's current attribute and the attributes of O1, O2 and O3. The dependency connections are from dynamic attributes to action attributes, and the connection-making strategy is naive. For each object involved, make a dependency 6 connection from each attribute of the object to each current attribute of the action. 7) Cooperative Connections Each action has a function attached to it that decides if any given object is able to perform an action. Cooperative connections are formed based on the following rules. a) If the action possesses attributes, then the object is enabled or disabled with respect to action/attribute pairs, and not just the action. b) If an action (or an action/attribute pair) is disabled with respect to an object one moment and enabled the next, then make a cooperative connection from the disabled object to the action the object was just performing. Again, the connection making strategy is naive and the connection is one-way companion from the newly enabled action to the actions which the object was involved with. 8) Competitive Connections Competitive connections are the opposite of cooperative connections. If an action was enabled with respect to some object and then becomes disabled, make competitive connections between the newly disabled action and the action the object was just performing. The connection is one-way enemy from the newly disabled action to the action suspected. Cooperative and competitive connections are meant to convey information from an action to other actions about the likelihood that the other action's occurrence will enable/disable the action sending the activation. In real environments, in addition to actions being disabled and enabled by other actions an object performs, they may also be affected by actions that the object is a focus of or used by, and also by circumstances arrived at indifferently to the object. Presently, TEMPERAMENTAL programming does not take into account this complication. 9) Connections from dynamic attributes to other nodes Whenever a connection is made between an object O and another node, then a connection is also made between each dynamic attribute of O and the node. 2.3 Learning By Making and Breaking Connections As mentioned, all the strategies for making the different kinds of connections are naive. They will each result in many spurious connections between nodes. This is done on principle. A connection is made between nodes only if one does not already exist, and if a connection does not already exist then the system has no way of knowing, a priori, that it is spurious. TEMPERAMENTAL programming balances its naivet with methods for discovering and breaking the spurious connections as it gains experience. Additionally, the technique for spreading activation includes mechanisms to safeguard against undiscovered spurious connections achieving an inordinate amount of influence. Finally, the TEMPERAMENTAL system has more than one learning mode, and the rules for making connections are applied 7 differently in each. The learning modes are characterized as "Beginner" and "Expert," but do not take the names to mean that they model how beginners or experts actually learn. They just denote that one method is used early and the other later in the learning cycle for the TEMPERAMENTAL program. Originally, they differed in two important respects. They differed first concerning the level at which they consider a connection to have degraded and gone bad, and secondly they differed in how they treat a connection that has degraded but is a candidate for being reconsidered. After running experiments I found that a rudimentary ability to discover and remove irrelevant connections that maintain strong weights was desirable. The system was extended to incorporate this ability in expert mode. After a connection is made, the weight on it is continually being adjusted to reflect the system's experience in its environment. Experience then causes some connections to strengthen and others to weaken. When a connection becomes too weak it is considered to have degraded and is broken, but it is saved by the system in a list of bad'' connections. In Beginner mode, the system sets a relatively low level for a connection to be broken. In addition, if the TEMPERAMENTAL system subsequently has a reason to form the connection again, it is automatically taken off of the bad'' list and re- established at its previous strength level. In Expert mode the system is more skeptical about its connections. The level at which a connection is considered degraded and broken is higher and it is not as easy to retrieve a connection from the bad'' list. A confidence function is defined, and a connection is only retrieved from the bad'' list if the system has confidence in it. Confidence is based on the experience the system has working with the receiver node of the connection. Specifically, the confidence function is: c = Bias Signal If c > 1 and the signal was excitory, then the system has confidence in the connection, or if c < 1 and the signal was inhibitory, then the system has confidence in the connection; otherwise, the system does not have confidence in the connection. The signal is the value received by the giver node and that is being considered for transference to the receiver node. The Bias is given by: Bias = log10(CurrentTotal)/log10(HighestTotal) where CurrentTotal is the total number of opportunities on the weight records of the connection under consideration, and HighestTotal is the highest number of opportunities from other connections from the same kind of node (object, action, dynamic_attribute, etc) that are also currently relevant to the receiver node. The log10 of these totals is taken to flatten the shape of the bias. Only large differences should play a significant role in shaking confidence. Differences within an order of magnitude do not matter a great deal. For example, in the game of tag we may define a dynamic attribute for objects that specifies their current distance from the tagged player. A dependency connection from the dynamic attribute of distance to flee's action attribute for direction will be made 8 whenever an object is fleeing from that distance. We may represent the connection this way: connection(dependency(da, aa), distance(21, 23), [flee, change(minusminus), [[frequency, 3, 3], [recency, 1]]). Distance(21, 23) means the player is between 21 and 23 units from the it, [flee, change(minusminus)] is the receiver node and means that the flee action should be in the minus X direction and the minus Y direction, and the last spot contains the information for calculating the weight of the connection. The frequency means that for 3 out of 3 times that an object has been at that distance and fled, it has fled in the minusminus direction. Now let us imagine there is another connection currently relevant to the flee action. This signal comes from direction(plusplus) and goes to the attribute change(plusplus). The weights are [[frequency, 450, 578], [recency, 1]]. Naively, a connection of 450 out of 578 is weaker than a connection of 3 out of 3. The question is, should we accept the weights naively? The confidence function does not. The fact that one connection has only had 3 chances to be related to the flee attribute while the other has had 578 chances should be taken into account. It is likely that the perfect correlation between the former weight was just coincidence resulting from the smaller sample size. We use the larger total as a standard to bias the multiplier. Therefore, we obtain a bias Bias = log10(3)/log10(578) = 0.172 The bias is used to obtain the confidence for the activation signal. Say the signal is 1.4, then c = 1.4 0.172 = 0.24 Since 1 is the neutral signal, the bias has changed the signal from excitory to inhibitory and there is no confidence in the connection. Experiments turned up the need for a test for spuriousness independent of the strength of the connection or the confidence in it. Some connections from attributes to other nodes will remain strong because there are only a small number of actions to be performed, and the attribute has no effect on performance. Since the attribute has no effect on performance, the connection simply records the independent tendency of the system to perform the particular action and this tendency may be very high. Therefore, the connection never becomes weak and never degrades. Additionally, it may be an attribute that the objects possess quite frequently, thus making its opportunities high, enabling it to pass the confidence test. To catch these kinds of connections the TEMPERAMENTAL system implements a relevancy test for keeping a connection when in Expert mode. Because it was added late, the mechanism is quite crude and only tests for the relevancy of connections between dynamic attributes and actions. However, I believe that natural extensions to the present theory 9 would smoothly allow more sophisticated tests1. The relevancy test hinges on the variation of connection strength between different values of a dynamic attribute and the action being considered. A connection between a dynamic attribute's value and an action is relevant if it satisfies at least one of these criteria: 1) The frequency on the connection for this value of the attribute varies significantly from the average frequency for connections between the other values of the attribute and the action. 2 a) Some values of the attribute are not connected to the action, b) the unconnected values possess some connections to some actions, and c) the system has confidence in at least one of these connections. 3) The dynamic attribute has only one value. The explanation for 2 is simple. If only some values of an attribute are connected to the action, then the system must decide if this implies that the connection from the attribute value under consideration to the action under consideration is, indeed, significant. To decide, it checks to see if the unconnected values have been encountered before. If they have been encountered frequently enough that the system has gained confidence in one of the connections from them, then the system assumes that the reason the unconnected value is unconnected is because the value of the dynamic attribute does, indeed, make a difference. On the other hand, if it has no confidence in any connections to the unconnected value (meaning that it does not have much experience with it), then it makes the assumption that if the connection existed, then it would not make a difference to the final average. If a connection is deemed irrelevant, it is put on a list of irrelevant connections and can never be retrieved. To recap, in Beginner mode the level for degradation of connections is set relatively low and connections are retrieved from the bad'' list readily. In expert mode, the level for degradation is raised and a connection is retrieved from the bad'' list only if the system has confidence in it, and a test for the irrelevancy of connections between dynamic attributes and actions is implemented. These two strategies applied serially are generally enough to preserve the good connections and to remove the spurious ones. 1 The extensions that I have in mind include establishing category nodes for dynamic attributes and relations between dynamic attributes. The category nodes would help control the flow of activation and are something I planned to add to a more sophisticated system well before considering the relevancy issue. It is fortuitous that they should be able to help with relevancy also. 10 2.4 Learning by Adjusting Weights Unlike many other connectionist systems, the weights on connections never settle into a learned state but are continually recalculated from moment to moment to reflect the learner's experiences in the environment. The weights are calculated from the average of frequency and a recency function, or w = (fq + f(rc))/2 The frequency portion of the weight also consists of two measures. The first measure is of the frequency of success of the connection, and the second measure is of the number of opportunities to succeed that the connection has had. Remember, each connection has a semantic meaning, and the meaning denotes a relationship between the nodes connected (focus, using, expectation, etc). We use this semantic meaning to determine if the relationship between the nodes is satisfied in the current moment. If it is, then we increment the success portion of the frequency weight. We can also determine if the current moment contained situations in which there was an opportunity for the relationship to be satisfied. We then increment the opportunities portion of the frequency weight for each opportunity found. The frequency is then simply: fq = Successes/Opportunities A connection is degraded if its frequency function falls below a set level. The level is set by the learning mode, Beginner or Expert , that the TEMPERAMENTAL program is in. While the frequency function reflects the overall trends in the environment, TEMPERAMENTAL systems can also be biased towards the recent past. If a relationship denoted by a connection is currently satisfied, we shall represent this by assigning the connection a recency weight of 1. If the relationship could have been satisfied but was not, then the recency weight will be incremented by 1 from whatever its current level is. Thus, if a relationship is currently satisfied its recency goes to 1, and then if it goes 3 opportunities without being satisfied the recency will be at 4. A limit on the amount recency may be incremented is established, and the value associated with each recency is between 0 and 1. The values are stored in a table with the number 1 having the highest value and the values then decreasing sharply downward. The table used by SALMON had the values recency_table(1, 0.80). recency_table(_, 0.65). The value given by 1 must be below 1.0 so that perfect correlations in the frequency function do not endlessly cycle through the network. I tried differences greater than 0.15 between the high value and the low value, but they seemed to cause the system to continually repeat it's last action. 0.80 seems to work well, at least for this particular implementation. The other weight was similarly hand-tuned. As it turns out, the rule governing what values should be in the table is the obvious one they should reflect the effect recency has in the environment to be learned. Since the simulation I wrote to act as an environment for SALMON only biases its players towards their most recent behavior without regard for more than the immediate past, only the value for 1 should be biased 11 in the learner. Allowing many more levels of "memory" by adding to the table produces a bias towards recent behavior not found in the simulation. 2.5 Adjusting Particular Connections 1) The Expectation Connection The expectation connection is satisfied if object O is performing the action expected. An opportunity occurs if the object is enabled to perform the action. 2) The Exclusive Connection An exclusive connection between attributes and actions is a success if the object is not performing the action, is enabled to perform the action, and has the attribute. An opportunity for an exclusive connection occurs whenever the object has the attribute, was previously performing the excluded action, and is currently enabled to perform it. Successes and opportunities between objects and actions in exclusive connections are judged similarly, except the judgement is made without regard to attributes the object may possess. 3) Focus Connections The focus connection between objects is satisfied between O1 and O2 if O2 currently is the focus of O1's action. There is an opportunity if some object is the focus of O1. The focus connection from an action to an object is satisfied if the object is currently the focus of the action. An opportunity exists if the action is being performed. Since an action can be multiply instantiated, the adjustment must take into account each instantiation. 4) Using Connections A success in a using connection between objects O1 and O2 occurs if O1 is using O2. An opportunity occurs if O1 is using something. A success for using connections between objects and actions occurs for an action A and an object O2 if the action A is currently using O2. An opportunity occurs if the action is being performed and is using something. 5) Sequence Connections A success for a sequence connection from A1 to A2 occurs if an object is performing A1 one moment and A2 the next. An opportunity occurs if an object was performing A1 at the previous moment. 6) Dependency Connections A successful dependency connection occurs when an object is performing the action with the specified attribute, and either it has the dynamic attribute or the focus of its action has the dynamic attribute. An opportunity occurs if an object is performing the action and either the object or its focus has the attribute. 7) Cooperative Connections A success for a cooperative connection occurs if the giving action in the connection becomes enabled with respect to an object after being disabled, and the object just performed the receiving action. An opportunity occurs simply if the giving node was disabled with respect to an object and the receiving action was performed by the object. An alternative way to count an opportunity would be if the giving node changed states from disabled to enabled. This alternative method captures the likelihood that a change in state is caused by the receiving node, whereas the method used captures the likelihood that the receiving node being performed will cause the change in state. Since the 12 connection is used by the giver to enable itself, the method used was deemed to provide the more valuable information. 8) Competitive connections A competitive connection is a success if the giving action was enabled with respect to an object, and became disabled after the object was performing the receiving action. An opportunity is occurs if the giving action was previously enabled and then becomes disabled. 3.0 Spreading Activation Once the TEMPERAMENTAL learner has had a significant number of moments observing the environment he may participate. Participation in an environment occurs by inputting activation into the connectionist network it built as an observer and letting the activation spread through the nodes. Before spreading activation, all nodes have their activation levels returned to a start'' point. There is no need to save the activation levels from previous computations because the result is captured in the recency weights on the connections. The input nodes into the net are the ones corresponding to the objects presently in the environment and the actions that they are performing. Thus, the nodes input into are not fixed, but rather change with every moment. This is in stark contrast to many other connectionist architectures, particularly those which use back-propagation, which are arranged hierarchically with fixed input and output nodes. Each node passes activation along its connections to other nodes and uses the information on the connections to determine a transfer value for the activation. Following is a discussion of the factors used to calculate the transfer value. 3.1 Relevance of a connection Nodes form many connections, not all of which are relevant at any given time. Before a connection can transfer a signal from one node to another, it must pass a relevance test. A connection is relevant if one or more of the following are true: 1) At least one of the nodes is the relevant object. 2) The connection is a sequence connection and the relevant object is enabled with respect to the giving node. 3) At least one of the nodes is a dynamic attribute of the relevant object. 4) The connection is not an expectation or exclusive connection and at least one of the nodes is currently the focus of the relevant object. 5) The connection is not an expectation or exclusive connection and at least one of the nodes is a dynamic attribute possessed by an object that currently is the focus of the relevant object. 6) The connection is an expectation or exclusive connection, at least one of the nodes is currently the focus of the learner, and the focus is enabled with respect to the action involved. 7) The connection is an expectation or exclusive connection, at least one of the nodes is a dynamic attribute possessed by an 13 object that currently is the focus of the learner, and the focus is currently enabled with respect to the action involved. 8) The relevant object is currently the focus of at least one of the nodes. 9) At least one of the nodes is an attribute of an object which currently has the relevant object as its focus. The relevant object is usually the learner but may be changed dynamically to suit the system's purpose. Detailed discussion of tests 4 - 7 is deferred until section 9. Activation is input into the network as excitory through the nodes corresponding to each object and each action present in the environment. The point of view of the learner is achieved by biasing the input into the nodes representing itself and the action it is currently performing. The bias multiplies the normal input level. 3.2 The Transfer Function Each input represents a signal which increases the activation level of the node receiving it. Signals above one are excitory and signals between 0 and 1 are inhibitory. Every time a signal passes through a connection, it is moved closer to the neutral value. Eventually all activation stops as the signals expend their energy moving through connections. The transfer function takes the values stored in the connection as arguments and determines how much energy the signal will expend traversing the connection. When an activation expends so much energy traversing a connection that it falls within a defined neighborhood of 1, then it is exhausted and is not transferred. All signals are input as excitory. If an excitory signal must pass through an enemy connection, then it becomes inhibitory. If M is the excitory value of the signal, then 1/M is the inhibitory value. Conversely, an inhibitory signal passing through an enemy connection becomes excitory (see section 2.3). If M is the value of the inhibitory signal, its corresponding excitory value is 1/M. Otherwise, the signal maintains its type. We define for each connection a raw weight. The raw weight of a connection is the average of the frequency function and the recency function given by rw = (fq + f(rc))/2 (see section 2.4) The actual weight of the connection is dependent on the type of signal being transferred. If the signal being transferred is inhibitory then the actual weight is given by actual = 1/rw Otherwise, actual = rw The signal received is multiplied by the actual weight of the connection, which always moves the signal closer to 1 than it was before passing through the connection. If it becomes too close to 1 then the signal is not transferred. An additional check is made by passing the signal and connection to the confidence function. The confidence function bias may move the activation even further towards the neutral level. Recall, the confidence bias is given by 14 Bias = log10(Giver)/log10(HighestRelevant) where Giver is the opportunity value on the connection the signal is passing through, and HighestRelevant is the highest opportunity of any connection currently relevant to the receiver node, the reciever node is also receiving in the HighestRelevant connection, and the giver node in the HighestRelevant connection is of the same type as the giver node in the Giver connection (See section 2.3 for a detailed discussion of the confidence function). If the signal fails the confidence test then it is not transferred. When signals are passed through object nodes, three things occur. 1) The object's activation level is multiplied by the signal received. 2) The signal is transferred to all relevant nodes connected to the object. 3) The signal is transferred to all relevant nodes connected to the object's dynamic attributes. When activation is passed through an action, two things occur. 1) Its activation level is multiplied by the signal. 2) The signal is transferred to all relevant nodes connected to it. Dynamic attributes do not have activation levels. They merely act as conduits for signals from objects to other nodes. The transfer function for a connection involving a dynamic attribute is exactly the same as for connections involving any other kind of nodes. There is no penalty as the signal moves from the object to its dynamic attributes, but the transfer function is applied as the signal propagates from the dynamic attribute to other nodes. Currently, a signal passing through a dynamic attribute is finagled sometimes. More than one object may possess any given attribute at any given time. If many objects with relatively weak activation levels possess an attribute, the nodes connected to that attribute receive a great deal of activation. Generally, this is enough to overpower the effect of any dynamic attributes represented by only a single object. However, this group effect is often not desirable, as the system is really concerned only with the effects associated with individuals and does not care how they are grouped. To correct this grouping effect, whenever a signal is passed through a multiply instantiated dynamic attribute the system biases the signal in the following way divide the activation level of the object the signal is emanating from by the highest activation level of the other objects with the dynamic attribute, and then multiply the signal by this ratio. That is, if object O is passing a signal M through its attribute DA, then GroupingBias = activation level(O)/activation level(O2) 15 where O2 is the most highly activated object possessing that attribute. In principle, a multiply instantiated attribute could still greatly affect the result obtained in the network, but in practice activation levels quickly diverge so only the activation from the most highly activated possessor has much influence. Of course, sometimes a group effect is appropriate. This solution is temporary and, admittedly, a compromise. A better solution would be for the system to determine for itself when a group effect matters, and choose its strategy for spreading activation appropriately. Action attributes merely collect activation. They do not transfer it. So when a signal is received by an action attribute then the attribute's activation level is modified and the activation stops there. 3.3 Choosing an Action Once activation has been input and spread into the network, then the learner decides on the course of action to be taken. There are no thresholds in TEMPERAMENTAL programs. Competition between nodes is strictly the accumulation of wealth, wealth being represented by the activation level. The procedure for choosing an action is as follows: 1) Choose the action with the highest activation level. 2) If the learner is enabled with respect to this action, go to choosing an attribute. 3) Else choose the action with the next highest activation level. 4) Go to step 2. Once an action has been chosen, then the system must decide on an attribute. The procedure is as follows: 1) If the action has no attributes, perform the action. 2) Else, choose the attribute with the highest activation level. 3) If the learner is enabled with respect to this action/attribute pair then perform the action with that attribute. 4) Else, input activation into the highly activated node and spread the activation only through the cooperative and competitive connections, then go to "Choosing an action."1 1The algorithm does not call for the system to search for the next most highly activated attribute because, generally, this is not reliable. It is not unusual for only the appropriate attribute to have been activated. Therefore, choosing the next highest would be random. Spreading the energy from the cooperative and competitive brings out the appropriate response. 16 If the action requires a focus, then choose the highest activated object in the environment which the semantics say is appropriate and available. If the action requires a tool, then choose the highest activated object in the environment which the semantics say is appropriate and available. The execution of the action consists of running a procedure which affects the properties or relationships of objects in the environment. These semantics are responsible for specifying and choosing a focus if one is needed, and also any tools. Additionally, it is responsible for making any changes in the tools, focus or learner that are necessary. In execution of it's duties, the semantics are empowered to spread activation through the learner in appropriate ways, including simulating other points of view by altering which node the input bias is used on and sending signals to nodes which are not currently present in the environment. This is how the system answers "What if?" questions, and, also, can daydream. The daydreaming ability is realized by allowing activation to be input into the network from within the TEMPERAMENTAL program as well as by the environment. In principle, the TEMPERAMENTAL system can activate any arbitrary combination of nodes and see what consequences follow. This capability can be used to theorize, daydream, imagine counterfactuals, etc. Its greatest advantage is that, harnessed and used in a sophisticated manner, it could help the system learn. 4.0 SALMON A TEMPERAMENTAL system has been implemented to learn the game of tag by observing the execution of a discrete event simulation of the game. The system was implemented in Quintus Prolog version 2.0 on a SUN Sparcstation. The name of the implemented program is SALMON. The simulation outputs a game state from moment to moment which is then translated by an interface module into a predicate language that SALMON and his TEMPERAMENTAL learning mechanism can understand. The only objects in the environment are the players. The number of players varied from 4 to 7. The number of players does not affect SALMON's performance after he has learned, but during learning a high number of players gives better results. Two factors are responsible. First, more players give a statistically better sampling so we would expect better behavior. The second reason has to do with the way the simulation is programmed. If a player is being chased, it always flees. Therefore, if you have a small number of players then one is always tagged and of the remaining players at least one is always fleeing. Therefore, the sample becomes biased towards fleeing, especially the sequence connections which will record the large number of sequences that the chased player flees consecutively without the less determined sequences of the other players tempering it. After the learning period, SALMON can be inserted as a player and participate in the game. Upon insertion, the simulation simply transfers control to SALMON whenever it is time to decide what his next move will be. After SALMON has decided and executed his decision, control passes back to the simulation. Six actions were defined for the game: 17 1) Flee An untagged player flees by increasing the distance between himself and the player currently tagged. 2) Tease An untagged player teases by decreasing the distance between himself and the player currently tagged. 3) Chase The tagged player chases another player by decreasing the distance between himself and the player being chased. 4) Count After a tag has been made, the newly tagged player must stand still and count for a specified number of moments. 5) Make-tag If an untagged player is within one step of the tagged player, then the tagged player may make a tag on the untagged player. Making a tag consists of transferring the tagged attribute. 6) Stayput A player may, at any time, stand still. Three different kinds of dynamic attributes were defined which objects (players) in the environment can possess. 1) Being tagged SALMON was informed who was currently tagged by making tagged a dynamic attribute. In real life there are versions of the game of tag, such as ball tag where the it carries a ball allowing being tagged to be physically observable. We can think of this game of tag as being like ball tag. 2) Distance from the it Since the currently tagged player is the focal point of the game, SALMON was informed of the distance each player was from the it at each moment. This distance was presented as a dynamic attribute of the players. 3) Direction from the it The it was also considered the origin (0,0) of a cartesian graph, and the direction that each player was from the it was attached to objects as a dynamic attribute. The directions were (in (X Y) notation): (plus plus), (plus minus), (plus same), (minus plus), (minus minus), (minus same), (same plus), and (same minus). The action attributes were the general direction which any of the movement actions could take. They were mapped onto the direction from the it dynamic attributes with the addition of (same same) for standing still. 5.0 The Simulation The simulation was a discrete event simulation in which each player's next set of attributes was determined solely from his current set, mostly without regard to changes in any other player's attributes. The player's strategies were quite straightforward. If a player was tagged he chose another player to chase based on a function of that player's distance from him and the difference in their speeds. Chasing a player meant decreasing the difference between the pursuer's X and Y coordinates so that they converged towards the chased player's coordinates. 18 Player's who were untagged had a choice of actions: tease, flee, and stayput. Functions were also defined for these actions based on a player's current distance from the it and the difference in speed between the player and the it. This function returned a value between 0 and 1. A uniformly distributed random number was generated, and if it was below the value returned by the function then the action being considered was performed, otherwise another action was considered. The functions were defined so that faster or more distant players were more likely to tease or stand still. Slower or nearer players were likely to flee. Fleeing players were not given knowledge about how to use open space or any other sophisticated strategies. The playing field was bounded and untagged players were not allowed to go out of bounds 6.0 The Interface to the Learner SALMON was always provided with the following information about the current moment and the immediately preceding moment. A translator was responsible for asserting 1) clauses telling SALMON what objects were performing what actions, with any attributes included. 2) clauses telling SALMON what the focus of each action and object was. 3) clauses informing SALMON what the dynamic attributes possessed by the objects were. 4) clauses saying which actions were enabled and disabled with regard to each object. These clauses play the role of a rather sophisticated perceptual system. In particular, informing SALMON of the focus of actions and objects assumes a highly developed instinct. Such recognition in animals requires hardwired instincts for recognizing certain traits in the environment; brightly colored objects, sudden movement, movement in general, diminishing and increasing distances, concepts of personal space, etc. Additionally, such recognition uses inborn and developed abilities animals have to recognize what satisfies and motivates them and allows them to project it to other creatures. For instance, an animal is satisfied by food and so if it sees another animal stalking prey, killing it and eventually eating it, it is likely that it may know by projection that the killed animal was the focus of all the killer's action. This is because the actions led to a situation it desires, namely, satisfying hunger. When another animal satisfies a similar desire, then the observing animal perhaps feels a titular satisfaction also, and uses this feeling to recognize that it should imitate. These subtle clues in the environment and the internal models that allow animals to pick them out are presently beyond the ability of the simulation to provide or TEMPERAMENTAL systems to simulate. Therefore, it is necessary that focus information be provided to SALMON. The rules of the game were semantically encoded as enabling conditions on actions. The rules were 1) untagged players must stay in bounds. 19 2) only tagged players can chase, count or make-tag. 3) only untagged players can flee or tease. 4) a player may only count if the count is above one. This information amounted to permission information for the different actions. A great deal was left for SALMON to learn. In particular, it had to discover that flee meant moving away from the it, and, indeed, what it means to move away. It had to discover that tease meant moving closer to the it and what moving closer is. It had to induct the functions for deciding when to flee or tease, meaning it had to tease more often when it was far away or the it was slower and flee conversely. It had to discover what to do when it reaches a boundary and cannot perform the action it would most like to. It had to discover that it should make-tag when it is close to another player. It had to discover that it should count after being tagged. Finally, it had to learn to make the rather difficult judgement concerning who to chase when it is tagged and what direction to move in to run the player down. All in all, there was a great deal for the system to learn. Of course, an on-going goal of TEMPERAMENTAL programming as a research effort is to reduce the semantics that must be provided to the system. Performing actions consisted simply of updating the X and Y coordinates of SALMON, and his tagged attribute if necessary. 7.0 Results The results were surprisingly good. The learning session consisted of 500 moments watching the simulation in Beginner mode and 1000 watching in Expert mode. SALMON was then inserted into the game as various players. Scripts were written to files and animated. To the naked eye, SALMON appears virtually identical to the simulated players in an animation. Examination of the scripts and connections reveals a difference between SALMON's behavior and the other players. Four scripts, inserting SALMON as a different player in each, were run for 200 moments each. Each script was begun from the same point in the action. The simulation was then run from this same point without SALMON's participation. A comparison of the results of each run with SALMON to the run without SALMON turned up some interesting results. 7.1 Analysis We can list each of the items available for SALMON to learn: 1) Count after being tagged. 2) Chase after counting. 3) Choose a "reasonable" player to chase. 4) Move in the direction of the player chosen for the chase. 5) Tag players it is chasing when it gets close to them. 6) Flee when close to the it. 7) Flee more often if the it is faster than SALMON. 8) When fleeing, move away from the it. 9) Tease when further away from the it. 20 10) Tease slower players more often. 11) When teasing, move towards the it. 12) Choose an appropriate move when at the boundaries. 13) Stand still sometimes. SALMON eventually learned to do 1-12 almost flawlessly. Still, finding a training set that provided good tease behavior proved difficult. In no training set would SALMON stand still. SALMON's difficulty in picking up both these behaviors is related. Flee is performed more often by untagged players than either tease or stayput. As the disparity grows larger from flee to tease to stayput SALMON has more and more difficulty deciding when the less frequently performed actions are appropriate, and defaults to the global tendency. Especially apparent was the tendency for SALMON to continue with whatever action it was performing. Of course, the sequence connection was responsible for this tendency, especially in the case of fleeing which quite often follows itself in time. In a situation in which tease has only a small advantage over flee for the other connections, the sequence connection tended to make a huge difference in favor of flee if SALMON was already fleeing. Eventually, I was able to coax perfect tease behavior from SALMON but it took a great deal of experimentation. The final result was not caused from an ad hoc change, but rather resulted from the discovery of general faults with the earlier versions of the system and corrections to them. The changes were: 1) The criteria which determines the momentary relevance of connections must direct the spread of activation adequately. Specifically, signals from the sequence connections should not be spread unless SALMON is enabled for the giver node in the connection. It makes no sense for SALMON to consider what follows from actions that it cannot perform. 2) In the same vein, expectation and exclusive connections to an object which is SALMON's focus at the moment should not be spread unless the focus object is enabled with respect to the action in the connection. For example, if SALMON is focusing on the player that is the it, SALMON should not spread activation to the expectation that the it will flee. 3) The values in the recency table should accurately reflect the effect of recency in the environment to be learned. Improper deviation's can cause SALMON to either care too much for the recent past or disregard it totally. 4) The sequence connections should be purely excitory rather than companion. These corrections to the original algorithm are not fully general, but they are not completely ad hoc either. Rather, they are moves in the right direction that an expanded TEMPERAMENTAL model would have to improve upon. The first three problems can be considered generally as controlling the spread of activation (1 & 2) and correctly tuning the system's parameters (3). More detailed discussion of these problems is deferred until section 10. Correction 4, the redefinition of the sequence connection, occurred because the natal stage of the theory made it easy to incorrectly define connections, not from fundamental problems with the ideas behind the 21 theory. Basically, the terms need to be formalized. I did not realize that sequence connections should not be companion because I was working with a vague, intuitive idea of it. All the connections, nodes and other elements of the system need more formal, precise definitions. The weight adjusting algorithm was pathologically bad in the case of stayput. It is unusual for a player in the simulation to perform stayput for more than 1, maybe 2, consecutive moments. Expectation connections were strengthened whenever an object performed an action it was enabled to perform, and weakened whenever an object did not perform an action it was enabled to perform. Oppositely, exclusive connections were strengthened whenever an object was not performing an action that it was enabled to perform and weakened whenever an object was performing the action. Since objects generally had the same attributes (because they stood still) when they stopped performing the stayput action as they had when they started, the system strengthened the exclusive connection to stayput nearly every time it strengthened the expectation connection! With these adjustments nearly always canceling one another out, the system never really built any power to excite stayput. I would like to add however, as a caveat, that TEMPERAMENTAL programming really was not designed to pick up on patterns like stayput behavior and I never expected the system to do it. The system is designed to be able to detect and mimic caused changes, whereas the stayput behavior by the simulation's players is not only the least frequently performed, but also the most uniformly distributed action and the one involving the least amount of change. On the positive side, the system nearly always had the correct direction for whatever action it performed. It always counted after being tagged, acted reasonably at the boundaries and made the tag when it got close to a player it was chasing. What was, to me, more impressive was the good behavior it showed when it had to chase. I considered the chase decision to be the most difficult it had to make and actually expected to get something more akin to a random walk than a chase. Below is a typical excerpt from a script. In it, SALMON is player number 4, the it. The first number in the clause is just an index for the simulation to use in its memory management. The second number is the player number. The next two are the X and Y coordinates, and, finally, the value of the tagged property and action for each player. It is informative to study the script from moment to moment. I provide comments to point out the interesting facets of the chase. moment(Index, Player Number, X, Y, Property Value, Action) 1 The simulation is programmed so that the player being chased always flees, so we know immediately that SALMON must be chasing players 1, 7, or 2 since they are the only players fleeing. moment(1,4,22.1559,13.5921,tagged,chase) moment(1,1,19.0,12.5,untagged,flee) moment(1,7,17.9619,7.61374,untagged,flee) 22 moment(1,2,17.0,3.25,untagged,flee) moment(1,5,39.8124,8.29486,untagged,tease) moment(1,3,41.8862,6.32001,untagged,stayput) moment(1,6,22.8083,9.09282,untagged,tease) 2 SALMON has continued to chase (as opposed to stayput, flee, count, tease, make-tag, etc... we should remember it has to make a choice at every moment) and has decreased both its X and Y coordinates. Unfortunately, all the candidate players for being chased have lower X and Y coordinates than SALMON so this doesn't help us discover which it is chasing. moment(2,4,20.7559,12.1921,tagged,chase) moment(2,1,17.5,11.0,untagged,flee) moment(2,7,16.6619,6.31374,untagged,flee) moment(2,2,15.75,2.0,untagged,flee) moment(2,5,38.4624,9.64485,untagged,tease) moment(2,3,43.1661,5.04001,untagged,flee) moment(2,6,22.8083,9.09282,untagged,stayput) 3 SALMON has once again lowered his X and Y coordinate. Player 7 has decided to tease, so we know that it must be chasing 1 or 2. Notice that SALMON does not chase in a direction where there are no players, nor does it chase in a direction where the closest player is far away, like player 5 in the +X, +Y direction. moment(1,4,19.3558,10.7921,tagged,chase) moment(1,1,16.0,9.5,untagged,flee) moment(1,7,17.9619,7.61373,untagged,tease) moment(1,2,15.75,3.25,untagged,flee) moment(1,5,37.1124,10.9948,untagged,tease) moment(1,3,41.8861,6.32001,untagged,tease) moment(1,6,21.2582,10.6428,untagged,tease) 4 SALMON is consistent. It continues on its course and does not take a random walk around the playing field. moment(2,4,17.9558,9.39204,tagged,chase) moment(2,1,16.0,8.0,untagged,flee) moment(2,7,16.6619,6.31373,untagged,flee) moment(2,2,15.75,2.0,untagged,flee) moment(2,5,35.7623,9.64484,untagged,tease) moment(2,3,40.6061,7.60001,untagged,tease) moment(2,6,22.8082,9.0928,untagged,flee) 23 5 SALMON appears to be closing in. moment(1,4,16.5558,7.99204,tagged,chase) moment(1,1,16.0,6.5,untagged,flee) moment(1,7,15.3619,5.01373,untagged,flee) moment(1,2,15.75,3.25,untagged,flee) moment(1,5,34.4123,8.29483,untagged,tease) moment(1,3,39.326,8.88,untagged,tease) moment(1,6,24.3582,7.5428,untagged,flee) 6 It will be a difficult choice who to finish off among the three players all bunched up together. moment(2,4,15.1558,6.59204,tagged,chase) moment(2,1,16.0,5.0,untagged,flee) moment(2,7,15.3619,3.71373,untagged,flee) moment(2,2,15.75,2.0,untagged,flee) moment(2,5,33.0623,6.94483,untagged,tease) moment(2,3,38.046,7.60001,untagged,tease) moment(2,6,25.9082,5.9928,untagged,flee) 7 SALMON inexplicably raises its Y coordinate, perhaps distracted by player's 3 and 5, two of the slower players, both in that direction. A simulation player would never do this, so SALMON has not learned perfectly. Actually, I considered this good. I wanted heuristic learning methods that gave good behavior but not perfect behavior. I don't know any perfect biological learners. moment(1,4,16.5558,7.99203,tagged,chase) moment(1,1,17.5,3.5,untagged,flee) moment(1,7,16.6618,2.41373,untagged,flee) moment(1,2,15.75,2.0,untagged,stayput) moment(1,5,31.7122,5.59483,untagged,tease) moment(1,3,39.326,8.88,untagged,flee) moment(1,6,27.4582,4.44279,untagged,flee) 8 It seems back on track. SALMON has moved +X, -Y, so obviously it is going after player 7. Player 1 is also in that direction but it is teasing so can't be the player being chased. Player 1 is pretty fast, but player 7 is slow. The tease by player 1 should tempt SALMON. moment(2,4,17.9557,6.59203,tagged,chase) moment(2,1,16.0,5.0,untagged,tease) moment(2,7,17.9618,1.11373,untagged,flee) moment(2,2,15.75,2.0,untagged,stayput) moment(2,5,33.0622,4.24483,untagged,flee) moment(2,3,38.046,7.60001,untagged,tease) moment(2,6,29.0082,2.8928,untagged,flee) 24 9 SALMON is continuing after player 7. This is a good choice because player 1 is faster than SALMON. moment(1,4,19.3557,5.19203,tagged,chase) moment(1,1,16.0,3.5,untagged,flee) moment(1,7,19.2618,1.11373,untagged,flee) moment(1,2,15.75,3.25,untagged,flee) moment(1,5,31.7122,5.59483,untagged,tease) moment(1,3,36.7659,6.32001,untagged,tease) moment(1,6,30.5582,1.3428,untagged,flee) 10 This is a tough situation for SALMON. The player it has been chasing, player 7, is in the (plus minus) direction, but two players in the (minus minus) direction appear to be only a step or two away from being caught also. I am not sure what the simulation would do in this circumstance. moment(2,4,17.9557,3.79203,tagged,chase) moment(2,1,16.0,2.0,untagged,flee) moment(2,7,17.9618,1.11373,untagged,flee) moment(2,2,15.75,2.0,untagged,flee) moment(2,5,30.3622,4.24483,untagged,tease) moment(2,3,35.4859,5.04001,untagged,tease) moment(2,6,32.1082,1.3428,untagged,flee) 11 SALMON goes after the two players and appears ready to finish off the chase, decreasing its own X and Y coordinates for the second moment in a row. I wonder if its decision here was affected by the grouping? It appears close enough to tag either player 1 or player 2. moment(1,4,16.5557,2.39203,tagged,chase) moment(1,1,16.0,3.5,untagged,flee) moment(1,7,19.2618,1.11373,untagged,flee) moment(1,2,15.75,3.25,untagged,flee) moment(1,5,31.7122,5.59483,untagged,flee) moment(1,3,34.2059,3.76001,untagged,tease) moment(1,6,33.6581,1.3428,untagged,flee) 12 SALMON decides to make the tag. Remember, the semantics do not force it to do so; the system was free to continue chasing indefinitely. moment(2,2,15.75,3.25,tagged,count) moment(2,4,16.5557,2.39203,untagged,make_tag) moment(2,1,16.0,5.0,untagged,flee) moment(2,7,20.5618,1.11373,untagged,flee) moment(2,5,30.3622,4.24483,untagged,tease) moment(2,3,32.9258,2.48001,untagged,tease) moment(2,6,32.1081,2.8928,untagged,tease) Sequences such as this are typical and impressive. SALMON was quite consistent in producing this kind of behavior when tagged. When 25 untagged, it also always chose a proper direction, the one real problem, as mentioned earlier, is that it never stood still. 8.0 Issues Raised As it stands, SALMON and TEMPERAMENTAL programming cannot make any strong claims about learning and cognition. Their most useful contribution is as an exercise for determining just what abilities a hybrid system built along these lines might have, what limitations, and what problems need to be addressed. This section is a discussion along those lines. SALMON shares one limitation with human beings that is rather interesting. As it tries to think'' about more than one issue simultaneously its performance on all the issues deteriorates. This is a consequence of using a bias on the input to simulate a point of view. If only one bias is used SALMON performs very well. One bias is like asking it What is object X (where X is the object receiving the bias) going to do next?'' But if you wish to ask it to consider two or more situations simultaneously then the ordering of activation levels becomes less reliable. The highest activated objects and actions sometimes may be appropriately paired together, but finding the appropriate pairing for the second highest activated object and action is more difficult. It is not a simple mapping. Since the brain is highly parallel I have always thought it was mysterious that we must focus so often on a single issue in order to think clearly about it. The answer to how a limitation like this can arise in a parallel network is not obvious. Therefore, I find it interesting that this kind of concentration'' which also is a property of animal intelligence should emerge from the highly parallel TEMPERAMENTAL algorithm. Neurons have been characterized simplistically as having all or nothing firing behaviors. Accordingly, many connectionist nets use thresholds and have nodes with all or nothing firing behavior. It is significant that TEMPERAMENTAL systems diverge from this behavior. The divergence can be defended on two grounds, one from the point of view of connectionist systems and the other from the point of view of semantic systems 1) As a hybrid system, SALMON's nodes represent complex concepts. These concepts would likely be represented in the brain by many millions of neurons and not a single neuron. With any given input, some neurons in this representation will fire as activation spreads, even though the whole neural ensemble encoding the action or object may not. The various biases in the TEMPERAMENTAL system, the confidence function and connection weight, can be thought of as representing the proportion of neurons in the ensemble representing the giving concept that will fire into the receiving concept given a certain level of activation passed into the whole ensemble1. The 1 Of course, I am not saying that the actual strength is captured by the weights on the connection. All I mean to imply is that, in the brain, any ensemble of neurons will have constituents 26 highest activation levels at the end then represent the ensemble which had the highest number of individual neural firings during computation. 2) On mentalistic grounds, spreading activation through a transfer function can be defended on the basis of communication between concepts. The TEMPERAMENTAL system must reason through time, and therefore must take into account the future. Let us imagine that action nodes had thresholds and the firing of a node represented a decision to perform the action. In this case, the action could not have input from the future about whether or not it should fire until after the decision was made! Admittedly, a more complicated interpretation could be put on the firing of a node, but that would also require a more complicated algorithm for spreading activation. It is better to use a transfer function and have nodes "fire" only once, when computation is finished. Philosophically, SALMON gains expectations from the simplest possible induction over experience. It is interesting that a calculation as simple as the successes/opportunities ratio is enough to make the correct induction when magnified and balanced in a network, even in SALMON's simplified world. Additionally, perfect correlations lead to very strong expectations. SALMON's highest action was often count regardless of the context. Only the semantic enabling conditions kept it from always counting1. This is because the count action had a perfect expectation. It was always performed whenever it was enabled. This seems interestingly like how we might take for granted propositions like, "The sun will rise tomorrow receiving signals at any given time. Each of these individual neurons participates in other ensembles and some will fire, sending signals here and others over there. Some pairs of ensembles share more interconnections between constituent neurons, and share more neurons, than others. We can think of the TEMPERAMENTAL weights as vaguely representing the degree to which neurons are interconnected or outright shared between concepts. Therefore, the entire concept does not have to "fire" to signal another concept, but only has to pass along some proportion of the signals that its constituent neurons send out from the signals that they receive. This is what the transfer function does and explains why there are no thresholds on nodes. 1 Because so much emphasis has been put on explaining the learning components of the system in this thesis, it may be tempting to think of SALMON's semantic information as a kind of cheating, but that is inappropriate. TEMPERAMENTAL programs are, by design, hybrid systems with the semantic components in no way taking a back seat to the knowledge contained in the connections. The whole point is that they can work together to do things neither could do alone, or could only do with much more difficulty. It would be just as easy to think of the connectionist components as cheating to make up for what the semantics cannot do. 27 morning." SALMON would not think of questioning the truth of that statement, and neither, usually, do we. 8.1 Problems Exhibited by SALMON Besides the grouping effect already written about, the model exhibited several other troublesome tendencies. For instance, there are the spurious connections SALMON cannot discover simply by having them degrade. For example, whenever an untagged player performs a flee or tease action, a connection is made not only to the player but also to the dynamic attributes possessed. One of the dynamic attributes the player possesses will be the direction he is from the it. So, an expectation connection is made, say, that looks like this direction(minusplus) ------> flee meaning that if an object has the direction(minusplus) attribute than it should be expected to flee. This is spurious. However, currently there is no way SALMON could discover this. The problem is that, on the whole, players perform the flee action nearly 60% of the time they are enabled to. The frequency function on the weight is then nearly 60%. The only way SALMON could catch this spurious connection would be if he had a degradation level above 60%, and such a high level would simply eliminate too many good connections. Therefore, SALMON must use the relevancy test to discover these connections. However, I don't feel the current relevancy test is very general and TEMPERAMENTAL systems need more sophisticated (and local) ways of discovering these kinds of connections. Of course, this really points out a much broader problem. Namely, there is a great art to defining a dynamic attribute. For instance, if we had other objects like trees or fire hydrants or sidewalks or bushes in the playing field they would also have directions and distances relative to the it. Still, we would not tell SALMON this because it would throw off all its connection weights. The programmer must make this choice for SALMON and the choice is really on pragmatic grounds and not principle. SALMON should be able to tell when an attribute is relevant for an object without being spoon-fed good descriptions of the phenomena. Just as SALMON cannot tell if an attribute is irrelevant despite its high correlation to an action, he cannot tell if an attribute is not even worth noticing. The result of this inability to distinguish correctly between relevant and irrelevant attributes independently of connection strength can be a bias in SALMON to perform the overall action tendencies of the simulation at the expense of context. In plain English, this means SALMON never stood still. Another problem is that there are many parameters in the model and the quality of the connections is very sensitive to some of them. Particularly, the parameter which sets the lowest weight allowed on a connection before breaking it is crucial. If it is too high nonsense behavior can result. If it is too low the number of connections increases rapidly and learning and decision making slow appreciably. I have already mentioned that the system is sensitive to the weights in the recency table. If these weights do not reflect the effect of recency in the actual environment then SALMON will not learn as effectively. In principle, the model is not too sensitive to 28 the strength of the input signals, as long as the bias is 3 or greater. However, when input is too large the activation level of the nodes goes to infinity very quickly. On the positive side, ultimately it may not be necessary for the programmer to hand-tune the parameters. When the TEMPERAMENTAL theory is extended to include feedback from the environment it is likely that the system will be able to tune its own parameters. Also, even though activation levels become very high with strong input signals, the activation levels on SALMON's nodes were usually irreversibly ordered long before signals were fully spent. In general, the result obtained after 500 or so signals had been passed produced an ordering that did not change if signals were allowed to continue propagating until completely spent. If something like this is generally true of TEMPERAMENTAL systems, and I think it likely is, then the system can stop computation at this signaling limit and not necessarily when signals are spent. Since the limit is low relative to the number of signals that would be passed if all were allowed to expire, the system can probably avoid activation levels spiraling towards infinity even if the input signals are fairly strong. Most likely, the number of signals that need to be sent is proportional in some way to the number of relevant connections involved in the computation. Finally, the nodes in the model compete with each other primarily indirectly, by gathering connections. I feel there must be a place in the learning for more enemy or even purely inhibitory connections. The problem is how to make them. We can intuitively see that some actions compete with one another even though they do not disable one another. For instance, the flee and tease actions in a game of tag somehow compete'' in the sense that they are exclusive options in a single scenario. But it is not at all obvious to me how to provide a general heuristic for establishing these enemy connections and, if necessary, adjusting their weights. 8.2 Areas to develop Each problem listed above suggests an area of the model which needs more development. In addition, there are areas not even addressed by the model that should be tackled. The problem of discovering spurious connections could be alleviated considerably if SALMON could learn by experimenting as well as observing. SALMON has the ability, as he stands, to make predictions about any aspect of his environment. There should be a way to make him predict, check his results against what actually happened and then try to discover the culprit(s) responsible for any incorrect predictions. Of course, the problem of how to incorporate environmental feedback in learning is universal in both semantic and connectionist systems. SALMON's problems arise not only from bad connections, but also from a relatively naive way of spreading activation. He could possibly be provided with introspective mechanisms for thinking about his connections, their meaning, and then noticing and correcting deficiencies in how activation is spread through them. Along these lines, an activation level could be provided to each node for each kind of connection. Thus, it would have a focus activation, a performance activation and a using activation. Multiple activation 29 levels would provide a smoother interface to the semantics when more complicated actions have to be performed. The drawback would be that signals between nodes would have to carry more than just numerical information. In general, the nodes could move towards being something like frames with slots for each type of connection. Each slot has multiple default values, the connections attached to them, and the weight on the connection represents the strength of the default assumption by the system. Activation is spread as in a normal connectionist network and the slots become filled with the highest activated nodes connected to them. SALMON should be able to categorize attributes and objects. Ideally, it should connect up attributes to objects in such a way that the psychological prototypes investigated by Rosch (1978) emerge. This would require that a TEMPERAMENTAL system be able to establish and retract nodes as well as connections. These nodes would then have to play some role in the reasoning process, perhaps conducting or filtering activation for their members the way actions direct the choosing of attributes. Also, frequency tabulations on the category nodes can help in deciding if a strong connection is, nevertheless, irrelevant. TEMPERAMENTAL systems should have goals and priorities that bias certain actions and direct attention, and hence input. The goals should compete with one another and establish perspectives in the system. They should help in organizing the categories discussed in the previous paragraph and the organization should support planning. Incorporating goals will probably require meta-connections (connections to connections) and defining types'' for groups of connections that can be discovered and classified. The system could then activate the type'' just as if it were another node, thus bringing into prominence the connections associated with it. Just as semantic procedures are associated with action nodes, learned parameters would be associated with meta-connection nodes and become dominant when the node is active. This is all still very speculative, though. The game of tag would be just one environment out of many a full TEMPERAMENTAL system experienced. Upon entering or considering an environment, the system would need to "crystallize" the appropriate set of responses. The effect would be analogous to calling up a script, but the process would be analogous to having a fluid super-saturated by many elements and knowing the proper way of stirring up the solution so that just the proper one falls out. This may be as difficult as it sounds but meta-connections may be the key. SALMON uses its dynamic attributes to react inside its environment, but it does not have any idea that the attributes themselves are systematically related. For instance, the attributes distance(1,3) and distance(3,5) are just two nodes to SALMON. It has no concept of before or after for them, and consequently could never purposefully move from distance(1,3) to distance(3,5). Heuristics for organizing spatial and other attributes systematically through connections need to be developed. Overall, SALMON was a learning experience, not a technological or theoretical advance. As it stands, neither SALMON nor the 30 TEMPERAMENTAL method are either useful nor theoretically significant. However, there is also much promise for the system. The success of SALMON in his domain, although not perfect, was encouraging. The problems that occurred are problems that were anticipated, and I do not think their solutions entail significant revisions to the model as it stands but rather an expansion of its capabilities into a larger domain of mentality. References Agha, Gul A. (1988) Actors:A model of Concurrent Computation in Distributed Systems. The MIT Press, Cambridge, Mass. Arbib, Michael A. (1989) The Metaphorical Brain 2. John Wiley and Sons, New York. Brachman, Ronald J. (1985) '"I Lied About the Trees", Or, Defaults and Definitions in Knowledge Representation.' The AI Magazine, 6.3, Fall: pp. 80-93. Carpenter, Gail A. and Grossberg, Stephen. (1987) 'Invariant Pattern Recognition and Recall By an Attentive Self-Organizing ART Architecture in a Nonstationary World.' Proceedings of the IEEE International Conference on Neural Networks. Vol. 1.: pp. 736- 742. Carpenter, Gail A.; Grossberg, Stephen. (1987) 'ART 2: Self- Organization of Stable Category Recoginition Codes For Analog Input Patterns.' Proceedings of the IEEE International Conference on Neural Networks. Vol. 1.: pp. 727 - 735. Churchland, Patricia. (1989) Neurophilosophy. The MIT Press, Cambridge, MA. Davis, Lawrence and Steenstrup, Martha. (1987) Genentic Algorithms and Simulated Annealing: An Overview.'' Genetic Algorithms and Simulated Annealing. Morgan Kaufmann Publishers, Inc., Los Altos, CA.: pp. 1-11. Holland, J.H. (1975) Adaptation in Natural and Artifical Systems. University of Michigan Press, Ann Arbor, MI. Hofstadter, Douglas. (1983) The Architecture of Jumbo.'' Proceeding of the 2nd Machine Learning Workshop: pp. 161-170. Johnson, R. Colin, (1988) Cognizers. Chapel Brown, New York. Lee, Y.C., ed. (1988) Evolution, Learning, and Cognition. World Scientific, New Jersey. Minsky, M. (1975) A Framework for Representing Knowledge.'' The Psychology of Computer Vision. P.H. Winston (ed.) McGraw Hill, New York: 211-277. 31 32 Minsky, M. (1986) The Society of Mind. Simon & Schuster, New York. Reeke, George N, Sporns, Olaf, and Edelman, Gerald. (1989) Synthetic Neural Modeling: Comparisons of Population and Connectionist Approaches.'' Connectionism in Perspective. Pfeifer, R; et al (eds). North- Holland, New York: pp 113-139. Rosch, Eleanor and Lloyd, Barbara. (1978) Cognition and Categorization. Lawrence Erlbaum Associates, Publishers, New York. Rumelhart, D.E., McClelland, J.L. and the PDP Research Group. (1986) Parallel Distributed Processing. Vol. 1: Foundations. MIT Press, Cambridge, MA. Schmidhuber, Jrgen. (1988) The Neural Bucket Brigade'' Connectionism in Perspective. Pfeifer, R; et al (eds). North- Holland, New York: pp 429-437. Von Seelen, W., Shaw, G. and Leinhos, U.M. (eds.) (1988) Organization Of Neural Networks. VCH, Federal Republic of Germany.