
  
3.3.1 Operator Probability Trees

Operator probability trees are used to estimate the probability of success for actions an actor may attempt. Accurate estimates improve individual performance because each actor's planner uses these probabilities to guide search, including role-binding selections. Operator probability trees are updated incrementally as the actor interacts with the domain, which enables the planner to produce higher-quality plans during the course of solving a problem.

The successes and failures of attempted actions are stored in a COBWEB [20] tree, associated with all of the observable features of the various role fillers for the action. We assume that actions may fail for reasons not explicitly considered (cf. the qualification problem [54]), and COBWEB can handle such noisy data. Also, COBWEB trees can be updated incrementally, which allows the actors to learn during the course of their activity.
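To make the data flow concrete, the sketch below shows one way such execution records could be represented and incorporated incrementally. The names (Observation, OperatorProbabilityTree, incorporate) are illustrative rather than taken from MOVERS-WORLD, and a real COBWEB tree would also reorganize its hierarchy and update feature-value counts as each observation arrives.

    from dataclasses import dataclass

    @dataclass
    class Observation:
        operator: str      # e.g. "LIFT"
        features: dict     # observable features of the role fillers (height, width, ...)
        success: bool      # outcome of the attempted action

    class OperatorProbabilityTree:
        """Stand-in for the COBWEB tree associated with one operator."""
        def __init__(self):
            self.observations = []

        def incorporate(self, obs):
            # A real COBWEB tree would classify the observation down the hierarchy
            # and update counts at every node it passes; storing the raw record
            # is enough for this sketch.
            self.observations.append(obs)

    lift_tree = OperatorProbabilityTree()
    lift_tree.incorporate(Observation("LIFT", {"height": 2, "width": 2, "depth": 2}, False))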

The planner estimates success probability by examining past experience, both specifically related experience (the same action with identical role fillers) and more abstractly related experience (actions and/or role fillers that are similar). For example, deciding whether to lift a box alone is specifically related to whether L1 has attempted to lift that box before. It is more abstractly related to L1 previously attempting to unload that box or to lift a box of similar dimensions. If only one of the two types of experience is available, the probability is determined from it. If neither is available, a default probability is used. If both are available, the computed probability depends on weighting factors that reflect how much specific versus general experience applies. Further details can be found in [23].
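As a rough illustration of this decision logic, the sketch below combines the two kinds of experience with a single weighting parameter. The actual weighting factors used in MOVERS-WORLD are those detailed in [23]; estimate_success and specific_weight are hypothetical names introduced here.

    DEFAULT_PROBABILITY = 0.5   # used when no relevant experience exists

    def estimate_success(specific_p=None, general_p=None, specific_weight=0.5):
        """Combine specific and general experience into one success estimate.

        specific_p / general_p are None when that kind of experience is missing;
        specific_weight stands in for the weighting factors described in [23].
        """
        if specific_p is None and general_p is None:
            return DEFAULT_PROBABILITY        # no experience: fall back to the default
        if specific_p is None:
            return general_p                  # only general experience is available
        if general_p is None:
            return specific_p                 # only specific experience is available
        return specific_weight * specific_p + (1 - specific_weight) * general_p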

To get a feel for why operator probabilities are needed, consider a common decision a lifter must make. A lifter can lift some boxes alone, but not all of them. If a lifter can lift a box alone, it is more efficient to do so because she does not have to spend time asking for assistance (and there is no guarantee the assistance will be given). On the other hand, if there is little chance that the lifter can handle the box on her own, it is a waste of time (and energy) to make the attempt. So, individuals should be able to recognize which of these two possibilities is more likely and act accordingly.

Initially, MOVERS-WORLD actors use default operator probabilities of 50%. This means an inexperienced lifter will prefer to try to lift boxes alone, since a plan to do so is shorter (no communication is needed) and thus more likely to succeed. So, at the beginning of a problem, lifter L1 could try to pick up a large box LBOX1 by herself. L1 would fail; the next time she wants to lift LBOX1, L1 will decide to ask another actor for help. This decision is based upon L1's specific past experience interacting with that particular box.
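One way to see why the shorter plan wins under these defaults is to treat a plan's success probability as the product of its steps' probabilities. The sketch below assumes that combination rule and a made-up three-step ask-for-help plan, so the particular numbers are illustrative only.

    def plan_success_probability(step_probabilities):
        """Treat a plan as succeeding only if every one of its steps succeeds."""
        p = 1.0
        for step_p in step_probabilities:
            p *= step_p
        return p

    # Inexperienced lifter: every operator defaults to 0.5.
    lift_alone   = plan_success_probability([0.5])             # single lift step
    ask_for_help = plan_success_probability([0.5, 0.5, 0.5])   # e.g. request, agreement, joint lift

    print(lift_alone, ask_for_help)   # 0.5 vs. 0.125: the shorter plan looks better at first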

Unless there is a mechanism for L1 to generalize her past experience, L1 will repeatedly attempt (and fail) to lift other large boxes alone, wasting time and effort. Operator probability trees prevent this from occurring. If the observable characteristics (e.g., height, width, depth, texture) of another box LBOX2 exactly match those of LBOX1, L1 will not attempt to lift LBOX2 alone, since the general past experience in this case includes the failed attempt to lift LBOX1. If the features do not match exactly and the tree stores other experiences, the general past experience may or may not be based upon LBOX1, depending on whether the COBWEB classification algorithm judges that experience's feature set to be the best match for the current one.
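COBWEB itself chooses where an observation belongs by maximizing category utility. As a much-simplified stand-in for that classification step, the sketch below just retrieves the stored experience whose observable features overlap most with the current box; feature_overlap and most_similar_experience are hypothetical helpers, not part of MOVERS-WORLD.

    def feature_overlap(a, b):
        """Count the features on which two feature sets agree exactly."""
        return sum(1 for key, value in a.items() if b.get(key) == value)

    def most_similar_experience(current_features, experiences):
        """Return the (features, success) record closest to the current box."""
        if not experiences:
            return None
        return max(experiences, key=lambda exp: feature_overlap(current_features, exp[0]))

    past = [({"height": 3, "width": 2, "depth": 2, "texture": "smooth"}, False)]   # failed LBOX1 lift
    lbox2 = {"height": 3, "width": 2, "depth": 2, "texture": "smooth"}
    print(most_similar_experience(lbox2, past))   # exact match: the LBOX1 failure is retrieved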

  

[tree diagram]
Figure 2: Operator probability tree.

See Figure 2 for a sample tree that can be used to predict the probability of success for an actor who is considering attempting to lift a 2 by 2 by 2 box by herself. The three features shown in the figure are object height, width and depth. In a given node, each feature value is paired with the number of observations in the sub-tree rooted at that node that have the same feature value. For example, depth: (1 . 5)(3 . 1) means that five observations had depth 1 and one observation had depth 3. For the data underlying this tree, the actor cannot lift a box alone if any dimension of the box is 3 or if all the dimensions are 2.
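Reading the figure's notation literally, each node could be represented as per-feature tables of (value, count) pairs together with counts of successes and failures among the observations stored beneath it. The sketch below is only that reading, with made-up outcomes, not the actual MOVERS-WORLD data structure.

    from collections import defaultdict

    class Node:
        """One tree node: per-feature value counts plus outcome counts."""
        def __init__(self):
            self.feature_counts = defaultdict(lambda: defaultdict(int))  # feature -> value -> count
            self.successes = 0
            self.failures = 0

        def add(self, features, success):
            for feature, value in features.items():
                self.feature_counts[feature][value] += 1
            if success:
                self.successes += 1
            else:
                self.failures += 1

    node = Node()
    for depth, outcome in [(1, True), (1, True), (1, True), (1, False), (1, False), (3, False)]:
        node.add({"depth": depth}, outcome)       # outcomes here are made up for illustration
    print(dict(node.feature_counts["depth"]))     # {1: 5, 3: 1}, i.e. "depth: (1 . 5)(3 . 1)"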

The success probability based upon the general experience in the lower right-hand node shown in Figure 2 is 16.7%. (For comparison, if the node had contained 1, 3, 4, or 5 unsuccessful attempts, the probability would have been 25.0%, 10.0%, 5.6%, or 2.9%, respectively.) In the absence of specific experience, 16.7% would be the overall operator probability as well. In this particular case, the overall probability would remain 16.7% if there were a single specific failure, but not if there were more than one.
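As a reader's observation only, the percentages quoted above are consistent with an estimate of the form 1/(2 + 2^n) for a node holding n failures and no successes; the actual calculation is the one specified in [23] and may well differ. The snippet below merely reproduces the quoted figures under that assumption.

    # Reader's observation, not the documented MOVERS-WORLD estimator.
    for n_failures in range(1, 6):
        probability = 1.0 / (2 + 2 ** n_failures)
        print(f"{n_failures} failures -> {100 * probability:.1f}%")   # 25.0, 16.7, 10.0, 5.6, 2.9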

DAEDALUS [47] included a learning mechanism that also relied upon probabilities and COBWEB. However, differences exist between DAEDALUS' use of probabilities and the learning techniques described here. Given a state description, DAEDALUS selects the operator (and partial role bindings) that are most likely to be the correct choice based upon past planning histories. In this way, DAEDALUS learns to produce a single plan in more situations; it does not produce more than one plan or provide a mechanism for comparing plans. By contrast, MW actors can compare alternate plans based upon their respective likelihood of runtime success. In MW, actors store past execution histories in order to map actions to the probability of successfully executing them and can combine the operator probabilities into a measure of the probability of runtime success for an entire plan.

There is also a relationship between learning operator probabilities and reinforcement learning techniques [42]. Although reinforcement learning also allows an actor to act more capably by learning from the outcomes of run-time actions, there are major differences in representation and in the type of action the learning supports. Reinforcement learning allows an actor to learn the best single next action to take given that she is in a particular (pre-defined) state, and consequently does not directly support learning better long-range planning. Further, how an open world containing multiple actors could pre-define all of the possible states is an unsolved research problem. Nonetheless, reinforcement learning has been successfully applied in communication-free settings [53,68,67] and has even been used to learn a very simple communication protocol [76]. Another approach to learning to control reactive behaviors can be found in [69].

