Planning and Acting with CM

We propose a structure for CM composed of two parts. The first is a casebase of previous problem-solving experiences, on which the agents can rely in lieu of their planner. The second is a tree structure used to estimate the probability of success for each operator (the probabilities are initialized to 50%).
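
To make this concrete, the following is a minimal Python sketch of such a two-part CM. All names are hypothetical, and the probability tree is simplified to a flat table of per-operator counts:

  # Minimal sketch of the two-part CM described above; class and method
  # names are hypothetical, and the probability tree is simplified to a
  # flat table mapping operator names to (successes, attempts) counts.
  class CollectiveMemory:
      def __init__(self):
          self.casebase = []   # previous problem-solving experiences
          self.op_stats = {}   # operator name -> (successes, attempts)

      def success_probability(self, op_name):
          # With no recorded attempts, fall back on the initial 50% prior.
          successes, attempts = self.op_stats.get(op_name, (0, 0))
          return successes / attempts if attempts else 0.5

      def record_outcome(self, op_name, succeeded):
          # Update the execution-time statistics for one operator.
          s, a = self.op_stats.get(op_name, (0, 0))
          self.op_stats[op_name] = (s + int(succeeded), a + 1)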

When using this structure of CM for planning (step 3a above), the agents consult the casebase of procedural knowledge in CM whenever their top-level goals change or their task environment changes significantly (e.g., when they change locations). When there is no such change, or when no relevant plan is found in the casebase, the agent either adapts the current plan or plans from scratch.
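
A rough sketch of this control flow, in which all helpers (goals_changed, retrieve_relevant_plan, adapt, plan_from_scratch) are hypothetical stand-ins for the behavior described above:

  # Sketch of the planning decision in step 3a; helper names are
  # hypothetical. When to adapt versus replan is an assumption here:
  # we adapt whenever a current plan exists.
  def next_plan(agent, cm):
      if agent.goals_changed() or agent.environment_changed_significantly():
          case = cm.retrieve_relevant_plan(agent.current_goals, agent.state)
          if case is not None:
              return case                  # reuse a stored experience
      # No triggering change, or no relevant case in the casebase.
      if agent.current_plan is not None:
          return agent.adapt(agent.current_plan)
      return agent.plan_from_scratch(cm)   # guided by operator probabilities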

When planning from scratch, the agent uses the operator probabilities in CM to guide it through the search space in several ways:

  1. Individual agents use a hierarchical, best-first, adaptive planner that sorts candidate plans by their likelihood of success; the probability of success for a plan composed of several operators is the product of those operators' probabilities. (All three uses of the probabilities are illustrated in the sketch following this list.)
  2. Operator success probabilities are used to guide role-binding selection, i.e., the planner selects the role bindings that seem most likely to lead to a successful operator. More specifically, the role bindings are chosen via a separate best-first search, ordered by:

    P(planner satisfies preconditions) * P(agent can execute the operator)

    The former probabilities are difficult to estimate precisely; currently they are non-unity only for preconditions that require interaction with other agents.

  3. Probabilities are also used to constrain the planner's search space. Each agent has a frustration level (or progress estimator), and the planner discards plans that are less likely to succeed than the threshold associated with that frustration level. Thus, as she becomes more frustrated, an agent becomes willing to undertake plans that are less likely to succeed.
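
The sketch below illustrates all three uses of the operator probabilities; the Plan and binding objects, the threshold_for mapping, and the candidate generators are hypothetical scaffolding, not the authors' implementation:

  import heapq
  from itertools import count
  from math import prod

  def plan_success_probability(plan, cm):
      # Item 1: a plan's probability of success is the product of the
      # success probabilities of its component operators.
      return prod(cm.success_probability(op) for op in plan.operators)

  def role_binding_score(binding, cm):
      # Item 2: rank candidate role bindings by
      # P(planner satisfies preconditions) * P(agent can execute operator).
      return binding.p_preconditions * cm.success_probability(binding.operator)

  def select_plan(agent, cm, candidate_plans):
      # Item 3: discard plans below the threshold set by the agent's
      # frustration level; higher frustration lowers the threshold, so
      # less promising plans become acceptable as frustration grows.
      threshold = agent.threshold_for(agent.frustration)
      queue, tie = [], count()
      for plan in candidate_plans:
          p = plan_success_probability(plan, cm)
          if p >= threshold:
              # Negate p so the most promising plan pops first.
              heapq.heappush(queue, (-p, next(tie), plan))
      return heapq.heappop(queue)[2] if queue else None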

We consider the use of operator probabilities to be a form of CM, yet we do not consider as such the traditional methods of improving planner performance, e.g., those used in SOAR [Laird, Rosenbloom, & Newell 1986], PRODIGY [Minton et al. 1989], and DAEDALUS [Langley & Allen 1991]. This is because we are encoding execution-time knowledge, not planner knowledge: the output of the planner can differ even for identical state and problem descriptions.


