WP4: Planning of action, sensing and learning

A self-extending cognitive system that must act in environments marked by uncertainty, change
and incomplete knowledge requires representations of both the state of the world and its own
internal state, and must be able to plan actions based on this knowledge. These actions may change
the external state, but they may also be sensing actions (including dialogue acts such as asking
questions) or algorithmic actions such as running a vision algorithm on an image. The latter two,
which we will refer to collectively as information-gathering actions, change only the internal state
of the system and can therefore be treated together.
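
To make the distinction concrete, the following minimal sketch (our own illustrative Python, not part of any CogX component; all class and parameter names are hypothetical) separates physical actions, which change the external state, from sensing and algorithmic actions, which share a single information-gathering interface mapping an internal state to an updated internal state:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Belief:
    """The system's internal state: the facts it currently holds."""
    facts: frozenset = frozenset()

class Action(ABC):
    """Any action the planner may place in a plan."""

class PhysicalAction(Action):
    """Changes the external world state, e.g. grasping the cornflakes box."""
    def __init__(self, name: str):
        self.name = name

class InformationGatheringAction(Action):
    """Sensing and algorithmic actions share this interface: they leave
    the external world untouched and only update the system's beliefs."""
    @abstractmethod
    def execute(self, belief: Belief) -> Belief:
        """Return the updated internal state; the world is unchanged."""

class AskQuestion(InformationGatheringAction):
    """A sensing action realised as a dialogue act."""
    def __init__(self, question: str, answer_source):
        self.question = question
        self.answer_source = answer_source  # callable standing in for a dialogue channel
    def execute(self, belief: Belief) -> Belief:
        return Belief(belief.facts | {self.answer_source(self.question)})

class RunVisionAlgorithm(InformationGatheringAction):
    """An algorithmic action: run a detector over an already-captured image."""
    def __init__(self, detector, image):
        self.detector, self.image = detector, image
    def execute(self, belief: Belief) -> Belief:
        return Belief(belief.facts | set(self.detector(self.image)))
```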

There are a number of competing requirements for the planning component of CogX. It must be
able to build plans that include both physical and information-gathering actions. These actions
may be stochastic or non-deterministic, and the system must reason about their possible outcomes.
It must also plan in a world that is not known with certainty (otherwise information-gathering
would be unnecessary). Finally, the system must make decisions quickly. To meet these requirements
we propose a system that can switch between a fast continual planner and a more computationally
expensive decision-theoretic planner. In the “get the cornflakes” scenario a symbolic planner
operating over epistemic states can be used if the information needed to carry out the plan is
available, or if it is easily obtainable, for example by asking questions. If the required information
is not easily available, an efficient plan will require reasoning about the possible outcomes of
information-gathering actions when deciding what to do. To build this switching planning system
we will have to extend the state of the art in both classical planning over epistemic states and
decision-theoretic planning, and to develop a reasoning system based on bounded rationality to
determine which planner to use.
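
The sketch below shows how such a switch might be arbitrated. It is a sketch under assumed interfaces: the two planner callables, the information-acquisition cost estimate and the threshold are placeholders of ours, not CogX components.

```python
from typing import Callable, Optional, Sequence

Plan = Sequence[str]  # a plan is just an ordered list of action names here

def select_planner(
    continual_plan: Callable[[], Optional[Plan]],
    info_acquisition_cost: Callable[[], float],
    decision_theoretic_plan: Callable[[], Plan],
    cost_threshold: float = 1.0,  # illustrative threshold, an assumption
) -> Plan:
    """Bounded-rationality switch between the two planners.

    Try the fast continual planner first.  Accept its plan if the
    information the plan relies on is already known, or is cheap to
    obtain (e.g. by asking a question).  Otherwise fall back to the
    expensive decision-theoretic planner, which reasons explicitly
    about the possible outcomes of information-gathering actions.
    """
    plan = continual_plan()
    if plan is not None and info_acquisition_cost() <= cost_threshold:
        return plan
    return decision_theoretic_plan()
```

The point of the threshold is that deciding which planner to run must itself remain cheap; a bounded-rationality meta-reasoner would refine this estimate rather than plan in full before choosing.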

As we have said, to be truly self-extending the cognitive system must be able to learn about the
world and to learn new actions. This may involve planning actions in order to learn new things, or
to refine existing knowledge. Such planning requires reasoning about the system’s internal model
and how it might be changed by future experiences. It might include trying out an action in a
new situation to learn its effects, or planning to test a hypothesis. While the model-learning
approaches used in reinforcement learning address this challenge to some extent, there is relatively
little work on active learning of the representations needed for planning or reasoning at a high level.
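
As a minimal illustration of such active learning (again our own Python sketch; the grid resolution, action names and simulated outcome are assumptions), the system below tracks a discrete posterior over each action's unknown success rate and chooses to try the action whose next outcome promises the greatest expected information gain about its own model:

```python
import math
import random

GRID = [i / 100 for i in range(1, 100)]  # candidate success rates for an action

def binary_entropy(p: float) -> float:
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def expected_information_gain(posterior):
    """Mutual information between the next (binary) outcome and the
    unknown success rate, under a discrete posterior over GRID."""
    predictive = sum(w * t for w, t in zip(posterior, GRID))
    return binary_entropy(predictive) - sum(
        w * binary_entropy(t) for w, t in zip(posterior, GRID))

def bayes_update(posterior, success: bool):
    """Condition the posterior on one observed outcome."""
    weights = [w * (t if success else 1 - t) for w, t in zip(posterior, GRID)]
    z = sum(weights)
    return [w / z for w in weights]

def choose_action_to_try(posteriors):
    """Active learning: try the action whose outcome we expect to be
    most informative about the system's model of that action."""
    return max(posteriors, key=lambda a: expected_information_gain(posteriors[a]))

# Hypothetical usage: one untried action, one with a single trial observed.
uniform = [1 / len(GRID)] * len(GRID)
posteriors = {
    "push-door": uniform,                      # nothing known yet
    "grasp-box": bayes_update(uniform, True),  # one successful trial seen
}
action = choose_action_to_try(posteriors)      # picks the less-known "push-door"
outcome = random.random() < 0.7                # stand-in for actually executing it
posteriors[action] = bayes_update(posteriors[action], outcome)
```
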
In summary, the planner should have the following characteristics:

  • It should operate continually, interleaving planning and execution.
  • It should be able to reason about non-deterministic and stochastic outcomes of actions when
    building plans.
  • It should be able to cope with state ambiguity and gaps in its knowledge. It should build
    plans that include information-gathering actions or conformant plans that achieve their goals
    without requiring the missing information.
  • It should be capable of planning dialogue activities and reasoning about both its own mental
    state and that of others.
  • It should be able to plan to change its internal model of the world, choosing actions to
    facilitate learning of new actions or concepts.
  • It should be capable of doing all of this in real time.