WP2: Object perception and manipulation

The ability to manipulate novel objects detected in the environment and to predict their behaviour once an action is applied to them is essential for a robot that is to extend its own abilities. The role of this work package is to provide the sensory input required for these capabilities by exploiting the interplay between perception and manipulation. We will develop robust, generalisable and extensible manipulation strategies based on visual and haptic input. We envisage two forms of object manipulation: pushing using a “finger” equipped with a force-torque sensor, and grasping using a parallel jaw gripper and a three-finger Barrett hand. Through this coupling of perception and action we will be able to extract additional information about objects, e.g. their weight, and reason about object properties such as whether a container is empty or full. To summarise, the objectives are:

  • to develop representations that allow robust detection of objects in realistic environments,
  • to provide methodologies for manipulation of known and novel objects,
  • to learn predictive models of object behaviour from a small set of objects,
  • to develop generalisable and extensible manipulation strategies for two- and three-fingered robot hands.

Underpinning this work package will be a strand of work on how to represent objects so that we can detect them in the environment, learn predictive models of object behaviour from a small set of objects, and then generalise those models of behaviour under action to novel, previously unseen objects. Where our models fail to generalise to a new object, the system should, by introspection on the extracted sensory input and the previously learned models, generate hypothesised experiments that would provide the missing information about the new object. Perception has two roles in this work package. Firstly, we need to perceive object structure, as it is the object’s behaviour under robot actions that we are ultimately interested in, and behaviour depends on structure. We will use a combination of contour-based segmentation approaches and structure-from-motion techniques.
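As an illustration of the structure-from-motion component, the sketch below recovers a sparse 3D structure from two calibrated views using OpenCV. It is a minimal example only: the calibrated camera matrix K and the image pair img1, img2 are assumed inputs, and the choice of ORB features with brute-force matching is illustrative rather than a committed design decision.

```python
import cv2
import numpy as np

def sparse_structure(img1, img2, K):
    """Recover the relative camera pose and a sparse 3D point cloud from two views."""
    # Detect and describe ORB features in both images.
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Match descriptors with cross-checked brute-force Hamming matching.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Estimate the essential matrix and recover the relative pose (R, t).
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

    # Triangulate the surviving correspondences into 3D points.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    good = pose_mask.ravel() > 0
    pts4d = cv2.triangulatePoints(P1, P2, pts1[good].T, pts2[good].T)
    pts3d = (pts4d[:3] / pts4d[3]).T  # convert from homogeneous coordinates
    return R, t, pts3d
```

In the envisaged pipeline, contour-based segmentation would restrict the matched features to the object of interest, and the resulting sparse structure would feed the shape representations used for pushing and grasping.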
Secondly, we have to detect objects in cluttered scenes, from drastically varying viewpoints and distances, and under illumination changes, occlusions and real-time constraints. We shall follow the active vision paradigm where, instead of passively observing the world, viewing conditions are actively changed to improve vision performance. To this end we plan to use cameras mounted on a pan/tilt unit together with zoomable cameras. We plan to combine global appearance-based methods (for initial detection) with local feature-based methods (for verification).

Regarding manipulation, we will focus on developing a theory of modular prediction of the effects of actions on objects, based on the theory of modular motor learning. Starting from 3D shape models, we will acquire representations of pushing and grasping strategies that generalise across object categories and allow extension to novel objects. Current models of modular motor learning are essentially uni-modal and only predict the effects of an action on variables describing the internal state of the manipulator (e.g. proprioception). We aim to extend the theory to allow input and output channels from position sensing, force and vision. In relation to grasping, we will deal with both two- and three-fingered hands and investigate how different shape representations can be exploited to generate the input necessary for defining grasp strategies as a combination of approach vector (where to place the hand with respect to the object) and preshape (what type of grasp to use in order to grasp the object).
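To make the modular prediction idea concrete, the following sketch shows a bank of forward models with soft responsibilities, in the spirit of MOSAIC-style modular motor learning. It is an illustration under stated assumptions, not a committed design: the linear model form, the state vector layout (concatenated proprioceptive, force-torque and visual features) and all parameter names are assumptions made purely for this example.

```python
import numpy as np

class ModularPredictor:
    """A bank of linear forward models. Each module predicts the next multimodal
    state (proprioception, force-torque and visual features concatenated into one
    vector) from the current state and action. A softmax over prediction errors
    assigns each module a responsibility, and the combined prediction is the
    responsibility-weighted sum of the module predictions."""

    def __init__(self, n_modules, state_dim, action_dim, lr=0.01, temperature=1.0):
        self.W = [np.zeros((state_dim, state_dim + action_dim)) for _ in range(n_modules)]
        self.lr = lr
        self.temperature = temperature

    def _predict_all(self, state, action):
        x = np.concatenate([state, action])
        return np.stack([W @ x for W in self.W]), x

    def responsibilities(self, state, action, next_state):
        # Modules that predict the observed outcome well receive high responsibility.
        preds, _ = self._predict_all(state, action)
        errors = np.sum((preds - next_state) ** 2, axis=1)
        logits = -errors / self.temperature
        logits -= logits.max()          # numerical stability
        r = np.exp(logits)
        return r / r.sum()

    def predict(self, state, action, resp):
        # Responsibility-weighted combination of the module predictions.
        preds, _ = self._predict_all(state, action)
        return resp @ preds

    def update(self, state, action, next_state):
        """Responsibility-weighted gradient step on each module, so that modules
        specialise on the objects and contexts they already predict well."""
        preds, x = self._predict_all(state, action)
        resp = self.responsibilities(state, action, next_state)
        for i in range(len(self.W)):
            grad = np.outer(preds[i] - next_state, x)
            self.W[i] = self.W[i] - self.lr * resp[i] * grad
        return resp
```

In such a scheme the responsibilities are themselves informative: persistently low responsibilities across all modules when manipulating a new object would be one natural trigger for the hypothesised experiments described above.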