VSL

* Visuospatial Skill Learning was published as a chapter in the book "Handling Uncertainty and Networked Structure in Robot Control" [publisher website][book website].

* A demo of Visuospatial Skill Learning (VSL) in MATLAB can be found here.

(This article is under construction. For more information on VSL, see this article, which can also be downloaded from my publications.)

Visuospatial Skill Learning (VSL)

A Robot Learning from Demonstration Approach

by Reza Ahmadzadeh

Abstract

Visuospatial Skill Learning (VSL) is a learning-from-demonstration approach that enables robots to acquire novel object manipulation skills using visual perception. Using VSL, the robot learns the spatial relationships among objects. By learning such visuospatial capabilities, the robot can reproduce and generalize the learned skill in new situations.

Definition

Google

visuospatial: relating to or denoting the visual perception of the spatial relationships of objects.

Merriam-Webster

visuospatial: of, relating to, or being thought processes that involve visual and spatial awareness <visuospatial problem solving>

Wiktionary

visuospatial: of or pertaining to the visual perception of spatial relationships

Introduction

In humans, visuospatial perception is a component of cognitive functioning, and visuospatial skill is the ability to visually perceive objects and the spatial relationships among them. For instance, completing a jigsaw puzzle requires visuospatial ability. Visuospatial Skill Learning (VSL) is a robot learning approach inspired by this visuospatial ability in humans [1,2].

Visuospatial Skill Learning (VSL) is a goal-based, demonstration-driven visual learning approach that allows a robot to acquire new skills from a tutor. VSL focuses on achieving the desired goal configuration of objects relative to one another while preserving the sequence of operations. It can learn and generalize multi-operation skills from a single demonstration, while requiring minimal a priori knowledge about the environment.

VSL consists of two main phases, demonstration and reproduction, which are illustrated as a flow diagram in Figure 1.

Figure 1: A high-level flow diagram illustrating the main phases of VSL.
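
As a rough sketch of this flow, the two phases can be outlined in Python as below. This is an illustration only, not the actual VSL implementation; the callables capture, tutor_acted, match, pick, and place are placeholders standing in for the perception and manipulation components.

    # Hypothetical outline of the two VSL phases shown in Figure 1.
    from typing import Callable, List, Tuple

    def demonstrate(capture: Callable[[], object],
                    tutor_acted: Callable[[], None],
                    num_operations: int) -> Tuple[List[object], List[object]]:
        """Record one pre-/post-action observation pair per tutor operation."""
        pre_obs, post_obs = [], []
        for _ in range(num_operations):
            pre_obs.append(capture())    # observation just before the tutor acts
            tutor_acted()                # wait until one pick-and-place is finished
            post_obs.append(capture())   # observation just after the tutor acts
        return pre_obs, post_obs

    def reproduce(capture, match, pick, place, pre_obs, post_obs):
        """Replay the learned operation sequence in a new world configuration."""
        for o_pre, o_post in zip(pre_obs, post_obs):
            current = capture()
            pick(match(o_pre, current))    # locate where the object is now
            place(match(o_post, current))  # locate where it has to end up

During reproduction, match compares a recorded observation with the current view of the world (for example by feature matching) to localize the pick and place points.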

Terminology

The basic terms used to describe VSL are:

  • World: the workspace of the robot that is observable by the vision sensor. The world includes the objects used during the learning task and can be reconfigured by the human tutor and the robot.

  • Frame: a bounding box which defines a cuboid in 3D space or a rectangle in 2D space. The size of the frame can be fixed or variable. The maximum size of the frame is equal to the size of the world.

  • Observation: the captured context of the world from a predefined viewpoint using a specific frame. An observation can be a 2D image or a cloud of 3D points (see the sketch after this list).

  • Pre-action observation: an observation which is captured just before the action is executed. The robot searches for preconditions in the pre-action observations before selecting and executing an action.

  • Post-action observation: an observation which is captured just after the action is executed. The robot perceives the effects of the executed actions in the post-action observations.
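
For a 2D world, an observation amounts to the image content inside the frame. The sketch below illustrates this for a top-down camera image stored as a NumPy array; the function name and arguments are assumptions made for illustration, not part of the VSL definition.

    import numpy as np

    def capture_observation(world_image: np.ndarray,
                            frame_center: tuple,
                            frame_size: tuple) -> np.ndarray:
        """Crop the rectangular frame out of a 2D image of the world.

        world_image  : H x W (x channels) image of the observable workspace
        frame_center : (row, col) centre of the frame in image coordinates
        frame_size   : (height, width) of the frame, at most the size of the world
        """
        r, c = frame_center
        h, w = frame_size
        top, left = max(r - h // 2, 0), max(c - w // 2, 0)
        bottom = min(top + h, world_image.shape[0])
        right = min(left + w, world_image.shape[1])
        return world_image[top:bottom, left:right].copy()

    # Example: a pre-action observation whose frame covers the whole world.
    world = np.zeros((480, 640, 3), dtype=np.uint8)
    o_pre = capture_observation(world, frame_center=(240, 320), frame_size=(480, 640))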

Problem Formulation

A process of Visuospatial Skill Learning is defined as a tuple V = {W, O, F, A, C, P, B}, where W is a matrix representing the context of the world, including the workspace and all objects; WD and WR denote the world during the demonstration and reproduction phases, respectively. O = {Opre, Opost} is a set of observation dictionaries, where Opre and Opost comprise the sequences of pre-action and post-action observations, respectively. F is the observation frame used for capturing observations. A is the set of primitive actions defined in the learning task (e.g. pick). C = {Cpre, Cpost} is a set of constraint dictionaries, where Cpre and Cpost comprise the sequences of pre-action and post-action constraints, respectively. B is a vector of features extracted from the observations (e.g. SIFT features).
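
Purely as an illustration, this tuple can be organized as a small data structure. The field layout below is an assumption made for this sketch; P is kept opaque because it is not detailed in the text above.

    from dataclasses import dataclass, field
    from typing import Any, Dict, List

    import numpy as np

    @dataclass
    class VSLProcess:
        """Illustrative container for the tuple V = {W, O, F, A, C, P, B}."""
        W: np.ndarray                # context of the world (workspace and objects)
        F: tuple                     # observation frame, e.g. (height, width)
        A: List[str]                 # primitive actions, e.g. ["pick", "place"]
        O: Dict[str, list] = field(default_factory=lambda: {"pre": [], "post": []})  # observations
        C: Dict[str, list] = field(default_factory=lambda: {"pre": [], "post": []})  # constraints
        P: Any = None                # listed in the tuple but not specified here
        B: List[Any] = field(default_factory=list)  # extracted features (e.g. SIFT)

For a two-operation pick-and-place task, O["pre"] and O["post"] would each hold two observations captured with the frame F.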

VSL performed by animals

References

  1. S. R. Ahmadzadeh, A. Paikan, F. Mastrogiovanni, L. Natale, P. Kormushev, D. G. Caldwell, "Learning Symbolic Representations of Actions from Human Demonstrations", In Proc. IEEE Intl Conf. on Robotics and Automation (ICRA 2015), Seattle, Washington, USA, 26-30 May 2015. [PDF][bibtex]

  2. S. R. Ahmadzadeh, P. Kormushev, D. G. Caldwell, "Interactive Robot Learning of Visuospatial Skills", In Proc. 16th IEEE Intl Conf. on Advanced Robotics (ICAR 2013), Montevideo, Uruguay, 25-29 Nov. 2013. [PDF][bibtex][video][IEEE]

  3. S. R. Ahmadzadeh, P. Kormushev, D. G. Caldwell, "Visuospatial Skill Learning for Object Reconfiguration Tasks", In Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS 2013), Tokyo, Japan, pp. 685-691, 3-8 Nov. 2013. [PDF][bibtex][video][IEEE]