RAVEL (Robots with Audio-Visual abiLities) is a dataset for benchmarking action recognition, robot gestures, and human-robot interaction (HRI).

This HUMAVIPS dataset is a unique corpus of audio-visual recordings made with a realistic dummy head (the POPEYE robot head) equipped with four microphones (two binaural pairs) and a stereo pair of cameras. All the recordings were performed in a standard meeting room presenting all the challenges of natural indoor scenes. The dataset provides a basis for testing and benchmarking methods and algorithms on audio-visual data, with the ultimate goal of enabling robots to interact with people in unconstrained environments in the most natural way. It can be used for scene analysis purposes such as action recognition, gender identification, audio-visual object detection, localization and recognition, dialog modeling, etc. The recordings were made in December 2010 at INRIA Grenoble Rhône-Alpes.

  • Detailed descriptions of the scenarios and technical facts can be found in deliverable D1.2.
  • The RAVEL corpus is publicly available at:
Xavier Alameda-Pineda, Jordi Sanchez-Riera, Vojtech Franc, Johannes Wienke, Jan Cech, Kaustubh Kulkarni, Antoine Deleforge, Radu P. Horaud. The RAVEL dataset. IEEE/ACM ICMI’11 Workshop on Multimodal Corpora, Alicante, Spain. November 2011.
@inproceedings{ravel2011,
  author    = {Alameda-Pineda, X. and Sanchez-Riera, J. and Franc, V. and Wienke, J. and Cech, J. and Kulkarni, K. and Deleforge, A. and Horaud, R.},
  title     = {The {RAVEL} data set},
  booktitle = {IEEE/ACM ICMI 2011 Workshop on Multimodal Corpora},
  month     = {November},
  year      = {2011},
  publisher = {ACM Press},
  address   = {Alicante, Spain},
  url       = {}
}
