Robots have gradually moved from factory floors to populated areas. They therefore need perceptual and interaction skills that enable them to communicate with people in the most natural way. Because auditory signals distinctively characterize physical environments, and speech is the most effective means of communication among people, robots must be able to extract the rich auditory information available in their environment.
This course will address fundamental issues in robot hearing. It will describe methodologies that require two or more microphones embedded in a robot head, thus enabling sound-source localization, sound-source separation, and fusion of auditory and visual information.
The course will start by briefly describing the role of hearing in human-robot interaction, giving an overview of the human binaural system, and introducing the computational auditory scene analysis paradigm. It will then describe in detail sound propagation models, audio signal processing techniques, geometric models for source localization, and unsupervised and supervised machine learning techniques for characterizing binaural hearing, fusing acoustic and visual data, and designing practical algorithms. The course will be illustrated with numerous videos shot in the author’s laboratory.