How effectively can scientists build robots that can hear? And how far can that hearing be extended to include speech recognition, even amid noise interference? HEARBO (short for HEAR-ing roBOt) is a robot developed at Honda Research Institute Japan (HRI-JP), and its creators want it to stand out as an above-average example of how robots can understand sound.
Their line of research is called Computational Auditory Scene Analysis. In brief, HEARBO can pick up, distinguish, and analyze multiple sound sources at the same time. HEARBO's edge lies in the word "analyze": it can sift and sort different sounds occurring simultaneously, such as children playing on one side of the room while a doorbell rings on the other.
The Honda researchers say the robot follows a three-step process: localization, separation, and recognition. Sound Source Localization (SSL) in robot audition estimates the location and number of sound sources, and that information is then used for sound source separation.
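To make the localization step concrete, here is a minimal sketch of one textbook way to localize a sound with two microphones: estimate the time difference of arrival (TDOA) by cross-correlating the two channels, then convert that delay into an angle. The article does not say which SSL algorithm HEARBO uses, so this is a generic illustration, not HRI-JP's method; the function names, the 16 kHz sample rate, and the 0.2 m microphone spacing are all hypothetical choices for the example.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def cross_correlation_lag(ref, other, max_lag):
    """Return the lag (in samples) by which `other` trails `ref`, chosen as
    the argmax of their cross-correlation over [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    n = len(ref)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(ref, other, sample_rate, mic_spacing):
    """Estimate the arrival angle in degrees (0 = straight ahead, positive =
    the sound reached the `ref` microphone first) for a two-mic pair."""
    # The physical delay can never exceed spacing / speed of sound.
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * sample_rate))
    lag = cross_correlation_lag(ref, other, max_lag)
    tdoa = lag / sample_rate
    # Clamp to [-1, 1] to guard against rounding near end-fire angles.
    s = max(-1.0, min(1.0, tdoa * SPEED_OF_SOUND / mic_spacing))
    return math.degrees(math.asin(s))


# Usage: a broadband test signal that reaches the second mic 5 samples late.
rng = random.Random(0)
left = [rng.gauss(0, 1) for _ in range(2000)]
right = [0.0] * 5 + left[:-5]
angle = direction_of_arrival(left, right, sample_rate=16000, mic_spacing=0.2)
print(f"estimated angle: {angle:.1f} degrees")
```

A real robot array uses more than two microphones and more noise-robust estimators, which is why the researchers stress noise robustness and resolution; the sketch only shows the core idea that a delay between microphones encodes direction.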
"Since robots should work in real-time and localize sound sources in a noisy environment, SSL for robots mainly requires noise-robustness, high-resolution, and real-time processing," according to Honda researchers.
The team prepared a video showing how HEARBO can differentiate three sound sources around it: an alarm clock, music, and a person speaking. The robot is designed to capture all the sounds and determine where each one comes from before focusing on each source in turn.
The Japan Daily Press has enthused over Honda's work, saying the company probably possesses "the most advanced Computer Auditory Scene Analysis System currently in existence." Honda, for its part, is not shy about presenting itself as a leading center of research on auditory analysis, saying that in past years "we became the first in the world to propose the research field of 'robot audition' combining research in auditory scene analysis and research of robots."
Their intent has been to ensure that robots understand all sounds, not just voices. At IROS 2012 (the International Conference on Intelligent Robots and Systems), the team presented a new sound source localization algorithm. The team calls its software system for distinguishing simultaneous sounds HARK, which stands for HRI-JP Audition for Robots with Kyoto University. Using HARK, they said, it is possible to record who spoke and from where in a room.
Using the methods developed by the team at HRI-JP, up to four different simultaneous sounds or voices can be detected and recognized in practice. In this way, HEARBO's researchers are taking the "beamforming" approach to sound recognition a step further: they work from the premise that "noise" should not simply be filtered out but separated and then analyzed.
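For readers unfamiliar with beamforming, the following is a minimal sketch of its simplest form, delay-and-sum, used here to pull two simultaneous sources apart rather than discard one as noise. It is a generic textbook technique, not the HRI-JP method (which the article describes as going beyond beamforming). The eight-microphone linear array, the integer-sample delays, and the random "voice" and "alarm" stand-in signals are hypothetical simplifications chosen so the example runs with the standard library alone.

```python
import math
import random


def simulate_linear_array(sources, n_mics, n_samples):
    """Toy uniform linear array (hypothetical geometry): each source is a
    (signal, step) pair, and mic m hears that signal delayed by step * m
    whole samples. step = 0 means the source is straight ahead; step >= 0."""
    channels = []
    for m in range(n_mics):
        ch = [0.0] * n_samples
        for signal, step in sources:
            shift = step * m
            for i in range(shift, n_samples):
                ch[i] += signal[i - shift]
        channels.append(ch)
    return channels


def delay_and_sum(channels, step):
    """Steer toward the direction whose per-mic delay is `step` (>= 0):
    advance mic m by step * m samples so that source lines up, then average.
    The steered source adds coherently; other sources partially cancel."""
    n = len(channels[0])
    out = [0.0] * n
    for m, ch in enumerate(channels):
        shift = step * m
        for i in range(n - shift):
            out[i] += ch[i + shift]
    return [x / len(channels) for x in out]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
N = 4000
voice = [rng.gauss(0, 1) for _ in range(N)]  # stand-in for a talker
alarm = [rng.gauss(0, 1) for _ in range(N)]  # stand-in for an alarm clock
mics = simulate_linear_array([(voice, 0), (alarm, 3)], n_mics=8, n_samples=N)

voice_beam = delay_and_sum(mics, 0)  # steer at the talker
alarm_beam = delay_and_sum(mics, 3)  # steer at the alarm clock
print("single mic vs voice:", round(correlation(mics[0], voice), 2))
print("beam on voice:      ", round(correlation(voice_beam, voice), 2))
print("beam on alarm:      ", round(correlation(alarm_beam, alarm), 2))
```

Steering one beam per source yields a separate, cleaner signal for each, which can then be passed on to recognition; that is the "separate, then analyze" idea in its most basic form.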
HEARBO has been taught to recognize music, human voices, and environmental sounds. It was trained on numerous songs to learn the general characteristics of music, and it can tell the difference between a person issuing commands and a singer on the radio. The team's broader aim is to advance intelligent speech and sound recognition technology.