The ability to imitate sounds with our voice, such as a faulty car engine or a cat's meow, can be an effective way to convey concepts when words fall short. This vocal imitation is much like drawing a quick sketch to communicate an idea. Inspired by cognitive science, researchers from MIT's CSAIL have developed an AI system that can create human-like vocal imitations without prior training or exposure to human vocal impressions. The researchers constructed a model of the human vocal tract, simulating how the throat, tongue, and lips shape sounds from the voice box. A cognitively inspired AI algorithm controls this model to produce imitations, taking into account how humans choose to communicate sounds. The model can imitate various sounds, such as rustling leaves, a snake's hiss, and an ambulance siren. It can also reverse the process, guessing real-world sounds from human vocal imitations, similar to retrieving images from sketches. For example, it can distinguish between a human-imitated cat's "meow" and "hiss." The research suggests potential uses for the model, such as imitation-based interfaces for sound designers, enhancing AI characters in virtual reality, and aiding language learners.
Co-lead authors from MIT CSAIL highlight that, as in visual expression, realism isn’t always the ultimate goal in sound imitation. Their work offers insights into auditory abstraction. To refine their model, the team developed three versions, starting with a baseline model that aimed for realistic sound imitation but didn't match human behavior well. They then created a "communicative" model that focuses on a sound's distinctive features, which improved results. Finally, they added a term accounting for the effort humans invest in imitation, leading to more human-like results. In a behavioral experiment, human judges sometimes preferred AI-generated vocal imitations over human ones for specific sounds. The researchers aim to apply their model to the study of language development, infant speech learning, and imitation behaviors in birds. Although the model still faces challenges, such as accurately imitating some consonants and capturing cross-language differences in how sounds are imitated, it offers a promising step towards a deeper understanding of vocal imitation's role in communication and language evolution. The work highlights the interplay between physiological, social, and communicative factors, with implications for future technologies in music, art, and beyond.
MIT Develops AI for Human-like Vocal Imitation