Nov. 26, 2024, 9:17 a.m.

Nvidia's Fugatto: Revolutionizing Generative AI for Sound

Nvidia's new "Fugatto" model extends generative AI for audio to transforming music, voices, and sounds, and even to creating sounds no one has heard before. The model is not yet publicly available, but examples on Nvidia's website show it modifying audio traits: a saxophone that sounds like barking, speech that sounds as if it were recorded underwater, and a choir of ambulance sirens. This breadth has led Nvidia to describe Fugatto as a "Swiss Army knife for sound."

The main challenge was building a training dataset that captures meaningful relationships between audio and language. Nvidia's researchers used an LLM to generate a Python script, which produced large numbers of template-based and free-form instructions describing audio "personas." They applied these instructions to a wide range of open-source audio datasets, annotating the recordings with natural-language descriptions of traits such as emotion, gender, and speech quality. By holding some factors constant while varying others, they taught the model distinctions such as speech that sounds happier or an instrument that sounds different.

After assembling 20 million samples (50,000 hours of audio), the researchers used Nvidia tensor cores to train a model with 2.5 billion parameters, which achieved reliable scores on audio-quality metrics. Beyond training, Fugatto's "ComposableART" system allows customizable audio output: it combines traits from its dataset to create new, previously unheard sounds, using "conditional guidance" to handle combinations it never saw during training.
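The article gives no detail about the script Nvidia's researchers generated. As a rough illustration of the hold-some-factors-constant idea, the Python sketch below fills a single template from a small attribute set and yields pairs of instructions that differ in exactly one attribute. The attribute lists, the template wording, and the function names `describe` and `contrastive_pairs` are all hypothetical, not Nvidia's.

```python
import itertools

# Hypothetical persona attributes; the taxonomy Nvidia actually used is not described.
ATTRIBUTES = {
    "emotion": ["neutral", "happy", "sad", "angry"],
    "gender": ["male", "female"],
    "quality": ["clean", "noisy"],
}

# Hypothetical instruction template filled from an annotated sample's attributes.
TEMPLATE = "Synthesize {quality} speech from a {gender} speaker who sounds {emotion}."


def describe(sample: dict) -> str:
    """Turn a dict of attribute values into a natural-language instruction."""
    return TEMPLATE.format(**sample)


def contrastive_pairs(vary: str):
    """Yield instruction pairs that differ only in the attribute named by `vary`."""
    fixed_keys = [k for k in ATTRIBUTES if k != vary]
    # Enumerate every combination of the attributes that are held constant.
    for fixed_values in itertools.product(*(ATTRIBUTES[k] for k in fixed_keys)):
        base = dict(zip(fixed_keys, fixed_values))
        # Within that fixed context, contrast each pair of values of the varied attribute.
        for a, b in itertools.combinations(ATTRIBUTES[vary], 2):
            yield describe({**base, vary: a}), describe({**base, vary: b})
```

For example, `next(contrastive_pairs("emotion"))` returns two instructions for clean speech from a male speaker that differ only in sounding neutral versus happy. With the lists above, varying emotion yields 24 pairs: 4 fixed gender/quality combinations times 6 pairs of emotions.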

Not every output is pitch-perfect, but the variety of sounds, such as a violin that sounds like a laughing baby, shows how far Fugatto can transform audio. Crucially, Fugatto treats audio traits as tunable continua rather than on/off switches. It can blend sounds, such as an acoustic guitar and running water, while adjusting the balance between them, and it can shift the accent or emotion of speech by degrees. It also performs tasks such as changing the emotion of spoken text, isolating vocal tracks, and replacing the notes in MIDI music with varied vocal performances. Nvidia sees Fugatto as a step toward unsupervised multitask learning and envisions applications in song prototyping and dynamic video game scores. Such models are intended as tools for audio artists rather than replacements for them. As producer and songwriter Ido Zmishlany puts it, technology has continually reshaped music, and AI marks a new chapter in musical innovation.
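The article does not explain how ComposableART is implemented. One widely used way to blend several conditions with continuous weights in generative models is to extend classifier-free guidance: each condition's prediction is compared with an unconditional prediction, and the differences are added back with adjustable weights. The sketch below assumes that scheme and uses toy lists of floats in place of real model outputs; `compose_guidance` and the example vectors are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def compose_guidance(
    uncond: Sequence[float],
    conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> list[float]:
    """Weighted classifier-free guidance over several conditions (assumed scheme).

    Each condition contributes its difference from the unconditional
    prediction, scaled by its weight: out = u + sum_i w_i * (c_i - u).
    """
    if len(conds) != len(weights):
        raise ValueError("need exactly one weight per condition")
    out = list(uncond)
    for cond, w in zip(conds, weights):
        if len(cond) != len(uncond):
            raise ValueError("all predictions must have the same length")
        # Push the output toward this condition in proportion to its weight.
        for j, (c, u) in enumerate(zip(cond, uncond)):
            out[j] += w * (c - u)
    return out
```

With an unconditional prediction of `[0.0, 0.0]`, a "guitar" prediction of `[1.0, 0.0]`, a "running water" prediction of `[0.0, 1.0]`, and weights `[0.7, 0.3]`, the function returns `[0.7, 0.3]`. Changing the weights moves the output smoothly between the two conditions instead of switching from one to the other, which matches the article's description of traits as continua.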



Brief news summary

Nvidia's Fugatto is an audio synthesis model that turns text prompts into sounds, though it is not yet available to the public. Demos show it producing effects such as underwater speech and choirs of sirens. A major challenge in developing Fugatto was building a dataset that captures the relationships between audio and language. Nvidia addressed this by using a language model to generate a script that produced instructions describing diverse audio personas, then applying those instructions to open-source audio to build a 50,000-hour dataset for training the 2.5-billion-parameter model. A key feature is "ComposableART," which lets users blend characteristics learned from the training data for fine-grained control over aspects such as accent and emotion. Fugatto can also change the emotion of speech and isolate vocal tracks in music, going beyond basic synthesis. Nvidia sees Fugatto as a tool for music prototyping and dynamic game scores, intended to complement artists rather than replace them. Producer and songwriter Ido Zmishlany frames AI as the next chapter in technology's long influence on music.