Inception's Revolutionary AI Model: A Blend of Diffusion and Language Technology
Brief news summary
Inception, a Palo Alto startup founded by Stanford professor Stefano Ermon, has unveiled a diffusion-based large language model (DLM). The model combines the capabilities of conventional large language models (LLMs) with the fast, parallel processing of diffusion models, the technology known for generating multimedia content such as images, video, and audio. Ermon explains that traditional LLMs generate text sequentially, which slows output, while diffusion models refine an entire draft at once, enabling parallel processing and much faster text production. The breakthrough grew out of research by Ermon and one of his students and has drawn interest from Fortune 100 companies looking to cut AI latency and make better use of GPUs. Inception offers an API and a range of deployment options, claiming its DLMs can deliver results up to ten times faster than current LLMs at lower operational cost, and it aims to establish itself as a leading player in the fast-moving AI landscape.

Inception, a newly founded company in Palo Alto started by Stanford computer science professor Stefano Ermon, claims to have created a groundbreaking AI model based on "diffusion" technology. Inception calls it a diffusion-based large language model, or "DLM" for short.

The generative AI models attracting the most attention today fall into two broad categories: large language models (LLMs) and diffusion models. LLMs, built on the transformer architecture, specialize in text generation. Diffusion models, the technology behind AI platforms such as Midjourney and OpenAI's Sora, primarily generate images, video, and audio. According to Inception, its model offers the capabilities of conventional LLMs, including code generation and question answering, with significantly faster performance and lower computing costs.

Ermon told TechCrunch that he has long explored applying diffusion models to text generation in his research lab at Stanford. His work grew from the observation that traditional LLMs are slow compared to diffusion techniques because they generate text strictly in order. "You cannot generate the second word until you've produced the first one, and the third word can't be generated until the first two are complete," Ermon explained. Looking for a way to apply diffusion mechanisms to text, Ermon noted that, unlike LLMs, which operate sequentially, diffusion models begin with a rough approximation of the output (an image, for example) and refine the whole thing at once. He theorized that the same approach could generate and modify large blocks of text in parallel.
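To make the contrast concrete, the toy sketch below illustrates the two decoding patterns Ermon describes: token-by-token autoregressive generation versus iterative whole-sequence refinement. Inception has not published implementation details, so the function names (fake_model, autoregressive_generate, diffusion_style_generate), the mask-and-refine loop, and the random stand-in model are illustrative assumptions, not the company's actual method.

```python
import random

# Toy vocabulary and placeholder token; a real system works over a large learned vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", "a", "dog", "ran"]
MASK = "<mask>"

def fake_model(context):
    """Stand-in for a trained model: returns a plausible token.
    In a real system this would be a neural-network forward pass conditioned on `context`."""
    return random.choice(VOCAB)

def autoregressive_generate(length):
    """Conventional LLM decoding: each token depends on all previous tokens,
    so the positions must be produced strictly one after another."""
    tokens = []
    for _ in range(length):
        tokens.append(fake_model(tokens))  # one model call per token, in order
    return tokens

def diffusion_style_generate(length, steps=4):
    """Diffusion-style decoding sketch (an assumption, not Inception's published design):
    start from a fully masked 'noisy' draft and re-predict every position on each
    refinement step. In a real DLM a single forward pass would update all positions
    jointly, so each step can run in parallel on a GPU."""
    draft = [MASK] * length
    for _ in range(steps):
        draft = [fake_model(draft) for _ in range(length)]  # toy stand-in for one joint pass
    return draft

if __name__ == "__main__":
    print(autoregressive_generate(8))
    print(diffusion_style_generate(8))
```

The point of the sketch is only the loop structure: the autoregressive loop has a chain of dependencies as long as the output, while the refinement loop runs a small, fixed number of passes regardless of output length, which is where the claimed latency and GPU-utilization gains would come from.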
After several years of research, he and one of his students achieved a significant breakthrough, which they documented in a research paper published last year. Recognizing the potential of the advance, Ermon founded Inception last summer, bringing on former students Aditya Grover, a professor at UCLA, and Volodymyr Kuleshov of Cornell University to co-lead the venture.

While Ermon declined to disclose specific funding details for Inception, TechCrunch has learned that the Mayfield Fund is among its investors. Inception has already secured contracts with various clients, including unnamed Fortune 100 companies, by addressing their pressing need for lower AI latency and higher speed, according to Ermon. "Our models can leverage GPUs significantly more efficiently," Ermon asserted, referring to the graphics processing units typically used to run models in production. "I believe this is transformative and will alter how language models are developed."

The company provides an API alongside options for on-premises and edge-device deployment, model fine-tuning support, and a range of ready-to-use DLMs tailored for various applications. Inception claims that its DLMs can run up to 10 times faster than traditional LLMs while costing 10 times less. A company representative told TechCrunch, "Our 'small' coding model equals the performance of [OpenAI's] GPT-4o mini yet operates at more than 10 times the speed. Our 'mini' model surpasses small open-source alternatives like [Meta's] Llama 3.1 8B, achieving over 1,000 tokens per second."