ElasticDiffusion: Enhancing Image Generation with AI at Rice University
Brief news summary
Generative artificial intelligence, especially diffusion models, often struggles to produce consistent, detailed images, particularly when maintaining fine features such as facial symmetry in non-square formats. Researchers at Rice University have developed a novel approach called ElasticDiffusion, presented by doctoral student Moayed Haji Ali at the IEEE 2024 Conference on Computer Vision and Pattern Recognition in Seattle. Unlike earlier models such as Stable Diffusion and DALL-E, which perform well with square images but introduce distortions at other aspect ratios, ElasticDiffusion improves image generation by distinguishing local pixel details from global shapes. This minimizes errors in non-square images while preserving visual coherence, without any additional training. ElasticDiffusion currently runs 6-9 times slower than conventional models, but the researchers are optimizing its performance to match existing methods so it can be used across arbitrary aspect ratios.

Generative artificial intelligence (AI), including models like Stable Diffusion, Midjourney, and DALL-E, often struggles to produce consistent images, especially in details like facial symmetry and correct finger counts. These models generally generate square images, which causes problems when they are asked to create images at other aspect ratios, producing anomalies such as extra fingers or distorted shapes. To address these problems, computer scientists at Rice University developed ElasticDiffusion, a novel method that leverages pre-trained diffusion models. Moayed Haji Ali, a doctoral student at Rice, presented the method at the IEEE 2024 Conference on Computer Vision and Pattern Recognition in Seattle.
Haji Ali explained that traditional diffusion models can only generate images at the specific resolution they were trained on, a consequence of overfitting: the model performs well on data resembling its training set but struggles with variations.
ElasticDiffusion improves on this approach by separating local and global information during image generation rather than combining them into a single signal. This separation helps avoid the visual imperfections that arise from repeated content when adapting to non-square images. Haji Ali noted that the process first obtains a global score encapsulating the image's overall structure, then fills in pixel-level details section by section. This allows the model to generate cleaner images across various aspect ratios without any additional training. While ElasticDiffusion offers improved consistency and adaptability, it comes with a trade-off: it currently takes 6-9 times longer to create images than conventional diffusion models. Haji Ali aims to optimize the method to achieve comparable inference times while retaining the ability to generate high-quality images regardless of aspect ratio.
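The decomposition described above can be illustrated with a toy sketch. This is not ElasticDiffusion's actual implementation (which guides a pre-trained diffusion model's score estimates); here the "global signal" is simply a low-frequency block average standing in for the image's overall structure, and the "local update" is a small per-patch refinement standing in for pixel-level detail, applied to a non-square canvas. All function names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def global_signal(latent, factor=4):
    # Toy stand-in for the global score: the latent's low-frequency
    # structure, from block-averaging then repeating back to full size.
    h, w = latent.shape
    small = latent.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

def local_update(patch, step):
    # Toy stand-in for pixel-level refinement: nudge the patch toward
    # its own mean, as if denoising local detail.
    return patch - step * (patch - patch.mean())

def elastic_style_sample(height, width, steps=10, patch=16, step_size=0.3):
    # Sketch of the idea: compute one global signal for the whole
    # (possibly non-square) canvas, then fill in detail patch by patch,
    # keeping the two kinds of information separate at each step.
    latent = rng.standard_normal((height, width))
    for _ in range(steps):
        g = global_signal(latent)                   # overall structure
        latent = latent - step_size * (latent - g)  # pull toward global layout
        for i in range(0, height, patch):           # local detail per section
            for j in range(0, width, patch):
                latent[i:i + patch, j:j + patch] = local_update(
                    latent[i:i + patch, j:j + patch], step_size)
    return latent

img = elastic_style_sample(64, 128)  # a 1:2 (non-square) aspect ratio
print(img.shape)
```

Because the global signal is computed once per step over the entire canvas while detail is refined locally, no region ever "tiles" the square training distribution, which is the intuition behind avoiding repeated-content artifacts.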